Re: Implementing bulk export to Parquet

13 Jul 2020

The easiest approach is probably to use the COPY INTO ON CLIENT interface
and do all the encoding on the client side.
In mclient you can use something like:
COPY (query) INTO 'some file' ON CLIENT;

What this does is send the data to the client, which is then responsible
for writing the file.  The client gets the name of the file from the server.

This is currently only implemented for mclient using the mapi.c library,
but it could also be implemented in other front ends.  And of course,
there is nothing preventing the front end from formatting the data as
something other than CSV.
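
To illustrate that last point, here is a minimal sketch of what a front end
could do with the data once it has it.  It assumes the client already holds
the result as plain comma-separated text (how the data arrives over the wire
is not shown), and that the pyarrow package is installed for the Parquet
step.  The function names are made up for this example and are not part of
mclient or the mapi library.

```python
import csv
import io


def parse_csv_result(csv_text, column_names):
    """Turn CSV text into a dict mapping column name -> list of strings."""
    columns = {name: [] for name in column_names}
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        if len(row) != len(column_names):
            raise ValueError("unexpected number of fields: %r" % (row,))
        for name, value in zip(column_names, row):
            columns[name].append(value)
    return columns


def write_parquet(columns, path):
    """Write the columns to a Parquet file.

    Assumption: pyarrow is installed.  It is imported here so that the
    CSV parsing above works without it.
    """
    import pyarrow as pa
    import pyarrow.parquet as pq

    # Every column is kept as a string in this sketch; a real front end
    # would want to map the SQL column types to Arrow types.
    table = pa.table(columns)
    pq.write_table(table, path)
```

A front end would call parse_csv_result() on the received text and then
write_parquet() with the file name that the server passed along.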

The client tells the server during the initial connect that it is capable
of doing this, and when the time comes, the server tells the client that
it needs to write a file and gives it the file name (the string from the
COPY command) and the data.

On 11/07/2020 15.26, Daniel Glöckner wrote:
...
Hi,
the project which I'm involved in would require us to integrate MonetDB
with "big data technologies" so exporting data from MonetDB to Parquet
for further processing in the big data tools would be an obvious choice.
Can anyone guide me on what it would take to extend MonetDB with a writer
for Parquet?
It would be awesome if COPY INTO would be able to produce Parquet
efficiently, e.g. using the Arrow
library: https://arrow.apache.org/docs/cpp/parquet.html
Kind regards,
Daniel
-- 
Sjoerd Mullender