Efficient bulk load via JDBC?
Hello,

Is there (or is there a plan for) efficient bulk load via JDBC? (I'm currently running Oct2014 and JDBC 2.13.)

I looked at https://www.monetdb.org/book/export/html/340. As far as I understand, the most efficient way currently available in JDBC is to use batches of inserts via prepared statements (with autocommit off). Unfortunately, this does not come even close to being fast enough.

The speed of a COPY INTO is what I am looking for, but this is not supported via JDBC. Is there a specific reason, or is it simply not implemented?

I do know about the workaround of performing the COPY INTO via the MAPI protocol (as in http://dev.monetdb.org/hg/MonetDB/file/tip/java/example/SQLcopyinto.java). This provides good speed indeed. However, it is not transaction-safe. When I use this method and some SQL transaction happens to read (only read!) from the same tables at the same time, I often get data corruption: the COPY INTO seems to end well (and the data actually gets in place), but new SQL queries on those tables fail with a "BATproject: does not match always" GDK error (checking with gdb, I see that the right side of a fetchjoin has count 0).

Any idea?
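[For reference, the batched prepared-statement approach described above looks roughly like the following minimal sketch. The table `t (i INT, v VARCHAR(100))`, the batch size, and the class/method names are illustrative, not from the thread.]

```java
import java.sql.Connection;
import java.sql.PreparedStatement;

public class BatchLoad {
    // Illustrative target table and batch size; adjust for your schema.
    static final String INSERT_SQL = "INSERT INTO t (i, v) VALUES (?, ?)";
    static final int BATCH_SIZE = 10_000;

    static void load(Connection conn, Iterable<String[]> rows) throws Exception {
        conn.setAutoCommit(false);               // single commit at the end
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            int pending = 0;
            for (String[] row : rows) {
                ps.setInt(1, Integer.parseInt(row[0]));
                ps.setString(2, row[1]);
                ps.addBatch();
                if (++pending == BATCH_SIZE) {   // flush periodically
                    ps.executeBatch();
                    pending = 0;
                }
            }
            if (pending > 0) ps.executeBatch();  // flush the remainder
            conn.commit();
        }
    }
}
```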
Roberto,

I have used COPY INTO via JDBC all the time. The caveats are that the file must be on the server and it must use monetdb as the user (although I remember a recent thread about maybe changing that).

You can look at the ETL tool Pentaho Data Integration for "built in" bulk load steps (which generate a COPY INTO under the covers).

Bryan

------------------------------
This transmission is confidential and intended solely for the use of the recipient named above. It may contain confidential, proprietary, or legally privileged information. If you are not the intended recipient, you are hereby notified that any unauthorized review, use, disclosure or distribution is strictly prohibited. If you have received this transmission in error, please contact the sender by reply e-mail and delete the original transmission and all copies from your system.
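[The server-side COPY INTO that Bryan describes can be issued through a plain JDBC Statement. A minimal sketch, assuming a table `t` and a CSV file already present on the database server; the path, table name, and delimiters are illustrative.]

```java
import java.sql.Connection;
import java.sql.Statement;

public class ServerSideCopy {
    // Build a COPY INTO statement reading a file that lives on the
    // *server*, readable by the monetdb user (see caveats above).
    static String copySql(String table, String pathOnServer) {
        return "COPY INTO " + table
             + " FROM '" + pathOnServer + "'"
             + " USING DELIMITERS ',', '\\n', '\"'";
    }

    static void load(Connection conn) throws Exception {
        try (Statement st = conn.createStatement()) {
            // Hypothetical path; the file is not uploaded by the client.
            st.execute(copySql("t", "/var/tmp/data.csv"));
        }
    }
}
```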
Hi Bryan,
Thanks for your input.
Yes, you are right, I omitted that detail. I was referring to "streamed" bulk load: loading from a local file on the server is not really an option for us.
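[The streamed variant, as in the SQLcopyinto.java example mentioned earlier, sends a `COPY ... FROM STDIN` header followed by CSV rows over the connection. A minimal sketch of the client side, written against a generic `java.io.Writer`; in the MAPI workaround the writer would come from the MapiSocket connection. The quoting/escaping rules shown are an assumption; verify them against your server version.]

```java
import java.io.IOException;
import java.io.Writer;
import java.util.List;

public class StreamCopy {
    // Emit a COPY ... FROM STDIN header, then one CSV line per row.
    static void writeCopy(Writer out, String table, List<String[]> rows)
            throws IOException {
        out.write("COPY " + rows.size() + " RECORDS INTO " + table
                + " FROM STDIN USING DELIMITERS ',', '\\n', '\"';\n");
        for (String[] row : rows) {
            for (int i = 0; i < row.length; i++) {
                if (i > 0) out.write(',');
                // Quote each value; escape backslashes and embedded
                // quotes with a backslash (assumed default escaping).
                out.write('"');
                out.write(row[i].replace("\\", "\\\\").replace("\"", "\\\""));
                out.write('"');
            }
            out.write('\n');
        }
        out.flush();
    }
}
```

Note that, as discussed in this thread, this path bypasses the SQL transaction machinery, which is where the concurrency problem below comes in.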
Hi,
On 29 Apr 2015, at 17:39, Bryan Senseman wrote:
However, it is not transaction-safe. [...] new SQL queries on those tables fail with a "BATproject: does not match always" GDK error.

If this is an issue and you can reproduce it, could you please file a bug report? This should not happen.

Hannes
Hannes,
That GDK error is an issue that keeps popping up since we switched to Oct2014 (it never occurred in Feb2013) and is definitely related to, or triggered by, concurrency.

Unfortunately it is really hard to make it reproducible for others, as it happens at the end of a long ETL process, and not every time.

The best I could do is provide a gdb session at the point where the error occurs. But I doubt the problem lies where the error surfaces; it seems execution simply stops there because data corruption happened at some earlier moment. So it is really hard to debug.
It seems related to this
https://www.monetdb.org/bugzilla/show_bug.cgi?id=3450 (as I had commented
last year). The bug is marked as fixed, though I don't see any detail about
the fix.
Roberto
participants (3)
- Bryan Senseman
- Hannes Mühleisen
- Roberto Cornacchia