How to perform copy into in multiple clients at the same time?
Hi, I'm new to MonetDB and I think this is a great project! I have a problem the same as this one: https://stackoverflow.com/questions/9179265/how-would-monetdb-perform-with-m... , namely copy data into tables from multiple clients at the same time. As the stackoverflow question suggests, MonetDB uses Optimistic Concurrency Control, but the behaviour is different from my expectation. For example, say I have a table `t` without any constraints, and I run the following queries: client1 client2 ------------------------------------------------------------------- start transaction; insert into t values (1); start transaction; insert into t values (2); commit; commit; I would expect both commits succeed since the rows to be inserted do not conflict with each other and do not change original rows in the table. And this is the behaviour of some databases with OCC, e.g. TiDB (a MySQL compatible database). But MonetDB would rollback the second transaction. So my question is: is it possible to change this behaviour by modifying the code? If so, could you give me some guidance or suggestion? Thanks in advance! Best Regards, Yinjie Lin
On 27 Dec 2018, at 07:59, Yinjie Lin
wrote: Hi,
I'm new to MonetDB and I think this is a great project!
Hello Yinjie, Thanks for using MonetDB.
I have a problem the same as this one: https://stackoverflow.com/questions/9179265/how-would-monetdb-perform-with-m... , namely copy data into tables from multiple clients at the same time.
As the stackoverflow question suggests, MonetDB uses Optimistic Concurrency Control, but the behaviour is different from my expectation. For example, say I have a table `t` without any constraints, and I run the following queries:
client1 client2 ------------------------------------------------------------------- start transaction; insert into t values (1); start transaction; insert into t values (2); commit; commit;
I would expect both commits succeed since the rows to be inserted do not conflict with each other and do not change original rows in the table. And this is the behaviour of some databases with OCC, e.g. TiDB (a MySQL compatible database). But MonetDB would rollback the second transaction.
This is because MonetDB doesn’t allow concurrent updates on the same table. Here is some more information: https://www.monetdb.org/blog/monetdb-sql-transaction-management-scheme
So my question is: is it possible to change this behaviour by modifying the code? If so, could you give me some guidance or suggestion?
Changing this behaviour basically means changing the whole transaction manager of MonetDB... Our recommendation is simply don’t do this. The best is to serialise the inserts at the client side. Such concurrency will give you nothing but unnecessary resource contention. BTW, if data loading speed is important for what you want to do, please try to avoid using INSERT INTO. COPY INTO is a much faster alternative, and it is highly parallelised internally (which renders concurrent COPY INTO unnecessary). Regards, Jennie
Thanks in advance!
Best Regards, Yinjie Lin _______________________________________________ developers-list mailing list developers-list@monetdb.org https://www.monetdb.org/mailman/listinfo/developers-list
Hi Jennie,
Thanks for your detailed explanation.
Actually I'm using Mapi to copy data and send queries to MonetDB. I
send `COPY INTO ... FROM STDIN` to the server to transfer data.
I know COPY INTO is much faster, but it requires the file to be local
on the server. Currently I can only send data from the clients, and
this
is why I want to do it through multiple connections. I hope it can
speed up the copy process.
I'm trying to hack the code so that some transaction checks can be
ignored, though it is not easy...
Another question is that I'm afraid, for example, the low-level
storage of data does not support concurrent copy into one table even
if I changed the code.
So is that possible theoretically?
Regards,
Yinjie Lin
On Thu, Dec 27, 2018 at 10:33 PM Ying Zhang
On 27 Dec 2018, at 07:59, Yinjie Lin
wrote: Hi,
I'm new to MonetDB and I think this is a great project!
Hello Yinjie,
Thanks for using MonetDB.
I have a problem the same as this one: https://stackoverflow.com/questions/9179265/how-would-monetdb-perform-with-m... , namely copy data into tables from multiple clients at the same time.
As the stackoverflow question suggests, MonetDB uses Optimistic Concurrency Control, but the behaviour is different from my expectation. For example, say I have a table `t` without any constraints, and I run the following queries:
client1 client2 ------------------------------------------------------------------- start transaction; insert into t values (1); start transaction; insert into t values (2); commit; commit;
I would expect both commits succeed since the rows to be inserted do not conflict with each other and do not change original rows in the table. And this is the behaviour of some databases with OCC, e.g. TiDB (a MySQL compatible database). But MonetDB would rollback the second transaction.
This is because MonetDB doesn’t allow concurrent updates on the same table. Here is some more information: https://www.monetdb.org/blog/monetdb-sql-transaction-management-scheme
So my question is: is it possible to change this behaviour by modifying the code? If so, could you give me some guidance or suggestion?
Changing this behaviour basically means changing the whole transaction manager of MonetDB... Our recommendation is simply don’t do this. The best is to serialise the inserts at the client side. Such concurrency will give you nothing but unnecessary resource contention.
BTW, if data loading speed is important for what you want to do, please try to avoid using INSERT INTO. COPY INTO is a much faster alternative, and it is highly parallelised internally (which renders concurrent COPY INTO unnecessary).
Regards, Jennie
Thanks in advance!
Best Regards, Yinjie Lin _______________________________________________ developers-list mailing list developers-list@monetdb.org https://www.monetdb.org/mailman/listinfo/developers-list
_______________________________________________ developers-list mailing list developers-list@monetdb.org https://www.monetdb.org/mailman/listinfo/developers-list
Yinjie,
Have you looked into this?
https://dev.monetdb.org/hg/monetdb-java/file/tip/tests/SQLcopyinto.java
It shows how, from Java, you can use mapi to perform a COPY INTO from stdin
(so the file stays on the client side).
I use this approach on production code, it is pretty fast.
Also, this may be of interest to you. I have not tried it, but it sounds
like it allows a COPY INTO from a file on the client side:
https://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=0b4846d2089c
Lastly, I most definitely second Jennie's suggestion to not pursue the way
of disabling transaction checks. It is most certainly a recipe for
disaster. I say this as an avid user who has known the internals for 15
years.
Best, Roberto
On Thu, 27 Dec 2018 at 16:13 Yinjie Lin
Hi Jennie,
Thanks for your detailed explanation.
Actually I'm using Mapi to copy data and send queries to MonetDB. I send `COPY INTO ... FROM STDIN` to the server to transfer data. I know COPY INTO is much faster, but it requires the file to be local on the server. Currently I can only send data from the clients, and this is why I want to do it through multiple connections. I hope it can speed up the copy process.
I'm trying to hack the code so that some transaction checks can be ignored, though it is not easy... Another question is that I'm afraid, for example, the low-level storage of data does not support concurrent copy into one table even if I changed the code. So is that possible theoretically?
Regards, Yinjie Lin
On Thu, Dec 27, 2018 at 10:33 PM Ying Zhang
wrote: On 27 Dec 2018, at 07:59, Yinjie Lin
wrote: Hi,
I'm new to MonetDB and I think this is a great project!
Hello Yinjie,
Thanks for using MonetDB.
I have a problem the same as this one:
, namely copy data into tables from multiple clients at the same time.
As the stackoverflow question suggests, MonetDB uses Optimistic Concurrency Control, but the behaviour is different from my expectation. For example, say I have a table `t` without any constraints, and I run the following queries:
client1 client2 ------------------------------------------------------------------- start transaction; insert into t values (1); start transaction; insert into t values (2); commit; commit;
I would expect both commits succeed since the rows to be inserted do not conflict with each other and do not change original rows in the table. And this is the behaviour of some databases with OCC, e.g. TiDB (a MySQL compatible database). But MonetDB would rollback the second transaction.
This is because MonetDB doesn’t allow concurrent updates on the same
https://www.monetdb.org/blog/monetdb-sql-transaction-management-scheme
So my question is: is it possible to change this behaviour by modifying the code? If so, could you give me some guidance or suggestion?
Changing this behaviour basically means changing the whole transaction manager of MonetDB... Our recommendation is simply don’t do this. The best is to serialise
Such concurrency will give you nothing but unnecessary resource contention.
BTW, if data loading speed is important for what you want to do, please
https://stackoverflow.com/questions/9179265/how-would-monetdb-perform-with-m... table. Here is some more information: the inserts at the client side. try to avoid using INSERT INTO.
COPY INTO is a much faster alternative, and it is highly parallelised internally (which renders concurrent COPY INTO unnecessary).
Regards, Jennie
Thanks in advance!
Best Regards, Yinjie Lin _______________________________________________ developers-list mailing list developers-list@monetdb.org https://www.monetdb.org/mailman/listinfo/developers-list
_______________________________________________ developers-list mailing list developers-list@monetdb.org https://www.monetdb.org/mailman/listinfo/developers-list
developers-list mailing list developers-list@monetdb.org https://www.monetdb.org/mailman/listinfo/developers-list
Hi Roberto,
Thanks for your reply. What you provided is really helpful, and I gave
up the idea of hacking the code. Then I came up with a workaround: I
create a merge table with multiple partition tables, and data are
copied into partition tables so I can use multiple connections
simultaneously. Although it makes things a bit complicated, but it is
better!
Regards,
Yinjie Lin
On Fri, Dec 28, 2018 at 12:04 AM Roberto Cornacchia
Yinjie,
Have you looked into this? https://dev.monetdb.org/hg/monetdb-java/file/tip/tests/SQLcopyinto.java
It shows how, from Java, you can use mapi to perform a COPY INTO from stdin (so the file stays on the client side). I use this approach on production code, it is pretty fast.
Also, this may be of interest to you. I have not tried it, but it sounds like it allows a COPY INTO from a file on the client side: https://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=0b4846d2089c
Lastly, I most definitely second Jennie's suggestion to not pursue the way of disabling transaction checks. It is most certainly a recipe for disaster. I say this as an avid user who has known the internals for 15 years.
Best, Roberto
On Thu, 27 Dec 2018 at 16:13 Yinjie Lin
wrote: Hi Jennie,
Thanks for your detailed explanation.
Actually I'm using Mapi to copy data and send queries to MonetDB. I send `COPY INTO ... FROM STDIN` to the server to transfer data. I know COPY INTO is much faster, but it requires the file to be local on the server. Currently I can only send data from the clients, and this is why I want to do it through multiple connections. I hope it can speed up the copy process.
I'm trying to hack the code so that some transaction checks can be ignored, though it is not easy... Another question is that I'm afraid, for example, the low-level storage of data does not support concurrent copy into one table even if I changed the code. So is that possible theoretically?
Regards, Yinjie Lin
On Thu, Dec 27, 2018 at 10:33 PM Ying Zhang
wrote: On 27 Dec 2018, at 07:59, Yinjie Lin
wrote: Hi,
I'm new to MonetDB and I think this is a great project!
Hello Yinjie,
Thanks for using MonetDB.
I have a problem the same as this one: https://stackoverflow.com/questions/9179265/how-would-monetdb-perform-with-m... , namely copy data into tables from multiple clients at the same time.
As the stackoverflow question suggests, MonetDB uses Optimistic Concurrency Control, but the behaviour is different from my expectation. For example, say I have a table `t` without any constraints, and I run the following queries:
client1 client2 ------------------------------------------------------------------- start transaction; insert into t values (1); start transaction; insert into t values (2); commit; commit;
I would expect both commits succeed since the rows to be inserted do not conflict with each other and do not change original rows in the table. And this is the behaviour of some databases with OCC, e.g. TiDB (a MySQL compatible database). But MonetDB would rollback the second transaction.
This is because MonetDB doesn’t allow concurrent updates on the same table. Here is some more information: https://www.monetdb.org/blog/monetdb-sql-transaction-management-scheme
So my question is: is it possible to change this behaviour by modifying the code? If so, could you give me some guidance or suggestion?
Changing this behaviour basically means changing the whole transaction manager of MonetDB... Our recommendation is simply don’t do this. The best is to serialise the inserts at the client side. Such concurrency will give you nothing but unnecessary resource contention.
BTW, if data loading speed is important for what you want to do, please try to avoid using INSERT INTO. COPY INTO is a much faster alternative, and it is highly parallelised internally (which renders concurrent COPY INTO unnecessary).
Regards, Jennie
Thanks in advance!
Best Regards, Yinjie Lin _______________________________________________ developers-list mailing list developers-list@monetdb.org https://www.monetdb.org/mailman/listinfo/developers-list
_______________________________________________ developers-list mailing list developers-list@monetdb.org https://www.monetdb.org/mailman/listinfo/developers-list
_______________________________________________ developers-list mailing list developers-list@monetdb.org https://www.monetdb.org/mailman/listinfo/developers-list
_______________________________________________ developers-list mailing list developers-list@monetdb.org https://www.monetdb.org/mailman/listinfo/developers-list
Look at [1] to see how you can do COPY INTO ... FROM STDIN with a CSV file on the client side. In the next feature release (no schedule yet), there will be a possibility to indicate that the file to be loaded is on the client side (COPY INTO t FROM 'some file' ON CLIENT). This feature is implemented in mclient (in that upcoming feature release) but not in any other clients. [1] https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/LoadingBulkData On 28/12/2018 07.25, Yinjie Lin wrote:
Hi Roberto,
Thanks for your reply. What you provided is really helpful, and I gave up the idea of hacking the code. Then I came up with a workaround: I create a merge table with multiple partition tables, and data are copied into partition tables so I can use multiple connections simultaneously. Although it makes things a bit complicated, but it is better!
Regards, Yinjie Lin
On Fri, Dec 28, 2018 at 12:04 AM Roberto Cornacchia
wrote: Yinjie,
Have you looked into this? https://dev.monetdb.org/hg/monetdb-java/file/tip/tests/SQLcopyinto.java
It shows how, from Java, you can use mapi to perform a COPY INTO from stdin (so the file stays on the client side). I use this approach on production code, it is pretty fast.
Also, this may be of interest to you. I have not tried it, but it sounds like it allows a COPY INTO from a file on the client side: https://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=0b4846d2089c
Lastly, I most definitely second Jennie's suggestion to not pursue the way of disabling transaction checks. It is most certainly a recipe for disaster. I say this as an avid user who has known the internals for 15 years.
Best, Roberto
On Thu, 27 Dec 2018 at 16:13 Yinjie Lin
wrote: Hi Jennie,
Thanks for your detailed explanation.
Actually I'm using Mapi to copy data and send queries to MonetDB. I send `COPY INTO ... FROM STDIN` to the server to transfer data. I know COPY INTO is much faster, but it requires the file to be local on the server. Currently I can only send data from the clients, and this is why I want to do it through multiple connections. I hope it can speed up the copy process.
I'm trying to hack the code so that some transaction checks can be ignored, though it is not easy... Another question is that I'm afraid, for example, the low-level storage of data does not support concurrent copy into one table even if I changed the code. So is that possible theoretically?
Regards, Yinjie Lin
On Thu, Dec 27, 2018 at 10:33 PM Ying Zhang
wrote: On 27 Dec 2018, at 07:59, Yinjie Lin
wrote: Hi,
I'm new to MonetDB and I think this is a great project!
Hello Yinjie,
Thanks for using MonetDB.
I have a problem the same as this one: https://stackoverflow.com/questions/9179265/how-would-monetdb-perform-with-m... , namely copy data into tables from multiple clients at the same time.
As the stackoverflow question suggests, MonetDB uses Optimistic Concurrency Control, but the behaviour is different from my expectation. For example, say I have a table `t` without any constraints, and I run the following queries:
client1 client2 ------------------------------------------------------------------- start transaction; insert into t values (1); start transaction; insert into t values (2); commit; commit;
I would expect both commits succeed since the rows to be inserted do not conflict with each other and do not change original rows in the table. And this is the behaviour of some databases with OCC, e.g. TiDB (a MySQL compatible database). But MonetDB would rollback the second transaction.
This is because MonetDB doesn’t allow concurrent updates on the same table. Here is some more information: https://www.monetdb.org/blog/monetdb-sql-transaction-management-scheme
So my question is: is it possible to change this behaviour by modifying the code? If so, could you give me some guidance or suggestion?
Changing this behaviour basically means changing the whole transaction manager of MonetDB... Our recommendation is simply don’t do this. The best is to serialise the inserts at the client side. Such concurrency will give you nothing but unnecessary resource contention.
BTW, if data loading speed is important for what you want to do, please try to avoid using INSERT INTO. COPY INTO is a much faster alternative, and it is highly parallelised internally (which renders concurrent COPY INTO unnecessary).
Regards, Jennie
Thanks in advance!
Best Regards, Yinjie Lin _______________________________________________ developers-list mailing list developers-list@monetdb.org https://www.monetdb.org/mailman/listinfo/developers-list
_______________________________________________ developers-list mailing list developers-list@monetdb.org https://www.monetdb.org/mailman/listinfo/developers-list
_______________________________________________ developers-list mailing list developers-list@monetdb.org https://www.monetdb.org/mailman/listinfo/developers-list
_______________________________________________ developers-list mailing list developers-list@monetdb.org https://www.monetdb.org/mailman/listinfo/developers-list
_______________________________________________ developers-list mailing list developers-list@monetdb.org https://www.monetdb.org/mailman/listinfo/developers-list
-- Sjoerd Mullender
participants (4)
-
Roberto Cornacchia
-
Sjoerd Mullender
-
Ying Zhang
-
Yinjie Lin