Dear all,

I am currently testing an application running on MonetDB that experiences intermittent errors under high concurrency. I believe these are aborted transactions (the mapi error code is -4).

The application loads a small input table using the mapi interface, runs two queries internally (including writing to an intermediate table) and then reads the result set.

I have tried to produce a similar minimal example using the Python API which does produce concurrency errors, and I am wondering what the correct thing to do on the client side is. Should I simply resubmit the failed query?

The Python script used is shown below:

    import monetdb.sql
    import sys
    from multiprocessing import Process
    import uuid

    def f():
        dbh = monetdb.sql.Connection(port=int(sys.argv[1]), database=sys.argv[2],
                                     hostname=sys.argv[3], autocommit=True)
        for i in range(0, 20):
            tablename = "input_" + str(uuid.uuid1()).replace("-", "_")
            cursor = dbh.cursor()
            # cursor.execute('create local temporary table %s (term int, p float) on commit preserve rows;' % tablename)
            cursor.execute('declare table %s (term int, p float);' % tablename)
            cursor.execute('insert into %s values (1,0.1);' % tablename)
            cursor.execute('select * from %s;' % tablename)
            print(cursor.fetchall())
            cursor.execute('drop table %s;' % tablename)
        dbh.close()

    procs = []
    for t in range(0, int(sys.argv[4])):
        p = Process(target=f)
        procs.append(p)
        p.start()
    for proc in procs:
        proc.join()

Running this locally with "python2.7 concurrency.py 50000 test_db 127.0.0.1 10" results in a number of error messages like:

    ProgrammingError: 40000!COMMIT: transaction is aborted because of concurency conflicts, will ROLLBACK instead

What is the right thing to do? It seems that anything that writes to the db, even to separate tables, causes these messages.

Best regards and thanks,

Alastair
Hi Alastair,
unfortunately you cannot do this the way you want in MonetDB, since it targets OLAP workloads, where concurrency is not a dimension to optimize for. MonetDB therefore implements optimistic concurrency control, just enough to offer the ACID properties that one expects from a DBMS. Even though each process writes to a separate table, the create queries still have to update the catalog, and I think this is where you get the conflict.

In summary, try to serialize on the client side. :)
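For instance, in your test script you could pass a single lock to all worker processes and hold it around the statements that write, so that only one process updates the database at a time. Below is a minimal, untested sketch on my side; note it only serializes the writers of this one script, since the multiprocessing.Lock lives inside this process tree:

    import monetdb.sql
    import sys
    import uuid
    from multiprocessing import Process, Lock

    def f(write_lock):
        dbh = monetdb.sql.Connection(port=int(sys.argv[1]), database=sys.argv[2],
                                     hostname=sys.argv[3], autocommit=True)
        for i in range(0, 20):
            tablename = "input_" + str(uuid.uuid1()).replace("-", "_")
            cursor = dbh.cursor()
            with write_lock:
                # only one process at a time runs the writing statements
                cursor.execute('declare table %s (term int, p float);' % tablename)
                cursor.execute('insert into %s values (1,0.1);' % tablename)
            cursor.execute('select * from %s;' % tablename)  # reads need no lock
            print(cursor.fetchall())
            with write_lock:
                cursor.execute('drop table %s;' % tablename)
        dbh.close()

    if __name__ == '__main__':
        write_lock = Lock()  # one lock shared by all worker processes
        procs = []
        for t in range(0, int(sys.argv[4])):
            p = Process(target=f, args=(write_lock,))
            procs.append(p)
            p.start()
        for proc in procs:
            proc.join()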
Hope that helps,
Babis
Hi Babis,
Thanks for the response. I thought that there might have been some mechanism that I was missing.
I'm wondering though, how would serialising on the client side scale to multiple clients?
My use case is essentially a join + grouped aggregate of a large read-only table (10s-100s of millions of rows) with a small table unique to that session (100s of rows).
I also saw that it is not just create queries that fail, but also insert into queries. Does that make sense?
Best regards,
Alastair
On Wed, Apr 9, 2014 at 1:01 AM, Alastair McKinley wrote:
> I'm wondering though, how would serialising on the client side scale to multiple clients?
> My use case is essentially a join + grouped aggregate of a large read-only table (10s-100s of millions of rows) with a small table unique to that session (100s of rows).
Excuse me for not being very precise on that. When I mentioned "concurrency as a dimension to optimize", I meant concurrency in the style of OLTP workloads, which often contain updates. OLAP focuses on read-only queries, and for those optimistic concurrency control is the lightest implementation you can have from the engine's point of view. In that case (as it is for you, with join + aggregate queries) you do not need to serialize anything. You need to serialize your queries on the client side only if they contain update operations.
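If a session does contain updates, resubmitting the statement that failed is also reasonable. A minimal sketch of that (untested on my side; the helper execute_with_retry is just illustrative, and it assumes the conflict reaches the client as a ProgrammingError whose message carries the 40000 SQLSTATE you quoted):

    import time
    import monetdb.sql

    def execute_with_retry(cursor, statement, retries=5, backoff=0.1):
        # Resubmit a statement whose transaction was aborted by a
        # concurrency conflict; re-raise any other error unchanged.
        for attempt in range(retries):
            try:
                cursor.execute(statement)
                return
            except monetdb.sql.ProgrammingError as e:
                if '40000' not in str(e) or attempt == retries - 1:
                    raise
                time.sleep(backoff * (attempt + 1))  # simple linear backoff

In your loop that would be, e.g., execute_with_retry(cursor, 'insert into %s values (1,0.1);' % tablename).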
> I also saw that it is not just create queries that fail, but also insert into queries. Does that make sense?
Well, if multiple processes insert into the same table, that is reasonable. If not, it might be that one insert overlaps with a schema-update query of another process. Otherwise, i.e., if we only have inserts into different tables by different processes, I presume that inserts modify the catalog for some reason, for which we need a MonetDB expert to confirm/answer. :)

Regards,
Babis