Hello, sorry for my late reply; I have been really busy over the last few days.
Hello Stefano,
Which MonetDB version do you use? On which OS?
Sorry, I forgot to mention this. I am using Ubuntu 16.04 and I am running MonetDB inside a Docker container (which also runs Ubuntu).
On 4 Jul 2017, at 16:32, Stefano Piani wrote:
Dear monetdb users,
I started using MonetDB a few weeks ago and, until now, I have been able to use it without problems. Unfortunately, I have run into a small issue that I have not been able to figure out how to fix.
I collect data generated by different users, and a unique ID is associated with each user. I have a specific table (called "user") where I save the name and some other information about each user; in this table, the ID is a primary key. When I collect data, I want to check whether the user who generated it is already saved in the user table and, if not, add a row for that user. Unfortunately, I have not been able to accomplish this in a single SQL query, so I first query for the user ID and, if nothing is found, add the user with a second query. The following Python function performs these operations:
def save_user_in_database(user, cursor):
    db_line = 'SELECT 1 FROM users WHERE user_id = {}'.format(user.id)
    LOGGER.info('Executing on DB: %s', db_line)
    cursor.execute(db_line)

    answer = cursor.fetchall()
    LOGGER.info('Get this answer %s', answer)
    if len(answer) == 0:
        LOGGER.info('Inserting new user')
        db_line = sql_cmd_from_template(
            USER_TABLE_INSERT_TEMPLATE,
            user.id,
            user.name,
            user.description,
            user.location
        )
        LOGGER.info('Executing on DB: %s', db_line)
        cursor.execute(db_line)
Do you execute the queries in a schema other than the default 'sys' schema (which is recommended for database users)? I ask because 'users' is a special VIEW in the 'sys' schema.
Yes, I should be in a different schema. In any case, I changed the name of the table just to be on the safe side.
This function is executed inside a single-threaded application. Moreover, there are no "DELETE" or similar operations in the script (it only adds rows to the database). No user (human or software) other than this script connects to my MonetDB instance.
Unfortunately, after the script has been running for a while (say one or two days), something unexpected happens. For example, this is the log generated for user 42847029 (I removed many unrelated lines in between). As you can see, after a while my check fails, and I don't understand why.
2017/06/13 09:30:01 - INFO - save_user_in_database: Executing on DB: SELECT 1 FROM users WHERE user_id = 42847029
2017/06/13 09:30:01 - INFO - save_user_in_database: Get this answer [(1,)]
2017/06/13 10:00:04 - INFO - save_user_in_database: Executing on DB: SELECT 1 FROM users WHERE user_id = 42847029
2017/06/13 10:00:04 - INFO - save_user_in_database: Get this answer [(1,)]
2017/06/13 10:02:33 - INFO - save_user_in_database: Executing on DB: SELECT 1 FROM users WHERE user_id = 42847029
2017/06/13 10:02:33 - INFO - save_user_in_database: Get this answer [(1,)]
2017/06/13 10:30:36 - INFO - save_user_in_database: Executing on DB: SELECT 1 FROM users WHERE user_id = 42847029
2017/06/13 10:30:36 - INFO - save_user_in_database: Get this answer []
2017/06/13 10:30:36 - INFO - save_user_in_database: Inserting new user
This shouldn't happen: the error below shows that a row with user_id 42847029 still exists, and, as you mentioned, there haven't been any DELETE queries.
This might be a bug in MonetDB.
Is this problem repeatable? If so, can you give us a sequence of queries that reproduces it? We'd like to look into it.
I will try to reproduce the problem with some scripts as soon as possible. Unfortunately, I am quite busy in this period, so this will take some time. In any case, this only happens when the system is busy and the table is big enough. I think the key to reproducing the problem is to stress the database with several operations per second. In that situation, I have noticed that the DB starts to use a lot of RAM (about 15 GB for a database that occupies 3 GB on disk). At that point the DB becomes extremely slow (for example, a query that usually takes less than 10 ms takes more than 2 seconds) and the unique key problem starts to appear.
2017/06/13 10:30:36 - INFO - save_user_in_database: Executing on DB: INSERT INTO users(user_id, user_name, user_description, user_location) VALUES (42847029, 'XXXX', 'XXXX', 'XXXX')
2017/06/13 10:30:36 - WARNING - on_data: Error elaborating data
Traceback (most recent call last):
  File "data_collector.py", line 267, in on_data
    return self.elaborate_data(data)
  File "data_collector.py", line 354, in elaborate_data
    save_user_in_database(user, cursor)
  File "data_collector.py", line 234, in save_user_in_database
    cursor.execute(db_line)
  File "/opt/Python-3.6.0/lib/python3.6/site-packages/pymonetdb-1.0.4-py3.6.egg/pymonetdb/sql/cursors.py", line 164, in execute
    block = self.connection.execute(query)
  File "/opt/Python-3.6.0/lib/python3.6/site-packages/pymonetdb-1.0.4-py3.6.egg/pymonetdb/sql/connections.py", line 133, in execute
    return self.command('s' + query + ';')
  File "/opt/Python-3.6.0/lib/python3.6/site-packages/pymonetdb-1.0.4-py3.6.egg/pymonetdb/sql/connections.py", line 138, in command
    return self.mapi.cmd(command)
  File "/opt/Python-3.6.0/lib/python3.6/site-packages/pymonetdb-1.0.4-py3.6.egg/pymonetdb/mapi.py", line 235, in cmd
    raise exception(string)
pymonetdb.exceptions.IntegrityError: INSERT INTO: PRIMARY KEY constraint 'users.users_user_id_pkey' violated
Do you have any advice? Is there a way to perform the same check in just one SQL statement? Is that approach atomic (in case I ever move to a multithreaded script)?
It is not atomic, but you can make it atomic by wrapping both queries into one transaction.
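For example, a minimal (untested) sketch with pymonetdb could look like the following; the table and column names follow your script, while the function name and the connection parameters are just placeholders:

import pymonetdb

# Placeholder connection; adapt the parameters to your own setup.
# connection = pymonetdb.connect(database='mydb', hostname='localhost',
#                                username='monetdb', password='monetdb')

def save_user_atomically(user, connection):
    cursor = connection.cursor()
    try:
        # pymonetdb connections have auto-commit disabled by default, so the
        # SELECT and the INSERT below run inside the same transaction.
        cursor.execute('SELECT 1 FROM users WHERE user_id = {}'.format(user.id))
        if len(cursor.fetchall()) == 0:
            cursor.execute(
                "INSERT INTO users (user_id, user_name, user_description, user_location) "
                "VALUES ({}, '{}', '{}', '{}')".format(
                    user.id, user.name, user.description, user.location))
        connection.commit()    # check and insert become visible together
    except Exception:
        connection.rollback()  # undo the whole transaction on any error
        raise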
Am I doing something wrong with my current code? Please take into account that I do not have deep experience with other SQL tools, so it is perfectly possible that I am missing something that is extremely obvious to you.
No, you're not doing anything wrong. But there are better alternatives:
1. If this is a standalone action (i.e. not part of a transaction), the check for an existing user_id is not necessary. The simplest and most efficient way is to always do the insert and ignore the 'primary key constraint ... violated' error (e.g. use a try-catch block; see the sketch after the procedure below).
2. If this insert is going to be part of a (big, long) transaction, then you won't want the insert to fail. In that case, you can use a stored procedure:
create procedure insert_new (id int, name varchar(10), description varchar(10), location varchar(10))
begin
    declare cnt int;
    set cnt = (select count(*) from myusers where user_id = id);
    if (cnt = 0) then
        insert into myusers values (id, name, description, location);
    end if;
end;

Then call the procedure:

call insert_new (42847029, 'XXXX', 'XXXX', 'XXXX')
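For completeness, here is a minimal (untested) sketch of alternative 1 with pymonetdb; the table and column names follow your script, and the function name is just an example:

import pymonetdb

def insert_user_ignore_duplicate(user, cursor):
    # Insert unconditionally; if the user_id is already present, MonetDB
    # raises the primary key error seen above, which we simply ignore.
    try:
        cursor.execute(
            "INSERT INTO users (user_id, user_name, user_description, user_location) "
            "VALUES ({}, '{}', '{}', '{}')".format(
                user.id, user.name, user.description, user.location))
    except pymonetdb.exceptions.IntegrityError:
        pass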
I hope this helps.
I tried solution number 2, but unfortunately the problem is still there: I get an IntegrityError even when calling the procedure you described. What is really strange is that, after a while, executing the procedure you showed me takes more than 3 seconds on my system; if I restart the MonetDB instance, it takes only a few milliseconds again. I think that maybe something is broken in my setup: it could be that I have misconfigured something, or that there is a problem with my machine. Unfortunately, also considering that I am using Docker (which adds a layer of complexity), there are plenty of things that might not work as expected. Thank you for your help!
Regards, Jennie
Thank you in advance!
Best regards,
Stefano Piani