Re: Re: Add entry in a table only if that entry does not already exist
Hello, sorry for my late replay, I have been really busy in the last few days.
Hello Stefano,
Which MonetDB version do you use? On which OS?
Sorry, I forgot to mention this. I am using ubuntu 16.04 and I am running MonetDB inside a Docker container (which also runs ubuntu).
On 4 Jul 2017, at 16:32, Stefano Piani
wrote: Dear monetdb users,
I started to use MonetDB a few weeks ago and, until now, I have been able to use it proficiently. Unfortunately, I have faced a small issue that I have not been able to figure out how to fix.
I collect data generated by different users. A unique id is associated to each user. I have a specific table (called "user") where I save the name and some other information about the user. In this table, the ID is a primary key. When I collect the data, I want to check if the specific user who generated that data is already saved in the user table. If this is not the case, I want to add a row to the user table for this user. Unfortunately, I have not been able to accomplish this operation in one SQL query and, therefore, I have been forced to query for the user id and, if the answer is negative, add the user using another query. The following is the python script that perform the operations.
def save_user_in_database(user, cursor): db_line = 'SELECT 1 FROM users WHERE user_id = {}'.format(user.id) LOGGER.info('Executing on DB: %s', db_line) cursor.execute(db_line)
answer = cursor.fetchall() LOGGER.info('Get this answer %s', answer) if len(answer) == 0: LOGGER.info('Inserting new user')
db_line = sql_cmd_from_template( USER_TABLE_INSERT_TEMPLATE, user.id, user.name, user.description, user.location ) LOGGER.info('Executing on DB: %s', db_line) cursor.execute(db_line)
Do you execute the queries in a different schema than the default ?sys? schema (which is recommended for database users)? Because ?users? is a special VIEW in the ?sys? schema.
Yes, I should be in a different schema. In any case, I changed the name of the table just to be on the safe way.
This function is executed inside a single threaded application. Moreover, there are no "DELETE" or similar operations inside this script (it only adds rows to the database). No user (human or software) but this script connects to my MonetDB
instance.
Unfortunately, after I while that the script is running (let's say one or two days), something unexpected happens. For example, this is the log generated for user 42847029 (there are a lot of other lines in between that I removed). As you can see, after a while, my check fails and I don't understand why.
2017/06/13 09:30:01 - INFO - save_user_in_database: Executing on DB: SELECT 1 FROM users WHERE user_id = 42847029 2017/06/13 09:30:01 - INFO - save_user_in_database: Get this answer
[(1,)]
2017/06/13 10:00:04 - INFO - save_user_in_database: Executing on DB: SELECT 1 FROM users WHERE user_id = 42847029 2017/06/13 10:00:04 - INFO - save_user_in_database: Get this answer
[(1,)]
2017/06/13 10:02:33 - INFO - save_user_in_database: Executing on DB: SELECT 1 FROM users WHERE user_id = 42847029 2017/06/13 10:02:33 - INFO - save_user_in_database: Get this answer
[(1,)]
2017/06/13 10:30:36 - INFO - save_user_in_database: Executing on DB: SELECT 1 FROM users WHERE user_id = 42847029 2017/06/13 10:30:36 - INFO - save_user_in_database: Get this answer [] 2017/06/13 10:30:36 - INFO - save_user_in_database: Inserting new user
This shouldn?t happen, as is shown by the error below that a user_id of 42847029 still exists; also, as you mentioned, there hasn?t been any delete queries.
This might be a bug in MonetDB.
Is this problem repeatable? If yes, can you give us a sequence of queries to reproduce this problem. We?d like to look into it.
I will try to reproduce the problem with some scripts as soon as possible. Unfortunately, I am quite busy in this period and therefore this will require some time. In any case, this happens only when the system is busy and the table is big enough. I think that the key to reproduce the problem is to stress the database with several operations per second. In this situation, I have noticed that the db starts to use a lot of RAM (about 15 GB for a database that occupies 3 GB on the disk). At that point, the db becomes extremely slow (for example, for a query that usually takes less than 10 ms I have to wait for more than 2 seconds) and the unique key problem start to appear.
2017/06/13 10:30:36 - INFO - save_user_in_database: Executing on DB: INSERT INTO users(user_id, user_name, user_description, user_location) VALUES (42847029, 'XXXX', 'XXXX', 'XXXX') 2017/06/13 10:30:36 - WARNING - on_data: Error elaborating data Traceback (most recent call last): File "data_collector.py", line 267, in on_data return self.elaborate_data(data) File "data_collector.py", line 354, in elaborate_data save_user_in_database(user, cursor) File "data_collector.py", line 234, in save_user_in_database cursor.execute(db_line) File "/opt/Python-3.6.0/lib/python3.6/site-packages/pymonetdb-1.
line 164, in execute block = self.connection.execute(query) File "/opt/Python-3.6.0/lib/python3.6/site-packages/pymonetdb-1. 0.4-py3.6.egg/pymonetdb/sql/connections.py", line 133, in execute return self.command('s' + query + ';') File "/opt/Python-3.6.0/lib/python3.6/site-packages/pymonetdb-1. 0.4-py3.6.egg/pymonetdb/sql/connections.py", line 138, in command return self.mapi.cmd(command) File "/opt/Python-3.6.0/lib/python3.6/site-packages/pymonetdb-1. 0.4-py3.6.egg/pymonetdb/mapi.py", line 235, in cmd raise exception(string) pymonetdb.exceptions.IntegrityError: INSERT INTO: PRIMARY KEY constraint 'users.users_user_id_pkey' violated
Have you any advice? Is there a way to perform the same check in just one SQL line? Is this way atomic (if a would ever move to a multithreading script)?
It is not atomic, but you can make it atomic by wrapping both queries into one transaction.
Am I doing something wrong with my current code? Please, take into account that I do not have a deep experience with other SQL tools so it is perfectly possible that I miss something that for you is extremely obvious.
No, you?re not doing anything wrong. But there are better alternatives:
1. If this is a standalone action (i.e. not part of a transaction), the check for existing user_id is not necessary. The simplest and most efficient way is to always do the insert and ignore the ?primary key constraint ? violated? >error (e.g. use a try-catch block).
2. If this insert is going to be part of a (big, long) transaction, then you won?t want the insert to fail. In that case, you can use a stored
0.4-py3.6.egg/pymonetdb/sql/cursors.py", procedure:
create procedure insert_new (id int, name varchar(10), description
varchar(10), location varchar(10))
begin declare cnt int; set cnt = (select count(*) from myusers where user_id=id);
if (cnt = 0 ) then insert into myusers values (id, name, description, location); end if; end;
Then call the procedure: call insert_new (42847029, 'XXXX', 'XXXX', 'XXXX?)
I hope this helps.
II tried the solution number 2, but unfortunately the problem is still there. I get an Integrity error even calling the procedure that you described before. What is really strange is that, after a while, execute the procedure that you have showed me requires more than 3 seconds on my system. If I restart the monetdb instance, it takes a few milliseconds again. I think that maybe there is something broken in my setup. It could be that I have misconfigured something or I have a problem with my machine. Unfortunately, taking also into account that I am using Docker (which adds a layer of complexity), there are plenty of things that could not work as expected. Thank you for your help!
Regards, Jennie
Thank you in advance!
Best regards,
Stefano Piani
On 17 Jul 2017, at 12:10, Stefano Piani
wrote: Hello, sorry for my late replay, I have been really busy in the last few days.
Hello Stefano,
Which MonetDB version do you use? On which OS?
Sorry, I forgot to mention this. I am using ubuntu 16.04 and I am running MonetDB inside a Docker container (which also runs ubuntu).
Hai Stefano, Which MonetDB version did you install in the Docker container? How much memory do you allocate to the container? Do you run any other applications on the machine (your laptop/desktop?)?
On 4 Jul 2017, at 16:32, Stefano Piani
wrote: Dear monetdb users,
I started to use MonetDB a few weeks ago and, until now, I have been able to use it proficiently. Unfortunately, I have faced a small issue that I have not been able to figure out how to fix.
I collect data generated by different users. A unique id is associated to each user. I have a specific table (called "user") where I save the name and some other information about the user. In this table, the ID is a primary key. When I collect the data, I want to check if the specific user who generated that data is already saved in the user table. If this is not the case, I want to add a row to the user table for this user. Unfortunately, I have not been able to accomplish this operation in one SQL query and, therefore, I have been forced to query for the user id and, if the answer is negative, add the user using another query. The following is the python script that perform the operations.
def save_user_in_database(user, cursor): db_line = 'SELECT 1 FROM users WHERE user_id = {}'.format(user.id) LOGGER.info('Executing on DB: %s', db_line) cursor.execute(db_line)
answer = cursor.fetchall() LOGGER.info('Get this answer %s', answer) if len(answer) == 0: LOGGER.info('Inserting new user')
db_line = sql_cmd_from_template( USER_TABLE_INSERT_TEMPLATE, user.id, user.name, user.description, user.location ) LOGGER.info('Executing on DB: %s', db_line) cursor.execute(db_line)
Do you execute the queries in a different schema than the default ?sys? schema (which is recommended for database users)? Because ?users? is a special VIEW in the ?sys? schema.
Yes, I should be in a different schema. In any case, I changed the name of the table just to be on the safe way.
This function is executed inside a single threaded application. Moreover, there are no "DELETE" or similar operations inside this script (it only adds rows to the database). No user (human or software) but this script connects to my MonetDB instance.
Unfortunately, after I while that the script is running (let's say one or two days), something unexpected happens. For example, this is the log generated for user 42847029 (there are a lot of other lines in between that I removed). As you can see, after a while, my check fails and I don't understand why.
2017/06/13 09:30:01 - INFO - save_user_in_database: Executing on DB: SELECT 1 FROM users WHERE user_id = 42847029 2017/06/13 09:30:01 - INFO - save_user_in_database: Get this answer [(1,)]
2017/06/13 10:00:04 - INFO - save_user_in_database: Executing on DB: SELECT 1 FROM users WHERE user_id = 42847029 2017/06/13 10:00:04 - INFO - save_user_in_database: Get this answer [(1,)]
2017/06/13 10:02:33 - INFO - save_user_in_database: Executing on DB: SELECT 1 FROM users WHERE user_id = 42847029 2017/06/13 10:02:33 - INFO - save_user_in_database: Get this answer [(1,)]
2017/06/13 10:30:36 - INFO - save_user_in_database: Executing on DB: SELECT 1 FROM users WHERE user_id = 42847029 2017/06/13 10:30:36 - INFO - save_user_in_database: Get this answer [] 2017/06/13 10:30:36 - INFO - save_user_in_database: Inserting new user
This shouldn?t happen, as is shown by the error below that a user_id of 42847029 still exists; also, as you mentioned, there hasn?t been any delete queries.
This might be a bug in MonetDB.
Is this problem repeatable? If yes, can you give us a sequence of queries to reproduce this problem. We?d like to look into it.
I will try to reproduce the problem with some scripts as soon as possible. Unfortunately, I am quite busy in this period and therefore this will require some time. In any case, this happens only when the system is busy
What exactly do you mean with “the system is busy”? The Docker container or the machine on which the Docker container is running? Is MonetDB keeping the system busy, or other applications?
and the table is big enough.
how large is the table: data size and number of tuples?
I think that the key to reproduce the problem is to stress the database with several operations per second. In this situation, I have noticed that the db starts to use a lot of RAM (about 15 GB for a database that occupies 3 GB on the disk).
This doesn’t sound right, it feels like something is being piled up. We’ll need a reproducible script or something similar to be able to look deeper into this.
At that point, the db becomes extremely slow (for example, for a query that usually takes less than 10 ms I have to wait for more than 2 seconds) and the unique key problem start to appear.
What other queries do you also run? Are you sure that there are not other queries that insert into or delete from the users table?
2017/06/13 10:30:36 - INFO - save_user_in_database: Executing on DB: INSERT INTO users(user_id, user_name, user_description, user_location) VALUES (42847029, 'XXXX', 'XXXX', 'XXXX') 2017/06/13 10:30:36 - WARNING - on_data: Error elaborating data Traceback (most recent call last): File "data_collector.py", line 267, in on_data return self.elaborate_data(data) File "data_collector.py", line 354, in elaborate_data save_user_in_database(user, cursor) File "data_collector.py", line 234, in save_user_in_database cursor.execute(db_line) File "/opt/Python-3.6.0/lib/python3.6/site-packages/pymonetdb-1.0.4-py3.6.egg/pymonetdb/sql/cursors.py", line 164, in execute block = self.connection.execute(query) File "/opt/Python-3.6.0/lib/python3.6/site-packages/pymonetdb-1.0.4-py3.6.egg/pymonetdb/sql/connections.py", line 133, in execute return self.command('s' + query + ';') File "/opt/Python-3.6.0/lib/python3.6/site-packages/pymonetdb-1.0.4-py3.6.egg/pymonetdb/sql/connections.py", line 138, in command return self.mapi.cmd(command) File "/opt/Python-3.6.0/lib/python3.6/site-packages/pymonetdb-1.0.4-py3.6.egg/pymonetdb/mapi.py", line 235, in cmd raise exception(string) pymonetdb.exceptions.IntegrityError: INSERT INTO: PRIMARY KEY constraint 'users.users_user_id_pkey' violated
Have you any advice? Is there a way to perform the same check in just one SQL line? Is this way atomic (if a would ever move to a multithreading script)?
It is not atomic, but you can make it atomic by wrapping both queries into one transaction.
Am I doing something wrong with my current code? Please, take into account that I do not have a deep experience with other SQL tools so it is perfectly possible that I miss something that for you is extremely obvious.
No, you?re not doing anything wrong. But there are better alternatives:
1. If this is a standalone action (i.e. not part of a transaction), the check for existing user_id is not necessary. The simplest and most efficient way is to always do the insert and ignore the ?primary key constraint ? violated? >error (e.g. use a try-catch block).
2. If this insert is going to be part of a (big, long) transaction, then you won?t want the insert to fail. In that case, you can use a stored procedure:
create procedure insert_new (id int, name varchar(10), description varchar(10), location varchar(10)) begin declare cnt int; set cnt = (select count(*) from myusers where user_id=id);
if (cnt = 0 ) then insert into myusers values (id, name, description, location); end if; end;
Then call the procedure: call insert_new (42847029, 'XXXX', 'XXXX', 'XXXX?)
I hope this helps.
II tried the solution number 2, but unfortunately the problem is still there. I get an Integrity error even calling the procedure that you described before.
Did you get the same error? That really shouldn’t happen. However, if there are multiple queries that call “insert_new” simultaneously, then some transactions can get aborted due to conflict. I can’t reproduce this with my simple setup: i) created a dummy table “myusers", and the function “insert_new": sql>\d myusers CREATE TABLE "sys"."myusers" ( "user_id" INTEGER NOT NULL, "name" VARCHAR(20), "description" VARCHAR(20), "location" VARCHAR(20), CONSTRAINT "myusers_user_id_pkey" PRIMARY KEY ("user_id") ); sql>select id, name , func from functions where name='insert_new'; +------+------------+----------------------------------------------------------+ | id | name | func | +======+============+==========================================================+ | 8218 | insert_new | create procedure insert_new (id int, name varchar(10), d | : : : escription varchar(10), location varchar(10)) begin decl : : : : are cnt int; set cnt = (select count(*) from myusers whe : : : : re user_id=id); if (cnt = 0 ) then insert into myusers v : : : : alues (id, name, description, location); end if; end; : +------+------------+----------------------------------------------------------+ 1 tuple (2.204ms) ii) then I let the following command run for ~1 hour: $ while true ; do ID=$((RANDOM/10)) ; mclient -s "call insert_new($ID,'XXXX', 'XXXX', 'XXXX');" ; echo $ID ; done The values of $ID range from 0 to 3276. I didn’t get an integrity error.
What is really strange is that, after a while, execute the procedure that you have showed me requires more than 3 seconds on my system. If I restart the monetdb instance, it takes a few milliseconds again. I think that maybe there is something broken in my setup. It could be that I have misconfigured something or I have a problem with my machine. Unfortunately, taking also into account that I am using Docker (which adds a layer of complexity), there are plenty of things that could not work as expected. Thank you for your help!
Using Docker is not necessarily a problem. But please do give us the information I asked above about the environment in which you’re running the container. Best, Jennie
Regards, Jennie
Thank you in advance!
Best regards,
Stefano Piani
participants (2)
-
Stefano Piani
-
Ying Zhang