Add entry in a table only if that entry does not already exist
Dear monetdb users, I started to use MonetDB a few weeks ago and, until now, I have been able to use it proficiently. Unfortunately, I have faced a small issue that I have not been able to figure out how to fix. I collect data generated by different users. A unique id is associated to each user. I have a specific table (called "user") where I save the name and some other information about the user. In this table, the ID is a primary key. When I collect the data, I want to check if the specific user who generated that data is already saved in the user table. If this is not the case, I want to add a row to the user table for this user. Unfortunately, I have not been able to accomplish this operation in one SQL query and, therefore, I have been forced to query for the user id and, if the answer is negative, add the user using another query. The following is the python script that perform the operations. def save_user_in_database(user, cursor): db_line = 'SELECT 1 FROM users WHERE user_id = {}'.format(user.id) LOGGER.info('Executing on DB: %s', db_line) cursor.execute(db_line) answer = cursor.fetchall() LOGGER.info('Get this answer %s', answer) if len(answer) == 0: LOGGER.info('Inserting new user') db_line = sql_cmd_from_template( USER_TABLE_INSERT_TEMPLATE, user.id, user.name, user.description, user.location ) LOGGER.info('Executing on DB: %s', db_line) cursor.execute(db_line) This function is executed inside a single threaded application. Moreover, there are no "DELETE" or similar operations inside this script (it only adds rows to the database). No user (human or software) but this script connects to my MonetDB instance. Unfortunately, after I while that the script is running (let's say one or two days), something unexpected happens. For example, this is the log generated for user 42847029 (there are a lot of other lines in between that I removed). As you can see, after a while, my check fails and I don't understand why. 2017/06/13 09:30:01 - INFO - save_user_in_database: Executing on DB: SELECT 1 FROM users WHERE user_id = 42847029 2017/06/13 09:30:01 - INFO - save_user_in_database: Get this answer [(1,)] 2017/06/13 10:00:04 - INFO - save_user_in_database: Executing on DB: SELECT 1 FROM users WHERE user_id = 42847029 2017/06/13 10:00:04 - INFO - save_user_in_database: Get this answer [(1,)] 2017/06/13 10:02:33 - INFO - save_user_in_database: Executing on DB: SELECT 1 FROM users WHERE user_id = 42847029 2017/06/13 10:02:33 - INFO - save_user_in_database: Get this answer [(1,)] 2017/06/13 10:30:36 - INFO - save_user_in_database: Executing on DB: SELECT 1 FROM users WHERE user_id = 42847029 2017/06/13 10:30:36 - INFO - save_user_in_database: Get this answer [] 2017/06/13 10:30:36 - INFO - save_user_in_database: Inserting new user 2017/06/13 10:30:36 - INFO - save_user_in_database: Executing on DB: INSERT INTO users(user_id, user_name, user_description, user_location) VALUES (42847029, 'XXXX', 'XXXX', 'XXXX') 2017/06/13 10:30:36 - WARNING - on_data: Error elaborating data Traceback (most recent call last): File "data_collector.py", line 267, in on_data return self.elaborate_data(data) File "data_collector.py", line 354, in elaborate_data save_user_in_database(user, cursor) File "data_collector.py", line 234, in save_user_in_database cursor.execute(db_line) File "/opt/Python-3.6.0/lib/python3.6/site-packages/pymonetdb-1.0.4-py3.6.egg/pymonetdb/sql/cursors.py", line 164, in execute block = self.connection.execute(query) File "/opt/Python-3.6.0/lib/python3.6/site-packages/pymonetdb-1.0.4-py3.6.egg/pymonetdb/sql/connections.py", line 133, in execute return self.command('s' + query + ';') File "/opt/Python-3.6.0/lib/python3.6/site-packages/pymonetdb-1.0.4-py3.6.egg/pymonetdb/sql/connections.py", line 138, in command return self.mapi.cmd(command) File "/opt/Python-3.6.0/lib/python3.6/site-packages/pymonetdb-1.0.4-py3.6.egg/pymonetdb/mapi.py", line 235, in cmd raise exception(string) pymonetdb.exceptions.IntegrityError: INSERT INTO: PRIMARY KEY constraint 'users.users_user_id_pkey' violated Have you any advice? Is there a way to perform the same check in just one SQL line? Is this way atomic (if a would ever move to a multithreading script)? Am I doing something wrong with my current code? Please, take into account that I do not have a deep experience with other SQL tools so it is perfectly possible that I miss something that for you is extremely obvious. Thank you in advance! Best regards, Stefano Piani
Hai Stefano, I'll give you some information tomorrow. It's been busy last days. Regards , Jennie
On 4 Jul 2017, at 16:32, Stefano Piani
wrote: Dear monetdb users,
I started to use MonetDB a few weeks ago and, until now, I have been able to use it proficiently. Unfortunately, I have faced a small issue that I have not been able to figure out how to fix.
I collect data generated by different users. A unique id is associated to each user. I have a specific table (called "user") where I save the name and some other information about the user. In this table, the ID is a primary key. When I collect the data, I want to check if the specific user who generated that data is already saved in the user table. If this is not the case, I want to add a row to the user table for this user. Unfortunately, I have not been able to accomplish this operation in one SQL query and, therefore, I have been forced to query for the user id and, if the answer is negative, add the user using another query. The following is the python script that perform the operations.
def save_user_in_database(user, cursor): db_line = 'SELECT 1 FROM users WHERE user_id = {}'.format(user.id) LOGGER.info('Executing on DB: %s', db_line) cursor.execute(db_line)
answer = cursor.fetchall() LOGGER.info('Get this answer %s', answer) if len(answer) == 0: LOGGER.info('Inserting new user')
db_line = sql_cmd_from_template( USER_TABLE_INSERT_TEMPLATE, user.id, user.name, user.description, user.location ) LOGGER.info('Executing on DB: %s', db_line) cursor.execute(db_line)
This function is executed inside a single threaded application. Moreover, there are no "DELETE" or similar operations inside this script (it only adds rows to the database). No user (human or software) but this script connects to my MonetDB instance.
Unfortunately, after I while that the script is running (let's say one or two days), something unexpected happens. For example, this is the log generated for user 42847029 (there are a lot of other lines in between that I removed). As you can see, after a while, my check fails and I don't understand why.
2017/06/13 09:30:01 - INFO - save_user_in_database: Executing on DB: SELECT 1 FROM users WHERE user_id = 42847029 2017/06/13 09:30:01 - INFO - save_user_in_database: Get this answer [(1,)]
2017/06/13 10:00:04 - INFO - save_user_in_database: Executing on DB: SELECT 1 FROM users WHERE user_id = 42847029 2017/06/13 10:00:04 - INFO - save_user_in_database: Get this answer [(1,)]
2017/06/13 10:02:33 - INFO - save_user_in_database: Executing on DB: SELECT 1 FROM users WHERE user_id = 42847029 2017/06/13 10:02:33 - INFO - save_user_in_database: Get this answer [(1,)]
2017/06/13 10:30:36 - INFO - save_user_in_database: Executing on DB: SELECT 1 FROM users WHERE user_id = 42847029 2017/06/13 10:30:36 - INFO - save_user_in_database: Get this answer [] 2017/06/13 10:30:36 - INFO - save_user_in_database: Inserting new user 2017/06/13 10:30:36 - INFO - save_user_in_database: Executing on DB: INSERT INTO users(user_id, user_name, user_description, user_location) VALUES (42847029, 'XXXX', 'XXXX', 'XXXX') 2017/06/13 10:30:36 - WARNING - on_data: Error elaborating data Traceback (most recent call last): File "data_collector.py", line 267, in on_data return self.elaborate_data(data) File "data_collector.py", line 354, in elaborate_data save_user_in_database(user, cursor) File "data_collector.py", line 234, in save_user_in_database cursor.execute(db_line) File "/opt/Python-3.6.0/lib/python3.6/site-packages/pymonetdb-1.0.4-py3.6.egg/pymonetdb/sql/cursors.py", line 164, in execute block = self.connection.execute(query) File "/opt/Python-3.6.0/lib/python3.6/site-packages/pymonetdb-1.0.4-py3.6.egg/pymonetdb/sql/connections.py", line 133, in execute return self.command('s' + query + ';') File "/opt/Python-3.6.0/lib/python3.6/site-packages/pymonetdb-1.0.4-py3.6.egg/pymonetdb/sql/connections.py", line 138, in command return self.mapi.cmd(command) File "/opt/Python-3.6.0/lib/python3.6/site-packages/pymonetdb-1.0.4-py3.6.egg/pymonetdb/mapi.py", line 235, in cmd raise exception(string) pymonetdb.exceptions.IntegrityError: INSERT INTO: PRIMARY KEY constraint 'users.users_user_id_pkey' violated
Have you any advice? Is there a way to perform the same check in just one SQL line? Is this way atomic (if a would ever move to a multithreading script)? Am I doing something wrong with my current code? Please, take into account that I do not have a deep experience with other SQL tools so it is perfectly possible that I miss something that for you is extremely obvious.
Thank you in advance!
Best regards,
Stefano Piani _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Hello Stefano, Which MonetDB version do you use? On which OS?
On 4 Jul 2017, at 16:32, Stefano Piani
wrote: Dear monetdb users,
I started to use MonetDB a few weeks ago and, until now, I have been able to use it proficiently. Unfortunately, I have faced a small issue that I have not been able to figure out how to fix.
I collect data generated by different users. A unique id is associated to each user. I have a specific table (called "user") where I save the name and some other information about the user. In this table, the ID is a primary key. When I collect the data, I want to check if the specific user who generated that data is already saved in the user table. If this is not the case, I want to add a row to the user table for this user. Unfortunately, I have not been able to accomplish this operation in one SQL query and, therefore, I have been forced to query for the user id and, if the answer is negative, add the user using another query. The following is the python script that perform the operations.
def save_user_in_database(user, cursor): db_line = 'SELECT 1 FROM users WHERE user_id = {}'.format(user.id) LOGGER.info('Executing on DB: %s', db_line) cursor.execute(db_line)
answer = cursor.fetchall() LOGGER.info('Get this answer %s', answer) if len(answer) == 0: LOGGER.info('Inserting new user')
db_line = sql_cmd_from_template( USER_TABLE_INSERT_TEMPLATE, user.id, user.name, user.description, user.location ) LOGGER.info('Executing on DB: %s', db_line) cursor.execute(db_line)
Do you execute the queries in a different schema than the default “sys” schema (which is recommended for database users)? Because “users” is a special VIEW in the “sys” schema.
This function is executed inside a single threaded application. Moreover, there are no "DELETE" or similar operations inside this script (it only adds rows to the database). No user (human or software) but this script connects to my MonetDB instance.
Unfortunately, after I while that the script is running (let's say one or two days), something unexpected happens. For example, this is the log generated for user 42847029 (there are a lot of other lines in between that I removed). As you can see, after a while, my check fails and I don't understand why.
2017/06/13 09:30:01 - INFO - save_user_in_database: Executing on DB: SELECT 1 FROM users WHERE user_id = 42847029 2017/06/13 09:30:01 - INFO - save_user_in_database: Get this answer [(1,)]
2017/06/13 10:00:04 - INFO - save_user_in_database: Executing on DB: SELECT 1 FROM users WHERE user_id = 42847029 2017/06/13 10:00:04 - INFO - save_user_in_database: Get this answer [(1,)]
2017/06/13 10:02:33 - INFO - save_user_in_database: Executing on DB: SELECT 1 FROM users WHERE user_id = 42847029 2017/06/13 10:02:33 - INFO - save_user_in_database: Get this answer [(1,)]
2017/06/13 10:30:36 - INFO - save_user_in_database: Executing on DB: SELECT 1 FROM users WHERE user_id = 42847029 2017/06/13 10:30:36 - INFO - save_user_in_database: Get this answer [] 2017/06/13 10:30:36 - INFO - save_user_in_database: Inserting new user
This shouldn’t happen, as is shown by the error below that a user_id of 42847029 still exists; also, as you mentioned, there hasn’t been any delete queries. This might be a bug in MonetDB. Is this problem repeatable? If yes, can you give us a sequence of queries to reproduce this problem. We’d like to look into it.
2017/06/13 10:30:36 - INFO - save_user_in_database: Executing on DB: INSERT INTO users(user_id, user_name, user_description, user_location) VALUES (42847029, 'XXXX', 'XXXX', 'XXXX') 2017/06/13 10:30:36 - WARNING - on_data: Error elaborating data Traceback (most recent call last): File "data_collector.py", line 267, in on_data return self.elaborate_data(data) File "data_collector.py", line 354, in elaborate_data save_user_in_database(user, cursor) File "data_collector.py", line 234, in save_user_in_database cursor.execute(db_line) File "/opt/Python-3.6.0/lib/python3.6/site-packages/pymonetdb-1.0.4-py3.6.egg/pymonetdb/sql/cursors.py", line 164, in execute block = self.connection.execute(query) File "/opt/Python-3.6.0/lib/python3.6/site-packages/pymonetdb-1.0.4-py3.6.egg/pymonetdb/sql/connections.py", line 133, in execute return self.command('s' + query + ';') File "/opt/Python-3.6.0/lib/python3.6/site-packages/pymonetdb-1.0.4-py3.6.egg/pymonetdb/sql/connections.py", line 138, in command return self.mapi.cmd(command) File "/opt/Python-3.6.0/lib/python3.6/site-packages/pymonetdb-1.0.4-py3.6.egg/pymonetdb/mapi.py", line 235, in cmd raise exception(string) pymonetdb.exceptions.IntegrityError: INSERT INTO: PRIMARY KEY constraint 'users.users_user_id_pkey' violated
Have you any advice? Is there a way to perform the same check in just one SQL line? Is this way atomic (if a would ever move to a multithreading script)?
It is not atomic, but you can make it atomic by wrapping both queries into one transaction.
Am I doing something wrong with my current code? Please, take into account that I do not have a deep experience with other SQL tools so it is perfectly possible that I miss something that for you is extremely obvious.
No, you’re not doing anything wrong. But there are better alternatives: 1. If this is a standalone action (i.e. not part of a transaction), the check for existing user_id is not necessary. The simplest and most efficient way is to always do the insert and ignore the “primary key constraint … violated” error (e.g. use a try-catch block). 2. If this insert is going to be part of a (big, long) transaction, then you won’t want the insert to fail. In that case, you can use a stored procedure: create procedure insert_new (id int, name varchar(10), description varchar(10), location varchar(10)) begin declare cnt int; set cnt = (select count(*) from myusers where user_id=id); if (cnt = 0 ) then insert into myusers values (id, name, description, location); end if; end; Then call the procedure: call insert_new (42847029, 'XXXX', 'XXXX', 'XXXX’) I hope this helps. Regards, Jennie
Thank you in advance!
Best regards,
Stefano Piani _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Hello Stefano, Have you been able to test my suggestions? Do they help? Jennie
On 7 Jul 2017, at 15:16, Ying Zhang
wrote: Hello Stefano,
Which MonetDB version do you use? On which OS?
On 4 Jul 2017, at 16:32, Stefano Piani
wrote: Dear monetdb users,
I started to use MonetDB a few weeks ago and, until now, I have been able to use it proficiently. Unfortunately, I have faced a small issue that I have not been able to figure out how to fix.
I collect data generated by different users. A unique id is associated to each user. I have a specific table (called "user") where I save the name and some other information about the user. In this table, the ID is a primary key. When I collect the data, I want to check if the specific user who generated that data is already saved in the user table. If this is not the case, I want to add a row to the user table for this user. Unfortunately, I have not been able to accomplish this operation in one SQL query and, therefore, I have been forced to query for the user id and, if the answer is negative, add the user using another query. The following is the python script that perform the operations.
def save_user_in_database(user, cursor): db_line = 'SELECT 1 FROM users WHERE user_id = {}'.format(user.id) LOGGER.info('Executing on DB: %s', db_line) cursor.execute(db_line)
answer = cursor.fetchall() LOGGER.info('Get this answer %s', answer) if len(answer) == 0: LOGGER.info('Inserting new user')
db_line = sql_cmd_from_template( USER_TABLE_INSERT_TEMPLATE, user.id, user.name, user.description, user.location ) LOGGER.info('Executing on DB: %s', db_line) cursor.execute(db_line)
Do you execute the queries in a different schema than the default “sys” schema (which is recommended for database users)? Because “users” is a special VIEW in the “sys” schema.
This function is executed inside a single threaded application. Moreover, there are no "DELETE" or similar operations inside this script (it only adds rows to the database). No user (human or software) but this script connects to my MonetDB instance.
Unfortunately, after I while that the script is running (let's say one or two days), something unexpected happens. For example, this is the log generated for user 42847029 (there are a lot of other lines in between that I removed). As you can see, after a while, my check fails and I don't understand why.
2017/06/13 09:30:01 - INFO - save_user_in_database: Executing on DB: SELECT 1 FROM users WHERE user_id = 42847029 2017/06/13 09:30:01 - INFO - save_user_in_database: Get this answer [(1,)]
2017/06/13 10:00:04 - INFO - save_user_in_database: Executing on DB: SELECT 1 FROM users WHERE user_id = 42847029 2017/06/13 10:00:04 - INFO - save_user_in_database: Get this answer [(1,)]
2017/06/13 10:02:33 - INFO - save_user_in_database: Executing on DB: SELECT 1 FROM users WHERE user_id = 42847029 2017/06/13 10:02:33 - INFO - save_user_in_database: Get this answer [(1,)]
2017/06/13 10:30:36 - INFO - save_user_in_database: Executing on DB: SELECT 1 FROM users WHERE user_id = 42847029 2017/06/13 10:30:36 - INFO - save_user_in_database: Get this answer [] 2017/06/13 10:30:36 - INFO - save_user_in_database: Inserting new user
This shouldn’t happen, as is shown by the error below that a user_id of 42847029 still exists; also, as you mentioned, there hasn’t been any delete queries.
This might be a bug in MonetDB.
Is this problem repeatable? If yes, can you give us a sequence of queries to reproduce this problem. We’d like to look into it.
2017/06/13 10:30:36 - INFO - save_user_in_database: Executing on DB: INSERT INTO users(user_id, user_name, user_description, user_location) VALUES (42847029, 'XXXX', 'XXXX', 'XXXX') 2017/06/13 10:30:36 - WARNING - on_data: Error elaborating data Traceback (most recent call last): File "data_collector.py", line 267, in on_data return self.elaborate_data(data) File "data_collector.py", line 354, in elaborate_data save_user_in_database(user, cursor) File "data_collector.py", line 234, in save_user_in_database cursor.execute(db_line) File "/opt/Python-3.6.0/lib/python3.6/site-packages/pymonetdb-1.0.4-py3.6.egg/pymonetdb/sql/cursors.py", line 164, in execute block = self.connection.execute(query) File "/opt/Python-3.6.0/lib/python3.6/site-packages/pymonetdb-1.0.4-py3.6.egg/pymonetdb/sql/connections.py", line 133, in execute return self.command('s' + query + ';') File "/opt/Python-3.6.0/lib/python3.6/site-packages/pymonetdb-1.0.4-py3.6.egg/pymonetdb/sql/connections.py", line 138, in command return self.mapi.cmd(command) File "/opt/Python-3.6.0/lib/python3.6/site-packages/pymonetdb-1.0.4-py3.6.egg/pymonetdb/mapi.py", line 235, in cmd raise exception(string) pymonetdb.exceptions.IntegrityError: INSERT INTO: PRIMARY KEY constraint 'users.users_user_id_pkey' violated
Have you any advice? Is there a way to perform the same check in just one SQL line? Is this way atomic (if a would ever move to a multithreading script)?
It is not atomic, but you can make it atomic by wrapping both queries into one transaction.
Am I doing something wrong with my current code? Please, take into account that I do not have a deep experience with other SQL tools so it is perfectly possible that I miss something that for you is extremely obvious.
No, you’re not doing anything wrong. But there are better alternatives:
1. If this is a standalone action (i.e. not part of a transaction), the check for existing user_id is not necessary. The simplest and most efficient way is to always do the insert and ignore the “primary key constraint … violated” error (e.g. use a try-catch block).
2. If this insert is going to be part of a (big, long) transaction, then you won’t want the insert to fail. In that case, you can use a stored procedure:
create procedure insert_new (id int, name varchar(10), description varchar(10), location varchar(10)) begin declare cnt int; set cnt = (select count(*) from myusers where user_id=id);
if (cnt = 0 ) then insert into myusers values (id, name, description, location); end if; end;
Then call the procedure: call insert_new (42847029, 'XXXX', 'XXXX', 'XXXX’)
I hope this helps.
Regards, Jennie
Thank you in advance!
Best regards,
Stefano Piani _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
participants (2)
-
Stefano Piani
-
Ying Zhang