Hello, We’ve been battling for some time with what is fundamentally a simple query. We have a databases with about 15 columns which store int and float values. The query searches in each column with an IN condition , where the possible test values in the IN condition may vary from a 1 to 200 - where these 200 conditions are all the possible values of that column. So normally a query would look something like: SELECT id FROM table WHERE a IN (1,2,3,4) AND b IN (4,5,6,) AND c IN (2,5,6) ...... AND i = 5 etc. The database has about 2 mil rows but will grow to potentially 10 mil rows. A query similar to the one above takes about 80-100 ms. With about 50 test values for each IN. So far, the best results we got with MariaDB and proper indexing using Engine Memory where the above query takes about 2ms. We suspect MonetDB is not optimised for such queries, but since we don’t know much about its workings can you please confirm this? Otherwise, is there a way to improve the above query? Say using multiple JOINs instead of IN ? By the way, we are experimenting with MonetDB out of curiosity, we heard a lot of good things about it. Thank you very much, Silviu
Hi You should understand the difference between hot and cold runs. Hash indices are automatically created the first time you run such a query. Furthermore, your intermediates are still large. For details on where the time is spent in the query you can look at advice in https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/QueryTiming regards, Martin On 26/11/2016 20:28, blacky_2k2@yahoo.com wrote:
Hello,
We’ve been battling for some time with what is fundamentally a simple query.
We have a databases with about 15 columns which store int and float values.
The query searches in each column with an IN condition , where the possible test values in the IN condition may vary from a 1 to 200 - where these 200 conditions are all the possible values of that column.
So normally a query would look something like:
SELECT id FROM table WHERE a IN (1,2,3,4) AND b IN (4,5,6,) AND c IN (2,5,6) ...... AND i = 5 etc.
The database has about 2 mil rows but will grow to potentially 10 mil rows.
A query similar to the one above takes about 80-100 ms. With about 50 test values for each IN.
So far, the best results we got with MariaDB and proper indexing using Engine Memory where the above query takes about 2ms.
We suspect MonetDB is not optimised for such queries, but since we don’t know much about its workings can you please confirm this? Otherwise, is there a way to improve the above query? Say using multiple JOINs instead of IN ?
By the way, we are experimenting with MonetDB out of curiosity, we heard a lot of good things about it.
Thank you very much,
Silviu
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Hello Martin, Thank you for your reply. Yes, we’ve noticed the difference in time between hot and cold runs. We are only talking about hot runs. BTW, forgot to mention we created the database on a RAMDISK for improved performance. Query time measurements are accurate, I mean, it’s observable when a query takes 50ms and 500ms, the time it takes to retrieve the results is noticeable since you have to wait to see the result, normally the queries should be fast enough so that such things wouldn’t matter and results would be delivered instanously. We are also talking about empty result sets, cause we know these take the longest. Also the time varies with the number of conditions, for instance take the query SELECT id FROM table WHERE a IN (1,2,3,4) AND b IN (4,5,6,) AND c IN (2,5,6) ...... AND i = 5 LIMIT 1 etc. With only one IN condition the query time is 10-20 ms, add 10 IN conditions and it goes to 100 ms or more. I suppose this is normal given the LIMIT 1 condition. Also these timing are with sys.querylog_disable(); with it enabled it will get to 400ms . As mentioned earlier, from what I read MonetDB shines at analytic queries, while we are basically trying to retrieve one id from a large haystack based on a certain parameters. But still I wanted to give my feedback on the matter. All the best, Silviu From: Martin Kersten Sent: sâmbătă, 26 noiembrie 2016 23:00 To: Communication channel for MonetDB users Subject: Re: MonetDB and IN search Hi You should understand the difference between hot and cold runs. Hash indices are automatically created the first time you run such a query. Furthermore, your intermediates are still large. For details on where the time is spent in the query you can look at advice in https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/QueryTiming regards, Martin On 26/11/2016 20:28, blacky_2k2@yahoo.com wrote:
Hello,
We’ve been battling for some time with what is fundamentally a simple query.
We have a databases with about 15 columns which store int and float values.
The query searches in each column with an IN condition , where the possible test values in the IN condition may vary from a 1 to 200 - where these 200 conditions are all the possible values of that column.
So normally a query would look something like:
SELECT id FROM table WHERE a IN (1,2,3,4) AND b IN (4,5,6,) AND c IN (2,5,6) ...... AND i = 5 etc.
The database has about 2 mil rows but will grow to potentially 10 mil rows.
A query similar to the one above takes about 80-100 ms. With about 50 test values for each IN.
So far, the best results we got with MariaDB and proper indexing using Engine Memory where the above query takes about 2ms.
We suspect MonetDB is not optimised for such queries, but since we don’t know much about its workings can you please confirm this? Otherwise, is there a way to improve the above query? Say using multiple JOINs instead of IN ?
By the way, we are experimenting with MonetDB out of curiosity, we heard a lot of good things about it.
Thank you very much,
Silviu
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Hai Silviu, Thanks for sharing this. You indeed have hit some less optimised parts of MonetDB. By prefixing a query with PLAN or EXPLAIN, you can get its logical or physical execution plan, respectively. - For the IN queries, the EXPLAIN shows that each value in the IN clause is transacted into an algebra.subselect over the whole column. This might explain the growing query exec. time with more IN conditions. - LIMIT is barely optimised (if any at all). It’s only handled at the end of a query exec. with an algebra.subslice to filter out the final topN values. So adding a LIMIT clause doesn’t help reducing the amount of computation through out the query. Turning the IN conditions into JOINs will probably give you better performance. Regards, Jennie
On 27 Nov 2016, at 17:52, blacky_2k2@yahoo.com wrote:
Hello Martin,
Thank you for your reply.
Yes, we’ve noticed the difference in time between hot and cold runs. We are only talking about hot runs. BTW, forgot to mention we created the database on a RAMDISK for improved performance.
Query time measurements are accurate, I mean, it’s observable when a query takes 50ms and 500ms, the time it takes to retrieve the results is noticeable since you have to wait to see the result, normally the queries should be fast enough so that such things wouldn’t matter and results would be delivered instanously. We are also talking about empty result sets, cause we know these take the longest.
Also the time varies with the number of conditions, for instance take the query SELECT id FROM table WHERE a IN (1,2,3,4) AND b IN (4,5,6,) AND c IN (2,5,6) ...... AND i = 5 LIMIT 1 etc. With only one IN condition the query time is 10-20 ms, add 10 IN conditions and it goes to 100 ms or more. I suppose this is normal given the LIMIT 1 condition.
Also these timing are with sys.querylog_disable(); with it enabled it will get to 400ms .
As mentioned earlier, from what I read MonetDB shines at analytic queries, while we are basically trying to retrieve one id from a large haystack based on a certain parameters. But still I wanted to give my feedback on the matter.
All the best, Silviu From: Martin Kersten Sent: sâmbătă, 26 noiembrie 2016 23:00 To: Communication channel for MonetDB users Subject: Re: MonetDB and IN search
Hi
You should understand the difference between hot and cold runs. Hash indices are automatically created the first time you run such a query. Furthermore, your intermediates are still large.
For details on where the time is spent in the query you can look at advice in https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/QueryTiming
regards, Martin
On 26/11/2016 20:28, blacky_2k2@yahoo.com wrote:
Hello,
We’ve been battling for some time with what is fundamentally a simple query.
We have a databases with about 15 columns which store int and float values.
The query searches in each column with an IN condition , where the possible test values in the IN condition may vary from a 1 to 200 - where these 200 conditions are all the possible values of that column.
So normally a query would look something like:
SELECT id FROM table WHERE a IN (1,2,3,4) AND b IN (4,5,6,) AND c IN (2,5,6) ...... AND i = 5 etc.
The database has about 2 mil rows but will grow to potentially 10 mil rows.
A query similar to the one above takes about 80-100 ms. With about 50 test values for each IN.
So far, the best results we got with MariaDB and proper indexing using Engine Memory where the above query takes about 2ms.
We suspect MonetDB is not optimised for such queries, but since we don’t know much about its workings can you please confirm this? Otherwise, is there a way to improve the above query? Say using multiple JOINs instead of IN ?
By the way, we are experimenting with MonetDB out of curiosity, we heard a lot of good things about it.
Thank you very much,
Silviu
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Hi Jennie, Thank you very much for your very valuable insight. Out of curiosity and for the sake of good feedback we will try to see how our the query performs using JOINs. Will get back to you once we have some proper time measurements. Best regards, Silviu From: Ying Zhang Sent: luni, 28 noiembrie 2016 19:34 To: Communication channel for MonetDB users Cc: Martin Kersten Subject: Re: MonetDB and IN search Hai Silviu, Thanks for sharing this. You indeed have hit some less optimised parts of MonetDB. By prefixing a query with PLAN or EXPLAIN, you can get its logical or physical execution plan, respectively. - For the IN queries, the EXPLAIN shows that each value in the IN clause is transacted into an algebra.subselect over the whole column. This might explain the growing query exec. time with more IN conditions. - LIMIT is barely optimised (if any at all). It’s only handled at the end of a query exec. with an algebra.subslice to filter out the final topN values. So adding a LIMIT clause doesn’t help reducing the amount of computation through out the query. Turning the IN conditions into JOINs will probably give you better performance. Regards, Jennie
On 27 Nov 2016, at 17:52, blacky_2k2@yahoo.com wrote:
Hello Martin,
Thank you for your reply.
Yes, we’ve noticed the difference in time between hot and cold runs. We are only talking about hot runs. BTW, forgot to mention we created the database on a RAMDISK for improved performance.
Query time measurements are accurate, I mean, it’s observable when a query takes 50ms and 500ms, the time it takes to retrieve the results is noticeable since you have to wait to see the result, normally the queries should be fast enough so that such things wouldn’t matter and results would be delivered instanously. We are also talking about empty result sets, cause we know these take the longest.
Also the time varies with the number of conditions, for instance take the query SELECT id FROM table WHERE a IN (1,2,3,4) AND b IN (4,5,6,) AND c IN (2,5,6) ...... AND i = 5 LIMIT 1 etc. With only one IN condition the query time is 10-20 ms, add 10 IN conditions and it goes to 100 ms or more. I suppose this is normal given the LIMIT 1 condition.
Also these timing are with sys.querylog_disable(); with it enabled it will get to 400ms .
As mentioned earlier, from what I read MonetDB shines at analytic queries, while we are basically trying to retrieve one id from a large haystack based on a certain parameters. But still I wanted to give my feedback on the matter.
All the best, Silviu From: Martin Kersten Sent: sâmbătă, 26 noiembrie 2016 23:00 To: Communication channel for MonetDB users Subject: Re: MonetDB and IN search
Hi
You should understand the difference between hot and cold runs. Hash indices are automatically created the first time you run such a query. Furthermore, your intermediates are still large.
For details on where the time is spent in the query you can look at advice in https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/QueryTiming
regards, Martin
On 26/11/2016 20:28, blacky_2k2@yahoo.com wrote:
Hello,
We’ve been battling for some time with what is fundamentally a simple query.
We have a databases with about 15 columns which store int and float values.
The query searches in each column with an IN condition , where the possible test values in the IN condition may vary from a 1 to 200 - where these 200 conditions are all the possible values of that column.
So normally a query would look something like:
SELECT id FROM table WHERE a IN (1,2,3,4) AND b IN (4,5,6,) AND c IN (2,5,6) ...... AND i = 5 etc.
The database has about 2 mil rows but will grow to potentially 10 mil rows.
A query similar to the one above takes about 80-100 ms. With about 50 test values for each IN.
So far, the best results we got with MariaDB and proper indexing using Engine Memory where the above query takes about 2ms.
We suspect MonetDB is not optimised for such queries, but since we don’t know much about its workings can you please confirm this? Otherwise, is there a way to improve the above query? Say using multiple JOINs instead of IN ?
By the way, we are experimenting with MonetDB out of curiosity, we heard a lot of good things about it.
Thank you very much,
Silviu
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
participants (3)
-
blacky_2k2@yahoo.com
-
Martin Kersten
-
Ying Zhang