Adding conditions slows query 1000x
Hi all,

When I run the following query the results are computed extremely fast (within 5 ms):

SELECT id_str, len, coordinates_x, coordinates_y
FROM uk_neogeo_sorted
WHERE coordinates_x >= 0.0 AND coordinates_x <= 22.499999996867977
LIMIT 100;

However, if I add additional conditions so that the query becomes:

SELECT id_str, len, coordinates_x, coordinates_y
FROM uk_neogeo_sorted
WHERE coordinates_x >= 0.0 AND coordinates_x <= 22.499999996867977
  AND coordinates_y >= 0.0 AND coordinates_y <= 61.60639636617352
LIMIT 100;

the time it takes to compute the results is approximately 1000x longer (i.e. 5 seconds). Clearly the additional conditions on the coordinates_y column are forcing MonetDB to take a different query strategy, but I don't know how to solve this. In Postgres I would make sure there is an index on (coordinates_x, coordinates_y), but this doesn't seem to have any effect with MonetDB.

I've attached traces of both queries. There are approximately 11 million rows in the table. Can anyone tell me why there is such a huge difference in query execution time and how I can prevent it?

Best regards,
Dennis Pallett
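(For reference: the traces discussed throughout this thread can be produced in MonetDB by prefixing a query with TRACE. A sketch, assuming the schema above; the exact trace output format varies by MonetDB version:)

TRACE SELECT id_str, len, coordinates_x, coordinates_y
FROM uk_neogeo_sorted
WHERE coordinates_x >= 0.0 AND coordinates_x <= 22.499999996867977
LIMIT 100;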
Also, for what it's worth, the same query (with both x and y conditions) takes about 30 ms with PostgreSQL and makes use of a btree index on the x and y columns. Clearly this isn't happening with MonetDB. Any ideas on how I can force this?

Best regards,
Dennis
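(For comparison, the PostgreSQL compound btree index Dennis describes would be created roughly as follows; the index name is illustrative:)

CREATE INDEX uk_neogeo_coords_idx
ON uk_neogeo_sorted (coordinates_x, coordinates_y);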
Does MonetDB have a btree index?
No, it relies on hash structures, imprints (see SIGMOD 2013), and sorting.

regards, Martin
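(As an aside: MonetDB does accept CREATE INDEX statements for SQL compatibility, but, unlike PostgreSQL, it treats them only as advisory hints that the kernel may ignore, which would explain why the index had no effect here. A sketch; the index name is illustrative:)

-- Parsed by MonetDB, but only a hint; the kernel decides whether
-- any physical access structure is actually built.
CREATE INDEX coords_idx ON uk_neogeo_sorted (coordinates_x, coordinates_y);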
Hi Dennis,

If you compare the two traces you will notice that the fast one exploits the fact that the column is sorted on x. No further actions are required, so the query is done.

In the second query, although it is fast to access the x-candidates, that is not necessarily true for the corresponding y-candidates. To make it fast, the first question to answer is whether you are running hot or cold queries. MonetDB may decide to build a hash index on y the first time you use it in your session. So, what are the traces of a cold run versus a hot run?

If you use a b-tree with a compound (x, y) key in Postgres, you benefit from the multidimensional sort.

regards, Martin
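(To illustrate the sorted-on-x property Martin refers to: a table materialized in x order makes the single-range query answerable from the sort alone. A sketch, assuming a hypothetical unsorted source table uk_neogeo:)

CREATE TABLE uk_neogeo_sorted AS
SELECT id_str, len, coordinates_x, coordinates_y
FROM uk_neogeo              -- hypothetical unsorted source table
ORDER BY coordinates_x
WITH DATA;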
Hi Martin,

Thank you for your reply. I have attached a trace of a hot run of a query against my original table, which is *not* sorted on x or y (but on an unrelated column).

All I'm really trying to achieve is a multi-dimensional range search, currently in two dimensions (x, y) and in the future in more dimensions (e.g. x, y and time). Is this something that MonetDB is just not suitable for?

Best regards,
Dennis
Hi,

If you look at the trace you'll notice that the intermediate is half your database size; both expressions should have been turned into a simple subselect. It might be an SQL optimizer issue.

FYI, the latest multidimensional table we played with had 640B x, y, z columns.

regards, Martin
Hi all,

Just wanted to provide an update to this thread.

With help from Martin Kersten and one of his colleagues at CWI, a fix has been added to the stable branch of MonetDB which has resulted in a better-optimized query plan for my multi-range query. This has indeed improved the performance of my query somewhat, but not as much as I would've liked.

Which leads me to believe that perhaps MonetDB is not applying the LIMIT clause as expected. The range predicates of my query cover approximately 4+ million rows (about 1/3 of my total database), but I'm only interested in the first 100 rows, hence the LIMIT clause. Is it possible that MonetDB is first computing the full result set (i.e. 4+ million rows) and then only returning 100 rows?

I've once again attached a trace of my (optimized) query.

Best regards,
Dennis
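(A quick way to see how much work the range predicates imply before any LIMIT is applied is to count the matching rows; a sketch using the bounds from the query in this thread:)

SELECT COUNT(*)
FROM uk_neogeo_sorted
WHERE coordinates_x >= 0.0 AND coordinates_x <= 22.499999996867977
  AND coordinates_y >= 0.0 AND coordinates_y <= 61.60639636617352;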
Hi Dennis,

MonetDB will first compute the entire result and then return only the first 100. This is because MonetDB's execution model is not tuple-at-a-time with a pipeline of operators; instead, each operator consumes its entire input before handing the result to the next operator. Therefore you cannot know in advance how many values to select on the first column in order to produce exactly 100 results after, for example, a join, because you don't know how many values will actually join with the next column.

Hope this helps.

Lefteris
Hi Lefteris,

Thank you for your fast reply. At least that explains why my query is still quite slow, since MonetDB is probably joining millions of records even though I only want 100. Not sure how I'm going to solve that for now, but at least I know exactly why it is so slow.

Best regards,
Dennis
Keep in mind that LIMIT (and OFFSET, SAMPLE, etc.) are pagination operators. That is, they are used to define the presentation of a query's result, not to alter its evaluation. Therefore, although some DBMS architectures make it easy to take advantage of such operators to reduce computation, it is not "correct" to consider a query fast with a limit and slow without one; the query is what it is. If you need queries that run faster then, for example, you will have to increase the selectivity: the selection (filter) operator is part of the query evaluation, not part of the presentation of the result.
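(An illustration of the selectivity point: shrinking the ranges, rather than relying on LIMIT, reduces the work the engine must do. The tighter y-bound below is purely hypothetical:)

SELECT id_str, len, coordinates_x, coordinates_y
FROM uk_neogeo_sorted
WHERE coordinates_x >= 0.0 AND coordinates_x <= 22.499999996867977
  AND coordinates_y >= 0.0 AND coordinates_y <= 5.0   -- illustrative tighter bound
LIMIT 100;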
I disagree completely. LIMIT indicates how many results you want. It is silly to process more data than you have to, and the lack of intelligent processing on large queries in a system _designed_ for large queries should absolutely be considered a bug.
I hate repeating myself but I will do so...

1. LIMIT is a pagination operator.

2. Different architectures may or may not be able to take advantage of such information. In a tuple-at-a-time execution model this is easy: stop when you get exactly 100 results. In a vectorized execution model it is also possible, but with some extra cost: stop when you get exactly 100 results, having possibly computed some unneeded vectors (thousands of values). In a columnar execution model such as MonetDB's, which materializes intermediate results, you compute the entire result. *It is an architectural choice.*

3. This is certainly neither silly nor a bug.

If you can find an SQL standard that says that LIMIT is not a pagination operator, or for that matter tell me which 100 results I should compute in the case of LIMIT 100, then you will have a case. And don't tell me the first 100, because what is "first" if there is no ORDER BY operator?
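(To make the point concrete: without ORDER BY, "the first 100" is undefined, and making the result deterministic means paying for a sort anyway. A sketch; id_str as sort key is illustrative:)

SELECT id_str, len, coordinates_x, coordinates_y
FROM uk_neogeo_sorted
WHERE coordinates_x >= 0.0 AND coordinates_x <= 22.499999996867977
  AND coordinates_y >= 0.0 AND coordinates_y <= 61.60639636617352
ORDER BY id_str   -- any deterministic key makes the LIMIT well defined
LIMIT 100;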
1. So what.

2. Yes, it is an architectural choice. Without a sort it doesn't matter which 100 items you return. You are going to pick some hundred anyway, so pick some stable hundred; you might as well pick the 'first' hundred items you process.

3. It is silly that a DBMS designed for large data sets doesn't take advantage of all possible optimizations. Other columnar, vectorized, and distributed systems do. I understand it is not trivial; hard work is likely involved. But that doesn't make it not silly.
"Without sort it doesn't matter which 100 items you return." what if there
are side-effects, triggers or updates on materialized views? what you are
suggesting here is a DBMS having random results with the same query,
because the 'first' hundred items you process may be different each time
(cached results, updates etc.).
Pagination operator is of much importance here but you are missing that...
What if I have the following query in *any* system
SELECT * FROM table ORDER BY table.k ASCENDING LIMIT 100
and k is not sorted nor indexed. How can you possible return the 100
smaller values of k if you dont first scan the entire k column and sort it?
Therefor a LIMIT will not and can not *guarantee* a faster execution time
of queries in *any* system, because.... it is a pagination operator! the
only guarantee is that it wont flood your screen with results...
Please note that I have nothing but respect for MonetDB.

LIMIT is not magic, and you are correct that there are situations where you can't do much to help yourself with it. Triggers aren't one of them, because there are no select triggers: LIMIT is only allowed on SELECT. But situations do exist where LIMIT can't be used to optimize.

In a columnar database there's no reason to ever store a column unsorted. As documented here:

https://www.monetdb.org/Documentation/Manuals/MonetDB/Architecture/Storagemo...

MonetDB takes advantage of this obvious optimization. There is then no need to ever perform a full column scan, and it is always trivial to find the top N values which satisfy some criteria and order. Extending this to joins is not trivial, but it can be and has been done. It is an obvious optimization target for a database designed for large data sets.

Your last assertion, that LIMIT cannot guarantee faster results in any system, is demonstrably incorrect. Both Vertica and PostgreSQL use LIMIT in the optimizer, and if an order is specified it can significantly improve the performance of the query. When ORDER BY is not specified, it is explicitly called out that the results are unpredictable. That's a different discussion entirely, though.
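(The PostgreSQL claim can be checked with EXPLAIN, which shows the Limit node participating in the plan; a sketch, and the plan shape depends on available indexes and statistics:)

EXPLAIN
SELECT id_str, len, coordinates_x, coordinates_y
FROM uk_neogeo_sorted
WHERE coordinates_x BETWEEN 0.0 AND 22.499999996867977
  AND coordinates_y BETWEEN 0.0 AND 61.60639636617352
LIMIT 100;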
participants (5): amihay gonen, Christopher Nelson, Dennis Pallett, Lefteris, Martin Kersten