[MonetDB-users] Query time takes equally long on whole and just small part of table
Hello,

I'm new to MonetDB and installed it two weeks ago to do some tests. I was very impressed with the speed on large tables, but in some of my latest tests on subsets of these tables, selected by a Unix timestamp, the query time was the same as for querying the whole table. Does anyone know what I can do to get higher speeds on queries in MonetDB that only use part of a table, e.g. partition pruning, explicit foreign key or unique constraints, indexes, or some other way?

I have the following two tables and record counts:

CREATE TABLE table1 (T1vid INT NOT NULL, T1Field1 VARCHAR(20) NOT NULL, T1Field2 TINYINT NOT NULL);

This table has about 400,000 records. The records are not time-bounded; it is just a list. T1vid is an incrementing ID, but it is currently generated in MySQL, not in MonetDB.

CREATE TABLE table2 (T2mid INT NOT NULL, T2vid INT NOT NULL, T2timestamp INT NOT NULL, T2Field1 INT NOT NULL, T2Field2 INT NOT NULL);

This table has about 1,000,000 records and a Unix timestamp field. It gets a few new records every second, and records older than 24 hours are deleted. The T2mid field refers to a table that is not in MonetDB's database, so it is just an integer. The T2vid field, however, refers to the T1vid field in table1 (a foreign key, but I didn't define it that way).

The following query takes about 3.5 seconds; it does not have the timestamp in the WHERE clause:

SELECT SUM(t2.T2Field2), t1.T1Field1, t1.T1Field2, (t2.T2timestamp / 3600) AS interval
FROM table2 AS t2
LEFT JOIN table1 AS t1 ON t1.T1vid = t2.T2vid
WHERE t2.T2Field1 = 8
  AND t2.T2vid IN (list of 15 unique T1vid IDs)
GROUP BY interval, t1.T1Field1, t1.T1Field2
ORDER BY interval DESC;

The result is a list of the 15 T1vid IDs times the number of intervals. T2timestamp is divided by 3600 and grouped on that result, so each set of 15 T1vid IDs reflects one hour, returning at most 24x15 rows.

The following query also takes about 3.5 seconds, but it does have the timestamp in the WHERE clause. It only needs to access 1 hour of data instead of the whole 24 hours, and it groups the data into intervals of 5 minutes (300 seconds):

SELECT SUM(t2.T2Field2), t1.T1Field1, t1.T1Field2, (t2.T2timestamp / 300) AS interval
FROM table2 AS t2
LEFT JOIN table1 AS t1 ON t1.T1vid = t2.T2vid
WHERE t2.T2timestamp BETWEEN 1298360000 AND 1298363600
  AND t2.T2Field1 = 8
  AND t2.T2vid IN (list of 15 unique T1vid IDs)
GROUP BY interval, t1.T1Field1, t1.T1Field2
ORDER BY interval DESC;

T2timestamp is now divided by 300, so each set of 15 T1vid IDs reflects 5 minutes, returning at most 12x15 rows.

I would very much appreciate any help or hints; please let me know if you have any questions.

Kind regards,
Rob Berentsen
Hi Rob:
I am no technical expert on MonetDB, but we have been using MonetDB for
more than a year now, so I can give you a couple of suggestions
(or workarounds) that might help, based on our experience. We are using a
third-party application that generates queries and sends them to MonetDB,
which has a transaction table with more than 1.6 billion records and growing
(time-series data like yours) linked to dimension tables. The queries being
generated are almost exactly the type you are showing: some sort of
aggregation query using GROUP BY and filtered with a WHERE clause.
One thing that we noticed with queries that join tables is that an inner join
seems to be significantly faster than left/right joins, even when the results
are no different between the three join types. I know that might not be an
option for you, but it works for our use case.
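(As an illustration, and assuming every T2vid in your list has a matching T1vid in table1 so that the join type does not change the result, your first query could be rewritten like this:

SELECT SUM(t2.T2Field2), t1.T1Field1, t1.T1Field2, (t2.T2timestamp / 3600) AS interval
FROM table2 AS t2
INNER JOIN table1 AS t1 ON t1.T1vid = t2.T2vid    -- inner join instead of LEFT JOIN
WHERE t2.T2Field1 = 8
  AND t2.T2vid IN (...)                           -- the 15 unique T1vid IDs
GROUP BY interval, t1.T1Field1, t1.T1Field2
ORDER BY interval DESC;
)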
Another thing that we noticed is that if you add constraints, i.e. primary and
foreign keys, you will get better performance. I think you might already
have them in your tables, though...
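(A minimal sketch of how those keys could be declared after the fact, using standard SQL constraint syntax; the constraint names are made up here, and this assumes T1vid is unique in table1 and that every T2vid value actually exists in table1:

ALTER TABLE table1 ADD CONSTRAINT table1_pk PRIMARY KEY (T1vid);
ALTER TABLE table2 ADD CONSTRAINT table2_t2vid_fk FOREIGN KEY (T2vid) REFERENCES table1 (T1vid);
)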
We are in the process of testing the Oct-2010 SP1 release so we can upgrade
our Nov-2009 SP2 environment. We are seeing a significant performance
improvement in most of our typical queries and a much smaller memory footprint
with the Oct-2010 SP1 release, but we do see some performance degradation with
queries based on views instead of tables. You might want to try the Nov-2009
optimizer and see if that helps. You can do this by specifying the Nov-2009
optimizer in the monetdb5.conf file.
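(A rough sketch of what that setting might look like; the exact property and pipeline names can differ between releases, so check the optimizer section of your own monetdb5.conf. In the config file, something like

# select the Nov-2009 optimizer pipeline
sql_optimizer=nov2009_pipe

or, per SQL session,

SET optimizer = 'nov2009_pipe';

should switch to the Nov-2009 pipeline, if it is available in your build.)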
Hope that helps!
Best regards,
Henry
Hello Henry,
Thank you for your reply! See inline.
On Thu, Feb 24, 2011 at 7:54 AM, Henry Addington wrote:
I am no technical expert on MonetDB, but we have been using MonetDB for more than a year now, so I can give you a couple of suggestions (or workarounds) that might help, based on our experience. We are using a third-party application that generates queries and sends them to MonetDB, which has a transaction table with more than 1.6 billion records and growing (time-series data like yours) linked to dimension tables. The queries being generated are almost exactly the type you are showing: some sort of aggregation query using GROUP BY and filtered with a WHERE clause.
One thing that we noticed with queries that join tables is that an inner join seems to be significantly faster than left/right joins, even when the results are no different between the three join types. I know that might not be an option for you, but it works for our use case.
I've changed my queries into two separate ones: one for the total time scope, and one for (occasionally) getting the results spread over the intervals for one item. The new total-time-scope query is faster on a smaller time scope, and as fast as before on the full 24-hour scope, so definitely an improvement for me.
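(For reference, a hypothetical sketch of such a split; the exact shape of the two queries depends on what the front end needs, and 42 just stands in for a single T1vid value:

-- totals per item over the selected time scope
SELECT SUM(t2.T2Field2), t1.T1Field1, t1.T1Field2
FROM table2 AS t2
INNER JOIN table1 AS t1 ON t1.T1vid = t2.T2vid
WHERE t2.T2timestamp BETWEEN 1298360000 AND 1298363600
  AND t2.T2Field1 = 8
  AND t2.T2vid IN (...)                           -- the 15 unique T1vid IDs
GROUP BY t1.T1Field1, t1.T1Field2;

-- per-interval breakdown for one item, run only when needed
SELECT SUM(t2.T2Field2), (t2.T2timestamp / 300) AS interval
FROM table2 AS t2
WHERE t2.T2timestamp BETWEEN 1298360000 AND 1298363600
  AND t2.T2Field1 = 8
  AND t2.T2vid = 42                               -- hypothetical single T1vid value
GROUP BY interval
ORDER BY interval DESC;
)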
Another thing that we noticed is that if you add constraints, i.e. primary and foreign keys, you will get better performance. I think you might already have them in your tables, though...
I do have them, but I didn't declare them in my CREATE statements. I'm also going to test whether this adds extra speed.
We are in the process of testing the Oct-2010 SP1 release so we can upgrade our Nov-2009 SP2 environment. We are seeing a significant performance improvement in most of our typical queries and a much smaller memory footprint with the Oct-2010 SP1 release, but we do see some performance degradation with queries based on views instead of tables. You might want to try the Nov-2009 optimizer and see if that helps. You can do this by specifying the Nov-2009 optimizer in the monetdb5.conf file.
This didn't show any difference for the original query, but thanks for mentioning it; I might need it in the future.

Kind regards,
Rob Berentsen
participants (2)
- Henry Addington
- Rob Berentsen