MonetDB Evaluation - Excessive slowness in a particular query
Hi all, we're testing MonetDB performances in order to understand if it could be possible to use it as data mart in our BI software. We're considering a set of queries executed in our test environment (CentOS 7, 32 GB RAM, 4 cpu). All the executed queries have quite good performances (between 10 and 45 seconds) except the following one that is taking several minutes in order to be executed: SELECT id, count(distinct value) FROM table GROUP BY id; The table contains 100 millions of data. It has about 130 columns (varchar, int, bigint, smallint, tinyint, timestamp, datetime). We have no BLOB/CLOB/Text types. The fields used in the query have these data types: id: int value: bigint Can you suggest us any optimizations in order to have this query run faster? Do you know some possible reasons why it's so slow? Kind Regards, Alfio Bettini The information contained in this message, together with any attachments, may be privileged and confidential and is intended for the use of the addressee(s) only. Any review, copying, use or distribution of this email by others (including any of its attachments) is strictly prohibited. If you are not the intended recipient, please destroy this email and its attachments and contact the sender immediately. This email was sent by EidosMedia SpAhttp://www.eidosmedia.com
Hi, I'm afraid there is not much to optimize on the user side for a simple query like yours, other than having your data sorted by id,value . Also, with only two columns involved --- with ~100 million records their sizes are ~400 MB and ~800 MB --- all data easily fits in the 32 GB of your machine. To get more insight on what's going on, could you profile your query by prefixing it with "TRACE", i.e., TRACE SELECT id, count(distinct value) FROM table GROUP BY id; and share the resulting performance trace? Also could you share the following statistics: select count(*) from table; select count(id) from table; select count(distinct id) from table; select count(value) from table; select count(distinct value) from table; select count(*) from ( select id,value from table group by id,value ) as x; Best, Stefan ----- On Dec 14, 2017, at 10:59 AM, Alfio Bettini alfio.bettini@eidosmedia.com wrote:
Hi all,
we're testing MonetDB performances in order to understand if it could be possible to use it as data mart in our BI software.
We're considering a set of queries executed in our test environment (CentOS 7, 32 GB RAM, 4 cpu).
All the executed queries have quite good performances (between 10 and 45 seconds) except the following one that is taking several minutes in order to be executed:
SELECT id, count(distinct value)
FROM table
GROUP BY id;
The table contains 100 millions of data. It has about 130 columns (varchar, int, bigint, smallint, tinyint, timestamp, datetime). We have no BLOB/CLOB/Text types.
The fields used in the query have these data types:
id: int
value: bigint
Can you suggest us any optimizations in order to have this query run faster? Do you know some possible reasons why it's so slow?
Kind Regards,
Alfio Bettini
The information contained in this message, together with any attachments, may be privileged and confidential and is intended for the use of the addressee(s) only. Any review, copying, use or distribution of this email by others (including any of its attachments) is strictly prohibited. If you are not the intended recipient, please destroy this email and its attachments and contact the sender immediately. This email was sent by EidosMedia SpA
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
Hi all,
I reopen this thread about excessive slowness in query "SELECT id, count(distinct value) FROM table1 GROUP BY id;" (you can find more info in my mail below)
About your suggestion (thanks Stefan!), I can provide the below info.
The required statistics are the following ones:
select count(*) from table: 92.912.272 items
select count(id) from table: 92.912.272 items
select count(distinct id) from table: 63 items
select count(value) from table: 92.912.272 items
select count(distinct value) from table: 86.964.522 items
select count(*) from (select id,value from table group by id,value) as x: 86.964.522 items
The result of the TRACE command is:
sql>TRACE SELECT id, count(distinct value) FROM table1 GROUP BY id;
+--------------------------+---------+
| id | L3 |
+==========================+=========+
| 20160906 | 779489 |
| 20160907 | 1616739 |
| 20160908 | 1769753 |
| 20160909 | 1764169 |
| 20160910 | 668824 |
| 20160911 | 716343 |
| 20160912 | 1685183 |
| 20160913 | 1619177 |
| 20160914 | 1630866 |
| 20160915 | 1831826 |
| 20160916 | 1717702 |
| 20160917 | 638397 |
| 20160918 | 693979 |
| 20160919 | 1635602 |
| 20160920 | 1595859 |
| 20160921 | 1650712 |
| 20160922 | 1777228 |
| 20160923 | 1783591 |
| 20160924 | 652866 |
| 20160925 | 722674 |
| 20160926 | 1641260 |
| 20160927 | 1673879 |
| 20160928 | 1689016 |
| 20160929 | 1865369 |
| 20160930 | 1791773 |
| 20161001 | 654382 |
| 20161002 | 731022 |
| 20161003 | 1302974 |
| 20161004 | 1593157 |
| 20161005 | 1570264 |
| 20161006 | 1750299 |
| 20161007 | 1788563 |
| 20161008 | 590823 |
| 20161009 | 700178 |
| 20161010 | 1649577 |
| 20161011 | 1602492 |
| 20161012 | 1639719 |
| 20161013 | 1759836 |
| 20161014 | 1732986 |
| 20161015 | 641682 |
| 20161016 | 672841 |
| 20161017 | 1652638 |
| 20161018 | 1631122 |
| 20161019 | 1646245 |
| 20161020 | 1831482 |
| 20161021 | 1836734 |
| 20161022 | 618921 |
| 20161023 | 687258 |
| 20161024 | 1659389 |
| 20161025 | 1743597 |
| 20161026 | 1709957 |
| 20161027 | 1774661 |
| 20161028 | 1796815 |
| 20161029 | 665849 |
| 20161030 | 726861 |
| 20161031 | 1686187 |
| 20161101 | 1694782 |
| 20161102 | 1582901 |
| 20161103 | 1761407 |
| 20161104 | 1815731 |
| 20161105 | 629033 |
| 20161106 | 684745 |
| 20161107 | 1135136 |
+--------------------------+---------+
63 tuples (5m 25s)
+----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| usec | statement |
+==========+============================================================================================================================================================================================================================+
| 67 | X_1=0@0:void := querylog.define("trace select id, count(distinct value) from table1 group by id;":str, "default_pipe":str, 38:int); |
| 218 | X_40=
Hi all,
we're testing MonetDB performances in order to understand if it could be possible to use it as data mart in our BI software.
We're considering a set of queries executed in our test environment (CentOS 7, 32 GB RAM, 4 cpu).
All the executed queries have quite good performances (between 10 and 45 seconds) except the following one that is taking several minutes in order to be executed:
SELECT id, count(distinct value)
FROM table
GROUP BY id;
The table contains 100 millions of data. It has about 130 columns (varchar, int, bigint, smallint, tinyint, timestamp, datetime). We have no BLOB/CLOB/Text types.
The fields used in the query have these data types:
id: int
value: bigint
Can you suggest us any optimizations in order to have this query run faster? Do you know some possible reasons why it's so slow?
Kind Regards,
Alfio Bettini
The information contained in this message, together with any attachments, may be privileged and confidential and is intended for the use of the addressee(s) only. Any review, copying, use or distribution of this email by others (including any of its attachments) is strictly prohibited. If you are not the intended recipient, please destroy this email and its attachments and contact the sender immediately. This email was sent by EidosMedia SpA
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) | _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list The information contained in this message, together with any attachments, may be privileged and confidential and is intended for the use of the addressee(s) only. Any review, copying, use or distribution of this email by others (including any of its attachments) is strictly prohibited. If you are not the intended recipient, please destroy this email and its attachments and contact the sender immediately. This email was sent by EidosMedia SpAhttp://www.eidosmedia.com
Hi all,
I reopen this thread about excessive slowness in query "SELECT id, count(distinct value) FROM table1 GROUP BY id;" (you can find more info in my mail below)
About your suggestion (thanks Stefan!), I can provide the below info.
The required statistics are the following ones:
select count(*) from table: 92.912.272 items
select count(id) from table: 92.912.272 items
select count(distinct id) from table: 63 items
select count(value) from table: 92.912.272 items
select count(distinct value) from table: 86.964.522 items
select count(*) from (select id,value from table group by id,value) as x: 86.964.522 items
The result of the TRACE command is:
sql>TRACE SELECT id, count(distinct value) FROM table1 GROUP BY id;
+--------------------------+---------+
| id | L3 |
+==========================+=========+
| 20160906 | 779489 |
| 20160907 | 1616739 |
| 20160908 | 1769753 |
| 20160909 | 1764169 |
| 20160910 | 668824 |
| 20160911 | 716343 |
| 20160912 | 1685183 |
| 20160913 | 1619177 |
| 20160914 | 1630866 |
| 20160915 | 1831826 |
| 20160916 | 1717702 |
| 20160917 | 638397 |
| 20160918 | 693979 |
| 20160919 | 1635602 |
| 20160920 | 1595859 |
| 20160921 | 1650712 |
| 20160922 | 1777228 |
| 20160923 | 1783591 |
| 20160924 | 652866 |
| 20160925 | 722674 |
| 20160926 | 1641260 |
| 20160927 | 1673879 |
| 20160928 | 1689016 |
| 20160929 | 1865369 |
| 20160930 | 1791773 |
| 20161001 | 654382 |
| 20161002 | 731022 |
| 20161003 | 1302974 |
| 20161004 | 1593157 |
| 20161005 | 1570264 |
| 20161006 | 1750299 |
| 20161007 | 1788563 |
| 20161008 | 590823 |
| 20161009 | 700178 |
| 20161010 | 1649577 |
| 20161011 | 1602492 |
| 20161012 | 1639719 |
| 20161013 | 1759836 |
| 20161014 | 1732986 |
| 20161015 | 641682 |
| 20161016 | 672841 |
| 20161017 | 1652638 |
| 20161018 | 1631122 |
| 20161019 | 1646245 |
| 20161020 | 1831482 |
| 20161021 | 1836734 |
| 20161022 | 618921 |
| 20161023 | 687258 |
| 20161024 | 1659389 |
| 20161025 | 1743597 |
| 20161026 | 1709957 |
| 20161027 | 1774661 |
| 20161028 | 1796815 |
| 20161029 | 665849 |
| 20161030 | 726861 |
| 20161031 | 1686187 |
| 20161101 | 1694782 |
| 20161102 | 1582901 |
| 20161103 | 1761407 |
| 20161104 | 1815731 |
| 20161105 | 629033 |
| 20161106 | 684745 |
| 20161107 | 1135136 |
+--------------------------+---------+
63 tuples (5m 25s)
+----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| usec | statement |
+==========+============================================================================================================================================================================================================================+
| 67 | X_1=0@0:void := querylog.define("trace select id, count(distinct value) from table1 group by id;":str, "default_pipe":str, 38:int); |
| 218 | X_40=
Hi all,
we're testing MonetDB performances in order to understand if it could be possible to use it as data mart in our BI software.
We're considering a set of queries executed in our test environment (CentOS 7, 32 GB RAM, 4 cpu).
All the executed queries have quite good performances (between 10 and 45 seconds) except the following one that is taking several minutes in order to be executed:
SELECT id, count(distinct value)
FROM table
GROUP BY id;
The table contains 100 millions of data. It has about 130 columns (varchar, int, bigint, smallint, tinyint, timestamp, datetime). We have no BLOB/CLOB/Text types.
The fields used in the query have these data types:
id: int
value: bigint
Can you suggest us any optimizations in order to have this query run faster? Do you know some possible reasons why it's so slow?
Kind Regards,
Alfio Bettini
The information contained in this message, together with any attachments, may be privileged and confidential and is intended for the use of the addressee(s) only. Any review, copying, use or distribution of this email by others (including any of its attachments) is strictly prohibited. If you are not the intended recipient, please destroy this email and its attachments and contact the sender immediately. This email was sent by EidosMedia SpA
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) | _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list The information contained in this message, together with any attachments, may be privileged and confidential and is intended for the use of the addressee(s) only. Any review, copying, use or distribution of this email by others (including any of its attachments) is strictly prohibited. If you are not the intended recipient, please destroy this email and its attachments and contact the sender immediately. This email was sent by EidosMedia SpAhttp://www.eidosmedia.com
participants (2)
-
Alfio Bettini
-
Stefan Manegold