RE: MonetDB Evaluation - Excessive slowness in a particular query

6 Feb 2018

      Hi all,

I reopen this thread about excessive slowness in query "SELECT id, count(distinct value) FROM table1 GROUP BY id;" (you can find more info in my mail below)

About your suggestion (thanks Stefan!), I can provide the below info.

The required statistics are the following ones:

select count(*) from table: 92.912.272 items
select count(id) from table: 92.912.272 items
select count(distinct id) from table: 63 items
select count(value) from table: 92.912.272 items
select count(distinct value) from table: 86.964.522 items
select count(*) from (select id,value from table group by id,value) as x: 86.964.522 items

The result of the TRACE command is:

sql>TRACE SELECT id, count(distinct value) FROM table1 GROUP BY id;
+--------------------------+---------+
| id | L3      |
+==========================+=========+
|                 20160906 |  779489 |
|                 20160907 | 1616739 |
|                 20160908 | 1769753 |
|                 20160909 | 1764169 |
|                 20160910 |  668824 |
|                 20160911 |  716343 |
|                 20160912 | 1685183 |
|                 20160913 | 1619177 |
|                 20160914 | 1630866 |
|                 20160915 | 1831826 |
|                 20160916 | 1717702 |
|                 20160917 |  638397 |
|                 20160918 |  693979 |
|                 20160919 | 1635602 |
|                 20160920 | 1595859 |
|                 20160921 | 1650712 |
|                 20160922 | 1777228 |
|                 20160923 | 1783591 |
|                 20160924 |  652866 |
|                 20160925 |  722674 |
|                 20160926 | 1641260 |
|                 20160927 | 1673879 |
|                 20160928 | 1689016 |
|                 20160929 | 1865369 |
|                 20160930 | 1791773 |
|                 20161001 |  654382 |
|                 20161002 |  731022 |
|                 20161003 | 1302974 |
|                 20161004 | 1593157 |
|                 20161005 | 1570264 |
|                 20161006 | 1750299 |
|                 20161007 | 1788563 |
|                 20161008 |  590823 |
|                 20161009 |  700178 |
|                 20161010 | 1649577 |
|                 20161011 | 1602492 |
|                 20161012 | 1639719 |
|                 20161013 | 1759836 |
|                 20161014 | 1732986 |
|                 20161015 |  641682 |
|                 20161016 |  672841 |
|                 20161017 | 1652638 |
|                 20161018 | 1631122 |
|                 20161019 | 1646245 |
|                 20161020 | 1831482 |
|                 20161021 | 1836734 |
|                 20161022 |  618921 |
|                 20161023 |  687258 |
|                 20161024 | 1659389 |
|                 20161025 | 1743597 |
|                 20161026 | 1709957 |
|                 20161027 | 1774661 |
|                 20161028 | 1796815 |
|                 20161029 |  665849 |
|                 20161030 |  726861 |
|                 20161031 | 1686187 |
|                 20161101 | 1694782 |
|                 20161102 | 1582901 |
|                 20161103 | 1761407 |
|                 20161104 | 1815731 |
|                 20161105 |  629033 |
|                 20161106 |  684745 |
|                 20161107 | 1135136 |
+--------------------------+---------+
63 tuples (5m 25s)
+----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| usec     | statement                                                                                                                                                                                                                  |
+==========+============================================================================================================================================================================================================================+
|       67 | X_1=0@0:void := querylog.define("trace select id, count(distinct value) from table1 group by id;":str, "default_pipe":str, 38:int);        |
|      218 | X_40=[0]:bat[:str] := bat.new(nil:str);                                                                                                                                                                         |
|       76 | X_47=[1]:bat[:str] := bat.append(X_40=[1]:bat[:str], "sys.table1":str);                                                                                                 |
|       48 | X_57=[2]:bat[:str] := bat.append(X_47=[2]:bat[:str], "sys.L4":str);                                                                                                                                  |
|       41 | X_46=[0]:bat[:int] := bat.new(nil:int);                                                                                                                                                                         |
|       48 | X_55=[1]:bat[:int] := bat.append(X_46=[1]:bat[:int], 0:int);                                                                                                                                         |
|       43 | X_65=[2]:bat[:int] := bat.append(X_55=[2]:bat[:int], 0:int);                                                                                                                                         |
|       40 | X_44=[0]:bat[:int] := bat.new(nil:int);                                                                                                                                                                         |
|       49 | X_53=[1]:bat[:int] := bat.append(X_44=[1]:bat[:int], 32:int);                                                                                                                                        |
|       44 | X_63=[2]:bat[:int] := bat.append(X_53=[2]:bat[:int], 64:int);                                                                                                                                        |
|       56 | X_43=[0]:bat[:str] := bat.new(nil:str);                                                                                                                                                                         |
|       52 | X_51=[1]:bat[:str] := bat.append(X_43=[1]:bat[:str], "int":str);                                                                                                                                     |
|       48 | X_61=[2]:bat[:str] := bat.append(X_51=[2]:bat[:str], "bigint":str);                                                                                                                                  |
|       40 | X_42=[0]:bat[:str] := bat.new(nil:str);                                                                                                                                                                         |
|       46 | X_49=[1]:bat[:str] := bat.append(X_42=[1]:bat[:str], "id":str);                                                                                                                |
|       50 | X_59=[2]:bat[:str] := bat.append(X_49=[2]:bat[:str], "L3":str);                                                                                                                                      |
|       33 | X_4=0:int := sql.mvc();                                                                                                                                                                                                    |
|       72 | C_107=[23228068]:bat[:oid] := sql.tid(X_4=0:int, "sys":str, "table1":str, 0:int, 4:int);                                                                                           |
|      165 | X_114=[23228068]:bat[:int] := sql.bind(X_4=0:int, "sys":str, "table1":str, "id":str, 0:int, 0:int, 4:int);                                                   |
|       59 | X_123=[23228068]:bat[:int] := algebra.projection(C_107=[23228068]:bat[:oid], X_114=[23228068]:bat[:int]);                                                                                 |
|      149 | X_118=[23228068]:bat[:lng] := sql.bind(X_4=0:int, "sys":str, "table1":str, "cod_event_id":str, 0:int, 0:int, 4:int);                                                               |
|       95 | X_127=[23228068]:bat[:lng] := algebra.projection(C_107=[23228068]:bat[:oid], X_118=[23228068]:bat[:lng]);                                                                                 |
|  2193479 | (X_134=[23228068]:bat[:oid], C_135=[21765452]:bat[:oid], X_136=[21765452]:bat[:lng]) := group.group(X_127=[23228068]:bat[:lng]);                                               |
| 53185013 | (X_161=[23228068]:bat[:oid], C_162=[21660369]:bat[:oid], X_163=[21660369]:bat[:lng]) := group.subgroupdone(X_125=[23228068]:bat[:int], X_142=[23228068]:bat[:oid]); |
|       60 | language.pass(C_107=[23228068]:bat[:oid]);                                                                                                                                                                      |
|      133 | X_117=[23228068]:bat[:int] := sql.bind(X_4=0:int, "sys":str, "table1":str, "id":str, 0:int, 3:int, 4:int);                                                   |
|       95 | C_113=[23228068]:bat[:oid] := sql.tid(X_4=0:int, "sys":str, "table1":str, 3:int, 4:int);                                                                                           |
|       62 | X_126=[23228068]:bat[:int] := algebra.projection(C_113=[23228068]:bat[:oid], X_117=[23228068]:bat[:int]);                                                                                 |
|      135 | X_120=[23228068]:bat[:lng] := sql.bind(X_4=0:int, "sys":str, "table1":str, "cod_event_id":str, 0:int, 2:int, 4:int);                                                               |
|       84 | X_116=[23228068]:bat[:int] := sql.bind(X_4=0:int, "sys":str, "table1":str, "id":str, 0:int, 2:int, 4:int);                                                   |
|      162 | C_111=[23228068]:bat[:oid] := sql.tid(X_4=0:int, "sys":str, "table1":str, 2:int, 4:int);                                                                                           |
|      139 | X_125=[23228068]:bat[:int] := algebra.projection(C_111=[23228068]:bat[:oid], X_116=[23228068]:bat[:int]);                                                                                 |
|       95 | X_129=[23228068]:bat[:lng] := algebra.projection(C_111=[23228068]:bat[:oid], X_120=[23228068]:bat[:lng]);                                                                                 |
|       45 | language.pass(C_111=[23228068]:bat[:oid]);                                                                                                                                                                      |
|  1449732 | (X_142=[23228068]:bat[:oid], C_143=[21660369]:bat[:oid], X_144=[21660369]:bat[:lng]) := group.group(X_129=[23228068]:bat[:lng]);                                               |
| 58895764 | (X_157=[23228068]:bat[:oid], C_158=[21829821]:bat[:oid], X_159=[21829821]:bat[:lng]) := group.subgroupdone(X_124=[23228068]:bat[:int], X_138=[23228068]:bat[:oid]); |
| 60546070 | (X_165=[23228068]:bat[:oid], C_166=[21709222]:bat[:oid], X_167=[21709222]:bat[:lng]) := group.subgroupdone(X_126=[23228068]:bat[:int], X_146=[23228068]:bat[:oid]); |
|     1532 | X_119=[23228068]:bat[:lng] := sql.bind(X_4=0:int, "sys":str, "table1":str, "cod_event_id":str, 0:int, 1:int, 4:int);                                                               |
|      121 | X_115=[23228068]:bat[:int] := sql.bind(X_4=0:int, "sys":str, "table1":str, "id":str, 0:int, 1:int, 4:int);                                                   |
|       69 | C_109=[23228068]:bat[:oid] := sql.tid(X_4=0:int, "sys":str, "table1":str, 1:int, 4:int);                                                                                           |
|      198 | X_128=[23228068]:bat[:lng] := algebra.projection(C_109=[23228068]:bat[:oid], X_119=[23228068]:bat[:lng]);                                                                                 |
|  1050521 | (X_138=[23228068]:bat[:oid], C_139=[21829821]:bat[:oid], X_140=[21829821]:bat[:lng]) := group.group(X_128=[23228068]:bat[:lng]);                                               |
|     2782 | X_121=[23228068]:bat[:lng] := sql.bind(X_4=0:int, "sys":str, "table1":str, "cod_event_id":str, 0:int, 3:int, 4:int);                                                               |
|     8837 | X_130=[23228068]:bat[:lng] := algebra.projection(C_113=[23228068]:bat[:oid], X_121=[23228068]:bat[:lng]);                                                                                 |
|      108 | language.pass(C_113=[23228068]:bat[:oid]);                                                                                                                                                                      |
|       77 | X_124=[23228068]:bat[:int] := algebra.projection(C_109=[23228068]:bat[:oid], X_115=[23228068]:bat[:int]);                                                                                 |
|       37 | language.pass(C_109=[23228068]:bat[:oid]);                                                                                                                                                                      |
|  1198687 | (X_146=[23228068]:bat[:oid], C_147=[21709222]:bat[:oid], X_148=[21709222]:bat[:lng]) := group.group(X_130=[23228068]:bat[:lng]);                                               |
| 64683118 | (X_153=[23228068]:bat[:oid], C_154=[21765452]:bat[:oid], X_155=[21765452]:bat[:lng]) := group.subgroupdone(X_123=[23228068]:bat[:int], X_134=[23228068]:bat[:oid]); |
| 65695935 | barrier X_211=false:bit := language.dataflow();                                                                                                                                                                            |
+----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
50 tuples (5m 25s)

Let me know if you need further info.

Best,
Alfio

-----Original Message-----
From: users-list [mailto:users-list-bounces+alfio.bettini=eidosmedia.com@monetdb.org] On Behalf Of Stefan Manegold
Sent: Thursday, 14 December 2017 1:17 PM
To: Communication channel for MonetDB users 
Subject: Re: MonetDB Evaluation - Excessive slowness in a particular query

Hi,

I'm afraid there is not much to optimize on the user side for a simple query like yours, other than having your data sorted by id,value .

Also, with only two columns involved --- with ~100 million records their sizes are ~400 MB and ~800 MB --- all data easily fits in the 32 GB of your machine.

To get more insight on what's going on,
could you profile your query by prefixing it with "TRACE", i.e.,

TRACE SELECT id, count(distinct value) FROM table GROUP BY id;

and share the resulting performance trace?

Also could you share the following statistics:

select count(*) from table;
select count(id) from table;
select count(distinct id) from table;
select count(value) from table;
select count(distinct value) from table; select count(*) from ( select id,value from table group by id,value ) as x;

Best,
Stefan

----- On Dec 14, 2017, at 10:59 AM, Alfio Bettini alfio.bettini@eidosmedia.com wrote:
...
Hi all,
we're testing MonetDB performances in order to understand if it could
be possible to use it as data mart in our BI software.
We're considering a set of queries executed in our test environment
(CentOS 7,
32 GB RAM, 4 cpu).
All the executed queries have quite good performances (between 10 and
45
seconds) except the following one that is taking several minutes in
order to be
executed:
SELECT id, count(distinct value)
FROM table
GROUP BY id;
The table contains 100 millions of data. It has about 130 columns
(varchar, int, bigint, smallint, tinyint, timestamp, datetime). We
have no BLOB/CLOB/Text types.
The fields used in the query have these data types:
id: int
value: bigint
Can you suggest us any optimizations in order to have this query run
faster? Do you know some possible reasons why it's so slow?
Kind Regards,
Alfio Bettini
The information contained in this message, together with any
attachments, may be privileged and confidential and is intended for
the use of the addressee(s) only. Any review, copying, use or
distribution of this email by others (including any of its
attachments) is strictly prohibited. If you are not the intended
recipient, please destroy this email and its attachments and contact
the sender immediately. This email was sent by EidosMedia SpA
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list
--
| Stefan.Manegold@CWI.nl | DB Architectures   (DA) |
| www.CWI.nl/~manegold/  | Science Park 123 (L321) |
| +31 (0)20 592-4212     | 1098 XG Amsterdam  (NL) |
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list

The information contained in this message, together with any attachments, may be privileged and confidential and is intended for the use of the addressee(s) only. Any review, copying, use or distribution of this email by others (including any of its attachments) is strictly prohibited. If you are not the intended recipient, please destroy this email and its attachments and contact the sender immediately. This email was sent by EidosMedia SpAhttp://www.eidosmedia.com