Hi All, I have recently install monetdb "OCT 2014-SP1" release on my Oracle Enterprise 6 Linux server. Server configuration - 8 CPU Cores 64 GB RAM I have loaded a data set with 70 columns having 15 Millions Rows. I have started mserver5 as below -- mserver5 --debug=10 --set *gdk_nr_threads=8* --dbpath=/data/monetdb/testdb --set mapi_open=true --set mapi_port=54321 --set merovingian_uri=mapi:monetdb://localhost:54321/testdb --set mapi_usock=/data/monetdb/testdb/.mapi.sock --set monet_vault_key=/data/monetdb/testdb/.vaultkey --set sql_optimizer=default_pipe --set max_clients=64 Now I have ran a query as below -- select product_type,COUNT(DISTINCT sales_document),sum(ord_qty) FROM bookings where ord_entry_date between '2014-04-01' and '2014-10-01' group by product_type; This query took around 90 Secs .. So I started tomograph and checked plotted graph ... which shows the parallelism usage as 39.9 % only and its utilizing only 1 thread with maximum time went in the step - group.subgroupdone almost 98%. My Question is that even I have started the server with gdk_nr_threads=8 , then why it is just using single thread. More info -- mserver5 --version MonetDB 5 server v11.19.7 "Oct2014-SP1" (64-bit, 64-bit oids) Copyright (c) 1993-July 2008 CWI Copyright (c) August 2008-2014 MonetDB B.V., all rights reserved Visit http://www.monetdb.org/ for further information Found 63.0GiB available memory, 8 available cpu cores Libraries: libpcre: 7.8 2008-09-05 (compiled with 7.8) openssl: OpenSSL 1.0.1e 11 Feb 2013 (compiled with OpenSSL 1.0.1e-fips 11 Feb 2013) libxml2: 2.7.6 (compiled with 2.7.6) Compiled by: root@localhost (x86_64-unknown-linux-gnu) Compilation: gcc -O3 -fomit-frame-pointer -pipe -D_FORTIFY_SOURCE=2 Linking : /usr/bin/ld -m elf_x86_64 Any pointers ?? Thanks
Hi Each MAL instruction is executed by a single thread. Some operators require the complete input to be available, such as in your case the group.subgroup(). martin On 18/01/15 01:27, shamsul hassan wrote:
Hi All,
I have recently install monetdb "OCT 2014-SP1" release on my Oracle Enterprise 6 Linux server.
Server configuration -
8 CPU Cores 64 GB RAM
I have loaded a data set with 70 columns having 15 Millions Rows.
I have started mserver5 as below --
mserver5 --debug=10 --set *gdk_nr_threads=8* --dbpath=/data/monetdb/testdb --set mapi_open=true --set mapi_port=54321 --set merovingian_uri=mapi:monetdb://localhost:54321/testdb --set mapi_usock=/data/monetdb/testdb/.mapi.sock --set monet_vault_key=/data/monetdb/testdb/.vaultkey --set sql_optimizer=default_pipe --set max_clients=64
Now I have ran a query as below --
select product_type,COUNT(DISTINCT sales_document),sum(ord_qty) FROM bookings where ord_entry_date between '2014-04-01' and '2014-10-01' group by product_type;
This query took around 90 Secs .. So I started tomograph and checked plotted graph ... which shows the parallelism usage as 39.9 % only and its utilizing only 1 thread with maximum time went in the step - group.subgroupdone almost 98%.
My Question is that even I have started the server with gdk_nr_threads=8 , then why it is just using single thread.
More info --
mserver5 --version
MonetDB 5 server v11.19.7 "Oct2014-SP1" (64-bit, 64-bit oids)
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2014 MonetDB B.V., all rights reserved
Visit http://www.monetdb.org/ for further information
Found 63.0GiB available memory, 8 available cpu cores
Libraries:
libpcre: 7.8 2008-09-05 (compiled with 7.8)
openssl: OpenSSL 1.0.1e 11 Feb 2013 (compiled with OpenSSL 1.0.1e-fips 11 Feb 2013)
libxml2: 2.7.6 (compiled with 2.7.6)
Compiled by: root@localhost (x86_64-unknown-linux-gnu)
Compilation: gcc -O3 -fomit-frame-pointer -pipe -D_FORTIFY_SOURCE=2
Linking : /usr/bin/ld -m elf_x86_64
Any pointers ??
Thanks
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Hi,
I haven't understand it completely, does it mean that it will always use a
single thread while doing Group by.
Also I have noticed one behaviour as below --
1) Very Fast Execution ---
Select product_type,sum(ord_qty) FROM bookings where ord_entry_date
between '2014-04-01' and '2014-10-01' group by product_type;
2) Suddenly very slow execution , the moment I add another AGGREGATE
function something like below --
select product_type,sum(ord_qty) ,COUNT(DISTINCT sales_document) FROM
bookings where ord_entry_date between '2014-04-01' and '2014-10-01' group
by product_type;
So with 2 AGGREGATE functions it behaves very slow as compared to 1
AGGREGATE function.
Also even with 2 AGGREGATE functions example above , if i add more
dimentions and group by 2-3 columns then there is a slight improvement in
performance even though we are adding more GROUP BYs
Is this behaviour is expected, does anybody else have also experienced such
behaviour.
Can it be because of OCT2014-SP1 release ??
Thanks
On Sun, Jan 18, 2015 at 11:20 AM, Martin Kersten
Hi
Each MAL instruction is executed by a single thread. Some operators require the complete input to be available, such as in your case the group.subgroup().
martin
On 18/01/15 01:27, shamsul hassan wrote:
Hi All,
I have recently install monetdb "OCT 2014-SP1" release on my Oracle Enterprise 6 Linux server.
Server configuration -
8 CPU Cores 64 GB RAM
I have loaded a data set with 70 columns having 15 Millions Rows.
I have started mserver5 as below --
mserver5 --debug=10 --set *gdk_nr_threads=8* --dbpath=/data/monetdb/testdb --set mapi_open=true --set mapi_port=54321 --set merovingian_uri=mapi:monetdb://localhost:54321/testdb --set mapi_usock=/data/monetdb/testdb/.mapi.sock --set monet_vault_key=/data/monetdb/testdb/.vaultkey --set sql_optimizer=default_pipe --set max_clients=64
Now I have ran a query as below --
select product_type,COUNT(DISTINCT sales_document),sum(ord_qty) FROM bookings where ord_entry_date between '2014-04-01' and '2014-10-01' group by product_type;
This query took around 90 Secs .. So I started tomograph and checked plotted graph ... which shows the parallelism usage as 39.9 % only and its utilizing only 1 thread with maximum time went in the step - group.subgroupdone almost 98%.
My Question is that even I have started the server with gdk_nr_threads=8 , then why it is just using single thread.
More info --
mserver5 --version
MonetDB 5 server v11.19.7 "Oct2014-SP1" (64-bit, 64-bit oids)
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2014 MonetDB B.V., all rights reserved
Visit http://www.monetdb.org/ for further information
Found 63.0GiB available memory, 8 available cpu cores
Libraries:
libpcre: 7.8 2008-09-05 (compiled with 7.8)
openssl: OpenSSL 1.0.1e 11 Feb 2013 (compiled with OpenSSL 1.0.1e-fips 11 Feb 2013)
libxml2: 2.7.6 (compiled with 2.7.6)
Compiled by: root@localhost (x86_64-unknown-linux-gnu)
Compilation: gcc -O3 -fomit-frame-pointer -pipe -D_FORTIFY_SOURCE=2
Linking : /usr/bin/ld -m elf_x86_64
Any pointers ??
Thanks
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Hi All,
Any help /Inputs here are really appreciated.
Thanks
On Mon, Jan 19, 2015 at 11:17 AM, shamsul hassan
Hi,
I haven't understand it completely, does it mean that it will always use a single thread while doing Group by.
Also I have noticed one behaviour as below --
1) Very Fast Execution ---
Select product_type,sum(ord_qty) FROM bookings where ord_entry_date between '2014-04-01' and '2014-10-01' group by product_type;
2) Suddenly very slow execution , the moment I add another AGGREGATE function something like below --
select product_type,sum(ord_qty) ,COUNT(DISTINCT sales_document) FROM bookings where ord_entry_date between '2014-04-01' and '2014-10-01' group by product_type;
So with 2 AGGREGATE functions it behaves very slow as compared to 1 AGGREGATE function.
Also even with 2 AGGREGATE functions example above , if i add more dimentions and group by 2-3 columns then there is a slight improvement in performance even though we are adding more GROUP BYs
Is this behaviour is expected, does anybody else have also experienced such behaviour. Can it be because of OCT2014-SP1 release ??
Thanks
On Sun, Jan 18, 2015 at 11:20 AM, Martin Kersten
wrote: Hi
Each MAL instruction is executed by a single thread. Some operators require the complete input to be available, such as in your case the group.subgroup().
martin
On 18/01/15 01:27, shamsul hassan wrote:
Hi All,
I have recently install monetdb "OCT 2014-SP1" release on my Oracle Enterprise 6 Linux server.
Server configuration -
8 CPU Cores 64 GB RAM
I have loaded a data set with 70 columns having 15 Millions Rows.
I have started mserver5 as below --
mserver5 --debug=10 --set *gdk_nr_threads=8* --dbpath=/data/monetdb/testdb --set mapi_open=true --set mapi_port=54321 --set merovingian_uri=mapi:monetdb://localhost:54321/testdb --set mapi_usock=/data/monetdb/testdb/.mapi.sock --set monet_vault_key=/data/monetdb/testdb/.vaultkey --set sql_optimizer=default_pipe --set max_clients=64
Now I have ran a query as below --
select product_type,COUNT(DISTINCT sales_document),sum(ord_qty) FROM bookings where ord_entry_date between '2014-04-01' and '2014-10-01' group by product_type;
This query took around 90 Secs .. So I started tomograph and checked plotted graph ... which shows the parallelism usage as 39.9 % only and its utilizing only 1 thread with maximum time went in the step - group.subgroupdone almost 98%.
My Question is that even I have started the server with gdk_nr_threads=8 , then why it is just using single thread.
More info --
mserver5 --version
MonetDB 5 server v11.19.7 "Oct2014-SP1" (64-bit, 64-bit oids)
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2014 MonetDB B.V., all rights reserved
Visit http://www.monetdb.org/ for further information
Found 63.0GiB available memory, 8 available cpu cores
Libraries:
libpcre: 7.8 2008-09-05 (compiled with 7.8)
openssl: OpenSSL 1.0.1e 11 Feb 2013 (compiled with OpenSSL 1.0.1e-fips 11 Feb 2013)
libxml2: 2.7.6 (compiled with 2.7.6)
Compiled by: root@localhost (x86_64-unknown-linux-gnu)
Compilation: gcc -O3 -fomit-frame-pointer -pipe -D_FORTIFY_SOURCE=2
Linking : /usr/bin/ld -m elf_x86_64
Any pointers ??
Thanks
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
On 19/01/15 16:27, shamsul hassan wrote:
Hi All, Hi
Any help /Inputs here are really appreciated.
Thanks
On Mon, Jan 19, 2015 at 11:17 AM, shamsul hassan
mailto:shamsulbuddy@gmail.com> wrote: Hi,
I haven't understand it completely, does it mean that it will always use a single thread while doing Group by. Every MAL instruction is handled by a single thread. Parallel processing of aggregates is shown in the plans.
Also I have noticed one behaviour as below --
1) Very Fast Execution ---
Select product_type,sum(ord_qty) FROM bookings where ord_entry_date between '2014-04-01' and '2014-10-01' group by product_type;
2) Suddenly very slow execution , the moment I add another AGGREGATE function something like below --
select product_type,sum(ord_qty) ,COUNT(DISTINCT sales_document) FROM bookings where ord_entry_date between '2014-04-01' and '2014-10-01' group by product_type; COUNT(DISTINCT sales_document) is an expensive operation, as all duplicate elimination routines require some form of sorting.
So with 2 AGGREGATE functions it behaves very slow as compared to 1 AGGREGATE function.
Also even with 2 AGGREGATE functions example above , if i add more dimentions and group by 2-3 columns then there is a slight improvement in performance even though we are adding more GROUP BYs
Is this behaviour is expected, does anybody else have also experienced such behaviour.
Behaviour can often be explained by understanding the underlying algorithmic complexity of the relational algebra operators, and the properties that actual columns expose. This calls for a study of those algorithms (using the general literature on algorithms). Trial and error won't bring you much further. regards, Martin
Can it be because of OCT2014-SP1 release ??
Thanks
On Sun, Jan 18, 2015 at 11:20 AM, Martin Kersten
mailto:martin@monetdb.org> wrote: Hi
Each MAL instruction is executed by a single thread. Some operators require the complete input to be available, such as in your case the group.subgroup().
martin
On 18/01/15 01:27, shamsul hassan wrote:
Hi All,
I have recently install monetdb "OCT 2014-SP1" release on my Oracle Enterprise 6 Linux server.
Server configuration -
8 CPU Cores 64 GB RAM
I have loaded a data set with 70 columns having 15 Millions Rows.
I have started mserver5 as below --
mserver5 --debug=10 --set *gdk_nr_threads=8* --dbpath=/data/monetdb/testdb --set mapi_open=true --set mapi_port=54321 --set merovingian_uri=mapi:monetdb:/__/localhost:54321/testdb --set mapi_usock=/data/monetdb/__testdb/.mapi.sock --set monet_vault_key=/data/monetdb/__testdb/.vaultkey --set sql_optimizer=default_pipe --set max_clients=64
Now I have ran a query as below --
select product_type,COUNT(DISTINCT sales_document),sum(ord_qty) FROM bookings where ord_entry_date between '2014-04-01' and '2014-10-01' group by product_type;
This query took around 90 Secs .. So I started tomograph and checked plotted graph ... which shows the parallelism usage as 39.9 % only and its utilizing only 1 thread with maximum time went in the step - group.subgroupdone almost 98%.
My Question is that even I have started the server with gdk_nr_threads=8 , then why it is just using single thread.
More info --
mserver5 --version
MonetDB 5 server v11.19.7 "Oct2014-SP1" (64-bit, 64-bit oids)
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2014 MonetDB B.V., all rights reserved
Visit http://www.monetdb.org/ for further information
Found 63.0GiB available memory, 8 available cpu cores
Libraries:
libpcre: 7.8 2008-09-05 (compiled with 7.8)
openssl: OpenSSL 1.0.1e 11 Feb 2013 (compiled with OpenSSL 1.0.1e-fips 11 Feb 2013)
libxml2: 2.7.6 (compiled with 2.7.6)
Compiled by: root@localhost (x86_64-unknown-linux-gnu)
Compilation: gcc -O3 -fomit-frame-pointer -pipe -D_FORTIFY_SOURCE=2
Linking : /usr/bin/ld -m elf_x86_64
Any pointers ??
Thanks
_________________________________________________ users-list mailing list users-list@monetdb.org mailto:users-list@monetdb.org https://www.monetdb.org/__mailman/listinfo/users-list https://www.monetdb.org/mailman/listinfo/users-list
_________________________________________________ users-list mailing list users-list@monetdb.org mailto:users-list@monetdb.org https://www.monetdb.org/__mailman/listinfo/users-list https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Yes looks like COUNT(DISTINCT sales_document) is a very expensive
operation, the moment I remove DISTINCT and just fired the query like
COUNT(sales_doc),sum(shpd_qty),sum(ord_qty)
.i.e with more AGGREGATE functions , its behaving very fast. But the data
here might be wrong as we are not counting the DISTINCT sales_document ,so
few sales_document might get counted multiple times.
Any other way of achieving the same but without using DISTINCT?
Also I did more testing with MonetDB and this time with Detail line level
data (NO Aggregation) and on that front its slow.
I fired the query - select * from booking where sales_document_line_comb =
12345610 and , this took 40 Secs even though sales_document_line_comb is
Primary KEY for booking table.
Same way I fired many detail queries but all of them are slow.
Any pointers here in terms of increasing the Detail line level perforrmace.
Thanks
On Mon, Jan 19, 2015 at 7:41 PM, Martin Kersten
On 19/01/15 16:27, shamsul hassan wrote:
Hi All,
Hi
Any help /Inputs here are really appreciated.
Thanks
On Mon, Jan 19, 2015 at 11:17 AM, shamsul hassan
mailto:shamsulbuddy@gmail.com> wrote: Hi,
I haven't understand it completely, does it mean that it will always use a single thread while doing Group by.
Every MAL instruction is handled by a single thread. Parallel processing of aggregates is shown in the plans.
Also I have noticed one behaviour as below --
1) Very Fast Execution ---
Select product_type,sum(ord_qty) FROM bookings where ord_entry_date between '2014-04-01' and '2014-10-01' group by product_type;
2) Suddenly very slow execution , the moment I add another AGGREGATE function something like below --
select product_type,sum(ord_qty) ,COUNT(DISTINCT sales_document) FROM bookings where ord_entry_date between '2014-04-01' and '2014-10-01' group by product_type;
COUNT(DISTINCT sales_document) is an expensive operation, as all duplicate elimination routines require some form of sorting.
So with 2 AGGREGATE functions it behaves very slow as compared to 1 AGGREGATE function.
Also even with 2 AGGREGATE functions example above , if i add more dimentions and group by 2-3 columns then there is a slight improvement in performance even though we are adding more GROUP BYs
Is this behaviour is expected, does anybody else have also experienced such behaviour.
Behaviour can often be explained by understanding the underlying algorithmic complexity of the relational algebra operators, and the properties that actual columns expose. This calls for a study of those algorithms (using the general literature on algorithms).
Trial and error won't bring you much further.
regards, Martin
Can it be because of OCT2014-SP1 release ??
Thanks
On Sun, Jan 18, 2015 at 11:20 AM, Martin Kersten
mailto:martin@monetdb.org> wrote: Hi
Each MAL instruction is executed by a single thread. Some operators require the complete input to be available, such as in your case the group.subgroup().
martin
On 18/01/15 01:27, shamsul hassan wrote:
Hi All,
I have recently install monetdb "OCT 2014-SP1" release on my Oracle Enterprise 6 Linux server.
Server configuration -
8 CPU Cores 64 GB RAM
I have loaded a data set with 70 columns having 15 Millions Rows.
I have started mserver5 as below --
mserver5 --debug=10 --set *gdk_nr_threads=8* --dbpath=/data/monetdb/testdb --set mapi_open=true --set mapi_port=54321 --set merovingian_uri=mapi:monetdb:/__/localhost:54321/testdb --set mapi_usock=/data/monetdb/__testdb/.mapi.sock --set monet_vault_key=/data/monetdb/__testdb/.vaultkey --set sql_optimizer=default_pipe --set max_clients=64
Now I have ran a query as below --
select product_type,COUNT(DISTINCT sales_document),sum(ord_qty) FROM bookings where ord_entry_date between '2014-04-01' and '2014-10-01' group by product_type;
This query took around 90 Secs .. So I started tomograph and checked plotted graph ... which shows the parallelism usage as 39.9 % only and its utilizing only 1 thread with maximum time went in the step - group.subgroupdone almost 98%.
My Question is that even I have started the server with gdk_nr_threads=8 , then why it is just using single thread.
More info --
mserver5 --version
MonetDB 5 server v11.19.7 "Oct2014-SP1" (64-bit, 64-bit oids)
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2014 MonetDB B.V., all rights reserved
Visit http://www.monetdb.org/ for further information
Found 63.0GiB available memory, 8 available cpu cores
Libraries:
libpcre: 7.8 2008-09-05 (compiled with 7.8)
openssl: OpenSSL 1.0.1e 11 Feb 2013 (compiled with OpenSSL 1.0.1e-fips 11 Feb 2013)
libxml2: 2.7.6 (compiled with 2.7.6)
Compiled by: root@localhost (x86_64-unknown-linux-gnu)
Compilation: gcc -O3 -fomit-frame-pointer -pipe -D_FORTIFY_SOURCE=2
Linking : /usr/bin/ld -m elf_x86_64
Any pointers ??
Thanks
_________________________________________________ users-list mailing list users-list@monetdb.org mailto:users-list@monetdb.org https://www.monetdb.org/__mailman/listinfo/users-list < https://www.monetdb.org/mailman/listinfo/users-list>
_________________________________________________ users-list mailing list users-list@monetdb.org mailto:users-list@monetdb.org https://www.monetdb.org/__mailman/listinfo/users-list < https://www.monetdb.org/mailman/listinfo/users-list>
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
On Jan 20, 2015, at 16:02, shamsul hassan
wrote: Yes looks like COUNT(DISTINCT sales_document) is a very expensive operation, the moment I remove DISTINCT and just fired the query like COUNT(sales_doc),sum(shpd_qty),sum(ord_qty) .i.e with more AGGREGATE functions , its behaving very fast. But the data here might be wrong as we are not counting the DISTINCT sales_document ,so few sales_document might get counted multiple times. Any other way of achieving the same but without using DISTINCT?
Also I did more testing with MonetDB and this time with Detail line level data (NO Aggregation) and on that front its slow.
I fired the query - select * from booking where sales_document_line_comb = 12345610 and , this took 40 Secs even though sales_document_line_comb is Primary KEY for booking table.
Hai Shamsul, You might want to have a look at the TRACE output to see where the time went. Regards, Jennie
Same way I fired many detail queries but all of them are slow.
Any pointers here in terms of increasing the Detail line level perforrmace.
Thanks
On Mon, Jan 19, 2015 at 7:41 PM, Martin Kersten
wrote: On 19/01/15 16:27, shamsul hassan wrote: Hi All, Hi Any help /Inputs here are really appreciated.
Thanks
On Mon, Jan 19, 2015 at 11:17 AM, shamsul hassan
mailto:shamsulbuddy@gmail.com> wrote: Hi,
I haven't understand it completely, does it mean that it will always use a single thread while doing Group by. Every MAL instruction is handled by a single thread. Parallel processing of aggregates is shown in the plans.
Also I have noticed one behaviour as below --
1) Very Fast Execution ---
Select product_type,sum(ord_qty) FROM bookings where ord_entry_date between '2014-04-01' and '2014-10-01' group by product_type;
2) Suddenly very slow execution , the moment I add another AGGREGATE function something like below --
select product_type,sum(ord_qty) ,COUNT(DISTINCT sales_document) FROM bookings where ord_entry_date between '2014-04-01' and '2014-10-01' group by product_type; COUNT(DISTINCT sales_document) is an expensive operation, as all duplicate elimination routines require some form of sorting.
So with 2 AGGREGATE functions it behaves very slow as compared to 1 AGGREGATE function.
Also even with 2 AGGREGATE functions example above , if i add more dimentions and group by 2-3 columns then there is a slight improvement in performance even though we are adding more GROUP BYs
Is this behaviour is expected, does anybody else have also experienced such behaviour. Behaviour can often be explained by understanding the underlying algorithmic complexity of the relational algebra operators, and the properties that actual columns expose. This calls for a study of those algorithms (using the general literature on algorithms).
Trial and error won't bring you much further.
regards, Martin Can it be because of OCT2014-SP1 release ??
Thanks
On Sun, Jan 18, 2015 at 11:20 AM, Martin Kersten
mailto:martin@monetdb.org> wrote: Hi
Each MAL instruction is executed by a single thread. Some operators require the complete input to be available, such as in your case the group.subgroup().
martin
On 18/01/15 01:27, shamsul hassan wrote:
Hi All,
I have recently install monetdb "OCT 2014-SP1" release on my Oracle Enterprise 6 Linux server.
Server configuration -
8 CPU Cores 64 GB RAM
I have loaded a data set with 70 columns having 15 Millions Rows.
I have started mserver5 as below --
mserver5 --debug=10 --set *gdk_nr_threads=8* --dbpath=/data/monetdb/testdb --set mapi_open=true --set mapi_port=54321 --set merovingian_uri=mapi:monetdb:/__/localhost:54321/testdb --set mapi_usock=/data/monetdb/__testdb/.mapi.sock --set monet_vault_key=/data/monetdb/__testdb/.vaultkey --set sql_optimizer=default_pipe --set max_clients=64
Now I have ran a query as below --
select product_type,COUNT(DISTINCT sales_document),sum(ord_qty) FROM bookings where ord_entry_date between '2014-04-01' and '2014-10-01' group by product_type;
This query took around 90 Secs .. So I started tomograph and checked plotted graph ... which shows the parallelism usage as 39.9 % only and its utilizing only 1 thread with maximum time went in the step - group.subgroupdone almost 98%.
My Question is that even I have started the server with gdk_nr_threads=8 , then why it is just using single thread.
More info --
mserver5 --version
MonetDB 5 server v11.19.7 "Oct2014-SP1" (64-bit, 64-bit oids)
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2014 MonetDB B.V., all rights reserved
Visit http://www.monetdb.org/ for further information
Found 63.0GiB available memory, 8 available cpu cores
Libraries:
libpcre: 7.8 2008-09-05 (compiled with 7.8)
openssl: OpenSSL 1.0.1e 11 Feb 2013 (compiled with OpenSSL 1.0.1e-fips 11 Feb 2013)
libxml2: 2.7.6 (compiled with 2.7.6)
Compiled by: root@localhost (x86_64-unknown-linux-gnu)
Compilation: gcc -O3 -fomit-frame-pointer -pipe -D_FORTIFY_SOURCE=2
Linking : /usr/bin/ld -m elf_x86_64
Any pointers ??
Thanks
_________________________________________________ users-list mailing list users-list@monetdb.org mailto:users-list@monetdb.org https://www.monetdb.org/__mailman/listinfo/users-list https://www.monetdb.org/mailman/listinfo/users-list
_________________________________________________ users-list mailing list users-list@monetdb.org mailto:users-list@monetdb.org https://www.monetdb.org/__mailman/listinfo/users-list https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Hi, please also be aware that every first point selection per column after a server (re-)start builds a hash table, which is then kept for re-use with subsequent point selections. Thus, the first point query per column is slower than subsequent ones. Next to profiling using the TRACE prefix, you can also check whether (though unlikely in this case) projection is a problem, i.e., run select sales_document_line_comb from booking where sales_document_line_comb = 12345610; Stefan ----- Original Message -----
On Jan 20, 2015, at 16:02, shamsul hassan
wrote: Yes looks like COUNT(DISTINCT sales_document) is a very expensive operation, the moment I remove DISTINCT and just fired the query like COUNT(sales_doc),sum(shpd_qty),sum(ord_qty) .i.e with more AGGREGATE functions , its behaving very fast. But the data here might be wrong as we are not counting the DISTINCT sales_document ,so few sales_document might get counted multiple times. Any other way of achieving the same but without using DISTINCT?
Also I did more testing with MonetDB and this time with Detail line level data (NO Aggregation) and on that front its slow.
I fired the query - select * from booking where sales_document_line_comb = 12345610 and , this took 40 Secs even though sales_document_line_comb is Primary KEY for booking table.
Hai Shamsul,
You might want to have a look at the TRACE output to see where the time went.
Regards,
Jennie
Same way I fired many detail queries but all of them are slow.
Any pointers here in terms of increasing the Detail line level perforrmace.
Thanks
On Mon, Jan 19, 2015 at 7:41 PM, Martin Kersten
wrote: On 19/01/15 16:27, shamsul hassan wrote: Hi All, Hi Any help /Inputs here are really appreciated.
Thanks
On Mon, Jan 19, 2015 at 11:17 AM, shamsul hassan
mailto:shamsulbuddy@gmail.com> wrote: Hi,
I haven't understand it completely, does it mean that it will always use a single thread while doing Group by. Every MAL instruction is handled by a single thread. Parallel processing of aggregates is shown in the plans.
Also I have noticed one behaviour as below --
1) Very Fast Execution ---
Select product_type,sum(ord_qty) FROM bookings where ord_entry_date between '2014-04-01' and '2014-10-01' group by product_type;
2) Suddenly very slow execution , the moment I add another AGGREGATE function something like below --
select product_type,sum(ord_qty) ,COUNT(DISTINCT sales_document) FROM bookings where ord_entry_date between '2014-04-01' and '2014-10-01' group by product_type; COUNT(DISTINCT sales_document) is an expensive operation, as all duplicate elimination routines require some form of sorting.
So with 2 AGGREGATE functions it behaves very slow as compared to 1 AGGREGATE function.
Also even with 2 AGGREGATE functions example above , if i add more dimentions and group by 2-3 columns then there is a slight improvement in performance even though we are adding more GROUP BYs
Is this behaviour is expected, does anybody else have also experienced such behaviour. Behaviour can often be explained by understanding the underlying algorithmic complexity of the relational algebra operators, and the properties that actual columns expose. This calls for a study of those algorithms (using the general literature on algorithms).
Trial and error won't bring you much further.
regards, Martin Can it be because of OCT2014-SP1 release ??
Thanks
On Sun, Jan 18, 2015 at 11:20 AM, Martin Kersten
mailto:martin@monetdb.org> wrote: Hi
Each MAL instruction is executed by a single thread. Some operators require the complete input to be available, such as in your case the group.subgroup().
martin
On 18/01/15 01:27, shamsul hassan wrote:
Hi All,
I have recently install monetdb "OCT 2014-SP1" release on my Oracle Enterprise 6 Linux server.
Server configuration -
8 CPU Cores 64 GB RAM
I have loaded a data set with 70 columns having 15 Millions Rows.
I have started mserver5 as below --
mserver5 --debug=10 --set *gdk_nr_threads=8* --dbpath=/data/monetdb/testdb --set mapi_open=true --set mapi_port=54321 --set merovingian_uri=mapi:monetdb:/__/localhost:54321/testdb --set mapi_usock=/data/monetdb/__testdb/.mapi.sock --set monet_vault_key=/data/monetdb/__testdb/.vaultkey --set sql_optimizer=default_pipe --set max_clients=64
Now I have ran a query as below --
select product_type,COUNT(DISTINCT sales_document),sum(ord_qty) FROM bookings where ord_entry_date between '2014-04-01' and '2014-10-01' group by product_type;
This query took around 90 Secs .. So I started tomograph and checked plotted graph ... which shows the parallelism usage as 39.9 % only and its utilizing only 1 thread with maximum time went in the step - group.subgroupdone almost 98%.
My Question is that even I have started the server with gdk_nr_threads=8 , then why it is just using single thread.
More info --
mserver5 --version
MonetDB 5 server v11.19.7 "Oct2014-SP1" (64-bit, 64-bit oids)
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2014 MonetDB B.V., all rights reserved
Visit http://www.monetdb.org/ for further information
Found 63.0GiB available memory, 8 available cpu cores
Libraries:
libpcre: 7.8 2008-09-05 (compiled with 7.8)
openssl: OpenSSL 1.0.1e 11 Feb 2013 (compiled with OpenSSL 1.0.1e-fips 11 Feb 2013)
libxml2: 2.7.6 (compiled with 2.7.6)
Compiled by: root@localhost (x86_64-unknown-linux-gnu)
Compilation: gcc -O3 -fomit-frame-pointer -pipe -D_FORTIFY_SOURCE=2
Linking : /usr/bin/ld -m elf_x86_64
Any pointers ??
Thanks
_________________________________________________ users-list mailing list users-list@monetdb.org mailto:users-list@monetdb.org https://www.monetdb.org/__mailman/listinfo/users-list https://www.monetdb.org/mailman/listinfo/users-list
_________________________________________________ users-list mailing list users-list@monetdb.org mailto:users-list@monetdb.org https://www.monetdb.org/__mailman/listinfo/users-list https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
participants (4)
-
Martin Kersten
-
shamsul hassan
-
Stefan Manegold
-
Ying Zhang