Merge table; partition pruning?

Ok, I was all excited about MERGE TABLEs moving up into the SQL layer, unfortunately my enthusiasm might be misplaced as I don’t see any improvement in query performance. Let me explain: I have a fact table with ~300 M rows with 5 years of data. I created 5 yearly tables, ordering by my date_key (and also created a year column). Checked storage to verify MonetDB knew they were sorted. Created a Merge table over all 5 yearly tables. I then ran test queries with various year subsets (using direct fact column filter and joining from date dimension), from 1 to all 5 and well I didn’t see any improvement in query speed when a subset of the 5 years was queried. In fact I usually saw 10-20% degradation of performance. I am no MAL expert, but looking at the PLAN and the EXPLAIN/TRACE outputs it appears to me tables that shouldn’t have data for the query are still scanned. I ran this both on a 16GB 4 core Windows box and a 32GB 8 core EC2 Linux box, same behavior both places, using both July2015 and July2015SP1. Is this expected behavior? Am I missing some magic to have MonetDB do the Oracle concept of Partition Elimination? Or are Merge tables really just meant for data maintenance? Thanx, Bryan -- ------------------------------ This transmission is confidential and intended solely for the use of the recipient named above. It may contain confidential, proprietary, or legally privileged information. If you are not the intended recipient, you are hereby notified that any unauthorized review, use, disclosure or distribution is strictly prohibited. If you have received this transmission in error, please contact the sender by reply e-mail and delete the original transmission and all copies from your system.

I think merge tables are meant to be run on different machines. I haven't
read anything that says that monetdb runs operations on merge tables in
parallel if they are being searched from one process. However, if you made
5 remote tables and collected them together into a single merge table you
are likely to see a significant improvement.
On Wed, Nov 11, 2015 at 3:25 PM Bryan Senseman
Ok, I was all excited about MERGE TABLEs moving up into the SQL layer, unfortunately my enthusiasm might be misplaced as I don’t see any improvement in query performance. Let me explain:
I have a fact table with ~300 M rows with 5 years of data.
I created 5 yearly tables, ordering by my date_key (and also created a year column). Checked storage to verify MonetDB knew they were sorted.
Created a Merge table over all 5 yearly tables.
I then ran test queries with various year subsets (using direct fact column filter and joining from date dimension), from 1 to all 5 and well I didn’t see any improvement in query speed when a subset of the 5 years was queried. In fact I usually saw 10-20% degradation of performance. I am no MAL expert, but looking at the PLAN and the EXPLAIN/TRACE outputs it appears to me tables that shouldn’t have data for the query are still scanned.
I ran this both on a 16GB 4 core Windows box and a 32GB 8 core EC2 Linux box, same behavior both places, using both July2015 and July2015SP1.
Is this expected behavior? Am I missing some magic to have MonetDB do the Oracle concept of Partition Elimination? Or are Merge tables really just meant for data maintenance?
Thanx,
Bryan
------------------------------
This transmission is confidential and intended solely for the use of the recipient named above. It may contain confidential, proprietary, or legally privileged information. If you are not the intended recipient, you are hereby notified that any unauthorized review, use, disclosure or distribution is strictly prohibited. If you have received this transmission in error, please contact the sender by reply e-mail and delete the original transmission and all copies from your system. _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

Hi All,
Well first of all tables that contain no data will be scaned because you
added them to your MERGE TABLE they are just simply part of nice easy way
to UNION tables together.
I am also unsure if you are using REMOTE TABLES there is more on those here
which my help you further.
https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/DistributedQueryP...
Regards,
Brian Hood
On Nov 12, 2015 7:51 PM, "Christopher Nelson"
I think merge tables are meant to be run on different machines. I haven't read anything that says that monetdb runs operations on merge tables in parallel if they are being searched from one process. However, if you made 5 remote tables and collected them together into a single merge table you are likely to see a significant improvement.
On Wed, Nov 11, 2015 at 3:25 PM Bryan Senseman
wrote: Ok, I was all excited about MERGE TABLEs moving up into the SQL layer, unfortunately my enthusiasm might be misplaced as I don’t see any improvement in query performance. Let me explain:
I have a fact table with ~300 M rows with 5 years of data.
I created 5 yearly tables, ordering by my date_key (and also created a year column). Checked storage to verify MonetDB knew they were sorted.
Created a Merge table over all 5 yearly tables.
I then ran test queries with various year subsets (using direct fact column filter and joining from date dimension), from 1 to all 5 and well I didn’t see any improvement in query speed when a subset of the 5 years was queried. In fact I usually saw 10-20% degradation of performance. I am no MAL expert, but looking at the PLAN and the EXPLAIN/TRACE outputs it appears to me tables that shouldn’t have data for the query are still scanned.
I ran this both on a 16GB 4 core Windows box and a 32GB 8 core EC2 Linux box, same behavior both places, using both July2015 and July2015SP1.
Is this expected behavior? Am I missing some magic to have MonetDB do the Oracle concept of Partition Elimination? Or are Merge tables really just meant for data maintenance?
Thanx,
Bryan
------------------------------
This transmission is confidential and intended solely for the use of the recipient named above. It may contain confidential, proprietary, or legally privileged information. If you are not the intended recipient, you are hereby notified that any unauthorized review, use, disclosure or distribution is strictly prohibited. If you have received this transmission in error, please contact the sender by reply e-mail and delete the original transmission and all copies from your system. _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

Hi, understood and very good improvement the MERGE/REMOTE is - as it lays a solid base for partitioning/sharding but this does not answer the original question which i take the liberty to rephrase since i was looking for this functionality like a small child: * for UNION/MERGE/REMOTE i would expect following execution sequence 1. break the original UNION into separate sub-queries 2. prune some of these sub-queries based on the original query predicates where applicable - this could be for example skipping the scan for certain partitions based on partition key exclusion (when i search by year=2008 i would expect to skip all partitions where year<>2008 considering the year is the partition key) 3. push the predicates (from an outer-query - could be view and/or a complex SQL) down to the sub-queries (this will not prevent the partition scan but the pushed-down predicates may in effect achieve the data reduction ) 4. run the sub-queries in parallel 5. collect the results * would you please shed some light on o how MonetDB does the processing o is there any intent in future to do something similar to the above (1-5)? thanks in advance milan On 12. 11. 2015 23:48, Brian Hood wrote:
Hi All,
Well first of all tables that contain no data will be scaned because you added them to your MERGE TABLE they are just simply part of nice easy way to UNION tables together.
I am also unsure if you are using REMOTE TABLES there is more on those here which my help you further.
https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/DistributedQueryP...
Regards,
Brian Hood
On Nov 12, 2015 7:51 PM, "Christopher Nelson"
mailto:nadiasvertex@gmail.com> wrote: I think merge tables are meant to be run on different machines. I haven't read anything that says that monetdb runs operations on merge tables in parallel if they are being searched from one process. However, if you made 5 remote tables and collected them together into a single merge table you are likely to see a significant improvement.
On Wed, Nov 11, 2015 at 3:25 PM Bryan Senseman
mailto:monetdb@openbi.com> wrote: Ok, I was all excited about MERGE TABLEs moving up into the SQL layer, unfortunately my enthusiasm might be misplaced as I don’t see any improvement in query performance. Let me explain:
I have a fact table with ~300 M rows with 5 years of data.
I created 5 yearly tables, ordering by my date_key (and also created a year column). Checked storage to verify MonetDB knew they were sorted.
Created a Merge table over all 5 yearly tables.
I then ran test queries with various year subsets (using direct fact column filter and joining from date dimension), from 1 to all 5 and well I didn’t see any improvement in query speed when a subset of the 5 years was queried. In fact I usually saw 10-20% degradation of performance. I am no MAL expert, but looking at the PLAN and the EXPLAIN/TRACE outputs it appears to me tables that shouldn’t have data for the query are still scanned.
I ran this both on a 16GB 4 core Windows box and a 32GB 8 core EC2 Linux box, same behavior both places, using both July2015 and July2015SP1.
Is this expected behavior? Am I missing some magic to have MonetDB do the Oracle concept of Partition Elimination? Or are Merge tables really just meant for data maintenance?
Thanx,
Bryan
------------------------------------------------------------------------
This transmission is confidential and intended solely for the use of the recipient named above. It may contain confidential, proprietary, or legally privileged information. If you are not the intended recipient, you are hereby notified that any unauthorized review, use, disclosure or distribution is strictly prohibited. If you have received this transmission in error, please contact the sender by reply e-mail and delete the original transmission and all copies from your system.
_______________________________________________ users-list mailing list users-list@monetdb.org mailto:users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org mailto:users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
participants (4)
-
Brian Hood
-
Bryan Senseman
-
Budulinku Dejmihrasku
-
Christopher Nelson