Are joins really this expensive?
I have created a traditional star-schema in MonetDB, using both Linux and Windows. We are using Mondrian OLAP on top so it generates the standard OLAP-y type of query using group-bys of dimensional values. What I’m finding is that doing joins in MonetDB is extremely expensive, for example: The following query against a fact table with 261 million rows returns in 9-11seconds consistently select date_hour_id/10000 as date_month_id, avg(seconds) from fact_utilization_hourly group by date_month_id; returning 41 rows, the query includes a “Trick” to get to a month_year value because the time_ids are smart keys… yyyyMMddhh. With the following query, using standard OLAP-y syntax, the query takes over 5 mins consistently. select year_month_value, avg(seconds) from fact_utilization_hourly f join dim_date_hour d on f.date_hour_id = d.date_hour_id group by year_month_value; returning the same 41 rows. I’m at a loss to figure out why this join causes a 3200% percent increase in response time. And this seems to have gotten worse in SP5 from anecdotal evidence. Bryan -- ------------------------------ This transmission is confidential and intended solely for the use of the recipient named above. It may contain confidential, proprietary, or legally privileged information. If you are not the intended recipient, you are hereby notified that any unauthorized review, use, disclosure or distribution is strictly prohibited. If you have received this transmission in error, please contact the sender by reply e-mail and delete the original transmission and all copies from your system.
Dear Bryan,
unless you could share your data such that we could reproduce the problem,
could you please profile both queries using TRACE, stethoscope or tomograph;
cf., http://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/QueryTiming
and share the respective output?
Then we might be able analyze what exactly causes the performance difference between both queries.
Thanks,
Stefan
Bryan Senseman
I have created a traditional star-schema in MonetDB, using both Linux and Windows. We are using Mondrian OLAP on top so it generates the standard OLAP-y type of query using group-bys of dimensional values. What I’m finding is that doing joins in MonetDB is extremely expensive, for example:
The following query against a fact table with 261 million rows returns in 9-11seconds consistently
select date_hour_id/10000 as date_month_id, avg(seconds) from fact_utilization_hourly group by date_month_id;
returning 41 rows, the query includes a “Trick” to get to a month_year value because the time_ids are smart keys… yyyyMMddhh.
With the following query, using standard OLAP-y syntax, the query takes over 5 mins consistently.
select year_month_value, avg(seconds) from fact_utilization_hourly f join dim_date_hour d on f.date_hour_id = d.date_hour_id group by year_month_value;
returning the same 41 rows.
I’m at a loss to figure out why this join causes a 3200% percent increase in response time. And this seems to have gotten worse in SP5 from anecdotal evidence.
Bryan
--
------------------------------
This transmission is confidential and intended solely for the use of the recipient named above. It may contain confidential, proprietary, or legally privileged information. If you are not the intended recipient, you are hereby notified that any unauthorized review, use, disclosure or distribution is strictly prohibited. If you have received this transmission in error, please contact the sender by reply e-mail and delete the original transmission and all copies from your system.
------------------------------------------------------------------------
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | Database Architectures (DA) | | www.CWI.nl/~manegold | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
Dear Bryan Assuming your tables contains smart *unique* keys, i.e. uses a PRIMARY KEY constraint, then i would not expect a significant slow down, unless your system is overloaded already by other processes. To further aid, see http://www.monetdb.org/Developers/ReportingABug On 10/19/13 9:17 PM, Stefan Manegold wrote:
Dear Bryan,
unless you could share your data such that we could reproduce the problem, could you please profile both queries using TRACE, stethoscope or tomograph; cf., http://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/QueryTiming and share the respective output?
Then we might be able analyze what exactly causes the performance difference between both queries.
Thanks, Stefan
Bryan Senseman
wrote: I have created a traditional star-schema in MonetDB, using both Linux and Windows. We are using Mondrian OLAP on top so it generates the standard OLAP-y type of query using group-bys of dimensional values. What I'm finding is that doing joins in MonetDB is extremely expensive, for example:
The following query against a fact table with 261 million rows returns in 9-11seconds consistently
select date_hour_id/10000 as date_month_id, avg(seconds) from fact_utilization_hourly group by date_month_id;
returning 41 rows, the query includes a "Trick" to get to a month_year value because the time_ids are smart keys... yyyyMMddhh.
With the following query, using standard OLAP-y syntax, the query takes over 5 mins consistently.
select year_month_value, avg(seconds) from fact_utilization_hourly f join dim_date_hour d on f.date_hour_id = d.date_hour_id group by year_month_value;
returning the same 41 rows.
I'm at a loss to figure out why this join causes a 3200% percent increase in response time. And this seems to have gotten worse in SP5 from anecdotal evidence.
Bryan
------------------------------------------------------------------------
This transmission is confidential and intended solely for the use of the recipient named above. It may contain confidential, proprietary, or legally privileged information. If you are not the intended recipient, you are hereby notified that any unauthorized review, use, disclosure or distribution is strictly prohibited. If you have received this transmission in error, please contact the sender by reply e-mail and delete the original transmission and all copies from your system.
------------------------------------------------------------------------
users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | Database Architectures (DA) | | www.CWI.nl/~manegold http://www.CWI.nl/%7Emanegold | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Three onther questions: Your email is not completely clear about which version of MonetDB you're using. The last sentence suggests you're using Feb2013 SP5. Is this correct? Did you install a pre-build binary installtion (if so, which), or did you build MonetDB yourself (if so, from Mercurial source or from tarball, and using which configure options)? And on what kind of platform are you running on (hardware, OS)? Thanks! Stefan ----- Original Message -----
Dear Bryan,
unless you could share your data such that we could reproduce the problem, could you please profile both queries using TRACE, stethoscope or tomograph; cf., http://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/QueryTiming and share the respective output?
Then we might be able analyze what exactly causes the performance difference between both queries.
Thanks, Stefan
Bryan Senseman
wrote: I have created a traditional star-schema in MonetDB, using both Linux and Windows. We are using Mondrian OLAP on top so it generates the standard OLAP-y type of query using group-bys of dimensional values. What I’m finding is that doing joins in MonetDB is extremely expensive, for example:
The following query against a fact table with 261 million rows returns in 9-11seconds consistently
select date_hour_id/10000 as date_month_id, avg(seconds) from fact_utilization_hourly group by date_month_id;
returning 41 rows, the query includes a “Trick” to get to a month_year value because the time_ids are smart keys… yyyyMMddhh.
With the following query, using standard OLAP-y syntax, the query takes over 5 mins consistently.
select year_month_value, avg(seconds) from fact_utilization_hourly f join dim_date_hour d on f.date_hour_id = d.date_hour_id group by year_month_value;
returning the same 41 rows.
I’m at a loss to figure out why this join causes a 3200% percent increase in response time. And this seems to have gotten worse in SP5 from anecdotal evidence.
Bryan
--
------------------------------
This transmission is confidential and intended solely for the use of the recipient named above. It may contain confidential, proprietary, or legally privileged information. If you are not the intended recipient, you are hereby notified that any unauthorized review, use, disclosure or distribution is strictly prohibited. If you have received this transmission in error, please contact the sender by reply e-mail and delete the original transmission and all copies from your system.
------------------------------------------------------------------------
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | Database Architectures (DA) | | www.CWI.nl/~manegold | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
participants (3)
-
Bryan Senseman
-
Martin Kersten
-
Stefan Manegold