Re: Are joins really this expensive?
Forwarding to user-list as this seems to have gotten lost
Actually, I'm running a few different versions of MonetDB on different
environments. July2012SP2, Feb2013SP4 & SP5. July2012SP2 on Windows and
Linux (Ubuntu); Feb2013SP5 on Windows and Linux, and Feb2013SP4 on Linux.
All were installed using pre-built packages/installers. The trace files
themselves are from a Linux Feb2013SP4 installation. The Feb2013SP5
install was an inplace install on top of July2012SP2 and wasn't sure if
different "data" versions would cloud the trace.
To Martins questions (which I never got in my inbox...or junk) yes,
date_hour_id is a unique key for the dim_date_hour table, but obviously
duplicated within the fact_utilization_hourly table.
My question was more generalized, but if a specific solution can be found,
fantastic. In general I am seeing joins drastically reduce performance
overall.
All are running on EC2; m1.xlarge with High I/O
Thanx,
Bryan
On Mon, Oct 28, 2013 at 9:10 AM, Bryan Senseman
Forwarding to user-list as this seems to have gotten lost
Actually, I'm running a few different versions of MonetDB on different environments. July2012SP2, Feb2013SP4 & SP5. July2012SP2 on Windows and Linux (Ubuntu); Feb2013SP5 on Windows and Linux, and Feb2013SP4 on Linux. All were installed using pre-built packages/installers. The trace files themselves are from a Linux Feb2013SP4 installation. The Feb2013SP5 install was an inplace install on top of July2012SP2 and wasn't sure if different "data" versions would cloud the trace.
To Martins questions (which I never got in my inbox...or junk) yes, date_hour_id is a unique key for the dim_date_hour table, but obviously duplicated within the fact_utilization_hourly table.
My question was more generalized, but if a specific solution can be found, fantastic. In general I am seeing joins drastically reduce performance overall.
All are running on EC2; m1.xlarge with High I/O
Thanx, Bryan
---------- Forwarded message ---------- From: Bryan Senseman
Date: Sat, Oct 19, 2013 at 8:51 PM Subject: Re: Are joins really this expensive? To: Stefan Manegold Ask, and ye shall receive, let me know if you need anything else; and I can't share the data, nor would you want a many GB data bomb.
Bryan
On Sat, Oct 19, 2013 at 3:17 PM, Stefan Manegold
wrote: Dear Bryan,
unless you could share your data such that we could reproduce the problem, could you please profile both queries using TRACE, stethoscope or tomograph; cf., http://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/QueryTiming and share the respective output?
Then we might be able analyze what exactly causes the performance difference between both queries.
Thanks, Stefan
Bryan Senseman
wrote: I have created a traditional star-schema in MonetDB, using both Linux and Windows. We are using Mondrian OLAP on top so it generates the standard OLAP-y type of query using group-bys of dimensional values. What I’m finding is that doing joins in MonetDB is extremely expensive, for example:
The following query against a fact table with 261 million rows returns in 9-11seconds consistently
select date_hour_id/10000 as date_month_id, avg(seconds) from fact_utilization_hourly group by date_month_id;
returning 41 rows, the query includes a “Trick” to get to a month_year value because the time_ids are smart keys… yyyyMMddhh.
With the following query, using standard OLAP-y syntax, the query takes over 5 mins consistently.
select year_month_value, avg(seconds) from fact_utilization_hourly f join dim_date_hour d on f.date_hour_id = d.date_hour_id group by year_month_value;
returning the same 41 rows.
I’m at a loss to figure out why this join causes a 3200% percent increase in response time. And this seems to have gotten worse in SP5 from anecdotal evidence.
Bryan
------------------------------
This transmission is confidential and intended solely for the use of the recipient named above. It may contain confidential, proprietary, or legally privileged information. If you are not the intended recipient, you are hereby notified that any unauthorized review, use, disclosure or distribution is strictly prohibited. If you have received this transmission in error, please contact the sender by reply e-mail and delete the original transmission and all copies from your system.
------------------------------
users-list mailing list users-list@monetdb.org
-- | Stefan.Manegold@CWI.nl | Database Architectures (DA) | | www.CWI.nl/~manegold | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
-- ------------------------------ This transmission is confidential and intended solely for the use of the recipient named above. It may contain confidential, proprietary, or legally privileged information. If you are not the intended recipient, you are hereby notified that any unauthorized review, use, disclosure or distribution is strictly prohibited. If you have received this transmission in error, please contact the sender by reply e-mail and delete the original transmission and all copies from your system.
participants (1)
-
Bryan Senseman