Re: Are joins really this expensive?

30 Oct 2013

      Forwarding to user-list as this seems to have gotten lost

Actually, I'm running a few different versions of MonetDB on different
environments.  July2012SP2, Feb2013SP4 & SP5.  July2012SP2 on Windows and
Linux (Ubuntu); Feb2013SP5 on Windows and Linux, and Feb2013SP4 on Linux.
 All were installed using pre-built packages/installers.  The trace files
themselves are from a Linux Feb2013SP4 installation.  The Feb2013SP5
install was an inplace install on top of July2012SP2 and wasn't sure if
different "data" versions would cloud the trace.

To Martins questions (which I never got in my inbox...or junk) yes,
date_hour_id is a unique key for the dim_date_hour table, but obviously
duplicated within the fact_utilization_hourly table.

My question was more generalized, but if a specific solution can be found,
fantastic.  In general I am seeing joins drastically reduce performance
overall.

All are running on EC2; m1.xlarge with High I/O

Thanx,
Bryan

On Mon, Oct 28, 2013 at 9:10 AM, Bryan Senseman  wrote:
...
Forwarding to user-list as this seems to have gotten lost
Actually, I'm running a few different versions of MonetDB on different
environments.  July2012SP2, Feb2013SP4 & SP5.  July2012SP2 on Windows and
Linux (Ubuntu); Feb2013SP5 on Windows and Linux, and Feb2013SP4 on Linux.
 All were installed using pre-built packages/installers.  The trace files
themselves are from a Linux Feb2013SP4 installation.  The Feb2013SP5
install was an inplace install on top of July2012SP2 and wasn't sure if
different "data" versions would cloud the trace.
To Martins questions (which I never got in my inbox...or junk) yes,
date_hour_id is a unique key for the dim_date_hour table, but obviously
duplicated within the fact_utilization_hourly table.
My question was more generalized, but if a specific solution can be found,
fantastic.  In general I am seeing joins drastically reduce performance
overall.
All are running on EC2; m1.xlarge with High I/O
Thanx,
Bryan
---------- Forwarded message ----------
From: Bryan Senseman 
Date: Sat, Oct 19, 2013 at 8:51 PM
Subject: Re: Are joins really this expensive?
To: Stefan Manegold 
Ask, and ye shall receive, let me know if you need anything else; and I
can't share the data, nor would you want a many GB data bomb.
Bryan
On Sat, Oct 19, 2013 at 3:17 PM, Stefan Manegold wrote:
...
Dear Bryan,
unless you could share your data such that we could reproduce the problem,
could you please profile both queries using TRACE, stethoscope or
tomograph;
cf.,
http://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/QueryTiming
and share the respective output?
Then we might be able analyze what exactly causes the performance
difference between both queries.
Thanks,
Stefan
Bryan Senseman  wrote:
...
I have created a traditional star-schema in MonetDB, using both Linux
and Windows.  We are using Mondrian OLAP on top so it generates the
standard OLAP-y type of query using group-bys of dimensional values.  What
I’m finding is that doing joins in MonetDB is extremely expensive, for
example:
The following query against a fact table with 261 million rows returns
in 9-11seconds consistently
select date_hour_id/10000 as date_month_id, avg(seconds) from
fact_utilization_hourly group by date_month_id;
returning 41 rows, the query includes a “Trick” to get to a month_year
value because the  time_ids are smart keys… yyyyMMddhh.
With the following query, using standard OLAP-y syntax, the query takes
over 5 mins consistently.
select year_month_value, avg(seconds) from fact_utilization_hourly f
join dim_date_hour d on f.date_hour_id = d.date_hour_id group by
year_month_value;
returning the same 41 rows.
I’m at a loss to figure out why this join causes a 3200% percent
increase in response time.  And this seems to have gotten worse in SP5 from
anecdotal evidence.
Bryan
------------------------------
This transmission is confidential and intended solely for the use of the
recipient named above. It may contain confidential, proprietary, or legally
privileged information. If you are not the intended recipient, you are
hereby notified that any unauthorized review, use, disclosure or
distribution is strictly prohibited. If you have received this transmission
in error, please contact the sender by reply e-mail and delete the original
transmission and all copies from your system.
------------------------------
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list
--
| Stefan.Manegold@CWI.nl | Database Architectures (DA) |
| www.CWI.nl/~manegold | Science Park 123 (L321) |
| +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
-- 

------------------------------

This transmission is confidential and intended solely for the use of the 
recipient named above. It may contain confidential, proprietary, or legally 
privileged information. If you are not the intended recipient, you are 
hereby notified that any unauthorized review, use, disclosure or 
distribution is strictly prohibited. If you have received this transmission 
in error, please contact the sender by reply e-mail and delete the original 
transmission and all copies from your system.

Bryan Senseman

tags

participants (1)