Are joins really this expensive?

19 Oct 2013

      I have created a traditional star-schema in MonetDB, using both Linux and
Windows.  We are using Mondrian OLAP on top so it generates the standard
OLAP-y type of query using group-bys of dimensional values.  What I’m
finding is that doing joins in MonetDB is extremely expensive, for example:

The following query against a fact table with 261 million rows returns in
9-11seconds consistently

select date_hour_id/10000 as date_month_id, avg(seconds) from
fact_utilization_hourly group by date_month_id;

returning 41 rows, the query includes a “Trick” to get to a month_year
value because the  time_ids are smart keys… yyyyMMddhh.

With the following query, using standard OLAP-y syntax, the query takes
over 5 mins consistently.

select year_month_value, avg(seconds) from fact_utilization_hourly f join
dim_date_hour d on f.date_hour_id = d.date_hour_id group by
year_month_value;

returning the same 41 rows.

I’m at a loss to figure out why this join causes a 3200% percent increase
in response time.  And this seems to have gotten worse in SP5 from
anecdotal evidence.

Bryan

-- 

------------------------------

This transmission is confidential and intended solely for the use of the 
recipient named above. It may contain confidential, proprietary, or legally 
privileged information. If you are not the intended recipient, you are 
hereby notified that any unauthorized review, use, disclosure or 
distribution is strictly prohibited. If you have received this transmission 
in error, please contact the sender by reply e-mail and delete the original 
transmission and all copies from your system.

Bryan Senseman

Stefan Manegold

Martin Kersten

Stefan Manegold

tags

participants (3)