
Thank you Stefan,
So as I understand that the algebra.(left)fetchjoin() is the most optimized
way in monetdb.
I was hoping in my case to gain some precious seconds by trying to instruct
monetdb to use other algorithm,
I have a small table around 50k rows (that should fit easily in memory)
that need to be joined with a big table of 12M rows, the join results
is around 150M rows.
The join key is a char(21) column which is the first column of a primary
key defined on both tables (I think this why its reflected in the
algebra.(left)fetchjoin()).
On the join I am doing an aggregate on a combination of columns from both
tables.
The execution time is around 80s, returning 250 rows, similar queries on
different columns needs to be executed around 50 times.
I am trying to avoid to materialize the join results in a table and then
execute the aggregation on the materialized table, the reason is that it
will consume a 35GB of disk space.
Is it hard to achieve a better performance by relying only on the join?
Another performance issue i am facing is with range join on date columns,
does monetdb has a specific method/index for this kind of join?
Regards.
On Mon, Nov 2, 2015 at 7:48 PM, Stefan Manegold
Hi,
whenever algebra.(left)fetchjoin() is used, you surely do not want to use hash-join or merge-join, instead. fetch-join is MonetDB lingo for the binary match on OIDs that can exploit positional lookups because (at least) one of the arguments is a dense (ascending) (non-materialized) sequence of OIDs (aka., "Virtual OIDs" or "VOID"). This implicitly performs a (very efficient) merge whenever also the second argument is sorted, and otherwise implicitly a very efficient hash using identity as a perfect hash function (no collisions, no false positives). This operation does not come from / represent a table-join on SQL level, but rather the projection (aka. column reconstruction) required in (late-materialization) columnar query execution.
Having said that, in "real" join cases, i.e., when algebra.join() is used on value (rather than OID) columns, the actual join algorithm is chosen on-the-fly in side the algebra.join() MAL operation, and thus does not show in the MAL plan. You can see (some of) the internal algorithm decisions by starting mserver5 with the --algorithm command line switch (see the mserver5 man page for details).
There is no way to "overrule" the runtime algorithm decisions made inside MonetDB's kernel (GDK).
Best, Stefan
----- On Nov 2, 2015, at 5:37 PM, imad hajj chahine imad.hajj.chahine@gmail.com wrote:
Hi,
How can we control the algorithm used for the join operation? In the trace i am always getting algebra.leftfetchjoin, what could possibly be done to use merge join or hash join that will be reflected in the trace by algebra.join.
Thanks.
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) | _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list