Thank you Martin, I will try this solution, but I know that monetdb uses dictionary compression for strings, isn't similar for what you have suggested? 

On Mon, Nov 2, 2015 at 10:43 PM, Martin Kersten <martin.kersten@cwi.nl> wrote:
Hi

A standard 'trick' in many research papers involving joins over huge amounts of unique strings, is to
replace them by a short integer key (e.g. a hash value). This brings down your storage footprint
significantly and also many of the algorithms become based on integer comparisons instead of strings.

The strings can always be recovered in the final stage (e.g. your 250 rows)

MonetDB uses optimized code paths for range joins.

regards, Martin


On 02/11/15 19:44, imad hajj chahine wrote:
Thank you Stefan,

So as I understand that the algebra.(left)fetchjoin() is the most optimized way in monetdb.

I was hoping in my case to gain some precious seconds by trying to instruct monetdb to use other algorithm,
I have a small table around 50k rows (that should fit easily in memory) that need to be joined with a big table of 12M rows, the join results is around 150M rows.
The join key is a char(21) column which is the first column of a primary key defined on both tables (I think this why its reflected in the algebra.(left)fetchjoin()).
On the join I am doing an aggregate on a combination of columns from both tables.

The execution time is around 80s, returning 250 rows, similar queries on different columns needs to be executed around 50 times.
I am trying to avoid to materialize the join results in a table and then execute the aggregation on the materialized table, the reason is that it will consume a 35GB of disk space.

Is it hard to achieve a better performance by relying only on the join?

Another performance issue i am facing is with range join on date columns, does monetdb has a specific method/index for this kind of join?

Regards.



On Mon, Nov 2, 2015 at 7:48 PM, Stefan Manegold <Stefan.Manegold@cwi.nl <mailto:Stefan.Manegold@cwi.nl>> wrote:


    Hi,

    whenever algebra.(left)fetchjoin() is used, you surely do not want to use hash-join or merge-join, instead.
    fetch-join is MonetDB lingo for the binary match on OIDs that can exploit positional lookups because (at least)
    one of the arguments is a dense (ascending) (non-materialized) sequence of OIDs (aka., "Virtual OIDs" or "VOID").
    This implicitly performs a (very efficient) merge whenever also the second argument is sorted,
    and otherwise implicitly a very efficient hash using identity as a perfect hash function (no collisions, no false positives).
    This operation does not come from / represent a table-join on SQL level, but rather the projection (aka.
    column reconstruction) required in (late-materialization) columnar query execution.

    Having said that, in "real" join cases, i.e., when algebra.join() is used on value (rather than OID) columns,
    the actual join algorithm is chosen on-the-fly in side the algebra.join() MAL operation, and thus does not
    show in the MAL plan.
    You can see (some of) the internal algorithm decisions by starting mserver5 with the --algorithm command line switch
    (see the mserver5 man page for details).

    There is no way to "overrule" the runtime algorithm decisions made inside MonetDB's kernel (GDK).

    Best,
    Stefan

    ----- On Nov 2, 2015, at 5:37 PM, imad hajj chahine imad.hajj.chahine@gmail.com <mailto:imad.hajj.chahine@gmail.com> wrote:

     > Hi,
     >
     > How can we control the algorithm used for the join operation? In the trace i am
     > always getting algebra.leftfetchjoin, what could possibly be done to use merge
     > join or hash join that will be reflected in the trace by algebra.join.
     >
     > Thanks.
     >
     > _______________________________________________
     > users-list mailing list
     > users-list@monetdb.org <mailto:users-list@monetdb.org>
     > https://www.monetdb.org/mailman/listinfo/users-list

    --
    | Stefan.Manegold@CWI.nl | DB Architectures   (DA) |
    | www.CWI.nl/~manegold/ <http://www.CWI.nl/~manegold/>  | Science Park 123 (L321) |
    | +31 (0)20 592-4212 <tel:%2B31%20%280%2920%20592-4212>     | 1098 XG Amsterdam  (NL) |
    _______________________________________________
    users-list mailing list
    users-list@monetdb.org <mailto:users-list@monetdb.org>
    https://www.monetdb.org/mailman/listinfo/users-list




_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list


_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list