Also, those 430 ms are not an investment: the second time the query runs, it will again take 430 ms. So hashing on a very small BAT is never a good investment. Hashing on a larger (but not too large) table, on the contrary, is a good investment: the next time a similar query comes in, it will be sub-millisecond.
Well, this is a trade-off that is in general hard to judge. If the bigger table / BAT is a base table / BAT, the hash table will (nowadays) be made persistent and *could* be reused --- whether it indeed will be reused, we cannot predict. If the bigger table is a transient intermediate result, re-use is unlikely ...
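(As a side note, one way to get a hint of whether a hash is being kept for a base column is to look at what the storage report says for it. The sketch below assumes the sys.storage view and its "hashes" column, whose exact shape may differ between releases; the table and column names are made up:)

    -- Hedged sketch: check whether MonetDB reports hash storage for the join
    -- column of the big base table. sys.storage and its "hashes" column are
    -- assumptions; 'big' and 'k' are placeholder names.
    SELECT "schema", "table", "column", "count", "hashes"
    FROM sys.storage
    WHERE "table" = 'big'
      AND "column" = 'k';
    -- A non-zero "hashes" value suggests a hash is kept for that column, so a
    -- repeated hash join that uses it should not pay the build cost again.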
That's fair.
Having said that, is your smaller table a base table, or an intermediate result that is (or might be) a tiny slice of a large (huge) base table? In the latter case, the current code might build the hash on the entire parent BAT rather than on the tiny slice ...
They are both base tables. The tiny table is created and a single insert is done. The large one is also a regular table, with a NOT NULL constraint on the join column, and the entire table is marked read-only.
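(For concreteness, a minimal sketch of the setup as I understand it; all table and column names below are made up, and the DDL is only meant to mirror the description above:)

    -- Hedged sketch of the setup described above; names are hypothetical.
    CREATE TABLE tiny (k INT);
    INSERT INTO tiny VALUES (42);            -- single insert into the tiny table

    CREATE TABLE big (k INT NOT NULL, payload VARCHAR(100));
    -- ... bulk load big here ...
    ALTER TABLE big SET READ ONLY;           -- the large table is marked read-only

    -- The join in question: the planner may build the hash on the tiny side,
    -- which is cheap per run but (per the discussion) never reused, instead of
    -- (re)using a persistent hash on big.k.
    SELECT COUNT(*)
    FROM big JOIN tiny ON big.k = tiny.k;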
Also: Which version of MonetDB are we talking about?
Oct2014 SP3
Stefan