
Hi Stefan,
Actually, according to my second email, 430 should be compared to 30, not
to 0.8. So the difference in speed is "only" 15x.
Still, your question remains. However, even if the two performed equally,
the current case would not invest anything..
On 11 May 2015 at 18:40, Stefan Manegold
Hi Roberto,
thanks for reporting!
This need detailed profiling to understand what's happening. Why are 16M look-ups to a 1-tuple hash table 500x more expensive than building a 16M hash table (assuming that is it indeed built on the fly, i.e., does not already exist), and then performing a single look-up ... ?
Stefan
----- On May 11, 2015, at 6:30 PM, Roberto Cornacchia roberto.cornacchia@gmail.com wrote:
Hi there,
Triggered by a suspiciously slow join, I had a look at BATsubjoin, in particular at how it is decided which algorithm to use.
I looked at a couple of cases with a join on strings. This is one (pseudo syntax):
l.count ~16M l.T.sorted = 0 l.T.revsorted = 0
r.count = 1 r.T.sorted = 1 r.T.revsorted = 1
In this case, a hash table on r is created (because it's smaller). This join takes 430ms . I forced swapping l and r, thus built the hash table on the larger bat, and then it takes 0.8ms .
Notice that though r.count = 1 is a bit of an extreme case, it does happen often in practice. And the case with a few tuples would not be very different.
The question I wanted to ask is:
is it always right to build the hash table on the small table? Perhaps some more heuristics can help? Like: if the smaller is very small, and the larger is not extremely large, then hash on the larger one?
Roberto
_______________________________________________ developers-list mailing list developers-list@monetdb.org https://www.monetdb.org/mailman/listinfo/developers-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) | _______________________________________________ developers-list mailing list developers-list@monetdb.org https://www.monetdb.org/mailman/listinfo/developers-list