21 Apr
2016
4:21 p.m.
Related to my previous question about persisting hashes, I would like to ask another one.

BATsubjoin has a series of heuristics to decide which join implementation to use. When using a hash join, the last rule says: if nothing else applied, build a hash on the smaller bat.

Could you tell me the rationale for this? From what I could verify:

- when the sizes are comparable, it doesn't really make much difference which side is hashed
- when the sizes differ a lot, building the hash table on the smaller bat is certainly much cheaper, but the join as a whole becomes 4-5 times slower than when hashing on the larger bat.

In which case is hashing on the larger bat a good option?

Cheers,
Roberto
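
For reference, the two alternatives being compared can be sketched as below. This is only a minimal Python illustration of a hash join with a selectable build side, not MonetDB's BATsubjoin code; `hash_join` and `build_on_left` are hypothetical names, and plain lists stand in for bats.

```python
from collections import defaultdict


def hash_join(left, right, build_on_left):
    """Equi-join two value lists; return sorted (left_pos, right_pos) pairs.

    build_on_left selects which input gets the hash table (hypothetical flag,
    standing in for the heuristic's choice of build side).
    """
    build, probe = (left, right) if build_on_left else (right, left)

    # Build phase: map each value to the positions where it occurs.
    table = defaultdict(list)
    for pos, value in enumerate(build):
        table[value].append(pos)

    # Probe phase: scan the other input and look up each value.
    pairs = []
    for probe_pos, value in enumerate(probe):
        for build_pos in table.get(value, ()):
            if build_on_left:
                pairs.append((build_pos, probe_pos))
            else:
                pairs.append((probe_pos, build_pos))
    return sorted(pairs)


small = [1, 2, 3]
large = [3, 1, 4, 1, 5]
on_small = hash_join(small, large, build_on_left=True)
on_large = hash_join(small, large, build_on_left=False)
```

Both calls return the same matches; only the cost profile changes. Building on one side means one insert per element of that side and one lookup per element of the other, so timing the two variants on inputs of very different sizes is one way to reproduce the comparison described above.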