Following up on my previous question about persisting hashes, I would like to ask another one.

BATsubjoin has a series of heuristics to decide which join implementation to use. When using hash-join, the last rule says: if nothing else applies, build a hash on the smaller bat.

Could you tell me what the rationale for this is?

From what I could verify:
- when sizes are comparable: it doesn't really make much difference which side is hashed
- when sizes differ a lot: sure, building the hash table on the smaller bat is much cheaper, but the join as a whole becomes 4-5 times slower than when hashing on the larger bat.
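
To make clear what I mean by "hashing on" one side or the other, here is a minimal sketch of the two variants in Python. It is only a toy illustration, not the BATsubjoin code: the function hash_join and the flag build_on_left are made-up names, and plain lists stand in for bats. Both variants return the same matches; only the side that gets the hash table (and therefore the side that is scanned to probe it) changes.

```python
from collections import defaultdict

def hash_join(left, right, build_on_left):
    """Toy equi-join returning sorted (left_pos, right_pos) pairs.

    Hypothetical helper for illustration only, not MonetDB code.
    """
    build, probe = (left, right) if build_on_left else (right, left)

    # Build phase: one hash insert per value on the build side.
    table = defaultdict(list)
    for pos, value in enumerate(build):
        table[value].append(pos)

    # Probe phase: one hash lookup per value on the probe side.
    matches = []
    for probe_pos, value in enumerate(probe):
        for build_pos in table.get(value, ()):
            if build_on_left:
                matches.append((build_pos, probe_pos))
            else:
                matches.append((probe_pos, build_pos))
    return sorted(matches)

small = [3, 7, 7]
large = list(range(10))

on_small = hash_join(small, large, build_on_left=True)   # hash the smaller side
on_large = hash_join(small, large, build_on_left=False)  # hash the larger side
assert on_small == on_large
```

The sketch says nothing about speed; it only pins down the two alternatives I compared in the timings above.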

In which case is hashing on the larger bat a good option?

Cheers,
Roberto