Roberto,
just to recap all facts:
- MonetDB Oct2014-SP3
- equality join between 2 string-BATs
- both BATs are persistent "base" BATs
- larger BAT has 16 M BUNs
- smaller BAT has 1 BUN
- when (forcefully) building the hash on the larger one,
and then performing a single probe from the smaller one,
the first ("cold") join takes 30 ms (building the hash table),
while any next ("hot") one takes 0.8 ms (re-using the pre-built hash table).
This suggests ~29.2 ms for building the 16 M hash table
and 0.8 ms for a single probe into that hash table.
- when building the 1 BUN hash table on the smaller one,
and then performing 16 M probes from the larger one,
the (first?) ("cold"?) join takes 430 ms?
+ How long does a subsequent ("hot") join take (re-using the pre-built hash table)?
Could you run detailed profiling (e.g., using valgrind/callgrind) to analyze where
the time goes in all 4 cases (hash on larger vs. hash on smaller & "cold" vs. "hot")?
Could you share your data to reproduce and analyze the problem?
Thanks!
Stefan
----- On May 11, 2015, at 6:49 PM, Roberto Cornacchia roberto.cornacchia@gmail.com wrote:
>> Also, those 430ms are not invested. The second time will still take 430ms. So
>> hashing on a very small bat is never a good investment. On the contrary,
>> hashing on a larger (but not too much) table is a good investment. The next
>> time a similar query comes in, it will be sub-millisecond.
>
> Well, this is a trade-off that in in general hard to judge.
> If the bigger table / BAT is a base table/BAT, the hash table will (nowadays)
> be made persistent and *could* be reused --- whether it indeed will be reused,
> we cannot predict. If the bigger table is a transient intermediate result,
> re-use is unlikely ...
>
>
> That's fair.
>
>
> Having said that, is your smaller table a base table or an intermediate result
> that is (might be) a tiny slice of a large (huge) base table?
> Then current code might build the hash on the entire parent BAT rather than on
> the tiny slice ...
>
>
> They both are base tables. The tiny table is created and a single insert is
> done. The large one is also a regular table, with NOT NULL constraint on the
> join column and the entire table is marked read-only.
>
>
>
> Also: Which version of MonetDB are we talking about?
>
>
> Oct2014 SP3
>
>
> Stefan
>
> --
>| Stefan.Manegold@CWI.nl | DB Architectures (DA) |
>| www.CWI.nl/~manegold/ | Science Park 123 (L321) |
>| +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
> _______________________________________________
> developers-list mailing list
> developers-list@monetdb.org
> https://www.monetdb.org/mailman/listinfo/developers-list
>
>
> _______________________________________________
> developers-list mailing list
> developers-list@monetdb.org
> https://www.monetdb.org/mailman/listinfo/developers-list
--
| Stefan.Manegold@CWI.nl | DB Architectures (DA) |
| www.CWI.nl/~manegold/ | Science Park 123 (L321) |
| +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
_______________________________________________
developers-list mailing list
developers-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/developers-list