Why aren't hashes preserved by joins?
I noticed a difference between
- a hash-based string selection from a persistent, read-only table
- a hash-based join on the same table and same column
They both build a hash on the same string column (verified with gdb), but the select can reuse the hash (second call is almost free), while the join keeps rebuilding the hash.
Is this expected?
Roberto
I just noticed that PERSISTENTHASH (checked by BAThash) in gdk_private is
not defined (it is commented out). But this wouldn't explain the difference between
selection and join.
On 21 April 2016 at 18:02, Roberto Cornacchia wrote: [...]
Actually, the selection can reuse the hash only when mitosis is not active.
Perhaps this makes sense.
sql>set optimizer='sequential_pipe';
operation successful (0.830ms)
sql>select value from obj_string where value = 'apple' limit 1;
+-------+
| value |
+=======+
| apple |
+-------+
1 tuple (570.217ms)
sql>select value from obj_string where value = 'apple' limit 1;
+-------+
| value |
+=======+
| apple |
+-------+
1 tuple (1.991ms)
But still, this isn't happening with joins, even with sequential_pipe.
One difference I noticed is that subselect takes both the tid and the
persistent bat as direct inputs, while subjoin takes as input the result of a
leftfetchjoin between the same tid and persistent bat. Could that be the reason?
On 21 April 2016 at 18:07, Roberto Cornacchia wrote: [...]
The PERSISTENTHASH setting should not make a difference between joins and
selections, and imo hashes should be reusable during joins just as they are
during selections. I have not seen the behaviour you describe, so maybe someone
else has an explanation. If not, it might even be a bug.
Now, slices and mitosis are another tricky part. Hashes do not work
with slices, and that is an issue we need to address, maybe by
redesigning some parts (it might already have been addressed without
my knowing).
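One way to picture why a hash built on a full column does not carry over to a slice is the toy sketch below. It is not MonetDB code, and it is an assumption that position offsets are part of the actual difficulty in GDK; `build_hash` and `lookup_in_slice` are hypothetical helpers. A hash on the parent column returns positions in the parent, so answering a lookup for a slice means either building a separate hash on the slice or filtering and shifting the parent's positions.

```python
# Toy model only: build_hash and lookup_in_slice are hypothetical helpers,
# not MonetDB/GDK functions.

def build_hash(values):
    """Map each value to the list of positions where it occurs."""
    index = {}
    for pos, v in enumerate(values):
        index.setdefault(v, []).append(pos)
    return index


def lookup_in_slice(parent_hash, value, lo, hi):
    """Answer a lookup for the slice [lo, hi) using the parent's hash.

    Positions outside the slice are dropped and the rest are shifted so
    that they are relative to the start of the slice.
    """
    return [p - lo for p in parent_hash.get(value, []) if lo <= p < hi]


column = ["apple", "pear", "apple", "fig", "apple", "pear"]
parent_hash = build_hash(column)

lo, hi = 2, 5
piece = column[lo:hi]                      # the slice a partitioned plan sees

naive = parent_hash["apple"]               # positions in the full column
rebuilt = build_hash(piece)["apple"]       # what a per-slice hash returns
translated = lookup_in_slice(parent_hash, "apple", lo, hi)

print(naive, rebuilt, translated)
```

The naive lookup returns positions that are meaningless for the slice, while the translated lookup agrees with a hash rebuilt on the slice. A partitioned plan that rebuilds per slice pays the build cost on every run.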
On Fri, Apr 22, 2016 at 12:53 PM, Roberto Cornacchia wrote: [...]
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list
Could someone help me understand which conditions (if any) should hold for
hash joins to reuse previously built hashes?
(from the latest Jul2015 onward)
At the beginning of the Jul2015 branch, I had the impression this was working
well, but now I cannot reproduce it.
On 22 April 2016 at 13:09, Lefteris wrote: [...]
participants (2)
- Lefteris
- Roberto Cornacchia