[Monetdb-developers] hashjoin and strHash
Dear developers,

I would like to propose a change in GDK and hear opinions. It is about the following issue:

In the BATjoin code, if there is no possibility to do a fetch or merge join, a hashjoin is performed, and a hashtable is created for the smallest BAT. The reasons (that I could think of) for choosing the smallest BAT for the hashtable are that less space is required for the hashtable (which in turn causes fewer cache misses when doing a lookup), and that the hash function used is assumed to be very inexpensive (it needs to be calculated for each item in the large BAT each time a join is performed). I can see that the hash function can be very efficient for data types without indirection, but for data types like strings the situation is in some cases a little different. If a string BAT contains many different values (i.e. it is not a BAT containing enumeration values), the hash function is not inexpensive anymore (many cache misses), as each call needs to hash a whole, arbitrarily long string at an arbitrary place in the heap.

Would it be possible to specify that, when a BAT of type 'str' has many different values, a hashtable may be built on the large BAT instead of on the small BAT?

The reason I ask: I was analysing the cost of a query in which I had a few short strings (26 tuples, 1-column table: varchar) which I wanted to look up in a dictionary (9M tuples, 2-column table: int,varchar):

"SELECT a.id FROM longlist AS a JOIN smalllist AS b ON a.strvalue=b.strvalue;"

The result is a small list of integers (26 or fewer tuples). This operation currently takes roughly 1.5 seconds for a hot run, mostly due to 9M strHash operations. By applying the patch below, the execution time for a hot run dropped to 0.01 seconds. The gain comes from only having to perform strHash on the items in the small BAT once the hashtable for the large BAT has been created.

Any suggestions whether such a change is useful? Which benchmarks will be influenced? I guess this code change is probably not useful for large string BATs with only a few different values, but perhaps a guess could be made as to how diverse the strings in a BAT are (by taking a sample, or perhaps simply by looking at the ratio batsize/heapsize), and based on that we could determine whether to build the hashtable on the large or the small BAT.

Greetings,
Wouter

Index: src/gdk/gdk_relop.mx
===================================================================
RCS file: /cvsroot/monetdb/MonetDB/src/gdk/gdk_relop.mx,v
retrieving revision 1.167.2.4
diff -u -r1.167.2.4 gdk_relop.mx
--- src/gdk/gdk_relop.mx	20 Nov 2009 13:04:06 -0000	1.167.2.4
+++ src/gdk/gdk_relop.mx	18 Dec 2009 14:59:13 -0000
@@ -1232,7 +1232,12 @@
 @- hash join: the bread&butter join of monet
 @c
-	/* Simple rule, always build hash on the smallest */
+	/* Simple rule, always build hash on the smallest,
+	   except when it is a string-join, then we do the opposite */
+	if (swap && rcount < lcount && l->ttype == TYPE_str) {
+		ALGODEBUG THRprintf(GDKout, "#BATjoin: BATmirror(BAThashjoin(BATmirror(r), BATmirror(l)," BUNFMT "));\n", estimate);
+		return BATmirror(BAThashjoin(BATmirror(r), BATmirror(l), estimate));
+	}
 	if (swap && rcount > lcount) {
 		ALGODEBUG THRprintf(GDKout, "#BATjoin: BATmirror(BAThashjoin(BATmirror(r), BATmirror(l)," BUNFMT "));\n", estimate);
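The accounting behind the observed speedup can be sketched outside GDK. This is an illustrative toy (names like CountingHasher and hash_join are hypothetical, not GDK code): for a single cold join, both build sides hash |small| + |large| values, but once the table is kept on the large side, a rerun only hashes the small side again.

```python
class CountingHasher:
    """Wraps hash() so we can count how often values get hashed."""
    def __init__(self):
        self.calls = 0
    def __call__(self, value):
        self.calls += 1
        return hash(value)

def hash_join(build_side, probe_side, hasher, prebuilt=None):
    """Join two lists of strings; reuse a prebuilt table if given."""
    table = prebuilt
    if table is None:
        table = {}
        for s in build_side:                 # one hash call per build tuple
            table.setdefault(hasher(s), []).append(s)
    matches = []
    for s in probe_side:                     # one hash call per probe tuple
        matches += [m for m in table.get(hasher(s), []) if m == s]
    return matches, table

big = [f"str{i}" for i in range(9000)]       # stand-in for the 9M dictionary
small = [f"str{i}" for i in range(26)]       # stand-in for the 26 lookups

# Single cold join: either build side hashes all 9026 values.
h = CountingHasher()
hash_join(small, big, h)
assert h.calls == len(small) + len(big)

# Hot rerun with the table kept on the big side: only the 26 small
# strings are hashed again -- the effect behind 1.5 s -> 0.01 s.
h = CountingHasher()
_, kept = hash_join(big, small, h)
h.calls = 0
hash_join(big, small, h, prebuilt=kept)
assert h.calls == len(small)
```

The point of the sketch: hashing cost is symmetric for one cold join, so the win only appears when the large-side table survives to the next join.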
Hi Wouter,
funny thing, I had exactly the same problem and we were thinking about
this issue. The point is that this observation for strings may not
always be true, and it is a situation that cannot always be determined
at the kernel level. Correct me if I am wrong, but the benefit in your
query comes because the hash on the large BAT is already there; that's
why the second time you get 0.01? You mention a hot run, so I assume
the BAT is already there with a hash index, while in the original
situation the hash is on the small BAT, so you don't benefit from the
hot run. But whether a big BAT of strings will be used again is
unknown at the GDK level. So I solved the problem by forcing the hash
index on the big BAT at a higher level (in Monet5), where it knows
something more about the application (in my case an RDF store). Could
you instead do that: force the hash index at a higher level for your
application? If GDK sees a hash index already there, it will choose it
independent of the size.
lefteris
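Lefteris's workaround can be sketched as follows. This is a hypothetical model, not the Monet5 API: a layer above the kernel force-builds the hash on the big column once, and the kernel-level join prefers any existing index regardless of which side is smaller, falling back to the build-on-smallest default otherwise.

```python
class Column:
    """Toy stand-in for a BAT's tail column with an optional hash index."""
    def __init__(self, values):
        self.values = values
        self.hash_index = None            # persisted index, None until built

    def ensure_hash(self):
        """Higher-level call: force-build the hash index once."""
        if self.hash_index is None:
            self.hash_index = {}
            for pos, v in enumerate(self.values):
                self.hash_index.setdefault(v, []).append(pos)
        return self.hash_index

def join(left, right):
    """Kernel-style rule: reuse an already-present index if any side has
    one; otherwise build on the smaller side (the default heuristic)."""
    if left.hash_index is not None:
        build, probe, flip = left, right, False
    elif right.hash_index is not None:
        build, probe, flip = right, left, True
    elif len(left.values) <= len(right.values):
        build, probe, flip = left, right, False
    else:
        build, probe, flip = right, left, True
    idx = build.ensure_hash()
    pairs = []
    for ppos, v in enumerate(probe.values):
        for bpos in idx.get(v, []):
            pairs.append((bpos, ppos) if not flip else (ppos, bpos))
    return pairs

dictionary = Column([f"w{i}" for i in range(1000)])
needles = Column(["w3", "w999", "zzz"])
dictionary.ensure_hash()        # application-level decision, made once
pairs = join(dictionary, needles)
assert pairs == [(3, 0), (999, 1)]
```

The design point: the reuse decision lives where the workload is known (the application), while the kernel stays a simple opportunistic consumer of whatever index exists.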
On Fri, Dec 18, 2009 at 4:22 PM, Wouter Alink wrote:
[quoted text elided]
Hi Wouter,

along the lines of Lefteris' reply: for a single join with no hash table present a priori, the number of hash function calls is equal to the number of BUNs in both BATs: for each inner BUN the function needs to be called to build the hash table, and for each outer BUN it needs to be called to probe the hash table. Hence, the pure hashing costs are independent of which BAT is inner and which outer. Given that, the reason to choose the smaller BAT as inner is indeed to increase the spatial locality (and thus performance) of the inherently random access while building and probing the hashtable.

As Lefteris pointed out, the "operational optimization" in GDK is pure peephole optimization, dealing only with the very operation at hand. I.e., in general it cannot anticipate the benefits of future re-use of efforts, like investing in the (more expensive) building of a larger hash table to be able to re-use it in several later operations --- which IMHO is independent of the data type. Such decisions need to be made at higher levels, either in MAL optimizers or in the front-end that generates the MAL plan.

Stefan

On Fri, Dec 18, 2009 at 05:01:07PM +0100, Lefteris wrote:
[quoted text elided]
--
| Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl |
| CWI, P.O.Box 94079  | http://www.cwi.nl/~manegold/  |
| 1090 GB Amsterdam   | Tel.: +31 (20) 592-4212       |
| The Netherlands     | Fax : +31 (20) 592-4199       |
Lefteris, you are correct: I meant 'second time the query was run'
when I wrote 'hot run'.

I see that reuse cannot be estimated at the GDK level. Still, given
that current hardware has an abundance of memory, and that strings
take up much more storage than a single BUN (so a hash entry is
usually relatively small compared to its data), GDK might weigh the
additional cost. GDK also decides which things to keep in memory or
throw out, which in turn is also based on reuse.

The cost of performing the initial join is dominated by the strHash
function, and building the hashtable on the big BAT or the smaller BAT
makes (almost) no difference, except for the additional memory use. If
a join is later performed on such a big BAT again, it will be
beneficial to have the hashtable in place.

What I was hoping for were explanations of situations where it makes
no sense to build the hashtable on the bigger string BAT, but I
haven't seen a good counter-example. In general, I can see, it would
not be beneficial if the big BAT is not joined twice, but if it
doesn't hurt too much, couldn't it just be the default?

Eventually I would like to be using the SQL layer only. There would be
plenty of tables with string columns there, and some will be joined
against. Should a MAL optimizer detect that I am about to join two
string BATs, and that one BAT is bigger than the other and has many
different values, and therefore build a hashtable on the bigger one?
The MAL optimizer can only guess about my next query (although I agree
that it could do a better job at guessing), and calculating
heapsize/batsize seems to be an operation that is also difficult to do
at the MAL layer.

Is really nobody in favour of changing the behaviour of joining string
BATs for large BATs with many different values? Well, then I give up.
Wouter
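The sampling heuristic suggested earlier in the thread can be sketched like this. Everything here is hypothetical (function names, the 0.9 threshold, the sample size): estimate how diverse the larger string column is from a small sample, and only build the hash on the larger side when the values look mostly distinct, i.e. dictionary-like rather than enumeration-like.

```python
import random

def looks_diverse(strings, sample_size=100, threshold=0.9):
    """Sample the column; treat it as 'diverse' when nearly every
    sampled value is distinct (so it is not an enumeration column)."""
    sample = random.sample(strings, min(sample_size, len(strings)))
    return len(set(sample)) / len(sample) >= threshold

def choose_build_side(small, large):
    """Return which side to hash: the large side only pays off when
    its strings are mostly distinct and the table may be probed again."""
    return "large" if looks_diverse(large) else "small"

enum_col = ["red", "green", "blue"] * 3000      # few distinct values
dict_col = [f"word{i}" for i in range(9000)]    # all distinct

assert choose_build_side(["x"], enum_col) == "small"
assert choose_build_side(["x"], dict_col) == "large"
```

In GDK terms the same signal could presumably come cheaper from the heap-size/BAT-size ratio, since a column with few distinct strings has a small string heap relative to its tuple count.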
2009/12/18 Stefan Manegold wrote:
[quoted text elided]
On Sat, Dec 19, 2009 at 12:21:09PM +0100, Wouter Alink wrote:
[...] but if it doesn't hurt too much, couldn't it just be the default?
It cannot be the default, as hashtables are simply too big. Assume a single column that barely fits into your main memory: in that case the hashtable will not fit, leading to suboptimal performance. One global rule could be: build the hash on the largest side while it still fits into L2; build it on the smallest once we run out of L2.

Niels
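Niels's rule of thumb can be sketched as a size check. The constants below are illustrative assumptions (a 256 KB L2 and 16 bytes per hash entry), not GDK's actual numbers:

```python
L2_CACHE_BYTES = 256 * 1024       # assumed L2 size, for illustration only
BYTES_PER_HASH_ENTRY = 16         # assumed bucket + link cost per entry

def hash_table_bytes(tuple_count):
    """Rough size of a hash table over tuple_count entries."""
    return tuple_count * BYTES_PER_HASH_ENTRY

def build_side(small_count, large_count):
    """Build on the larger side only while its table stays L2-resident;
    otherwise fall back to hashing the smaller side for locality."""
    if hash_table_bytes(large_count) <= L2_CACHE_BYTES:
        return "large"            # reusable table, still cache-resident
    return "small"                # too big: spatial locality wins

assert build_side(26, 10_000) == "large"      # 160 KB table fits in L2
assert build_side(26, 9_000_000) == "small"   # 144 MB table does not
```

Under these assumptions, Wouter's 9M-tuple dictionary falls on the "small" side of the rule, so the reuse benefit he measured would still need a higher-level decision to force the large-side hash.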
Eventually I would like to be using the SQL layer only. Here there would be plenty of tables with string-columns, and some will be joined against. Should a MAL optimizer detect that I am about to join two string-BATs, and that one BAT is bigger than the other and has many different values, and therefore should build a hashtable on the bigger one? The MAL optimizer can only guess about my next query (although I agree that it could do a better job at guessing), and calculating heapsize/batsize seems to be an operation that is also difficult to do on a MAL layer.
is really nobody in favour of changing the behavior of joining string bats for large bats with many different values? well, than I give up. Wouter
2009/12/18 Stefan Manegold
: Hi Wouter,
in the lines of Lefteris' reply: for a single join with no hash table present a priori, the number of hash function calls is euqal to the number of BUNs in both BATs; for each inner BUN the function need to be called to build the hash table, for each outer BUN it needs to be called to probe the hash table. Hence, the pure hashing costs are independent of which BAT is inner and which outer. Given that, the reason the choose the smaller as inner is indeed to increase spacial locallity (and thus performance) of the inherently random access while building and probing the hashtable.
As Lefteris pointed out, the "operational optimization" in GDK is a pure peephole optimization dealing only with the very operation at hand. I.e., in general it cannot anticipate the benefits of future re-use of efforts, like investing in the (more expensive) building of a larger hash table to be able to re-use this in several later operations --- which IMHO is independent of the data type. Such descisions need to be made at higher levels, either in MAL optimizers or in the front-end that generates the MAL plan.
Stefan
On Fri, Dec 18, 2009 at 05:01:07PM +0100, Lefteris wrote:
Hi Wouter,
funny think, I had the same exact problem and we were thinking about this issue. The idea here is that this observation for strings might not be always true, and it is a situation that cannot be always determined on the kernel level. Correct me if I am wrong, but your benefit on query comes because the hash in the large BAT is already there, that's why the second time you get 0.01? You mention hot run so I assume the BAT is already there with a hash index. While in the original situation the hash is on the small BAT thus you don't benefit from the hot run. But if a big BAT of strings is to be used again it is unknown in the gdk level. So, I solved the problem by forcing the hash index on the big BAT in a higher level (in Monet5) where it knows something more about the application (in my case RDF store). Can you do instead that? force the hash index in a higher level for you application? If gdk see a hash index already there, then it will choose that independent of the size.
lefteris
On Fri, Dec 18, 2009 at 4:22 PM, Wouter Alink
wrote: Dear developers,
I would like to propose a change in GDK and hear opinions. It is about the following issue:
in the BATjoin code, if there is no possibility to do a fetch or merge join, a hashjoin is performed. A hashtable is created for the smallest BAT. The reasons (i could think of) for choosing the smallest BAT for the hashtable are that less space is required for the hashtable (which in turn causes less cache misses when doing a lookup) and also because the hashfunction used is assumed to be very inexpensive (it needs to be calculated for each item in the large bat each time a join is performed). I can see that the hashfunction can be very efficient for data types without indirection, but I feel that for data types like strings in some cases this is a little different. If a string BAT for example contains many different values (i.e. is not a bat which contains enumeration values) the hashfunction will not be inexpensive anymore (many cache misses), as each hashfunction call needs to hash a whole (arbitrary long) string at an arbitrary place in the heap.
Is it perhaps possible to specify that, when a BAT of type 'str' has many different values a hashtable may be build on the large BAT instead of on the small BAT?
Reason that I ask this: I was analysing costs of a query in which I had a few short strings (26 tuples, 1-column table: varchar) which I wanted to look up in a dictionary (9M tuples, 2-column table: int,varchar). "SELECT a.id FROM longlist AS a JOIN smalllist as b ON a.strvalue=b.strvalue;" The result is a small list of integers (26 or less tuples). This operation currently takes roughly 1.5 seconds for a hot run, mostly due to 9M strHash operations. By applying the patch below the execution time for a hot run dropped down to .01 seconds. The performance gain is caused by only having to perform strHash on the items in the small bat once the hashtable for the large bat has been created.
Any suggestions whether such a change is useful? Which benchmarks will be influenced?
I guess this code change is probably not useful for large string BATs with only few different values, but perhaps a guess could be made how diverse the strings in a bat are (by taking a sample or perhaps simply by looking at the ratio batsize/heapsize), and based on that determine whether to build it on the large or small BAT?
Greetings, Wouter
Index: src/gdk/gdk_relop.mx =================================================================== RCS file: /cvsroot/monetdb/MonetDB/src/gdk/gdk_relop.mx,v retrieving revision 1.167.2.4 diff -u -r1.167.2.4 gdk_relop.mx --- src/gdk/gdk_relop.mx 20 Nov 2009 13:04:06 -0000 1.167.2.4 +++ src/gdk/gdk_relop.mx 18 Dec 2009 14:59:13 -0000 @@ -1232,7 +1232,12 @@ @- hash join: the bread&butter join of monet @c - /* Simple rule, always build hash on the smallest */ + /* Simple rule, always build hash on the smallest, + except when it is a string-join, then we do the opposite */ + if (swap && rcount < lcount && l->ttype == TYPE_str) { + ALGODEBUG THRprintf(GDKout, "#BATjoin: BATmirror(BAThashjoin(BATmirror(r), BATmirror(l)," BUNFMT "));\n", estimate); + return BATmirror(BAThashjoin(BATmirror(r), BATmirror(l), estimate)); + } if (swap && rcount > lcount) { ALGODEBUG THRprintf(GDKout, "#BATjoin: BATmirror(BAThashjoin(BATmirror(r), BATmirror(l)," BUNFMT "));\n", estimate);
------------------------------------------------------------------------------ This SF.Net email is sponsored by the Verizon Developer Community Take advantage of Verizon's best-in-class app development support A streamlined, 14 day to market process makes app distribution fast and easy Join now and get one step closer to millions of Verizon customers http://p.sf.net/sfu/verizon-dev2dev _______________________________________________ Monetdb-developers mailing list Monetdb-developers@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-developers
-- | Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl | | CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ | | 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 | | The Netherlands | Fax : +31 (20) 592-4199 |
-- Niels Nes, Centrum Wiskunde & Informatica (CWI) Science Park 123, 1098 XG Amsterdam, The Netherlands room L3.14, phone ++31 20 592-4098 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
On Sat, Dec 19, 2009 at 2:59 PM, Niels Nes wrote:
On Sat, Dec 19, 2009 at 12:21:09PM +0100, Wouter Alink wrote:
Lefteris, you are correct: I meant 'the second time the query was run' when I wrote 'hot run'.
I see that reuse cannot be estimated at the GDK level. Still, current hardware has an abundance of memory, and strings take up much more storage than a single BUN (so a hash entry is usually small relative to its data), so GDK might weigh the additional cost differently. GDK also decides which things to keep in memory or throw out, and that decision is likewise based on reuse. The cost of performing the initial join is dominated by the strHash function, and building the hashtable on the big BAT rather than the small one makes (almost) no difference, except for the additional memory use. If such a big BAT is joined again, it is beneficial to have the hashtable already in place.
What I was hoping for were explanations of situations in which it makes no sense to build the hashtable on the bigger string BAT, but I have not seen a good counter-example yet. In general, I can see it would not be beneficial if the big BAT is not joined a second time, but if it does not hurt too much, couldn't it simply be the default?
It cannot be the default, as hashtables are simply too big. Assume a single column that barely fits into your main memory. In that case the hashtable will not fit, leading to suboptimal performance.
One global rule could be: build the hash on the largest BAT as long as it still fits into L2, and on the smallest once we run out of L2.
Niels
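[Editorial sketch] Niels' rule could be sketched as follows; both constants are assumptions for illustration, not MonetDB values, and the function is hypothetical.

```c
#include <stddef.h>

/* Sketch of the proposed rule (illustrative only): build the hash table
 * on the larger side while its table still fits in L2, otherwise on the
 * smaller side.  Both constants are assumptions, not MonetDB values. */
#define ASSUMED_L2_BYTES    (256 * 1024)
#define ASSUMED_ENTRY_BYTES 8

/* Return the tuple count of the side to build the hash table on. */
static size_t
choose_build_count(size_t lcount, size_t rcount)
{
    size_t larger  = lcount > rcount ? lcount : rcount;
    size_t smaller = lcount > rcount ? rcount : lcount;

    if (larger * ASSUMED_ENTRY_BYTES <= ASSUMED_L2_BYTES)
        return larger;   /* the big hash table stays cache resident */
    return smaller;      /* out of L2: fall back to the small side */
}
```

With the numbers from Wouter's query (26 vs. 9M tuples) this rule would still pick the small side, since a 9M-entry table is far larger than L2; only joins between modestly sized BATs would flip to the large side.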
Fair enough, but if the big BAT fits in L2, then it is not so big, and then the small one is in L2 too; I think you will not see much difference there, as the entire join will run almost entirely in L2, right? Wouter, is your big string BAT sorted while the small one is unsorted? (This often happens to me, because the big one is persistent and the small one an intermediate result.) In such cases you can also trigger a sort on the small BAT and then merge join; there is some code in GDK for that, but it is too strict in my opinion. Lefteris
Eventually I would like to use the SQL layer only. There would then be plenty of tables with string columns, some of which will be joined against. Should a MAL optimizer detect that I am about to join two string BATs, that one BAT is bigger than the other and has many distinct values, and therefore build a hashtable on the bigger one? The MAL optimizer can only guess about my next query (although I agree it could do a better job at guessing), and calculating heapsize/batsize seems to be an operation that is also difficult to do at the MAL layer.
Is really nobody in favour of changing the behavior of joining string BATs for large BATs with many distinct values? Well, then I give up. Wouter
2009/12/18 Stefan Manegold:
Hi Wouter,
Along the lines of Lefteris' reply: for a single join with no hash table present a priori, the number of hash function calls is equal to the number of BUNs in both BATs; for each inner BUN the function needs to be called to build the hash table, and for each outer BUN it needs to be called to probe the hash table. Hence, the pure hashing costs are independent of which BAT is inner and which outer. Given that, the reason to choose the smaller BAT as inner is indeed to increase the spatial locality (and thus performance) of the inherently random access while building and probing the hashtable.
As Lefteris pointed out, the "operational optimization" in GDK is a pure peephole optimization dealing only with the very operation at hand. I.e., in general it cannot anticipate the benefits of future re-use of efforts, like investing in the (more expensive) building of a larger hash table in order to re-use it in several later operations; this is IMHO independent of the data type. Such decisions need to be made at higher levels, either in MAL optimizers or in the front-end that generates the MAL plan.
Stefan
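[Editorial sketch] Stefan's accounting, one hash call per inner BUN to build plus one per outer BUN to probe, can be seen in a toy hash join. Everything below is illustrative and unrelated to the actual GDK implementation; the counter makes the |l| + |r| total explicit.

```c
#include <stddef.h>
#include <string.h>

#define NBUCKET  64
#define MAXBUILD 256   /* sketch-only limit on the build side */

static size_t hash_invocations;   /* counts every hash-function call */

static unsigned
toy_hash(const char *s)
{
    unsigned h = 0;
    hash_invocations++;
    while (*s)
        h = h * 31u + (unsigned char)*s++;
    return h;
}

/* Count join matches between a build side and a probe side: nb hash
 * calls in the build phase plus np in the probe phase, so swapping the
 * sides never changes the total number of hash calls. */
static size_t
join_count(const char **build, size_t nb, const char **probe, size_t np)
{
    int head[NBUCKET], next[MAXBUILD];
    size_t matches = 0;

    memset(head, -1, sizeof head);
    for (size_t i = 0; i < nb; i++) {             /* build: nb calls */
        unsigned b = toy_hash(build[i]) % NBUCKET;
        next[i] = head[b];
        head[b] = (int)i;
    }
    for (size_t j = 0; j < np; j++) {             /* probe: np calls */
        unsigned b = toy_hash(probe[j]) % NBUCKET;
        for (int i = head[b]; i != -1; i = next[i])
            if (strcmp(build[i], probe[j]) == 0)
                matches++;
    }
    return matches;
}
```

Joining a 2-element side with a 4-element side costs 6 hash calls in either direction; what changes with the build side is only the locality of the table accesses, which is Stefan's point.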
On Fri, Dec 18, 2009 at 05:01:07PM +0100, Lefteris wrote:
Hi Wouter,
Funny thing, I had exactly the same problem and we were thinking about this issue. The point is that this observation for strings may not always be true, and it is a situation that cannot always be determined at the kernel level. Correct me if I am wrong, but the benefit for your query comes from the fact that the hash on the large BAT is already there; that is why you get 0.01 the second time? You mention a hot run, so I assume the BAT is already in memory with a hash index, whereas in the original situation the hash is on the small BAT, so you do not benefit from the hot run. But whether a big BAT of strings is going to be used again is unknown at the GDK level. I solved the problem by forcing the hash index on the big BAT at a higher level (in Monet5), which knows something more about the application (in my case an RDF store). Can you do that instead, i.e. force the hash index at a higher level for your application? If GDK sees a hash index already there, it will choose that side, independent of the size.
lefteris
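[Editorial sketch] Lefteris' point, that a pre-existing hash index wins regardless of size, can be sketched with a hypothetical descriptor; the real GDK checks and data structures differ.

```c
/* Hypothetical per-BAT summary, not the GDK descriptor. */
typedef struct {
    unsigned long count;   /* number of tuples */
    int has_hash;          /* a hash index already exists */
} side_info;

/* Return 0 to build on the left side, 1 to build on the right: an
 * existing hash index wins regardless of size; otherwise default to
 * the smaller side. */
static int
choose_hash_side(const side_info *l, const side_info *r)
{
    if (l->has_hash != r->has_hash)
        return l->has_hash ? 0 : 1;
    return l->count <= r->count ? 0 : 1;
}
```

Under this sketch, forcing a hash on the 9M-tuple dictionary at a higher layer makes every later lookup join pick that side automatically.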
On Fri, Dec 18, 2009 at 4:22 PM, Wouter Alink wrote:
Dear developers,
I would like to propose a change in GDK and hear opinions. It is about the following issue:
In the BATjoin code, if there is no possibility to do a fetch or merge join, a hashjoin is performed. A hashtable is created for the smallest BAT. The reasons (I could think of) for choosing the smallest BAT for the hashtable are that less space is required for the hashtable (which in turn causes fewer cache misses when doing a lookup), and also that the hash function used is assumed to be very inexpensive (it needs to be calculated for each item in the large BAT each time a join is performed). I can see that the hash function can be very efficient for data types without indirection, but I feel that for data types like strings this is somewhat different in some cases. If a string BAT contains many different values (i.e. is not a BAT which contains enumeration values), the hash function is no longer inexpensive (many cache misses), as each hash function call needs to hash a whole (arbitrarily long) string at an arbitrary place in the heap.
Is it perhaps possible to specify that, when a BAT of type 'str' has many different values, a hashtable may be built on the large BAT instead of on the small BAT?
On 2009-12-19 15:28, Lefteris wrote:
Fair enough, but if the big BAT fits in L2, then it is not so big, and then the small one is in L2 too; I think you will not see much difference there, as the entire join will run almost entirely in L2, right?
Wouter, is your big string BAT sorted while the small one is unsorted? (This often happens to me, because the big one is persistent and the small one an intermediate result.) In such cases you can also trigger a sort on the small BAT and then merge join; there is some code in GDK for that, but it is too strict in my opinion.
We should also consider building the hash table on both sides. Since the hashes have to be calculated, no matter what, it makes sense to store those values. This doesn't necessarily have to be done up front. The hashes for the second table could be stored while doing the scan for the join.
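[Editorial sketch] Sjoerd's suggestion, keeping the hashes computed during the probe scan for later reuse, might look like the lazy cache below. It is hypothetical: the sentinel value and the hash function are assumptions, not MonetDB code.

```c
#include <stddef.h>

static unsigned
toy_hash(const char *s)
{
    unsigned h = 5381;   /* nonzero start so 0 can serve as a sentinel */
    while (*s)
        h = h * 33u + (unsigned char)*s++;
    return h;
}

#define UNHASHED 0u   /* sentinel: assumes 0 never occurs as a real hash */

/* Return the hash of tuple i, computing and storing it on first use so
 * a later join on the same column can skip the recomputation. */
static unsigned
hash_cached(const char **vals, unsigned *cache, size_t i)
{
    if (cache[i] == UNHASHED)
        cache[i] = toy_hash(vals[i]);
    return cache[i];
}
```

The first probe pass fills the cache as a side effect; a second join over the same column then pays only an array read per tuple instead of a full string hash.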
-- Sjoerd Mullender
On Sat, Dec 19, 2009 at 03:42:32PM +0100, Sjoerd Mullender wrote:
We should also consider building the hash table on both sides. Since the hashes have to be calculated, no matter what, it makes sense to store those values. This doesn't necessarily have to be done up front. The hashes for the second table could be stored while doing the scan for the join.
All true, if you have the space to store them. And although you have already calculated a hash, storing it is not free: it brings down the effective memory-to-cache bandwidth by about a factor of 2. Niels
Lefteris
Eventually I would like to be using the SQL layer only. Here there would be plenty of tables with string-columns, and some will be joined against. Should a MAL optimizer detect that I am about to join two string-BATs, and that one BAT is bigger than the other and has many different values, and therefore should build a hashtable on the bigger one? The MAL optimizer can only guess about my next query (although I agree that it could do a better job at guessing), and calculating heapsize/batsize seems to be an operation that is also difficult to do on a MAL layer.
is really nobody in favour of changing the behavior of joining string bats for large bats with many different values? well, than I give up. Wouter
2009/12/18 Stefan Manegold
: Hi Wouter,
in the lines of Lefteris' reply: for a single join with no hash table present a priori, the number of hash function calls is euqal to the number of BUNs in both BATs; for each inner BUN the function need to be called to build the hash table, for each outer BUN it needs to be called to probe the hash table. Hence, the pure hashing costs are independent of which BAT is inner and which outer. Given that, the reason the choose the smaller as inner is indeed to increase spacial locallity (and thus performance) of the inherently random access while building and probing the hashtable.
As Lefteris pointed out, the "operational optimization" in GDK is a pure peephole optimization dealing only with the very operation at hand. I.e., in general it cannot anticipate the benefits of future re-use of efforts, like investing in the (more expensive) building of a larger hash table to be able to re-use this in several later operations --- which IMHO is independent of the data type. Such descisions need to be made at higher levels, either in MAL optimizers or in the front-end that generates the MAL plan.
Stefan
On Fri, Dec 18, 2009 at 05:01:07PM +0100, Lefteris wrote:
Hi Wouter,
funny think, I had the same exact problem and we were thinking about this issue. The idea here is that this observation for strings might not be always true, and it is a situation that cannot be always determined on the kernel level. Correct me if I am wrong, but your benefit on query comes because the hash in the large BAT is already there, that's why the second time you get 0.01? You mention hot run so I assume the BAT is already there with a hash index. While in the original situation the hash is on the small BAT thus you don't benefit from the hot run. But if a big BAT of strings is to be used again it is unknown in the gdk level. So, I solved the problem by forcing the hash index on the big BAT in a higher level (in Monet5) where it knows something more about the application (in my case RDF store). Can you do instead that? force the hash index in a higher level for you application? If gdk see a hash index already there, then it will choose that independent of the size.
lefteris
-- | Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl | | CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ | | 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 | | The Netherlands | Fax : +31 (20) 592-4199 |
------------------------------------------------------------------------------ This SF.Net email is sponsored by the Verizon Developer Community Take advantage of Verizon's best-in-class app development support A streamlined, 14 day to market process makes app distribution fast and easy Join now and get one step closer to millions of Verizon customers http://p.sf.net/sfu/verizon-dev2dev _______________________________________________ Monetdb-developers mailing list Monetdb-developers@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-developers
-- Niels Nes, Centrum Wiskunde & Informatica (CWI) Science Park 123, 1098 XG Amsterdam, The Netherlands room L3.14, phone ++31 20 592-4098 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
-- Sjoerd Mullender
On Sat, Dec 19, 2009 at 12:21:09PM +0100, Wouter Alink wrote:
Lefteris, you are correct: when I wrote 'hot run' I meant the second time the query was run.
I see that reuse cannot be estimated at the GDK level. Still, current hardware has an abundance of memory, and strings take up much more storage than a single BUN (so a hash entry is usually small relative to its data), so GDK might weigh the additional cost. GDK also decides which things to keep in memory and which to throw out, which in turn is also based on reuse. The cost of performing the initial join is dominated by the strHash function, and building the hashtable on the big BAT or on the small BAT makes (almost) no difference, except for the additional memory use. If a join is later performed on such a big BAT again, it will be beneficial to have the hashtable already in place.
What I was hoping for were explanations of situations where it makes no sense to build the hashtable on the bigger string BAT, but I haven't seen a good counter-example yet. In general, I can see, it would not be beneficial if the big BAT is not joined a second time, but if it doesn't hurt too much, couldn't it just be the default?
Given its inherently random access pattern, building a hash table "hurts" as soon as the hash table exceeds L1 size; you can feel the pain once the hash table exceeds L2 size, and you don't want to stand the pain once your hashtable exceeds main memory ... Recall that lookups into that hash table also need to go to the real data, i.e., the BUNs and the string heap; hence, the inherently random access of the lookups (unless there are indeed only very few) will "hurt" as soon as hash table plus BAT exceed the above limits.
Eventually I would like to be using the SQL layer only. There would then be plenty of tables with string columns, and some will be joined against. Should a MAL optimizer detect that I am about to join two string BATs, that one BAT is bigger than the other and has many different values, and therefore build a hashtable on the bigger one? The MAL optimizer can only guess about my next query (although I agree that it could do a better job at guessing), and calculating heapsize/batsize seems to be an operation that is also difficult to do at the MAL layer.
Is really nobody in favour of changing the behaviour of joining string BATs for large BATs with many different values? Well, then I give up.
As I tried to explain in my previous mail, we need to find the proper way and location/layer to identify the scenario(s) where building the hash table on the larger BAT is better than building it on the smaller BAT; then we can do that. Note that this also requires re-considering where (only in memory, or also on disk) to store & keep the hash table, and when to discard it ... For now, given the limited knowledge available in GDK, I consider the current approach the "safer" one. Stefan
Wouter
2009/12/18 Stefan Manegold: Hi Wouter,
in the lines of Lefteris' reply: for a single join with no hash table present a priori, the number of hash function calls is equal to the number of BUNs in both BATs; for each inner BUN the function needs to be called to build the hash table, and for each outer BUN it needs to be called to probe the hash table. Hence, the pure hashing costs are independent of which BAT is inner and which outer. Given that, the reason to choose the smaller BAT as inner is indeed to increase the spatial locality (and thus performance) of the inherently random access while building and probing the hashtable.
As Lefteris pointed out, the "operational optimization" in GDK is a pure peephole optimization dealing only with the very operation at hand. I.e., in general it cannot anticipate the benefits of future re-use of efforts, like investing in the (more expensive) building of a larger hash table in order to re-use it in several later operations --- which IMHO is independent of the data type. Such decisions need to be made at higher levels, either in MAL optimizers or in the front-end that generates the MAL plan.
Stefan
On Fri, Dec 18, 2009 at 05:01:07PM +0100, Lefteris wrote:
Hi Wouter,
On Fri, Dec 18, 2009 at 4:22 PM, Wouter Alink wrote: Dear developers,
I would like to propose a change in GDK and hear opinions. It is about the following issue:
Any suggestions whether such a change is useful? Which benchmarks will be influenced?
I guess this code change is probably not useful for large string BATs with only few different values, but perhaps a guess could be made how diverse the strings in a bat are (by taking a sample or perhaps simply by looking at the ratio batsize/heapsize), and based on that determine whether to build it on the large or small BAT?
Greetings, Wouter
Index: src/gdk/gdk_relop.mx
===================================================================
RCS file: /cvsroot/monetdb/MonetDB/src/gdk/gdk_relop.mx,v
retrieving revision 1.167.2.4
diff -u -r1.167.2.4 gdk_relop.mx
--- src/gdk/gdk_relop.mx	20 Nov 2009 13:04:06 -0000	1.167.2.4
+++ src/gdk/gdk_relop.mx	18 Dec 2009 14:59:13 -0000
@@ -1232,7 +1232,12 @@
 @-
 hash join: the bread&butter join of monet
 @c
-	/* Simple rule, always build hash on the smallest */
+	/* Simple rule, always build hash on the smallest,
+	   except when it is a string-join, then we do the opposite */
+	if (swap && rcount < lcount && l->ttype == TYPE_str) {
+		ALGODEBUG THRprintf(GDKout, "#BATjoin: BATmirror(BAThashjoin(BATmirror(r), BATmirror(l)," BUNFMT "));\n", estimate);
+		return BATmirror(BAThashjoin(BATmirror(r), BATmirror(l), estimate));
+	}
 	if (swap && rcount > lcount) {
 		ALGODEBUG THRprintf(GDKout, "#BATjoin: BATmirror(BAThashjoin(BATmirror(r), BATmirror(l)," BUNFMT "));\n", estimate);
-- | Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl | | CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ | | 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 | | The Netherlands | Fax : +31 (20) 592-4199 |
participants (5)
- Lefteris
- Niels Nes
- Sjoerd Mullender
- Stefan Manegold
- Wouter Alink