[Monetdb-developers] Single column select vs aggregation
Attached is a trace of a single column select operation, with a max, avg and equality. The equality takes about 4 times more time than the avg operation. From the plan I see that the operation is executed paralel, but I do wonder why suddenly the uselect takes such a high amount of time. Stefan
On Thu, Feb 18, 2010 at 05:46:31PM +0100, Stefan de Konink wrote:
Attached is a trace of a single column select operation, with a max, avg and equality.
The equality takes about 4 times more time than the avg operation.
From the plan I see that the operation is executed paralel, but I do wonder why suddenly the uselect takes such a high amount of time.
Which version of MonetDB? How configured / compiled? How was the mserver5 started (which command line options were given)? Any changes compared to the default monetdb5.conf? Is the "kvk" column sorted? Could you run the select query in the MAL debugger (DEBUG select kvk from kvk where kvk = 412657690010;), single step through it, set the debug mask to 2097152 (debug 2097152) just before the algebra.uselect() is executed, and report the output on the server console after the algebra.uselect() has been executed? Stefan
Stefan
sql>TRACE select max(kvk) from kvk; +--------------+ | L7 | +==============+ | 412657690010 | +--------------+ 1 tuple +-------+---------------------------------------------------------------------------------------------------------+ | ticks | stmt | +=======+=========================================================================================================+ | 39 | _33:bat[:oid,:lng]
[1196533] := sql.bind("sys","kvk","kvk",0,1196532@0,nil:oid); | | 7 | _11:bat[:oid,:oid] [0] := sql.bind_dbat("sys","kvk",1); | | 7 | _5:bat[:oid,:lng] [0] := sql.bind("sys","kvk","kvk",2); | | 12 | _31:bat[:oid,:lng] [1196532] := sql.bind("sys","kvk","kvk",0,0@0,1196532@0); | | 9 | _35 [1196533] := algebra.selectNotNil(_33= :bat[:oid,:lng][1196533]); | | 7 | _12 [0] := bat.reverse(_11= :bat[:oid,:oid][0]); | | 6 | _7 [0] := algebra.selectNotNil(_5= :bat[:oid,:lng][0]); | | 5 | _34 [1196532] := algebra.selectNotNil(_31= :bat[:oid,:lng][1196532]); | | 19 | _37 [1196533] := algebra.kdifference(_35= [1196533],_5= :bat[:oid,:lng][0]); | | 6 | _11:bat[:oid,:oid] := nil:BAT; | | 24 | _38 [0] := algebra.semijoin(_7= [0],_31= :bat[:oid,:lng][1196532]); | | 12 | _36 [1196532] := algebra.kdifference(_34= [1196532],_5= :bat[:oid,:lng][0]); | | 5 | _35 := nil:BAT; | | 15 | _40 [1196532] := algebra.kunion(_36= [1196532],_38= [0]); | | 4 | _31:bat[:oid,:lng] := nil:BAT; | | 17 | _34 := nil:BAT; | | 5 | _5:bat[:oid,:lng] := nil:BAT; | | 11 | _42 [1196532] := algebra.kdifference(_40= [1196532],_12= [0]); | | 9 | _38 := nil:BAT; | | 8 | _36 := nil:BAT; | | 6 | _8:bat[:oid,:lng] [0] := sql.bind("sys","kvk","kvk",1); | | 13 | _39 [0] := algebra.semijoin(_7= [0],_33= :bat[:oid,:lng][1196533]); | | 18 | _33:bat[:oid,:lng] := nil:BAT; | | 5 | _7 := nil:BAT; | | 8 | _40 := nil:BAT; | | 7 | _10 [0] := algebra.selectNotNil(_8= :bat[:oid,:lng][0]); | | 23 | _41 [1196533] := algebra.kunion(_37= [1196533],_39= [0]); | | 12 | _43 [1196533] := algebra.kdifference(_41= [1196533],_12= [0]); | | 9 | _39 := nil:BAT; | | 8 | _37 := nil:BAT; | | 5 | _8:bat[:oid,:lng] := nil:BAT; | | 12528 | _46 := aggr.max(_42= [1196532]); | | 25 | _41 := nil:BAT; | | 15285 | _47 := aggr.max(_43= [1196533]); | | 26 | _43 := nil:BAT; | | 117 | _44 [0] := algebra.kdifference(_10= [0],_12= [0]); | | 12 | _42 := nil:BAT; | | 10 | _48 := aggr.max(_44= [0]); | | 6 | _12 := nil:BAT; | | 6 | _10 := nil:BAT; | | 15 | _45 [3] := mat.pack(_46=243753400000:lng,_47=412657690010:lng,_48=nil:lng); | | 13 | _44 := nil:BAT; | | 27 | _49 [2] := algebra.selectNotNil(_45= [3]); | | 13 | _45 := nil:BAT; | | 53 | _13 := aggr.max(_49= [2]); | | 10 | _49 := nil:BAT; | | 23668 | barrier _72 := language.dataflow(); | | 15 | sql.exportValue(1,"sys.kvk","L7","bigint",64,0,6,_13=412657690010:lng,""); | | 11 | end s4_1; | | 23919 | user.s4_1(); | +-------+---------------------------------------------------------------------------------------------------------+ 50 tuples sql>TRACE select kvk from kvk where kvk = 412657690010; +--------------+ | kvk | +==============+ | 412657690010 | +--------------+ 1 tuple +--------+------------------------------------------------------------------------------------------------------------------------+ | ticks | stmt | +========+========================================================================================================================+ | 10 | _12:bat[:oid,:oid]
[0] := sql.bind_dbat("sys","kvk",1); | | 40 | _34:bat[:oid,:lng] [1196533] := sql.bind("sys","kvk","kvk",0,1196532@0,nil:oid); | | 9 | _13 [0] := bat.reverse(_12= :bat[:oid,:oid][0]); | | 8 | _6:bat[:oid,:lng] [0] := sql.bind("sys","kvk","kvk",2); | | 6 | _12:bat[:oid,:oid] := nil:BAT; | | 20 | _63 [1196533] := algebra.kdifference(_34= :bat[:oid,:lng][1196533],_6= :bat[:oid,:lng][0]); | | 23 | _65 [0] := algebra.semijoin(_6= :bat[:oid,:lng][0],_34= :bat[:oid,:lng][1196533]); | | 16 | _67 [1196533] := algebra.kunion(_63= [1196533],_65= [0]); | | 10 | _63 := nil:BAT; | | 11 | _65 := nil:BAT; | | 10 | _9:bat[:oid,:lng] [0] := sql.bind("sys","kvk","kvk",1); | | 21 | _11 [0] := algebra.uselect(_9= :bat[:oid,:lng][0],A0=412657690010:lng); | | 10 | _45 [0] := algebra.kdifference(_11= [0],_13= [0]); | | 7 | _11 := nil:BAT; | | 9 | _52 [0] := algebra.markT(_45= [0],3,2); | | 6 | _45 := nil:BAT; | | 7 | _57 [0] := bat.reverse(_52= [0]); | | 5 | _52 := nil:BAT; | | 18 | _71 [0] := algebra.leftjoin(_57= [0],_9= :bat[:oid,:lng][0]); | | 12 | _57 := nil:BAT; | | 6 | _9:bat[:oid,:lng] := nil:BAT; | | 33 | _8 [0] := algebra.uselect(_6= :bat[:oid,:lng][0],A0=412657690010:lng); | | 18 | _40 [0] := algebra.semijoin(_8= [0],_34= :bat[:oid,:lng][1196533]); | | 18 | _32:bat[:oid,:lng] [1196532] := sql.bind("sys","kvk","kvk",0,0@0,1196532@0); | | 14 | _60 [1196532] := algebra.kdifference(_32= :bat[:oid,:lng][1196532],_6= :bat[:oid,:lng][0]); | | 15 | _64 [0] := algebra.semijoin(_6= :bat[:oid,:lng][0],_32= :bat[:oid,:lng][1196532]); | | 15 | _66 [1196532] := algebra.kunion(_60= [1196532],_64= [0]); | | 11 | _60 := nil:BAT; | | 9 | _64 := nil:BAT; | | 13 | _39 [0] := algebra.semijoin(_8= [0],_32= :bat[:oid,:lng][1196532]); | | 9 | _8 := nil:BAT; | | 350476 | _36 [1] := algebra.uselect(_34= :bat[:oid,:lng][1196533],A0=412657690010:lng); | | 21359 | _34:bat[:oid,:lng] := nil:BAT; | | 37 | _38 [1] := algebra.kdifference(_36= [1],_6= :bat[:oid,:lng][0]); | | 17 | _42 [1] := algebra.kunion(_38= [1],_40= [0]); | | 6 | _36 := nil:BAT; | | 15 | _40 := nil:BAT; | | 13 | _44 [1] := algebra.kdifference(_42= [1],_13= [0]); | | 11 | _38 := nil:BAT; | | 16 | _49 [1] := algebra.markT(_44= [1],3,1); | | 10 | _42 := nil:BAT; | | 9 | _56 [1] := bat.reverse(_49= [1]); | | 9 | _44 := nil:BAT; | | 176 | _70 [1] := algebra.leftjoin(_56= [1],_67= [1196533]); | | 8 | _49 := nil:BAT; | | 11 | _67 := nil:BAT; | | 14 | _56 := nil:BAT; | | 385653 | _35 [0] := algebra.uselect(_32= :bat[:oid,:lng][1196532],A0=412657690010:lng); | | 33 | _37 [0] := algebra.kdifference(_35= [0],_6= :bat[:oid,:lng][0]); | | 8 | _6:bat[:oid,:lng] := nil:BAT; | | 17 | _41 [0] := algebra.kunion(_37= [0],_39= [0]); | | 20 | _39 := nil:BAT; | | 13 | _43 [0] := algebra.kdifference(_41= [0],_13= [0]); | | 9 | _41 := nil:BAT; | | 15 | _46 [0] := algebra.markT(_43= [0],3,0); | | 8 | _43 := nil:BAT; | | 21008 | _32:bat[:oid,:lng] := nil:BAT; | | 13 | _55 [0] := bat.reverse(_46= [0]); | | 18 | _37 := nil:BAT; | | 7 | _46 := nil:BAT; | | 8 | _13 := nil:BAT; | | 24 | _68 [0] := algebra.leftjoin(_55= [0],_66= [1196532]); | | 11 | _35 := nil:BAT; | | 24 | _14 [1] := mat.pack(_68= [0],_70= [1],_71= [0]); | | 12 | _66 := nil:BAT; | | 11 | _70 := nil:BAT; | | 9 | _68 := nil:BAT; | | 7 | _71 := nil:BAT; | | 12 | _55 := nil:BAT; | | 414894 | barrier _109 := language.dataflow(); | | 13 | _15 := sql.resultSet(1,1,_14= [1]); | | 12 | sql.rsColumn(_15=5,"sys.kvk","kvk","bigint",64,0,_14= [1]); | | 25 | _14 := nil:BAT; | | 6 | _20 := io.stdout(); | | 37 | sql.exportResult(_20=24201696,_15=5); | | 13 | end s5_1; | | 415505 | user.s5_1(412657690010:lng); | +--------+------------------------------------------------------------------------------------------------------------------------+ sql>TRACE select avg(kvk) from kvk; +------------------------+ | L10 | +========================+ | 180371715703.39435 | +------------------------+ 1 tuple +--------+------------------------------------------------------------------------------------------------------------------------+ | ticks | stmt | +========+========================================================================================================================+ | 40 | _36:bat[:oid,:lng]
[1196532] := sql.bind("sys","kvk","kvk",0,0@0,1196532@0); | | 8 | _9:bat[:oid,:oid] [0] := sql.bind_dbat("sys","kvk",1); | | 26 | _5:bat[:oid,:lng] [0] := sql.bind("sys","kvk","kvk",2); | | 8 | _7:bat[:oid,:lng] [0] := sql.bind("sys","kvk","kvk",1); | | 11 | _10 [0] := bat.reverse(_9= :bat[:oid,:oid][0]); | | 18 | _38:bat[:oid,:lng] [1196533] := sql.bind("sys","kvk","kvk",0,1196532@0,nil:oid); | | 58 | _41 [0] := algebra.semijoin(_5= :bat[:oid,:lng][0],_36= :bat[:oid,:lng][1196532]); | | 26 | _39 [1196532] := algebra.kdifference(_36= :bat[:oid,:lng][1196532],_5= :bat[:oid,:lng][0]); | | 29 | _40 [1196533] := algebra.kdifference(_38= :bat[:oid,:lng][1196533],_5= :bat[:oid,:lng][0]); | | 10 | _9:bat[:oid,:oid] := nil:BAT; | | 27 | _43 [1196532] := algebra.kunion(_39= [1196532],_41= [0]); | | 22 | _36:bat[:oid,:lng] := nil:BAT; | | 34 | _47 [0] := algebra.kdifference(_7= :bat[:oid,:lng][0],_10= [0]); | | 36 | _42 [0] := algebra.semijoin(_5= :bat[:oid,:lng][0],_38= :bat[:oid,:lng][1196533]); | | 24 | _41 := nil:BAT; | | 11 | _39 := nil:BAT; | | 9 | _5:bat[:oid,:lng] := nil:BAT; | | 16 | _45 [1196532] := algebra.kdifference(_43= [1196532],_10= [0]); | | 21 | _44 [1196533] := algebra.kunion(_40= [1196533],_42= [0]); | | 12 | _38:bat[:oid,:lng] := nil:BAT; | | 10 | _43 := nil:BAT; | | 13 | _46 [1196533] := algebra.kdifference(_44= [1196533],_10= [0]); | | 12 | _42 := nil:BAT; | | 10 | _40 := nil:BAT; | | 6 | _7:bat[:oid,:lng] := nil:BAT; | | 124025 | _49 [1196533] := batcalc.dbl(_46= [1196533]); | | 24 | _46 := nil:BAT; | | 8 | _52 [1196533] := algebra.selectNotNil(_49= [1196533]); | | 5 | _49 := nil:BAT; | | 143744 | _48 [1196532] := batcalc.dbl(_45= [1196532]); | | 18 | _45 := nil:BAT; | | 7 | _51 [1196532] := algebra.selectNotNil(_48= [1196532]); | | 5 | _48 := nil:BAT; | | 8902 | return sum := aggr.sum(b= [1196533],true); | | 37847 | _56 := aggr.sum(_52= [1196533]); | | 19 | _50 [0] := batcalc.dbl(_47= [0]); | | 22 | _47 := nil:BAT; | | 7 | _53 [0] := algebra.selectNotNil(_50= [0]); | | 5 | _50 := nil:BAT; | | 9 | return sum := aggr.sum(b= [0],true); | | 73 | _57 := aggr.sum(_53= [0]); | | 6 | _64 := aggr.count(_51= [1196532]); | | 5 | _65 := aggr.count(_52= [1196533]); | | 8776 | return sum := aggr.sum(b= [1196532],true); | | 33809 | _55 := aggr.sum(_51= [1196532]); | | 15 | _54 [3] := mat.pack(_55=1.396432356977703e+17,_56=2.9199800414197306e+17,_57=nil); | | 14669 | _52 := nil:BAT; | | 6346 | _58 [2] := algebra.selectNotNil(_54= [3]); | | 10 | _54 := nil:BAT; | | 11 | return sum := aggr.sum(b= [2],true); | | 80 | _11:dbl := aggr.sum(_58= [2]); | | 6 | _58 := nil:BAT; | | 6 | _66 := aggr.count(_53= [0]); | | 6 | _53 := nil:BAT; | | 10 | _63 [3] := mat.pack(_64=1196532:wrd,_65=1196533:wrd,_66=0:wrd); | | 6 | _67 [3] := algebra.selectNotNil(_63= [3]); | | 6 | _63 := nil:BAT; | | 5077 | return sum := aggr.sum(b= [3],true); | | 5203 | _12 := aggr.sum(_67= [3]); | | 10 | _67 := nil:BAT; | | 12456 | _51 := nil:BAT; | | 17 | _14 := calc.dbl(_12=2393065:wrd); | | 6 | _10 := nil:BAT; | | 14 | _44 := nil:BAT; | | 9195 | _13 := calc.==(_12=2393065:wrd,0:wrd); | | 21 | _15 := calc.ifthenelse(_13=false,nil,_14=2393065); | | 12 | _16 := calc./(_11=4.3164123983974336e+17:dbl,_15=2393065); | | 204610 | barrier _94 := language.dataflow(); | | 21 | sql.exportValue(1,"sys.","L10","double",53,0,9,_16=180371715703.39435,""); | | 12 | end s6_1; | | 204854 | user.s6_1(); | +--------+------------------------------------------------------------------------------------------------------------------------+ 71 tuples
------------------------------------------------------------------------------ Download Intel® Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev
_______________________________________________ Monetdb-developers mailing list Monetdb-developers@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-developers
-- | Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl | | CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ | | 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 | | The Netherlands | Fax : +31 (20) 592-4199 |
Op 18-02-10 18:23, Stefan Manegold schreef:
On Thu, Feb 18, 2010 at 05:46:31PM +0100, Stefan de Konink wrote:
Attached is a trace of a single column select operation, with a max, avg and equality.
The equality takes about 4 times more time than the avg operation.
From the plan I see that the operation is executed paralel, but I do wonder why suddenly the uselect takes such a high amount of time.
Which version of MonetDB?
CVS HEAD, 2 am.
How configured / compiled?
--enable-assert --enable-debug
How was the mserver5 started (which command line options were given)?
/opt/monetdb-head/bin/mserver5 --dbfarm=/home/skinkie/monetb --dbname=kvk --dbinit="include sphinx; include sql;" --set mapi_port=50001
Any changes compared to the default monetdb5.conf?
No, not changed anything in there.
Is the "kvk" column sorted?
No, it does include a partial sort (msb are sorted). A 32bit column (kvks) that is (or either should be...) ends up with between 170ms ~ 220ms.
Could you run the select query in the MAL debugger (DEBUG select kvk from kvk where kvk = 412657690010;), single step through it, set the debug mask to 2097152 (debug 2097152) just before the algebra.uselect() is executed, and report the output on the server console after the algebra.uselect() has been executed?
Do you mean this?
mdb># _44 :=
algebra.uselect(_40=
On Thu, Feb 18, 2010 at 07:05:47PM +0100, Stefan de Konink wrote:
Op 18-02-10 18:23, Stefan Manegold schreef:
On Thu, Feb 18, 2010 at 05:46:31PM +0100, Stefan de Konink wrote:
Attached is a trace of a single column select operation, with a max, avg and equality.
The equality takes about 4 times more time than the avg operation.
From the plan I see that the operation is executed paralel, but I do wonder why suddenly the uselect takes such a high amount of time.
Which version of MonetDB?
CVS HEAD, 2 am.
How configured / compiled?
--enable-assert --enable-debug
How was the mserver5 started (which command line options were given)?
/opt/monetdb-head/bin/mserver5 --dbfarm=/home/skinkie/monetb --dbname=kvk --dbinit="include sphinx; include sql;" --set mapi_port=50001
Any changes compared to the default monetdb5.conf?
No, not changed anything in there.
Is the "kvk" column sorted?
No, it does include a partial sort (msb are sorted). A 32bit column (kvks) that is (or either should be...) ends up with between 170ms ~ 220ms.
Could you run the select query in the MAL debugger (DEBUG select kvk from kvk where kvk = 412657690010;), single step through it, set the debug mask to 2097152 (debug 2097152) just before the algebra.uselect() is executed, and report the output on the server console after the algebra.uselect() has been executed?
Do you mean this?
indeed
mdb># _44 := algebra.uselect(_40=
:bat[:oid,:lng][1196532],A0=412657690010:lng); mdb> #BAT_select_(b=tmp_5710): sampling: tmp1 = BATslice(b=tmp_5710, _lo=598266, _hi=598371); #BAT_select_(b=tmp_5710): sampling: tmp2 = BAT_select_(tmp1=tmp_5704, tl, th, tail); #BAT_select_(b=tmp_5704): BAT_scanselect(b=tmp_5704, bn=tmp_5703, tl, th, equi=1, nequi=0, lval=1, hval=1, nocheck=1); #seqscan_eq_lng_void_tloc_oid_vid_nocheck_noinc[if ( simple_EQ(tl ,v,lng) ),v,oid_ctr,oid_ctr++;](b=tmp_5704, bn=tmp_5703, tl, th, oid_ctr=598266, str_idx=0); #BAT_select_(b=tmp_5704): tmp_5703: hkey=1, tkey=0, hsorted=139659451564097, tsorted=0. #BAT_select_(b=tmp_5710): BAT_hashselect(b=tmp_5710, bn=tmp_5703, tl); (building hash-table on the fly) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
"It" (in fact we) choose to do a hash select, and since there is no hash table, yet, we need to build it, which is infact more expensive than a simple scan select for this very operation (later operation *might* then benefit from the hash table ...): ======== $ grep -9n --color 'building hash-table on the fly' MonetDB/src/gdk/gdk_batop.mx 1210- int nocheck = (estimate >= batcnt); 1211- 1212- if (!preserve_order && equi && b->T->hash) { 1213- ALGODEBUG THRprintf(GDKout, "#BAT_select_(b=%s): BAT_hashselect(b=%s, bn=%s, tl); (using existing hash-table)\n", BATgetId(b), BATgetId(b), BATgetId(bn)); 1214- 1215- bn = BAT_hashselect(b, bn, tl); 1216- } else if (!preserve_order && equi && ATOMsize(b->ttype) > 1 && estimate * 100 < batcnt && batcnt * 2 * sizeof(int) < (GDK_mem_maxsize / 4)) { 1217- /* Build a hash-table on the fly for equi-select if the selectivity is low 1218- * and it is not too big */ 1219: ALGODEBUG THRprintf(GDKout, "#BAT_select_(b=%s): BAT_hashselect(b=%s, bn=%s, tl); (building hash-table on the fly)\n", BATgetId(b), BATgetId(b), BATgetId(bn)); 1220- 1221- bn = BAT_hashselect(b, bn, tl); 1222- } else { 1223- ALGODEBUG THRprintf(GDKout, "#BAT_select_(b=%s): BAT_scanselect(b=%s, bn=%s, tl, th, equi=%d, nequi=%d, lval=%d, hval=%d, nocheck=%d);\n", BATgetId(b), BATgetId(b), BATgetId(bn), equi, nequi, lval, hval, nocheck); 1224- 1225- bn = BAT_scanselect(b, bn, tl, th, li, hi, equi, nequi, lval, hval, nocheck); 1226- } 1227- } 1228- if (bn == NULL) { ======== In fact, I doubt whether investing in building a hash table is a good decision in such cases where we do/can not know whether it will ever pay off ... Hence, I'd propose to simply drop the choise to build a hash table on the fly, and rather fall through to the basic scan select also in this (rare?) case. For now, you can just locally disable/remove that alternative in the above code, try again, and report the result. Stefan
#BAThash: create hash(1196532); #BAT_select_(b=tmp_5710): tmp_5703: hkey=1, tkey=0, hsorted=139659451564032, tsorted=139659451564032.
-- | Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl | | CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ | | 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 | | The Netherlands | Fax : +31 (20) 592-4199 |
On Thu, Feb 18, 2010 at 07:05:47PM +0100, Stefan de Konink wrote:
Op 18-02-10 18:23, Stefan Manegold schreef:
On Thu, Feb 18, 2010 at 05:46:31PM +0100, Stefan de Konink wrote:
Attached is a trace of a single column select operation, with a max, avg and equality.
The equality takes about 4 times more time than the avg operation.
From the plan I see that the operation is executed paralel, but I do wonder why suddenly the uselect takes such a high amount of time. Which version of MonetDB? CVS HEAD, 2 am.
How configured / compiled? --enable-assert --enable-debug
How was the mserver5 started (which command line options were given)? /opt/monetdb-head/bin/mserver5 --dbfarm=/home/skinkie/monetb --dbname=kvk --dbinit="include sphinx; include sql;" --set mapi_port=50001
Any changes compared to the default monetdb5.conf? No, not changed anything in there.
Is the "kvk" column sorted? No, it does include a partial sort (msb are sorted). A 32bit column (kvks) that is (or either should be...) ends up with between 170ms ~ 220ms.
Could you run the select query in the MAL debugger (DEBUG select kvk from kvk where kvk = 412657690010;), single step through it, set the debug mask to 2097152 (debug 2097152) just before the algebra.uselect() is executed, and report the output on the server console after the algebra.uselect() has been executed? Do you mean this?
indeed
mdb># _44 := algebra.uselect(_40=
:bat[:oid,:lng][1196532],A0=412657690010:lng); mdb> #BAT_select_(b=tmp_5710): sampling: tmp1 = BATslice(b=tmp_5710, _lo=598266, _hi=598371); #BAT_select_(b=tmp_5710): sampling: tmp2 = BAT_select_(tmp1=tmp_5704, tl, th, tail); #BAT_select_(b=tmp_5704): BAT_scanselect(b=tmp_5704, bn=tmp_5703, tl, th, equi=1, nequi=0, lval=1, hval=1, nocheck=1); #seqscan_eq_lng_void_tloc_oid_vid_nocheck_noinc[if ( simple_EQ(tl ,v,lng) ),v,oid_ctr,oid_ctr++;](b=tmp_5704, bn=tmp_5703, tl, th, oid_ctr=598266, str_idx=0); #BAT_select_(b=tmp_5704): tmp_5703: hkey=1, tkey=0, hsorted=139659451564097, tsorted=0. #BAT_select_(b=tmp_5710): BAT_hashselect(b=tmp_5710, bn=tmp_5703, tl); (building hash-table on the fly) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ "It" (in fact we) choose to do a hash select, and since there is no hash table, yet, we need to build it, which is infact more expensive than a simple scan select for this very operation (later operation *might* then benefit from the hash table ...): ======== $ grep -9n --color 'building hash-table on the fly' MonetDB/src/gdk/gdk_batop.mx 1210- int nocheck = (estimate >= batcnt); 1211- 1212- if (!preserve_order && equi && b->T->hash) { 1213- ALGODEBUG THRprintf(GDKout, "#BAT_select_(b=%s): BAT_hashselect(b=%s, bn=%s, tl); (using existing hash-table)\n", BATgetId(b), BATgetId(b), BATgetId(bn)); 1214- 1215- bn = BAT_hashselect(b, bn, tl); 1216- } else if (!preserve_order && equi && ATOMsize(b->ttype) > 1 && estimate * 100 < batcnt && batcnt * 2 * sizeof(int) < (GDK_mem_maxsize / 4)) { 1217- /* Build a hash-table on the fly for equi-select if the selectivity is low 1218- * and it is not too big */ 1219: ALGODEBUG THRprintf(GDKout, "#BAT_select_(b=%s): BAT_hashselect(b=%s, bn=%s, tl); (building hash-table on the fly)\n", BATgetId(b), BATgetId(b), BATgetId(bn)); 1220- 1221- bn = BAT_hashselect(b, bn, tl); 1222- } else { 1223- ALGODEBUG THRprintf(GDKout, "#BAT_select_(b=%s): BAT_scanselect(b=%s, bn=%s, tl, th, equi=%d, nequi=%d, lval=%d, hval=%d, nocheck=%d);\n", BATgetId(b), BATgetId(b), BATgetId(bn), equi, nequi, lval, hval, nocheck); 1224- 1225- bn = BAT_scanselect(b, bn, tl, th, li, hi, equi, nequi, lval, hval, nocheck); 1226- } 1227- } 1228- if (bn == NULL) { ========
In fact, I doubt whether investing in building a hash table is a good decision in such cases where we do/can not know whether it will ever pay off ...
Hence, I'd propose to simply drop the choise to build a hash table on the fly, and rather fall through to the basic scan select also in this (rare?) case. I support it. It might pay off to build the hash when the column is key, though, because
Stefan Manegold wrote: that is a good indicator for future point select. Provided the table is persistent and we can predict it to be used.
For now, you can just locally disable/remove that alternative in the above code, try again, and report the result.
Stefan
#BAThash: create hash(1196532); #BAT_select_(b=tmp_5710): tmp_5703: hkey=1, tkey=0, hsorted=139659451564032, tsorted=139659451564032.
Op 18-02-10 19:22, Stefan Manegold schreef:
"It" (in fact we) choose to do a hash select, and since there is no hash table, yet, we need to build it, which is infact more expensive than a simple scan select for this very operation (later operation *might* then benefit from the hash table ...):
Question about that then; if we make an on the fly hash. Why isn't it 'maintained' between queries (or does this depend on the chosen pipeline?) Because the query doesn't seem to get faster when running it multiple time?
For now, you can just locally disable/remove that alternative in the above code, try again, and report the result.
Cold: sql>select kvk from kvk where kvk = 412657690010; +--------------+ | kvk | +==============+ | 412657690010 | +--------------+ 1 tuple Timer 1174.737 msec 1 rows Hot: sql>select kvk from kvk where kvk = 412657690010; +--------------+ | kvk | +==============+ | 412657690010 | +--------------+ 1 tuple Timer 23.741 msec 1 rows sql>select kvk from kvk where kvk = 412657690010; Thanks for this 20x performance increase! (And it gets even better, because numbers that doesn't exist are excluded in ~13ms.) Stefan
On Thu, Feb 18, 2010 at 07:40:15PM +0100, Stefan de Konink wrote:
Op 18-02-10 19:22, Stefan Manegold schreef:
"It" (in fact we) choose to do a hash select, and since there is no hash table, yet, we need to build it, which is infact more expensive than a simple scan select for this very operation (later operation *might* then benefit from the hash table ...):
Question about that then; if we make an on the fly hash. Why isn't it 'maintained' between queries (or does this depend on the chosen pipeline?) Because the query doesn't seem to get faster when running it multiple time?
because it is built on an intermediate result (base BAT plus delta BATs applied) that is gone, again after the query has been executed. Stefan
For now, you can just locally disable/remove that alternative in the above code, try again, and report the result.
Cold: sql>select kvk from kvk where kvk = 412657690010; +--------------+ | kvk | +==============+ | 412657690010 | +--------------+ 1 tuple Timer 1174.737 msec 1 rows
Hot: sql>select kvk from kvk where kvk = 412657690010; +--------------+ | kvk | +==============+ | 412657690010 | +--------------+ 1 tuple Timer 23.741 msec 1 rows sql>select kvk from kvk where kvk = 412657690010;
Thanks for this 20x performance increase! (And it gets even better, because numbers that doesn't exist are excluded in ~13ms.)
Stefan
-- | Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl | | CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ | | 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 | | The Netherlands | Fax : +31 (20) 592-4199 |
Op 18-02-10 19:44, Stefan Manegold schreef:
because it is built on an intermediate result (base BAT plus delta BATs applied) that is gone, again after the query has been executed.
I really feel I have learnt something now :) thanks :) Given this information, is there a way to get the cost function to be optimised? Either by taking reuse in account, or otherwise by the modifying the size? (Or more futuristic, storing the choice and the performance.) Stefan
participants (3)
-
Martin Kersten
-
Stefan de Konink
-
Stefan Manegold