[MonetDB-users] Slow range join performance / regression with Dec2011-SP1
Hi, I have a few queries that join a table with itself using a ranged predicate on the primary key. The simplest query is: SELECT count(*) AS count FROM rank_copy rank1, rank_copy rank2 WHERE rank1.cat_s = true AND rank2.pos_vvfin = true AND rank1.pre < rank2.pre AND rank2.pre < rank1.post As you can see, the range lookup on rank2.pre is restricted on both sides by rank1.pre and rank1.post. The selectivity of the attributes and the complete query is: - row count : 2431307 - query : 95977 (0.039) - cat_s : 148799 (0.061) - pos_vvfin : 71965 (0.030) PostgreSQL is able to evaluate the query in 2 seconds using an index on pre. (This index is automatically generated because pre is the primary key.) MonetDB Dec2011 requires about 35 seconds. The MAL plan is posted below. I've also tried sorting the table on pre, and created an (advisory) index on pre but this makes no difference. +---------------------------------------------------------------------------+ | mal | +===========================================================================+ | function user.s1_1{autoCommit=true}():void; | | X_2 := sql.mvc(); | | X_9:bat[:oid,:bit] := sql.bind(X_2,"sys","rank_copy","pos_vvfin",0); | | X_10 := algebra.uselect(X_9,true:bit); | | X_11 := algebra.markT(X_10,0@0:oid); | | X_12 := bat.reverse(X_11); | | X_4:bat[:oid,:int] := sql.bind(X_2,"sys","rank_copy","pre",0); | | X_13 := algebra.leftjoin(X_12,X_4); | | X_17:bat[:oid,:bit] := sql.bind(X_2,"sys","rank_copy","cat_s",0); | | X_18 := algebra.uselect(X_17,true:bit); | | X_19 := algebra.markT(X_18,0@0:oid); | | X_20 := bat.reverse(X_19); | | X_23 := algebra.leftjoin(X_20,X_4); | | X_15:bat[:oid,:int] := sql.bind(X_2,"sys","rank_copy","post",0); | | X_21 := algebra.leftjoin(X_20,X_15); | | X_24 := algebra.join(X_13,X_23,X_21,false,false); | | X_26 := algebra.markT(X_24,0@0:oid); | | X_27 := bat.reverse(X_26); | | X_28 := algebra.leftjoin(X_27,X_13); | | X_29 := aggr.count(X_28); | | sql.exportValue(1,"sys.rank2","count":str,"wrd",64,0,6,X_29,""); | | end s1_1; | +---------------------------------------------------------------------------+ 22 tuples If I use a slightly more complicated query that has the boolean attributes pos_vvfin and cat_s on another table that is joined by rank, the situation is even worse. SELECT count(*) AS count FROM node_copy node1 JOIN rank_copy rank1 ON (node1.id = rank1.node_ref), node_copy node2 JOIN rank_copy rank2 ON (node2.id = rank2.node_ref) WHERE node1.cat_s = true AND node2.pos_vvfin = true AND rank1.pre < rank2.pre AND rank2.pre < rank1.post PostgreSQL evaluates the query in 9 seconds, again using the primary key indexes on node.id and rank.pre. MonetDB Dec2011 requires 14 minutes. The MAL plan follows: +-----------------------------------------------------------------------------------------------+ | mal | +===============================================================================================+ | function user.s1_1{autoCommit=true}():void; | | X_2 := sql.mvc(); | | X_19:bat[:oid,:oid] := sql.bind_idxbat(X_2,"sys","rank_copy","FK_rank_copy_node_ref",0); | | X_13:bat[:oid,:bit] := sql.bind(X_2,"sys","node_copy","pos_vvfin",0); | | X_14 := algebra.uselect(X_13,true:bit); | | X_15 := algebra.markT(X_14,0@0:oid); | | X_16 := bat.reverse(X_15); | | X_9:bat[:oid,:int] := sql.bind(X_2,"sys","node_copy","id",0); | | X_11 := bat.mirror(X_9); | | X_17 := algebra.leftjoin(X_16,X_11); | | X_18 := bat.reverse(X_17); | | X_37:bat[:oid,:bit] := sql.bind(X_2,"sys","node_copy","cat_s",0); | | X_39 := algebra.uselect(X_37,true:bit); | | X_40 := algebra.markT(X_39,0@0:oid); | | X_41 := bat.reverse(X_40); | | X_42 := algebra.leftjoin(X_41,X_11); | | X_43 := bat.reverse(X_42); | | X_24 := algebra.join(X_19,X_18); | | X_25 := algebra.markT(X_24,0@0:oid); | | X_26 := bat.reverse(X_25); | | X_4:bat[:oid,:int] := sql.bind(X_2,"sys","rank_copy","pre",0); | | X_27 := algebra.leftjoin(X_26,X_4); | | X_29:bat[:oid,:int] := sql.bind(X_2,"sys","rank_copy","post",0); | | X_30 := algebra.join(X_27,X_4,X_29,false,false); | | X_32 := algebra.markT(X_30,0@0:oid); | | X_33 := bat.reverse(X_32); | | X_44 := bat.reverse(X_30); | | X_45 := algebra.markT(X_44,0@0:oid); | | X_46 := bat.reverse(X_45); | | X_47 := algebra.leftjoin(X_46,X_19); | | X_48 := algebra.join(X_47,X_43); | | X_49 := algebra.markT(X_48,0@0:oid); | | X_50 := bat.reverse(X_49); | | X_51:bat[:oid,:int] := algebra.leftjoinPath(X_50,X_33,X_27); | | X_52 := aggr.count(X_51); | | sql.exportValue(1,"sys.rank2","count","wrd",64,0,6,X_52,""); | | end s1_1; | +-----------------------------------------------------------------------------------------------+ 37 tuples The Dec2011-SP1 release takes even longer, about 18 minutes. The MAL plan is unchanged. The culprit appears to be the ranged lookup because if I replace it with an equi-join on another attribute MonetDB evaluates it in about 100 ms. SELECT count(*) AS count FROM node_copy node1 JOIN rank_copy rank1 ON (node1.id = rank1.node_ref), node_copy node2 JOIN rank_copy rank2 ON (node2.id = rank2.node_ref) WHERE node1.cat_s = true AND node2.pos_vvfin = true AND rank1.pre = rank2.parent I've tried rewriting the query using WITH and had some success. The first query again takes 35 seconds: WITH rank1 AS (SELECT * FROM rank_copy WHERE cat_s = true), rank2 AS (SELECT * FROM rank_copy WHERE pos_vvfin = true) SELECT count(*) FROM rank1, rank2 WHERE rank1.pre < rank2.pre AND rank2.pre < rank1.post The second query also takes 35 seconds when rewritten: WITH rank1 AS (SELECT * FROM rank_copy JOIN node_copy ON (rank_copy.node_ref = node_copy.id) WHERE node_copy.cat_s = true), rank2 AS (SELECT * FROM rank_copy JOIN node_copy ON (rank_copy.node_ref = node_copy.id) WHERE node_copy.pos_vvfin = true) SELECT count(*) FROM rank1, rank2 WHERE rank1.pre < rank2.pre AND rank2.pre < rank1.post MAL plan: +-----------------------------------------------------------------------------------------------+ | mal | +===============================================================================================+ | function user.s1_2{autoCommit=true}():void; | | X_2 := sql.mvc(); | | X_20:bat[:oid,:oid] := sql.bind_idxbat(X_2,"sys","rank_copy","FK_rank_copy_node_ref",0); | | X_13:bat[:oid,:bit] := sql.bind(X_2,"sys","node_copy","pos_vvfin",0); | | X_14 := algebra.uselect(X_13,true:bit); | | X_15 := algebra.markT(X_14,0@0:oid); | | X_16 := bat.reverse(X_15); | | X_9:bat[:oid,:int] := sql.bind(X_2,"sys","node_copy","id",0); | | X_11 := bat.mirror(X_9); | | X_17 := algebra.leftjoin(X_16,X_11); | | X_18 := bat.reverse(X_17); | | X_32:bat[:oid,:bit] := sql.bind(X_2,"sys","node_copy","cat_s",0); | | X_33 := algebra.uselect(X_32,true:bit); | | X_34 := algebra.markT(X_33,0@0:oid); | | X_35 := bat.reverse(X_34); | | X_36 := algebra.leftjoin(X_35,X_11); | | X_37 := bat.reverse(X_36); | | X_22 := algebra.join(X_20,X_18); | | X_23 := algebra.markT(X_22,0@0:oid); | | X_24 := bat.reverse(X_23); | | X_5:bat[:oid,:int] := sql.bind(X_2,"sys","rank_copy","pre",0); | | X_25 := algebra.leftjoin(X_24,X_5); | | X_38 := algebra.join(X_20,X_37); | | X_39 := algebra.markT(X_38,0@0:oid); | | X_40 := bat.reverse(X_39); | | X_43 := algebra.leftjoin(X_40,X_5); | | X_27:bat[:oid,:int] := sql.bind(X_2,"sys","rank_copy","post",0); | | X_41 := algebra.leftjoin(X_40,X_27); | | X_44 := algebra.join(X_25,X_43,X_41,false,false); | | X_46 := algebra.markT(X_44,0@0:oid); | | X_47 := bat.reverse(X_46); | | X_48 := algebra.leftjoin(X_47,X_25); | | X_49 := aggr.count(X_48); | | sql.exportValue(1,"sys.rank2","L1","wrd",64,0,6,X_49,""); | | end s1_2; | +-----------------------------------------------------------------------------------------------+ 35 tuples Is there anything I can do to further speed up the execution of these queries? Thanks, Viktor
participants (1)
-
Viktor Rosenfeld