Costly bat.new/bat.append not optimized out
Hi,
Unless I'm missing something, I do not understand why this apparent
pattern is not easily optimized out:
X_3:bat[:oid,:lng] =
Hi Frederik, Unless I misunderstand the problem, I think you can't just use X_3 because that is the bat that corresponds to the physical column of your table a. So you can't take it and append b to it. Union creates a copy of a, then appends b, then c. Roberto On 22 August 2014 17:42, Frédéric Jolliton < frederic.jolliton+monetdb@securactive.net> wrote:
Hi,
Unless I'm missing something, I do not understand why this apparent pattern is not easily optimized out:
X_3:bat[:oid,:lng] =
[0] := bat.new(nil:oid,nil:lng); .. # X_3 is never used before this line. X_19= [23068647] := <some computation> X_20= [23068647] := bat.append(X_3= :bat[:oid,:lng][23068647], X_19=
[23068647],true); # X_3 is never used after this line. An empty BAT is created for the sole purpose of appending a BAT with millions of rows, discarding the original one.. instead of just using it!
This account for up to 30% of the time spent for some SQL queries! Like this one:
SELECT name, avg(value) FROM some_view GROUP BY name LIMIT 10;
I tried to manually remove these extra operation, and I got the exact same result, in less time. (I took the output of EXPLAIN, removed the lines in question, and added a bit of type information for the function to compile, before playing it in mclient -l mal.)
Regarding the setup, for our tests, we have 3 tables (say, a, b and c) of 11M rows each, then a view declared as:
CREATE VIEW some_view AS (SELECT * FROM a UNION ALL SELECT * FROM b UNION ALL SELECT * FROM c);
I've tested it with v11.17.21 and v11.19.0, with the "default" optimizer.
My question: is this extra operation required, or could it be optimized in MonetDB?
-- Frédéric Jolliton SecurActive _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Hi Roberto, I wrote:
X_3:bat[:oid,:lng] =
[0] := bat.new(nil:oid,nil:lng); .. # X_3 is never used before this line. X_19= [23068647] := <some computation> X_20= [23068647] := bat.append(X_3= :bat[:oid,:lng][23068647], X_19= [23068647],true); # X_3 is never used after this line.
Roberto Cornacchia
Unless I misunderstand the problem, I think you can't just use X_3 because that is the bat that corresponds to the physical column of your table a. So you can't take it and append b to it. Union creates a copy of a, then appends b, then c.
Thanks for your reply. No, here X_3 is a new BAT (it is the result of bat.new). The BAT from a physical column come from call to sql.bind. But I'm new to MonetDB, so I might be missing some details too. The value between brackets ([]) indicate the number of rows after the call, and we can see that at first X_3 is empty (as expected), that X_19 contains a lot of rows (23M). And then this one is concatened to the empty one. As a result of bat.append, we get the same BAT referenced by X_3 (we see its internal name between <> as temp_321264), not some another BAT created by the call itself. Moreover, what I didn't specify, but was implied in my original post, is that X_19 is no longer used after the call to bat.append. We see that X_3 become of copy of X_19, and it replaces it entirely. This is damageable for the performance. -- Frédéric Jolliton Sécuractive
Sorry, I mixed up the names. What i meant is that X_3 seems the result container created by the union, and X_19 seems the column of a (or a view a view of it). If X_19 were really the column of a, then you could not reuse it. Most probably though X_19 is the result of the original column and the delta bats (updates,deletions). In that case i guess that yes, it could be reused as you say, but the pattern is not detected by any optimizer. I'd advise you to post a bit more complete MAL subplan, then it will be easier for the developers to help you. Roberto On 22 Aug 2014 21:18, "Frédéric Jolliton" < frederic.jolliton+monetdb@securactive.net> wrote:
Hi Roberto,
I wrote:
X_3:bat[:oid,:lng] =
[0] := bat.new(nil:oid,nil:lng); .. # X_3 is never used before this line. X_19= [23068647] := <some computation> X_20= [23068647] := bat.append(X_3=
:bat[:oid,:lng][23068647], X_19=
[23068647],true); # X_3 is never used after this line.
Roberto Cornacchia
writes: Unless I misunderstand the problem, I think you can't just use X_3 because that is the bat that corresponds to the physical column of your table a. So you can't take it and append b to it. Union creates a copy of a, then appends b, then c.
Thanks for your reply.
No, here X_3 is a new BAT (it is the result of bat.new).
The BAT from a physical column come from call to sql.bind.
But I'm new to MonetDB, so I might be missing some details too.
The value between brackets ([]) indicate the number of rows after the call, and we can see that at first X_3 is empty (as expected), that X_19 contains a lot of rows (23M). And then this one is concatened to the empty one. As a result of bat.append, we get the same BAT referenced by X_3 (we see its internal name between <> as temp_321264), not some another BAT created by the call itself.
Moreover, what I didn't specify, but was implied in my original post, is that X_19 is no longer used after the call to bat.append. We see that X_3 become of copy of X_19, and it replaces it entirely. This is damageable for the performance.
-- Frédéric Jolliton Sécuractive _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Roberto Cornacchia
Sorry, I mixed up the names. What i meant is that X_3 seems the result container created by the union, and X_19 seems the column of a (or a view a view of it). If X_19 were really the column of a, then you could not reuse it. Most probably though X_19 is the result of the original column and the delta bats (updates,deletions). In that case i guess that yes, it could be reused as you say, but the pattern is not detected by any optimizer. I'd advise you to post a bit more complete MAL subplan, then it will be easier for the developers to help you.
Ok sorry, I don't know people here.
Here is a detailed scenario:
1. I create three tables "a", "b" and "c", each containing 11M
rows. (I've not tried to find the best set / best configuration that
emphasize this problem.)
sql>\d a
CREATE TABLE "sys"."a" (
"name" CHARACTER LARGE OBJECT,
"value" BIGINT
);
sql>select count(*) from a;
+----------+
| L1 |
+==========+
| 11217990 |
+----------+
1 tuple (0.553ms)
Same for "b" and "c" (same content actually.)
2. I create a view that concatenate all the tables.
CREATE VIEW my_view AS (SELECT * FROM a UNION ALL SELECT * FROM b UNION ALL SELECT * FROM c);
3. I perform the following query. See the output below.
TRACE SELECT name, avg(value) FROM my_view GROUP BY name;
Note: there is only one name actually in the table, so the GROUP BY
gives only one resulting row.
4. From a tool I made, I also check the resulting graph to figure where
the time is spent. The red box near the middle shows the operation
that is the most onerous (yellow = almost no time, red = highest
cost, blue are intermediate results). A lot easier to read that the
raw trace output (assuming my tool is not broken.)
See http://tuxee.net/monetdb-prof.svg
Here is the output of EXPLAIN (easier to read), before showing TRACE:
sql>\f raw
sql>explain SELECT name, avg(value) FROM my_view GROUP BY name;
% .explain # table_name
% mal # name
% clob # type
% 95 # length
function user.s2_2{autoCommit=true}():void;
X_3:bat[:oid,:str] := bat.new(nil:oid,nil:str);
X_6:bat[:oid,:str] := bat.new(nil:oid,nil:str);
X_2 := sql.mvc();
X_7:bat[:oid,:oid] := sql.tid(X_2,"sys","a");
X_10 := sql.bind(X_2,"sys","a","name",0);
(X_13,r1_13) := sql.bind(X_2,"sys","a","name",2);
X_16 := sql.bind(X_2,"sys","a","name",1);
X_18 := sql.delta(X_10,X_13,r1_13,X_16);
X_19 := algebra.leftfetchjoin(X_7,X_18);
X_20 := bat.append(X_6,X_19,true);
X_21:bat[:oid,:oid] := sql.tid(X_2,"sys","b");
X_23 := sql.bind(X_2,"sys","b","name",0);
(X_24,r1_24) := sql.bind(X_2,"sys","b","name",2);
X_26 := sql.bind(X_2,"sys","b","name",1);
X_27 := sql.delta(X_23,X_24,r1_24,X_26);
X_28 := algebra.leftfetchjoin(X_21,X_27);
X_29 := bat.append(X_20,X_28,true);
X_30 := bat.append(X_3,X_29,true);
X_31:bat[:oid,:oid] := sql.tid(X_2,"sys","c");
X_33 := sql.bind(X_2,"sys","c","name",0);
(X_34,r1_34) := sql.bind(X_2,"sys","c","name",2);
X_36 := sql.bind(X_2,"sys","c","name",1);
X_37 := sql.delta(X_33,X_34,r1_34,X_36);
X_38 := algebra.leftfetchjoin(X_31,X_37);
X_39 := bat.append(X_30,X_38,true);
(X_41,r1_41,r2_41) := group.subgroupdone(X_39);
X_44 := algebra.leftfetchjoin(r1_41,X_39);
X_45:bat[:oid,:lng] := bat.new(nil:oid,nil:lng);
X_48:bat[:oid,:lng] := bat.new(nil:oid,nil:lng);
X_49 := sql.bind(X_2,"sys","a","value",0);
(X_54,r1_54) := sql.bind(X_2,"sys","a","value",2);
X_57 := sql.bind(X_2,"sys","a","value",1);
X_59 := sql.delta(X_49,X_54,r1_54,X_57);
X_60 := algebra.leftfetchjoin(X_7,X_59);
X_61 := bat.append(X_48,X_60,true);
X_62 := sql.bind(X_2,"sys","b","value",0);
(X_64,r1_64) := sql.bind(X_2,"sys","b","value",2);
X_66 := sql.bind(X_2,"sys","b","value",1);
X_67 := sql.delta(X_62,X_64,r1_64,X_66);
X_68 := algebra.leftfetchjoin(X_21,X_67);
X_69 := bat.append(X_61,X_68,true);
X_70 := bat.append(X_45,X_69,true);
X_71 := sql.bind(X_2,"sys","c","value",0);
(X_73,r1_73) := sql.bind(X_2,"sys","c","value",2);
X_75 := sql.bind(X_2,"sys","c","value",1);
X_76 := sql.delta(X_71,X_73,r1_73,X_75);
X_77 := algebra.leftfetchjoin(X_31,X_76);
X_78 := bat.append(X_70,X_77,true);
X_80 := batcalc.dbl(X_78);
X_81:bat[:oid,:dbl] := aggr.subavg(X_80,X_41,r1_41,true,true);
X_82 := sql.resultSet(2,1,X_44);
sql.rsColumn(X_82,".my_view","name","clob",0,0,X_44);
sql.rsColumn(X_82,".L1","L1","double",53,0,X_81);
X_93 := io.stdout();
sql.exportResult(X_93,X_82);
end s2_2;
# querylog.define("explain select name, avg(value) from my_view group by name;","default_pipe")
And the output of TRACE:
sql>\f tab
sql>TRACE SELECT name, avg(value) FROM my_view GROUP BY name;
foobar 509350.54538192693
30 X_3:bat[:oid,:str] =
Fredrik, I think that you are correct, the pattern can be optimized away. Actually many of the queries I usually deal with would benefit from it. Sometime certain optimizations are not used or limited on purpose, because of performance reasons (the time spent in optimization steals from query time). I don't think this would be particularly costly, but it's up to the developers to decide about it (I'm not one of them). Best, Roberto Ps. It's actually not very difficult to write it yourself if you feel like it. There is a template in the code. On 22 Aug 2014 22:27, "Frédéric Jolliton" < frederic.jolliton+monetdb@securactive.net> wrote:
Roberto Cornacchia
writes: Sorry, I mixed up the names. What i meant is that X_3 seems the result container created by the union, and X_19 seems the column of a (or a view a view of it). If X_19 were really the column of a, then you could not reuse it. Most probably though X_19 is the result of the original column and the delta bats (updates,deletions). In that case i guess that yes, it could be reused as you say, but the pattern is not detected by any optimizer. I'd advise you to post a bit more complete MAL subplan, then it will be easier for the developers to help you.
Ok sorry, I don't know people here.
Here is a detailed scenario:
1. I create three tables "a", "b" and "c", each containing 11M rows. (I've not tried to find the best set / best configuration that emphasize this problem.)
sql>\d a CREATE TABLE "sys"."a" ( "name" CHARACTER LARGE OBJECT, "value" BIGINT ); sql>select count(*) from a; +----------+ | L1 | +==========+ | 11217990 | +----------+ 1 tuple (0.553ms)
Same for "b" and "c" (same content actually.)
2. I create a view that concatenate all the tables.
CREATE VIEW my_view AS (SELECT * FROM a UNION ALL SELECT * FROM b UNION ALL SELECT * FROM c);
3. I perform the following query. See the output below.
TRACE SELECT name, avg(value) FROM my_view GROUP BY name;
Note: there is only one name actually in the table, so the GROUP BY gives only one resulting row.
4. From a tool I made, I also check the resulting graph to figure where the time is spent. The red box near the middle shows the operation that is the most onerous (yellow = almost no time, red = highest cost, blue are intermediate results). A lot easier to read that the raw trace output (assuming my tool is not broken.)
See http://tuxee.net/monetdb-prof.svg
Here is the output of EXPLAIN (easier to read), before showing TRACE:
sql>\f raw sql>explain SELECT name, avg(value) FROM my_view GROUP BY name; % .explain # table_name % mal # name % clob # type % 95 # length function user.s2_2{autoCommit=true}():void; X_3:bat[:oid,:str] := bat.new(nil:oid,nil:str); X_6:bat[:oid,:str] := bat.new(nil:oid,nil:str); X_2 := sql.mvc(); X_7:bat[:oid,:oid] := sql.tid(X_2,"sys","a"); X_10 := sql.bind(X_2,"sys","a","name",0); (X_13,r1_13) := sql.bind(X_2,"sys","a","name",2); X_16 := sql.bind(X_2,"sys","a","name",1); X_18 := sql.delta(X_10,X_13,r1_13,X_16); X_19 := algebra.leftfetchjoin(X_7,X_18); X_20 := bat.append(X_6,X_19,true); X_21:bat[:oid,:oid] := sql.tid(X_2,"sys","b"); X_23 := sql.bind(X_2,"sys","b","name",0); (X_24,r1_24) := sql.bind(X_2,"sys","b","name",2); X_26 := sql.bind(X_2,"sys","b","name",1); X_27 := sql.delta(X_23,X_24,r1_24,X_26); X_28 := algebra.leftfetchjoin(X_21,X_27); X_29 := bat.append(X_20,X_28,true); X_30 := bat.append(X_3,X_29,true); X_31:bat[:oid,:oid] := sql.tid(X_2,"sys","c"); X_33 := sql.bind(X_2,"sys","c","name",0); (X_34,r1_34) := sql.bind(X_2,"sys","c","name",2); X_36 := sql.bind(X_2,"sys","c","name",1); X_37 := sql.delta(X_33,X_34,r1_34,X_36); X_38 := algebra.leftfetchjoin(X_31,X_37); X_39 := bat.append(X_30,X_38,true); (X_41,r1_41,r2_41) := group.subgroupdone(X_39); X_44 := algebra.leftfetchjoin(r1_41,X_39); X_45:bat[:oid,:lng] := bat.new(nil:oid,nil:lng); X_48:bat[:oid,:lng] := bat.new(nil:oid,nil:lng); X_49 := sql.bind(X_2,"sys","a","value",0); (X_54,r1_54) := sql.bind(X_2,"sys","a","value",2); X_57 := sql.bind(X_2,"sys","a","value",1); X_59 := sql.delta(X_49,X_54,r1_54,X_57); X_60 := algebra.leftfetchjoin(X_7,X_59); X_61 := bat.append(X_48,X_60,true); X_62 := sql.bind(X_2,"sys","b","value",0); (X_64,r1_64) := sql.bind(X_2,"sys","b","value",2); X_66 := sql.bind(X_2,"sys","b","value",1); X_67 := sql.delta(X_62,X_64,r1_64,X_66); X_68 := algebra.leftfetchjoin(X_21,X_67); X_69 := bat.append(X_61,X_68,true); X_70 := bat.append(X_45,X_69,true); X_71 := sql.bind(X_2,"sys","c","value",0); (X_73,r1_73) := sql.bind(X_2,"sys","c","value",2); X_75 := sql.bind(X_2,"sys","c","value",1); X_76 := sql.delta(X_71,X_73,r1_73,X_75); X_77 := algebra.leftfetchjoin(X_31,X_76); X_78 := bat.append(X_70,X_77,true); X_80 := batcalc.dbl(X_78); X_81:bat[:oid,:dbl] := aggr.subavg(X_80,X_41,r1_41,true,true); X_82 := sql.resultSet(2,1,X_44);
sql.rsColumn(X_82,".my_view","name","clob",0,0,X_44);
sql.rsColumn(X_82,".L1","L1","double",53,0,X_81); X_93 := io.stdout(); sql.exportResult(X_93,X_82); end s2_2; # querylog.define("explain select name, avg(value) from my_view group by name;","default_pipe")
And the output of TRACE:
sql>\f tab sql>TRACE SELECT name, avg(value) FROM my_view GROUP BY name; foobar 509350.54538192693 30 X_3:bat[:oid,:str] =
[0] := bat.new(nil:oid,nil:str); 13 X_6:bat[:oid,:str] = [0] := bat.new(nil:oid,nil:str); 4 X_2 := sql.mvc(); 15 "X_7:bat[:oid,:oid] = [11217990] := sql.tid(X_2=0,""sys"",""a"");" 9 "X_10= [11217990] := sql.bind(X_2=0,""sys"",""a"",""name"",0);" 11 "(X_13= [0],r1_13= [0]) := sql.bind(X_2=0,""sys"",""a"",""name"",2);" 6 "X_16= [0] := sql.bind(X_2=0,""sys"",""a"",""name"",1);" 4 X_18= [11217990] := sql.delta(X_10= [11217990],X_13= [0],r1_13= [0],X_16= [0]); 21 X_19= [11217990] := algebra.leftfetchjoin(X_7= :bat[:oid,:oid][11217990],X_18= [11217990]); 2427642 X_20= [11217990] := bat.append(X_6= :bat[:oid,:str][11217990],X_19= [11217990],true); 27 "X_21:bat[:oid,:oid] = [11217990] := sql.tid(X_2=0,""sys"",""b"");" 10 "X_23= [11217990] := sql.bind(X_2=0,""sys"",""b"",""name"",0);" 9 "(X_24= [0],r1_24= [0]) := sql.bind(X_2=0,""sys"",""b"",""name"",2);" 6 "X_26= [0] := sql.bind(X_2=0,""sys"",""b"",""name"",1);" 3 X_27= [11217990] := sql.delta(X_23= [11217990],X_24= [0],r1_24= [0],X_26= [0]); 22 X_28= [11217990] := algebra.leftfetchjoin(X_21= :bat[:oid,:oid][11217990],X_27= [11217990]); 2512634 X_29= [22435980] := bat.append(X_20= [22435980],X_28= [11217990],true); 4840265 X_30= [22435980] := bat.append(X_3= :bat[:oid,:str][22435980],X_29= [22435980],true); 30 "X_31:bat[:oid,:oid] = [11217990] := sql.tid(X_2=0,""sys"",""c"");" 12 "X_33= [11217990] := sql.bind(X_2=0,""sys"",""c"",""name"",0);" 10 "(X_34= [0],r1_34= [0]) := sql.bind(X_2=0,""sys"",""c"",""name"",2);" 5 "X_36= [0] := sql.bind(X_2=0,""sys"",""c"",""name"",1);" 3 X_37= [11217990] := sql.delta(X_33= [11217990],X_34= [0],r1_34= [0],X_36= [0]); 20 X_38= [11217990] := algebra.leftfetchjoin(X_31= :bat[:oid,:oid][11217990],X_37= [11217990]); 2604326 X_39= [33653970] := bat.append(X_30= [33653970],X_38= [11217990],true); 533287 (X_41= [33653970],r1_41= [1],r2_41= [1]) := group.subgroupdone(X_39= [33653970]); 30 X_44= [1] := algebra.leftfetchjoin(r1_41= [1],X_39= [33653970]); 13 X_45:bat[:oid,:lng] = [0] := bat.new(nil:oid,nil:lng); 8 X_48:bat[:oid,:lng] = [0] := bat.new(nil:oid,nil:lng); 18 "X_49= [11217990] := sql.bind(X_2=0,""sys"",""a"",""value"",0);" 9 "(X_54= [0],r1_54= [0]) := sql.bind(X_2=0,""sys"",""a"",""value"",2);" 5 "X_57= [0] := sql.bind(X_2=0,""sys"",""a"",""value"",1);" 3 X_59= [11217990] := sql.delta(X_49= [11217990],X_54= [0],r1_54= [0],X_57= [0]); 14 X_60= [11217990] := algebra.leftfetchjoin(X_7= :bat[:oid,:oid][11217990],X_59= [11217990]); 145390 X_61= [11217990] := bat.append(X_48= :bat[:oid,:lng][11217990],X_60= [11217990],true); 18 "X_62= [11217990] := sql.bind(X_2=0,""sys"",""b"",""value"",0);" 10 "(X_64= [0],r1_64= [0]) := sql.bind(X_2=0,""sys"",""b"",""value"",2);" 4 "X_66= [0] := sql.bind(X_2=0,""sys"",""b"",""value"",1);" 4 X_67= [11217990] := sql.delta(X_62= [11217990],X_64= [0],r1_64= [0],X_66= [0]); 23 X_68= [11217990] := algebra.leftfetchjoin(X_21= :bat[:oid,:oid][11217990],X_67= [11217990]); 1092752 X_69= [22435980] := bat.append(X_61= [22435980],X_68= [11217990],true); 287822 X_70= [22435980] := bat.append(X_45= :bat[:oid,:lng][22435980],X_69= [22435980],true); 29 "X_71= [11217990] := sql.bind(X_2=0,""sys"",""c"",""value"",0);" 11 "(X_73= [0],r1_73= [0]) := sql.bind(X_2=0,""sys"",""c"",""value"",2);" 6 "X_75= [0] := sql.bind(X_2=0,""sys"",""c"",""value"",1);" 3 X_76= [11217990] := sql.delta(X_71= [11217990],X_73= [0],r1_73= [0],X_75= [0]); 26 X_77= [11217990] := algebra.leftfetchjoin(X_31= :bat[:oid,:oid][11217990],X_76= [11217990]); 1512937 X_78= [33653970] := bat.append(X_70= [33653970],X_77= [11217990],true); 545408 X_80= [33653970] := batcalc.dbl(X_78= [33653970]); 1199559 X_81:bat[:oid,:dbl] = [1] := aggr.subavg(X_80= [33653970],X_41= [33653970],r1_41= [1],true,true); 18 X_82 := sql.resultSet(2,1,X_44= [1]); 10 "sql.rsColumn(X_82=9,"".my_view"",""name"",""clob"",0,0,X_44= [1]);" 4 "sql.rsColumn(X_82=9,"".L1"",""L1"",""double"",53,0,X_81= :bat[:oid,:dbl][1]);" 1 X_93 := io.stdout(); 63 "sql.exportResult(X_93==""104d2"":streams,X_82=9);" 1 end s3_2; 17943579 function user.s3_2(); 8975401 X_5:void := user.s3_2(); -- Frédéric Jolliton Sécuractive _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Roberto Cornacchia
writes: [..] Ps. It's actually not very difficult to write it yourself if you feel like it. There is a template in the code.
Interesting. I think we will go for this.
We're working on it. Fortunately, it seems rather easy to plug a custom optimizer, which is rather nice. To give a recap, we would like to optimize away the following case: big := <big BAT> a := bat.new(..) a := bat.append(a, big) .. from this point use `a', and no longer use `big' .. However, it is not as trivial as we thought, because to eliminate useless (in appearence) bat.new/bat.append pair, we need to know that the BAT that is processed and for which we want to avoid a costly copy is not used elsewhere. So we decided to whitelist some MAL functions which are known as producing a new BAT rather than returning one of their arguments. This is not nice obviously, and we might make mistake when deciding if such functions are really functional or not (ie mutating/returning one of the input BAT.) We need to review the code of each function to ensure that. A better solution, but much more complicated for us, and that would require major changes would be to postpone the operation until really needed by using a copy-on-write mechanism. That is, in the example above, when the bat.append function is called, it see that it try to append to an empty BAT, in which case it would return a BAT *proxy* that could act as a regular BAT. Then whenever an attempt would be made to change either of these BAT (`a` or `big`), a copy would be triggered. This way the copy would occur only when needed. Simpler to say than to implement I know. (Especially when not knowing all the intricacies of the internal processing of BAT.) Likewise, we see another optimization opportunity with a query like this: SELECT our_computation(some_col) FROM some_table LIMIT 10; which produce the following MAL code: t := ... u := udf.complex_computation(t) # process 10M rows v := algebra.subslice(u, 0, 9) w := algebra.leftfetchjoin(v, u) # get only the ten first rows! This code is not optimal because we could have processed 10 rows instead of millions if our custom operator was run after the slice. But to figure that, we would need a way to put attribute on functions to tell if the order of the rows or the number of rows matter for the operation. Knowing that the code could be rewritten as: t := ... v := algebra.subslice(t, 0, 9) w := algebra.leftfetchjoin(v, t) # get the ten first rows u := udf.complex_computation(w) # process 10 rows only As a workaround we tried to write the query as: SELECT our_computation(some_col) FROM (SELECT some_col FROM some_table LIMIT 10) as q; But it seems that the LIMIT is not supported in subselect. We could also write an optimizer for that (and we will do I think), again by whitelisting functions. But overall, what is missing is a way to flag function to let the optimizer know more about them. -- Frédéric Jolliton SecurActive
participants (2)
-
Frédéric Jolliton
-
Roberto Cornacchia