Re: [Monetdb-developers] MonetDB: default - More advice on the optimizer template.
On Sat, Feb 11, 2012 at 02:06:17PM +0100, Martin Kersten wrote:
On Wed, Feb 08, 2012 at 10:27:11AM +0100, Martin Kersten wrote:
Changeset: 67c12a700166 for MonetDB
URL: http://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=67c12a700166
Modified Files:
    monetdb5/extras/mal_optimizer_template/opt_sql_append.mx
Branch: default
Log Message:
More advice on the optimizer template.
diffs (140 lines):
diff --git a/monetdb5/extras/mal_optimizer_template/opt_sql_append.mx b/monetdb5/extras/mal_optimizer_template/opt_sql_append.mx
--- a/monetdb5/extras/mal_optimizer_template/opt_sql_append.mx
+++ b/monetdb5/extras/mal_optimizer_template/opt_sql_append.mx
[...]
@@ -39,6 +39,8 @@ All Rights Reserved.
  * i.e., an sql.append() statement that is eventually followed by some other
  * statement later on in the MAL program that uses the same v0 BAT as
  * argument as the sql.append() statement does,
+ * Do you assume a single re-use of the variable v0?
No. Why?
Use an assign-once, use-many-times policy. It can improve parallel processing and simplifies scope analysis.
On 2/11/12 11:03 AM, Stefan Manegold wrote:
v0 is (as far as I know) created (assigned) once (by Niels, or preceding optimizers). If it is used only once (only by sql_append), my optimizer does not (have to) do anything. Otherwise, it replaces one use of v0 (by sql_append) with a view of v0. That's the very purpose of this optimizer.
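The decision rule described here (leave the plan alone if v0 is read only by sql.append(), rewrite otherwise) can be sketched as two counting passes. This is a minimal illustration, assuming a hypothetical simplified instruction type in which argv[0..retc-1] are the assigned variables and argv[retc..argc-1] the variables read, loosely mirroring the getArg() layout the template relies on; it is not MonetDB's InstrRecord.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified instruction (not MonetDB's InstrRecord):
 * argv[0..retc-1] are assigned variables, argv[retc..argc-1] are read. */
typedef struct {
    int retc;
    int argc;
    int argv[8];
} Instr;

/* Count how many argument slots read variable v. */
static int count_uses(const Instr *prog, int n, int v)
{
    int uses = 0;
    for (int i = 0; i < n; i++)
        for (int k = prog[i].retc; k < prog[i].argc; k++)
            if (prog[i].argv[k] == v)
                uses++;
    return uses;
}

/* Count how many return slots assign variable v. */
static int count_assignments(const Instr *prog, int n, int v)
{
    int defs = 0;
    for (int i = 0; i < n; i++)
        for (int k = 0; k < prog[i].retc; k++)
            if (prog[i].argv[k] == v)
                defs++;
    return defs;
}

/* Under assign-once/use-many, a view is only needed when v0 is assigned
 * exactly once and is read somewhere besides the sql.append() call. */
static bool needs_view(const Instr *prog, int n, int v0)
{
    return count_assignments(prog, n, v0) == 1 && count_uses(prog, n, v0) > 1;
}
```

If v0 were assigned more than once, the sketch simply declines to rewrite, which matches the preference for false negatives over false positives expressed later in the thread.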
+ * Do you assume a non-nested MAL block ?
Not necessarily.
Analysis may become complex if you have something like
    V0 := expr;
    barrier E1 := expr;
        V0 := expr2;
    exit E1;
Now the value of V0 depends on the runtime control flow.
The same holds for
    barrier E1 := expr;
        V0 := expr;
    exit E1;
    z := f(V0);
which will be flagged as an error because V0 may be uninitialized.
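The "may be uninitialized" case above can be detected with a single pass that tracks barrier nesting. This is a minimal sketch under stated assumptions: a hypothetical statement model (each statement assigns at most one variable, reads up to four, and may open or close a barrier block), and only reads outside any block are checked. The first example (V0 reassigned inside the block) is deliberately not flagged, since V0 always has some value there; only its final value depends on the runtime flow.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical statement model, not MonetDB's InstrRecord. */
typedef enum { STMT_PLAIN, STMT_BARRIER, STMT_EXIT } StmtKind;

typedef struct {
    StmtKind kind;
    int ret;        /* variable assigned, -1 if none */
    int nargs;      /* number of variables read */
    int args[4];
} Stmt;

/* Returns true if `var` is read outside every barrier block while it has
 * only been assigned inside a barrier block (or not at all). */
static bool may_be_uninitialized(const Stmt *prog, int n, int var)
{
    int depth = 0;              /* current barrier nesting level */
    bool assigned_top = false;  /* assigned unconditionally, at depth 0 */

    for (int i = 0; i < n; i++) {
        const Stmt *s = &prog[i];
        if (s->kind == STMT_EXIT) {         /* closes the innermost block */
            if (depth > 0)
                depth--;
            continue;
        }
        /* reads are evaluated before the statement's own assignment */
        if (depth == 0)
            for (int k = 0; k < s->nargs; k++)
                if (s->args[k] == var && !assigned_top)
                    return true;
        if (s->ret == var && depth == 0)
            assigned_top = true;
        if (s->kind == STMT_BARRIER)        /* barrier assigns, then opens */
            depth++;
    }
    return false;
}
```

Numbering the variables V0 = 0, E1 = 1, z = 2, the second MAL fragment is flagged and the first is not.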
I must admit that I do not know how the optimizer framework handles nested MAL blocks, and what an optimizer needs to do to be aware of nested MAL blocks and to handle them correctly.
Preferably, the MAL blocks are linear programs (until you reach the dataflow optimizer).
How do I know / see that in my optimizer? Do I have to check for barrier / exit statements / constructs myself?
In the sample optimizer, for now, I'd be fine if there are no false positives, i.e., cases where the optimizer triggers although it should not, or triggers in cases it cannot handle correctly. I can accept false negatives, i.e., not triggering in all cases it could handle correctly.
  *
  * and transform them into
  *
@@ -52,6 +54,7 @@ All Rights Reserved.
  *
  * i.e., handing a BAT view v2 of BAT v0 as argument to the sql.append()
  * statement, rather than the original BAT v0.
+ * My advice, always use new variable names, it may capture some easy to make errors.
I (i.e., my optimizer) do use new variables for all new statements/results. I (my optimizer) reuse variable names only for identical results.
  *
  * As a refinement, patterns like
  *
[...]
@@ -181,13 +195,17 @@ OPTsql_appendImplementation(Client cntxt
             pushInstruction(mb, q);
             q1 = q;
             i++;
-            actions++;
+            actions++; /* to keep track if anything has been done */
         }
     }
-    /* look for
+    /* look for
      * v5 := ... v0 ...;
      */
+    /* an expensive loop, better would be to remember that v0 has a different role.
+     * A typical method is to keep a map from variable -> instruction where it was
+     * detected. The you can check each assignment for use of v0
+     */
This is general support functionality. Is this already available in the optimizer framework?
I try to use single-pass algorithms in the optimizers. Even in the case of the commonterms optimizer, we may have to traverse the history. This can become an n^2 process.
If so, where is it and how can I use it?
Mimic how it is done in other optimizers (e.g., opt_reorder). Typically, a buffer is maintained per variable to keep optimization properties around.
If not, where/how could we add it?
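The per-variable buffer suggested above can replace the nested j/k scan with one forward pass plus a constant-time lookup. A minimal sketch, assuming a hypothetical simplified instruction type (returns first, then arguments, as with getArg()) and variable numbers smaller than nvars; the function names are illustrative, not part of the optimizer framework.

```c
#include <assert.h>

/* Hypothetical, simplified instruction (not MonetDB's InstrRecord). */
typedef struct {
    int retc;      /* argv[0..retc-1]: assigned variables */
    int argc;      /* argv[retc..argc-1]: variables read */
    int argv[8];
} Instr;

/* Single forward pass filling a per-variable buffer:
 * lastUse[v] = index of the last instruction reading v, or -1 if never read. */
static void compute_last_use(const Instr *prog, int n, int *lastUse, int nvars)
{
    for (int v = 0; v < nvars; v++)
        lastUse[v] = -1;
    for (int i = 0; i < n; i++)
        for (int k = prog[i].retc; k < prog[i].argc; k++)
            lastUse[prog[i].argv[k]] = i;   /* later reads overwrite earlier ones */
}

/* Constant-time replacement for scanning old[i+1 ..] for a read of v. */
static int used_after(const int *lastUse, int v, int i)
{
    return lastUse[v] > i;
}
```

Because the table is computed on the original (old) instructions, it stays valid while the new plan is built; an optimizer that needs the same information about instructions it inserts itself would have to update the buffer as it goes.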
     for (j = i+1; !found && j < limit; j++)
         for (k = old[j]->retc; !found && k < old[j]->argc; k++)
             found = (getArg(old[j], k) == getArg(p, 5));
@@ -202,6 +220,8 @@ OPTsql_appendImplementation(Client cntxt
     /* push new v1 := aggr.count( v0 ); unless already available */
     if (q1 == NULL) {
+        /* use mal_buil.mx primitives q1 = newStmt(mb, aggrRef,countRef); setArgType(mb,q1,TYPE_wrd) */
+        /* it will be added to the block and even my re-use MAL instructions */
Is this (supposed to be) documentation of the existing code below, or rather advice on how to implement the below functionality differently?
Use the mal_builder to simplify your code base.
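The point of a builder primitive like newStmt() is that creating the instruction, giving it a fresh result variable, setting module/function, and appending it to the block happen in one call, so a separate pushInstruction() cannot be forgotten. The sketch below models that idea on a toy block; every name in it (ToyBlock, toy_newStmt, toy_pushArgument) is hypothetical and the fixed-size array is a simplification, so consult mal_builder for the real signatures.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAXSTMT 64

/* Toy model of a MAL block; names are hypothetical, not MonetDB's API. */
typedef struct {
    const char *module;
    const char *function;
    int ret;            /* result variable */
    int argc;           /* number of arguments */
    int args[4];
} ToyInstr;

typedef struct {
    ToyInstr stmt[TOY_MAXSTMT];
    int stop;           /* number of instructions in the block */
    int vtop;           /* next free variable number */
} ToyBlock;

/* Builder-style helper: create the instruction inside the block with a fresh
 * result variable and its module/function set; appending is implicit. */
static ToyInstr *toy_newStmt(ToyBlock *mb, const char *module, const char *function)
{
    if (mb->stop >= TOY_MAXSTMT)
        return NULL;
    ToyInstr *q = &mb->stmt[mb->stop++];
    q->module = module;
    q->function = function;
    q->ret = mb->vtop++;
    q->argc = 0;
    return q;
}

/* Append an argument; returns q so calls can be chained. */
static ToyInstr *toy_pushArgument(ToyInstr *q, int var)
{
    if (q != NULL && q->argc < 4)
        q->args[q->argc++] = var;
    return q;
}
```

With variable 0 standing for v0, `toy_pushArgument(toy_newStmt(&mb, "aggr", "count"), 0)` yields v1 := aggr.count(v0) already placed in the block, replacing the four manual steps of the hand-written code.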
         q1 = newInstruction(mb,ASSIGNsymbol);
         getArg(q1,0) = newTmpVariable(mb, TYPE_wrd);
         setModuleId(q1, aggrRef);
@@ -211,6 +231,7 @@ OPTsql_appendImplementation(Client cntxt
     }
     /* push new v2 := algebra.slice( v0, 0, v1 ); */
+    /* use mal_buil.mx primitives q1 = newStmt(mb, algebraRef,sliceRef); */
Is this (supposed to be) documentation of the existing code below, or rather advice on how to implement the below functionality differently?
     q2 = newInstruction(mb,ASSIGNsymbol);
     getArg(q2,0) = newTmpVariable(mb, TYPE_any);
     setModuleId(q2, algebraRef);
@@ -240,6 +261,7 @@ OPTsql_appendImplementation(Client cntxt
     for(i++; i
+/* optimizers have to be registered in the optcatalog in opt_support.c.
Why?
SQL needs a place to pick up all known optimizers. You may also have to extend the optimizer pipeline validity code.
If at all possible, I'd prefer to be able to add a new optimizer without the need to change existing code ...
Yes, understood, but you have to patch Makefile.ag, youroptimizer.mx, and opt_support. You may also have to extend opt_prelude.
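Registering an optimizer in a catalog and validating a pipeline against it can be pictured as a name-to-function table plus a lookup. This is only a sketch of the idea: the table layout, the placeholder function, and the entries below are assumptions for illustration, and the real optcatalog in opt_support.c and the pipeline validity code may be organised differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*OptimizerFn)(void *mb);   /* returns the number of actions taken */

/* Placeholder optimizer body used for every entry in this sketch. */
static int opt_placeholder(void *mb)
{
    (void)mb;
    return 0;
}

/* Hypothetical catalog of known optimizers; adding one means adding a row. */
static const struct {
    const char *name;
    OptimizerFn fn;
} optcatalog[] = {
    {"inline", opt_placeholder},
    {"remap", opt_placeholder},
    {"sql_append", opt_placeholder},   /* the new template optimizer */
    {NULL, NULL}                       /* end marker */
};

static OptimizerFn lookup_optimizer(const char *name)
{
    for (size_t i = 0; optcatalog[i].name != NULL; i++)
        if (strcmp(optcatalog[i].name, name) == 0)
            return optcatalog[i].fn;
    return NULL;
}

/* Pipeline validity check: every step must name a registered optimizer. */
static int pipeline_is_valid(const char *const *steps, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (lookup_optimizer(steps[i]) == NULL)
            return 0;
    return 1;
}
```

A central table like this is why a new optimizer cannot be added without touching shared code: the lookup (and any pipeline check built on it) only knows the names listed in the catalog.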
+ * you have to path the file accordingly.
"path"
^^^^ parse?
What does this mean? What am I supposed to do in detail?
+ */
 @include ../../optimizer/optimizerWrapper.mx
 @c
 #include "opt_statistics.h"
_______________________________________________
Checkin-list mailing list
Checkin-list@monetdb.org
http://mail.monetdb.org/mailman/listinfo/checkin-list
Thanks!
Stefan
--
| Stefan.Manegold @ CWI.nl | DB Architectures (INS1) |
| http://CWI.nl/~manegold/ | Science Park 123 (L321) |
| Tel.: +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
On Sat, Feb 11, 2012 at 02:06:17PM +0100, Martin Kersten wrote:
On Wed, Feb 08, 2012 at 10:27:11AM +0100, Martin Kersten wrote:
Changeset: 67c12a700166 for MonetDB
URL: http://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=67c12a700166
Modified Files:
    monetdb5/extras/mal_optimizer_template/opt_sql_append.mx
Branch: default
Log Message:
More advice on the optimizer template.
diffs (140 lines):
diff --git a/monetdb5/extras/mal_optimizer_template/opt_sql_append.mx b/monetdb5/extras/mal_optimizer_template/opt_sql_append.mx
--- a/monetdb5/extras/mal_optimizer_template/opt_sql_append.mx
+++ b/monetdb5/extras/mal_optimizer_template/opt_sql_append.mx
[...]
@@ -39,6 +39,8 @@ All Rights Reserved.
  * i.e., an sql.append() statement that is eventually followed by some other
  * statement later on in the MAL program that uses the same v0 BAT as
  * argument as the sql.append() statement does,
+ * Do you assume a single re-use of the variable v0?
No. Why?
Use an assign-once, use-many-times policy. It can improve parallel processing and simplifies scope analysis.
On 2/11/12 11:03 AM, Stefan Manegold wrote:
v0 is (as far as I know) created (assigned) once (by Niels, or preceding optimizers).
My comments are mostly advisory for optimizers in general ;)
On 2/11/12 3:11 PM, Stefan Manegold wrote:
true, on purpose
If it is used only once (only by sql_append), my optimizer does not (have to) do anything. Otherwise, it replaces one use of v0 (by sql_append) with a view of v0. That's the very purpose of this optimizer.
+ * Do you assume a non-nested MAL block ?
Not necessarily.
Analysis may become complex if you have something like
    V0 := expr;
    barrier E1 := expr;
        V0 := expr2;
    exit E1;
Now the value of V0 depends on the runtime control flow.
The same holds for
    barrier E1 := expr;
        V0 := expr;
    exit E1;
    z := f(V0);
which will be flagged as an error because V0 may be uninitialized.
I must admit that I do not know how the optimizer framework handles nested MAL blocks, and what an optimizer needs to do to be aware of nested MAL blocks and to handle them correctly.
Preferably, the MAL blocks are linear programs (until you reach the dataflow optimizer).
How do I know / see that in my optimizer?
While looping through the plan, you check whether p->barrier is set. You can always safely exit an optimizer.
Do I have to check for barrier / exit statements / constructs myself?
In principle, yes. Optimizers in the pipeline preceding yours could introduce them.
In the sample optimizer, for now, I'd be fine if there are no false positives, i.e., cases where the optimizer triggers although it should not, or triggers in cases it cannot handle correctly. I can accept false negatives, i.e., not triggering in all cases it could handle correctly.
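Combining the two answers above (check the barrier field while looping, and you can always safely exit) gives a conservative skeleton that has no false positives on nested plans: it bails out as soon as it sees any control-flow statement. A minimal sketch, assuming a hypothetical instruction type whose barrier field is nonzero for barrier/exit-style statements and a made-up `matches` flag standing in for the real pattern test; the CF_* constants are invented for illustration.

```c
#include <assert.h>

/* Hypothetical control-flow markers; the constant names are made up. */
enum { CF_NONE = 0, CF_BARRIER, CF_CATCH, CF_EXIT, CF_REDO, CF_LEAVE };

typedef struct {
    int barrier;    /* CF_NONE for ordinary statements */
    int matches;    /* stand-in for "this instruction matches the pattern" */
} Instr;

/* Conservative optimizer skeleton: if the plan contains any control-flow
 * statement, leave it untouched and report 0 actions. Otherwise "rewrite"
 * every matching instruction and return how many were changed. */
static int optimize_linear_only(Instr *prog, int n)
{
    for (int i = 0; i < n; i++)
        if (prog[i].barrier != CF_NONE)
            return 0;   /* bail out: possibly a missed opportunity, never a wrong rewrite */

    int actions = 0;
    for (int i = 0; i < n; i++)
        if (prog[i].matches) {
            prog[i].matches = 0;    /* placeholder for the actual rewrite */
            actions++;
        }
    return actions;
}
```

Scanning for control flow before changing anything keeps the plan unmodified on bail-out, so the optimizer never leaves a half-rewritten block behind.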
  *
  * and transform them into
  *
@@ -52,6 +54,7 @@ All Rights Reserved.
  *
  * i.e., handing a BAT view v2 of BAT v0 as argument to the sql.append()
  * statement, rather than the original BAT v0.
+ * My advice, always use new variable names, it may capture some easy to make errors.
I (i.e., my optimizer) do use new variables for all new statements/results. I (my optimizer) reuse variable names only for identical results.
  *
  * As a refinement, patterns like
  *
[...]
@@ -181,13 +195,17 @@ OPTsql_appendImplementation(Client cntxt
             pushInstruction(mb, q);
             q1 = q;
             i++;
-            actions++;
+            actions++; /* to keep track if anything has been done */
         }
     }
-    /* look for
+    /* look for
      * v5 := ... v0 ...;
      */
+    /* an expensive loop, better would be to remember that v0 has a different role.
+     * A typical method is to keep a map from variable -> instruction where it was
+     * detected. The you can check each assignment for use of v0
+     */
This is general support functionality. Is this already available in the optimizer framework?
I try to use single-pass algorithms in the optimizers. Even in the case of the commonterms optimizer, we may have to traverse the history. This can become an n^2 process.
If so, where is it and how can I use it?
Mimic how it is done in other optimizers (e.g., opt_reorder). Typically, a buffer is maintained per variable to keep optimization properties around.
If not, where/how could we add it?
     for (j = i+1; !found && j < limit; j++)
         for (k = old[j]->retc; !found && k < old[j]->argc; k++)
             found = (getArg(old[j], k) == getArg(p, 5));
@@ -202,6 +220,8 @@ OPTsql_appendImplementation(Client cntxt
     /* push new v1 := aggr.count( v0 ); unless already available */
     if (q1 == NULL) {
+        /* use mal_buil.mx primitives q1 = newStmt(mb, aggrRef,countRef); setArgType(mb,q1,TYPE_wrd) */
+        /* it will be added to the block and even my re-use MAL instructions */
Is this (supposed to be) documentation of the existing code below, or rather advice on how to implement the below functionality differently?
Use the mal_builder to simplify your code base.
         q1 = newInstruction(mb,ASSIGNsymbol);
         getArg(q1,0) = newTmpVariable(mb, TYPE_wrd);
         setModuleId(q1, aggrRef);
@@ -211,6 +231,7 @@ OPTsql_appendImplementation(Client cntxt
     }
     /* push new v2 := algebra.slice( v0, 0, v1 ); */
+    /* use mal_buil.mx primitives q1 = newStmt(mb, algebraRef,sliceRef); */
Is this (supposed to be) documentation of the existing code below, or rather advice on how to implement the below functionality differently?
     q2 = newInstruction(mb,ASSIGNsymbol);
     getArg(q2,0) = newTmpVariable(mb, TYPE_any);
     setModuleId(q2, algebraRef);
@@ -240,6 +261,7 @@ OPTsql_appendImplementation(Client cntxt
     for(i++; i
+/* optimizers have to be registered in the optcatalog in opt_support.c.
Why?
SQL needs a place to pick up all known optimizers. You may also have to extend the optimizer pipeline validity code.
If at all possible, I'd prefer to be able to add a new optimizer without the need to change existing code ...
Yes, understood, but you have to patch Makefile.ag, youroptimizer.mx, and opt_support. You may also have to extend opt_prelude.
+ * you have to path the file accordingly.
"path"
^^^^ parse?
What does this mean? What am I supposed to do in detail?
+ */
 @include ../../optimizer/optimizerWrapper.mx
 @c
 #include "opt_statistics.h"
Thanks!
Stefan
Participants (2): Martin Kersten, Stefan Manegold