Cost models form the basis for many optimization decisions. The cost parameters are typically the size of the (intermediate) results and response time. Alternatively, they are running aggregates, e.g. max memory and total execution time, obtained from a simulated run. The current implementation contains a framework and an example for building your own cost-based optimized.
The optimizer.costModel()
works its way through a MAL program in
search for relational operators and estimates their result size. The
estimated size is left behind as the property rows.
r{rows=100} := bat.new(:oid,:int);
s{rows=1000}:= bat.new(:oid,:int);
rs:= algebra.select(s,1,1);
rr:= bat.reverse(r);
j:= algebra.join(rs,rr);
optimizer.costModel();
changes the properties of the instructions as follows:
r{rows=100} := bat.new(:oid,:int);
s{rows=1000} := bat.new(:oid,:int);
rs{rows=501} := algebra.select(s,1,1);
rr{rows=100} := bat.reverse(r);
j{rows=100} := algebra.join(rs,rr);
The cost estimation does not use any statistics on the actual data
distribution yet. It relies on the rows property provided by the
front-end or other optimizers. It just applies a few heuristic cost
estimators. However, it ensures that empty results are only tagged
with rows=0
if the estimate is accurate, otherwise it assumes at
least one result row. This property makes it possible to safely pass
the result of the cost estimation to the emptySet optimizer for code
reduction.