Modern cost-based query optimizers use a memo structure to organize the search space for an efficient query execution plan. For example, consider an oid join path 'A.B.C.D'. We can start the evaluation at any point in this path.
Its memo structure can be represented by a (large) MAL program.
The memo levels are encapsulated with a choice
operator. The arguments of the second dictate
which instructions to consider for cost evaluation.
...
scheduler.choice("getVolume");
T1:= algebra.join(A,B);
T2:= algebra.join(B,C);
T3:= algebra.join(C,D);
scheduler.choice("getVolume",T1,T2,T3);
T4:= algebra.join(T1,C);
T5:= algebra.join(A,T2);
T6:= algebra.join(T2,D);
T7:= algebra.join(B,T3);
T8:= algebra.join(C,D);
scheduler.choice("getVolume",T4,T5,T6,T7,T8);
T9:= algebra.join(T4,D);
T10:= algebra.join(T5,D);
T11:= algebra.join(A,T6);
T12:= algebra.join(A,T7);
T13:= algebra.join(T1,T8);
scheduler.choice("getVolume",T9,T10,T11,T12,T13);
answer:= scheduler.pick(T9, T10, T11, T12, T13);
The scheduler.choice()
operator calls a builtin getVolume
for each target variable and expects an integer-valued cost.
In this case it returns the total number of bytes uses as arguments.
The target variable with the lowest cost is chosen for execution and remaining variables are turned into a temporary NOOP operation. (You may want to reuse the memo) They are skipped by the interpreter, but also in subsequent calls to the scheduler. It reduces the alternatives as we proceed in the plan.
A built-in naive cost function is used. It would be nice if the user could provide a private cost function defined as a pattern
with a polymorphic argument for the target and a :lng
result.
Its implementation can use the complete context information to make a decision.
For example, it can trace the potential use of the target variable in subsequent statements to determine a total cost when this step is taken towards the final result.
A complete plan likely includes other expressions to prepare or use the target variables before reaching the next choice point. It is the task of the choice operator to avoid any superfluous operation.
The MAL block should be privately owned by the caller, which can be assured with scheduler.isolation()
.
A refinement of the scheme is to make cost analysis part of the plan as well. Then you don't have to include a hardwired cost function.
Acost:= aggr.count(A);
Bcost:= aggr.count(B);
Ccost:= aggr.count(C);
T1cost:= Acost+Bcost;
T2cost:= Bcost+Ccost;
T3cost:= Ccost+Dcost;
scheduler.choice(T1cost,T1, T2cost,T2, T3cost,T3);
T1:= algebra.join(A,B);
T2:= algebra.join(B,C);
T3:= algebra.join(C,D);
...