# Join Paths

Join Paths mk Tue, 04/28/2020 - 17:49#### 5.2.15 Join Paths

The routine optimizer.joinPath() walks through the program looking for join operations and cascades them into multiple join paths. To illustrate, consider

a:= bat.new(:oid,:oid); b:= bat.new(:oid,:oid); c:= bat.new(:oid,:str); j1:= algebra.join(a,b); j2:= algebra.join(j1,c); j3:= algebra.join(b,b); j4:= algebra.join(b,j3);

The optimizer will first replace all arguments by their join sequence. The underlying instructions are left behind for the deadcode optimizer to be removed.

a:= bat.new(:oid,:oid); j1:= algebra.join(a,b); j2:= algebra.joinPath(a,b,c); j3:= algebra.join(b,b); j4:= algebra.joinPath(b,b,b);

The same operation is applied to sequences of leftjoin operations.

Any join result used multiple times should be calculated once. A list of inlined paths is kept around to detect those.

In principle, the joinpaths may contain common subpaths, whose materialization would improve performance.

The SQL front-end produces often produces snippets of the following structure

t1:= algebra.join(b,c); z1:= algebra.join(a,t1); ... t2:= algebra.join(b,d); z2:= algebra.join(a,t2);

The joinpath would merge them into

z1:= algebra.joinPath(a,b,c); ... z2:= algebra.joinPath(a,b,d);

which are handle by a heuristic looking at the first two argments and re-uses a materialized join.

_13:= algebra.join(a,b); z1:= algebra.join(_13,c); ... z2:= algebra.join(_13,d);

An alternative is to make recognition of any common re-useable paths an integral part of the joinPath body.

x3:= algebra.join(a,b); r3:= bat.reverse(x3); j1:= join(c,r3);

rb:= bat.reverse(b); ra:= bat.reverse(a); j1:= algebra.joinpath(c,rb,ra);

As a final step in the speed up of the joinpath we consider clustering large operands if that is expected to improve IO behavior.