Re: Costly bat.new/bat.append not optimized out

28 Aug 2014

      ...
Roberto Cornacchia  writes:
[..]
...
Ps. It's actually not very difficult to write it yourself if you feel
like it. There is a template in the code.
Interesting. I think we will go for this.
We're working on it. Fortunately, it seems rather easy to plug in a
custom optimizer, which is rather nice.

To give a recap, we would like to optimize away the following case:

 big := <big BAT>
 a := bat.new(..)
 a := bat.append(a, big)
 .. from this point use `a', and no longer use `big' ..

However, it is not as trivial as we thought: to eliminate a seemingly
useless bat.new/bat.append pair, we need to know that the BAT being
appended (the one whose costly copy we want to avoid) is not used
elsewhere.

So we decided to whitelist some MAL functions that are known to
produce a new BAT rather than return one of their arguments. This is
obviously not nice, and we might make mistakes when deciding whether
such functions are really functional or not (i.e. whether they mutate
or return one of the input BATs). We need to review the code of each
function to make sure.
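
A rough sketch of the whitelist idea, written in Python over a toy plan
representation rather than against the real optimizer API. Everything
here is an assumption made for illustration: the tuple encoding of
instructions, the whitelist entries, and the "assign" pseudo-operation
that stands for reusing `big' directly instead of copying it.

```python
# Toy model of a MAL plan: each instruction is (result, function, args).
# The encoding, the whitelist contents and the "assign" pseudo-op are
# illustrative assumptions, not MonetDB's actual optimizer interface.
FRESH_BAT_PRODUCERS = {"algebra.leftfetchjoin"}  # hypothetical whitelist


def used_after(plan, var, start):
    """True if `var` is read by any instruction at index >= start."""
    return any(var in args for _, _, args in plan[start:])


def drop_copy_pairs(plan):
    """Replace `a := bat.new(..); a := bat.append(a, big)` by `a := big`
    when `big` comes from a whitelisted function and is not used later."""
    producer = {}  # variable -> function that last defined it
    out = []
    i = 0
    while i < len(plan):
        res, fn, args = plan[i]
        nxt = plan[i + 1] if i + 1 < len(plan) else None
        if (fn == "bat.new" and nxt is not None
                and nxt[1] == "bat.append" and nxt[0] == res
                and len(nxt[2]) == 2 and nxt[2][0] == res):
            big = nxt[2][1]
            if (producer.get(big) in FRESH_BAT_PRODUCERS
                    and not used_after(plan, big, i + 2)):
                out.append((res, "assign", (big,)))  # no copy needed
                producer[res] = "assign"
                i += 2
                continue
        out.append((res, fn, args))
        producer[res] = fn
        i += 1
    return out
```

The pair is left alone whenever `big' is read again later or comes from
a function that is not on the whitelist, which is the conservative
behaviour described above.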

A better solution, but one that is much more complicated for us and
would require major changes, would be to postpone the copy until it is
really needed by using a copy-on-write mechanism.

That is, in the example above, when the bat.append function is called,
it would see that it is trying to append to an empty BAT, in which case
it would return a BAT *proxy* that could act as a regular BAT. Then,
whenever an attempt is made to change either of these BATs (`a` or
`big`), a copy would be triggered. This way the copy would occur only
when needed.
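
To make the copy-on-write idea concrete, here is a minimal sketch in
Python. It is only a model: `Column', `_Buffer' and `append' are
hypothetical names, and nothing here reflects how BATs are actually
laid out or shared inside the kernel.

```python
class _Buffer:
    """Storage that may be shared by several columns (hypothetical)."""
    def __init__(self, values):
        self.values = list(values)
        self.owners = 1  # number of Column objects sharing this buffer


class Column:
    """Stand-in for a BAT; not the real data structure."""
    copies = 0  # number of real copies performed, for demonstration

    def __init__(self, values=()):
        self._buf = _Buffer(values)

    def __len__(self):
        return len(self._buf.values)

    def to_list(self):
        return list(self._buf.values)

    def _make_private(self):
        # Copy only if someone else still shares the buffer.
        if self._buf.owners > 1:
            self._buf.owners -= 1
            self._buf = _Buffer(self._buf.values)
            Column.copies += 1

    def append_value(self, value):
        self._make_private()
        self._buf.values.append(value)


def append(dst, src):
    """Model of bat.append(dst, src) with the proposed shortcut."""
    if len(dst) == 0:
        # Appending to an empty column: return a proxy sharing src's data.
        proxy = Column.__new__(Column)
        proxy._buf = src._buf
        src._buf.owners += 1
        return proxy
    dst._make_private()
    dst._buf.values.extend(src._buf.values)
    return dst
```

With this model, `a = append(Column(), big)` performs no copy; the
first `append_value` on either `a` or `big` triggers exactly one.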

Easier said than implemented, I know (especially when not knowing all
the intricacies of the internal processing of BATs).

Likewise, we see another optimization opportunity with a query like
this:

  SELECT our_computation(some_col) FROM some_table LIMIT 10;

which produces the following MAL code:

  t := ...
  u := udf.complex_computation(t)   # process 10M rows
  v := algebra.subslice(u, 0, 9)
  w := algebra.leftfetchjoin(v, u)  # get only the ten first rows!

This code is not optimal because we could have processed 10 rows instead
of millions if our custom operator were run after the slice. But to
figure that out, the optimizer would need a way to put attributes on
functions that tell whether the order of the rows or the number of rows
matters for the operation. Knowing that, the code could be rewritten as:

  t := ...
  v := algebra.subslice(t, 0, 9)
  w := algebra.leftfetchjoin(v, t)  # get the ten first rows
  u := udf.complex_computation(w)   # process 10 rows only
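
The rewrite above could be automated roughly as follows. This is a
Python sketch over a toy plan encoding, not real optimizer code: the
tuple representation, the `ROW_WISE' flag set and the generated
temporary name are all assumptions. Unlike the hand-written version, it
keeps `w' as the name of the final result so that later instructions
do not need to change.

```python
# Hypothetical flag: functions that process each row independently, so
# neither the number nor the order of the input rows affects a given row.
ROW_WISE = {"udf.complex_computation"}


def push_slice_down(plan):
    """Rewrite   u := f(t); v := subslice(u, ..); w := leftfetchjoin(v, u)
    into         v := subslice(t, ..); w_in := leftfetchjoin(v, t);
                 w := f(w_in)
    when f is flagged row-wise and u is not used anywhere else."""
    out = list(plan)
    for i in range(len(out) - 2):
        (u, f, f_args), (v, s, s_args), (w, j, j_args) = out[i:i + 3]
        if (f in ROW_WISE and len(f_args) == 1
                and s == "algebra.subslice" and len(s_args) >= 1
                and s_args[0] == u
                and j == "algebra.leftfetchjoin" and j_args == (v, u)
                and not any(u in a for _, _, a in out[i + 3:])):
            t = f_args[0]
            tmp = w + "_in"  # assumed not to clash with an existing name
            out[i:i + 3] = [
                (v, s, (t,) + tuple(s_args[1:])),  # slice the input
                (tmp, j, (v, t)),                  # fetch the few rows
                (w, f, (tmp,)),                    # run f on those only
            ]
    return out
```

A function missing from `ROW_WISE' is never moved, so an operator that
depends on seeing every row keeps its original position.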

As a workaround we tried to write the query as:

  SELECT our_computation(some_col) FROM (SELECT some_col FROM some_table LIMIT 10) as q;

But it seems that LIMIT is not supported in a subselect.

We could also write an optimizer for that (and I think we will),
again by whitelisting functions.

But overall, what is missing is a way to flag functions to let the
optimizer know more about them.

-- 
Frédéric Jolliton
SecurActive