Re: no_mitosis_optimizer - is it advisable?

5 Jan 2015

On 2015-01-05 at 07:50, Vijay Krishna wrote:
> ...
> I have been working on join performance with MonetDB and tried
> various optimizer pipelines. A few costly queries that took 15
> seconds with the default pipeline returned results in as little as
> 5 seconds with the 'no_mitosis_optimizer' pipeline.
If everyone's offering their two cents on mitosis, then I'll mention
that my experience is documented in
https://www.monetdb.org/bugzilla/show_bug.cgi?id=3548. That bug
includes a patch implementing ideas broadly similar to those in
Masood's paper[1], although I was unaware of his work before today.

You haven't told us much about your queries, your dataset or your
hardware, so all the replies you're going to get will involve
guesswork.

I'm assuming that your data set easily fits in RAM, i.e. you're not
I/O-bound.

Virtually all the cases I've found where mitosis makes things slower
occur when MonetDB splits a large table into N pieces (where N is the
number of CPU cores) only to re-merge all (or nearly all) of it again.
Roberto's bug 3437 is like this, as he confirms in comment 9. It's
even possible to write a query that splits a large table, does
absolutely no work on it, then merges it all back together before
continuing with the rest of the plan. Ironically, the easiest way to
provoke this is with the simplest possible query, "select col from
big_table;". As soon as you add a WHERE clause with decent selectivity
the split makes sense, but for just dumping an entire table it
doesn't.
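A toy Python model of the split-then-merge pattern may help. It is not how MonetDB implements mitosis, and split, merge, plain_select and filtered_select are hypothetical names made up for illustration. Plain lists stand in for columns, and the second value each "query" returns is the number of rows the merge step has to copy.

```python
import os


def split(column, n):
    """Split a column into at most n contiguous slices (toy stand-in for mitosis)."""
    size = max(1, -(-len(column) // n))  # ceiling division, never zero
    return [column[i:i + size] for i in range(0, len(column), size)]


def merge(slices):
    """Concatenate slices back into one column (toy stand-in for the re-merge)."""
    out = []
    for s in slices:
        out.extend(s)
    return out


def plain_select(column, n):
    """Models 'select col from big_table;': no per-slice work before the merge."""
    merged = merge(split(column, n))
    return merged, len(merged)  # second value: rows the merge had to copy


def filtered_select(column, n, predicate):
    """Models a selective WHERE clause: each slice is filtered before the merge."""
    filtered = [[v for v in s if predicate(v)] for s in split(column, n)]
    merged = merge(filtered)
    return merged, len(merged)


if __name__ == "__main__":
    n = os.cpu_count() or 4  # N = number of CPU cores, as in the text
    big_table = list(range(1_000_000))
    _, copied = plain_select(big_table, n)
    print("plain select, rows copied by merge:", copied)
    _, copied = filtered_select(big_table, n, lambda v: v % 1000 == 0)
    print("filtered select, rows copied by merge:", copied)
```

Run as a script, the first count is the whole table (1,000,000 rows) and the second is only the 1,000 matching rows. The filter still reads every row, but that scan is the part that can run in parallel across slices; the merge only pays for the survivors.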

Mitosis will split the largest source table in your query, whatever
and wherever that is (as long as it's big enough). Think about how you
would go about implementing the join you have written if you were
doing it yourself, and consider whether subdividing the largest table
will make the job easier or harder. MonetDB does have many tricks to
make even crazy queries run well, but it also has cases where it's not
quite smart enough.
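Here is a toy Python sketch of that thought experiment; again it is not MonetDB's code and every name is hypothetical. pick_split_target models "split the largest table, as long as it's big enough" with a made-up min_rows threshold, and partitioned_join assumes a hash join in which the split table is the probe side. In that shape subdividing helps: the hash table is built once on the small table and each slice probes it independently. If the split landed on the build side instead, each slice would hold only part of the hash table and every probe row would have to check all N of them, which is the "harder" case.

```python
from collections import defaultdict


def pick_split_target(tables, min_rows):
    """Return the name of the largest table with at least min_rows rows, else None.

    min_rows is a hypothetical threshold standing in for 'big enough'.
    """
    candidates = [name for name, rows in tables.items() if len(rows) >= min_rows]
    if not candidates:
        return None
    return max(candidates, key=lambda name: len(tables[name]))


def split(rows, n):
    """Split rows into at most n contiguous slices."""
    size = max(1, -(-len(rows) // n))  # ceiling division, never zero
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def build_hash(rows):
    """Build a hash table from (key, payload) pairs: key -> list of payloads."""
    index = defaultdict(list)
    for key, payload in rows:
        index[key].append(payload)
    return index


def probe(index, rows):
    """Probe the hash table with (key, payload) pairs; emit (key, build, probe)."""
    return [(key, left, right) for key, right in rows for left in index.get(key, ())]


def partitioned_join(small, big, n):
    """Split only the big (probe) side; every slice probes one shared hash table."""
    index = build_hash(small)  # built once, on the small side
    parts = [probe(index, s) for s in split(big, n)]  # independent per-slice work
    return [row for part in parts for row in part]


if __name__ == "__main__":
    tables = {
        "orders": [(i % 50, f"order{i}") for i in range(10_000)],
        "customers": [(c, f"cust{c}") for c in range(50)],
    }
    target = pick_split_target(tables, min_rows=1_000)
    print("table the toy model would split:", target)
    joined = partitioned_join(tables["customers"], tables["orders"], 4)
    # Same rows, same order as the unsplit join.
    assert joined == probe(build_hash(tables["customers"]), tables["orders"])
    print("joined rows:", len(joined))
```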

Richard.

[1] http://www.researchgate.net/publication/264700632_Cost-Based_Data-Partitioni...

Richard Hughes