
On 2015-01-05 at 07:50, Vijay Krishna wrote:
I have been working on join performance with MonetDB. I tried using various optimizer pipelines.
A few costly queries that took 15 seconds with the default pipeline returned results in as little as 5 seconds with the 'no_mitosis_optimizer' pipeline.

If everyone's offering their two cents on mitosis, then I'll mention that my experience is documented in https://www.monetdb.org/bugzilla/show_bug.cgi?id=3548. That bug includes a patch implementing some broadly similar ideas to those in Masood's paper[1], although I was unaware of his work before today.

You haven't told us much about your queries, your dataset or your hardware, so all the replies you're going to get will involve guesswork. I'm assuming that your data set easily fits in RAM, i.e. you're not I/O-bound.

Basically all cases I've found where mitosis makes things slower happen when MonetDB splits a large table into N pieces (where N=# of CPU cores) only to re-merge all (or nearly all) of it again. Roberto's bug 3437 is like this, as he confirms in comment 9, and it's even possible to come up with a query which splits a large table, does absolutely no work on it, then merges it all back together again before continuing with the rest of the plan.

Ironically, the easiest way to provoke this is with the simplest query possible:

    select col from big_table;

As soon as you put a WHERE clause with decent selectivity on that query then the split makes sense, but for just dumping an entire table it doesn't.

Mitosis will split the largest source table in your query, whatever and wherever that is (as long as it's big enough). Think about how you would go about implementing the join you have written if you were doing it yourself, and consider whether subdividing the largest table will make the job easier or harder. MonetDB does have many tricks to make even crazy queries run well, but it also has cases where it's not quite smart enough.

Richard.

[1] http://www.researchgate.net/publication/264700632_Cost-Based_Data-Partitioni...
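
To make the split-then-merge cost described above concrete, here is a toy model in Python. It is not MonetDB code and not how mitosis is implemented. The helper names (split, run_split_plan), the table size and the core count are all made up for illustration. It only counts how many rows the final merge step has to copy back together for a full-table dump versus a selective filter.

```python
# Toy model only: hypothetical helpers, not MonetDB internals.

def split(rows, n):
    """Cut rows into n contiguous slices of near-equal size."""
    size, extra = divmod(len(rows), n)
    parts, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        parts.append(rows[start:end])
        start = end
    return parts


def run_split_plan(rows, n, keep):
    """Filter each slice on its own, then merge the partial results.

    Returns the merged result and the number of rows the merge copied.
    """
    partials = [[r for r in part if keep(r)] for part in split(rows, n)]
    merged = [r for part in partials for r in part]
    return merged, len(merged)


big_table = list(range(100_000))  # assumed table size
n_cores = 8                       # assumed number of slices

# Like "select col from big_table;": nothing is filtered out,
# so the merge has to copy every row again.
dump, dump_merged = run_split_plan(big_table, n_cores, lambda r: True)

# Like adding a selective WHERE clause: each slice shrinks first,
# so the merge only touches the few surviving rows.
sel, sel_merged = run_split_plan(big_table, n_cores, lambda r: r % 1000 == 0)

print("rows merged, full dump:       ", dump_merged)
print("rows merged, selective filter:", sel_merged)
```

In this model the full dump merges all 100,000 rows while the selective filter merges 100, which is the asymmetry the message describes. Real plans have other costs that this sketch ignores.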