no_mitosis_optimizer - is it advisable?
Hi,
I have been working on join performance with MonetDB. I tried various optimizer pipelines. A few costly queries that took 15 seconds with the default pipeline returned results in as little as 5 seconds with the 'no_mitosis_optimizer' pipeline.
I would like to learn more about the mitosis optimizer. Is there any reference on what it does? From the mserver5 man page, I got this: "forcefully activate mitosis even on small tables, i.e., split small tables in as many (tiny) pieces as there are cores (threads) available;" So, does this mean that with the mitosis optimizer the tables are split and then processed? If so, why are queries slower with the mitosis optimizer in the pipeline?
Also, the monetdb man page warns that "Changing this setting is discouraged at all times." What is the disadvantage of changing the optimizer pipeline to something other than 'default_pipe'? Although 'no_mitosis_optimizer' is stable, is it advisable for production?
Any help is much appreciated.
Thanks & Regards,
Vijayakrishna.P. Mobile : (+91) 9500402305.
Your question is somewhat confusing, but here is some of our experience
related to the same topic.
Our experiments with MonetDB show that the default mitosis
optimization MonetDB provides (splitting the longest column in the
execution plan to initiate plan graph mitosis) works quite well on
multicore systems for typical queries.
We did some further work on adding an optimization module where the
table(s) selected for "mitosis" splitting (and the number of splits, always
kept lower, in total, than the number of available cores) can
vary according to an "optimization strategy" based on our estimate
of total processing cost. We published this work in brief form, along with
some simplifications and findings, last year.
Regards,
Masood Mortazavi
I personally have a love-hate relationship with mitosis.
My experience is that overall it does provide nice performance improvements
when considering a reasonably large spectrum of query / data combinations.
That is just a very rough indication.
Do not expect it to always be beneficial though, because it won't be. As
you have found yourself, it slows down certain query / data combinations
considerably.
Notice that I talk about query / data combinations, not just queries. Data
splitting for parallel processing is a bet. The query plan becomes more
complicated and it can happen that you end up doing much more work than you
would do without splitting (not to mention that mitosis considers only very
simple strategies for splitting).
An example of how things can go very wrong (8 minutes vs. 14 seconds) -
just because of data distribution:
https://www.monetdb.org/bugzilla/show_bug.cgi?id=3437
The current version of mitosis, together with the underlying data
statistics available at optimization time, cannot do much better, I'm
afraid.
If your data / query pool doesn't vary very much, I suggest you take a
close look at what gets slow and why, and decide whether to keep mitosis or
not. Or, better: decide which data / query patterns prefer mitosis and
which not.
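One way to act on that advice is to record timings per query under each pipeline and pick the faster one per query. The following is only a minimal sketch of that bookkeeping: the helper `preferred_pipeline` and the query labels are hypothetical, the pipeline names are simply the labels used in this thread, and only the 15 s / 5 s pair comes from the original post (the second row is made up for illustration).

```python
from typing import Dict


def preferred_pipeline(timings: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Map each query label to the pipeline with the lowest measured time.

    `timings` has the shape {query_label: {pipeline_name: seconds}}.
    """
    return {query: min(per_pipe, key=per_pipe.get)
            for query, per_pipe in timings.items()}


# Pipeline names are the labels used in this thread, not verified identifiers.
timings = {
    # Figures reported in the original post.
    "costly_join": {"default_pipe": 15.0, "no_mitosis_optimizer": 5.0},
    # Hypothetical measurement, for illustration only.
    "selective_scan": {"default_pipe": 1.0, "no_mitosis_optimizer": 3.0},
}

choice = preferred_pipeline(timings)
for query, pipeline in choice.items():
    print(f"{query}: prefer {pipeline}")
```

Real measurements should be repeated several times per query, since a single run can be skewed by caching.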
Regards,
Roberto
On 2015-01-05 at 07:50, Vijay Krishna wrote:
I have been working on join performances with MonetDB. I tried using various optimizer pipelines.
Few costly queries which took 15 seconds with the default pipeline, returned results as fast as 5 seconds with the 'no_mitosis_optimizer' pipeline.
If everyone's offering their two cents on mitosis, then I'll mention that my experience is documented in https://www.monetdb.org/bugzilla/show_bug.cgi?id=3548. That bug includes a patch implementing some broadly similar ideas to those in Masood's paper[1], although I was unaware of his work before today.
You haven't told us much about your queries, your dataset or your hardware, so all the replies you're going to get will involve guesswork. I'm assuming that your data set easily fits in RAM, i.e. you're not I/O-bound.
Basically all cases I've found where mitosis makes things slower happen when MonetDB splits a large table into N pieces (where N=# of CPU cores) only to re-merge all (or nearly all) of it again. Roberto's bug 3437 is like this, as he confirms in comment 9, and it's even possible to come up with a query which splits a large table, does absolutely no work on it, then merges it all back together again before continuing with the rest of the plan. Ironically, the easiest way to provoke this is with the simplest query possible: select col from big_table;. As soon as you put a WHERE clause with decent selectivity on that query then the split makes sense, but for just dumping an entire table it doesn't.
Mitosis will split the largest source table in your query, whatever and wherever that is (as long as it's big enough). Think about how you would go about implementing the join you have written if you were doing it yourself, and consider whether subdividing the largest table will make the job easier or harder. MonetDB does have many tricks to make even crazy queries run well, but it also has cases where it's not quite smart enough.
Richard.
[1] http://www.researchgate.net/publication/264700632_Cost-Based_Data-Partitioni...
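The split-then-merge effect described above can be illustrated with a toy model. This is a sketch under stated assumptions, not MonetDB's implementation: the function `split_filter_merge` is hypothetical, the table is a plain Python list, and the "merge cost" is simply the number of rows that have to be copied back together.

```python
from typing import Callable, List, Tuple


def split_filter_merge(rows: List[int], pieces: int,
                       keep: Callable[[int], bool]) -> Tuple[List[int], int]:
    """Split `rows` into `pieces` chunks, filter each chunk, merge the results.

    Returns the merged result and the number of rows the merge had to copy.
    """
    size = max(1, -(-len(rows) // pieces))  # ceiling division, at least 1
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    # Per-chunk work; in a real engine this is the part that runs in parallel.
    partials = [[r for r in chunk if keep(r)] for chunk in chunks]
    # The merge touches every surviving row once more.
    merged = [r for part in partials for r in part]
    return merged, len(merged)


rows = list(range(1_000_000))

# Like "select col from big_table;": nothing is filtered, so all rows are re-merged.
_, merged_all = split_filter_merge(rows, 8, lambda r: True)

# Like adding a selective WHERE clause: only a small fraction reaches the merge.
_, merged_few = split_filter_merge(rows, 8, lambda r: r % 1000 == 0)

print(merged_all, merged_few)  # 1000000 1000
```

In the first case the split buys nothing, because every row is copied again by the merge. In the second case the per-chunk filtering does the bulk of the work and the merge is cheap, which matches the point about selectivity.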
participants (4): Masood Mortazavi, Richard Hughes, Roberto Cornacchia, Vijay Krishna