On 12/6/10 2:07 PM, Mike De-La-Columnar wrote:
Hi fellas,
Hi Mike
If it's not too much of a hassle, I have a few questions regarding parallelism & the internal algorithms:
1. What is the difference between Data-Flow, and Mitosis?
See the optimizer descriptions: http://monetdb.cwi.nl/MonetDB/Documentation/Optimizer-Toolkit.html
The two address orthogonal issues.
2. Is there a special reason why the distinct count aggregation is not parallelized just like the other aggregation functions? Is it just a matter of not making it into the release, or is there a fundamental problem here?
Distinct is a global property, not a local one. A distributed version can easily end up shipping the complete underlying table between workers.
Note: even though the parallel distinct can be slower in certain distributions, it could still be useful...
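To illustrate why distinct is a global property, here is a minimal Python sketch. It is not MonetDB code; the function names and the slicing scheme are assumptions made purely for illustration. It shows that summing per-slice distinct counts overcounts, and that a correct merge has to ship each slice's distinct values, which in the worst case is the whole column.

```python
def partition_distinct_sets(values, n_parts):
    """Split values into up to n_parts contiguous slices (hypothetical
    partitioning) and return the set of distinct values in each slice."""
    size = max(1, -(-len(values) // n_parts))  # ceiling division
    return [set(values[i:i + size]) for i in range(0, len(values), size)]


def naive_parallel_distinct(values, n_parts):
    """Incorrect: adds up local distinct counts, so a value that occurs
    in several slices is counted once per slice."""
    return sum(len(s) for s in partition_distinct_sets(values, n_parts))


def merged_parallel_distinct(values, n_parts):
    """Correct: unions the local distinct sets before counting.
    Also returns how many values had to be shipped to the merge step."""
    local_sets = partition_distinct_sets(values, n_parts)
    shipped = sum(len(s) for s in local_sets)
    merged = set()
    for s in local_sets:
        merged |= s
    return len(merged), shipped


if __name__ == "__main__":
    column = [1, 2, 3, 1, 2, 3, 4, 4]
    print(naive_parallel_distinct(column, 2))   # overcounts shared values
    print(merged_parallel_distinct(column, 2))  # (distinct count, values shipped)
```

When every value in the column is unique, the number of shipped values equals the column length, which is the "throwing the table around" case described above.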
3. Are there any plans on supporting parallel frameworks like OpenMP/Intel TBB directly within the low-level MAL functions? Providing data-parallelism on the loop-code level could even increase performance.
Or did you already play with it and decide that the performance gain is just not worth it?
It depends on the amount of work per byte.
4. Is there a way for me to prevent the MonetDB server from cleaning up all intermediate BATs between server restarts? I've read here that some intermediates (like join indices etc.) are lost between restarts... is that still correct?
All temporaries are indeed lost. Join indices for foreign key checking are retained on disk. Rebuilding hashes is (mostly) cheaper than writing them to and reading them from disk.
It could improve, but there should be a clear driving application.
Thank you very much for your time. Keep up the amazing job of building the most powerful columnar database out there!
Thanks for the support!
Martin