
I know Monet keeps a lot of metadata around, and one thing I have done in a similar system is to leverage any existing ordering on the grouping key to partition for parallelism. It is not load balanced and it is vulnerable to skew, but it is not serial either.
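
A minimal Python sketch of that idea, assuming the grouping key column is already sorted (or at least clustered) so that each group occupies one contiguous run of rows. The function names `group_aligned_partitions` and `grouped_sum` are hypothetical and are not MonetDB APIs; the sum aggregate is only a stand-in for whatever per-group operation is needed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence, Tuple


def group_aligned_partitions(keys: Sequence, n_workers: int) -> List[Tuple[int, int]]:
    """Split a key column into at most n_workers half-open [start, end) row
    ranges, cutting only where the key changes so no group is split.

    Assumption: equal keys are contiguous (the column is sorted or clustered).
    """
    n = len(keys)
    if n == 0 or n_workers <= 0:
        return []
    target = -(-n // n_workers)  # ceiling of n / n_workers rows per range
    ranges = []
    start = 0
    while start < n:
        end = min(start + target, n)
        # Push the cut forward to the next key change. A very large group
        # makes its range correspondingly large: this is the skew problem.
        while end < n and keys[end] == keys[end - 1]:
            end += 1
        ranges.append((start, end))
        start = end
    return ranges


def _sum_range(keys: Sequence, values: Sequence, start: int, end: int) -> Dict:
    """Per-group sum over one row range."""
    out: Dict = {}
    for i in range(start, end):
        out[keys[i]] = out.get(keys[i], 0) + values[i]
    return out


def grouped_sum(keys: Sequence, values: Sequence, n_workers: int) -> Dict:
    """Per-group sum computed range by range on a thread pool."""
    ranges = group_aligned_partitions(keys, n_workers)
    result: Dict = {}
    with ThreadPoolExecutor(max_workers=max(1, n_workers)) as pool:
        partials = pool.map(lambda r: _sum_range(keys, values, r[0], r[1]), ranges)
        for partial in partials:
            # No group spans two ranges, so partial results never collide
            # and a plain dictionary update is enough.
            result.update(partial)
    return result
```

Because the cuts follow group boundaries, no copy or reshuffle of the column is needed, but one dominant group can leave most workers idle: with eight rows of one key followed by a single row of another, four requested workers collapse to two ranges of very unequal size.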
On Mar 28, 2018, at 07:07, Stefan Manegold wrote:

parallelism by group:
- (maximal) degree of parallelism is limited by the number of groups (of course, only a problem if there are very few groups)
- (perfect) load-balancing between groups is difficult (NP-hard)
- partitioning by groups requires an extra copy of the entire column

Currently, mitosis-based parallelism simply "slices" the input columns horizontally (irrespective of the actual data) and thus (1) does not require any (extra) data movement/copy, (2) allows (almost) arbitrary degree of parallelism (well, except when correctness prevents such straightforward parallelism as in the given case), and (3) "ensures" (almost) perfect load-balancing at virtually no cost.
Having said that, we are looking into data-partitioned parallelism (rather than simple slice-based parallelism), and then your suggestion of parallelism by group is to be considered as well.
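
For contrast, the slice-based approach described above can be sketched in Python as follows. This is only an illustration of data-independent horizontal slicing, not MonetDB's mitosis implementation; `horizontal_slices` and `sliced_group_sum` are hypothetical names, and the slices are processed in a plain loop where a real system would hand each one to a worker.

```python
from typing import Dict, List, Sequence, Tuple


def horizontal_slices(n_rows: int, n_slices: int) -> List[Tuple[int, int]]:
    """Cut row positions 0..n_rows into near-equal half-open [start, end)
    ranges without looking at the data."""
    if n_rows <= 0 or n_slices <= 0:
        return []
    n_slices = min(n_slices, n_rows)
    base, extra = divmod(n_rows, n_slices)
    ranges = []
    start = 0
    for i in range(n_slices):
        # The first `extra` slices take one additional row each.
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def sliced_group_sum(keys: Sequence, values: Sequence, n_slices: int) -> Dict:
    """Per-group sum: aggregate each slice independently, then merge."""
    total: Dict = {}
    for start, end in horizontal_slices(len(keys), n_slices):
        partial: Dict = {}
        for i in range(start, end):
            partial[keys[i]] = partial.get(keys[i], 0) + values[i]
        # A group may appear in several slices, so partial results have to be
        # combined. That works for sums; an aggregate such as a median cannot
        # be merged from per-slice results this way.
        for key, subtotal in partial.items():
            total[key] = total.get(key, 0) + subtotal
    return total
```

The slices differ in size by at most one row regardless of how the keys are distributed, which is where the near-perfect load balancing comes from; the price is the merge step, which is only valid when per-slice results can be combined.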
Best regards,
-------------------------------------------------
Richard Wesley
Senior Research Scientist
Tableau Software
t: 206.633.3400 x6335249
f: 206.633.3004
e: hawkfish@tableau.com