is to leverage any existing grouping key ordering to partition for parallelism.

It is not load-balanced and it is vulnerable to skew, but it is not serial either.
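A minimal sketch of what "leverage existing grouping key ordering" could look like, assuming the rows are already clustered by the grouping key (all rows of a group are contiguous, e.g. sorted input). Nominal equal-sized cut points are pushed forward to the next group boundary so no group is split across partitions. The function name `group_aligned_slices` is hypothetical and is not a MonetDB API.

```python
from typing import List, Sequence, Tuple


def group_aligned_slices(keys: Sequence, n_parts: int) -> List[Tuple[int, int]]:
    """Split [0, len(keys)) into at most n_parts half-open ranges such that
    no run of equal keys is split across two ranges.

    Assumption: keys are already clustered by group (e.g. sorted on the key).
    """
    n = len(keys)
    if n == 0:
        return []
    bounds = [0]
    for i in range(1, n_parts):
        # Nominal equal-sized cut, never behind the previous boundary.
        cut = max(i * n // n_parts, bounds[-1])
        # Advance the cut until it sits on a group boundary.
        while 0 < cut < n and keys[cut] == keys[cut - 1]:
            cut += 1
        # Drop empty or trailing cuts (this is where skew costs parallelism).
        if bounds[-1] < cut < n:
            bounds.append(cut)
    bounds.append(n)
    return list(zip(bounds[:-1], bounds[1:]))
```

With evenly sized groups, `group_aligned_slices([1, 1, 2, 2, 3, 3, 4, 4], 4)` yields four ranges of two rows each. With one dominant group, `group_aligned_slices(["a"] * 6 + ["b", "c"], 4)` yields only `[(0, 6), (6, 8)]`: not serial, but neither balanced nor four-way parallel.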

On Mar 28, 2018, at 07:07, Stefan Manegold <Stefan.Manegold@cwi.nl> wrote:

parallelism by group:
- (maximal) degree of parallelism is limited by the number of groups
(of course, only a problem if there are very few groups)
- (perfect) load-balancing between groups is difficult (NP-hard)
- partitioning by groups requires an extra copy of the entire column
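The three points above can be illustrated with a small sketch, assuming rows are assigned to workers by group using the greedy longest-processing-time (LPT) heuristic, a standard approximation for this NP-hard balancing problem. The name `partition_by_group` and the choice of LPT are illustrative assumptions, not how MonetDB does or would do it.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Sequence


def partition_by_group(keys: Sequence, values: Sequence, n_workers: int) -> List[Dict]:
    """Copy rows into per-worker partitions so each group lands on exactly one worker.

    Groups are assigned largest-first to the currently least-loaded worker (LPT).
    LPT is only an approximation; a perfectly balanced assignment is NP-hard.
    """
    # Estimate each group's cost by its row count.
    sizes: Dict = defaultdict(int)
    for k in keys:
        sizes[k] += 1

    # Min-heap of (current load, worker id).
    heap = [(0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    owner = {}
    for k, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        load, w = heapq.heappop(heap)
        owner[k] = w
        heapq.heappush(heap, (load + size, w))

    # The "extra copy": every row is moved into its owning worker's buffer.
    parts: List[Dict] = [defaultdict(list) for _ in range(n_workers)]
    for k, v in zip(keys, values):
        parts[owner[k]][k].append(v)
    return parts


def rows_per_worker(parts: List[Dict]) -> List[int]:
    return [sum(len(vs) for vs in p.values()) for p in parts]
```

With three groups of sizes 6, 1 and 1 on four workers, the loads are `[6, 1, 1, 0]`: one worker idles because there are fewer groups than workers. With groups of sizes 3, 3, 2, 2, 2 on two workers, LPT produces loads `[7, 5]`, although `{3, 3}` vs `{2, 2, 2}` would give a perfect 6/6 split.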

Currently, mitosis-based parallelism simply "slices" the input columns
horizontally (irrespective of the actual data) and thus (1) does not require any
(extra) data movement/copy, (2) allows an (almost) arbitrary degree of parallelism
(well, except when correctness prevents such straightforward parallelism, as in
the given case), and (3) "ensures" (almost) perfect load balancing at virtually
no cost.
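A rough sketch of the slice-based idea, assuming a grouped SUM as the workload: slices are chosen purely by row position, `memoryview` slices share the column's buffer (no copy), and because a group may span several slices each slice produces partial sums that are merged at the end. This merge only works because SUM is decomposable; operations without such a merge step are where straightforward slicing breaks down. The function names are hypothetical, and Python threads here show the structure rather than real speedup.

```python
from array import array
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def slice_bounds(n: int, n_slices: int) -> List[Tuple[int, int]]:
    """Equal-sized half-open row ranges, chosen without looking at the data."""
    return [(i * n // n_slices, (i + 1) * n // n_slices) for i in range(n_slices)]


def sliced_grouped_sum(keys: Sequence, values: array, n_slices: int) -> dict:
    """Grouped SUM over horizontal slices, with partial results merged at the end."""
    vals = memoryview(values)  # indexing a memoryview reads the shared buffer

    def partial(bounds: Tuple[int, int]) -> Counter:
        lo, hi = bounds
        acc: Counter = Counter()
        for i in range(lo, hi):  # iterate by index: no copy of keys or values
            acc[keys[i]] += vals[i]
        return acc

    total: Counter = Counter()
    with ThreadPoolExecutor(max_workers=max(1, n_slices)) as pool:
        for acc in pool.map(partial, slice_bounds(len(values), n_slices)):
            total.update(acc)  # merge step: group "a" may appear in many slices
    return dict(total)
```

For example, `sliced_grouped_sum(["a", "b", "a", "b", "c", "a"], array("q", [1, 2, 3, 4, 5, 6]), 4)` gives the same result as a single slice, `{"a": 10, "b": 6, "c": 5}`, regardless of how groups are distributed across slices.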

Having said that, we are looking into data-partitioned parallelism (rather than
simple slice-based parallelism), and in that context your suggestion of parallelism
by group will be considered as well.

Best regards,
-------------------------------------------------
Richard Wesley
Senior Research Scientist
Tableau Software

t: 206.633.3400 x6335249
f: 206.633.3004
e: hawkfish@tableau.com