parallelism by group:
- (maximal) degree of parallelism is limited by the number of groups
(of course, only a problem if there are very few groups)
- (perfect) load-balancing between groups is difficult (NP-hard)
- partitioning by groups requires an extra copy of the entire column
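
To make the first two points concrete, here is a minimal sketch (plain Python, not the system's actual code) that assigns whole groups to workers with a greedy "largest group first" heuristic. The function name `assign_groups` and its parameters are made up for illustration. It only approximates a balanced assignment, since finding the optimal one is the NP-hard part.

```python
import heapq
from collections import Counter

def assign_groups(group_keys, n_workers):
    """Greedy heuristic: give each group (largest first) to the least-loaded worker.

    Hypothetical helper for illustration only. Returns (assignment, loads),
    where loads[w] is the number of rows handled by worker w.
    """
    sizes = Counter(group_keys)                 # rows per group
    heap = [(0, w) for w in range(n_workers)]   # (current load, worker id)
    heapq.heapify(heap)
    assignment = {}
    # Largest groups first; ties broken by key so the result is deterministic.
    for key, size in sorted(sizes.items(), key=lambda kv: (-kv[1], str(kv[0]))):
        load, w = heapq.heappop(heap)
        assignment[key] = w
        heapq.heappush(heap, (load + size, w))
    loads = [0] * n_workers
    for key, w in assignment.items():
        loads[w] += sizes[key]
    return assignment, loads

if __name__ == "__main__":
    keys = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
    assignment, loads = assign_groups(keys, n_workers=4)
    print(assignment, loads)
```

With three groups and four workers, one worker receives no rows at all, and the largest group alone determines the longest-running worker, no matter how the remaining groups are distributed.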
currently, mitosis-based parallelism simply "slices" the input columns
horizontally (irrespective of the actual data) and thus (1) does not require any
(extra) data movement/copy, (2) allows (almost) arbitrary degree of parallelism
(well, except when correctness prevents such straightforward parallelism, as in
the given case), and (3) "ensures" (almost) perfect load-balancing at virtually
no cost.
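
As a rough illustration of the slice-based approach (a sketch in plain Python, not the actual mitosis implementation; `slice_bounds`, `partial_sums` and `sliced_group_sum` are hypothetical names), the columns are cut into contiguous row ranges of near-equal size, each slice computes partial per-group sums, and the partial results are merged at the end. Threads are used here only to show the structure; they are not meant as a performance claim.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def slice_bounds(n_rows, n_slices):
    """Split [0, n_rows) into n_slices contiguous ranges of near-equal size."""
    base, extra = divmod(n_rows, n_slices)
    bounds, start = [], 0
    for i in range(n_slices):
        end = start + base + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds

def partial_sums(keys, values, start, end):
    """Per-group sums over rows [start, end); reads the columns in place."""
    acc = Counter()
    for i in range(start, end):
        acc[keys[i]] += values[i]
    return acc

def sliced_group_sum(keys, values, n_slices):
    """Grouped sum computed slice by slice, then merged into one result."""
    bounds = slice_bounds(len(keys), n_slices)
    total = Counter()
    with ThreadPoolExecutor(max_workers=n_slices) as pool:
        for part in pool.map(lambda b: partial_sums(keys, values, *b), bounds):
            total.update(part)  # Counter.update adds the partial sums
    return dict(total)

if __name__ == "__main__":
    keys = ["a", "b", "a", "c", "b", "a"]
    values = [1, 2, 3, 4, 5, 6]
    print(sliced_group_sum(keys, values, n_slices=3))
```

The slice boundaries depend only on the row count, not on the data, which is why the slices come out nearly equal in size and no column copy is needed. The merge step works here because a sum can be combined from partial sums; an aggregate that cannot be merged this way (a median, for instance) is one example of where such straightforward slicing does not carry over directly.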
Having said that, we are looking into data-partitioned parallelism (rather than
simple slice-based parallelism), and your suggestion of parallelism by group
will then be considered as well.