
Here's some additional observations on this puzzling issue: The slow table is "cmdb", the normal table is "cmdb2". Both have the same number of rows according to "SELECT count(*)" (73553310). However, sys.statistics shows something a bit different: sql>SELECT x.name, count(*) AS columns_with, z.count FROM sys.tables AS x, sys.columns AS y, sys.statistics AS z WHERE x.id = y.table_id AND y.id = z.column_id GROUP BY x.name, z.count; +-------+--------------+----------+ | name | columns_with | count | +=======+==============+==========+ | cmdb | 8 | 73553310 | | cmdb | 145 | 73553311 | | cmdb2 | 153 | 73553310 | +-------+--------------+----------+ 3 tuples sql> For the slow table, "cmdb", there are 145/153 columns that have an extra "row". This row count mismatch causes it to skip parallelization optimizations in "monetdb5/optimizer/opt_mitosis.c", line 212. A few related questions: 1) In the columnar model, is is valid for conceptual tuples to exist which don't have values for each column like this? (This seems strange as now there's a difference between an explicit NULL and "not existing"). How might this situation have happened? (Only transactional full-row inserts were applied to this DB). Can this be repaired? 2) Is the conditional in "monetdb5/optimizer/opt_mitosis.c", line 212 sufficiently robust? I may be misunderstanding the full context of this line, but it seems like we could still benefit from splitting/parallelization when r < rowcnt, no? Thanks, -Jeremy Norris
participants (1)
-
Jeremy Norris