Re: Performance degradation

12 Jan 2019

      Here's some additional observations on this puzzling issue:

The slow table is "cmdb", the normal table is "cmdb2".  Both have the same
number of rows according to "SELECT count(*)" (73553310).  However,
sys.statistics shows something a bit different:

sql>SELECT x.name, count(*) AS columns_with, z.count FROM sys.tables AS x,
sys.columns AS y, sys.statistics AS z WHERE x.id = y.table_id AND y.id =
z.column_id GROUP BY x.name, z.count;
+-------+--------------+----------+
| name  | columns_with | count    |
+=======+==============+==========+
| cmdb  |            8 | 73553310 |
| cmdb  |          145 | 73553311 |
| cmdb2 |          153 | 73553310 |
+-------+--------------+----------+
3 tuples
sql>

For the slow table, "cmdb", there are 145/153 columns that have an extra
"row".  This row count mismatch causes it to skip parallelization
optimizations in "monetdb5/optimizer/opt_mitosis.c", line 212.

A few related questions:

1) In the columnar model, is is valid for conceptual tuples to exist which
don't have values for each column like this?  (This seems strange as now
there's a difference between an explicit NULL and "not existing").  How
might this situation have happened?  (Only transactional full-row inserts
were applied to this DB).  Can this be repaired?
2) Is the conditional in "monetdb5/optimizer/opt_mitosis.c", line 212
sufficiently robust?  I may be misunderstanding the full context of this
line, but it seems like we could still benefit from
splitting/parallelization when r < rowcnt, no?

Thanks,
-Jeremy Norris

Jeremy Norris

tags

participants (1)