MonetDB-extend (Mercurial repository)
changeset 47:c8140b1fabf5
Fix for dense group bats with non-zero tseqbase.
| field | value |
|---|---|
| author | Sjoerd Mullender <sjoerd@acm.org> |
| date | Fri, 11 Jun 2021 10:19:12 +0200 (2021-06-11) |
| parents | 3b9611f1b048 |
| children | 099ce41179e9 |
| files | gmean/README.rst gmean/gmean.c |
| diffstat | 2 files changed, 19 insertions(+), 12 deletions(-) |
```diff
--- a/gmean/README.rst Fri Jun 11 10:02:09 2021 +0200
+++ b/gmean/README.rst Fri Jun 11 10:19:12 2021 +0200
@@ -426,8 +426,9 @@
             oid o = canditer_next(&ci);
             BUN p = o - b->hseqbase;
             int v = vals[p];
-            oid grp = gids ? gids[p] : g ? (oid) p : 0; /* group id */
+            oid grp = gids ? gids[p] : g ? min + (oid) p : 0; /* group id */
             if (grp >= min && grp <= max) { /* extra check */
+                grp -= min;
                 ...
             }
         }
@@ -436,17 +437,22 @@
 (``g`` is ``NULL``), then all values are in a single group. The group
 BAT may be dense (``BATtdense(g)`` is ``true``), then there is no C
 array that we can obtain using the ``Tloc`` macro described earlier,
-but then we know that all values are in their own group. The third
-possibility is the most common one where there is a group BAT and we
-can retrieve the data using the ``Tloc`` macro. In this last case,
-the ``gids`` pointer will be non-NULL and it can be used to find the
-group ID for each value.
+but then we know that all values are in their own group (*dense* means
+that each value is one larger than the previous, and for ``oid``
+columns, these values are not actually stored). The third possibility
+is the most common one where there is a group BAT and we can retrieve
+the data using the ``Tloc`` macro. In this last case, the ``gids``
+pointer will be non-NULL and it can be used to find the group ID for
+each value. The assignment of the ``grp`` variable follows this
+logic: if ``gids`` is non-NULL, it is used to retrieve the group ID,
+and if it is ``NULL``, we check the value of ``g`` to distinguish
+between the other two cases.
 
-For the algorithm we use two temporary bit masks to have two bits for
-each group. One bit in one mask is used to indicate whether we have
-seen any values in the group. The bit in the other mask is used to
-maintain whether we have seen an even or an odd number of negative
-values in the group.
+For the algorithm we use two temporary bit masks to have two extra
+bits for each group. One bit in one mask is used to indicate whether
+we have seen any values in the group. The bit in the other mask is
+used to maintain whether we have seen an even or an odd number of
+negative values in the group.
 
 At the end of the function, we free all resources, set the properties
 of the result BAT and return it.
```
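The README text describes two temporary bit masks that give each group two extra bits, but this changeset does not show the code that manipulates them. As a hedged illustration only, here is a minimal C sketch of that bookkeeping. The `struct group_bits` type and the `group_bits_*` helpers are hypothetical names, not MonetDB API, and packing the bits into 32-bit words is an assumption. The masks would presumably be indexed with the zero-based group number that `grp -= min` now produces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bookkeeping (not MonetDB API): two bit masks with one
 * bit per group each, packed into 32-bit words. */
struct group_bits {
	uint32_t *seen;		/* bit set: group has at least one value */
	uint32_t *odd_neg;	/* bit set: odd number of negative values */
	size_t ngrp;		/* number of groups */
};

/* Allocate zeroed masks for ngrp groups; returns 0 on success. */
static int
group_bits_init(struct group_bits *gb, size_t ngrp)
{
	size_t nwords = ngrp / 32 + 1;	/* +1 also avoids calloc(0, ...) */
	gb->seen = calloc(nwords, sizeof(uint32_t));
	gb->odd_neg = calloc(nwords, sizeof(uint32_t));
	gb->ngrp = ngrp;
	if (gb->seen == NULL || gb->odd_neg == NULL) {
		free(gb->seen);
		free(gb->odd_neg);
		return -1;
	}
	return 0;
}

/* Record value v for the zero-based group index grp. */
static void
group_bits_add(struct group_bits *gb, size_t grp, int v)
{
	assert(grp < gb->ngrp);
	uint32_t bit = (uint32_t) 1 << (grp % 32);
	gb->seen[grp / 32] |= bit;		/* mark group as non-empty */
	if (v < 0)
		gb->odd_neg[grp / 32] ^= bit;	/* flip even/odd parity */
}

/* Return the bit for group grp in the given mask (0 or 1). */
static int
group_bits_test(const uint32_t *mask, size_t grp)
{
	return (int) ((mask[grp / 32] >> (grp % 32)) & 1u);
}

static void
group_bits_free(struct group_bits *gb)
{
	free(gb->seen);
	free(gb->odd_neg);
}
```

Using XOR for the parity bit means no per-group counter of negative values is needed: only whether the count so far is even or odd is kept.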
```diff
--- a/gmean/gmean.c Fri Jun 11 10:02:09 2021 +0200
+++ b/gmean/gmean.c Fri Jun 11 10:19:12 2021 +0200
@@ -162,8 +162,9 @@
         oid o = canditer_next(&ci);     /* id of candidate */
         BUN p = o - b->hseqbase;        /* BUN (index) of candidate */
         int v = vals[p];                /* the actual value */
-        oid grp = gids ? gids[p] : g ? (oid) p : 0; /* group id */
+        oid grp = gids ? gids[p] : g ? min + (oid) p : 0; /* group id */
         if (grp >= min && grp <= max) { /* extra check */
+            grp -= min; /* zero based access */
             if (is_int_nil(v))
                 continue; /* skip nils (no matter skip_nils) */
             else if (v == 0)
```
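The effect of the fix is easiest to see with concrete numbers. Below is a standalone C sketch, not the actual `gmean.c` code, that mirrors the fixed `grp` expression and the `grp -= min` step. It assumes, as the new `min + (oid) p` term implies, that `min` is the first group ID, which for a dense group BAT would be its (possibly non-zero) `tseqbase`. The names `oid_t`, `group_id` and `group_index` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for MonetDB's oid type. */
typedef uint64_t oid_t;

/* Group ID of the value at zero-based position p, following the fixed
 * line in gmean.c:
 *   gids != NULL            -> explicit group IDs stored in an array
 *   gids == NULL, has_grp   -> dense group BAT: IDs are min, min+1, ...
 *   gids == NULL, !has_grp  -> no group BAT: everything is group 0 */
static oid_t
group_id(const oid_t *gids, int has_grp, oid_t min, size_t p)
{
	return gids ? gids[p] : has_grp ? min + (oid_t) p : 0;
}

/* Zero-based index into per-group arrays, or -1 if grp lies outside
 * [min, max] (the "extra check" in gmean.c). */
static long long
group_index(oid_t grp, oid_t min, oid_t max)
{
	assert(min <= max);
	if (grp < min || grp > max)
		return -1;
	return (long long) (grp - min);	/* zero based access */
}
```

For example, with `min = 100`, `max = 109` and a dense group BAT (`gids == NULL`), position 3 now gets group ID 103 and zero-based index 3. With the old expression `grp` would have been 3, which fails the `grp >= min` check, so the value would have been skipped.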