changeset 47:c8140b1fabf5

Fix for dense group bats with non-zero tseqbase.
author Sjoerd Mullender <sjoerd@acm.org>
date Fri, 11 Jun 2021 10:19:12 +0200 (2021-06-11)
parents 3b9611f1b048
children 099ce41179e9
files gmean/README.rst gmean/gmean.c
diffstat 2 files changed, 19 insertions(+), 12 deletions(-)
--- a/gmean/README.rst	Fri Jun 11 10:02:09 2021 +0200
+++ b/gmean/README.rst	Fri Jun 11 10:19:12 2021 +0200
@@ -426,8 +426,9 @@
 	  oid o = canditer_next(&ci);
 	  BUN p = o - b->hseqbase;
 	  int v = vals[p];
-	  oid grp = gids ? gids[p] : g ? (oid) p : 0; /* group id */
+	  oid grp = gids ? gids[p] : g ? min + (oid) p : 0; /* group id */
 	  if (grp >= min && grp <= max) { /* extra check */
+		grp -= min;
 	  	...
 	  }
   }
@@ -436,17 +437,22 @@
 (``g`` is ``NULL``), then all values are in a single group.  The group
 BAT may be dense (``BATtdense(g)`` is ``true``), then there is no C
 array that we can obtain using the ``Tloc`` macro described earlier,
-but then we know that all values are in their own group.  The third
-possibility is the most common one where there is a group BAT and we
-can retrieve the data using the ``Tloc`` macro.  In this last case,
-the ``gids`` pointer will be non-NULL and it can be used to find the
-group ID for each value.
+but then we know that all values are in their own group (*dense* means
+that each value is one larger than the previous, and for ``oid``
+columns, these values are not actually stored).  The third possibility
+is the most common one where there is a group BAT and we can retrieve
+the data using the ``Tloc`` macro.  In this last case, the ``gids``
+pointer will be non-NULL and it can be used to find the group ID for
+each value.  The assignment of the ``grp`` variable follows this
+logic: if ``gids`` is non-NULL, it is used to retrieve the group ID,
+and if it is ``NULL``, we check the value of ``g`` to distinguish
+between the other two cases.
 
-For the algorithm we use two temporary bit masks to have two bits for
-each group.  One bit in one mask is used to indicate whether we have
-seen any values in the group.  The bit in the other mask is used to
-maintain whether we have seen an even or an odd number of negative
-values in the group.
+For the algorithm we use two temporary bit masks to have two extra
+bits for each group.  One bit in one mask is used to indicate whether
+we have seen any values in the group.  The bit in the other mask is
+used to maintain whether we have seen an even or an odd number of
+negative values in the group.
 
 At the end of the function, we free all resources, set the properties
 of the result BAT and return it.
--- a/gmean/gmean.c	Fri Jun 11 10:02:09 2021 +0200
+++ b/gmean/gmean.c	Fri Jun 11 10:19:12 2021 +0200
@@ -162,8 +162,9 @@
 		oid o = canditer_next(&ci); /* id of candidate */
 		BUN p = o - b->hseqbase;    /* BUN (index) of candidate */
 		int v = vals[p];	    /* the actual value */
-		oid grp = gids ? gids[p] : g ? (oid) p : 0; /* group id */
+		oid grp = gids ? gids[p] : g ? min + (oid) p : 0; /* group id */
 		if (grp >= min && grp <= max) { /* extra check */
+			grp -= min;		/* zero based access */
 			if (is_int_nil(v))
 				continue; /* skip nils (no matter skip_nils) */
 			else if (v == 0)