Hi Stefan,

Thanks for your help. However, I'm still seeing the same issue with your suggested changes: the 'g' BAT contains only 4 rows (2 groups x 2 threads), while the 'b' BAT contains 249k rows, which is the size of the data table. I was assuming that either the 'g' and 'b' BATs would be the same size or that there would be some form of indexing from 'g' to 'b'.

The aggregation I'm testing is this:

select "Gender", bin_samp("CustomerAge"), count(*) from "Customers" group by "Gender";

and the result I get is:

+--------+----------+--------+
| Gender | L1       | L2     |
+========+==========+========+
| Female | 12||12|| | 106024 |
| Male   | 12||50|| | 143703 |
+--------+----------+--------+

I would have expected the bin_samp aggregate to return pipe-delimited strings of 1000 values per Gender value rather than only 2 values. I’m assuming something is wrong in the mapping of ‘g’ to ‘b’ entries.
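
To make my expectation concrete, here is a minimal Python sketch of the mapping I had in mind. It is not MonetDB code: the function name bin_samp_sketch, the single "|" separator and the choice to keep the first 1000 values per group are assumptions for illustration only. It assumes one group id in 'g' per row of 'b':

```python
from collections import defaultdict


def bin_samp_sketch(values, group_ids, limit=1000):
    """Hypothetical illustration: join up to `limit` values per group with '|'.

    Assumes group_ids[i] is the group of values[i], i.e. both inputs
    have the same number of rows.
    """
    if len(values) != len(group_ids):
        raise ValueError("expected one group id per value row")

    buckets = defaultdict(list)
    for value, gid in zip(values, group_ids):
        # Keep at most `limit` values per group (first-come, not a random sample).
        if len(buckets[gid]) < limit:
            buckets[gid].append(str(value))

    return {gid: "|".join(vals) for gid, vals in buckets.items()}


if __name__ == "__main__":
    # Five value rows with five matching group ids.
    print(bin_samp_sketch([12, 12, 12, 50, 56], [0, 1, 0, 1, 0]))
```

With a 4-row 'g' and a 249k-row 'b', the length check in this sketch would fail immediately, which is the mismatch I am trying to understand.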

The ‘g’ BAT looks like this:

#-----------------#
# h t # name
# void oid # type
#-----------------#
[ 0@0, 0@0 ]
[ 1@0, 1@0 ]
[ 2@0, 0@0 ]
[ 3@0, 1@0 ]

And the ‘b’ BAT is:

#-------------------------#
# h t # name
# void int # type
#-------------------------#
[ 0@0, 12 ]
[ 1@0, 12 ]
[ 2@0, 12 ]
[ 3@0, 50 ]
[ 4@0, 56 ]
[ 5@0, 12 ]
[ 6@0, 12 ]
[ 7@0, 34 ]
[ 8@0, 12 ]
[ 9@0, 12 ]

… continues to @249727@0

Kind regards,

Scott Mathieson