Hi Fabian,
Thank you very much for your email.
I had the same concern as you about the accumulation error of the
one-scan approach, even though mathematically the two approaches should
give exactly the same result. About 10 years ago I ran some experiments
with both approaches and did not find any noticeable difference between
the results, but the sample sizes I tried then were not huge.
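
To make the concern concrete, here is a minimal Python sketch of the two
approaches. It assumes the quantity in question is a population variance.
The function names and the sample data are made up for illustration; they
are not the data from the experiments mentioned above.

```python
def one_scan_variance(xs):
    """Population variance from running sums of x and x*x (single pass)."""
    n = 0
    total = 0.0
    total_sq = 0.0
    for x in xs:
        n += 1
        total += x
        total_sq += x * x
    mean = total / n
    # E[x^2] - (E[x])^2: two large, nearly equal numbers are subtracted here.
    return total_sq / n - mean * mean


def two_scan_variance(xs):
    """Population variance: first pass for the mean, second for deviations."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


# Hypothetical data: the same four values, once as-is and once shifted by 1e9.
small = [4.0, 7.0, 13.0, 16.0]
shifted = [x + 1e9 for x in small]

print(one_scan_variance(small), two_scan_variance(small))
print(one_scan_variance(shifted), two_scan_variance(shifted))
```

On the small values both functions agree. On the shifted values the
two-scan result is unchanged, while the one-scan result can be visibly
off: the sum of squares is about 4e18, where adjacent doubles are 512
apart, so the true variance is lost in the rounding.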
For a sample size on the order of 1 trillion, I have the same concern
even with the two-scan approach. For example, after 500 billion entries
have been added to the sum of squares, the remaining entries may no
longer change the sum at all, since each of them may well be within the
range of the accumulation error! For a huge BAT, if precision is the top
priority, we may need to use a hierarchical approach: divide the huge
sample into manageable blocks, compute the result for each block first,
and then aggregate the results.
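
The hierarchical idea could be sketched roughly as below. This is only an
illustration under my own assumptions: the quantity is a population
variance, the input is an in-memory Python list rather than a BAT, and the
names block_stats, merge_stats, blocked_variance and block_size are
hypothetical. The merge step uses the well-known pairwise update formula
for combining (count, mean, sum of squared deviations) of two blocks.

```python
def block_stats(block):
    """Two-scan statistics for one block: (count, mean, sum of squared deviations)."""
    n = len(block)
    mean = sum(block) / n
    m2 = sum((x - mean) ** 2 for x in block)
    return n, mean, m2


def merge_stats(a, b):
    """Combine the statistics of two disjoint blocks into one."""
    na, mean_a, m2_a = a
    nb, mean_b, m2_b = b
    n = na + nb
    delta = mean_b - mean_a
    mean = mean_a + delta * nb / n
    m2 = m2_a + m2_b + delta * delta * na * nb / n
    return n, mean, m2


def blocked_variance(values, block_size=1024):
    """Population variance computed per block, then aggregated pairwise."""
    if not values:
        raise ValueError("need at least one value")
    parts = [block_stats(values[i:i + block_size])
             for i in range(0, len(values), block_size)]
    # Merge neighbours level by level so partial results of similar
    # magnitude are combined, instead of adding tiny terms to a huge sum.
    while len(parts) > 1:
        merged = [merge_stats(parts[i], parts[i + 1])
                  for i in range(0, len(parts) - 1, 2)]
        if len(parts) % 2:
            merged.append(parts[-1])  # odd block carried to the next level
        parts = merged
    n, _, m2 = parts[0]
    return m2 / n


print(blocked_variance([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], block_size=3))
```

The absorption effect that motivates this is easy to see in double
precision: 2.0**53 + 1.0 == 2.0**53 evaluates to True, so a small term
added to a large enough running sum is simply lost.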
Thanks,
Yinhe