Thanks Hannes,

Do you mean you don't get a SIGSEGV but a failure like in my first example (aggr_group), or do you get a correct result?

I am using Jul2015, updated as of yesterday (30 Sept 2015).

Below is the SIGSEGV backtrace.

This conversation should probably move to a bug report. I first posted it here because the 200000 figure looked too strange to be a coincidence; I thought it might be due to some hard-coded debug limitation.


Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f5d31e00700 (LWP 29057)]
0x00007f5d3499cf10 in RAPIeval (cntxt=0x7f5d35a12328, mb=0x7f5d2431cbe0, stk=0x7f5d24270ec0, pci=0x7f5d24270230, grouped=1 '\001') at /opt/spinque/MonetDBServer/MonetDB.Spinque_Jul2015/src/monetdb5/extras/rapi/rapi.c:518
518 BAT_TO_REALSXP(b, lng, varvalue);
(gdb) bt
#0  0x00007f5d3499cf10 in RAPIeval (cntxt=0x7f5d35a12328, mb=0x7f5d2431cbe0, stk=0x7f5d24270ec0, pci=0x7f5d24270230, grouped=1 '\001') at /opt/spinque/MonetDBServer/MonetDB.Spinque_Jul2015/src/monetdb5/extras/rapi/rapi.c:518
#1  0x00007f5d3499b276 in RAPIevalAggr (cntxt=0x7f5d35a12328, mb=0x7f5d2431cbe0, stk=0x7f5d24270ec0, pci=0x7f5d24270230) at /opt/spinque/MonetDBServer/MonetDB.Spinque_Jul2015/src/monetdb5/extras/rapi/rapi.c:387
#2  0x00007f5d3d21766e in runMALsequence (cntxt=0x7f5d35a12328, mb=0x7f5d2431cbe0, startpc=39, stoppc=40, stk=0x7f5d24270ec0, env=0x0, pcicaller=0x0) at /opt/spinque/MonetDBServer/MonetDB.Spinque_Jul2015/src/monetdb5/mal/mal_interpreter.c:631
#3  0x00007f5d3d21c9be in DFLOWworker (T=0x7f5d3d696900 <workers+128>) at /opt/spinque/MonetDBServer/MonetDB.Spinque_Jul2015/src/monetdb5/mal/mal_dataflow.c:376
#4  0x000000394d007555 in start_thread () from /lib64/libpthread.so.0
#5  0x000000394cd02b9d in clone () from /lib64/libc.so.6


On 1 October 2015 at 11:46, Hannes Mühleisen <hannes.muehleisen@cwi.nl> wrote:
Hi Roberto,

thanks for the report and the test case; we have added it to our suite:

http://dev.monetdb.org/hg/MonetDB/file/3b913e66ba5d/sql/backends/monet5/Tests/rapi18.sql

Regarding the cause, we are not getting a crash, but will investigate.

Hannes


> On 01 Oct 2015, at 10:50, Roberto Cornacchia <roberto.cornacchia@gmail.com> wrote:
>
> The previous example is a simplification of a real aggregate I'm working on, hence the two columns.
>
> I just tried with one column, and it still fails at exactly 200000 input tuples.
>
> Here, however, I get a SIGSEGV.
>
>
> This is a reproducible example:
>
> START TRANSACTION;
>
> -- the (fake) aggregate function
> CREATE AGGREGATE sow(n int) RETURNS DOUBLE LANGUAGE R {
>   sow_aggr <- function(df) { 42.0 }
>
>   aggregate(n, list(aggr_group), sow_aggr)$x
> };
>
> -- function to generate input data
> CREATE FUNCTION tt() RETURNS TABLE (g int, n int) LANGUAGE R {
>   g <- rep(1:500, rep(400,500))   # 500 groups of 400 tuples each = 200000 rows
>   data.frame(g, as.integer(10))   # second column: constant 10, recycled to length(g), maps to n
> };
>
> CREATE TABLE good as select * from tt() limit 199999 with data;
> CREATE TABLE bad as select * from tt() limit 200000 with data;
>
> select count(distinct g) from good;
> select count(distinct g) from bad;
>
> select g, sow(n) from good group by g;
> select g, sow(n) from bad group by g;
>
> ROLLBACK;
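>
> For reference, here is a minimal plain-R sketch of what the aggregate body should see when everything works. It runs outside MonetDB; n and aggr_group are built by hand here to mimic the vectors RAPI injects, so their construction is an assumption, not MonetDB code:
>
>   # hand-built stand-ins for the vectors RAPI passes to the aggregate body
>   n <- rep(10L, 200000)                  # input column, one value per tuple
>   aggr_group <- rep(0:499, each = 400)   # group id per tuple, same length as n
>   sow_aggr <- function(df) { 42.0 }
>   res <- aggregate(n, list(aggr_group), sow_aggr)$x
>   length(res)                            # 500: one result per group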
>
>
> On 1 October 2015 at 10:06, Roberto Cornacchia <roberto.cornacchia@gmail.com> wrote:
> Hi,
>
> I have a suspect failure at exactly 200K tuples in input.
>
> I declared a simple aggregate on two columns (for simplicity it doesn't really aggregate here; it just returns 42).
>
>
> START TRANSACTION;
>
> CREATE table test (customer int, d string, n int);
> INSERT INTO test VALUES(1,'2015-01-01', 100);
> INSERT INTO test VALUES(1, '2015-01-02', 100);
> INSERT INTO test VALUES(2, '2015-01-03', 100);
> INSERT INTO test VALUES(2, '2015-01-01', 100);
> INSERT INTO test VALUES(2, '2015-01-02', 100);
>
>
> CREATE AGGREGATE sow(d string, n int) RETURNS DOUBLE LANGUAGE R {
>
>   # aggregation function is constant, not important here
>   sow_aggr <- function(df) {
>    42.0
>   }
>
>   df <- cbind(d,n)
>   as.vector(by(df, aggr_group, sow_aggr))
> };
>
> select customer, sow(d,n) from test group by customer;
>
> ROLLBACK;
> +----------+--------------------------+
> | customer | L1                       |
> +==========+==========================+
> |        1 |                       42 |
> |        2 |                       42 |
> +----------+--------------------------+
>
>
> The result is what I expected. That remains true as long as table test holds at most 199999 tuples. When it holds exactly 200000 tuples, I get:
>
> Error running R expression. Error message: Error in tapply(seq_len(200000L), list(INDICES = c(0, 1, 2, 3, 4, 5, 6,  :
>   arguments must have same length
> Calls: as.data.frame ... by.data.frame -> structure -> eval -> eval -> tapply
>
> I checked the vector aggr_group, and indeed it is not 200000 elements long, as it should be. Instead, its length is just one more than the number of distinct values of customer (the grouping column).
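>
> For what it's worth, the error itself is easy to reproduce in plain R by handing by() a grouping vector of the wrong length (the data below is hand-built just for illustration):
>
>   d <- rep(c('2015-01-01','2015-01-02'), 100000)
>   n <- rep(100L, 200000)
>   df <- cbind(d, n)                # 200000 rows
>   aggr_group <- 0:2                # wrong: length 3 instead of 200000
>   as.vector(by(df, aggr_group, function(df) 42.0))
>   # Error in tapply(seq_len(200000L), ...): arguments must have same length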
>
> Any thoughts?
>
> Roberto
>


_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list