Hi Roberto,

thanks for the report and the test case. We have added it to our suite:
http://dev.monetdb.org/hg/MonetDB/file/3b913e66ba5d/sql/backends/monet5/Test...

Regarding the cause: we are not getting a crash, but we will investigate.

Hannes
On 01 Oct 2015, at 10:50, Roberto Cornacchia wrote:

The previous example is a simplification of a real aggregate I'm working on, hence the two columns.
I just tried with a single column, and it still fails at exactly 200000 input tuples.
Here, however, I get a SIGSEGV.
This is a reproducible example:
-- the (fake) aggregate function
CREATE AGGREGATE sow(n int) RETURNS DOUBLE LANGUAGE R {
    sow_aggr <- function(df) { 42.0 }
    aggregate(n, list(aggr_group), sow_aggr)$x
};

-- function to generate input data
CREATE FUNCTION tt() RETURNS TABLE (g int, n int) LANGUAGE R {
    g <- rep(1:500, rep(400,500))
    data.frame(g,as.integer(10))
};

CREATE TABLE good as select * from tt() limit 199999 with data;
CREATE TABLE bad as select * from tt() limit 200000 with data;

select count(distinct g) from good;
select count(distinct g) from bad;

select g, sow(n) from good group by g;
select g, sow(n) from bad group by g;
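
For reference, the size of the input generated by tt() can be checked in plain R, outside MonetDB. This is only a sketch: it assumes that "limit 199999" keeps the first rows in generation order, which the thread does not confirm, and the names df and good below are just local stand-ins for the tables.

```r
# Same expression as in tt(): each of the 500 group ids repeated 400 times.
g <- rep(1:500, rep(400, 500))
df <- data.frame(g, n = as.integer(10))  # the single value 10 is recycled to length(g)

nrow(df)              # 500 * 400 = 200000 rows, i.e. exactly the failing size
length(unique(df$g))  # 500 distinct groups

# Assumption: "limit 199999" drops only the last generated row,
# so group 500 ends up with 399 rows instead of 400.
good <- head(df, 199999)
table(good$g)[["500"]]
```

So the "good" and "bad" tables differ by a single row of the last group; the number of groups is the same in both.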
On 1 October 2015 at 10:06, Roberto Cornacchia wrote:

Hi, I am seeing a suspicious failure at exactly 200K input tuples.
I declared a simple aggregate on two columns (for simplicity it does not actually aggregate here; it just returns 42).
CREATE table test (customer int, d string, n int);
INSERT INTO test VALUES(1, '2015-01-01', 100);
INSERT INTO test VALUES(1, '2015-01-02', 100);
INSERT INTO test VALUES(2, '2015-01-03', 100);
INSERT INTO test VALUES(2, '2015-01-01', 100);
INSERT INTO test VALUES(2, '2015-01-02', 100);
CREATE AGGREGATE sow(d string, n int) RETURNS DOUBLE LANGUAGE R {
    # aggregation function is constant, not important here
    sow_aggr <- function(df) { 42.0 }
    df <- cbind(d,n)
    as.vector(by(df, aggr_group, sow_aggr))
};
select customer, sow(d,n) from test group by customer;
ROLLBACK;

+----------+--------------------------+
| customer | L1                       |
+==========+==========================+
|        1 |                       42 |
|        2 |                       42 |
+----------+--------------------------+
The result is what I expected. This holds as long as table test has at most 199999 tuples. When it has exactly 200000 tuples, I get:
Error running R expression. Error message:
Error in tapply(seq_len(200000L), list(INDICES = c(0, 1, 2, 3, 4, 5, 6, :
  arguments must have same length
Calls: as.data.frame ... by.data.frame -> structure -> eval -> eval -> tapply
I checked the vector aggr_group, and indeed it is not 200000 long, as it should be. Instead, it is just one longer than the number of distinct values for customer (the grouping column).
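
The same error message can be reproduced in plain R by handing tapply a grouping vector that is shorter than the data, which is the situation described above. This is a minimal sketch, not what MonetDB does internally: the vectors are made-up stand-ins, and check_groups is a hypothetical helper that could go at the top of the aggregate body to report the mismatch more clearly.

```r
# Hypothetical stand-ins for what the aggregate sees inside MonetDB.
n <- rep(100L, 200000)   # input column, one value per tuple
aggr_group <- 0:2        # deliberately too short: 2 distinct groups + 1

# tapply (used internally by by()) requires the index to be as long as the data.
msg <- tryCatch(
    { tapply(n, aggr_group, length); "no error" },
    error = function(e) conditionMessage(e)
)
print(msg)

# Hypothetical guard: fail early with the two lengths in the message.
check_groups <- function(values, groups) {
    if (length(groups) != length(values)) {
        stop(sprintf("aggr_group has %d entries, expected %d",
                     length(groups), length(values)))
    }
    invisible(TRUE)
}
```

Calling check_groups(n, aggr_group) as the first statement of the aggregate would make the error report both lengths instead of failing deep inside by().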
Any thoughts?
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list