Hi Roberto,

thanks for the report and the test case. We have added it to our suite: http://dev.monetdb.org/hg/MonetDB/file/3b913e66ba5d/sql/backends/monet5/Test...

Regarding the cause: we are not getting a crash, but we will investigate.

Hannes
On 01 Oct 2015, at 10:50, Roberto Cornacchia wrote:

The previous example is a simplification of a real aggregate I'm working on, hence the two columns.
I just tried on one column, and it still fails at exactly 200000 tuples in input.
Here, however, I get a SIGSEGV.
This is a reproducible example:
START TRANSACTION;
-- the (fake) aggregate function
CREATE AGGREGATE sow(n int) RETURNS DOUBLE LANGUAGE R {
    sow_aggr <- function(df) { 42.0 }
    aggregate(n, list(aggr_group), sow_aggr)$x
};

-- function to generate input data
CREATE FUNCTION tt() RETURNS TABLE (g int, n int) LANGUAGE R {
    g <- rep(1:500, rep(400,500))
    data.frame(g,as.integer(10))
};

CREATE TABLE good as select * from tt() limit 199999 with data;
CREATE TABLE bad as select * from tt() limit 200000 with data;

select count(distinct g) from good;
select count(distinct g) from bad;

select g, sow(n) from good group by g;
select g, sow(n) from bad group by g;
ROLLBACK;
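
For reference, the input sizes in this example can be checked in plain R, outside MonetDB. This is only a sketch: it reuses the body of tt() and the aggregate body from above, but builds the grouping vector by hand, on the assumption that aggr_group is meant to hold one zero-based group id per input row. The names input, good and res are made up for the illustration, and the second column is named n here for readability.

```r
# Plain R, no MonetDB involved. Same generator expression as in tt().
g <- rep(1:500, rep(400, 500))
input <- data.frame(g, n = as.integer(10))

nrow(input)                 # 500 groups of 400 rows each: 200000 rows
length(unique(input$g))     # 500 distinct groups

# "limit 199999" drops only the last row, so the number of groups is unchanged
good <- head(input, 199999)
length(unique(good$g))      # still 500

# The aggregate body, run with a hand-built grouping vector.
# Assumption: aggr_group carries one zero-based group id per row.
sow_aggr <- function(df) { 42.0 }
n <- input$n
aggr_group <- input$g - 1L
res <- aggregate(n, list(aggr_group), sow_aggr)$x
length(res)                 # one value per group
```

With a grouping vector of the right length, the aggregate body itself handles 200000 rows without trouble in plain R, which suggests the problem is in what gets passed to R rather than in the R code.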
On 1 October 2015 at 10:06, Roberto Cornacchia wrote:

Hi,

I am seeing a suspicious failure at exactly 200K input tuples.
I declared a simple aggregate on two columns (for simplicity it does not actually aggregate here, it just returns 42).
START TRANSACTION;
CREATE table test (customer int, d string, n int);
INSERT INTO test VALUES(1, '2015-01-01', 100);
INSERT INTO test VALUES(1, '2015-01-02', 100);
INSERT INTO test VALUES(2, '2015-01-03', 100);
INSERT INTO test VALUES(2, '2015-01-01', 100);
INSERT INTO test VALUES(2, '2015-01-02', 100);

CREATE AGGREGATE sow(d string, n int) RETURNS DOUBLE LANGUAGE R {
    # aggregation function is constant, not important here
    sow_aggr <- function(df) { 42.0 }

    df <- cbind(d,n)
    as.vector(by(df, aggr_group, sow_aggr))
};
select customer, sow(d,n) from test group by customer;
ROLLBACK;

+----------+--------------------------+
| customer | L1                       |
+==========+==========================+
|        1 |                       42 |
|        2 |                       42 |
+----------+--------------------------+
The result is what I expected. This holds as long as table test has at most 199999 tuples. When it has exactly 200000 tuples, I get:
Error running R expression. Error message: Error in tapply(seq_len(200000L), list(INDICES = c(0, 1, 2, 3, 4, 5, 6, :
  arguments must have same length
Calls: as.data.frame ... by.data.frame -> structure -> eval -> eval -> tapply
I checked the vector aggr_group, and indeed it is not 200000 long, as it should be. Instead, it is just one longer than the number of distinct values for customer (the grouping column).
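
The same error can be provoked in plain R, without MonetDB, by handing by() a grouping vector that is shorter than the data. This is a minimal sketch, not a reproduction of what MonetDB does internally: the vectors d and n are filled with constant values, and bad_group and good_group are hypothetical stand-ins for aggr_group (bad_group has "number of groups + 1" entries, as observed above).

```r
# Plain R, no MonetDB involved.
sow_aggr <- function(df) { 42.0 }

rows <- 200000L
d <- rep("2015-01-01", rows)
n <- rep(100L, rows)
df <- cbind(d, n)

# Hypothetical stand-in for the broken aggr_group: 2 customers + 1 entries
bad_group <- 0:2

# by() ends up in tapply(), which refuses index vectors of a different length
msg <- tryCatch(
  {
    as.vector(by(df, bad_group, sow_aggr))
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# With one group id per row the same call succeeds
good_group <- rep(0:1, each = rows / 2)
ok <- as.vector(by(df, good_group, sow_aggr))
print(ok)
```

So the R error is just the consequence of the short grouping vector. The open question is why aggr_group arrives with the wrong length at 200000 rows.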
Any thoughts?
Roberto
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list