[MonetDB-users] ERROR GDKload failed....CMDthetajoin; operation failed
Hello again kind people.
This is an sql self join on TableA which contains a serial primary key
column called 'seq'. The select is limited to less than 400 rows of
output by 'a1.seq<=400'.
sql->select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where
a1.seq<=400 and a1.seq>=17
and a1.seq
I assume that the first query was executed successfully in a separate
transaction.
The second query builds a huge intermediate result incrementally.
To understand such queries you might try a toy example first
(e.g. a table of 10 elements and look at the intermediate
results)
Consider a1.seq = 18 then there are 40M-18 combinations
for which a1.seq Hello again kind people. This is an sql self join on TableA which contains a serial primary key
column called 'seq'. The select is limited to less than 400 rows of
output by 'a1.seq<=400'. sql->select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where
a1.seq<=400 and a1.seq>=17
and a1.seq This select statement results in 384 rows, as expected, with a newly
calculated column for sum(a2.num)/17 which yields a running average of
'num' between any given value for 'seq' and 'seq+17'. This next sql statement is exactly the same as above except the
limitation of 384 rows of output imposed by 'a1.seq<=400' has been
removed. Note that TableA contained 40 million plus rows. sql->select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where and
a1.seq>=17 and
a1.seq Error: ERROR: GDKload failed: name=14/1457, ext=buns.priv
ERROR: CMDthetajoin: operation failed. Perhaps there is a better way to accomplish a running aggregate on 40
million rows? Thanks.........George -------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
MonetDB-users mailing list
MonetDB-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/monetdb-users
On Tue, Oct 10, 2006 at 11:27:31AM -0700, George H wrote:
Hello again kind people.
This is an sql self join on TableA which contains a serial primary key column called 'seq'. The select is limited to less than 400 rows of output by 'a1.seq<=400'.
sql->select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where a1.seq<=400 and a1.seq>=17 and a1.seq
This select statement results in 384 rows, as expected, with a newly calculated column for sum(a2.num)/17 which yields a running average of 'num' between any given value for 'seq' and 'seq+17'.
This next sql statement is exactly the same as above except the limitation of 384 rows of output imposed by 'a1.seq<=400' has been removed. Note that TableA contained 40 million plus rows.
sql->select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where and a1.seq>=17 and a1.seq
ie select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where a1.seq >= 16 -- your window is 17, ie we need to skip the first 16 and a1.seq >= a2.seq and a1.seq < a2.seq+17 etc Not sure this will give a solution to the data explosion as our current mapping will have problems with turn both into bat theta joins, which will be close to the crosstable of 40M*40M rows. This will be an other item on my list to fix... Niels
Error: ERROR: GDKload failed: name=14/1457, ext=buns.priv ERROR: CMDthetajoin: operation failed.
Perhaps there is a better way to accomplish a running aggregate on 40 million rows?
Thanks.........George
------------------------------------------------------------------------- Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT & business topics through brief surveys -- and earn cash http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- Niels Nes, Centre for Mathematics and Computer Science (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
Sorry Niels I did not correctly report the statement I used. I used the limitation on the lower end of the band just as you wrote it with 'a1.seq>=a2.seq': select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where a1.seq >= 16 -- your window is 17, ie we need to skip the first 16 and a1.seq >= a2.seq and a1.seq < a2.seq+17 etc result is same error, same data explosion. Thanks for your comment though Best...George Niels Nes wrote:
On Tue, Oct 10, 2006 at 11:27:31AM -0700, George H wrote:
Hello again kind people.
This is an sql self join on TableA which contains a serial primary key column called 'seq'. The select is limited to less than 400 rows of output by 'a1.seq<=400'.
sql->select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where a1.seq<=400 and a1.seq>=17 and a1.seq
This select statement results in 384 rows, as expected, with a newly calculated column for sum(a2.num)/17 which yields a running average of 'num' between any given value for 'seq' and 'seq+17'.
This next sql statement is exactly the same as above except the limitation of 384 rows of output imposed by 'a1.seq<=400' has been removed. Note that TableA contained 40 million plus rows.
sql->select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where and a1.seq>=17 and a1.seq
I guess you want some limitation on the lower and of the band join here.
ie
select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where a1.seq >= 16 -- your window is 17, ie we need to skip the first 16 and a1.seq >= a2.seq and a1.seq < a2.seq+17 etc
Not sure this will give a solution to the data explosion as our current mapping will have problems with turn both into bat theta joins, which will be close to the crosstable of 40M*40M rows.
This will be an other item on my list to fix...
Niels
Error: ERROR: GDKload failed: name=14/1457, ext=buns.priv ERROR: CMDthetajoin: operation failed.
Perhaps there is a better way to accomplish a running aggregate on 40 million rows?
Thanks.........George
------------------------------------------------------------------------- Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT & business topics through brief surveys -- and earn cash http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
Sorry:
The statement that generated the error is inccorectly reported in the
initial post. The statement was actually:
sql->select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where and
a1.seq>=17 and
a1.seq>=a2.seq and a1.seq Hello again kind people. This is an sql self join on TableA which contains a serial primary key
column called 'seq'. The select is limited to less than 400 rows of
output by 'a1.seq<=400'. sql->select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where
a1.seq<=400 and a1.seq>=17
and a1.seq This select statement results in 384 rows, as expected, with a newly
calculated column for sum(a2.num)/17 which yields a running average of
'num' between any given value for 'seq' and 'seq+17'. This next sql statement is exactly the same as above except the
limitation of 384 rows of output imposed by 'a1.seq<=400' has been
removed. Note that TableA contained 40 million plus rows. sql->select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where and
a1.seq>=17 and
a1.seq Error: ERROR: GDKload failed: name=14/1457, ext=buns.priv
ERROR: CMDthetajoin: operation failed. Perhaps there is a better way to accomplish a running aggregate on 40
million rows? Thanks.........George
participants (3)
-
George H
-
Martin Kersten
-
Niels Nes