MonetDB crash / termination issue.
Hi All, I'm not quite sure if i'm doing something wrong here but what is see if i performing a 2 table join on a primary key field is tablea and tableb work fine yet tablec which has less data crashes the daemon. SELECT a.ip_dst as ip_dst, b.tcp_dport as tcp_dport, count(*) as num from "myschema".tablea a LEFT JOIN "myschema".tableb b ON (a.guid = b.guid) WHERE b.tcp_dport IS NOT NULL GROUP BY a.ip_dst, b.tcp_dport ORDER BY num DESC LIMIT 20; This one works fine. However when i perform the join from tablea to tablec I get the following. sql>SELECT a.ip_dst as ip_dst, b.udp_dport as udp_dport, count(*) as num from "myschema".tablea a LEFT JOIN "myschema".tablec b ON (a.guid = b.guid) WHERE b.udp_dport IS NOT NULL GROUP BY a.ip_dst, b.udp_dport ORDER BY num DESC LIMIT 20 more>; Connection terminated I've added alot more detail to the pastebin link. I had the same issue with 11.13.5 after upgrading i have run into the same thing also. http://pastebin.com/m4XseSnH If you want to schema information to reproduce please just email me off list. Regards, Brian Hood
Hi Brian Thank you for reporting. For clarity, all tables are joined on the primary key only? If not, what is the result of SELECT count(distinct guid) from table{a,b,c} Likewise SELECT count(distinct udp_port) from table... Could you attach gdb to the mserver and collect a stack trace? regards, Martin On 1/27/13 7:59 PM, Brian Hood wrote:
Hi All,
I'm not quite sure if i'm doing something wrong here but what is see if i performing a 2 table join on a primary key field is tablea and tableb work fine yet tablec which has less data crashes the daemon.
SELECT a.ip_dst as ip_dst, b.tcp_dport as tcp_dport, count(*) as num from "myschema".tablea a LEFT JOIN "myschema".tableb b ON (a.guid = b.guid) WHERE b.tcp_dport IS NOT NULL GROUP BY a.ip_dst, b.tcp_dport ORDER BY num DESC LIMIT 20;
This one works fine.
However when i perform the join from tablea to tablec
I get the following.
sql>SELECT a.ip_dst as ip_dst, b.udp_dport as udp_dport, count(*) as num from "myschema".tablea a LEFT JOIN "myschema".tablec b ON (a.guid = b.guid) WHERE b.udp_dport IS NOT NULL GROUP BY a.ip_dst, b.udp_dport ORDER BY num DESC LIMIT 20 more>; Connection terminated
I've added alot more detail to the pastebin link.
I had the same issue with 11.13.5 after upgrading i have run into the same thing also.
If you want to schema information to reproduce please just email me off list.
Regards,
Brian Hood
_______________________________________________ users-list mailing list users-list@monetdb.org http://mail.monetdb.org/mailman/listinfo/users-list
Yes i'm always joining on the guid field which is the primary key on all 3
tables.
Ok thankyou initial findings are as follows tablec is by no means correct
which i'll have to investigate however it still shouldn't of crashed the
database but maybe of causing some serious main memory allocation trying to
join it.
however tableb also isn't quite correct either about 300 or so rows out.
I'll come back with the stack trace.
sql>SELECT count(distinct guid) from "myschema".tablea;
+----------+
| L1 |
+==========+
| 10153268 |
+----------+
1 tuple (7.6s)
sql>SELECT count(distinct guid) from "myschema".tableb;
+---------+
| L1 |
+=========+
| 8726522 |
+---------+
1 tuple (7.3s)
sql>SELECT count(distinct guid) from "myschema".tablec;
+--------+
| L1 |
+========+
| 169640 | << Something is wrong
+--------+
1 tuple (187.282ms)
sql>select count(*) as num from "myschema".tablea;
+----------+
| num |
+==========+
| 10153768 |
+----------+
1 tuple (3.317ms)
sql>select count(*) as num from "myschema".tableb;
+---------+
| num |
+=========+
| 8726980 | < Hmm
+---------+
1 tuple (3.345ms)
sql>select count(*) as num from "myschema".tablec;
+--------+
| num |
+========+
| 931376 | << Something is wrong
+--------+
1 tuple (3.285ms)
On Sun, Jan 27, 2013 at 7:18 PM, Martin Kersten
Hi Brian
Thank you for reporting. For clarity, all tables are joined on the primary key only? If not, what is the result of SELECT count(distinct guid) from table{a,b,c} Likewise SELECT count(distinct udp_port) from table...
Could you attach gdb to the mserver and collect a stack trace?
regards, Martin
On 1/27/13 7:59 PM, Brian Hood wrote:
Hi All,
I'm not quite sure if i'm doing something wrong here but what is see if i performing a 2 table join on a primary key field is tablea and tableb work fine yet tablec which has less data crashes the daemon.
SELECT a.ip_dst as ip_dst, b.tcp_dport as tcp_dport, count(*) as num from "myschema".tablea a LEFT JOIN "myschema".tableb b ON (a.guid = b.guid) WHERE b.tcp_dport IS NOT NULL GROUP BY a.ip_dst, b.tcp_dport ORDER BY num DESC LIMIT 20;
This one works fine.
However when i perform the join from tablea to tablec
I get the following.
sql>SELECT a.ip_dst as ip_dst, b.udp_dport as udp_dport, count(*) as num from "myschema".tablea a LEFT JOIN "myschema".tablec b ON (a.guid = b.guid) WHERE b.udp_dport IS NOT NULL GROUP BY a.ip_dst, b.udp_dport ORDER BY num DESC LIMIT 20 more>; Connection terminated
I've added alot more detail to the pastebin link.
I had the same issue with 11.13.5 after upgrading i have run into the same thing also.
If you want to schema information to reproduce please just email me off list.
Regards,
Brian Hood
______________________________**_________________ users-list mailing list users-list@monetdb.org http://mail.monetdb.org/**mailman/listinfo/users-listhttp://mail.monetdb.org/mailman/listinfo/users-list
______________________________**_________________ users-list mailing list users-list@monetdb.org http://mail.monetdb.org/**mailman/listinfo/users-listhttp://mail.monetdb.org/mailman/listinfo/users-list
Assuming your database was not corrupted in the first place and the error is reproducable, it would help to see the stack trace. TO get it on a Linux box, use 'ps' to get the process id of the database server, call 'gdb mserver <processid>' and let it continue to run. Then re-issue the query, which should lead to gdb to report on the error. The command 'thr apply all where' would show what all threads are doing during the crash. thanks, Martin On 1/27/13 8:37 PM, Brian Hood wrote:
Yes i'm always joining on the guid field which is the primary key on all 3 tables.
Ok thankyou initial findings are as follows tablec is by no means correct which i'll have to investigate however it still shouldn't of crashed the database but maybe of causing some serious main memory allocation trying to join it.
however tableb also isn't quite correct either about 300 or so rows out.
Hi Martin,
I don't believe my database is corrupt is there anyway in MonetDB to check
the integrity of data ?
http://pastebin.com/3u4vX4TN - GDB Session
http://pastebin.com/0cUVQwjU - Terminal Session for the SQL i was running.
Hope this helps.
Brian Hood
On Sun, Jan 27, 2013 at 7:42 PM, Martin Kersten
Assuming your database was not corrupted in the first place and the error is reproducable, it would help to see the stack trace.
TO get it on a Linux box, use 'ps' to get the process id of the database server, call 'gdb mserver <processid>' and let it continue to run. Then re-issue the query, which should lead to gdb to report on the error.
The command 'thr apply all where' would show what all threads are doing during the crash.
thanks, Martin
On 1/27/13 8:37 PM, Brian Hood wrote:
Yes i'm always joining on the guid field which is the primary key on all 3 tables.
Ok thankyou initial findings are as follows tablec is by no means correct which i'll have to investigate however it still shouldn't of crashed the database but maybe of causing some serious main memory allocation trying to join it.
however tableb also isn't quite correct either about 300 or so rows out.
______________________________**_________________ users-list mailing list users-list@monetdb.org http://mail.monetdb.org/**mailman/listinfo/users-listhttp://mail.monetdb.org/mailman/listinfo/users-list
On Mon, Jan 28, 2013 at 06:44:20PM +0000, Brian Hood wrote:
Hi Martin,
I don't believe my database is corrupt is there anyway in MonetDB to check the integrity of data ?
http://pastebin.com/3u4vX4TN - GDB Session Brian
Did you by any change run out of memory? The trace you show indicates that Monet crashes in the Hash join operator, which is a used quite offten (doesn't help us much in pointing out why /where in the plan it crashes). (could you possibly run your query prefixed by 'debug') And the single step through your 'mal' plan. Niels
http://pastebin.com/0cUVQwjU - Terminal Session for the SQL i was running.
Hope this helps.
Brian Hood
On Sun, Jan 27, 2013 at 7:42 PM, Martin Kersten
wrote: Assuming your database was not corrupted in the first place and the error is reproducable, it would help to see the stack trace.
TO get it on a Linux box, use 'ps' to get the process id of the database server, call 'gdb mserver <processid>' and let it continue to run. Then re-issue the query, which should lead to gdb to report on the error.
The command 'thr apply all where' would show what all threads are doing during the crash.
thanks, Martin
On 1/27/13 8:37 PM, Brian Hood wrote:
Yes i'm always joining on the guid field which is the primary key on all 3 tables.
Ok thankyou initial findings are as follows tablec is by no means correct which i'll have to investigate however it still shouldn't of crashed the database but maybe of causing some serious main memory allocation trying to join it.
however tableb also isn't quite correct either about 300 or so rows out.
_______________________________________________ users-list mailing list users-list@monetdb.org http://mail.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org http://mail.monetdb.org/mailman/listinfo/users-list
-- Niels Nes, Centrum Wiskunde & Informatica (CWI) Science Park 123, 1098 XG Amsterdam, The Netherlands room L3.14, phone ++31 20 592-4098 sip:4098@sip.cwi.nl url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
Well ok was wondering about the out of memory thing but it all happens to
fast as RAM does take a little time to fill the termination happens in a
few seconds.
I will perform the step by step trace and the monitor metrics.
On Jan 28, 2013 6:49 PM, "Niels Nes"
On Mon, Jan 28, 2013 at 06:44:20PM +0000, Brian Hood wrote:
Hi Martin,
I don't believe my database is corrupt is there anyway in MonetDB to check the integrity of data ?
http://pastebin.com/3u4vX4TN - GDB Session Brian
Did you by any change run out of memory? The trace you show indicates that Monet crashes in the Hash join operator, which is a used quite offten (doesn't help us much in pointing out why /where in the plan it crashes).
(could you possibly run your query prefixed by 'debug') And the single step through your 'mal' plan.
Niels
http://pastebin.com/0cUVQwjU - Terminal Session for the SQL i was running.
Hope this helps.
Brian Hood
On Sun, Jan 27, 2013 at 7:42 PM, Martin Kersten
wrote: Assuming your database was not corrupted in the first place and the error is reproducable, it would help to see the stack trace.
TO get it on a Linux box, use 'ps' to get the process id of the database server, call 'gdb mserver <processid>' and let it continue to run. Then re-issue the query, which should lead to gdb to report on the error.
The command 'thr apply all where' would show what all threads are doing during the crash.
thanks, Martin
On 1/27/13 8:37 PM, Brian Hood wrote:
Yes i'm always joining on the guid field which is the primary key on all 3 tables.
Ok thankyou initial findings are as follows tablec is by no means correct which i'll have to investigate however it still shouldn't of crashed the database but maybe of causing some serious main memory allocation trying to join it.
however tableb also isn't quite correct either about 300 or so rows out.
_______________________________________________ users-list mailing list users-list@monetdb.org http://mail.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org http://mail.monetdb.org/mailman/listinfo/users-list
-- Niels Nes, Centrum Wiskunde & Informatica (CWI) Science Park 123, 1098 XG Amsterdam, The Netherlands room L3.14, phone ++31 20 592-4098 sip:4098@sip.cwi.nl url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
_______________________________________________ users-list mailing list users-list@monetdb.org http://mail.monetdb.org/mailman/listinfo/users-list
participants (3)
-
Brian Hood
-
Martin Kersten
-
Niels Nes