SELECT failure: "insufficient space for HEAP ..."
Hi, We have an application (http://tmt.gtri.gatech.edu/lytic.html) based on a relatively modest, single (~1-10M row, ~20 column) MonetDB table (Win7, 64 bit, Jan14-SP1 server release, quad-core, 8GB RAM) and have been encountering sporadic SELECT query failures with the error: "Insufficient space for HEAP of 1017856 bytes". Since I've been trying in earnest to diagnose this, that byte value has been consistent, FWIW. The failures are frustratingly intermittent; if I rerun a failing query (by hand, using mclient) it sometimes fails again, and sometimes works correctly. I've been unable to figure out any sort of pattern. Unfortunately, I can't post the exact table and query, but the general form of the queries (which have ~5-10 case clauses) are: SELECT CASE WHEN Foo = 'ValueA' and Bar LIKE 'ValueB%' THEN 100*(Baz-1234)/1234 WHEN Foo = 'ValueC' and Bar LIKE 'ValueD%' THEN 100*(Baz-3456)/3456 ... END FROM table WHERE <same case statement as above> IS NOT NULL LIMIT 1 Any ideas what I might do to narrow this down or other information I might provide? I captured an explain of a failing query--it's many hundreds (1000s?) of lines, so I didn't want to include it here in full. All suggestions appreciated. Thanks, Ed P.S. MonetDB has been a fantastic complement to our application, thanks to everyone involved!
Hi Edward, sorry for the inconveniences and thank you for your kind words! Could you check how large your merver5 process is at the time the error occurs? At that very time, how much main memory and how much disk space (on the partition/drive where your dbfarm is located) are still free? Instead of the (statically generated) explain, a "live" trace of a failing run would be more useful (if at all). To create a trace, you'd need to start the server by hand using the M5server.bat batch script, providing an extra "--trace" command line option, and collect the server console output. Also, if there is any more output before/after the error message you send, that might be helpful to analyze the problem. Best, Stefan ----- Original Message -----
Hi,
We have an application (http://tmt.gtri.gatech.edu/lytic.html) based on a relatively modest, single (~1-10M row, ~20 column) MonetDB table (Win7, 64 bit, Jan14-SP1 server release, quad-core, 8GB RAM) and have been encountering sporadic SELECT query failures with the error: "Insufficient space for HEAP of 1017856 bytes". Since I've been trying in earnest to diagnose this, that byte value has been consistent, FWIW.
The failures are frustratingly intermittent; if I rerun a failing query (by hand, using mclient) it sometimes fails again, and sometimes works correctly. I've been unable to figure out any sort of pattern.
Unfortunately, I can't post the exact table and query, but the general form of the queries (which have ~5-10 case clauses) are:
SELECT CASE WHEN Foo = 'ValueA' and Bar LIKE 'ValueB%' THEN 100*(Baz-1234)/1234 WHEN Foo = 'ValueC' and Bar LIKE 'ValueD%' THEN 100*(Baz-3456)/3456 ... END FROM table WHERE <same case statement as above> IS NOT NULL LIMIT 1
Any ideas what I might do to narrow this down or other information I might provide? I captured an explain of a failing query--it's many hundreds (1000s?) of lines, so I didn't want to include it here in full. All suggestions appreciated. Thanks,
Ed
P.S. MonetDB has been a fantastic complement to our application, thanks to everyone involved!
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
Hi Stefan, Thanks for the response, see below:
Could you check how large your merver5 process is at the time the error occurs?
40-60 MB (private working set), which is pretty much in line with normal usage.
At that very time, how much main memory and how much disk space (on the partition/drive where your dbfarm is located) are still free?
I misspoke below: there are only 4 GB on the machine I'm using; however, there seems to be plenty of memory (>1.2 GB) and disk (>100 GB) free when the failure happens.
Instead of the (statically generated) explain, a "live" trace of a failing run would be more useful (if at all).
A difficulty is that I'm actually triggering this issue in a closed lab (i.e., heavy restrictions on data coming out, and no network access in or out), so I can't feasibly get the full output to attach to an email or put online. I transcribed the last few lines below (hopefully with no errors...) in case that helps. I did save the full trace log in the lab, so if you want me to look for anything specific, I'm happy to do so and report back. In reproducing this, I did notice something I hope is a clue: the number of bytes that the error consistently correlates with the table row count used in the query. Specifically: Table size | Byte Count in error 846721 847842 1016065 1017856 1693441 1693696
Also, if there is any more output before/after the error message you send, that might be helpful to analyze the problem.
I wish there were something more, but just that message. Just to clarify:
the error is *not* fatal. I get the error in response to my SELECT on
the client side, but the server stays up and doesn't report anything on
its console, and all subsequent queries (unless of the same form) work as
expected.
Thanks for the help,
Ed
Trace output follows:
--------------------------------------------
...
[ 16900, "done", "01:13:25.287000", 7, 11613,
"X_4227=
----- Original Message -----
Hi,
We have an application (http://tmt.gtri.gatech.edu/lytic.html) based on a relatively modest, single (~1-10M row, ~20 column) MonetDB table (Win7, 64 bit, Jan14-SP1 server release, quad-core, 8GB RAM) and have been encountering sporadic SELECT query failures with the error: "Insufficient space for HEAP of 1017856 bytes". Since I've been trying in earnest to diagnose this, that byte value has been consistent, FWIW.
The failures are frustratingly intermittent; if I rerun a failing query (by hand, using mclient) it sometimes fails again, and sometimes works correctly. I've been unable to figure out any sort of pattern.
Unfortunately, I can't post the exact table and query, but the general form of the queries (which have ~5-10 case clauses) are:
SELECT CASE WHEN Foo = 'ValueA' and Bar LIKE 'ValueB%' THEN 100*(Baz-1234)/1234 WHEN Foo = 'ValueC' and Bar LIKE 'ValueD%' THEN 100*(Baz-3456)/3456 ... END FROM table WHERE <same case statement as above> IS NOT NULL LIMIT 1
Any ideas what I might do to narrow this down or other information I might provide? I captured an explain of a failing query--it's many hundreds (1000s?) of lines, so I didn't want to include it here in full. All suggestions appreciated. Thanks,
Ed
P.S. MonetDB has been a fantastic complement to our application, thanks to everyone involved!
Also maybe one more clue--we are executing lots of SELECT queries against
these tables--even quite similar CASE statements--but the query that fails
I believe is the only one that uses a CASE statement with arithmetic (in
this case, multiplication/division) in the THEN clause. I.e., we have
plenty of arithmetic combinations of fields and constants, and plenty of
CASE statements (using both equality and LIKE comparisons), but I think
this is the only query we use that has both.
On 5/13/14, 1:13 PM, "Clarkson, Edward C."
Hi Stefan,
Thanks for the response, see below:
Could you check how large your merver5 process is at the time the error occurs?
40-60 MB (private working set), which is pretty much in line with normal usage.
At that very time, how much main memory and how much disk space (on the partition/drive where your dbfarm is located) are still free?
I misspoke below: there are only 4 GB on the machine I'm using; however, there seems to be plenty of memory (>1.2 GB) and disk (>100 GB) free when the failure happens.
Instead of the (statically generated) explain, a "live" trace of a failing run would be more useful (if at all).
A difficulty is that I'm actually triggering this issue in a closed lab (i.e., heavy restrictions on data coming out, and no network access in or out), so I can't feasibly get the full output to attach to an email or put online. I transcribed the last few lines below (hopefully with no errors...) in case that helps. I did save the full trace log in the lab, so if you want me to look for anything specific, I'm happy to do so and report back.
In reproducing this, I did notice something I hope is a clue: the number of bytes that the error consistently correlates with the table row count used in the query. Specifically:
Table size | Byte Count in error 846721 847842 1016065 1017856 1693441 1693696
Also, if there is any more output before/after the error message you send, that might be helpful to analyze the problem.
I wish there were something more, but just that message. Just to clarify: the error is *not* fatal. I get the error in response to my SELECT on the client side, but the server stays up and doesn't report anything on its console, and all subsequent queries (unless of the same form) work as expected.
Thanks for the help,
Ed
Trace output follows: -------------------------------------------- ... [ 16900, "done", "01:13:25.287000", 7, 11613, "X_4227=
[105840] := batcalc./(X3688= [105840],X4216= [105840]);", ] [ 16901, "done", "01:13:25.288000", 4, 40532, "X_2080=
[105840] := batstr.like(X_1175=<tmp2455>[105840],\"Static%\";", ] [ 16902, "done", "01:13:25.289000", 5, 39936, "X_2024=
[105840] := batstr.like(X_1175=<tmp2455>[105840],\"Static%\";", ] [ 16903, "done", "01:13:25.302000", 11, 57964, "X_2199= [105840] := batstr.like(X_1175=<tmp2455>[105840],\"OTM%\";", ] [ 16904, "done", "01:13:25.302000", 3, 2756824, "barrier X_4923 := language.dataflow();", ]
----- Original Message -----
Hi,
We have an application (http://tmt.gtri.gatech.edu/lytic.html) based on a relatively modest, single (~1-10M row, ~20 column) MonetDB table (Win7, 64 bit, Jan14-SP1 server release, quad-core, 8GB RAM) and have been encountering sporadic SELECT query failures with the error: "Insufficient space for HEAP of 1017856 bytes". Since I've been trying in earnest to diagnose this, that byte value has been consistent, FWIW.
The failures are frustratingly intermittent; if I rerun a failing query (by hand, using mclient) it sometimes fails again, and sometimes works correctly. I've been unable to figure out any sort of pattern.
Unfortunately, I can't post the exact table and query, but the general form of the queries (which have ~5-10 case clauses) are:
SELECT CASE WHEN Foo = 'ValueA' and Bar LIKE 'ValueB%' THEN 100*(Baz-1234)/1234 WHEN Foo = 'ValueC' and Bar LIKE 'ValueD%' THEN 100*(Baz-3456)/3456 ... END FROM table WHERE <same case statement as above> IS NOT NULL LIMIT 1
Any ideas what I might do to narrow this down or other information I might provide? I captured an explain of a failing query--it's many hundreds (1000s?) of lines, so I didn't want to include it here in full. All suggestions appreciated. Thanks,
Ed
P.S. MonetDB has been a fantastic complement to our application, thanks to everyone involved!
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
participants (2)
-
Clarkson, Edward C.
-
Stefan Manegold