Hello,
I will briefly recap the problem I reported a few weeks ago: I am using the MonetDB default branch (I did not compile the stable version because I need embedded Python) and I have a large table (~80 GB). A simple query like SELECT * FROM large_table WHERE field=x fails. mserver5 reports this error: mserver5: gdk_select.c:867: fullscan_int: Assertion `cnt < (bn)->S.capacity' failed.
It seems that monet thinks it doesn’t have enough space to do a full scan of the table, maybe? (The machine has 800GB of free disk space and 30GB of RAM - Ubuntu 14.04). I was using a default branch from December 2015; I have since tried the latest default branch (4/08/16), recompiled and installed it, but nothing changed: always the same error.
So, as suggested, I have compiled the latest stable release (Jun2016-SP1). Now even a query which does not require monet to read the whole table (like SELECT * FROM large_table LIMIT 10, which worked fine in the default branch version) fails with a ‘Program contains error’ message in the client. Selecting a single column of the table works just fine. Moreover, with the stable release, the same issue arises with the second biggest table of the database (~35 GB). [Here you can find the configuration of MonetDB: http://pastie.org/private/r8gtssg944aqwnigrysa]
All queries on all the other tables in the database (they are all much smaller than these two) work fine. I do not understand whether the issue is the size of the table or something else. How can I find out which ‘error’ the message ‘program contains errors’ is referring to?
NOTE: At the beginning, using the compiled default branch, everything worked just fine! Every query on my large_table worked as expected. Then I had to recreate the database from scratch, so I deleted the dbfarm and created a new one. That’s when this headache started…
At this point I don’t know what to do, any help would be MUCH appreciated.
Thanks,
Stefano
Hi Stefano,
did/do you re-create/re-load your database from scratch for each version of MonetDB you tested, or do you access that same database (same dbfarm) using the different versions?
Please be aware that we only support database upgrades between two consecutive releases of MonetDB. All other cases are not supported and their behavior is undefined, and might thus include anything from the server refusing to start to "silent" corruption of the database.
Having said that, it's generally hard for us to analyze problems, let alone solve them, without being provided detailed instructions (if necessary including data) on how to reproduce them.
Moreover, the assertion you get with the default branch has nothing to do with insufficient space; it indicates that some internal administrative information is inconsistent. Why that is so is hard, if not impossible, to understand without being able to reproduce the problem.
Also, regarding the 'Program contains error' message of the client, there might be more detailed information in the merovingian log or on the server console?
Best, Stefan
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
Hello Stefan,
Every time, I have loaded the raw data from my csv files, so there are no issues with using different versions.
Sadly the merovingian log does not give any information at all when this error occurs, and neither does the server console. Is there a way (in the stable version) to enable some verbose output, or to find out in some way which actual error produced the message ‘program contains error’?
To check if it is a space problem, I have created 80 GB of data using github.com/niocs/tqgen. Command used:
./tqgen --OutFilePat /home/ubuntu/fake_data/tq.YYYYMMDD.csv --NumStk 1000 --Seed 234532 --DateBeg 20200101 --DateEnd 20211131
And then to import:
for i in /home/ubuntu/fake_data/*; do mclient -d testdatabase -s "copy offset 2 into fake_data from '$i' using delimiters ',' NULL AS '';"; done
Well, even with this data some queries returned the same error, ‘Program contains error’.
After a few more tests, after recreating the database and reimporting my data, the server console gave me this error on a query; I hope it can be more informative:
!WARNING: gdk_bat.c:2083: assertion `b->batCount <= b->batCapacity' failed
!WARNING: gdk_bat.c:2084: assertion `b->T->heap.size >= b->T->heap.free' failed
!WARNING: gdk_bat.c:2083: assertion `b->batCount <= b->batCapacity' failed
!WARNING: gdk_bat.c:2084: assertion `b->T->heap.size >= b->T->heap.free' failed
Segmentation fault
So there seems to be a problem with space indeed. The table used in the query that produced the error came from a csv of 80GB (reserved data which I cannot share), I have 30GB of RAM (there is nothing else running on the machine), 60GB of swap, 450GB of disk space…. Any suggestion?
Stefano
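When the console prints the same warnings repeatedly, it can help to tally them before posting. The following is only a minimal Python sketch: the regular expression is inferred from the four warning lines quoted above and is an assumption about the format, not something taken from MonetDB itself, and `tally_assertions` and `SAMPLE` are hypothetical names.

```python
import re
from collections import Counter

# Pattern inferred from the console lines quoted in this thread, e.g.
#   !WARNING: gdk_bat.c:2083: assertion `b->batCount <= b->batCapacity' failed
# It is an assumption about the format, not an official specification.
ASSERTION_RE = re.compile(r"!WARNING: (\S+?):(\d+): assertion `(.+?)' failed")


def tally_assertions(console_text):
    """Count each distinct (source file, line, condition) assertion failure."""
    return Counter(
        (m.group(1), int(m.group(2)), m.group(3))
        for m in ASSERTION_RE.finditer(console_text)
    )


# The console output reported above, copied verbatim.
SAMPLE = """\
!WARNING: gdk_bat.c:2083: assertion `b->batCount <= b->batCapacity' failed
!WARNING: gdk_bat.c:2084: assertion `b->T->heap.size >= b->T->heap.free' failed
!WARNING: gdk_bat.c:2083: assertion `b->batCount <= b->batCapacity' failed
!WARNING: gdk_bat.c:2084: assertion `b->T->heap.size >= b->T->heap.free' failed
Segmentation fault
"""

if __name__ == "__main__":
    for (src, line, cond), n in tally_assertions(SAMPLE).most_common():
        print(f"{n}x {src}:{line}: {cond}")
```

Both conditions compare a count or used size against an allocated capacity inside one BAT, which fits the earlier remark about inconsistent internal administrative information rather than a lack of disk or memory.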
Hello,
After another trial with the queries that produced the error mentioned in my last mail, I got another error line:
*** Error in `mserver5': corrupted double-linked list: 0x00007feb1c20aec0 ***
As this issue seems to be a dead end, I’m just hoping that this rings a bell for someone and leads to a solution.
Thanks,
Stefano
Have you tried running a debug version of mserver5 in gdb? That would give you the complete stack when the server throws the error/exception.
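A rough Python sketch of how such a gdb run could be scripted is shown below. It only builds the command line; `-batch`, `-ex` and `--args` are standard gdb options and `--dbpath` is the mserver5 option for pointing at a database directory, but the path is a placeholder and the helper name `gdb_backtrace_command` is hypothetical. Depending on the setup, further mserver5 options may be needed.

```python
import shlex


def gdb_backtrace_command(dbpath, mserver5="mserver5"):
    """Build a gdb invocation that runs mserver5 and dumps all stacks on a crash.

    dbpath is a placeholder for the database directory inside the dbfarm;
    mserver5 should point at a binary built with debug symbols.
    """
    return [
        "gdb", "-batch",               # exit once the -ex commands have run
        "-ex", "run",                  # start the server under gdb
        "-ex", "thread apply all bt",  # after it stops, print every thread's stack
        "--args", mserver5, f"--dbpath={dbpath}",
    ]


if __name__ == "__main__":
    cmd = gdb_backtrace_command("/path/to/dbfarm/testdatabase")
    # Print the command for copy/paste instead of executing it here.
    print(shlex.join(cmd))
```

The failing query would then be issued from mclient in another terminal; once the server aborts or segfaults, gdb prints the backtraces, which can be attached to a report.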
Hello,
-----
System: Linux Ubuntu 16.04
MonetDB Version: Tested both on stable version Jun2016 SP1 AND dev version, default branch - node id 7fda0b907c50 [19/08/16]
-----
I think I have finally found the culprit of this mess. Along with the CREATE TABLE statement of the problematic table, I also had three CREATE INDEX statements on three different columns. I know that MonetDB handles indexes by itself and that the CREATE INDEX statement is just used as a suggestion to monet, but I left them in anyway, thinking that they wouldn’t do any harm.
Well, it seems that the indexed columns were broken in some way. In fact, all the queries that broke the server with the errors shown in my previous mails involved those columns. Moreover, a query like select count(distinct indexed_field) from big_table would return the number of rows of the table, not the actual number of distinct values of the field.
After removing the CREATE INDEX statements and recreating the table, everything went back to normal.
I would like to know if this is a known issue or if this problem may actually be caused by the CREATE INDEX statement. In fact, I did not create any index on any other table of the db (aside from UNIQUE statements), so this is the only difference from the other tables, which have always worked fine.
Any insight on this would be much appreciated; I would regret finding similar errors in the future due to bad index management by monet (even if it is not forced).
Thanks,
Stefano
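The count(distinct ...) symptom described above can be turned into a quick sanity check. The sketch below is an illustration only: it runs against Python's built-in sqlite3 as a stand-in so that it is self-contained, and the table and column names are the placeholders used in this thread. The function works with any DB-API connection, so a pymonetdb connection could presumably be passed instead. Whether MonetDB evaluates the two queries along different internal paths is an assumption, so agreement between them does not prove that the column is healthy.

```python
import sqlite3


def distinct_counts(conn, table, column):
    """Return (rows, distinct, grouped) for one column.

    table and column are interpolated directly into the SQL, so they must be
    trusted identifiers. conn is any DB-API 2.0 connection.
    """
    cur = conn.cursor()
    cur.execute(f"SELECT COUNT(*) FROM {table}")
    rows = cur.fetchone()[0]
    cur.execute(f"SELECT COUNT(DISTINCT {column}) FROM {table}")
    distinct = cur.fetchone()[0]
    # Second way of counting distinct non-NULL values, via GROUP BY.
    cur.execute(
        f"SELECT COUNT(*) FROM (SELECT {column} FROM {table} "
        f"WHERE {column} IS NOT NULL GROUP BY {column}) AS g"
    )
    grouped = cur.fetchone()[0]
    return rows, distinct, grouped


def looks_consistent(rows, distinct, grouped):
    """Both distinct counts must agree and cannot exceed the row count."""
    return distinct == grouped and distinct <= rows


if __name__ == "__main__":
    # Stand-in database; names mirror the placeholders used in the thread.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE big_table (indexed_field INTEGER)")
    conn.executemany(
        "INSERT INTO big_table VALUES (?)",
        [(1,), (1,), (2,), (3,), (3,), (3,), (None,)],
    )
    result = distinct_counts(conn, "big_table", "indexed_field")
    print(result, looks_consistent(*result))
```

A distinct count equal to the row count on a column known to contain duplicates, as reported above, would be the red flag.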
Stefano Thank you for reporting these details. This will help fixing the problem. Could you report your problem on bugzilla (see http://bugs.monetdb.org/) ? For fixing this bug, we need at least the full (including create index) ddl statements. We could generate data using the tqgen. Niels On Fri, Aug 19, 2016 at 02:03:53PM +0200, Stefano Fioravanzo wrote:
Hello,
————————————————————————— System: Linux Ubuntu 16.04 MonetDB Version: Tested both on stable version Jun2016 SP1 AND dev version, default branch - node id 7fda0b907c50 [19/08/16] —————————————————————————
I think I have finally found the culprit of this mess. With the CREATE TABLE statement of the problematic table, I had also three CREATE INDEX statements on three different columns. I know that monetdb handles indexes by itself and that the CREATE INDEX statement is just used as a suggestion to monet, but I left them anyway, thinking that it wouldn’t do any harm.
Well, it seems that the indexed columns where broken someway. In fact all the queries that broke the server with the errors shown in my previous mails, involved those columns. Moreover, a query like select count(distinct indexed_field) from big_table, would return the number of rows of the table, not the actual number of distinct values of the field.
After removing the CREATE INDEX statements and recreating the table, everything went back to normal.
I would like to know if this is a know issue or if this problem may actually be caused by the CREATE INDEX statement. In fact, I did not create any index in any other table of the db (aside UNIQUE statements), so this is the only difference with the other table, which have always worked fine.
Any insight on this would be much appreciated, I would regret finding similar errors in due future due to bad index management by monet (even if it is not forced).
Thanks, Stefano
On 14 Aug 2016, at 09:44, Stefano Fioravanzo
wrote: Hello,
After another trial with the queries that produced the error mentioned in the last mail, I got another error line which is: *** Error in `mserver5': corrupted double-linked list: 0x00007feb1c20aec0 ***
As this issue seems to be a dead end, I’m just hoping that something could ring a bell to someone for a solution.
Thanks, Stefano
On 10 Aug 2016, at 18:34, Stefano Fioravanzo < fioravanzos@gmail.com> wrote:
Hello Stefan,
Every time I have loaded the raw data from my csv files, so the are no issues in using different versions.
Sadly the merovingian log does not give any information at all when this error occurs, and neither does the server console. Is there a way (in the stable version) to enable some verbose output, or to know in some way what actual error gave the message ‘program contains error’?
To check if it is a space problem, I have created 80 GB of data using github.com/niocs/tqgen Command used: ./tqgen --OutFilePat /home/ubuntu/fake_data/ tq.YYYYMMDD.csv --NumStk 1000 --Seed 234532 --DateBeg 20200101 --DateEnd 20211131 And then to import: for i in /home/ubuntu/fake_data/*; do mclient -d testdatabase -s "copy offset 2 into fake_data from '$i' using delimiters ',' NULL AS '';"; done
Well, even with this data some queries returned the same error, ‘Program contains error’.
After a few more tests, after recreating the database and reimporting my data, the server console gave me this error on a query, I hove it can be more informative:
!WARNING: gdk_bat.c:2083: assertion `b->batCount <= b-> batCapacity' failed !WARNING: gdk_bat.c:2084: assertion `b->T->heap.size >= b->T-> heap.free' failed !WARNING: gdk_bat.c:2083: assertion `b->batCount <= b-> batCapacity' failed !WARNING: gdk_bat.c:2084: assertion `b->T->heap.size >= b->T-> heap.free' failed Segmentation fault
So there seem to be a problem with space indeed. The table used in the query that produced the error came from a csv of 80GB (reserved data which I cannot share), I have 30GB of RAM (There is nothing else running on the machine), 60GB of swap, 450GB of disk space…. Any suggestion?
Stefano
On 05 Aug 2016, at 14:07, Stefan Manegold < Stefan.Manegold@cwi.nl> wrote:
Hi Stefano,
did/do you re-create/re-load your database from scratch for each version of MonetDB you tested, or do you access that same database (same dbfarm) using the different versions?
Please be aware that we only support database upgrades between two consecutive releases of MonetDB. All other cases are not supported and their behavior is undefined, and might thus include anything from the server refusing to start to "silent" corruption of the database.
Having said that, it's generally hard for us to analyze problems, let alone solve them, without being provided detailed instructions (if necessary including data) how to reproduce them.
More over, the assertion you get with the default branch has nothing to do with not enough space; it indicated that internally some administrative information is inconsistent --- why is hard, if not impossible, to understand without being able to reproduce the problem.
Also, with the 'Program contains error' message of the client, their might be more detailed information in the merovingian log or on the server console?
Best, Stefan
----- On Aug 4, 2016, at 10:09 PM, Stefano Fioravanzo fioravanzos@gmail.com wrote:
Hello,
I will briefly recap the problem I have already reported a few weeks ago: I am using monetdb default branch (I did not compile the stable version because I need embedded python) and I have a large table (~80 GB). A simple query like SELECT * FROM large_table WHERE field=x, fails. mserver5 reports this error: mserver5: gdk_select.c:867: fullscan_int: Assertion `cnt < (bn)->S.capacity' failed.
It seems that monet thinks it doesn’t have enough space to do a full scan of the table, maybe? (The machine has 800GB of free disk space and 30GB of RAM - Ubuntu 14.04). I was using a default branch of december 2015, I have tried to use the latest default branch (4/08/16), recompiled, installed but nothing, always the same error.
So, as suggested, I have compiled the latest stable release (Jun2016-SP1). Now even a query which does not require monet to read the whole table (like SELECT * FROM large_table LIMIT 10 - This query worked fine in the default branch version) fails with ‘Program contains error’ message by client. Selecting a single column of the table works just fine. Moreover, with the stable release, the same issue arises with the second biggest table of the database (~35 GB). [Here you can find the configuration of MonetDB: http://pastie.org/private/r8gtssg944aqwnigrysa]
All queries on all the other tables in the database (they are all much smaller than these two) work fine. I do not understand if the issue is in the size of table or somewhere else. How can I know what ‘error’ the message ‘program contains errors’ is referring to?
NOTE: At the beginning, using the default compiled branch everything worked just fine! Every query on my large_table worked as expected. Then I had to recreate the database from scratch so I deleted the dbfarm and created a new one. That’s when this headache started…
At this point I don’t know what to do, any help would be MUCH appreciated.
Thanks,
Stefano
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
--
| Stefan.Manegold@CWI.nl | DB Architectures (DA) |
| www.CWI.nl/~manegold/ | Science Park 123 (L321) |
| +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
-- Niels Nes, Manager ITF, Centrum Wiskunde & Informatica (CWI) Science Park 123, 1098 XG Amsterdam, The Netherlands room L3.14, phone ++31 20 592-4098 sip:4098@sip.cwi.nl url: https://www.cwi.nl/people/niels e-mail: Niels.Nes@cwi.nl
I will file a report as soon as possible. I have tried to generate some data using tqgen, but I was not able to reproduce the issue. Maybe one of the factors is the size of the column; I will try again with more data and see if something happens. Note: the fields I used were INTEGER, BIGINT and SMALLINT. The data type could be another factor in the issue.
Stefano
On 19 Aug 2016, at 14:32, Niels Nes wrote:
Stefano,
Thank you for reporting these details. This will help fixing the problem.
Could you report your problem on Bugzilla (see http://bugs.monetdb.org/)? To fix this bug, we need at least the full DDL statements (including the CREATE INDEX statements). We could generate the data ourselves using tqgen.
Niels
On Fri, Aug 19, 2016 at 02:03:53PM +0200, Stefano Fioravanzo wrote:
Hello,
System: Linux Ubuntu 16.04
MonetDB version: tested on both the stable version (Jun2016-SP1) and the dev version (default branch, node id 7fda0b907c50 [19/08/16])
I think I have finally found the culprit of this mess. Along with the CREATE TABLE statement of the problematic table, I also had three CREATE INDEX statements on three different columns. I know that MonetDB handles indexes by itself and that the CREATE INDEX statement is only a suggestion to MonetDB, but I left them in anyway, thinking that they wouldn’t do any harm.
Well, it seems that the indexed columns were broken in some way. In fact, all the queries that broke the server with the errors shown in my previous mails involved those columns. Moreover, a query like select count(distinct indexed_field) from big_table would return the number of rows of the table, not the actual number of distinct values of the field.
After removing the CREATE INDEX statements and recreating the table, everything went back to normal.
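One way to confirm the wrong count(distinct ...) result independently of the server is to compute the same numbers straight from the source CSV. The following is only a rough sketch: the function names and the column name are hypothetical, it assumes the CSV has a header row, and it treats empty strings as NULL (matching the NULL AS '' clause used when loading), which count(distinct ...) ignores.

```python
import csv
from typing import Dict, Iterable, Tuple


def column_stats(rows: Iterable[Dict[str, str]], column: str) -> Tuple[int, int]:
    """Return (row_count, distinct_count) for one column.

    Empty strings are treated as NULL and left out of the distinct count,
    mirroring how SQL COUNT(DISTINCT col) ignores NULLs. Values are compared
    as text, so '1' and '01' count as different values here.
    """
    total = 0
    distinct = set()
    for row in rows:
        total += 1
        value = row[column]
        if value != "":
            distinct.add(value)
    return total, len(distinct)


def column_stats_from_csv(path: str, column: str, delimiter: str = ",") -> Tuple[int, int]:
    """Stream a CSV file with a header row and compute the stats for one column."""
    with open(path, newline="") as handle:
        return column_stats(csv.DictReader(handle, delimiter=delimiter), column)
```

If the distinct count computed this way is clearly smaller than the row count while the server returns the row count, the problem is on the server side rather than in the data. Note that the set of distinct values is held in memory, so this is only practical for columns of moderate cardinality.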
I would like to know if this is a known issue and whether this problem may actually be caused by the CREATE INDEX statements. I did not create any index on any other table of the db (aside from UNIQUE constraints), so this is the only difference from the other tables, which have always worked fine.
Any insight on this would be much appreciated; I would hate to run into similar errors in the future due to bad index management by MonetDB (even if it is not forced).
Thanks, Stefano
On 14 Aug 2016, at 09:44, Stefano Fioravanzo wrote:
Hello,
After another trial with the queries that produced the error mentioned in the last mail, I got another error line which is: *** Error in `mserver5': corrupted double-linked list: 0x00007feb1c20aec0 ***
As this issue seems to be a dead end, I’m just hoping that something could ring a bell to someone for a solution.
Thanks, Stefano
On 10 Aug 2016, at 18:34, Stefano Fioravanzo <fioravanzos@gmail.com> wrote:
Hello Stefan,
Every time, I loaded the raw data from my csv files, so there are no issues caused by using different versions.
Sadly the merovingian log does not give any information at all when this error occurs, and neither does the server console. Is there a way (in the stable version) to enable some verbose output, or to know in some way what actual error gave the message ‘program contains error’?
To check if it is a space problem, I have created 80 GB of data using github.com/niocs/tqgen
Command used:
./tqgen --OutFilePat /home/ubuntu/fake_data/tq.YYYYMMDD.csv --NumStk 1000 --Seed 234532 --DateBeg 20200101 --DateEnd 20211131
And then to import:
for i in /home/ubuntu/fake_data/*; do mclient -d testdatabase -s "copy offset 2 into fake_data from '$i' using delimiters ',' NULL AS '';"; done
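For anyone who wants to repeat the import and stop at the first failing file, the import loop above could be written roughly as follows in Python. This is only a sketch: the helper names are made up for illustration, the database, table and directory are the ones from the commands above, and it only uses the mclient -d and -s options already shown.

```python
import glob
import subprocess
from typing import List


def copy_statement(csv_path: str, table: str = "fake_data") -> str:
    """Build the COPY INTO statement used above for a single CSV file.

    OFFSET 2 skips the header line. The path is pasted into the statement
    as-is, so paths containing a single quote are not supported.
    """
    return f"copy offset 2 into {table} from '{csv_path}' using delimiters ',' NULL AS '';"


def mclient_command(csv_path: str, database: str = "testdatabase",
                    table: str = "fake_data") -> List[str]:
    """Build the mclient argument list for loading one file."""
    return ["mclient", "-d", database, "-s", copy_statement(csv_path, table)]


def load_all(pattern: str = "/home/ubuntu/fake_data/*") -> None:
    """Load every matching file in sorted order, stopping at the first failure."""
    for path in sorted(glob.glob(pattern)):
        # check=True raises CalledProcessError if mclient exits with an error,
        # so the failing file is known instead of the loop silently continuing.
        subprocess.run(mclient_command(path), check=True)
```

Calling load_all() requires mclient on the PATH and a running server, which is why it is not invoked automatically here.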
Well, even with this data some queries returned the same error, ‘Program contains error’.
After a few more tests, after recreating the database and reimporting my data, the server console gave me this error on a query; I hope it is more informative:
!WARNING: gdk_bat.c:2083: assertion `b->batCount <= b->batCapacity' failed
!WARNING: gdk_bat.c:2084: assertion `b->T->heap.size >= b->T->heap.free' failed
!WARNING: gdk_bat.c:2083: assertion `b->batCount <= b->batCapacity' failed
!WARNING: gdk_bat.c:2084: assertion `b->T->heap.size >= b->T->heap.free' failed
Segmentation fault
So there seems to be a problem with space indeed. The table used in the query that produced the error came from a csv of 80 GB (confidential data which I cannot share). I have 30 GB of RAM (there is nothing else running on the machine), 60 GB of swap, 450 GB of disk space…. Any suggestions?
Stefano
On 05 Aug 2016, at 14:07, Stefan Manegold <Stefan.Manegold@cwi.nl> wrote:
Hi Stefano,
did/do you re-create/re-load your database from scratch for each version of MonetDB you tested, or do you access that same database (same dbfarm) using the different versions?
Please be aware that we only support database upgrades between two consecutive releases of MonetDB. All other cases are not supported and their behavior is undefined, and might thus include anything from the server refusing to start to "silent" corruption of the database.
Having said that, it's generally hard for us to analyze problems, let alone solve them, without being provided detailed instructions (if necessary including data) on how to reproduce them.
Moreover, the assertion you get with the default branch has nothing to do with not enough space; it indicates that internally some administrative information is inconsistent --- why is hard, if not impossible, to understand without being able to reproduce the problem.
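To illustrate the difference between "out of space" and "inconsistent administrative information", here is a toy model of the kind of invariant such assertions check. It is not MonetDB's actual code or data structure: the class and field names are hypothetical, loosely modelled on the names that appear in the assertion messages in this thread, and the reading of heap.free as "bytes in use" is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ColumnDescriptor:
    """Hypothetical stand-in for the bookkeeping kept about one column."""
    count: int      # number of values the column claims to contain
    capacity: int   # number of values the allocated storage can hold
    heap_size: int  # bytes allocated for the column's storage
    heap_free: int  # bytes in use (assumed: offset of the first free byte)


def consistency_violations(desc: ColumnDescriptor) -> List[str]:
    """Return the invariants that the descriptor violates.

    The checks compare the descriptor's fields only with each other. Free
    disk space and RAM never enter the picture, so more hardware cannot make
    an inconsistent descriptor pass.
    """
    problems = []
    if desc.count > desc.capacity:
        problems.append("count exceeds capacity")
    if desc.heap_size < desc.heap_free:
        problems.append("heap.free exceeds heap.size")
    return problems
```

A descriptor such as ColumnDescriptor(count=10, capacity=8, heap_size=64, heap_free=80) fails both checks no matter how much disk or memory the machine has, which is the sense in which the assertion points at corrupted bookkeeping rather than a space shortage.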
Also, with the 'Program contains error' message of the client, there might be more detailed information in the merovingian log or on the server console?
Best, Stefan
participants (4)
- Anderson, David B
- Niels Nes
- Stefan Manegold
- Stefano Fioravanzo