Hi,

We have been using MonetDB combined with Mondrian for a while. Ever since we started putting more load on it, with several users on the same instance, we have been experiencing serious data corruption for no apparent reason, leading to lost data in some tables and sometimes to the loss of the entire database.

This weekend we had such an issue: for no apparent reason, the data got corrupted and apparently all the data in one table was lost. We have no way of seeing what was being done before the data got corrupted, but since we were not loading data into it, the only queries that could have been running were queries generated by Mondrian.

In the logs, the only thing that we can find related to this issue is the following:

2013-11-04 11:17:56 MSG merovingian[22802]: database 'normal' (16384) was killed by signal SIGSEGV

which results in a "Connection terminated" message on the client.
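To check whether this crash happened before, the merovingian log can be scanned for similar lines. Below is a minimal sketch of how that could be done, assuming every crash line has the same layout as the one quoted above. The names Crash and find_crashes are made up for this example, and the meaning of the number in parentheses (presumably the server process id) is a guess on our side.

```python
import re
from typing import Iterable, List, NamedTuple


class Crash(NamedTuple):
    timestamp: str
    database: str
    ident: int   # number in parentheses; presumably the server pid (assumption)
    signal: str


# Pattern derived only from the single log line quoted in this mail.
CRASH_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \S+ merovingian\[\d+\]: "
    r"database '(?P<db>[^']+)' \((?P<ident>\d+)\) "
    r"was killed by signal (?P<sig>\w+)"
)


def find_crashes(lines: Iterable[str]) -> List[Crash]:
    """Return one Crash per line reporting a database killed by a signal."""
    crashes = []
    for line in lines:
        m = CRASH_RE.match(line.strip())
        if m:
            crashes.append(
                Crash(m["ts"], m["db"], int(m["ident"]), m["sig"])
            )
    return crashes
```

It can be fed an open log file, for example find_crashes(open(path)), where path is wherever merovingian writes its log on your installation.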

After a lot of debugging, we found that issuing a simple query such as "select count(*) from table" triggers this issue:

sql>select count(*) FROM table;
Connection terminated

In the sql_logs directory inside the database directory, there is a file named "log" with the following contents:

052001 
 
9516

and a file named "log.9516", which is empty.

When I do "\d table", I get the following result. It seems the table lost all its columns:

sql>\d "table"
CREATE TABLE "sys"."table" (
);
 
It's also important to note that this happened to only some tables. The rest of the tables are OK. I don't know if it matters, but the entire database uses about 390 MB, so it's quite small.

This seems to be quite a big issue, leading to data loss and, worse, to corrupted databases that cannot be used unless the entire database is rebuilt.

I have some questions regarding this issue which I have never seen answered, and whose answers might be very useful to any user of MonetDB:

In a situation like this, is there any way to understand what happened to MonetDB, in order to avoid such a situation in the future?

What are the "sql_logs" for? I see there are some files there, but I can't make sense of their contents. Could these files be used to recover lost data?

In previous mails to the mailing list, Martin stated that the system logs all entries added to each table. Where are these logs located? Also, could they be used to recover lost data?

If you want, we can provide the entire database so you can debug it and try to understand what happened, in order to prevent such situations in the future.

Best regards,
Pedro Salgueiro