THEAP corruption after DB restart
Hi,

My DB contains around 1000 tables of 100K-15M records each. They all have the same schema, but I'm not using a single table because locality is required for the individual sets.

After restarting MonetDB, I'm experiencing data corruption in some of the tables (around 10-20):

- when selecting a single row from those tables (LIMIT 1), the mserver process crashes
- when calling the storage() function, the mserver process crashes
- the storage() function can only be used again after dropping the tables in question (found by trial and error)

More information:

- The restart was performed as part of an EC2 instance stop/start, which should send a SIGTERM signal to monetdbd.
- The DB farm is stored on EBS. Snapshots were made well before the restart, but the same tables are affected.
- The corrupted tables had been used successfully before, and their content has been unchanged since.
- No client connections were active at restart time.

mserver info:
2015-08-13 15:57:42 MSG DB[10633]: arguments: /usr/bin/mserver5 --dbpath=/opt/db-farm/DB --set merovingian_uri=mapi:monetdb://ip-HOST:50000/DB --set mapi_open=false --set mapi_port=0 --set mapi_usock=/opt/db-farm/DB/.mapi.sock --set monet_vault_key=/opt/db-farm/DB/.vaultkey --set gdk_nr_threads=4 --set max_clients=64 --set sql_optimizer=default_pipe --set monet_daemon=yes
2015-08-13 15:57:42 MSG DB[10633]: # MonetDB 5 server v11.19.15 "Oct2014-SP4"
2015-08-13 15:57:42 MSG DB[10633]: # Serving database 'DB', using 4 threads
2015-08-13 15:57:42 MSG DB[10633]: # Compiled for x86_64-pc-linux-gnu/64bit with 64bit OIDs dynamically linked
2015-08-13 15:57:42 MSG DB[10633]: # Found 15.672 GiB available main-memory.
This error occurs a couple of times in the server logs, but only when first starting monetdbd:
2015-08-13 14:33:15 MSG: !ERROR: GDKload: cannot read: name=15/31/153162, ext=theap, 57144 bytes missing.
2015-08-13 14:33:15 MSG: !OS: No such file or directory
2015-08-13 14:33:15 MSG: !ERROR: GDKload: cannot read: name=15/25/152564, ext=theap, 58536 bytes missing.
2015-08-13 14:33:15 MSG: !OS: No such file or directory
And then each time an affected table is accessed:
2015-08-13 16:02:19 MSG merovingian[1041]: database 'DB' (10640) was killed by signal SIGSEGV
I'm trying to reproduce the issue; in the meantime, any pointers would be helpful.

My hunch is that the table catalog and the theap files are not persisted in the same way, so that after a snapshot or restart the two end up out of sync. Could it be that some dynamic column index is built, but then only persisted in mapped memory? The comments at the top of gdk/gdk_storage.c seem to imply that this is not the case, but I'm not sure.

If so, how would one ensure that all changes are properly flushed to disk? Would fsfreeze help? And what about an unexpected loss of power?
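As far as I can tell, fsfreeze(8) boils down to a pair of ioctls on the mount point, so quiescing the dbfarm filesystem around the EBS snapshot would look roughly like the sketch below. The mount point and the snapshot step are placeholders, and this would only cover snapshot consistency, not power loss:

/* Minimal sketch of what fsfreeze(8) does under the hood: freeze the
   filesystem holding the dbfarm so the EBS snapshot sees a quiescent
   image, then thaw it.  Needs CAP_SYS_ADMIN, and fd must refer to the
   mount point. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* FIFREEZE, FITHAW */

int main(void)
{
    int fd = open("/opt/db-farm", O_RDONLY);  /* placeholder mount point */
    if (fd < 0) { perror("open"); return 1; }

    if (ioctl(fd, FIFREEZE, 0) < 0) {  /* block new writes, flush dirty data */
        perror("FIFREEZE");
        return 1;
    }

    /* ... trigger the EBS snapshot here ... */

    if (ioctl(fd, FITHAW, 0) < 0) {    /* resume writes */
        perror("FITHAW");
        return 1;
    }
    close(fd);
    return 0;
}

Thanks,
- Dennis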
Hi all,

After some more digging into this issue, this is what I found: all the damaged tables have exactly one column that fails to load, one of two possible varchar columns depending on the table. Since the contents of these columns were inserted two weeks before the DB restart and never updated afterwards, I'm not sure how this inconsistency occurred. The data from that table/column had been accessed before in a union query without issues.

The segfault in mserver5 (11.19.15) occurs with the following stack trace, when doing a from(table).select(varchar_col).limit(1):
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f9912924700 (LWP 12436)]
0x00007f991492f303 in delta_bind_bat (bat=0x16c64670, access=0, temp=0) at bat_storage.c:169
169             bat_set_access(b, BAT_READ);
(gdb) bt
#0  0x00007f991492f303 in delta_bind_bat (bat=0x16c64670, access=0, temp=0) at bat_storage.c:169
#1  0x00007f991492f423 in bind_col (tr=0x7f99040008d0, c=0x7f9904b47590, access=0) at bat_storage.c:186
#2  0x00007f991485eafb in SQLgetStatistics (cntxt=0x27fe830, m=0x7f990400c780, mb=0x7f9904e6ecc0) at sql_optimizer.c:161
#3  0x00007f991485f0b0 in addQueryToCache (c=0x27fe830) at sql_optimizer.c:279
[…]
The BAT (ID 45416) fails to load, and dereferencing the resulting NULL pointer causes the segfault:
(gdb) p b
$1 = (BAT *) 0x0
(gdb) p bat->bid
$4 = 45416
(gdb) p temp_descriptor(bat->bid)
$5 = (BAT *) 0x0
(gdb) p BATload_intern(45416, null)
$20 = (BAT *) 0x0
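So temp_descriptor() returns NULL for that bid, and delta_bind_bat() uses the result without checking it. I haven't verified this against the actual source; judging from the trace, the code is presumably of roughly this shape, and a guard like the one below would at least turn the crash into an error:

/* Hypothetical reconstruction from the backtrace above, NOT the actual
   bat_storage.c source.  Types and helpers (sql_delta, BAT,
   temp_descriptor, bat_set_access) are MonetDB internals. */
static BAT *
delta_bind_bat(sql_delta *bat, int access, int temp)
{
    BAT *b = temp_descriptor(bat->bid);  /* returns (BAT *) 0x0 above */

    if (b == NULL)                       /* the guard that appears to be missing */
        return NULL;

    bat_set_access(b, BAT_READ);         /* bat_storage.c:169, the faulting line */
    /* ... */
    return b;
}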
The heap that fails to load is a theap file, but its on-disk size doesn't match the catalog size:

* BACKUP/BBP.dir (there is no BBP.dir in the main bat directory; is this normal?):
45416 32 tmp_130550 tmpr_130550 13/05/130550 331649785 2 143371 0 0 143371 143371 0 0 0 0 void 0 1 1793 0 0 0 0 0 17158459 0 0 0 str 4 3 1024 0 0 0 0 0 20806881 573484 589824 1 161354 163840 0
BAT load statement:
#HEAPload(13/05/130550.theap,storage=0,free=161354,size=163840)
* File system:
  File: '130550.theap'
  Size: 71160        Blocks: 144        IO Block: 4096   regular file
Device: ca71h/51825d    Inode: 664265    Links: 1
Access: (0600/-rw-------)  Uid: (  106/ monetdb)   Gid: (  111/ monetdb)
So it seems that either the BAT heap was truncated, or the catalog size is too large. I recreated the table, and it works fine now. The new BBP.dir and filesystem entries look like this:
40288 32 tmp_116540 tmpr_116540 11/65/116540 445073369 2 143371 0 0 143371 147456 0 0 0 0 void 0 1 1793 0 0 0 0 0 84785446 0 0 0 str 4 3 1024 0 0 0 0 0 84857543 573484 589824 1 71160 81920 0
  File: '116540.theap'
  Size: 71160        Blocks: 144        IO Block: 4096   regular file
Device: ca01h/51713d    Inode: 657140    Links: 1
Access: (0600/-rw-------)  Uid: (  106/ monetdb)   Gid: (  111/ monetdb)
The new theap file has the same size as the old one, so the old heap data looks like it was fine all along. (The .tail sizes also match between old and new.) But this time, the "map_hheap" attribute in the BBP entry is exactly the theap file size (the 2nd-to-last entry in the BBP dir is 71160). Could it be that the catalog has an incorrect free/size value in the corrupted version? Any idea what might cause this?
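If the free/size values in the catalog are indeed wrong, it should be possible to flag the damaged columns up front instead of dropping tables by trial and error, by comparing each heap file's size on disk against the free value the catalog reports. A rough sketch (heapcheck is my own name for it; it takes the heap path and the catalog free value as arguments, since I haven't pinned down the exact BBP.dir field layout):

/* Rough sketch: given a heap file and the number of bytes the catalog
   claims are in use (the free= value HEAPload prints), report whether
   the file on disk is shorter than that.  BBP.dir parsing is left out
   on purpose. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <heap-file> <catalog-free-bytes>\n", argv[0]);
        return 2;
    }
    long long expected = atoll(argv[2]);
    struct stat st;
    if (stat(argv[1], &st) < 0) { perror(argv[1]); return 2; }

    if ((long long)st.st_size < expected) {
        printf("%s: %lld bytes on disk, catalog expects %lld (%lld missing)\n",
               argv[1], (long long)st.st_size, expected,
               expected - (long long)st.st_size);
        return 1;
    }
    printf("%s: ok (%lld bytes)\n", argv[1], (long long)st.st_size);
    return 0;
}

For the corrupted column above, 'heapcheck 13/05/130550.theap 161354' would report 90194 bytes missing.

Thanks,
Dennis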