Hi,

I've tried to load a good chunk of the Internet census into our MonetDB - about 100M rows. Although the import scripts completed, it seems it shot down Monet:

2013-08-28 04:10:35 MSG merovingian[9857]: database 'census' (10773) was killed by signal SIGBUS
2013-08-28 10:47:59 MSG merovingian[9857]: database 'census' has crashed after start on 2013-08-28 01:14:07, attempting restart, up min/avg/max: 4m/4d/3w, crash average: 1.00 0.10 0.03 (8-7=1)

This is then followed by

2013-08-28 10:48:32 MSG census[19161]: # still reading write-ahead log "/mnt/monetdb/census/sql_logs/sql/log.990" (0% done)
[...]
2013-08-28 10:53:12 MSG census[19161]: # still reading write-ahead log "/mnt/monetdb/census/sql_logs/sql/log.990" (4% done)
2013-08-28 10:53:34 MSG merovingian[9857]: database 'census' (19161) has crashed (dumped core)

Restarting it yielded the same result a few times. Once or twice it got to 35% in the WAL:

2013-08-28 15:34:27 MSG census[14420]: # still reading write-ahead log "/mnt/monetdb/census/sql_logs/sql/log.990" (35% done)
2013-08-28 15:34:59 MSG merovingian[9857]: database 'census' (14420) has crashed (dumped core)
2013-08-28 15:34:59 ERR merovingian[9857]: client error: database 'census' has crashed after starting, manual intervention needed, check monetdbd's logfile for details
2013-08-28 15:34:59 ERR merovingian[9857]: client error: client (local) sent challenge in incomplete block:
2013-08-28 15:34:59 ERR merovingian[9857]: client error: client (local) sent challenge in incomplete block:
2013-08-28 15:34:59 MSG merovingian[9857]: database 'census' has crashed after start on 2013-08-28 14:43:34, attempting restart, up min/avg/max: 4m/4d/3w, crash average: 1.00 0.90 0.30 (16-7=9)

Since then, it won't start (I don't know what caused the client error BTW).

Data loss is not a concern, but we'd sure like to know if this can be fixed or if we should start looking at another DB...
Thanks for any help, Ralph -- Ralph Holz I8 - Network Architectures and Services Technische Universität München http://www.net.in.tum.de/de/mitarbeiter/holz/ Phone +49.89.289.18043 PGP: A805 D19C E23E 6BBB E0C4 86DC 520E 0C83 69B0 03EF
Hi

Please provide some background information, such as
- what operating system are you using
- what version of MonetDB
- if the input file is public, a reference to it, and the schema used.

regards, Martin
Hi Martin,

Thanks for getting back to me! Sorry I missed the information. Here goes:

OS is Debian Wheezy, stable; here's the uname:

3.2.0-4-amd64 #1 SMP Debian 3.2.41-2+deb7u2 x86_64 GNU/Linux

The machine has 24 cores and 140GB RAM. There is space left on the partition, although I am not sure that was also true when it crashed the first time.

Server:

MonetDB 5 server v11.15.11 "Feb2013-SP3" (64-bit, 64-bit oids)
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2013 MonetDB B.V., all rights reserved
Visit http://www.monetdb.org/ for further information
Found 142.0GiB available memory, 24 available cpu cores
Libraries:
  libpcre: 8.30 2012-02-04 (compiled with 8.02)
  openssl: OpenSSL 0.9.8o 01 Jun 2010 (compiled with OpenSSL 0.9.8o 01 Jun 2010)
  libxml2: 2.7.8 (compiled with 2.7.8)
Compiled by: root@dev.monetdb.org (x86_64-pc-linux-gnu)
Compilation: gcc -O3 -fomit-frame-pointer -pipe -Wp,-D_FORTIFY_SOURCE=2
Linking : /usr/bin/ld -m elf_x86_64

I can tell you the schema is (from memory):

id int
ip_addr inet
ip_str varchar
ip_start int
ip_end int
ptr varchar
sld varchar
tld varchar
country varchar
city varchar

I have made a very short input file available on
http://pkidata.net.in.tum.de/export.csv.tgz (33M)

BTW, there was a typo in my earlier mail: we have 133 billion, not million, rows.

Ralph
_______________________________________________ users-list mailing list users-list@monetdb.org http://mail.monetdb.org/mailman/listinfo/users-list
Hi,

> I have made a very short input file available on
> http://pkidata.net.in.tum.de/export.csv.tgz (33M)

The delimiter is #.

Looking at that file again, I must have missed some fields in the schema in my last mail. I think you can ignore those, they're mostly empty.

Ralph
Hi

Thanks, this little file contains 300,000 records. So your input file must be something like 433 GB.

I assume that you loaded the file with LOCKED mode; then at least you have a chance without the temporary copies needed.

With 9 fields of 8 bytes each, your storage footprint becomes easily 8 * 133 GB = 1.2TB, excluding the string value dictionaries. Strings may use less than 8 bytes each when there are a lot of duplicates.

To be on the safe side, I would aim for free disk space of about 3-4TB.

You might consider having a look at http://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/storage-model to get a better estimate after loading your small sample.

On 8/28/13 10:48 PM, Ralph Holz wrote:
> Hi,
>
> I have made a very short input file available on
> http://pkidata.net.in.tum.de/export.csv.tgz (33M)
> The delimiter is #.
>
> Looking at that file again, I must have missed some fields in the schema in my last mail. I think you can ignore those, they're mostly empty.

COPY INTO does not ignore them; they have to be specified. There is not (yet) a partial COPY INTO provided.

I have no clue what will happen if your schema does not match the file; at least it would be a good test.

regards, Martin
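A rough cross-check of the storage estimate above, as a minimal Python sketch. It assumes 133 billion rows (the corrected figure given earlier in the thread), 9 columns, and a flat 8 bytes per value; the function names are illustrative only. Under those assumptions a single 8-byte column comes to about 1.06 TB and nine of them to about 9.6 TB, before any savings from narrower types or from duplicate strings, so the result is best read as an upper bound to compare against what the storage-model recipe reports for the small sample.

```python
# Back-of-the-envelope footprint for a column store with fixed-width values.
# All three constants are assumptions taken from the discussion, not measurements.
ROWS = 133_000_000_000   # 133 billion rows
COLUMNS = 9              # number of fields used in the estimate
BYTES_PER_VALUE = 8      # assumed width of every stored value


def column_bytes(rows: int, width: int) -> int:
    """Bytes needed for one column of fixed-width values."""
    return rows * width


def table_bytes(rows: int, columns: int, width: int) -> int:
    """Bytes needed for all columns, ignoring string dictionaries and indexes."""
    return columns * column_bytes(rows, width)


def to_tb(n_bytes: int) -> float:
    """Convert bytes to decimal terabytes (10**12 bytes)."""
    return n_bytes / 1e12


print(f"one column : {to_tb(column_bytes(ROWS, BYTES_PER_VALUE)):.2f} TB")
print(f"all columns: {to_tb(table_bytes(ROWS, COLUMNS, BYTES_PER_VALUE)):.2f} TB")
```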
Hi,

> Thanks, this little file contains 300,000 records. So your input file must be something like 433 GB. I assume that you loaded the file with LOCKED mode; then at least you have a chance without the temporary copies needed.

I sheepishly admit I did not. Apologies, I am new to MonetDB and my background is networks rather than DBs (thus my psql experience). ;-/

> With 9 fields of 8 bytes each, your storage footprint becomes easily 8 * 133 GB = 1.2TB, excluding the string value dictionaries. Strings may use less than 8 bytes each when there are a lot of duplicates.

There's a very large number of duplicates, and also omissions. The footprint I am seeing is maybe on the order of 600-700GB.

> To be on the safe side, I would aim for free disk space of about 3-4TB.

OK. This means we'll need to sample our data down to maybe 30% before loading it.

> You might consider having a look at http://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/storage-model to get a better estimate after loading your small sample.

On it now. Thanks a lot.

However, as for getting the database to run again, should I delete the log files as suggested here?

Ralph
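One way the down-sampling to roughly 30% mentioned above could be done before loading is sketched below in Python. This is only an illustration: the function name, the fixed seed, and the choice of independent per-row sampling are assumptions, not something described in the thread.

```python
import random
from typing import Iterable, Iterator


def sample_lines(lines: Iterable[str], fraction: float, seed: int = 0) -> Iterator[str]:
    """Keep each input line independently with probability `fraction`.

    A fixed seed makes the sample reproducible; the order of kept lines
    is the same as in the input.
    """
    rng = random.Random(seed)
    for line in lines:
        if rng.random() < fraction:
            yield line


# Hypothetical usage on a '#'-delimited export (file names are placeholders):
# with open("export.csv") as src, open("export_sample.csv", "w") as dst:
#     dst.writelines(sample_lines(src, 0.3))
```

Because rows are kept independently, the output size is only approximately 30% of the input, which is usually acceptable when the goal is just to fit within the available disk space.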
Hello Ralph,
If data loss is not a concern, try deleting the SQL logs (/mnt/monetdb/census/sql_logs/sql/log.990 etc.). You will lose some data, but the database will hopefully start up.

Before loading again, check the available disk space. In our case, running out of it is the most common cause of crashes.
Good luck,
Radovan
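The disk-space check suggested above could be scripted along these lines. This is a minimal sketch using Python's standard `shutil.disk_usage`; the function name is made up for illustration, and the path and threshold in the usage comment are assumptions taken from the log paths and the 3-4TB figure mentioned in this thread.

```python
import shutil


def has_free_space(path: str, required_bytes: int) -> bool:
    """Return True if the filesystem holding `path` has at least
    `required_bytes` free for the current user."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free >= required_bytes


# Hypothetical usage before starting a bulk load:
# if not has_free_space("/mnt/monetdb", 3 * 10**12):
#     raise SystemExit("less than 3 TB free, not starting the load")
```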
-- __________________________ Radovan Bičiště ceos data s.r.o. Studentská 6202/17 708 00 Ostrava - Poruba Czech Republic mobil CZ: +420 601 563 014 skype: rbiciste
Hi,

I tried that and the DB is up again! Thanks. We're now loading again at a slower pace and in locked mode.

Ralph
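For reference, a load in locked mode with the `#` delimiter mentioned earlier might be issued with a statement like the one this Python sketch builds. The table name `census` and the file path are placeholders, the helper names are invented for illustration, and whether the `LOCKED` keyword is honoured depends on the MonetDB release, so the COPY INTO documentation for the installed version should be checked first.

```python
def sql_literal(value: str) -> str:
    """Quote a string as an SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_copy_into(table: str, path: str, field_sep: str = "#",
                    record_sep: str = "\\n", locked: bool = True) -> str:
    """Build a MonetDB-style COPY INTO statement as a string.

    `record_sep` defaults to the two characters backslash and n, which is
    how a newline record separator is written inside the SQL literal.
    `table` is inserted as-is and must be a trusted identifier.
    """
    stmt = (
        f"COPY INTO {table} FROM {sql_literal(path)} "
        f"USING DELIMITERS {sql_literal(field_sep)}, {sql_literal(record_sep)}"
    )
    if locked:
        stmt += " LOCKED"
    return stmt + ";"


print(build_copy_into("census", "/tmp/export.csv"))
```

The statement is only built here, not executed; it would be run through mclient or a database driver. As noted earlier in the thread, the file's fields must all be present in the table, since COPY INTO does not skip columns.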
participants (3)
- Martin Kersten
- Radovan Bičiště
- Ralph Holz