[MonetDB-users] crashes with copy into (270MB file)

Hello
I experience crashes with the COPY INTO statement on large files (270 MB, UTF-8) ...
sql>COPY 6000000 records into wptool.wpstats from '/var/www/pagecounts' using delimiters ' ','\n','' null as '';
MAPI = testuser@localhost:50000
ACTION= read_line
QUERY = COPY 6000000 records into wptool.wpstats from '/var/www/pagecounts' using delimiters ' ','\n','' null as '';
ERROR = !Connection terminated
Timer 5846.447 msec......
Number of records: wc -l /var/www/pagecounts
5983201 /var/www/pagecounts
My memory: free -m (in megabytes)
total used free shared buffers cached
Mem: 2007 431 1576 0 20 304
-/+ buffers/cache: 106 1900
Swap: 16769 0 16769
Error-logs:
2010-12-13 14:17:41 MSG merovingian[902]: proxying client localhost:52737 for database 'test' to mapi:monetdb:///var/MonetDB5/dbf arm/test/.mapi.sock?database=test
2010-12-13 14:17:41 MSG merovingian[902]: target connection is on local UNIX domain socket, passing on filedescriptor instead of proxying
2010-12-13 14:18:30 MSG merovingian[902]: database 'test' (1142) was killed by signal SIGSEGV
2010-12-13 14:18:38 ERR control[902]: (local): received stop signal for non running database: test
2010-12-13 14:18:46 MSG merovingian[902]: database 'test' has crashed after start on 2010-12-13 14:17:33, attempting restart, up min/avg/max: 14s/22h/2d, crash average: 1.00 0.70 0.23 (10-3=7)
Some specs:
uname -a
Linux ubuntu 2.6.32-24-server #43-Ubuntu SMP Thu Sep 16 16:05:42 UTC 2010 x86_64 GNU/Linux
2010-12-13 14:17:33 MSG test[1142]: # Compiled for x86_64-pc-linux-gnu/64bit with 64bit OIDs dynamically linked
2010-12-13 14:17:33 MSG test[1142]: # Found 1.961 GiB available main-memory.
If anyone has some pointers on this, it would really help out. I suppose a workaround would be to chop up the files into smaller pieces, just for the import.
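The chop-up workaround mentioned above could be scripted. A minimal sketch (the chunk size, file paths, and naming scheme are my own assumptions, not anything MonetDB prescribes):

```python
# Hypothetical helper: split a large load file into fixed-size chunks so that
# each chunk can be loaded with its own COPY INTO statement.
import os

def split_file(path, lines_per_chunk=1_000_000, out_dir="."):
    """Write path's lines into numbered chunk files; return the chunk paths."""
    chunks = []
    out = None
    with open(path, "r", encoding="utf-8") as src:
        for i, line in enumerate(src):
            if i % lines_per_chunk == 0:
                if out:
                    out.close()
                chunk_path = os.path.join(out_dir, f"chunk_{len(chunks):04d}")
                chunks.append(chunk_path)
                out = open(chunk_path, "w", encoding="utf-8")
            out.write(line)
    if out:
        out.close()
    return chunks
```

Each resulting chunk would then be loaded with a separate `COPY <n> RECORDS INTO ... FROM '<chunk>'` statement, so a crash only loses one chunk's worth of work. `split -l` from coreutils does the same job from the shell.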
Regards
Lex Slaghuis
-----Original Message-----
From: monetdb-users-request@lists.sourceforge.net [mailto:monetdb-users-request@lists.sourceforge.net]
Sent: Friday, 10 December 2010 11:12
To: monetdb-users@lists.sourceforge.net
Subject: MonetDB-users Digest, Vol 55, Issue 2
Today's Topics:
1. COPY INTO: wrong length with backslashed chars?
(Daniel Boesswetter)
2. Re: COPY INTO: wrong length with backslashed chars?
(Stefan de Konink)
3. Some questions regarding parallelism and other stuff...
(Mike De-La-Columnar)
4. Re: Some questions regarding parallelism and other stuff...
(Martin Kersten)
5. Add primary key performance (paulr135)
6. Re: Add primary key performance (Stefan Manegold)
7. Re: COPY INTO: wrong length with backslashed chars?
(Daniel Boesswetter)
----------------------------------------------------------------------
Message: 1
Date: Sun, 05 Dec 2010 19:08:01 +0100
From: Daniel Boesswetter
Hi all,
I'm using MonetDB for the analysis of ~90GB of data which comes from a MySQL-based web application. The data is dumped into CSV files on a regular basis and then imported into MonetDB with the COPY command. I have experienced some strange behaviour: apparently, MonetDB counts the number of characters of a VARCHAR field before removing the backslashes that are used e.g. for escaping newlines or other backslashes, and rejects strings that should actually fit.
A simple reproduction is this:
1. In mclient: CREATE TABLE t (x VARCHAR(1));
2. Create a text file foo.txt containing two backslashes: \\
3. In mclient: sql>copy into t from '/tmp/foo.txt';
This fails with:
SQLException:importTable:value '\\' from line 1 field 1 not inserted, expecting type varchar(1)
failed to import table
Performing exactly the same steps with a VARCHAR(2) column works without errors and results in t.x containing the de-escaped single backslash.
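The mismatch Daniel describes can be illustrated outside MonetDB. A sketch (this is not MonetDB's actual code, just a model of the reported behaviour):

```python
# Model of the reported bug: the on-disk, escaped form of a value is longer
# than the logical value, and the length check appears to use the escaped form.
def escape_for_copy(value: str) -> str:
    """Double backslashes so COPY INTO de-escapes them back to one."""
    return value.replace("\\", "\\\\")

logical = "\\"                       # one backslash: fits VARCHAR(1)
on_disk = escape_for_copy(logical)   # two characters in the load file
assert len(logical) == 1
assert len(on_disk) == 2
# Reported behaviour: the width check uses len(on_disk), so VARCHAR(1)
# rejects a value whose de-escaped length is 1.
```

If that model is right, the declared width is compared against the escaped length rather than the final stored length, which matches the VARCHAR(1) vs. VARCHAR(2) observation above.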
I'm using the Oct-2010 release. My current workaround is to enlarge VARCHARs which caused problems in the past, but this is of course no guarantee that it won't happen again.
Although I found a few monetdb-bugs entries concerning the COPY command (e.g. about the misleading error message seen above), none seems to deal with my problem.
Is this a bug?
What are you escaping? Paste the line that fails ;)
Stefan
------------------------------
Message: 3
Date: Mon, 6 Dec 2010 05:07:46 -0800 (PST)
From: Mike De-La-Columnar
Hi fellas,
Hi Mike
If it's not too much of a hassle, I have a few questions regarding parallelism & the internal algorithms:
1. What is the difference between Data-Flow, and Mitosis?
See optimizer descriptions: http://monetdb.cwi.nl/MonetDB/Documentation/Optimizer-Toolkit.html Orthogonal issues.
2. Is there a special reason why the distinct count aggregation is not parallelized just like the other aggregation functions? Is it just a matter of it not making it into the release, or is there a fundamental problem here?
Distinct is a global property, not a local one. A distributed version may easily lead to throwing the complete underlying table around.
Note: even though the parallel distinct can be slower in certain distributions, it could still be useful...
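Martin's point that distinct is a global property can be made concrete with a two-phase partition-then-merge scheme (my own sketch, not MonetDB's implementation):

```python
# Hypothetical two-phase parallel COUNT(DISTINCT): each partition first
# reduces to its local set of distinct values, then the sets are merged
# globally. The merge is where the "throwing the table around" cost appears:
# in the worst case (all values unique) every partition ships its full
# contents to the merge step, so nothing is saved.
from functools import reduce

def count_distinct_parallel(partitions):
    local_sets = [set(p) for p in partitions]          # phase 1: local distinct
    return len(reduce(set.union, local_sets, set()))   # phase 2: global merge

parts = [[1, 2, 2, 3], [3, 4, 4], [1, 5]]
assert count_distinct_parallel(parts) == 5             # {1, 2, 3, 4, 5}
```

When values repeat heavily, phase 1 shrinks each partition a lot and the scheme wins; when they are mostly unique, it degenerates to moving the whole column, which is presumably why it is not parallelized by default.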
3. Are there any plans on supporting parallel frameworks like OpenMP/Intel TBB directly within the low-level MAL functions? Providing data-parallelism on the loop-code level could even increase performance.
Or did you already play with it and decide that the performance gain is just not worth it?
Depends on the amount of work per byte.
4. Is there a way for me to prevent the MonetDB server from cleaning up all intermediate BATs between server restarts? I've read here that some intermediates (like join indices etc.) are lost between restarts... is that still correct?
All temporaries are indeed lost. Join indices for foreign key checking are retained on disk. Rebuilding hashes is (mostly) cheaper than writing/reading them from disk. It could improve, but there should be a clear driving application.
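The rebuild-vs-persist trade-off can be sketched in miniature: rebuilding a hash structure is a single linear scan over the column, which is often cheaper than a shutdown-time write plus a startup-time read (a simplified model, not MonetDB's internal hash layout):

```python
# Sketch: rebuilding a hash "index" (value -> row positions) from a column is
# one pass over the data; persisting it would cost a write at shutdown and a
# read at startup, plus invalidation logic if the column changed meanwhile.
from collections import defaultdict

def build_hash_index(column):
    index = defaultdict(list)
    for pos, value in enumerate(column):
        index[value].append(pos)
    return index

idx = build_hash_index(["a", "b", "a", "c"])
assert idx["a"] == [0, 2]
```

For a column that fits in memory, the scan runs at memory bandwidth, so the rebuild is usually not the bottleneck on restart.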
Thank you very much for your time. Keep up the amazing job of building the most powerful columnar database out there!
Thanks for the support! Martin
------------------------------
Message: 5
Date: Tue, 7 Dec 2010 07:44:14 -0800 (PST)
From: paulr135
I've loaded a 27GB (~165M rows) file into a table in MonetDB and now I am trying to add a primary key which consists of 4 fields - 2 varchars and 2 ints. It has been running for a few days (since Friday of last week).
I've followed the instructions laid out on this post: http://old.nabble.com/MonetDB-SQL:-bulk-loading-large-data-sets---adding-key... which involves setting the optimizer variable to "nov2009_pipe".
Using the nov2009_pipe optimizer pipeline for adding constraints only holds for the Feb2010 release of MonetDB. It should not be required anymore for later releases as of Jun2010-SP1, incl. the latest Oct2010 release. Which version of MonetDB are you using? Which OS are you running on? What kind of machine/hardware are you running on, in particular how much memory does it have? Can you check what the machine is doing? Is it running out of physical memory and swapping, i.e., is the virtual size of the mserver5 (much) larger than the machine's physical memory size?
I am unable to log in via mclient - I am able to put in my credentials but it just hangs at the "Welcome to mclient, the MonetDB/SQL interactive terminal (Jun2010-SP2)" line.
So I have a few questions: 1. Is there any way I can see the status of the alter table ... add primary key command?
You can use MonetDB's stethoscope tool to "listen" what the server is doing; however, while it is busy with a single (MAL-)operation, there will not be any output; hence, you might have to wait for some time to see some output. Unless very familiar with MonetDB internals, the output of stethoscope might not be too informative for you. But you're welcome to send us the output to have a look at it. In addition, you can check what the mserver5 is doing as suggested above.
2. Is it possible to stop the alter table gracefully? Or will I have to shut the db down?
No, unfortunately there is no way to gracefully stop a running query. You'd have to wait until it finishes (successfully or with an error), or shut down the DB.
3. Is there anything I should do in future to speed up adding of primary keys?
In principle, no. We are aware that with data sizes (i.e., the total size of all columns that the primary key consists of) that significantly exceed main memory, adding the key (i.e., checking whether the data fulfils the key condition) does not work as efficiently as it could/should, in particular with multi-column keys --- we're working on that. You could, though, run the ALTER TABLE statement in trace mode, i.e., `TRACE ALTER TABLE ...`, collect the output and send it to us for analysis; that could help us improve the creation of large multi-column primary keys. Regards, Stefan
Thanks.
--
| Stefan.Manegold @ CWI.nl | DB Architectures (INS1) |
| http://CWI.nl/~manegold/ | Science Park 123 (L321) |
| Tel.: +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
------------------------------
Message: 7
Date: Fri, 10 Dec 2010 11:11:34 +0100
From: Daniel Boesswetter
On 05-12-10 19:08, Daniel Boesswetter wrote:
What are you escaping? Paste the line that fails ;)
As I said: try to import a single "\" into a VARCHAR(1) with COPY. Writing "\" does not work because it escapes the newline. Writing "\\" does not work because it is longer than 1 char. Regards, Daniel
End of MonetDB-users Digest, Vol 55, Issue 2