[MonetDB-users] COPY INTO memory issues
Hi,

I had a look at MonetDB a while (a year or so) ago and stumbled upon a few crashes/memory issues back then. Apart from that it made a very promising impression on me. Today I thought I could give it a try again to see if these issues have been fixed, and found out that COPY INTO still seems to have memory issues, and I wonder if I am the only one who hits such issues.

I am experimenting with dummy data created by a script. The data is created the following way:

- 20 smallints: values 1-10
- 100 tinyints: values 0-1
- 20 ints: values 1-10000
- 40 varchars: md5 of some random value
- 20 floats: rand(1000000)/1000000

I created a CSV file with ~2.1 million rows and then tried to insert the data. My first attempt was:

sql>copy into md_data FROM '/home/mop/performance/hans.csv' delimiters ',','\n' null as '';
MAPI  = monetdb@localhost:50000
ACTION= read_line
QUERY = copy into md_data FROM '/home/mop/performance/hans.csv' delimiters ',','\n' null as '';
ERROR = Connection terminated

This was of course the quick and dirty way. What happened in the background was that MonetDB consumed all memory. My system is a Debian lenny 64bit. The server has 4GB and decent CPUs. Why does it need all my memory here?

I then did the same but loaded only parts of the file (1000000 rows):

sql>copy 1000000 records into md_data FROM '/home/mop/performance/hans.csv' delimiters ',','\n' null as '';

This worked, but the server used ~2-3GB of my RAM during the process and, what's even more important, it used the same amount even though the copy process had finished. Why does it hold on to the RAM even though it is not used anymore?

Then I tried to insert 1000000 rows more, and my expectation was that MonetDB would reuse the memory it allocated (and didn't free) in the first loading process. This wasn't the case. Again MonetDB consumed all my memory, and when loading finished it indicated that it was still doing something (CPU @ ~20%). When I tried a:

sql>SELECT COUNT(*) FROM md_data;

it hung completely. Same when trying to establish a new connection. I stopped the server and restarted it again. Everything was fine again. I inserted the rest of the data (memory rose again but there were only 100000 rows left so there wasn't any problem this time).

Am I the only one hitting such issues? One could argue that I should import fewer records at once, but even then there seems to be some nasty memleak in the background. Or is this a configuration issue? Or am I doing something completely wrong?

I am using the .debs: http://monetdb.cwi.nl/downloads/Debian/

Thanks in advance,

Andreas Streichardt
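P.S. The generation script is not included here; a minimal sketch of a generator producing the layout above (illustrative only -- file name and exact row count are assumptions) could look like this:

# Hypothetical generator for the dummy CSV described above:
# 20 smallints, 100 tinyints, 20 ints, 40 varchars (md5 of a random
# value), 20 floats -- 200 columns per row.
import csv
import hashlib
import random

ROWS = 2100000    # assumed: the ~2.1 million rows mentioned above
OUT = "hans.csv"  # illustrative file name

with open(OUT, "w") as f:
    writer = csv.writer(f, lineterminator="\n")
    for _ in range(ROWS):
        row = []
        row += [random.randint(1, 10) for _ in range(20)]      # smallints, 1-10
        row += [random.randint(0, 1) for _ in range(100)]       # tinyints, 0-1
        row += [random.randint(1, 10000) for _ in range(20)]    # ints, 1-10000
        row += [hashlib.md5(str(random.random()).encode()).hexdigest()
                for _ in range(40)]                              # varchars (md5)
        row += [random.randint(0, 1000000) / 1000000.0
                for _ in range(20)]                              # floats
        writer.writerow(row)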
Andreas Streichardt wrote:
Hi, Hi Andreas,
Thank you for taking the time to experiment and, more importantly, to report back. There has been a lot of work on the loader; its importance for users getting started has our full attention. An external evaluation of MonetDB using extensive copying is reported in:
http://www.mysqlperformanceblog.com/2009/09/29/quick-comparison-of-myisam-infobright-and-monetdb/
and
http://www.mysqlperformanceblog.com/2009/10/02/analyzing-air-traffic-performance-with-infobright-and-monetdb/
and up-to-date information on the same experiment is in http://www.cwi.nl/~mk/ontimeReport
i had a look at monetdb a while (a year or so) ago and stumbled upon a few crashes/memory issues back then. Apart from that it made a very promising impression to me.
Today i thought i could give it a try again to see if these issues have been fixed and found out that copy into still seems to have memory issues and i wonder if i am the only one who hits such issues.
The first and most important question is: which version exactly did you use? (mserver5 --version)
I am experimeting with dummy data created by a script.
Data is created the following way:
- 20 smallints: values 1-10
- 100 tinyints: values 0-1
- 20 ints: values 1-10000
- 40 varchars: md5 of some random value
- 20 floats: rand(1000000)/1000000
I created a CSV file with ~2.1 million rows and then tried to insert the data.
My first attempt was:
sql>copy into md_data FROM '/home/mop/performance/hans.csv' delimiters ',','\n' null as '';
MAPI  = monetdb@localhost:50000
ACTION= read_line
QUERY = copy into md_data FROM '/home/mop/performance/hans.csv' delimiters ',','\n' null as '';
ERROR = Connection terminated
did you kill the connection or the server? if the server crashed, then the input file would be interesting to look at for us.
This was of course the quick and dirty way. What happened in the background was that monetdb consumed all memory. My system is a debian lenny 64bit. Server has 4GB and decent CPUs. Why does it need all my memory here?
Quote from the manual: "It is strongly adviced to announce the number of records to be inserted. It avoids guessing by the server and subsequent re-allocation of table space, which involves potential copying."
Furthermore, MonetDB uses whatever memory there is. The tables constructed are memory mapped files.
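To illustrate the mechanism (a minimal Python sketch, not MonetDB's code; path and size are made up): pages of a memory-mapped file show up as process memory once they are touched, but they are backed by the file on disk and the OS decides when to write them out and reclaim them.

# Illustration only -- not MonetDB code.
import mmap
import os

path = "/tmp/mapped_demo.bin"      # illustrative path
size = 64 * 1024 * 1024            # 64 MB backing file

with open(path, "wb") as f:
    f.truncate(size)               # create the backing file

with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), size)   # map the file into the address space
    mm[:] = b"\x00" * size             # touching the pages makes them resident
    mm.flush()                         # dirty pages are written back to the file
    mm.close()

os.remove(path)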
I then did the same but loaded only parts of the file (1000000 rows).
sql>copy 1000000 records into md_data FROM '/home/mop/performance/hans.csv' delimiters ',','\n' null as '' ;
Same issue. It will have to reallocate space, which may involve copying. (at least on the released versions)
This worked but the server used ~2-3GB of my RAM during the process and what's even more important it used the same amount even though the copy process finished. Why does it use the RAM even though it is not used anymore?
Tables are retained in memory until no longer needed and then flushed. Persistent tables therefore likely stay around.
Then i tried to insert 1000000 rows more and my expectation was that monetdb would reuse the memory it allocated (and didn't free) in the first loading process.
This wasn't the fact. Again monetdb consumed all my memory and when loading finished it indicated that it was still doing something (CPU @ ~20%).
Log records have to be written, your OS may kick in to flush dirty pages, reload swapped-out processes, and so on.
When i tried a:
sql>SELECT COUNT(*) FROM md_data;
it hang completely. Same when trying to establish a new connection.
This is not what I would expect. Did you accidentally run the query as another parallel user, or in the same session as the copy command?
I stopped the server and restarted again. Everything was fine again.
good :)
I inserted the rest of the data (memory raised again but there were only 100000 rows left so there wasn't any problem this time).
Am i the only one hitting such issues? One could argue that i should import less records at once but even then. There seems to be some nasty memleak in the background. Or is this a configuration issue? Or am i doing something completely wrong?
Memory leaks have been addressed in the Aug release; we are not aware of serious ones and would need to re-do the precise experiment to confirm it. What you see is memory fragmentation due to re-allocation of the BATs in memory.
I think your expectations are not yet aligned with what MonetDB does differently from other systems.
I am using the .debs: http://monetdb.cwi.nl/downloads/Debian/
Thanks in advance,
thanks to you for sharing this information, Martin Kersten
Andreas Streichardt
On Wednesday 25 November 2009 19:29:02 Martin Kersten wrote:
Andreas Streichardt wrote:
Hi,
Hi Andreas,
Hi Martin, thanks for your reply.
Thank you for taking the time to experiment and, more importantly, to report back. There has been a lot of work on the loader; its importance for users getting started has our full attention. An external evaluation of MonetDB using extensive copying is reported in:
http://www.mysqlperformanceblog.com/2009/09/29/quick-comparison-of-myisam-infobright-and-monetdb/
and
http://www.mysqlperformanceblog.com/2009/10/02/analyzing-air-traffic-performance-with-infobright-and-monetdb/
and up-to-date information on the same experiment is in http://www.cwi.nl/~mk/ontimeReport
One of the very few blogs i read regularly ;)
i had a look at monetdb a while (a year or so) ago and stumbled upon a few crashes/memory issues back then. Apart from that it made a very promising impression to me.
Today i thought i could give it a try again to see if these issues have been fixed and found out that copy into still seems to have memory issues and i wonder if i am the only one who hits such issues.
First and important question is: Which version exactly did you use? mserver5 --version
MonetDB server v5.14.4 (64-bit), based on kernel v1.32.4 (64-bit oids)
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2009 MonetDB B.V., all rights reserved
Visit http://monetdb.cwi.nl/ for further information
Configured for prefix: /usr
Libraries:
  libpcre: 7.6 2008-01-28 (compiled with 7.6)
  openssl: OpenSSL 0.9.8g 19 Oct 2007 (compiled with OpenSSL 0.9.8g 19 Oct 2007)
  libxml2: 2.6.32 (compiled with 2.6.32)
Compiled by: root@debian64 (x86_64-pc-linux-gnu)
Compilation: cc -O2 -Wall -O2 -O6 -fomit-frame-pointer -finline-functions -falign-loops=4 -falign-jumps=4 -falign-functions=4 -fexpensive-optimizations -funroll-loops -frerun-cse-after-loop -frerun-loop-opt -ftree-vectorize
Linking : ld -IPA -m elf_x86_64

(I am simply using the precompiled debs)
I am experimeting with dummy data created by a script.
Data is created the following way:
- 20 smallints: values 1-10
- 100 tinyints: values 0-1
- 20 ints: values 1-10000
- 40 varchars: md5 of some random value
- 20 floats: rand(1000000)/1000000
I created a CSV file with ~2.1 million rows and then tried to insert the data.
My first attempt was:
sql>copy into md_data FROM '/home/mop/performance/hans.csv' delimiters ',','\n' null as '';
MAPI  = monetdb@localhost:50000
ACTION= read_line
QUERY = copy into md_data FROM '/home/mop/performance/hans.csv' delimiters ',','\n' null as '';
ERROR = Connection terminated
did you kill the connection or the server? if the server crashed, then the input file would be interesting to look at for use.
Server crashed when there wasn't any mem left (only have 1 Gig Swap).
This was of course the quick and dirty way. What happened in the background was that monetdb consumed all memory. My system is a debian lenny 64bit. Server has 4GB and decent CPUs. Why does it need all my memory here?
Quote from the manual:" It is strongly adviced to announce the number of records to be inserted. It avoids guessing by the server and subsequent re-allocation of table space, which involves potential copying."
Furthermore, MonetDB uses whatever memory there is. The tables constructed are memory mapped files.
Does that mean that the maximum size of a table is limited by the available RAM? More precisely: I can't work on a 100 million rows*100 columns table if i only have 1GB Ram?
I then did the same but loaded only parts of the file (1000000 rows).
sql>copy 1000000 records into md_data FROM '/home/mop/performance/hans.csv' delimiters ',','\n' null as '' ;
Same issue. It will have to reallocate space, which may involve copying. (at least on the released versions)
Hmm...Isn't the "1000000" records stuff the hint about the number of records you mentioned earlier? If not how do i announce the number of records? Couldn't find anything else here: http://monetdb.cwi.nl/projects/monetdb/SQL/Documentation/Data-Manipulation.html
This worked but the server used ~2-3GB of my RAM during the process and what's even more important it used the same amount even though the copy process finished. Why does it use the RAM even though it is not used anymore?
Tables are retained in memory until not needed and flushed. Persistent tables therefore likely stay around
I still wonder how and if i can control how much memory monetdb is allowed to use. When does monetdb decide that a table is not used anymore? My biggest concern currently is that monetdb simply crashes due to memory outages (which i already hit during loading ;) ). Are there any settings to limit the memory usage or does monetdb require to have everything in RAM? Does it mean i need a 8GB RAM Server if i want to work on a 2,1million rows*200 columns table?
When i tried a:
sql>SELECT COUNT(*) FROM md_data;
it hang completely. Same when trying to establish a new connection.
this is not what i would expect. Did you accidently ran the query as another parallel user? or the same session as the copy command?
I think (!) it was the same session. But a new connection didn't work either.
I stopped the server and restarted again. Everything was fine again. I
good :)
Memory leaks has been addressed in the Aug release, we are not aware of serious ones and would need to re-do the precise experiment to confirm it. What you see is memory fragmentation due to re-allocation of the BATs in memory.
My problem is: How do i fix that? I am currently inserting in 10k steps and i am hitting the memory problems again. What should i do to insert my data? Stopping and starting the database for each step? :S

Interestingly i am seeing a big gap in loading speed somewhere in between:

880000 [ 10000 ] Timer 4916.922 msec
890000 [ 10000 ] Timer 4477.197 msec
900000 [ 10000 ] Timer 4535.896 msec
910000 [ 10000 ] Timer 5708.848 msec
920000 [ 10000 ] Timer 19675.302 msec
930000 [ 10000 ] Timer 28901.041 msec
940000 [ 10000 ] Timer 28013.104 msec
950000 [ 10000 ] Timer 28235.442 msec
960000 [ 10000 ] Timer 28741.978 msec
970000 [ 10000 ] Timer 29581.496 msec

The number indicates at which offset i am currently (~2.1 million rows in total). It is not yet swapping.
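A sketch of one way to script such batched loading (chunk size and the .partNNNNN file names are illustrative, not the exact script in use; the generated statements follow the COPY syntax shown above):

# Split the CSV into chunks and emit one COPY statement per chunk,
# announcing the exact record count, to be run via the SQL client.
CHUNK = 10000
SRC = "/home/mop/performance/hans.csv"

with open(SRC) as src:
    part_no = 0
    while True:
        lines = []
        for _ in range(CHUNK):
            line = src.readline()
            if not line:
                break
            lines.append(line)
        if not lines:
            break
        part = "%s.part%05d" % (SRC, part_no)   # hypothetical chunk file name
        with open(part, "w") as out:
            out.writelines(lines)
        # one statement per chunk, with the record count announced up front
        print("copy %d records into md_data from '%s' "
              "delimiters ',','\\n' null as '';" % (len(lines), part))
        part_no += 1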
I think your expectations are not yet aligned with what MonetDB does differently from other systems.
Hmm...What does it do differently? ;) I simply wonder how to insert 2.1 million rows currently :S (data layout see above). I could also post my quick and dirty generation script here if that helps.

Let me know if you need any more details!

Thanks,

Andreas Streichardt
Andreas Streichardt wrote:
On Wednesday 25 November 2009 19:29:02 Martin Kersten wrote:
Andreas Streichardt wrote:
Hi, Hi Andreas,
Hi Martin,
thanks for your reply.
My first attempt was:
sql>copy into md_data FROM '/home/mop/performance/hans.csv' delimiters ',','\n' null as '';
MAPI  = monetdb@localhost:50000
ACTION= read_line
QUERY = copy into md_data FROM '/home/mop/performance/hans.csv' delimiters ',','\n' null as '';
ERROR = Connection terminated
did you kill the connection or the server? if the server crashed, then the input file would be interesting to look at for use.
Server crashed when there wasn't any mem left (only have 1 Gig Swap).
This was of course the quick and dirty way. What happened in the background was that monetdb consumed all memory. My system is a debian lenny 64bit. Server has 4GB and decent CPUs. Why does it need all my memory here?
Quote from the manual: "It is strongly adviced to announce the number of records to be inserted. It avoids guessing by the server and subsequent re-allocation of table space, which involves potential copying."
Furthermore, MonetDB uses whatever memory there is. The tables constructed are memory mapped files.
Does that mean that the maximum size of a table is limited by the available RAM? More precisely: I can't work on a 100 million rows * 100 columns table if i only have 1GB RAM?
This is a recurring question from people with different database experiences, and it certainly needs a short chapter in the documentation.
- No, the databases are *not* limited to the amount of physical memory (RAM); the largest one running in our lab is 4TB on a single PC.
- Yes, the temporary tables may be allocated in swap space, which calls for a larger setting than most.
- Yes, you can work with huge databases.
- Yes, ideally the hot set you are working with fits in memory (RAM+swap).
I then did the same but loaded only parts of the file (1000000 rows).
sql>copy 1000000 records into md_data FROM '/home/mop/performance/hans.csv' delimiters ',','\n' null as '';
Same issue. It will have to reallocate space, which may involve copying. (at least on the released versions)
Hmm...Isn't the "1000000" records stuff the hint about the number of records you mentioned earlier? If not how do i announce the number of records? Couldn't find anything else here: http://monetdb.cwi.nl/projects/monetdb/SQL/Documentation/Data-Manipulation.html
The issue is mostly gone in the next release, which uses memory mapped files for all non-trivial sizes.
This worked but the server used ~2-3GB of my RAM during the process and what's even more important it used the same amount even though the copy process finished. Why does it use the RAM even though it is not used anymore?
Tables are retained in memory until no longer needed and then flushed. Persistent tables therefore likely stay around.
I still wonder how and if i can control how much memory monetdb is allowed to use.
you can not.
When does monetdb decide that a table is not used anymore? My biggest concern currently is that monetdb simply crashes due to memory outages (which i already hit during loading ;) ). Are there any settings to limit the memory usage or does monetdb require to have everything in RAM? Does it mean i need a 8GB RAM Server if i want to work on a 2,1million rows*200 columns table?
MonetDB will be happily sharing the 8 GB with the other processes. It may mean that your OS kicks in at some point if it feels there is an uneven balance of resource requests.
Running a database server on a box in isolation is not uncommon for real-life applications. For those with hard memory constraints: we are experimenting with virtual engines to assess the impact and control the resources.
When i tried a:
sql>SELECT COUNT(*) FROM md_data;
it hang completely. Same when trying to establish a new connection.
this is not what i would expect. Did you accidently ran the query as another parallel user? or the same session as the copy command?
I think (!) it was the same session. But a new connection didn't work either.
I think it is related to the low swap space situation. And a hard one to debug. Perhaps the next release (Nov2009) is already better, because we spent quite some effort to harden the code further using the Coverity tools
I stopped the server and restarted again. Everything was fine again.
good :)
Memory leaks has been addressed in the Aug release, we are not aware of serious ones and would need to re-do the precise experiment to confirm it. What you see is memory fragmentation due to re-allocation of the BATs in memory.
My problem is: How do i fix that?
you can not.
I am currently inserting in 10k steps and i am hitting the memory problems again. What should i do to insert my data? Stopping and starting the database for each step? :S
Enlarging the OS swap space is the road to take.
Interestingly i am seeing a big gap in loading speed somewhere in between:
880000 [ 10000 ] Timer 4916.922 msec
890000 [ 10000 ] Timer 4477.197 msec
900000 [ 10000 ] Timer 4535.896 msec
910000 [ 10000 ] Timer 5708.848 msec
920000 [ 10000 ] Timer 19675.302 msec
930000 [ 10000 ] Timer 28901.041 msec
940000 [ 10000 ] Timer 28013.104 msec
950000 [ 10000 ] Timer 28235.442 msec
960000 [ 10000 ] Timer 28741.978 msec
970000 [ 10000 ] Timer 29581.496 msec
The number indicates at which offset i am currently (~2.1 million rows in total). It is not yet swapping.
Indeed, after memory is filled, it starts moving dirty pages out.
I think your expectations are not yet aligned with what MonetDB does differently from other systems.
Hmm...What does it do differently? ;) I simply wonder how to insert 2.1 million rows currently :S (data layout see above). I could also post my quick and dirty generation script here if that helps.
The architecture of MonetDB is completely different from the row-stores you are familiar with. It is documented in tens of papers; a glimpse of the differences is summarised in the recent paper mentioned on the home page, "Database Architecture Evolution: Mammals Flourished long before Dinosaurs became Extinct".
Let me know if you need any more details!
Thanks,
Andreas Streichardt
participants (2)
- Andreas Streichardt
- Martin Kersten