[MonetDB-users] Bulk load
Hi, On 29-11-2006 09:33:27 +0100, JLP00993@correo.aeat.es wrote:
Hello:
I am trying to load about 300,000,000 records of 80 fields each (150 GB in total) into a Monet V4.12.0 database.
I don't think this is going to work very well with the current MonetDB.
I saw the following bulk-load method on the mailing lists:
COPY <number> RECORDS INTO <table> FROM stdin USING DELIMITERS '\t';
I have my data in an ASCII file, but my fields are not delimited by any character; they are fixed-length.
Ah... as far as I know we don't have a parser for that. The solution I'd suggest is to convert your file in a pipe with awk or something similar before feeding it to MapiClient.
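For illustration, a minimal sketch of such a conversion, assuming GNU awk and a hypothetical layout of three fixed-width columns of 10, 5 and 20 bytes; the file name and widths are placeholders for your own 80-field layout:

    # Re-split each line on fixed byte widths and print the fields
    # tab-separated. FIELDWIDTHS is a GNU awk extension; with plain
    # awk you would use substr() instead. Padding spaces inside each
    # field are kept; trim them with gsub() if needed.
    gawk 'BEGIN { FIELDWIDTHS = "10 5 20"; OFS = "\t" }
          { $1 = $1; print }' fixed.txt > delimited.tsv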
These are my questions:
Where can I find full documentation on the COPY command?
I'm afraid nowhere at the moment.
In my case, which is the most efficient way to do the bulk load?
The COPY INTO command is the most efficient way to insert data into MonetDB/SQL.
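As a hedged sketch, the whole load could then be driven from the shell in one pipe: emit the COPY statement, stream the converted records after it on the same stream, and feed both to the client. The table name, record count, widths and file name are placeholders, and the exact MapiClient options depend on your installation, so they are left out here:

    # The COPY statement is followed directly by the data on stdin,
    # one record per line, fields tab-separated.
    ( echo "COPY 300000000 RECORDS INTO mytable FROM stdin USING DELIMITERS '\t';"
      gawk 'BEGIN { FIELDWIDTHS = "10 5 20"; OFS = "\t" }
            { $1 = $1; print }' fixed.txt
    ) | MapiClient   # client options omitted; see your installation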
On 11/29/06, Fabian Groffen <Fabian.Groffen@cwi.nl> wrote:
On 29-11-2006 09:33:27 +0100, JLP00993@correo.aeat.es wrote:
I am trying to load about 300,000,000 records of 80 fields each (150 GB in total) into a Monet V4.12.0 database.
I don't think this is going to work very well with the current MonetDB.
Could you explain why this wouldn't work very well? Regards, Wouter.
On 29-11-2006 09:51:48 +0100, Wouter Scherphof wrote:
On 11/29/06, Fabian Groffen <Fabian.Groffen@cwi.nl> wrote:
On 29-11-2006 09:33:27 +0100, JLP00993@correo.aeat.es wrote:
I am trying to load about 300,000,000 records of 80 fields each (150 GB in total) into a Monet V4.12.0 database. I don't think this is going to work very well with the current MonetDB.
Could you explain why this wouldn't work very well?
150 GB / 80 columns ≈ 1.9 GB per column, assuming they are all of equal size. You need at least 2 GB of memory, but from experience this usually blows up somehow, triggering a lot of disk IO (swapping). The real case will prove how it performs, of course. We have better results with larger scaling factors on MonetDB 5.
Regards, Wouter.
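Spelling that estimate out, assuming 80 equally wide columns:

    # 150 GB spread evenly over 80 columns:
    echo "scale=3; 150/80" | bc    # prints 1.875, i.e. ~1.9 GB per column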
Hi Fabian,
When you mention having better results with MonetDB 5, do you mean the
latest development branch downloadable from the website? Does it go
beyond 2GB per column?
Thanks.
Hi, On 29-11-2006 17:50:21 -0800, moredata@fastmail.net wrote:
Hi Fabian,
When you mention having better results with MonetDB 5, do you mean the latest development branch downloadable from the website? Does it go beyond 2GB per column?
Currently the MonetDB 5 sources are not in CVS on SourceForge. We plan on putting them there before the end of the year, but as usual, no guarantees. MonetDB 5 (initially) will not cross the 2 GB-per-column limitation on 32-bit systems, but techniques for doing so are in the pipeline. Again, no guarantees on when those will be stable enough for a larger audience.
Hello,

Scaling up a database is best done in steps. Use a 10M-record sample, followed by 20M and 50M. This gives some indication of what to expect when going to the real-life database. Taking a small sample also gives you an opportunity to test it out on different DBMS platforms.

I guess that the target application is heavily dominated by simple scanning and aggregation, because the individual columns will be quite large to handle. You can easily fall into the disk IO thrashing pitfall.
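A minimal sketch of that step-wise approach, assuming the records live one per line in a hypothetical fixed.txt:

    # Load growing samples (10M, 20M, 50M records) before attempting
    # the full 300M-record file, timing each run to see how it scales.
    for n in 10000000 20000000 50000000; do
        head -n $n fixed.txt > sample_$n.txt
        # convert sample_$n.txt and COPY it in as sketched earlier
    done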
participants (5)

- Fabian Groffen
- JLP00993@correo.aeat.es
- Martin Kersten
- moredata@fastmail.net
- Wouter Scherphof