[MonetDB-users] Fast Loading

Hi,
I came across chapter 3.2.8 in the documentation regarding 'Fast Loading'. I am trying to use it in order to improve the performance I get compared to loading tables using 'COPY INTO'. However, I am not sure how to approach this:
"Each file is produced by a program that writes the binary image of the BAT directly, i.e. a binary dump of an C-array"
How exactly can I produce these binary images from my source columns? What exactly does MonetDB expect them to look like? A little more info could be very helpful.
Thanks.
--
View this message in context: http://www.nabble.com/Fast-Loading-tp23009932p23009932.html
Sent from the monetdb-users mailing list archive at Nabble.com.

Alex Bo. wrote:
Hi,
I came across chapter 3.2.8 in the documentation regarding 'Fast Loading'. I am trying to use it in order to improve the performance I get compared to loading tables using 'COPY INTO'. However, I am not sure how to approach this: "Each file is produced by a program that writes the binary image of the BAT directly, i.e. a binary dump of an C-array"
This documentation went online a little too early ;) We will report on the approach more extensively, and with supporting tools, after we have finished the Skyserver production version.
How exactly can I produce these binary images from my source columns? What exactly does MonetDB expect them to look like? A little more info could be very helpful.
Thanks.

Alex Bo. wrote:
"Each file is produced by a program that writes the binary image of the BAT directly, i.e. a binary dump of an C-array"
How exactly can I produce these binary images from my source columns?
The exact part can probably be reviewed in gdk_bat.mx.
Pseudo:
    array = mmap(file, size(type) * rows)
    for i in 1..n
        array[i] = newvalue
As you can see in gdk_bat.mx there is also an offset to be taken into account for a descriptor with some meta info.
What exactly does MonetDB expect them to look like? A little more info could be very helpful.
This works for integer type columns. If your tables contain strings, writing 'your own' generator is not trivial, because of the way MonetDB stores strings and its best-effort deduplication.
I have heard talk of alternative and faster ways of loading, but I wonder how much faster they would be compared to COPY INTO.
Stefan
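For a fixed-width integer column, the pseudocode above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the 4-byte native-endian int width, the function name `write_int_column`, and the file name `col_int.bin` are made up for the example, and the file is written as a bare array with no leading descriptor. Whether MonetDB expects anything in front of the array should be checked against the MonetDB sources or documentation.

```python
import mmap
import os
import struct
import tempfile

# Assumption: the column holds 4-byte signed ints in native byte order.
ITEM = struct.Struct("=i")


def write_int_column(path, values):
    """Write values as a flat array of 4-byte ints through a memory map."""
    size = ITEM.size * len(values)          # size(type) * rows
    with open(path, "wb+") as f:
        f.truncate(size)                    # reserve the full array on disk
        if size == 0:
            return                          # an empty file cannot be mapped
        with mmap.mmap(f.fileno(), size) as buf:
            for i, v in enumerate(values):
                # array[i] = newvalue
                ITEM.pack_into(buf, i * ITEM.size, v)
            buf.flush()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "col_int.bin")  # hypothetical name
    write_int_column(path, [1, 2, 3, 42])
    print(os.path.getsize(path))  # 16 (4 rows * 4 bytes)
```

String columns are deliberately left out: as noted above, their storage is more involved than a flat array.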

On Sun, Apr 12, 2009 at 1:06 PM, Stefan de Konink wrote:
Alex Bo. wrote:
"Each file is produced by a program that writes the binary image of the BAT directly, i.e. a binary dump of an C-array"
How exactly can I produce these binary images from my source columns?
The exact part can probably be reviewed in gdk_bat.mx.
Pseudo:
    array = mmap(file, size(type) * rows)
    for i in 1..n
        array[i] = newvalue
As you can see in gdk_bat.mx there is also an offset to be taken into account for a descriptor with some meta info.
What exactly does MonetDB expect them to look like? A little more info could be very helpful.
This works for integer type columns. If your tables contain strings, writing 'your own' generator is not trivial, because of the way MonetDB stores strings and its best-effort deduplication.
I have heard talk of alternative and faster ways of loading, but I wonder how much faster they would be compared to COPY INTO.
0 cost vs. linear in the size of the file :) This chapter talks about the "attach" functionality: if you are going to pay the cost of producing a dump anyway, you may as well produce it in a form that avoids the cost of COPY INTO. But this is at a very early stage and it still requires advanced knowledge of Monet's physical organization.
lefteris
Stefan
_______________________________________________
MonetDB-users mailing list
MonetDB-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/monetdb-users

Stefan de Konink wrote:
Alex Bo. wrote:
"Each file is produced by a program that writes the binary image of the BAT directly, i.e. a binary dump of an C-array"
How exactly can I produce these binary images from my source columns?
The exact part can probably be reviewed in gdk_bat.mx.
Pseudo:
    array = mmap(file, size(type) * rows)
    for i in 1..n
        array[i] = newvalue
As you can see in gdk_bat.mx there is also an offset to be taken into account for a descriptor with some meta info.
No. No offset.
What exactly does MonetDB expect them to look like? A little more info could be very helpful.
This works for integer type columns. If your tables contain strings, writing 'your own' generator is not trivial, because of the way MonetDB stores strings and its best-effort deduplication.
I have heard talk of alternative and faster ways of loading, but I wonder how much faster they would be compared to COPY INTO.
Stefan
-- Sjoerd Mullender
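
If there is indeed no offset, a column file for a fixed-width type should be exactly rows * width bytes long, with the first value at byte 0. The sketch below illustrates that sanity check in Python. The helper names `dump_column` and `load_column` and the `"i"` typecode are hypothetical choices for the example; the code only checks files you generate yourself and does not reproduce MonetDB's loader.

```python
import os
import tempfile
from array import array


def dump_column(path, values, typecode="i"):
    """Write values as a bare C array: no header, no offset."""
    col = array(typecode, values)
    with open(path, "wb") as f:
        col.tofile(f)
    return col.itemsize


def load_column(path, rows, typecode="i"):
    """Read back `rows` values, refusing files of unexpected size."""
    col = array(typecode)
    expected = rows * col.itemsize
    actual = os.path.getsize(path)
    if actual != expected:
        # Extra or missing bytes mean the file is not a plain array of `rows` items.
        raise ValueError(f"expected {expected} bytes, found {actual}")
    with open(path, "rb") as f:
        col.fromfile(f, rows)
    return col.tolist()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "col.bin")  # hypothetical name
    dump_column(path, [10, 20, 30])
    print(load_column(path, 3))  # [10, 20, 30]
```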
participants (5)
- Alex Bo.
- Lefteris
- Martin Kersten
- Sjoerd Mullender
- Stefan de Konink