Re: [MonetDB-users] copy into table

22 Jan 2009

A related feature for bulk import/export that would indeed make sense and should be easy to implement is exchange with flat binary files. This requires
- the file to be set up with the correct byte order
- the file to contain only data types that have a fixed-width binary representation (like FLOAT, SINGLE, INTEGER, SMALLINT etc.) and none of variable width (like VARCHAR)
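
To illustrate the two requirements, here is a minimal Python sketch (not MonetDB code; the function names are made up for this example). It packs a column of 32-bit integers back to back with an explicit byte order, and shows what a reader gets when it assumes the wrong one.

```python
import struct

def pack_int32_column(values, byteorder="<"):
    """Pack integers as consecutive 4-byte values, no separators in between."""
    return struct.pack(f"{byteorder}{len(values)}i", *values)

def unpack_int32_column(data, byteorder="<"):
    """Recover the values; the record count follows from the byte length."""
    count, remainder = divmod(len(data), 4)
    if remainder:
        raise ValueError("length is not a multiple of 4 bytes")
    return list(struct.unpack(f"{byteorder}{count}i", data))

little = pack_int32_column([1, 2, 3], "<")  # little-endian layout
big = pack_int32_column([1, 2, 3], ">")     # big-endian layout

print(unpack_int32_column(little, "<"))  # matching byte order
print(unpack_int32_column(big, "<"))     # mismatched byte order
```

Because every value has the same width, the position of record n is simply n times the width, which is why no separators are needed.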

Such a file would require neither record_seperator nor tuple_seperator. The only challenge is how to represent NULL. For this you could turn to the definitions of NA (the "Not Available" atomic value) that the popular statistical interpreter R (www.r-project.org) uses for its native data types (double, integer). For other data types you could use the NA conventions implemented in R's flat-file package 'ff' (single, signed and unsigned integers, smallints, bytes, etc.; see the documentation for 'vmode'). Even dates could be represented unambiguously using the double representation of the POSIXct class in R.
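
As a rough sketch of the NA idea for R's two native types: R stores an integer NA as the smallest 32-bit signed integer, and a double NA as a NaN whose low 32-bit word is 1954. The Python below is only an illustration under those two conventions. The helper names are invented here, little-endian order is assumed, and the ff 'vmode' conventions for the other types are not covered.

```python
import math
import struct
from datetime import datetime, timezone

NA_INTEGER = -2**31                # R's NA_integer_
NA_REAL_BITS = 0x7FF00000000007A2  # a NaN with low word 1954, as R's NA_real_

def encode_int(value):
    return struct.pack("<i", NA_INTEGER if value is None else value)

def decode_int(data):
    (value,) = struct.unpack("<i", data)
    return None if value == NA_INTEGER else value

def encode_double(value):
    if value is None:
        return struct.pack("<Q", NA_REAL_BITS)
    return struct.pack("<d", value)

def decode_double(data):
    (bits,) = struct.unpack("<Q", data)
    (value,) = struct.unpack("<d", data)
    # NA is a NaN carrying the 1954 payload; any other NaN stays a NaN.
    if math.isnan(value) and (bits & 0xFFFFFFFF) == 1954:
        return None
    return value

def encode_timestamp(moment):
    # POSIXct-style: seconds since 1970-01-01 UTC, stored as a double.
    return encode_double(None if moment is None else moment.timestamp())

day_two = datetime(1970, 1, 2, tzinfo=timezone.utc)
print(decode_int(encode_int(None)), decode_double(encode_timestamp(day_two)))
```

The price of a sentinel is that one value per type (here -2147483648 for integers) can no longer be stored as ordinary data.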

For the SQL interface you could define binary files simply by setting tuple_seperator and record_seperator to '' (the empty string). With one trick, namely allowing COPY INTO to accept not only a table_name but also a view_name, where the view represents a subset of the columns of one table, such an interface could even populate a table that has columns absent from the binary file (say, a varchar column). It could also exchange data column by column with systems that cannot work with mixed-type binary files (such as ff, which currently handles only homogeneous atomic files).
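
The column-by-column exchange could look roughly like the following Python sketch. The schema, the column names and the export_columns helper are all hypothetical; the point is only that each fixed-width column becomes its own homogeneous binary blob while variable-width columns are left out.

```python
import struct

# Hypothetical schema: column name -> struct format code,
# or None for a variable-width column such as VARCHAR.
SCHEMA = {"id": "i", "price": "d", "label": None}

def export_columns(rows, schema):
    """Return {column: bytes}, one homogeneous blob per fixed-width column."""
    blobs = {}
    for position, (name, code) in enumerate(schema.items()):
        if code is None:
            continue  # variable-width column is skipped
        column = [row[position] for row in rows]
        blobs[name] = struct.pack(f"<{len(column)}{code}", *column)
    return blobs

rows = [(1, 9.5, "a"), (2, 12.25, "b")]
blobs = export_columns(rows, SCHEMA)
print({name: len(data) for name, data in blobs.items()})
```

Each blob could then be written to its own file and opened by a tool that expects a single atomic type per file.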

Besides the obvious benefits, for example exporting large data to R for analysis, such a file interface could also be used to create artificial data in R with certain statistical features for data warehouse test loads (say, exploring MonetDB's 'indexing' performance for distributions with a certain entropy).

Greetings

Jens Oehlschlägel
(one of the ff authors)
