Hello fellow MonetDB'ers,

The Pentaho MonetDB Bulk Loader transformation step is now working pretty well in our tests. http://ci.pentaho.com/view/Data%20Integration/job/Kettle contains a continuous integration build of the software if anyone wants to play with it. The step lives under New Transformation -> Steps -> Bulk Loading (category) -> MonetDB Bulk Loader.

Thank you for the earlier information about enabling log output from the MonetDB JDBC driver. After killing a stalled "monetdb start ~/sg-farm" thread and restarting it, MonetDB decided to work really well.

It was suggested early on that I take a look at the binary bulk loading approach to get the fastest results. Now that a streaming ASCII version of a bulk loader is done for Pentaho Data Integration, I want to come back to the binary approach, see what it might look like, and learn what to be aware of. I read http://www.monetdb.org/Documentation/Cookbooks/SQLrecipies/BinaryBulkLoad

It looks like the requirements are:

1. Write one physical file per column to the disk where the MonetDB farm lives.
2. Tables containing DATEs, TIMESTAMPs, or UTF-8 strings (e.g., Simplified Chinese) are verboten and must instead be handled with COPY INTO ... USING DELIMITERS.

Assuming the above is a correct understanding, and given the command below:

  copy into Tmp from ('path_to_file_i', 'path_to_file_f', 'path_to_file_s');

1. If this is run from a computer separate from the machine running the MonetDB server, is 'path_to_file_i' a pathname relative to the MonetDB farm root directory on the remote server machine?
2. Should this process also work for appending new data to an existing table?
3. Am I out of luck if my table mixes any of the verboten data types mentioned above with the supported binary bulk-loadable (i.e., C-array-style) types?
4. If a mixed-type table is not out of luck, what is the procedure for loading it? (A concrete sketch of what I have in mind follows below.)

In practice I imagine I will not need to fret too much about a DATE needing the COPY INTO ... USING DELIMITERS ASCII streaming approach, because a DATE is most likely part of a dimension table whose integer keys match integers on a fact table, and the fact table is far more likely to hold the types MonetDB binary bulk loading likes. Is it true that MonetDB does not handle joins well, but instead prefers single, denormalized tables with many columns?
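To make questions 3 and 4 concrete, here is a sketch of what I have in mind. The table definition and the comments are my own guesses chosen to match the three files in the cookbook example; I am assuming i and f are the binary bulk-loadable kind and s is the problem case:

  -- Hypothetical table matching the three column files above.
  CREATE TABLE Tmp (
      i INTEGER,      -- raw C array of 32-bit ints in 'path_to_file_i'
      f REAL,         -- raw C array of 32-bit floats in 'path_to_file_f'
      s VARCHAR(64)   -- strings: the verboten case for binary loading,
                      -- or does the recipe's 'path_to_file_s' cover it?
  );

  -- Binary attach, per my reading of the cookbook recipe:
  copy into Tmp from ('path_to_file_i', 'path_to_file_f', 'path_to_file_s');

  -- The ASCII alternative I fall back on for unsupported types
  -- (delimiters and path are illustrative):
  COPY INTO Tmp FROM '/path/to/tmp.csv' USING DELIMITERS ',', '\n', '"';

If mixed tables are loadable, I would guess the answer to question 4 involves combining these two forms somehow, but I could not tell from the recipe.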
From what you know of the internals of MonetDB, would the effort of building some kind of streaming binary bulk loader pay off in the time the database spends converting input to its native binary types? The documentation uses the words 'slightly faster', but I do not have a feel for when 'slightly' makes a big difference.
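For reference, the streaming ASCII path I have working today amounts to the following mclient-style sketch (the Pentaho step pushes the same stream of formatted rows over the server connection; the record count, delimiters, and values are just illustrative):

  COPY 2 RECORDS INTO Tmp FROM STDIN USING DELIMITERS ',', '\n', '"';
  1,3.14,"abc"
  2,2.72,"xyz"

A streaming binary variant would presumably ship the raw column arrays instead of formatted text, sparing the server the text-to-native conversion; that conversion cost is what I am trying to size up.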
Thanks again for considering these questions.

Brandon