[Monetdb-developers] MonetDB/SQL: bulk-loading large data sets & adding key constraints
Dear all, derived from recent experiments and intensive testing, here are some recommendations for bulk-loading large data sets (in particular those larger than your available main memory) in MonetDB/SQL and adding constraints, i.e., primary and foreign keys. First of all, bulk loading should be done from (possibly gzip or bzip2 compressed) CSV files using the SQL `COPY INTO` command; cf., http://monetdb.cwi.nl/SQL/Documentation/Data-Manipulation.html In case your tables come with primary or foreign key constraints, it is recommended to omit these constrains when initially creating the tables. Then bulk-load (`COPY INTO`) your data. Once the data is loaded add the key constraints one by one using the SQL `ALTER TABLE ... ADD CONSTRAINT ...` command. Note, though, that the current Feb2010 release of MonetDB/SQL might experience some performance problems when adding constraints on data sets larger than main memory on multi-core machines. The reason it a non-optimal query translation and plan parallelization. For the time being, i.e., until we manage to fix this problem properly, it is recommended to resort to the optimizer pipeline of the Nov2009 release (for insiders, this effectively means disabling the "mitosis" optimizer), (only) while adding the key constraints. To do so, simply activate the Nov2009 optimizer pipeline via SET optimizer='nov2009_pipe'; just before adding the key constraints via `ALTER TABLE ... ADD CONSTRAINTS` and then restore the default optimizer pipeline via SET optimizer='default_pipe'; once you're done with adding constraints. For TPCH scale factor 100 (100 GB) on a 8 GB 4-core machine, this reduces the total time to add all key constraints from ~1.5h to ~1h. For SSB scale factor 100 (100 GB) on a 8 GB 4-core machine, this reduces the total time to add all key constraints from ~4h to ~1h. Hope this helps. Please don't hesitate to comment or ask questions. We'll let you know, one we managed to fix this problem properly. Kind regards, Stefan -- | Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl | | CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ | | 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 | | The Netherlands | Fax : +31 (20) 592-4199 |
participants (1)
-
Stefan Manegold