On Thu, 31 Mar 2005 21:04:50 +0200, Martin Kersten wrote:
> Thanks. Challenging application.
Indeed, but based on previous experiments, properly solving it reduces some common tasks from days to seconds, which makes it more than worth the investment.
> Loading MonetDB tables with very short (SQL) queries is not the optimal route. A (very old) experiment from more than 3 years ago, doing simple MIL inserts into a table over a shared TCP/IP connection without any further optimization on a 4-year-old machine (running Linux), ran at over 10K inserts/second. The connection was mostly the bottleneck.
Well, batching is more than acceptable. I currently use SQL*Loader in Oracle and the COPY syntax in PostgreSQL to accomplish this, batching anywhere from 100-1000 "records" at a time depending on the situation, which works out to roughly 1-second intervals (tunable for latency). That latency is fine, and in fact 5 seconds would be OK too.
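For concreteness, here is a minimal sketch of the PostgreSQL side of that scheme, assuming psycopg2 as the driver and a made-up "events" table with (ts, src, msg) columns (not my actual loader): rows are buffered and flushed through COPY whenever either the row count or the latency bound is hit.

import io
import time
import psycopg2

BATCH_ROWS = 1000      # upper bound on rows per COPY
BATCH_SECONDS = 1.0    # tunable latency bound

def load(conn, records):
    """records yields (ts, src, msg) tuples; table and column names are illustrative."""
    # usage: load(psycopg2.connect("dbname=..."), record_iterator)
    cur = conn.cursor()
    buf, count, started = io.StringIO(), 0, time.monotonic()
    for rec in records:
        buf.write("\t".join(str(field) for field in rec) + "\n")
        count += 1
        if count >= BATCH_ROWS or time.monotonic() - started >= BATCH_SECONDS:
            buf.seek(0)
            cur.copy_from(buf, "events", columns=("ts", "src", "msg"))
            conn.commit()
            buf, count, started = io.StringIO(), 0, time.monotonic()
    if count:  # flush the trailing partial batch
        buf.seek(0)
        cur.copy_from(buf, "events", columns=("ts", "src", "msg"))
        conn.commit()

The MonetDB side would presumably look much the same, just emitting batched MIL inserts instead of COPY.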
> A more recent little experiment shows 25K/sec. This stuff can all be significantly improved with a little batching on both sides. The pitfall to avoid is BAT resizing: make sure your BATs are created large enough to capture the intended updates without needing to resize themselves, because that causes IO.
Well, in this case it seems that pre-sizing would work fine, since it can be done adaptively based on the estimated load. Obviously I'm still learning MonetDB, so honestly I'm not sure I understood the comment about pointers, or how that applies to splitting across multiple "tables" for partitioning. Could you expand on that?
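To put "adaptively" in concrete terms, something like the following generic sizing heuristic is what I have in mind (plain Python, not MonetDB-specific; all names are made up): estimate the next batch's capacity from a smoothed arrival rate plus some headroom, so that whatever is pre-allocated never has to grow mid-batch.

class BatchSizer:
    def __init__(self, interval=1.0, headroom=1.5, alpha=0.2, minimum=100):
        self.interval = interval  # seconds per batch (latency bound)
        self.headroom = headroom  # over-allocate by 50% to avoid growth
        self.alpha = alpha        # smoothing factor for the rate estimate
        self.minimum = minimum    # never size below this
        self.rate = 0.0           # smoothed records/second

    def observe(self, batch_rows, elapsed_seconds):
        # update the smoothed arrival-rate estimate after each batch
        observed = batch_rows / max(elapsed_seconds, 1e-6)
        self.rate = (1 - self.alpha) * self.rate + self.alpha * observed

    def next_capacity(self):
        # capacity to pre-allocate for the next batch (or table/partition)
        return max(self.minimum, int(self.rate * self.interval * self.headroom))

The same estimate would tell me how large to create each "table" or partition up front.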
> If you go ahead with MIL production, we can give you some additional advice (which is of course lacking in the documentation ;-))
Always appreciated.

Chris
--
| Christopher Petrilli
| petrilli@gmail.com