Chris,

Thanks for your remarks and interest. A quick reaction; others will definitely follow up if my answer is incomplete.

Christopher Petrilli wrote:
> I have been reading about MonetDB for a few days now and have played around with it some. It seems to solve some of the ultra-complex queries that I'd like to run in an amazingly short time. Congrats! One thing that confuses me, though, is scaling.
>
> The data set that I'm looking at is potentially in the billions of traditional rows, with about 50 columns, so from my understanding that'd be 50 different BATs, each with a billion or more items.

Indeed, basically correct.

> My concern is that OIDs seem to be restricted to 2^32-1, although from looking at virtual OIDs, it's unclear whether that would be useful in my situation?

Depends on your platform. We happily work with 64-bit virtual OIDs on 64-bit platforms. On 32-bit platforms you ultimately have a problem, because you will indeed run out of address space quickly.
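
To put rough numbers on the 32-bit limit mentioned above, here is a back-of-the-envelope sketch. The 4-byte value width is an assumption made purely for illustration (actual column types will differ), and the virtual OID column is assumed to take no storage of its own.

```python
# Back-of-the-envelope sizing. VALUE_BYTES is an assumption for illustration.
ROWS = 1_000_000_000        # "a billion or more items" per BAT
COLUMNS = 50                # one BAT per column
VALUE_BYTES = 4             # assumed fixed-width values, e.g. 32-bit integers

MAX_OID_32 = 2**32 - 1      # largest OID when OIDs are 32 bits wide
ADDRESS_SPACE_32 = 2**32    # bytes addressable by a 32-bit process

bytes_per_bat = ROWS * VALUE_BYTES
total_bytes = bytes_per_bat * COLUMNS

# Do a billion rows still fit in the 32-bit OID range?
fits_oid_range = ROWS <= MAX_OID_32

# How many such BATs could a 32-bit process map at once, at the very best?
bats_in_32bit_space = ADDRESS_SPACE_32 // bytes_per_bat

print(fits_oid_range, bats_in_32bit_space, total_bytes)
```

Under these assumptions a billion rows still fits within the 2^32-1 OID range, but a single column already occupies about 4 GB, which is essentially the entire 32-bit address space. The address space, not the OID range, is what runs out first.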
> Any thoughts on stuffing that much data into the system, especially how it might impact dealing with VM issues? I'm more than happy to say "this is the wrong approach" or the wrong system for this problem, as this is obviously a non-normal case.
>
> Chris

We love non-normal cases and unconventional use. (But we also can't promise to solve all problems arising within 24 hours.)

A naive mapping of huge BATs will ultimately slow your system down to the speed of your I/O system, and things get worse if you are not careful about garbage-collecting intermediate results. We are playing with TPC-H databases of up to 100 GB right now, but are actively moving into the next realm.

Two approaches are on the way: one where we replace the kernel with MonetDB/X100, specially designed for this kind of problem (actually, we have a radio astronomy database in mind, which runs into the terabytes), and one using BAT groups in Monet/Five. The latter you can easily build on top of the currently released system, i.e. partition the BAT using the void ranges and adjust your algorithms to work on the partitions one (or a few) at a time. This is what we do to handle larger multimedia database problems.

Are you planning to use the kernel from SQL or directly from MIL? Tell us a little more about the kind of application you are working on. Perhaps we can give you a few helpful pointers.

regards, Martin
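
As a rough illustration of the partition-at-a-time idea described above, the sketch below splits a dense OID range into contiguous slices and aggregates over them one slice at a time. It is plain Python, not MIL and not any MonetDB API; `oid_ranges` and `partitioned_sum` are hypothetical names, and a Python list stands in for a BAT with a void (dense) head.

```python
from typing import Iterator, Sequence, Tuple


def oid_ranges(count: int, partition_size: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open [lo, hi) ranges covering the dense OIDs 0..count-1."""
    if partition_size <= 0:
        raise ValueError("partition_size must be positive")
    for lo in range(0, count, partition_size):
        yield lo, min(lo + partition_size, count)


def partitioned_sum(column: Sequence[int], partition_size: int) -> int:
    """Sum a column one partition at a time, keeping only a running total."""
    total = 0
    for lo, hi in oid_ranges(len(column), partition_size):
        chunk = column[lo:hi]   # stand-in for loading a single partition
        total += sum(chunk)     # per-partition intermediate is dropped here
    return total


# Toy column of ten values, processed in partitions of four OIDs each.
column = list(range(10))
ranges = list(oid_ranges(len(column), 4))
result = partitioned_sum(column, 4)
print(ranges, result)
```

The point of the design is that only one partition and one small intermediate result are live at any moment, so the working set stays bounded regardless of the total column size. Operators that are not simple aggregates (joins, sorts) need more care when combining partial results.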