On Jul 28, 2005, at 1:19 PM, Martin Kersten wrote:
MIL is likely to be moved into a corner next year when MonetDB Version 5 is released.
What is the expected replacement?
An assembler-like version, intended as a target language for front-end compilers and optimizers, not for end-user programming.
Then I could actually translate things into this new MIL. Sounds good.
Can I screw it [BAT properties] up by using GDK functions?
Mostly not, but you are really linking into a kernel, and this requires a lot more administration to take care of, not to mention proper garbage collection and so on, i.e. all the stuff hidden by the language interpreters.
Fortunately for me, garbage collection is taken care of by Lisp. It's also a mixed interpreted _and_ compiled environment, which gives me an additional leg up.
The structure of the table is known and the integrity constraints allow direct updates on the BATs.
Yes, my updates are indeed fixed. Where can I find the batch-loading code? Is that the ascii_io module?
Then a simple IP-channel can be used to quickly pump data into the system.
It does not need to be an IP channel, though, does it? I could just as well pump data into the system directly, right?
Depending on the batch size implied by a separate data gatherer, you should be able to handle anywhere between 5K and 200K inserts/sec on a normal PC. It also bypasses the SQL logger, which may not be what you want.
I will have to run some experiments here: losing incoming data would be very bad for me, but on the other hand I need to store data as quickly as possible. I have a Mac OS X laptop.
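Roughly, the batching I have in mind looks like this; it's a toy Python sketch with made-up batch size and column names, not the actual GDK or ascii_io interface:

    from collections import deque

    # Illustrative only: the lists stand in for the per-column BATs of a
    # decomposed trades table, and flush() stands in for whatever bulk-append
    # call the real loader exposes.
    BATCH_SIZE = 10000            # made-up figure, to be tuned by experiment

    symbol_col, date_col, price_col = [], [], []
    buffer = deque()

    def gather(symbol, date, price):
        """Queue one incoming tick; flush once a full batch has accumulated."""
        buffer.append((symbol, date, price))
        if len(buffer) >= BATCH_SIZE:
            flush()

    def flush():
        """Append the buffered batch to the column stores in one go."""
        while buffer:
            s, d, p = buffer.popleft()
            symbol_col.append(s)
            date_col.append(d)
            price_col.append(p)

The trade-off I need to measure is how large a batch I can afford before a crash loses too many buffered ticks.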
A time-series module over BATs would be the preferred way to go.
What should this module be written in?
Such a module would have primitives for moving-statistics, window-based selections, window-based signal analysis, and schemes for efficient temporal joins (including interpolation).
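If I read that right, a moving statistic and an interpolating temporal join would look something like the following; this is a toy Python sketch of my understanding, nothing to do with the actual MonetDB code:

    def moving_avg(values, window):
        """Simple moving average over a sliding window of `window` points."""
        out = []
        for i in range(window - 1, len(values)):
            out.append(sum(values[i - window + 1:i + 1]) / window)
        return out

    def temporal_join(times_a, vals_a, times_b, vals_b):
        """For each timestamp in series A, linearly interpolate series B's value."""
        joined = []
        for t, va in zip(times_a, vals_a):
            # find the two B samples bracketing t and interpolate between them
            for j in range(len(times_b) - 1):
                t0, t1 = times_b[j], times_b[j + 1]
                if t0 <= t <= t1:
                    frac = (t - t0) / (t1 - t0) if t1 > t0 else 0.0
                    vb = vals_b[j] + frac * (vals_b[j + 1] - vals_b[j])
                    joined.append((t, va, vb))
                    break
        return joined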
I have more interest in rolling out a trading platform, but once the first version is out I could definitely look into optimizations like building a time-series module. I reckon from your "Efficient k-NN Search on Vertically Decomposed Data" paper that I could calculate the Euclidean distance between time series on the fly, since vector ops are very fast. At least I could try.

I'm also talking to the http://www.cs.purdue.edu/spgist/ folks and other scientists to see if I could use one of their indexing schemes as a MonetDB search accelerator (hope I have the terminology right).

I did not see an answer to my other post, so I still don't know whether a range search can be done efficiently on a lng BAT. If it can, then I could just make do with that, as my queries are mostly "run this strategy on this range of data", using a sliding window of X BUNs. The range of data would be a subset of the price BAT, limited by a range of dates in a different BAT. But then I also need to consider a symbol BAT, to take only MSFT or IBM, etc. into account. It seems a lot like the problem that you tackled in the k-NN paper.

I don't fully understand how indexes work in MonetDB, for example whether I need to create them manually on my columns. I suppose this does not make sense with MonetDB; it seems that only one index is used, the one that links all BATs in a table so that particular BUNs can be accessed. Let me know if I got it right, please.
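Concretely, the query pattern I described is roughly the following, again in plain Python with made-up column data standing in for the date, symbol and price BATs:

    import math

    # Stand-ins for per-column BATs; values at the same position belong to
    # the same trade (the positional link between BATs in a table).
    dates   = [20050701, 20050705, 20050706, 20050707, 20050708, 20050711]
    symbols = ["MSFT",   "IBM",    "MSFT",   "MSFT",   "MSFT",   "MSFT"]
    prices  = [25.7,     83.1,     25.9,     26.1,     25.8,     26.4]

    def select_range(col, lo, hi):
        """Range select: positions whose value falls in [lo, hi]."""
        return [i for i, v in enumerate(col) if lo <= v <= hi]

    def filter_symbol(positions, sym):
        """Keep only the positions whose symbol matches `sym`."""
        return [i for i in positions if symbols[i] == sym]

    def windows(values, width):
        """All consecutive windows of `width` values, sliding by one BUN."""
        return [values[i:i + width] for i in range(len(values) - width + 1)]

    def euclidean(a, b):
        """Euclidean distance between two equal-length windows."""
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # e.g. MSFT prices in the first week of July 2005, in 3-BUN windows,
    # each compared against a query pattern:
    pos = filter_symbol(select_range(dates, 20050701, 20050708), "MSFT")
    series = [prices[i] for i in pos]
    query = [25.8, 26.0, 25.9]
    scores = [euclidean(w, query) for w in windows(series, 3)]

Whether something like select_range can be made fast on a lng BAT is exactly the part I'm unsure about.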
Unfortunately, there is no free lunch and I guess a more top-down architectural design is now what you should be looking for.
I'm reading through Peter Alexander Boncz's dissertation at the moment.
Before you jump into the MonetDB code, compare it with riding a Ferrari for the first time: you can get good speed-up, but you can also easily get killed.
Yes, I'm aware of the dangers, but I think I'm on to something exciting. Your help and advice are greatly appreciated! Thanks, Joel -- http://wagerlabs.com/uptick