Thanks for the answers, Stefan! The MonetDB system definitely looks quite interesting. On Thu, 10 Feb 2005, Stefan Manegold wrote:
- I tried a load where I committed after every 100 rows. Noticed _huge_ I/O surges. Looking at the subdirectory, it looked like the X.theap file was being rewritten over and over from scratch. If this is a memory-mapped file, why would this occur? When you start playing with 55MB heap files, this kills performance.
For now, we cannot do much about that, and I suppose there will never be any change: basically, we have to write the whole file to make sure that all changes are properly committed. But why do you consider committing after every 100 rows if you load 250k rows in one go? Isn't a single commit at the end enough?
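You're right; for a one-shot load, a single commit at the end is the way to go. Something like this minimal MIL sketch is what I'll use (read_next_row() is a made-up placeholder for my ingest code, and I'm assuming bat() and commit() behave as described in the MIL documentation):

    # look up the persistent BAT by name (created beforehand)
    var b := bat("daily_data");
    var i := 0;
    while (i < 250000) {
        # read_next_row() is hypothetical; it stands in for the real ingest
        b.insert(i, read_next_row());
        i := i + 1;
    }
    # one commit at the end, instead of one per 100 rows
    commit();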
The catch is that I'm looking at potentially using this on a live system, where data will be arriving at a steady rate (approximately 10-50 items per second). The current data set has some tables with over 500M rows at the moment (and it is growing by >5M rows per day). After looking around a bit, I can see the following solutions:

(1) Only ever do bulk loads (say, at the end of the day). Disadvantage: no analysis of data from the current day. Advantage: I can use a bulk load mechanism rather than inserting one row at a time.

(2) Leave the system "dirty" and commit only at the end of the day. I'm still trying to determine the exact concurrency model. From what I've read so far, there is only global locking and no multi-version view like most databases have (such as Oracle or PostgreSQL), so this should work fine, since any uncommitted data could be seen by other processes. (NOTE: I have not tested this theory yet, so please excuse me if I have it wrong; only so much time to play with the system!)

(3) Use persistent sub-BATs (a BAT within a BAT) to partition the data into sets (say, one sub-BAT per day). But I've seen comments on the mailing lists suggesting that persistent BATs within BATs are no longer supported? I have had no time to dig into or experiment with this option, and I would be worried about the performance of operations running over a partitioned set; a rough sketch of what I was picturing follows below.

Either way, I will definitely have to do something due to the sheer size of the data. Any suggestions from the community on how to deal with a live data stream?
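For concreteness, here is roughly what I was picturing for option (3), flattened to one persistent BAT per day instead of nested BATs, since the nesting appears unsupported. The naming scheme is my own invention, and I'm assuming rename(), persists(), and union() behave as the MIL manual describes:

    # create today's partition and make it survive server restarts
    var today := new(int, flt);
    today.rename("data_20050210");   # hypothetical one-BAT-per-day naming
    today.persists(true);

    # a cross-partition query glues the days back together
    var window := bat("data_20050208").union(bat("data_20050209"));
    window := window.union(bat("data_20050210"));
    window.count();

My performance worry is exactly what those last lines hint at: every query over a window of days pays for the unions up front.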
- Is it better to use "BAT.reverse().find(X)" than "BAT.search(X)"?
Well, these are two "different pairs of shoes":
Doh! That will teach me not to type from memory. I did mean uselect(), and thanks for the pointers on the difference between "definite existence" and a set of matches. It was the tail vs. head operation that confused me, but given that reverse() is essentially free, I won't worry about it. Thanks for the hints! Ed
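P.S. For the archives, here is the distinction as I now understand it, as a toy example. This is only a sketch from memory, assuming the standard new()/insert()/find()/uselect() signatures:

    var b := new(int, str);
    b.insert(1, "a");
    b.insert(2, "b");
    b.insert(3, "b");

    b.find(2);              # head lookup: returns the tail value "b"
    b.reverse().find("b");  # tail lookup via the (free) reversed view; one matching head
    b.uselect("b");         # the full set of heads whose tail is "b": {2, 3}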