Hi Martin,

Thanks for the detailed response.
 
...
This has been solved in the upcoming Feb2010 release.
The effect on database size and performance is shown in
http://www.cwi.nl/~mk/ontimeReport

 
That's great news. I had actually been encoding my dimension PK columns as INTs/LONGs to save on (perceived) string costs. Thanks!!
 
Furthermore, there is a dictionary compression optimizer in Feb2010,
which works for any type. It is available at the SQL level on a
per-table basis.

Even better. But I wonder, outside of strings, does dictionary compression help much? (e.g. I assume for a BAT of INTs the dictionary offsets would be as large as the original data.)
Or do you mean it is not necessarily a static dictionary, but that it e.g. works like LZ77, encoding groups of recurring data?
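To make the question concrete, here's a toy sketch of what I mean, in plain Python (purely illustrative, nothing to do with MonetDB's actual optimizer): the win only comes when the codes are narrower than the values, which happens when cardinality is low.

import array

def dict_encode(values):
    """Map each distinct value to a small integer code."""
    dictionary = sorted(set(values))                    # the distinct (wide) values
    code_of = {v: i for i, v in enumerate(dictionary)}
    codes = array.array('B', (code_of[v] for v in values))  # 1-byte codes, <= 256 distinct values
    return dictionary, codes

def dict_decode(dictionary, codes):
    return [dictionary[c] for c in codes]

column = [1001, 1002, 1001, 1003, 1002, 1001] * 1_000   # low-cardinality column
dictionary, codes = dict_encode(column)
assert dict_decode(dictionary, codes) == column
# In a column store: 8 bytes per LONG uncompressed vs. 1 byte per code plus a
# tiny dictionary here; for a high-cardinality INT column the codes would be
# as wide as the data, which is exactly my concern above.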
 
However, compression does not necessarily lead to
performance gains; you have to decompress at some point in most plans.
Furthermore, this code is alpha-stage. A driving (test) user would
help to improve it.

Granted. Actually I am more concerned with memory use than raw performance; MonetDB is generally plenty fast for my needs already :-)
(BTW, as far as testing it out, it is available (only) in SVN trunk right now, correct?)
 
There are more options to exploit compression. Early versions of MonetDB
used gzipped BATs (10yrs ago already).

Yes, I read some archaic-looking comments about storing BATs using compress and then decompressing them all at once. (I guess that doesn't help main-memory use, but it's still interesting.) Does version 5 still support that?
 
The current software stack
would even make it possible to massage a plan to use e.g. bit vectors.

My dimensions (and hence PK columns) tend to be low cardinality - so if I correctly understand what bit vectors offer, that would save a ton of space in my corresponding fact table FK columns, no?
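To check my understanding, here is roughly the bit-packing I imagine, again a hypothetical Python sketch rather than MonetDB code: pack each FK value of a low-cardinality column into ceil(log2(cardinality)) bits instead of a full 32/64-bit integer.

import math

def bitpack(codes, cardinality):
    """Pack small integer codes into one big integer, 'width' bits each."""
    width = max(1, math.ceil(math.log2(cardinality)))
    packed = 0
    for i, c in enumerate(codes):
        packed |= c << (i * width)
    return packed, width

def bitunpack(packed, width, n):
    mask = (1 << width) - 1
    return [(packed >> (i * width)) & mask for i in range(n)]

fk_codes = [0, 3, 3, 1, 2, 0, 1, 3]               # e.g. 4 distinct dimension keys
packed, width = bitpack(fk_codes, cardinality=4)
assert bitunpack(packed, width, len(fk_codes)) == fk_codes
# 2 bits per row instead of 32: a 16x reduction for this fact table FK column.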

Are these projects easy enough for someone outside the core team to approach? (albeit very slowly ;-)



If there are no plans, is this because it's completely antithetical to the Monet architecture (from the papers it seems like X-100 was, to some degree at least, 'integrated in'), or more due to lack of resources?

We can always use resources to make the code base better. Involved
developers/users, but also, dollar-wise, companies
using MonetDB in their applications who want a (continual)
performance/functional quality assessment and testing agreement.

Understood completely.


My motivating example here is OLAP: I frequently have one relatively large fact table and then many much smaller dimension tables. If optional compression were available, it would be nice to compress all or some of the BATs for the fact table columns and then have the others work as usual.
So, there is a lot available already, and more coming upon need.

Well, OLAP & memory-constrained environments represent my need, but perhaps I'm an outlier. Whatever you choose to address, however, thanks for all your hard work.
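To make the memory concern concrete, here is the back-of-the-envelope arithmetic I have in mind (all numbers hypothetical):

fact_rows     = 100_000_000     # one large fact table
fk_columns    = 6               # one FK per dimension
bytes_per_int = 4               # uncompressed INT FK
bits_per_code = 4               # enough for <= 16 distinct dimension keys

uncompressed = fact_rows * fk_columns * bytes_per_int        # 2.4 GB
compressed   = fact_rows * fk_columns * bits_per_code // 8   # 0.3 GB
print(f"{uncompressed / 1e9:.1f} GB -> {compressed / 1e9:.1f} GB for the FK columns alone")

That kind of reduction is what would let the whole working set stay in RAM on a modest box.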
 
(Well, at least this sounds good; maybe it makes no sense.) Another motivation: there seems to be a lot of anecdotal evidence of companies moving from larger big-iron servers to more numerous, smaller machines - so it would be really nice to have this capability for more memory-constrained settings.
Indeed, I expect this year will bring some MonetDB surprises (again).
Some of them are already in the distribution. To pick one, the
code base contains a 'recycler' optimizer, which for many BI
applications can provide a significant performance boost. A throughput
improvement of >40% has been reported already.
(It received an award at SIGMOD 2009 for its innovation.)
Such optimizers are gradually integrated into the default optimizer
pipeline, but are already available for adventurous users.

Thanks for pointing it out; I'll read up on it. (And congratulations on adding the award to your collection :-)
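For anyone else reading along, my (possibly naive) reading of the recycler idea, sketched in plain Python rather than MAL: intermediate results are kept in a pool keyed by the operation and its arguments, so overlapping BI queries can reuse each other's work instead of recomputing it.

recycle_pool = {}

def recycled(op):
    def wrapper(*args):
        key = (op.__name__, args)
        if key not in recycle_pool:
            recycle_pool[key] = op(*args)   # compute once, keep the intermediate
        return recycle_pool[key]
    return wrapper

@recycled
def select_range(column, lo, hi):
    return tuple(i for i, v in enumerate(column) if lo <= v < hi)

sales = (5, 17, 9, 23, 14, 8)
q1 = select_range(sales, 10, 20)    # computed
q2 = select_range(sales, 10, 20)    # served from the pool: the same selection is reused

The real thing obviously has to decide what to admit and when to evict, but that's the gist as I understand it.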

Best regards,

Jason