Jason Kinzer wrote:
Hi,

Hi Jason,
Thanks for your thoughts.
Are there any plans on the horizon to roll compression into Monet? The X-100 project looked really interesting in this regard, but as I understand it, that work has been transferred into VectorWise.
Correct, but... MonetDB has used dictionary compression over string columns for many years, provided the string table is relatively small (128MB). Furthermore, most OID columns used for intermediates do not take storage at all, and wherever possible BAT heaps are shared.

However, in a recent DW experiment http://www.mysqlperformanceblog.com/2009/10/02/analyzing-air-traffic-perform... we noticed that our references to string values were the cause of excessive space consumption (8 bytes referring to 1 byte). This has been solved in the upcoming Feb2010 release. The effect on database size and performance is shown in http://www.cwi.nl/~mk/ontimeReport

Furthermore, there is a dictionary compression optimizer in Feb2010, which works for any type. It is available at the SQL level on a per-table basis. However, compression does not necessarily lead to performance gains, since in most plans you have to decompress at some point. Also, this code is alpha-stage. A driving (test) user would help to improve it.

There are more options to exploit compression. Early versions of MonetDB used gzipped BATs (10 years ago already). The current software stack would even make it possible to massage a plan to use, e.g., bitvectors.
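To make the dictionary compression point concrete, here is a minimal sketch in Python of encoding a string column as a dictionary plus an array of codes, where the code width is chosen from the number of distinct values instead of always being 8 bytes. It is only an illustration of the general technique: the function names are hypothetical, and it is not MonetDB's implementation or its on-disk layout.

```python
from array import array


def narrowest_typecode(n_distinct):
    """Pick the smallest unsigned array typecode that can hold codes 0..n_distinct-1."""
    for typecode in "BHIQ":
        if n_distinct <= 1 << (8 * array(typecode).itemsize):
            return typecode
    raise ValueError("too many distinct values")


def dictionary_encode(values):
    """Return (dictionary, codes) for a column of values (hypothetical helper)."""
    dictionary = []   # distinct values, in first-seen order
    position = {}     # value -> index into dictionary
    codes = []
    for value in values:
        code = position.get(value)
        if code is None:
            code = len(dictionary)
            position[value] = code
            dictionary.append(value)
        codes.append(code)
    # Store the codes in the narrowest integer type that fits.
    return dictionary, array(narrowest_typecode(len(dictionary)), codes)


def dictionary_decode(dictionary, codes):
    """Rebuild the original column; this is the decompression step a plan must pay for."""
    return [dictionary[code] for code in codes]


carriers = ["AA", "UA", "AA", "DL", "UA", "AA"]
dictionary, codes = dictionary_encode(carriers)
print(dictionary, codes.tolist(), codes.itemsize)
```

With only three distinct values, each reference fits in a single byte, which is the kind of saving the "8 bytes referring to 1 byte" remark is about.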
If there are no plans, is this because it's completely antithetical to the monet architecture (from the papers it seems like X-100 was, to some degree at least, 'integrated' in), or more due to lack of resources?
We can always use resources to make the code base better: involved developers/users, but also funding, in the form of companies that use MonetDB in their applications and want a (continual) performance/functional quality assessment testing agreement.
My motivating example here is OLAP: I frequently have 1 relatively large fact table and then many much smaller dimensional tables. If optional compression were available, it would be nice to compress all or some of the BATs for the fact table columns and then have the others work as usual.
(Well, at least this sounds good, maybe it makes no sense). Another motivation is that there seems to be a lot of anecdotal evidence of companies moving from larger big iron servers to more numerous, smaller machines, so it would be really nice to have this capability for more memory-constrained settings.

Indeed, I expect this year will bring some MonetDB surprises (again). Some of them are already in the distribution. To pick one, the code base contains a 'recycler' optimizer, which can provide a significant performance boost for many BI applications. A throughput improvement of >40% has been reported already. (It received an award at SIGMOD 2009 for its innovation.) Such optimizers are gradually being integrated into the default optimizer pipe, but are available to adventurous users already.

So, there is a lot, and more coming upon need.
I understand on a basic level how compression conflicts with the relatively simple approach monet uses to load BATs (e.g. memory map), but, dwelling in ignorance, I blithely assume there could be some solution not as complex as X-100 if one were to accept a significant performance cost. For example: decompressing BAT data on the fly as part of a BATiterator. I probably don't have the skills to implement even a basic on-the-fly decompression approach like this, but just wondering aloud: how hard a problem is this?
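The on-the-fly decompression idea described here can be sketched roughly as follows in Python: the column is stored as independently compressed blocks, and an iterator decompresses one block at a time while yielding values. This is only a toy under stated assumptions: `compress_column`, `iterate_column`, and the block size are hypothetical names and choices, zlib stands in for whatever codec one would actually pick, and none of this reflects MonetDB's BATiterator or the X-100 design.

```python
import zlib
from array import array

BLOCK_VALUES = 4096  # values per compressed block (an arbitrary assumption)


def compress_column(values, block_values=BLOCK_VALUES):
    """Split a column of 64-bit integers into independently compressed blocks."""
    blocks = []
    for start in range(0, len(values), block_values):
        chunk = array("q", values[start:start + block_values])
        blocks.append(zlib.compress(chunk.tobytes()))
    return blocks


def iterate_column(blocks):
    """Yield values in order, holding only one decompressed block in memory at a time."""
    for block in blocks:
        chunk = array("q")
        chunk.frombytes(zlib.decompress(block))
        yield from chunk


column = [i % 7 for i in range(10000)]
blocks = compress_column(column)
total = sum(iterate_column(blocks))  # a scan that never materializes the full column
```

The trade-off is the one raised in the thread: memory use drops to roughly the compressed size plus one block, but every scan pays the decompression cost, and the blocks can no longer simply be memory mapped and read in place.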
regards, Martin
Thanks, Jason
_______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users