[Monetdb-developers] 32GB string heaps for 64-bits/oid32

14 Apr 2009

      MonetDB developers,

As my Easter contribution, I just checked in changes that introduce a change
in which strings are stored in BATs, that affects only the (not ver often
used?) setting of 64-bits compilation with 32-bits oids. This is achieved
with the configure options --enable-bits=64 --enable-oid32. By default, the
oid width (used to identify tuples) follows the addressing bit with (hence
64-bits pointers and 64-bits oids).

The motivation to use 32-bits oids is that many intermediates (often
consisting of oid columns) and also the hash tables and the string offsets
(which use the type var_t that follows oid in its width) become smaller. For
those 64-bit users that are interested in reducing memory consumption,
32-bits oids will likely reduce the MonetDB memory footprint by 20-40%.
Especially those who are just running out of space may benefit.

Using 32-bits oids has the effect that only 4G tuples may be stored in a
single (intermediate) table. The vaste majority of database problems run on
a single box (as MonetDB does) have this characteristic. Regrettably, in the
past, an additional resitriction (due to var_t) was that each individual
string column could hold max 4GB as well. Note that MonetDB does best-effort
string de-duplication in a column, so your tring data might be smaller than
you think. By the way, there is no good way to get precise stats on the
physical size of a column, but you can scan for .theap files in  the bat/
directory of your database installation -- those are the string heaps.

So, the change checked in interprets the var_t no longer as a byte offset,
but an index that is multiplied by 8 (GDK_VARSHIFT=3 for 64bits/oid32, zero
otherwise -- so then it is a no-op removed by the compiler even). It also
means that strings in the heap are aligned on 8-byte boundaries, then. Note
that strings often are aligned to 4-byte boundaries (32-bits) or 8-byte
boundaries (64-bits) anyway, unless the duplicate elimination really gave
up, which happens if only a very low duplicate locality exists. Also, the
string heaps additionally stored pointers in the heap for the de-duplication
hash-table which could lead to an avg overhead of 4 (half of 8 byte
alignment) + 8 (pointer) = 12 bytes per string. The change also improves the
non GDK_VARSHIFT compilation mods in reducing this overhead to 2 bytes only.
For 64-bits/oid32, the overhead is 2+4, due to the forced alignment. 

So, in general, string storage got  more compact across the board (the
effect will depend on your string distribution), notehowever that for 32/32
and 64/64 the binary BAT format has *NOT* changed. No need to export and
re-import.

As for my original goal, for 64-bits/oid32 the effect is that a single
column can now contain 32GB of (de-duplicated) string data. That aligns much
better with the 4G tuple  limit than the previous 4GB. I made this change
actually mostly because I think oid32 reduces memory consumption, not even
so much the var_t (string) offset width. Still as var_t and oid move
together, I had to do something. As an alternative, we could make var_t
independently configurable (such that we have 32-bits oids and 64-bits
var_t). This still remains an option for the future. Another potential
option for the future is to set the string alignment on a bat-for-bat
(per-column) basis. This is also feasible, but would require more API
changes.

A final point of attention is the stability/fallout of this change. I have
tried to be complete -- my changes span all our CVS repositories. However, I
did not that in general the support for 64-bits/oid32 is not as perfect as
it should be, as evidenced by the TestWeb.  Strinking examples are queries
in the src/benchmarks/ATIS (especially the simple join query seems easy to
debug), and TPC-H. But in general the 64/32 column shows more red than other
compilation settings. I think t=it is an interesting setting, we have used
it consistently for our XMark XQuery benchmarking, and I think it also
serves well for TPC-H (up to SF300) and e.g. the  open streetmap data.
Therefore, I think it would be worthwhile if some of these long standing
M5/SQL issues that are specific to 64/32 would be fixed.

thanks for your attention,

Peter

Peter Boncz

tags

participants (1)