MonetDB developers,
As my Easter contribution, I just checked in changes that
introduce a change in which strings are stored in BATs, that affects only the
(not ver often used?) setting of 64-bits compilation with 32-bits oids. This is
achieved with the configure options --enable-bits=64 --enable-oid32. By
default, the oid width (used to identify tuples) follows the addressing bit
with (hence 64-bits pointers and 64-bits oids).
The motivation to use 32-bits oids is that many
intermediates (often consisting of oid columns) and also the hash tables and
the string offsets (which use the type var_t that follows oid in its width)
become smaller. For those 64-bit users that are interested in reducing memory
consumption, 32-bits oids will likely reduce the MonetDB memory footprint by
20-40%. Especially those who are just running out of space may benefit.
Using 32-bits oids has the effect that only 4G tuples may be
stored in a single (intermediate) table. The vaste majority of database
problems run on a single box (as MonetDB does) have this characteristic.
Regrettably, in the past, an additional resitriction (due to var_t) was that
each individual string column could hold max 4GB as well. Note that MonetDB
does best-effort string de-duplication in a column, so your tring data might be
smaller than you think. By the way, there is no good way to get precise stats
on the physical size of a column, but you can scan for .theap files in the
bat/ directory of your database installation -- those are the string heaps.
So, the change checked in interprets the var_t no longer as
a byte offset, but an index that is multiplied by 8 (GDK_VARSHIFT=3 for
64bits/oid32, zero otherwise -- so then it is a no-op removed by the compiler
even). It also means that strings in the heap are aligned on 8-byte boundaries,
then. Note that strings often are aligned to 4-byte boundaries (32-bits) or 8-byte
boundaries (64-bits) anyway, unless the duplicate elimination really gave up,
which happens if only a very low duplicate locality exists. Also, the string
heaps additionally stored pointers in the heap for the de-duplication
hash-table which could lead to an avg overhead of 4 (half of 8 byte alignment)
+ 8 (pointer) = 12 bytes per string. The change also improves the non
GDK_VARSHIFT compilation mods in reducing this overhead to 2 bytes only. For
64-bits/oid32, the overhead is 2+4, due to the forced alignment.
So, in general, string storage got more compact across
the board (the effect will depend on your string distribution), notehowever
that for 32/32 and 64/64 the binary BAT format has *NOT* changed. No need to
export and re-import.
As for my original goal, for 64-bits/oid32 the effect is
that a single column can now contain 32GB of (de-duplicated) string data. That
aligns much better with the 4G tuple limit than the previous 4GB. I made
this change actually mostly because I think oid32 reduces memory consumption,
not even so much the var_t (string) offset width. Still as var_t and oid move
together, I had to do something. As an alternative, we could make var_t
independently configurable (such that we have 32-bits oids and 64-bits var_t).
This still remains an option for the future. Another potential option for the
future is to set the string alignment on a bat-for-bat (per-column) basis. This
is also feasible, but would require more API changes.
A final point of attention is the stability/fallout of this
change. I have tried to be complete -- my changes span all our CVS
repositories. However, I did not that in general the support for 64-bits/oid32
is not as perfect as it should be, as evidenced by the TestWeb. Strinking
examples are queries in the src/benchmarks/ATIS (especially the simple join
query seems easy to debug), and TPC-H. But in general the 64/32 column shows
more red than other compilation settings. I think t=it is an interesting
setting, we have used it consistently for our XMark XQuery benchmarking, and I
think it also serves well for TPC-H (up to SF300) and e.g. the open streetmap
data. Therefore, I think it would be worthwhile if some of these long standing
M5/SQL issues that are specific to 64/32 would be fixed.
thanks for your attention,
Peter