[Monetdb-developers] 32GB string heaps for 64-bits/oid32
MonetDB developers, As my Easter contribution, I just checked in changes that introduce a change in which strings are stored in BATs, that affects only the (not ver often used?) setting of 64-bits compilation with 32-bits oids. This is achieved with the configure options --enable-bits=64 --enable-oid32. By default, the oid width (used to identify tuples) follows the addressing bit with (hence 64-bits pointers and 64-bits oids). The motivation to use 32-bits oids is that many intermediates (often consisting of oid columns) and also the hash tables and the string offsets (which use the type var_t that follows oid in its width) become smaller. For those 64-bit users that are interested in reducing memory consumption, 32-bits oids will likely reduce the MonetDB memory footprint by 20-40%. Especially those who are just running out of space may benefit. Using 32-bits oids has the effect that only 4G tuples may be stored in a single (intermediate) table. The vaste majority of database problems run on a single box (as MonetDB does) have this characteristic. Regrettably, in the past, an additional resitriction (due to var_t) was that each individual string column could hold max 4GB as well. Note that MonetDB does best-effort string de-duplication in a column, so your tring data might be smaller than you think. By the way, there is no good way to get precise stats on the physical size of a column, but you can scan for .theap files in the bat/ directory of your database installation -- those are the string heaps. So, the change checked in interprets the var_t no longer as a byte offset, but an index that is multiplied by 8 (GDK_VARSHIFT=3 for 64bits/oid32, zero otherwise -- so then it is a no-op removed by the compiler even). It also means that strings in the heap are aligned on 8-byte boundaries, then. Note that strings often are aligned to 4-byte boundaries (32-bits) or 8-byte boundaries (64-bits) anyway, unless the duplicate elimination really gave up, which happens if only a very low duplicate locality exists. Also, the string heaps additionally stored pointers in the heap for the de-duplication hash-table which could lead to an avg overhead of 4 (half of 8 byte alignment) + 8 (pointer) = 12 bytes per string. The change also improves the non GDK_VARSHIFT compilation mods in reducing this overhead to 2 bytes only. For 64-bits/oid32, the overhead is 2+4, due to the forced alignment. So, in general, string storage got more compact across the board (the effect will depend on your string distribution), notehowever that for 32/32 and 64/64 the binary BAT format has *NOT* changed. No need to export and re-import. As for my original goal, for 64-bits/oid32 the effect is that a single column can now contain 32GB of (de-duplicated) string data. That aligns much better with the 4G tuple limit than the previous 4GB. I made this change actually mostly because I think oid32 reduces memory consumption, not even so much the var_t (string) offset width. Still as var_t and oid move together, I had to do something. As an alternative, we could make var_t independently configurable (such that we have 32-bits oids and 64-bits var_t). This still remains an option for the future. Another potential option for the future is to set the string alignment on a bat-for-bat (per-column) basis. This is also feasible, but would require more API changes. A final point of attention is the stability/fallout of this change. I have tried to be complete -- my changes span all our CVS repositories. However, I did not that in general the support for 64-bits/oid32 is not as perfect as it should be, as evidenced by the TestWeb. Strinking examples are queries in the src/benchmarks/ATIS (especially the simple join query seems easy to debug), and TPC-H. But in general the 64/32 column shows more red than other compilation settings. I think t=it is an interesting setting, we have used it consistently for our XMark XQuery benchmarking, and I think it also serves well for TPC-H (up to SF300) and e.g. the open streetmap data. Therefore, I think it would be worthwhile if some of these long standing M5/SQL issues that are specific to 64/32 would be fixed. thanks for your attention, Peter
participants (1)
-
Peter Boncz