Columnar Storage layout of MonetDB
I would like to understand the on-disk file layout of columnar data storage in MonetDB. I know that each column is stored in a separate file. What I am after is the exact layout, ideally as a block diagram, so that I get a clear picture of how the data is stored. Any help is appreciated.
Regards, Senthil Vidhiyakar. S
The layout is actually rather involved, especially if you want to know how the structure changes and how files get created or moved during a run of the mserver5 process, or how SQL columns relate to BAT columns.
Also, the file layout changes with newer versions of MonetDB. Specifically, there was a significant change with version 11.25 (Dec 2016): BATs are now always headless, so the format no longer supports storing actual heads.
In light of the above, can you describe your interests slightly more specifically?
Eyal
In reply to Senthil Vidhiyakar's message of 03/20/2017 12:58 PM.
_______________________________________________ developers-list mailing list developers-list@monetdb.org https://www.monetdb.org/mailman/listinfo/developers-list
I want to know the pattern in which a column's data is stored in its file, that is, how the values of one column are arranged within that file.
How does MonetDB determine the offset needed to select a particular row? What do the header and footer formats look like, and other details of that kind?
In reply to Eyal Rozenberg's message of Mon, Mar 20, 2017 at 5:59 PM.
On 21 Mar 2017, at 06:04, Senthil Vidhiyakar wrote:
> I want to know the pattern in which the columns are stored in each file. I mean the arrangement of data in a column in that file.
Hi,
The pattern is simple. Each column is stored in one file. Within one file, data are stored in the order they were added, as a single C array.
> How monetdb determines offset to select a particular row, header and footer format and other details like that.
I don’t know what you mean by “header and footer format”, but determining the selected row is a complicated process, which depends on the data, the query, and whether some secondary indices can or should be used.
Regards, Jennie
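As a rough illustration of "a single C array, in the order the data were added", the sketch below appends values to a file and reads them back. It assumes a fixed-width integer column; the file name `demo_column.bin` and both helper functions are hypothetical and are not MonetDB's own naming or API.

```python
import os
import tempfile
from array import array


def append_values(path, values, typecode="i"):
    """Append values to the column file in arrival order, as raw C items."""
    with open(path, "ab") as f:
        array(typecode, values).tofile(f)


def read_column(path, typecode="i"):
    """Read the whole file back as one C array of the same item type."""
    column = array(typecode)
    with open(path, "rb") as f:
        column.frombytes(f.read())
    return column


# Hypothetical column file in a scratch directory (not a real dbfarm path).
path = os.path.join(tempfile.mkdtemp(), "demo_column.bin")
append_values(path, [10, 20, 30])
append_values(path, [5])  # a later insert lands at the end of the file

column = read_column(path)
print(list(column))
print(os.path.getsize(path), "bytes for", len(column), "values")
```

The values come back in insertion order, and the file holds nothing except the items themselves.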
----- On Mar 21, 2017, at 8:40 AM, Ying Zhang Y.Zhang@cwi.nl wrote:
> The pattern is simple. Each column is stored in one file. within one file, data are stored in the order they were added, as a single C-array.
In other words, for fixed-width types (and only for those), the storage format on disk is the same as in memory: a simple C array of the respective type. See also https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/BinaryBulkLoad
For the storage (and in-memory) format of auxiliary access structures and indexes, please see the respective source code.
For variable-width types (e.g., strings), MonetDB uses a kind of best-effort dictionary encoding; the details are complicated and documented only in the source code.
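To make the term "dictionary encoding" concrete, here is a generic sketch of the idea. It is not MonetDB's actual string format, which this thread says is documented only in the source code. The names `dict_encode`, `dict_decode`, `heap` and `codes` are hypothetical, and unlike a best-effort scheme this sketch always eliminates duplicates.

```python
def dict_encode(values):
    """Split a string column into fixed-width codes plus a heap of distinct strings."""
    heap = []    # distinct strings, in first-seen order
    index = {}   # string -> position in heap
    codes = []   # one small integer per row, in row order
    for value in values:
        if value not in index:
            index[value] = len(heap)
            heap.append(value)
        codes.append(index[value])
    return codes, heap


def dict_decode(codes, heap):
    """Rebuild the original column by looking each code up in the heap."""
    return [heap[code] for code in codes]


rows = ["nl", "de", "nl", "fr", "de"]
codes, heap = dict_encode(rows)
print(codes)  # [0, 1, 0, 2, 1]
print(heap)   # ['nl', 'de', 'fr']
```

The point of the split is that the per-row part becomes fixed-width again, so it can be handled like any other C array, while the variable-width strings live in a separate structure.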
> I don’t know what you mean with “header and footer format”, but determining the selected row is a complicated process, which depends on the data, query and whether some secondary indices can/should be used…
The offset of row i is array index i. There are no footers at all. Some header information (properties, etc.) per column is stored in a struct in memory and dumped (serialized) into a single text file (BBP.dir) on disk. Details are (only) in the source code.
Best, Stefan
--
| Stefan.Manegold@CWI.nl | DB Architectures (DA) |
| www.CWI.nl/~manegold/ | Science Park 123 (L321) |
| +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
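The statement "offset of row i is array index i" can be sketched as follows for a fixed-width column. This assumes a column of 4-byte integers in the machine's native byte order; the file name and the `read_row` helper are hypothetical, not part of MonetDB.

```python
import os
import struct
import tempfile


def read_row(path, i, fmt="=i"):
    """Fetch row i by seeking to byte offset i * item width; there is no header to skip."""
    width = struct.calcsize(fmt)
    with open(path, "rb") as f:
        f.seek(i * width)
        data = f.read(width)
    if len(data) != width:
        raise IndexError(f"row {i} is past the end of the column")
    return struct.unpack(fmt, data)[0]


# Hypothetical column file holding four 4-byte integers and nothing else.
path = os.path.join(tempfile.mkdtemp(), "demo_column.bin")
with open(path, "wb") as f:
    f.write(struct.pack("=4i", 10, 20, 30, 5))

print(read_row(path, 2))      # 30
print(os.path.getsize(path))  # 16: four items, no header or footer bytes
```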
A few more points:
* BBP stands for BAT Buffer Pool.
* The BBP.dir is in /path/to/dbfarm/your_db_name/bat/BACKUP/bbp.dir
* The BBP.dir is mostly a space-separated values table, with each row containing information about one of the columns in the pool.
* Since the per-column file is typically mmap()ed, it has the same _endianness_ [1] as the machine on which it was created.
[1]: https://www.cs.umd.edu/class/sum2003/cmsc311/Notes/Data/endian.html
In reply to Stefan Manegold's message of 03/21/2017 09:00 AM.
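The endianness point can be illustrated with a small sketch: the same mapped bytes decode to different integers depending on the byte order the reader assumes. The file name is hypothetical, and the writer is assumed, for the sake of the example, to be a little-endian machine.

```python
import mmap
import os
import struct
import sys
import tempfile

# Hypothetical column file written by a little-endian machine.
path = os.path.join(tempfile.mkdtemp(), "demo_column.bin")
with open(path, "wb") as f:
    f.write(struct.pack("<3i", 1, 2, 3))

with open(path, "rb") as f:
    # Map the file read-only and decode the bytes in place, without copying them first.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
        as_little = struct.unpack_from("<3i", mapped, 0)
        as_big = struct.unpack_from(">3i", mapped, 0)

print("this machine is", sys.byteorder, "endian")
print(as_little)  # (1, 2, 3)
print(as_big)     # (16777216, 33554432, 50331648)
```

A raw column file copied to a machine with the other byte order would therefore be misread unless the values were byte-swapped first.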
participants (4)
- Eyal Rozenberg
- Senthil Vidhiyakar
- Stefan Manegold
- Ying Zhang