[MonetDB-users] Storing UUID, column of type Binary?
Another alternative we have considered is to store the UUID, split into two columns of type BIGINT (UUID1 holding the 64 most significant bits of the UUID, and UUID2 holding the least 64 bits) and do all our queries along the lines of SELCT * WHERE testUUID = UUID1+UUID2 On a 64 bit system this still represents a 64 bit storage penalty as compared to a single column of type VARBINARY(16) but it is still a 64 bit saving as compared to storing the UUID as a 32 byte string (separators removed) Alternative 1. VARBINARY(16) storage = 64bit OID + 128bit value = total 172 bits Alternative 2. BIGINT x 2 storage = 64bit OID + 64bit value + 64bit OID + 64bit value = total 256 bits Alternative 3. VARCHAR(32) storage = 64bit OID + 256bit value = total 320 bits Would there be a significant performance penalty to queries on UUID1+UUID2 as compared to VARCHAR(32) ? Thanks On Sunday 17 February 2008 10:16:49 McKennirey.Matthew wrote:
In our application we use a UUID (128 bits) to identify objects. Many of our columns are simply storing these UUIDs.
At the moment our only alternative seems to be to cast these 16 byte binary strings as character strings, increasing the size to 36 bytes (or 32 bytes if we remove the '-' separators). This has obvious drawbacks in terms of size and will often mean the difference between being able to read the column into memory or not with the obvious peformance degradation.
Ideally we would like to define a column type of VARBINARY(16) (using a MySQL datatype example, actual datatype names vary across database technologies, Oracle=RAW or VARBINARY, Postgres=BIT or BYTEA, etc.). Operations on such a column would treat the content simply as a sequence of bits, not interpreting them as characters, integers, etc.
Has anyone attempted something like this with MonetDB, or are we missing some obvious solution?
Thanks very much.
On Mon, Feb 18, 2008 at 11:24:59AM -0500, McKennirey.Matthew wrote:
Another alternative we have considered is to store the UUID, split into two columns of type BIGINT (UUID1 holding the 64 most significant bits of the UUID, and UUID2 holding the least 64 bits) and do all our queries along the lines of SELCT * WHERE testUUID = UUID1+UUID2
On a 64 bit system this still represents a 64 bit storage penalty as compared to a single column of type VARBINARY(16) but it is still a 64 bit saving as compared to storing the UUID as a 32 byte string (separators removed)
Alternative 1. VARBINARY(16) storage = 64bit OID + 128bit value = total 172 bits Alternative 2. BIGINT x 2 storage = 64bit OID + 64bit value + 64bit OID + 64bit value = total 256 bits Alternative 3. VARCHAR(32) storage = 64bit OID + 256bit value = total 320 bits
You assume MonetDB requires a full 64 bit oid head column. This is not the case, we only store the first oid and count. Only when we access (through selects etc) these columns we need to materialize these oids. So your second alternative also only costs 128 bits (like your first). There is afcourse the solution to create your own c-implemenation for 128 bit numbers. See for an example the inet module in MonetDB5/src/modules/atoms/inet.mx Niels
Would there be a significant performance penalty to queries on UUID1+UUID2 as compared to VARCHAR(32) ?
Thanks
On Sunday 17 February 2008 10:16:49 McKennirey.Matthew wrote:
In our application we use a UUID (128 bits) to identify objects. Many of our columns are simply storing these UUIDs.
At the moment our only alternative seems to be to cast these 16 byte binary strings as character strings, increasing the size to 36 bytes (or 32 bytes if we remove the '-' separators). This has obvious drawbacks in terms of size and will often mean the difference between being able to read the column into memory or not with the obvious peformance degradation.
Ideally we would like to define a column type of VARBINARY(16) (using a MySQL datatype example, actual datatype names vary across database technologies, Oracle=RAW or VARBINARY, Postgres=BIT or BYTEA, etc.). Operations on such a column would treat the content simply as a sequence of bits, not interpreting them as characters, integers, etc.
Has anyone attempted something like this with MonetDB, or are we missing some obvious solution?
Thanks very much.
------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- Niels Nes, Centre for Mathematics and Computer Science (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
Niels, thanks very much for the explanation. Based on that we decided to go ahead and try alternative 2 (UUID as BIGINT x 2) and we have created and implementation of that approach in our Java application. So far so good. Thanks again. On Tuesday 19 February 2008 09:15:54 Niels Nes wrote:
On Mon, Feb 18, 2008 at 11:24:59AM -0500, McKennirey.Matthew wrote:
Another alternative we have considered is to store the UUID, split into two columns of type BIGINT (UUID1 holding the 64 most significant bits of the UUID, and UUID2 holding the least 64 bits) and do all our queries along the lines of SELCT * WHERE testUUID = UUID1+UUID2
On a 64 bit system this still represents a 64 bit storage penalty as compared to a single column of type VARBINARY(16) but it is still a 64 bit saving as compared to storing the UUID as a 32 byte string (separators removed)
Alternative 1. VARBINARY(16) storage = 64bit OID + 128bit value = total 172 bits Alternative 2. BIGINT x 2 storage = 64bit OID + 64bit value + 64bit OID + 64bit value = total 256 bits Alternative 3. VARCHAR(32) storage = 64bit OID + 256bit value = total 320 bits
You assume MonetDB requires a full 64 bit oid head column. This is not the case, we only store the first oid and count. Only when we access (through selects etc) these columns we need to materialize these oids. So your second alternative also only costs 128 bits (like your first).
There is afcourse the solution to create your own c-implemenation for 128 bit numbers. See for an example the inet module in MonetDB5/src/modules/atoms/inet.mx
Niels
Would there be a significant performance penalty to queries on UUID1+UUID2 as compared to VARCHAR(32) ?
Thanks
On Sunday 17 February 2008 10:16:49 McKennirey.Matthew wrote:
In our application we use a UUID (128 bits) to identify objects. Many of our columns are simply storing these UUIDs.
At the moment our only alternative seems to be to cast these 16 byte binary strings as character strings, increasing the size to 36 bytes (or 32 bytes if we remove the '-' separators). This has obvious drawbacks in terms of size and will often mean the difference between being able to read the column into memory or not with the obvious peformance degradation.
Ideally we would like to define a column type of VARBINARY(16) (using a MySQL datatype example, actual datatype names vary across database technologies, Oracle=RAW or VARBINARY, Postgres=BIT or BYTEA, etc.). Operations on such a column would treat the content simply as a sequence of bits, not interpreting them as characters, integers, etc.
Has anyone attempted something like this with MonetDB, or are we missing some obvious solution?
Thanks very much.
------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
participants (2)
-
McKennirey.Matthew
-
Niels Nes