I am new to Monetdb. I am using Postgresql mainly, but want to check Monetdb performace. In Postres I have column of type bit() filled with 0s and 1s. Then I am comparing each row to all other rows with bitwise AND on that column. Monetdb does not have bit() type so I used text. Any ideas how to do bitwise AND in monetdb and what type of column should I use for this? The query I tried: select a.sentenecid, b.sentenecid, a.sentence AND b.sentence from test a, test b;
Text is about the worst type you could have chosen. Far better is to choose one of the integer types: tinyint, smallint, integer, bigint, depending on the maximum number of bits that you have. If you have a CSV file and the column contains just sequences of 0 and 1, you will need to convert those number to e.g. hexadecimal notation (i.e. something like 0x42ab). Once you have loaded the data as integers, you can use the bitwise operators. & is bitwise AND, | is bitwise OR. On 03/10/2016 10:01 PM, Shmagi Kavtaradze wrote:
I am new to Monetdb. I am using Postgresql mainly, but want to check Monetdb performace. In Postres I have column of type bit() filled with 0s and 1s. Then I am comparing each row to all other rows with bitwise AND on that column. Monetdb does not have bit() type so I used text. Any ideas how to do bitwise AND in monetdb and what type of column should I use for this? The query I tried:
select a.sentenecid, b.sentenecid, a.sentence AND b.sentence from test a, test b;
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- Sjoerd Mullender
I have 10,000 long string with 1s and 0s. As I know Int does not support
that long numbers. Any ideas how I can do it? Actually I want to make
bitwise and, then count number of 1s and divide by length, in this case by
10,000
On Thu, Mar 10, 2016 at 10:29 PM, Sjoerd Mullender
Text is about the worst type you could have chosen. Far better is to choose one of the integer types: tinyint, smallint, integer, bigint, depending on the maximum number of bits that you have. If you have a CSV file and the column contains just sequences of 0 and 1, you will need to convert those number to e.g. hexadecimal notation (i.e. something like 0x42ab). Once you have loaded the data as integers, you can use the bitwise operators. & is bitwise AND, | is bitwise OR.
On 03/10/2016 10:01 PM, Shmagi Kavtaradze wrote:
I am new to Monetdb. I am using Postgresql mainly, but want to check Monetdb performace. In Postres I have column of type bit() filled with 0s and 1s. Then I am comparing each row to all other rows with bitwise AND on that column. Monetdb does not have bit() type so I used text. Any ideas how to do bitwise AND in monetdb and what type of column should I use for this? The query I tried:
select a.sentenecid, b.sentenecid, a.sentence AND b.sentence from test a, test b;
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- Sjoerd Mullender
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
If you really want to load strings into MonetDB and convert them to
integers on the fly in your query before doing your bitwise operations, you
can have fun with implementing your own function for this (both ways, from
string2int and int2string).
One way is to implement it in C, which is trivial, and then make it become
a UDF in MonetDB (less trivial but instructive):
https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/UserDefinedFuncti...
.
Another way is to enable the python integration (
https://www.monetdb.org/blog/embedded-pythonnumpy-monetdb) and then write
your UDF in python without any need to recompile. The UDF in python would
be a one-line: int('1001001',2)
Good luck.
On 10 March 2016 at 22:29, Sjoerd Mullender
Text is about the worst type you could have chosen. Far better is to choose one of the integer types: tinyint, smallint, integer, bigint, depending on the maximum number of bits that you have. If you have a CSV file and the column contains just sequences of 0 and 1, you will need to convert those number to e.g. hexadecimal notation (i.e. something like 0x42ab). Once you have loaded the data as integers, you can use the bitwise operators. & is bitwise AND, | is bitwise OR.
On 03/10/2016 10:01 PM, Shmagi Kavtaradze wrote:
I am new to Monetdb. I am using Postgresql mainly, but want to check Monetdb performace. In Postres I have column of type bit() filled with 0s and 1s. Then I am comparing each row to all other rows with bitwise AND on that column. Monetdb does not have bit() type so I used text. Any ideas how to do bitwise AND in monetdb and what type of column should I use for this? The query I tried:
select a.sentenecid, b.sentenecid, a.sentence AND b.sentence from test a, test b;
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- Sjoerd Mullender
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Thanks for the advice, I will try to do it. Sorry in advance, maybe I don't understand easy stuff, but if I have int of 10,000 long 1s and 0s, how can it fit to Monetdb, if BIGINT is " BIGINT 64 bit signed integer between -9223372036854775807 and 9223372036854775807" On Thu, Mar 10, 2016 at 10:40 PM, Roberto Cornacchia < roberto.cornacchia@gmail.com> wrote:
If you really want to load strings into MonetDB and convert them to integers on the fly in your query before doing your bitwise operations, you can have fun with implementing your own function for this (both ways, from string2int and int2string).
One way is to implement it in C, which is trivial, and then make it become a UDF in MonetDB (less trivial but instructive): https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/UserDefinedFuncti... . Another way is to enable the python integration ( https://www.monetdb.org/blog/embedded-pythonnumpy-monetdb) and then write your UDF in python without any need to recompile. The UDF in python would be a one-line: int('1001001',2) Good luck.
On 10 March 2016 at 22:29, Sjoerd Mullender
wrote: Text is about the worst type you could have chosen. Far better is to choose one of the integer types: tinyint, smallint, integer, bigint, depending on the maximum number of bits that you have. If you have a CSV file and the column contains just sequences of 0 and 1, you will need to convert those number to e.g. hexadecimal notation (i.e. something like 0x42ab). Once you have loaded the data as integers, you can use the bitwise operators. & is bitwise AND, | is bitwise OR.
On 03/10/2016 10:01 PM, Shmagi Kavtaradze wrote:
I am new to Monetdb. I am using Postgresql mainly, but want to check Monetdb performace. In Postres I have column of type bit() filled with 0s and 1s. Then I am comparing each row to all other rows with bitwise AND on that column. Monetdb does not have bit() type so I used text. Any ideas how to do bitwise AND in monetdb and what type of column should I use for this? The query I tried:
select a.sentenecid, b.sentenecid, a.sentence AND b.sentence from test a, test b;
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- Sjoerd Mullender
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
It can't indeed. You can forget about conversion to integers.
Your only option is to keep them as strings and develop special bitwise
UDFs that work directly on strings. The starting points for developing your
UDFs are still the same.
On 10 Mar 2016 22:53, "Shmagi Kavtaradze"
Thanks for the advice, I will try to do it. Sorry in advance, maybe I don't understand easy stuff, but if I have int of 10,000 long 1s and 0s, how can it fit to Monetdb, if BIGINT is " BIGINT 64 bit signed integer between -9223372036854775807 and 9223372036854775807"
On Thu, Mar 10, 2016 at 10:40 PM, Roberto Cornacchia < roberto.cornacchia@gmail.com> wrote:
If you really want to load strings into MonetDB and convert them to integers on the fly in your query before doing your bitwise operations, you can have fun with implementing your own function for this (both ways, from string2int and int2string).
One way is to implement it in C, which is trivial, and then make it become a UDF in MonetDB (less trivial but instructive): https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/UserDefinedFuncti... . Another way is to enable the python integration ( https://www.monetdb.org/blog/embedded-pythonnumpy-monetdb) and then write your UDF in python without any need to recompile. The UDF in python would be a one-line: int('1001001',2) Good luck.
On 10 March 2016 at 22:29, Sjoerd Mullender
wrote: Text is about the worst type you could have chosen. Far better is to choose one of the integer types: tinyint, smallint, integer, bigint, depending on the maximum number of bits that you have. If you have a CSV file and the column contains just sequences of 0 and 1, you will need to convert those number to e.g. hexadecimal notation (i.e. something like 0x42ab). Once you have loaded the data as integers, you can use the bitwise operators. & is bitwise AND, | is bitwise OR.
On 03/10/2016 10:01 PM, Shmagi Kavtaradze wrote:
I am new to Monetdb. I am using Postgresql mainly, but want to check Monetdb performace. In Postres I have column of type bit() filled with 0s and 1s. Then I am comparing each row to all other rows with bitwise AND on that column. Monetdb does not have bit() type so I used text. Any ideas how to do bitwise AND in monetdb and what type of column should I use for this? The query I tried:
select a.sentenecid, b.sentenecid, a.sentence AND b.sentence from test a, test b;
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- Sjoerd Mullender
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Hi Shmagi, now that we understand a bit better what you want --- storing and bitwise-anding bit-strings of 10,000 bit each --- the only (existing) types in MonetDB that support that directly would (indeed) be either (1) TEXT | STRING | CLOB | CHARACTER LARGE OBJECT string with unbounded length or (2) BLOB | BINARY LARGE OBJECT bytes with unbounded length cf., https://www.monetdb.org/Documentation/Manuals/SQLreference/BuiltinTypes In the first case, each bit (w|c)ould be represented by (i) a character '0' or '1', i.e., taking a whole byte of storage --- unless you have (or create) a textual (CSV) representation that (ii) packs bits into valid UTF-8 characters. In the second case, each bit (w|c|s)ould use only one bit of storage. In either case, you'd have to implement your own bit-wise bit-string and() operation as well as count_set_bits() operation as UDFs in C or Pyhton as Roberto suggested. Alternatively, you could also consider implementing your own variable- or fixed-length bit-string type, but than you're largely on your own and the main (if not only) documentation for this is the code itself; cf., https://www.monetdb.org/Documentation/Manuals/SQLreference/Userdefinedtypes https://www.monetdb.org/Documentation/Manuals/MonetDB/MAL/Types Finally, you might consider representing your 10,000 bit bit-string as 159 64-bit BIGINT columns or 79 128-bit HUGEINT columns (if supported on your system; cf. https://www.monetdb.org/Documentation/Manuals/SQLreference/BuiltinTypes); note though that each column could effectively only hold 63 (127) bits instead of 64 (128) as MonetDB (or SQL in general) only supports signed types and MonetDB uses one value of the range (the smallest representable value) as NULL value; thus, bitwise and() (bit_and()) would work "unexpectedly" if the highest bit is set. While this would give you bit-wise and() (called bit_and()) "for free" (at least per column), you'd need to design your queries to combine all columns, and still implement a UDF to count the bits set. Hope this helps you further. Best, Stefan ----- On Mar 10, 2016, at 11:31 PM, Roberto Cornacchia roberto.cornacchia@gmail.com wrote:
It can't indeed. You can forget about conversion to integers. Your only option is to keep them as strings and develop special bitwise UDFs that work directly on strings. The starting points for developing your UDFs are still the same. On 10 Mar 2016 22:53, "Shmagi Kavtaradze" < kavtaradze.s@gmail.com > wrote:
Thanks for the advice, I will try to do it. Sorry in advance, maybe I don't understand easy stuff, but if I have int of 10,000 long 1s and 0s, how can it fit to Monetdb, if BIGINT is " BIGINT 64 bit signed integer between -9223372036854775807 and 9223372036854775807 "
On Thu, Mar 10, 2016 at 10:40 PM, Roberto Cornacchia < roberto.cornacchia@gmail.com > wrote:
If you really want to load strings into MonetDB and convert them to integers on the fly in your query before doing your bitwise operations, you can have fun with implementing your own function for this (both ways, from string2int and int2string).
One way is to implement it in C, which is trivial, and then make it become a UDF in MonetDB (less trivial but instructive): https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/UserDefinedFuncti... . Another way is to enable the python integration ( https://www.monetdb.org/blog/embedded-pythonnumpy-monetdb ) and then write your UDF in python without any need to recompile. The UDF in python would be a one-line: int('1001001',2) Good luck.
On 10 March 2016 at 22:29, Sjoerd Mullender < sjoerd@monetdb.org > wrote:
Text is about the worst type you could have chosen. Far better is to choose one of the integer types: tinyint, smallint, integer, bigint, depending on the maximum number of bits that you have. If you have a CSV file and the column contains just sequences of 0 and 1, you will need to convert those number to e.g. hexadecimal notation (i.e. something like 0x42ab). Once you have loaded the data as integers, you can use the bitwise operators. & is bitwise AND, | is bitwise OR.
On 03/10/2016 10:01 PM, Shmagi Kavtaradze wrote:
I am new to Monetdb. I am using Postgresql mainly, but want to check Monetdb performace. In Postres I have column of type bit() filled with 0s and 1s. Then I am comparing each row to all other rows with bitwise AND on that column. Monetdb does not have bit() type so I used text. Any ideas how to do bitwise AND in monetdb and what type of column should I use for this? The query I tried:
select a.sentenecid, b.sentenecid, a.sentence AND b.sentence from test a, test b;
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- Sjoerd Mullender
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
Thanks a lot for the answer. I have a question. Even if I create column as
a BIGINT, how will it make sure that imported data
"1010000000000110100101110101010" (lets say 50 long 1s and 0s) is a bit
string, not a number. What I want to say BIGINT has maximum value of "
9223372036854775807" and the imported
data "1010000000000110100101110101010" is more than bigint value, if we
consider it as a number, not a bit string. I want to chunk 10,000 long 1s
and 0s into 200 chunks of size 50, and import them into Monetdb. Lets say
table (Doc,Sentenceid,Chunkid, Chunk).
On Fri, Mar 11, 2016 at 8:26 AM, Stefan Manegold
Hi Shmagi,
now that we understand a bit better what you want --- storing and bitwise-anding bit-strings of 10,000 bit each --- the only (existing) types in MonetDB that support that directly would (indeed) be either
(1) TEXT | STRING | CLOB | CHARACTER LARGE OBJECT string with unbounded length
or
(2) BLOB | BINARY LARGE OBJECT bytes with unbounded length
cf., https://www.monetdb.org/Documentation/Manuals/SQLreference/BuiltinTypes
In the first case, each bit (w|c)ould be represented by (i) a character '0' or '1', i.e., taking a whole byte of storage --- unless you have (or create) a textual (CSV) representation that (ii) packs bits into valid UTF-8 characters.
In the second case, each bit (w|c|s)ould use only one bit of storage.
In either case, you'd have to implement your own bit-wise bit-string and() operation as well as count_set_bits() operation as UDFs in C or Pyhton as Roberto suggested.
Alternatively, you could also consider implementing your own variable- or fixed-length bit-string type, but than you're largely on your own and the main (if not only) documentation for this is the code itself; cf., https://www.monetdb.org/Documentation/Manuals/SQLreference/Userdefinedtypes https://www.monetdb.org/Documentation/Manuals/MonetDB/MAL/Types
Finally, you might consider representing your 10,000 bit bit-string as 159 64-bit BIGINT columns or 79 128-bit HUGEINT columns (if supported on your system; cf. https://www.monetdb.org/Documentation/Manuals/SQLreference/BuiltinTypes); note though that each column could effectively only hold 63 (127) bits instead of 64 (128) as MonetDB (or SQL in general) only supports signed types and MonetDB uses one value of the range (the smallest representable value) as NULL value; thus, bitwise and() (bit_and()) would work "unexpectedly" if the highest bit is set. While this would give you bit-wise and() (called bit_and()) "for free" (at least per column), you'd need to design your queries to combine all columns, and still implement a UDF to count the bits set.
Hope this helps you further.
Best, Stefan
----- On Mar 10, 2016, at 11:31 PM, Roberto Cornacchia roberto.cornacchia@gmail.com wrote:
It can't indeed. You can forget about conversion to integers. Your only option is to keep them as strings and develop special bitwise UDFs that work directly on strings. The starting points for developing your UDFs are still the same. On 10 Mar 2016 22:53, "Shmagi Kavtaradze" < kavtaradze.s@gmail.com > wrote:
Thanks for the advice, I will try to do it. Sorry in advance, maybe I don't understand easy stuff, but if I have int of 10,000 long 1s and 0s, how can it fit to Monetdb, if BIGINT is " BIGINT 64 bit signed integer between -9223372036854775807 and 9223372036854775807 "
On Thu, Mar 10, 2016 at 10:40 PM, Roberto Cornacchia < roberto.cornacchia@gmail.com > wrote:
If you really want to load strings into MonetDB and convert them to integers on the fly in your query before doing your bitwise operations, you can have fun with implementing your own function for this (both ways, from string2int and int2string).
One way is to implement it in C, which is trivial, and then make it become a UDF in MonetDB (less trivial but instructive):
https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/UserDefinedFuncti...
. Another way is to enable the python integration ( https://www.monetdb.org/blog/embedded-pythonnumpy-monetdb ) and then write your UDF in python without any need to recompile. The UDF in python would be a one-line: int('1001001',2) Good luck.
On 10 March 2016 at 22:29, Sjoerd Mullender < sjoerd@monetdb.org > wrote:
Text is about the worst type you could have chosen. Far better is to choose one of the integer types: tinyint, smallint, integer, bigint, depending on the maximum number of bits that you have. If you have a CSV file and the column contains just sequences of 0 and 1, you will need to convert those number to e.g. hexadecimal notation (i.e. something like 0x42ab). Once you have loaded the data as integers, you can use the bitwise operators. & is bitwise AND, | is bitwise OR.
On 03/10/2016 10:01 PM, Shmagi Kavtaradze wrote:
I am new to Monetdb. I am using Postgresql mainly, but want to check Monetdb performace. In Postres I have column of type bit() filled with 0s and 1s. Then I am comparing each row to all other rows with bitwise AND on that column. Monetdb does not have bit() type so I used text. Any ideas how to do bitwise AND in monetdb and what type of column should I use for this? The query I tried:
select a.sentenecid, b.sentenecid, a.sentence AND b.sentence from test a, test b;
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- Sjoerd Mullender
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) | _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
if your bitstring is represented using '0' & '1' characters, you for have to convert them into a "real" bit-string represented as BIGINT number either before loading them into MonetDB (preferred), or you first load them into monetdb as strings and then convert them inside MonetDB (using your own to-be-added UDF). e.g. 1010000000000110100101110101010 in binary system is 1342393258 in decimal system Stefan ----- On Mar 11, 2016, at 10:40 AM, Shmagi Kavtaradze kavtaradze.s@gmail.com wrote:
Thanks a lot for the answer. I have a question. Even if I create column as a BIGINT, how will it make sure that imported data "1010000000000110100101110101010" (lets say 50 long 1s and 0s) is a bit string, not a number. What I want to say BIGINT has maximum value of " 9223372036854775807 " and the imported data "1010000000000110100101110101010" is more than bigint value, if we consider it as a number, not a bit string. I want to chunk 10,000 long 1s and 0s into 200 chunks of size 50, and import them into Monetdb. Lets say table (Doc,Sentenceid,Chunkid, Chunk).
On Fri, Mar 11, 2016 at 8:26 AM, Stefan Manegold < Stefan.Manegold@cwi.nl > wrote:
Hi Shmagi,
now that we understand a bit better what you want --- storing and bitwise-anding bit-strings of 10,000 bit each --- the only (existing) types in MonetDB that support that directly would (indeed) be either
(1) TEXT | STRING | CLOB | CHARACTER LARGE OBJECT string with unbounded length
or
(2) BLOB | BINARY LARGE OBJECT bytes with unbounded length
cf., https://www.monetdb.org/Documentation/Manuals/SQLreference/BuiltinTypes
In the first case, each bit (w|c)ould be represented by (i) a character '0' or '1', i.e., taking a whole byte of storage --- unless you have (or create) a textual (CSV) representation that (ii) packs bits into valid UTF-8 characters.
In the second case, each bit (w|c|s)ould use only one bit of storage.
In either case, you'd have to implement your own bit-wise bit-string and() operation as well as count_set_bits() operation as UDFs in C or Pyhton as Roberto suggested.
Alternatively, you could also consider implementing your own variable- or fixed-length bit-string type, but than you're largely on your own and the main (if not only) documentation for this is the code itself; cf., https://www.monetdb.org/Documentation/Manuals/SQLreference/Userdefinedtypes https://www.monetdb.org/Documentation/Manuals/MonetDB/MAL/Types
Finally, you might consider representing your 10,000 bit bit-string as 159 64-bit BIGINT columns or 79 128-bit HUGEINT columns (if supported on your system; cf. https://www.monetdb.org/Documentation/Manuals/SQLreference/BuiltinTypes ); note though that each column could effectively only hold 63 (127) bits instead of 64 (128) as MonetDB (or SQL in general) only supports signed types and MonetDB uses one value of the range (the smallest representable value) as NULL value; thus, bitwise and() (bit_and()) would work "unexpectedly" if the highest bit is set. While this would give you bit-wise and() (called bit_and()) "for free" (at least per column), you'd need to design your queries to combine all columns, and still implement a UDF to count the bits set.
Hope this helps you further.
Best, Stefan
----- On Mar 10, 2016, at 11:31 PM, Roberto Cornacchia roberto.cornacchia@gmail.com wrote:
It can't indeed. You can forget about conversion to integers. Your only option is to keep them as strings and develop special bitwise UDFs that work directly on strings. The starting points for developing your UDFs are still the same. On 10 Mar 2016 22:53, "Shmagi Kavtaradze" < kavtaradze.s@gmail.com > wrote:
Thanks for the advice, I will try to do it. Sorry in advance, maybe I don't understand easy stuff, but if I have int of 10,000 long 1s and 0s, how can it fit to Monetdb, if BIGINT is " BIGINT 64 bit signed integer between -9223372036854775807 and 9223372036854775807 "
On Thu, Mar 10, 2016 at 10:40 PM, Roberto Cornacchia < roberto.cornacchia@gmail.com > wrote:
If you really want to load strings into MonetDB and convert them to integers on the fly in your query before doing your bitwise operations, you can have fun with implementing your own function for this (both ways, from string2int and int2string).
One way is to implement it in C, which is trivial, and then make it become a UDF in MonetDB (less trivial but instructive): https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/UserDefinedFuncti... . Another way is to enable the python integration ( https://www.monetdb.org/blog/embedded-pythonnumpy-monetdb ) and then write your UDF in python without any need to recompile. The UDF in python would be a one-line: int('1001001',2) Good luck.
On 10 March 2016 at 22:29, Sjoerd Mullender < sjoerd@monetdb.org > wrote:
Text is about the worst type you could have chosen. Far better is to choose one of the integer types: tinyint, smallint, integer, bigint, depending on the maximum number of bits that you have. If you have a CSV file and the column contains just sequences of 0 and 1, you will need to convert those number to e.g. hexadecimal notation (i.e. something like 0x42ab). Once you have loaded the data as integers, you can use the bitwise operators. & is bitwise AND, | is bitwise OR.
On 03/10/2016 10:01 PM, Shmagi Kavtaradze wrote:
I am new to Monetdb. I am using Postgresql mainly, but want to check Monetdb performace. In Postres I have column of type bit() filled with 0s and 1s. Then I am comparing each row to all other rows with bitwise AND on that column. Monetdb does not have bit() type so I used text. Any ideas how to do bitwise AND in monetdb and what type of column should I use for this? The query I tried:
select a.sentenecid, b.sentenecid, a.sentence AND b.sentence from test a, test b;
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- Sjoerd Mullender
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) | _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
Thanks a lot, now I understand. How to understand that I can use HUGEINT on
a system? I am running Postgresql on a Ubuntu 12.04 server with AMD server
processor.
On Fri, Mar 11, 2016 at 11:24 AM, Stefan Manegold
if your bitstring is represented using '0' & '1' characters, you for have to convert them into a "real" bit-string represented as BIGINT number either before loading them into MonetDB (preferred), or you first load them into monetdb as strings and then convert them inside MonetDB (using your own to-be-added UDF).
e.g. 1010000000000110100101110101010 in binary system is 1342393258 in decimal system
Stefan
----- On Mar 11, 2016, at 10:40 AM, Shmagi Kavtaradze kavtaradze.s@gmail.com wrote:
Thanks a lot for the answer. I have a question. Even if I create column as a BIGINT, how will it make sure that imported data "1010000000000110100101110101010" (lets say 50 long 1s and 0s) is a bit string, not a number. What I want to say BIGINT has maximum value of " 9223372036854775807 " and the imported data "1010000000000110100101110101010" is more than bigint value, if we consider it as a number, not a bit string. I want to chunk 10,000 long 1s and 0s into 200 chunks of size 50, and import them into Monetdb. Lets say table (Doc,Sentenceid,Chunkid, Chunk).
On Fri, Mar 11, 2016 at 8:26 AM, Stefan Manegold < Stefan.Manegold@cwi.nl > wrote:
Hi Shmagi,
now that we understand a bit better what you want --- storing and bitwise-anding bit-strings of 10,000 bit each --- the only (existing) types in MonetDB that support that directly would (indeed) be either
(1) TEXT | STRING | CLOB | CHARACTER LARGE OBJECT string with unbounded length
or
(2) BLOB | BINARY LARGE OBJECT bytes with unbounded length
cf., https://www.monetdb.org/Documentation/Manuals/SQLreference/BuiltinTypes
In the first case, each bit (w|c)ould be represented by (i) a character '0' or '1', i.e., taking a whole byte of storage --- unless you have (or create) a textual (CSV) representation that (ii) packs bits into valid UTF-8 characters.
In the second case, each bit (w|c|s)ould use only one bit of storage.
In either case, you'd have to implement your own bit-wise bit-string and() operation as well as count_set_bits() operation as UDFs in C or Pyhton as Roberto suggested.
Alternatively, you could also consider implementing your own variable- or fixed-length bit-string type, but than you're largely on your own and the main (if not only) documentation for this is the code itself; cf.,
https://www.monetdb.org/Documentation/Manuals/SQLreference/Userdefinedtypes
https://www.monetdb.org/Documentation/Manuals/MonetDB/MAL/Types
Finally, you might consider representing your 10,000 bit bit-string as 159 64-bit BIGINT columns or 79 128-bit HUGEINT columns (if supported on your system; cf. https://www.monetdb.org/Documentation/Manuals/SQLreference/BuiltinTypes ); note though that each column could effectively only hold 63 (127) bits instead of 64 (128) as MonetDB (or SQL in general) only supports signed types and MonetDB uses one value of the range (the smallest representable value) as NULL value; thus, bitwise and() (bit_and()) would work "unexpectedly" if the highest bit is set. While this would give you bit-wise and() (called bit_and()) "for free" (at least per column), you'd need to design your queries to combine all columns, and still implement a UDF to count the bits set.
Hope this helps you further.
Best, Stefan
----- On Mar 10, 2016, at 11:31 PM, Roberto Cornacchia roberto.cornacchia@gmail.com wrote:
It can't indeed. You can forget about conversion to integers. Your only option is to keep them as strings and develop special bitwise UDFs that work directly on strings. The starting points for developing your UDFs are still the same. On 10 Mar 2016 22:53, "Shmagi Kavtaradze" < kavtaradze.s@gmail.com > wrote:
Thanks for the advice, I will try to do it. Sorry in advance, maybe I don't understand easy stuff, but if I have int of 10,000 long 1s and 0s, how can it fit to Monetdb, if BIGINT is " BIGINT 64 bit signed integer between -9223372036854775807 and 9223372036854775807 "
On Thu, Mar 10, 2016 at 10:40 PM, Roberto Cornacchia < roberto.cornacchia@gmail.com > wrote:
If you really want to load strings into MonetDB and convert them to integers on the fly in your query before doing your bitwise operations, you can have fun with implementing your own function for this (both ways, from string2int and int2string).
One way is to implement it in C, which is trivial, and then make it become a UDF in MonetDB (less trivial but instructive):
https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/UserDefinedFuncti...
. Another way is to enable the python integration ( https://www.monetdb.org/blog/embedded-pythonnumpy-monetdb ) and then write your UDF in python without any need to recompile. The UDF in python would be a one-line: int('1001001',2) Good luck.
On 10 March 2016 at 22:29, Sjoerd Mullender < sjoerd@monetdb.org > wrote:
Text is about the worst type you could have chosen. Far better is to choose one of the integer types: tinyint, smallint, integer, bigint, depending on the maximum number of bits that you have. If you have a CSV file and the column contains just sequences of 0 and 1, you will need to convert those number to e.g. hexadecimal notation (i.e. something like 0x42ab). Once you have loaded the data as integers, you can use the bitwise operators. & is bitwise AND, | is bitwise OR.
On 03/10/2016 10:01 PM, Shmagi Kavtaradze wrote:
I am new to Monetdb. I am using Postgresql mainly, but want to check Monetdb performace. In Postres I have column of type bit() filled with 0s and 1s. Then I am comparing each row to all other rows with bitwise AND on that column. Monetdb does not have bit() type so I used text. Any ideas how to do bitwise AND in monetdb and what type of column should I use for this? The query I tried:
select a.sentenecid, b.sentenecid, a.sentence AND b.sentence from test a, test b;
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- Sjoerd Mullender
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) | _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) | _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Since this is the MonetDB list, I cannot say anything about PostgreSQL. For MonetDB, you can check with select * from sys.types where sqlname = 'hugeint'; +------+------------+---------+--------+-------+-------+--------+-----------+ | id | systemname | sqlname | digits | scale | radix | eclass | schema_id | +======+============+=========+========+=======+=======+========+===========+ | 13 | hge | hugeint | 128 | 1 | 2 | 7 | 0 | +------+------------+---------+--------+-------+-------+--------+-----------+ ----- On Mar 11, 2016, at 12:00 PM, Shmagi Kavtaradze kavtaradze.s@gmail.com wrote:
Thanks a lot, now I understand. How to understand that I can use HUGEINT on a system? I am running Postgresql on a Ubuntu 12.04 server with AMD server processor.
On Fri, Mar 11, 2016 at 11:24 AM, Stefan Manegold < Stefan.Manegold@cwi.nl > wrote:
if your bitstring is represented using '0' & '1' characters, you for have to convert them into a "real" bit-string represented as BIGINT number either before loading them into MonetDB (preferred), or you first load them into monetdb as strings and then convert them inside MonetDB (using your own to-be-added UDF).
e.g. 1010000000000110100101110101010 in binary system is 1342393258 in decimal system
Stefan
----- On Mar 11, 2016, at 10:40 AM, Shmagi Kavtaradze kavtaradze.s@gmail.com wrote:
Thanks a lot for the answer. I have a question. Even if I create column as a BIGINT, how will it make sure that imported data "1010000000000110100101110101010" (lets say 50 long 1s and 0s) is a bit string, not a number. What I want to say BIGINT has maximum value of " 9223372036854775807 " and the imported data "1010000000000110100101110101010" is more than bigint value, if we consider it as a number, not a bit string. I want to chunk 10,000 long 1s and 0s into 200 chunks of size 50, and import them into Monetdb. Lets say table (Doc,Sentenceid,Chunkid, Chunk).
On Fri, Mar 11, 2016 at 8:26 AM, Stefan Manegold < Stefan.Manegold@cwi.nl > wrote:
Hi Shmagi,
now that we understand a bit better what you want --- storing and bitwise-anding bit-strings of 10,000 bit each --- the only (existing) types in MonetDB that support that directly would (indeed) be either
(1) TEXT | STRING | CLOB | CHARACTER LARGE OBJECT string with unbounded length
or
(2) BLOB | BINARY LARGE OBJECT bytes with unbounded length
cf., https://www.monetdb.org/Documentation/Manuals/SQLreference/BuiltinTypes
In the first case, each bit (w|c)ould be represented by (i) a character '0' or '1', i.e., taking a whole byte of storage --- unless you have (or create) a textual (CSV) representation that (ii) packs bits into valid UTF-8 characters.
In the second case, each bit (w|c|s)ould use only one bit of storage.
In either case, you'd have to implement your own bit-wise bit-string and() operation as well as count_set_bits() operation as UDFs in C or Pyhton as Roberto suggested.
Alternatively, you could also consider implementing your own variable- or fixed-length bit-string type, but than you're largely on your own and the main (if not only) documentation for this is the code itself; cf., https://www.monetdb.org/Documentation/Manuals/SQLreference/Userdefinedtypes https://www.monetdb.org/Documentation/Manuals/MonetDB/MAL/Types
Finally, you might consider representing your 10,000 bit bit-string as 159 64-bit BIGINT columns or 79 128-bit HUGEINT columns (if supported on your system; cf. https://www.monetdb.org/Documentation/Manuals/SQLreference/BuiltinTypes ); note though that each column could effectively only hold 63 (127) bits instead of 64 (128) as MonetDB (or SQL in general) only supports signed types and MonetDB uses one value of the range (the smallest representable value) as NULL value; thus, bitwise and() (bit_and()) would work "unexpectedly" if the highest bit is set. While this would give you bit-wise and() (called bit_and()) "for free" (at least per column), you'd need to design your queries to combine all columns, and still implement a UDF to count the bits set.
Hope this helps you further.
Best, Stefan
----- On Mar 10, 2016, at 11:31 PM, Roberto Cornacchia roberto.cornacchia@gmail.com wrote:
It can't indeed. You can forget about conversion to integers. Your only option is to keep them as strings and develop special bitwise UDFs that work directly on strings. The starting points for developing your UDFs are still the same. On 10 Mar 2016 22:53, "Shmagi Kavtaradze" < kavtaradze.s@gmail.com > wrote:
Thanks for the advice, I will try to do it. Sorry in advance, maybe I don't understand easy stuff, but if I have int of 10,000 long 1s and 0s, how can it fit to Monetdb, if BIGINT is " BIGINT 64 bit signed integer between -9223372036854775807 and 9223372036854775807 "
On Thu, Mar 10, 2016 at 10:40 PM, Roberto Cornacchia < roberto.cornacchia@gmail.com > wrote:
If you really want to load strings into MonetDB and convert them to integers on the fly in your query before doing your bitwise operations, you can have fun with implementing your own function for this (both ways, from string2int and int2string).
One way is to implement it in C, which is trivial, and then make it become a UDF in MonetDB (less trivial but instructive): https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/UserDefinedFuncti... . Another way is to enable the python integration ( https://www.monetdb.org/blog/embedded-pythonnumpy-monetdb ) and then write your UDF in python without any need to recompile. The UDF in python would be a one-line: int('1001001',2) Good luck.
On 10 March 2016 at 22:29, Sjoerd Mullender < sjoerd@monetdb.org > wrote:
Text is about the worst type you could have chosen. Far better is to choose one of the integer types: tinyint, smallint, integer, bigint, depending on the maximum number of bits that you have. If you have a CSV file and the column contains just sequences of 0 and 1, you will need to convert those number to e.g. hexadecimal notation (i.e. something like 0x42ab). Once you have loaded the data as integers, you can use the bitwise operators. & is bitwise AND, | is bitwise OR.
On 03/10/2016 10:01 PM, Shmagi Kavtaradze wrote:
I am new to Monetdb. I am using Postgresql mainly, but want to check Monetdb performace. In Postres I have column of type bit() filled with 0s and 1s. Then I am comparing each row to all other rows with bitwise AND on that column. Monetdb does not have bit() type so I used text. Any ideas how to do bitwise AND in monetdb and what type of column should I use for this? The query I tried:
select a.sentenecid, b.sentenecid, a.sentence AND b.sentence from test a, test b;
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- Sjoerd Mullender
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) | _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) | _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
yes I am asking for Monetdb. Also I was trying to run query. After 3-4 queries it filled up my disk with 90GB in dbfarm "bat" directory. How can I drop that intermediate results? I do not have a lot of space.
participants (4)
-
Roberto Cornacchia
-
Shmagi Kavtaradze
-
Sjoerd Mullender
-
Stefan Manegold