Hi Shmagi,
now that we understand a bit better what you want
--- storing and bitwise-anding bit-strings of 10,000 bits each ---
the only (existing) types in MonetDB that support that directly
would (indeed) be either
(1) TEXT | STRING | CLOB | CHARACTER LARGE OBJECT string with unbounded length
or
(2) BLOB | BINARY LARGE OBJECT bytes with unbounded length
cf., https://www.monetdb.org/Documentation/Manuals/SQLreference/BuiltinTypes
In the first case, each bit (w|c)ould be represented by (i) a character '0' or '1',
i.e., taking a whole byte of storage --- unless you have (or create) a textual
(CSV) representation that (ii) packs bits into valid UTF-8 characters.
In the second case, each bit (w|c|s)ould use only one bit of storage.
In either case, you'd have to implement your own bit-wise bit-string and() operation
as well as a count_set_bits() operation as UDFs in C or Python, as Roberto suggested.
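
To make this a bit more concrete, here is a minimal plain-Python sketch of the logic
such UDFs would need for both representations. All function names are hypothetical,
and nothing here is wired into MonetDB; it only illustrates what and() and
count_set_bits() would compute, plus how a '0'/'1' string could be packed into bytes.

```python
def and_text(a: str, b: str) -> str:
    """Bitwise AND of two equal-length strings of '0'/'1' characters (case 1.i)."""
    if len(a) != len(b):
        raise ValueError("bit-strings must have equal length")
    return "".join("1" if x == "1" and y == "1" else "0" for x, y in zip(a, b))


def count_set_bits_text(s: str) -> int:
    """Number of '1' characters in a '0'/'1' string."""
    return s.count("1")


def pack_bits(bits: str) -> bytes:
    """Pack a non-empty '0'/'1' string (length a multiple of 8) into bytes (case 2)."""
    if not bits or len(bits) % 8 != 0:
        raise ValueError("length must be a non-zero multiple of 8")
    return int(bits, 2).to_bytes(len(bits) // 8, "big")


def and_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise AND of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("byte strings must have equal length")
    return bytes(x & y for x, y in zip(a, b))


def count_set_bits_bytes(b: bytes) -> int:
    """Number of set bits over all bytes."""
    return sum(bin(x).count("1") for x in b)
```

With this packing, a 10,000-bit string takes 1,250 bytes instead of 10,000
characters.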
Alternatively, you could also consider implementing your own variable- or fixed-length
bit-string type, but then you're largely on your own, and the main (if not only) documentation
for this is the code itself; cf.,
https://www.monetdb.org/Documentation/Manuals/SQLreference/Userdefinedtypes
https://www.monetdb.org/Documentation/Manuals/MonetDB/MAL/Types
Finally, you might consider representing your 10,000-bit bit-string as 159 64-bit BIGINT
columns or 79 128-bit HUGEINT columns (if supported on your system; cf.
https://www.monetdb.org/Documentation/Manuals/SQLreference/BuiltinTypes);
note though that each column could effectively only hold 63 (127) bits instead of 64 (128)
(hence 159 rather than 157 BIGINT columns), as MonetDB (or SQL in general) only supports
signed types and MonetDB uses one value of the range (the smallest representable value) as
the NULL value; thus, bitwise and() (bit_and()) would work "unexpectedly" if the highest bit is set.
While this would give you bit-wise and() (called bit_and()) "for free" (at least per column),
you'd need to design your queries to combine all columns, and still implement a UDF to count
the bits set.
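
As a rough illustration of the column arithmetic and of the per-column AND plus bit
count, here is a small plain-Python sketch. The helper names are made up for this
example, and the lists of integers merely stand in for the 159 BIGINT columns of one
row; in MonetDB the AND would be done per column in SQL and the count in a UDF.

```python
from typing import List


def columns_needed(total_bits: int, bits_per_column: int) -> int:
    """Ceiling division: how many columns are needed to hold total_bits."""
    return -(-total_bits // bits_per_column)


def split_bits(bits: str, bits_per_column: int = 63) -> List[int]:
    """Cut a '0'/'1' string into chunks of at most bits_per_column bits.

    With 63 bits per chunk every value is non-negative and fits a signed
    64-bit integer, so the sign bit is never set.
    """
    return [int(bits[i:i + bits_per_column], 2)
            for i in range(0, len(bits), bits_per_column)]


def and_popcount(a: List[int], b: List[int]) -> int:
    """AND corresponding chunks and count the set bits over all of them."""
    if len(a) != len(b):
        raise ValueError("rows must have the same number of columns")
    return sum(bin(x & y).count("1") for x, y in zip(a, b))


# 10,000 bits at 63 usable bits per BIGINT, or 127 per HUGEINT
bigint_columns = columns_needed(10_000, 63)    # 159
hugeint_columns = columns_needed(10_000, 127)  # 79
```

For example, and_popcount(split_bits(x), split_bits(y)) gives the number of positions
where two 10,000-character '0'/'1' strings x and y both have a '1'.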
Hope this helps you further.
Best,
Stefan
----- On Mar 10, 2016, at 11:31 PM, Roberto Cornacchia roberto.cornacchia@gmail.com wrote:
> It can't indeed. You can forget about conversion to integers.
> Your only option is to keep them as strings and develop special bitwise UDFs
> that work directly on strings. The starting points for developing your UDFs are
> still the same.
> On 10 Mar 2016 22:53, "Shmagi Kavtaradze" < kavtaradze.s@gmail.com > wrote:
>
>
>
> Thanks for the advice, I will try to do it. Sorry in advance, maybe I don't
> understand easy stuff, but if I have int of 10,000 long 1s and 0s, how can it
> fit to Monetdb, if BIGINT is " BIGINT 64 bit signed integer between
> -9223372036854775807 and 9223372036854775807 "
>
> On Thu, Mar 10, 2016 at 10:40 PM, Roberto Cornacchia <
> roberto.cornacchia@gmail.com > wrote:
>
>
>
> If you really want to load strings into MonetDB and convert them to integers on
> the fly in your query before doing your bitwise operations, you can have fun
> with implementing your own function for this (both ways, from string2int and
> int2string).
>
> One way is to implement it in C, which is trivial, and then make it become a UDF
> in MonetDB (less trivial but instructive):
> https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/UserDefinedFunction
> .
> Another way is to enable the python integration (
> https://www.monetdb.org/blog/embedded-pythonnumpy-monetdb ) and then write your
> UDF in python without any need to recompile. The UDF in python would be a
> one-liner: int('1001001',2)
> Good luck.
>
>
> On 10 March 2016 at 22:29, Sjoerd Mullender < sjoerd@monetdb.org > wrote:
>
>
> Text is about the worst type you could have chosen. Far better is to
> choose one of the integer types: tinyint, smallint, integer, bigint,
> depending on the maximum number of bits that you have. If you have a
> CSV file and the column contains just sequences of 0 and 1, you will
> need to convert those numbers to e.g. hexadecimal notation (i.e.
> something like 0x42ab). Once you have loaded the data as integers, you
> can use the bitwise operators. & is bitwise AND, | is bitwise OR.
>
> On 03/10/2016 10:01 PM, Shmagi Kavtaradze wrote:
>> I am new to Monetdb. I am using Postgresql mainly, but want to check
>> Monetdb performance. In Postgres I have a column of type bit() filled with
>> 0s and 1s. Then I am comparing each row to all other rows with bitwise
>> AND on that column. Monetdb does not have bit() type so I used text. Any
>> ideas how to do bitwise AND in monetdb and what type of column should I
>> use for this? The query I tried:
>>
>> select a.sentenecid, b.sentenecid, a.sentence AND b.sentence from test
>> a, test b;
>>
>>
>> _______________________________________________
>> users-list mailing list
>> users-list@monetdb.org
>> https://www.monetdb.org/mailman/listinfo/users-list
>>
>
> --
> Sjoerd Mullender
>
--
| Stefan.Manegold@CWI.nl | DB Architectures (DA) |
| www.CWI.nl/~manegold/ | Science Park 123 (L321) |
| +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |