if your bitstring is represented using '0' & '1' characters, you for have to convert them into a "real" bit-string represented as BIGINT number
either before loading them into MonetDB (preferred),
or you first load them into monetdb as strings and then convert them inside MonetDB
(using your own to-be-added UDF).
e.g. 1010000000000110100101110101010 in binary system is 1342393258 in decimal system
Stefan
----- On Mar 11, 2016, at 10:40 AM, Shmagi Kavtaradze kavtaradze.s@gmail.com wrote:
> Thanks a lot for the answer. I have a question. Even if I create column as a
> BIGINT, how will it make sure that imported data
> "1010000000000110100101110101010" (lets say 50 long 1s and 0s) is a bit string,
> not a number. What I want to say BIGINT has maximum value of "
> 9223372036854775807 " and the imported data "1010000000000110100101110101010"
> is more than bigint value, if we consider it as a number, not a bit string. I
> want to chunk 10,000 long 1s and 0s into 200 chunks of size 50, and import them
> into Monetdb. Lets say table (Doc,Sentenceid,Chunkid, Chunk).
>
> On Fri, Mar 11, 2016 at 8:26 AM, Stefan Manegold < Stefan.Manegold@cwi.nl >
> wrote:
>
>
> Hi Shmagi,
>
> now that we understand a bit better what you want
> --- storing and bitwise-anding bit-strings of 10,000 bit each ---
> the only (existing) types in MonetDB that support that directly
> would (indeed) be either
>
> (1) TEXT | STRING | CLOB | CHARACTER LARGE OBJECT string with unbounded length
>
> or
>
> (2) BLOB | BINARY LARGE OBJECT bytes with unbounded length
>
> cf., https://www.monetdb.org/Documentation/Manuals/SQLreference/BuiltinTypes
>
> In the first case, each bit (w|c)ould be represented by (i) a character '0' or
> '1',
> i.e., taking a whole byte of storage --- unless you have (or create) a textual
> (CSV) representation that (ii) packs bits into valid UTF-8 characters.
>
> In the second case, each bit (w|c|s)ould use only one bit of storage.
>
> In either case, you'd have to implement your own bit-wise bit-string and()
> operation
> as well as count_set_bits() operation as UDFs in C or Pyhton as Roberto
> suggested.
>
>
> Alternatively, you could also consider implementing your own variable- or
> fixed-length
> bit-string type, but than you're largely on your own and the main (if not only)
> documentation
> for this is the code itself; cf.,
> https://www.monetdb.org/Documentation/Manuals/SQLreference/Userdefinedtypes
> https://www.monetdb.org/Documentation/Manuals/MonetDB/MAL/Types
>
>
> Finally, you might consider representing your 10,000 bit bit-string as 159
> 64-bit BIGINT
> columns or 79 128-bit HUGEINT columns (if supported on your system; cf.
> https://www.monetdb.org/Documentation/Manuals/SQLreference/BuiltinTypes );
> note though that each column could effectively only hold 63 (127) bits instead
> of 64 (128)
> as MonetDB (or SQL in general) only supports signed types and MonetDB uses one
> value of the
> range (the smallest representable value) as NULL value; thus, bitwise and()
> (bit_and())
> would work "unexpectedly" if the highest bit is set.
> While this would give you bit-wise and() (called bit_and()) "for free" (at least
> per column),
> you'd need to design your queries to combine all columns, and still implement a
> UDF to count
> the bits set.
>
>
> Hope this helps you further.
>
> Best,
> Stefan
>
>
> ----- On Mar 10, 2016, at 11:31 PM, Roberto Cornacchia
> roberto.cornacchia@gmail.com wrote:
>
>> It can't indeed. You can forget about conversion to integers.
>> Your only option is to keep them as strings and develop special bitwise UDFs
>> that work directly on strings. The starting points for developing your UDFs are
>> still the same.
>> On 10 Mar 2016 22:53, "Shmagi Kavtaradze" < kavtaradze.s@gmail.com > wrote:
>>
>>
>>
>> Thanks for the advice, I will try to do it. Sorry in advance, maybe I don't
>> understand easy stuff, but if I have int of 10,000 long 1s and 0s, how can it
>> fit to Monetdb, if BIGINT is " BIGINT 64 bit signed integer between
>> -9223372036854775807 and 9223372036854775807 "
>>
>> On Thu, Mar 10, 2016 at 10:40 PM, Roberto Cornacchia <
>> roberto.cornacchia@gmail.com > wrote:
>>
>>
>>
>> If you really want to load strings into MonetDB and convert them to integers on
>> the fly in your query before doing your bitwise operations, you can have fun
>> with implementing your own function for this (both ways, from string2int and
>> int2string).
>>
>> One way is to implement it in C, which is trivial, and then make it become a UDF
>> in MonetDB (less trivial but instructive):
>> https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/UserDefinedFunction
>> .
>> Another way is to enable the python integration (
>> https://www.monetdb.org/blog/embedded-pythonnumpy-monetdb ) and then write your
>> UDF in python without any need to recompile. The UDF in python would be a
>> one-line: int('1001001',2)
>> Good luck.
>>
>>
>> On 10 March 2016 at 22:29, Sjoerd Mullender < sjoerd@monetdb.org > wrote:
>>
>>
>> Text is about the worst type you could have chosen. Far better is to
>> choose one of the integer types: tinyint, smallint, integer, bigint,
>> depending on the maximum number of bits that you have. If you have a
>> CSV file and the column contains just sequences of 0 and 1, you will
>> need to convert those number to e.g. hexadecimal notation (i.e.
>> something like 0x42ab). Once you have loaded the data as integers, you
>> can use the bitwise operators. & is bitwise AND, | is bitwise OR.
>>
>> On 03/10/2016 10:01 PM, Shmagi Kavtaradze wrote:
>>> I am new to Monetdb. I am using Postgresql mainly, but want to check
>>> Monetdb performace. In Postres I have column of type bit() filled with
>>> 0s and 1s. Then I am comparing each row to all other rows with bitwise
>>> AND on that column. Monetdb does not have bit() type so I used text. Any
>>> ideas how to do bitwise AND in monetdb and what type of column should I
>>> use for this? The query I tried:
>>>
>>> select a.sentenecid, b.sentenecid, a.sentence AND b.sentence from test
>>> a, test b;
>>>
>>>
>>> _______________________________________________
>>> users-list mailing list
>>> users-list@monetdb.org
>>> https://www.monetdb.org/mailman/listinfo/users-list
>>>
>>
>> --
>> Sjoerd Mullender
>>
>>
>> _______________________________________________
>> users-list mailing list
>> users-list@monetdb.org
>> https://www.monetdb.org/mailman/listinfo/users-list
>>
>>
>>
>> _______________________________________________
>> users-list mailing list
>> users-list@monetdb.org
>> https://www.monetdb.org/mailman/listinfo/users-list
>>
>>
>>
>> _______________________________________________
>> users-list mailing list
>> users-list@monetdb.org
>> https://www.monetdb.org/mailman/listinfo/users-list
>>
>>
>> _______________________________________________
>> users-list mailing list
>> users-list@monetdb.org
>> https://www.monetdb.org/mailman/listinfo/users-list
>
> --
>| Stefan.Manegold@CWI.nl | DB Architectures (DA) |
>| www.CWI.nl/~manegold/ | Science Park 123 (L321) |
>| +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
> _______________________________________________
> users-list mailing list
> users-list@monetdb.org
> https://www.monetdb.org/mailman/listinfo/users-list
>
>
> _______________________________________________
> users-list mailing list
> users-list@monetdb.org
> https://www.monetdb.org/mailman/listinfo/users-list
--
| Stefan.Manegold@CWI.nl | DB Architectures (DA) |
| www.CWI.nl/~manegold/ | Science Park 123 (L321) |
| +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list