Hi Niels,
I have tried this in default and indeed it does work like a charm.
(my UTF8tokenize UDF takes two values and outputs a 3-column table)
I noticed, though, that it results in a MAL loop:
| barrier (X_72,X_73) := iterator.new(X_8);
| X_75 := algebra.fetch(X_11,X_72);
| X_77 := algebra.fetch(X_14,X_72);
| (X_79,X_80,X_81) := str.UTF8tokenize(X_73,X_75,X_77);
| bat.append(X_64,X_79);
| bat.append(X_67,X_80);
| bat.append(X_69,X_81);
| redo (X_72,X_73) := iterator.next(X_8);
| exit (X_72,X_73);
This of course is not going to be efficient.
What if I write the bulk version of this function? Would that work?
And if it does, would it then also work in Oct2014, as it would no longer
need the "union" trick?
Roberto
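To make the contrast behind the question concrete, here is a minimal sketch in plain Python (not MonetDB or MAL code). It compares a per-row loop like the one in the plan above with a bulk-style function that receives the whole column in one call. The names tokenize_row, tokenize_looped and tokenize_bulk, and the choice of (row id, input, token) as the three output columns, are assumptions made for illustration only; the actual columns of UTF8tokenize are not described in this thread. The sketch says nothing about whether MonetDB would pick up a bulk implementation automatically.

```python
from typing import List, Tuple

# Three parallel output columns: (row id, input string, token).
# This layout is hypothetical and chosen only for the illustration.
Columns = Tuple[List[int], List[str], List[str]]


def tokenize_row(row_id: int, text: str) -> Columns:
    """Scalar-style call: one input value in, a small 3-column table out."""
    tokens = text.split()
    return [row_id] * len(tokens), [text] * len(tokens), tokens


def tokenize_looped(texts: List[str]) -> Columns:
    """Mirrors the loop in the plan: one call and three appends per row."""
    ids: List[int] = []
    inputs: List[str] = []
    tokens: List[str] = []
    for row_id, text in enumerate(texts):
        a, b, c = tokenize_row(row_id, text)
        ids.extend(a)
        inputs.extend(b)
        tokens.extend(c)
    return ids, inputs, tokens


def tokenize_bulk(texts: List[str]) -> Columns:
    """Bulk-style call: the whole column comes in once, outputs are filled in one pass."""
    ids: List[int] = []
    inputs: List[str] = []
    tokens: List[str] = []
    for row_id, text in enumerate(texts):
        for token in text.split():
            ids.append(row_id)
            inputs.append(text)
            tokens.append(token)
    return ids, inputs, tokens


column = ["one two three", "four five six", "seven eight"]
looped_result = tokenize_looped(column)
bulk_result = tokenize_bulk(column)
```

Both variants produce the same three columns; the difference is only in how many function calls and intermediate tables are involved per input row.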
On 11 April 2015 at 14:06, Niels Nes wrote:
On Sat, Apr 11, 2015 at 11:03:22AM +0200, Roberto Cornacchia wrote:
Hi there,
I need a string tokenizer in MonetDB. The problem I have is not with the function itself, but with the fact that this is a 1-to-N rows function.
Implementing this for a single string value is easy enough, using a table function that takes a string and returns a table:
create function tokenize(s string) returns table (token string) external name tokenize;
select * from tokenize('one two three');
That's fine. The issue I'm having is with extending this to a column of strings.
Ideally, given a string column
one two three
four five six
seven eight
I'd like to get an output along these lines (simplistic representation here):
one two three | one
one two three | two
one two three | three
four five six | four
four five six | five
four five six | six
seven eight | seven
seven eight | eight
I can sure code the C function and the MAL wrapper to implement this, but I can't see how to map it to SQL, given that table functions don't accept identifiers as parameters.
Any idea? Any possible workaround?
Thanks, Roberto
In default you should be able to call tokenize on a column. It will output the 'union' of all per-row calls. If you would like the 2-column output, you should take care of this in your tokenize function, i.e. return both input and token.
Niels
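As an illustration of the semantics discussed in this exchange, the following plain-Python sketch (again, not MonetDB code) shows a tokenizer that returns both the input and the token, and a column-level call that is simply the union of the per-row results. The function names and the whitespace splitting rule are assumptions for the example; the real tokenizer may split differently.

```python
from itertools import chain
from typing import Iterable, List, Tuple


def tokenize(s: str) -> List[Tuple[str, str]]:
    """Per-row call: return (input, token) pairs for one string.

    Splitting on whitespace is an assumption made for this sketch.
    """
    return [(s, token) for token in s.split()]


def tokenize_column(column: Iterable[str]) -> List[Tuple[str, str]]:
    """Column call: the 'union' (concatenation) of all per-row results."""
    return list(chain.from_iterable(tokenize(s) for s in column))


rows = tokenize_column(["one two three", "four five six", "seven eight"])
for original, token in rows:
    print(f"{original} | {token}")
```

Because each per-row result already carries its input string, the concatenated output keeps the association between a token and the row it came from.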
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- Niels Nes, Manager ITF, Centrum Wiskunde & Informatica (CWI) Science Park 123, 1098 XG Amsterdam, The Netherlands room L3.14, phone ++31 20 592-4098 sip:4098@sip.cwi.nl url: https://www.cwi.nl/people/niels e-mail: Niels.Nes@cwi.nl