On Thu, Jun 04, 2015 at 11:46:08AM +0200, Martin Kersten wrote:
On 04/06/15 11:36, Roberto Cornacchia wrote:
Hi Niels,
I have tried this in default and indeed it does work like a charm. (my UTF8tokenize UDF takes two values and outputs a 3-column table)
I noticed, though, that it results in a MAL loop:
    barrier (X_72,X_73) := iterator.new(X_8);
        X_75 := algebra.fetch(X_11,X_72);
        X_77 := algebra.fetch(X_14,X_72);
        (X_79,X_80,X_81) := str.UTF8tokenize(X_73,X_75,X_77);
        bat.append(X_64,X_79);
        bat.append(X_67,X_80);
        bat.append(X_69,X_81);
        redo (X_72,X_73) := iterator.next(X_8);
    exit (X_72,X_73);
This of course is not going to be efficient. What if I write the bulk version of this function? Would that work?

In general, yes. If a bulk version exists, this code would not be generated.
    str.UTF8tokenize(X_73:bat[:oid,:str],X_75:bat[:oid,:str],X_77:bat[:oid,:str]):bat[:oid,:str]
    batstr.UTF8tokenize(X_73:bat[:oid,:str],X_75:bat[:oid,:str],X_77:bat[:oid,:str]):bat[:oid,:str]
Niels
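To make the difference between the generated loop and a bulk call concrete, here is a rough conceptual model in Python. It is not MonetDB code: plain lists stand in for BATs, and `tokenize_one`, `tokenize_loop` and `tokenize_bulk` are hypothetical names whose tokenizing behaviour is assumed purely for illustration. It only shows that both shapes produce the same three output columns, with the bulk form needing a single call at plan level.

```python
from typing import List, Tuple

# Three aligned output columns; plain lists stand in for BATs (assumption).
Columns = Tuple[List[str], List[str], List[int]]


def tokenize_one(text: str, sep: str, tag: str) -> Columns:
    """Hypothetical scalar UDF: one input row in, a 3-column table out."""
    tokens = [t for t in text.split(sep) if t]
    return [tag] * len(tokens), tokens, list(range(len(tokens)))


def tokenize_loop(texts: List[str], seps: List[str], tags: List[str]) -> Columns:
    """Mirrors the generated MAL loop: one UDF call per row, results appended."""
    out_a: List[str] = []
    out_b: List[str] = []
    out_c: List[int] = []
    for i, text in enumerate(texts):             # iterator.new / iterator.next
        sep, tag = seps[i], tags[i]              # algebra.fetch on the other inputs
        a, b, c = tokenize_one(text, sep, tag)   # scalar call for this row
        out_a.extend(a)                          # bat.append on each output column
        out_b.extend(b)
        out_c.extend(c)
    return out_a, out_b, out_c


def tokenize_bulk(texts: List[str], seps: List[str], tags: List[str]) -> Columns:
    """Hypothetical bulk variant: a single call consumes whole columns."""
    rows = [
        (tag, tok, pos)
        for text, sep, tag in zip(texts, seps, tags)
        for pos, tok in enumerate(t for t in text.split(sep) if t)
    ]
    if not rows:
        return [], [], []
    a, b, c = zip(*rows)
    return list(a), list(b), list(c)
```

In this model the per-row overhead of the loop (iterator step, fetches, appends) is what a bulk implementation would avoid; the actual savings in MonetDB depend on how the bulk function is written.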
And if it does, would it then also work in Oct2014, as it would no longer need the "union" trick?
Roberto
On 11 April 2015 at 14:06, Niels Nes <Niels.Nes@cwi.nl> wrote:

On Sat, Apr 11, 2015 at 11:03:22AM +0200, Roberto Cornacchia wrote:
> Hi there,
>
> I need a string tokenizer in MonetDB.
> The problem I have is not with the function itself, but with the fact
> that this is a 1 to N rows function.
>
> Implementing this for a single string value is easy enough, using a
> table function that takes a string and returns a table:
>
> create function tokenize(s string)
> returns table (token string)
> external name tokenize;
>
> select *
> from tokenize('one two three');
>
> That's fine.
> The issue I'm having is with extending this to a column of strings.
>
> Ideally, given a string column
>
> one two three
> four five six
> seven eight
>
> I'd like to get an output along these lines (simplistic representation
> here):
>
> one two three | one
> one two three | two
> one two three | three
> four five six | four
> four five six | five
> four five six | six
> seven eight   | seven
> seven eight   | eight
>
> I can sure code the C function and the MAL wrapper to implement this,
> but I can't see how to map it to SQL, given that table functions don't
> accept identifiers as parameters.
>
> Any idea? Any possible workaround?

In default you should be able to call tokenize on a column. It will output the 'union' of all per-row calls. If you would like the 2-column output, you should take care of this in your tokenize function, i.e. return both input and token.

Niels

> Thanks, Roberto
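As a small illustration of the 'union of all per-row calls' semantics described above, the following Python sketch models the expected result for the example column. It is only a model of the output shape, not MonetDB code; `tokenize` and `tokenize_column` are hypothetical helpers, and whitespace splitting is an assumption.

```python
from typing import Iterable, List, Tuple


def tokenize(s: str) -> List[str]:
    """Hypothetical single-value tokenizer: split on whitespace (assumption)."""
    return s.split()


def tokenize_column(column: Iterable[str]) -> List[Tuple[str, str]]:
    """Union of the per-row results, returning both the input and each token."""
    return [(s, token) for s in column for token in tokenize(s)]


if __name__ == "__main__":
    column = ["one two three", "four five six", "seven eight"]
    for source, token in tokenize_column(column):
        print(f"{source} | {token}")
```

Returning the input string next to each token is what lets the result be tied back to its originating row, which is why the function itself has to emit both columns.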
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list
--
Niels Nes, Manager ITF, Centrum Wiskunde & Informatica (CWI)
Science Park 123, 1098 XG Amsterdam, The Netherlands
room L3.14, phone ++31 20 592-4098, sip:4098@sip.cwi.nl
url: https://www.cwi.nl/people/niels
e-mail: Niels.Nes@cwi.nl