On Thu, Jun 04, 2015 at 11:46:08AM +0200, Martin Kersten wrote:
> On 04/06/15 11:36, Roberto Cornacchia wrote:
> >Hi Niels,
> >
> >I have tried this in default and indeed it does work like a charm.
> >(my UTF8tokenize UDF takes two values and outputs a 3-column table)
> >
> >I noticed, though, that it results in a MAL loop:
> >
> >| barrier (X_72,X_73) := iterator.new(X_8);
> >| X_75 := algebra.fetch(X_11,X_72);
> >| X_77 := algebra.fetch(X_14,X_72);
> >| (X_79,X_80,X_81) := str.UTF8tokenize(X_73,X_75,X_77);
> >| bat.append(X_64,X_79);
> >| bat.append(X_67,X_80);
> >| bat.append(X_69,X_81);
> >| redo (X_72,X_73) := iterator.next(X_8);
> >| exit (X_72,X_73);
> >
> >This of course is not going to be efficient.
> >What if I write the bulk version of this function? Would that work?
> In general, yes. If a bulk version exists, this code would not be generated.
>
> str.UTF8tokenize(X_73:bat[:oid,:str],X_75:bat[:oid,:str],X_77:bat[:oid,:str]):bat[:oid,:str]
batstr.UTF8tokenize(X_73:bat[:oid,:str],X_75:bat[:oid,:str],X_77:bat[:oid,:str]):bat[:oid,:str]
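
To make the difference between the generated loop and a bulk call concrete, here is a rough model in Python (not MonetDB code). The names toy_tokenize, per_row_loop and bulk are hypothetical, and the meaning of the three arguments is an assumption, since the thread does not say what UTF8tokenize's parameters are.

```python
from typing import List, Tuple

Columns = Tuple[List[str], List[int], List[str]]


def toy_tokenize(text: str, delim: str, tag: str) -> Columns:
    """Hypothetical stand-in for the scalar UDF: one row in, three columns out."""
    tokens = [t for t in text.split(delim) if t]
    return [tag] * len(tokens), list(range(len(tokens))), tokens


def per_row_loop(texts: List[str], delims: List[str], tags: List[str]) -> Columns:
    """Model of the barrier/redo loop: one UDF call plus three appends per row."""
    out_a: List[str] = []
    out_b: List[int] = []
    out_c: List[str] = []
    for i, text in enumerate(texts):          # iterator.new / iterator.next
        delim = delims[i]                     # algebra.fetch
        tag = tags[i]                         # algebra.fetch
        a, b, c = toy_tokenize(text, delim, tag)
        out_a.extend(a)                       # bat.append
        out_b.extend(b)                       # bat.append
        out_c.extend(c)                       # bat.append
    return out_a, out_b, out_c


def bulk(texts: List[str], delims: List[str], tags: List[str]) -> Columns:
    """Model of a bulk variant: called once, receives and returns whole columns."""
    out_a: List[str] = []
    out_b: List[int] = []
    out_c: List[str] = []
    for text, delim, tag in zip(texts, delims, tags):
        tokens = [t for t in text.split(delim) if t]
        out_a.extend([tag] * len(tokens))
        out_b.extend(range(len(tokens)))
        out_c.extend(tokens)
    return out_a, out_b, out_c
```

Both functions should return the same three columns. The point of the bulk form is only that the interpreter makes a single call instead of one call and three appends per input row.
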
Niels
> >And if it does, would it then also work in Oct2014, as it would no longer need the "union" trick?
> >
> >Roberto
> >
> >
> >On 11 April 2015 at 14:06, Niels Nes <Niels.Nes@cwi.nl> wrote:
> >
> > On Sat, Apr 11, 2015 at 11:03:22AM +0200, Roberto Cornacchia wrote:
> > > Hi there,
> > >
> > > I need a string tokenizer in MonetDB.
> > > The problem I have is not with the function itself, but with the fact
> > > that this is a 1 to N rows function.
> > >
> > > Implementing this for a single string value is easy enough, using a
> > > table function that takes a string and returns a table:
> > >
> > > create function tokenize(s string)
> > > returns table (token string)
> > > external name tokenize;
> > >
> > > select *
> > > from tokenize('one two three');
> > >
> > > That's fine.
> > > The issue I'm having is with extending this to a column of strings.
> > >
> > > Ideally, given a string column
> > >
> > > one two three
> > > four five six
> > > seven eight
> > >
> > > I'd like to get an output along these lines (simplistic representation
> > > here):
> > >
> > > one two three | one
> > > one two three | two
> > > one two three | three
> > > four five six | four
> > > four five six | five
> > > four five six | six
> > > seven eight | seven
> > > seven eight | eight
> > >
> > >
> > > I can certainly code the C function and the MAL wrapper to implement this,
> > > but I can't see how to map it to SQL, given that table functions don't
> > > accept identifiers as parameters.
> > >
> > > Any idea? Any possible workaround?
> > In default you should be able to call tokenize on a column.
> > It will output the 'union' of all per row calls.
> > If you would like the 2 column output, you should take care of
> > this in your tokenize function, i.e. return both input and token.
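
As a small illustration of "the union of all per row calls" with both input and token returned, here is a sketch in Python (a model only, not MonetDB code). The names tokenize and tokenize_column are hypothetical, and whitespace splitting is an assumption about what the tokenizer does.

```python
from typing import List, Tuple


def tokenize(s: str) -> List[str]:
    """Hypothetical single-value tokenizer: splits on whitespace."""
    return s.split()


def tokenize_column(column: List[str]) -> List[Tuple[str, str]]:
    """Union of the per-row calls, keeping the input next to each token."""
    rows: List[Tuple[str, str]] = []
    for s in column:
        for token in tokenize(s):
            rows.append((s, token))
    return rows


column = ["one two three", "four five six", "seven eight"]
for source, token in tokenize_column(column):
    print(f"{source} | {token}")
```

For the three example strings this gives the eight (input, token) rows listed in the question, in the same order.
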
> >
> > Niels
> > > Thanks, Roberto
> > >
> >
> > > _______________________________________________
> > > users-list mailing list
> > > users-list@monetdb.org
> > > https://www.monetdb.org/mailman/listinfo/users-list
> >
> >
> > --
> > Niels Nes, Manager ITF, Centrum Wiskunde & Informatica (CWI)
> > Science Park 123, 1098 XG Amsterdam, The Netherlands
> > room L3.14, phone ++31 20 592-4098 sip:4098@sip.cwi.nl
> > url: https://www.cwi.nl/people/niels e-mail: Niels.Nes@cwi.nl
> >
--
Niels Nes, Manager ITF, Centrum Wiskunde & Informatica (CWI)
Science Park 123, 1098 XG Amsterdam, The Netherlands