Re: [Monetdb-developers] [Monetdb-checkins] MonetDB5/src/modules/mal pcre.mx, , 1.86, 1.87
On Sat, Aug 01, 2009 at 03:45:42PM +0000, Fabian wrote:
Update of /cvsroot/monetdb/MonetDB5/src/modules/mal
In directory 23jxhf1.ch3.sourceforge.com:/tmp/cvs-serv22620

Modified Files:
	pcre.mx
Log Message:
Always use PCRE for case insensitive matches, even though they don't use any wildcards. As far as I know there are no case insensitive versions of BAT*select, so fixing that up would be more efforts than just going through PCRE.
Once / in case that turns out to be(come) a performance bottleneck, we could probably add a BAT(u)iselect (for string-tailed BATs only) that is based on strcasecmp(3) with not too much effort ...
Stefan
U pcre.mx

Index: pcre.mx
===================================================================
RCS file: /cvsroot/monetdb/MonetDB5/src/modules/mal/pcre.mx,v
retrieving revision 1.86
retrieving revision 1.87
diff -u -d -r1.86 -r1.87
--- pcre.mx	1 Aug 2009 14:43:08 -0000	1.86
+++ pcre.mx	1 Aug 2009 15:45:40 -0000	1.87
@@ -1267,20 +1267,36 @@
 	if (!r) {
 		if (strcmp(ppat, (char*)str_nil) == 0) {
-			BAT *bp = BATdescriptor(*b);
-			BAT *res = NULL;
-
-			if (bp == NULL)
-				throw(MAL, "pcre.like", OPERATION_FAILED);
-			if (us)
-				res = BATuselect(bp, *pat, *pat);
-			else
-				res = BATselect(bp, *pat, *pat);
-
-			*ret = res->batCacheid;
-			BBPkeepref(res->batCacheid);
-			BBPreleaseref(bp->batCacheid);
-			r = MAL_SUCCEED;
+			/* there is no pattern or escape involved, fall back to
+			 * simple (no PCRE) match */
+			/* FIXME: we have a slight problem here if we need a case
+			 * insensitive match, so even though there is no pattern,
+			 * just fall back to PCRE for the moment. If there is a
+			 * case insensitive BAT*select, we should use that instead */
+			if (ignore) {
+				GDKfree(ppat);
+				ppat = GDKmalloc(sizeof(char) * (strlen(*pat) + 3));
+				sprintf(ppat, "^%s$", *pat);
+				if (us)
+					r = PCREuselect(ret, &ppat, b, &ignore);
+				else
+					r = PCREselect(ret, &ppat, b, &ignore);
+			} else {
+				BAT *bp = BATdescriptor(*b);
+				BAT *res = NULL;
+
+				if (bp == NULL)
+					throw(MAL, "pcre.like", OPERATION_FAILED); /*operation?*/
+				if (us)
+					res = BATuselect(bp, *pat, *pat);
+				else
+					res = BATselect(bp, *pat, *pat);
+
+				*ret = res->batCacheid;
+				BBPkeepref(res->batCacheid);
+				BBPreleaseref(bp->batCacheid);
+				r = MAL_SUCCEED;
+			}
 		} else {
 			if (us)
 				r = PCREuselect(ret, &ppat, b, &ignore);
_______________________________________________
Monetdb-checkins mailing list
Monetdb-checkins@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/monetdb-checkins
--
| Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl |
| CWI, P.O.Box 94079  | http://www.cwi.nl/~manegold/  |
| 1090 GB Amsterdam   | Tel.: +31 (20) 592-4212       |
| The Netherlands     | Fax : +31 (20) 592-4312       |
On 01-08-2009 17:56:06 +0200, Stefan Manegold wrote:
On Sat, Aug 01, 2009 at 03:45:42PM +0000, Fabian wrote:
Modified Files: pcre.mx Log Message: Always use PCRE for case insensitive matches, even though they don't use any wildcards. As far as I know there are no case insensitive versions of BAT*select, so fixing that up would be more efforts than just going through PCRE.
Once / in case that turns out to be(come) a performance bottleneck, we could probably add a BAT(u)iselect (for string-tailed BATs only) that is based on strcasecmp(3) with not too much effort ...
True, I don't expect it to be a very complicated task to achieve either. The only case for which we now do PCRE matching instead of BAT*select is a query like this:

  SELECT col FROM table WHERE col ILIKE 'myStrInG';
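For readers following the thread, the fallback in the commit (wrap the plain literal in "^...$" and match it case-insensitively) can be sketched roughly as below. This is only an illustration under stated assumptions: POSIX regcomp(3) with REG_ICASE stands in for PCRE so the example is self-contained, and anchored_imatch is a hypothetical helper name, not a MonetDB function. The sketch does not escape regex metacharacters in the literal; whether the real code path needs that is not stated in the thread.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not MonetDB API): returns 1 if `value` equals
 * `literal` ignoring case, 0 if not, -1 on error. The literal is wrapped
 * in ^...$ as in the commit; POSIX regex with REG_ICASE stands in for
 * PCRE here. */
static int
anchored_imatch(const char *literal, const char *value)
{
	size_t len;
	char *ppat;
	regex_t re;
	int rc;

	assert(literal != NULL && value != NULL);

	len = strlen(literal) + 3;	/* '^' + literal + '$' + '\0' */
	ppat = malloc(len);
	if (ppat == NULL)
		return -1;
	snprintf(ppat, len, "^%s$", literal);

	if (regcomp(&re, ppat, REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0) {
		free(ppat);
		return -1;
	}
	rc = regexec(&re, value, 0, NULL, 0);	/* 0 means "matched" */
	regfree(&re);
	free(ppat);
	return rc == 0;
}
```

With this sketch, anchored_imatch("myStrInG", "MYSTRING") reports a match, while a value with extra leading or trailing characters does not, because of the anchors.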
On Sat, 1 Aug 2009, Stefan Manegold wrote:
Once / in case that turns out to be(come) a performance bottleneck, we could probably add a BAT(u)iselect (for string-tailed BATs only) that is based on strcasecmp(3) with not too much effort ...
Is that UTF8 compatible? Stefan
On Sat, Aug 01, 2009 at 07:13:18PM +0200, Stefan de Konink wrote:
On Sat, 1 Aug 2009, Stefan Manegold wrote:
Once / in case that turns out to be(come) a performance bottleneck, we could probably add a BAT(u)iselect (for string-tailed BATs only) that is based on strcasecmp(3) with not too much effort ...
Is that UTF8 compatible?
as much or little as strcmp(3), I suppose, which we use for normal BAT(u)select, and hence for non-LIKE string predicates (i.e., normal equality and inequality)
Stefan
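A rough sketch of what a strcasecmp(3)-based equality scan might look like, and of the UTF-8 limitation raised above. Everything here is an assumption for illustration: iselect_positions is a hypothetical name, and a plain array of C strings stands in for a string-tailed BAT; none of this is the MonetDB API.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>	/* strcasecmp(3) */

/* Hypothetical stand-in for a case-insensitive select over a string
 * column: stores the positions of all values equal to `key` (ignoring
 * case) in `out` and returns the number of hits. `out` must have room
 * for n entries. NULL values are skipped. */
static size_t
iselect_positions(const char *const *values, size_t n,
		  const char *key, size_t *out)
{
	size_t i, hits = 0;

	assert(values != NULL && key != NULL && out != NULL);

	for (i = 0; i < n; i++) {
		if (values[i] != NULL && strcasecmp(values[i], key) == 0)
			out[hits++] = i;
	}
	return hits;
}
```

In the default "C" locale, strcasecmp folds only ASCII letters and compares all other bytes as they are. So "myStrInG" and "MYSTRING" compare equal, but the UTF-8 encodings of "É" (0xC3 0x89) and "é" (0xC3 0xA9) do not, which is the sense in which such a scan would be no more UTF-8 aware than strcmp.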
Stefan
_______________________________________________
Monetdb-developers mailing list
Monetdb-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/monetdb-developers
On 2 Aug 2009 at 01:01, Stefan Manegold wrote:
On Sat, Aug 01, 2009 at 07:13:18PM +0200, Stefan de Konink wrote:
On Sat, 1 Aug 2009, Stefan Manegold wrote:
Once / in case that turns out to be(come) a performance bottleneck, we could probably add a BAT(u)iselect (for string-tailed BATs only) that is based on strcasecmp(3) with not too much effort ...
Is that UTF8 compatible?
as much or little as strcmp(3), I suppose, which we use for normal BAT(u)select, and hence for non-LIKE string predicates (i.e., normal equality and inequality)
Ok, the alternative I suggested to Fabian was to use our lookup table if a char doesn't match. The minor detail there is endianness. PCRE might be faster anyway; with strncasecmp the n would be for only one char. Trivial to test, I guess.
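One possible reading of the "lookup on mismatch" idea, sketched under heavy assumptions: bytes are compared directly, and a case fold is consulted only when they differ. The fold here is a hypothetical ASCII-only function standing in for the lookup table mentioned above, whose actual contents and layout are not described in the thread. The endianness point does not arise in this sketch because it works byte by byte.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fold standing in for the lookup table: maps ASCII
 * 'A'..'Z' to 'a'..'z' and leaves every other byte (including UTF-8
 * lead and continuation bytes) unchanged. */
static unsigned char
fold_byte(unsigned char c)
{
	return (c >= 'A' && c <= 'Z') ? (unsigned char)(c - 'A' + 'a') : c;
}

/* Returns 1 if a and b are equal ignoring ASCII case, else 0.
 * Bytes are compared as-is first; the fold is applied only when the
 * raw bytes differ. */
static int
iequal_on_mismatch(const char *a, const char *b)
{
	assert(a != NULL && b != NULL);

	for (;; a++, b++) {
		unsigned char ca = (unsigned char)*a;
		unsigned char cb = (unsigned char)*b;

		if (ca != cb && fold_byte(ca) != fold_byte(cb))
			return 0;	/* differs even after folding */
		if (ca == '\0')
			return 1;	/* both strings ended together */
	}
}
```

The fast path costs one byte comparison per position when the strings already agree in case; the fold is only paid for on differing bytes.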
Stefan
Stefan
participants (3)
- Fabian Groffen
- Stefan de Konink
- Stefan Manegold