PCRE: references to captures in replace string
After declaring a wrapper for pcre.replace: sql>create function pcre_replace(s string, pattern string, repl string, flags string) returns string external name pcre."replace"; operation successful (6.187ms) I cannot get replacements with references to captures work (but the code in pcre.c seems to support it): sql>select pcre_replace('Apple Pear', '^([a-zA-Z]+)\\s*([a-zA-Z]+)$', '\\1 \\2', ''); +---------------------------+ | pcre_replace_single_value | +===========================+ | \1 \2 | +---------------------------+ 1 tuple (0.309ms) The pattern is matched, but the back references are replaced literally. Am I not getting the syntax correct?
From http://www.pcre.org/original/doc/html/pcresyntax.html I see I could use any of \1, \g1, \g{1] for back references, but none seem to work.
Roberto
I found this thread from October 2015 where Sjoerd concludes that back
references replacement is not implemented in the MonetDB wrapper:
https://www.monetdb.org/pipermail/users-list/2015-October/008492.html
Probably that is still the case?
On 20 April 2016 at 16:35, Roberto Cornacchia
After declaring a wrapper for pcre.replace:
sql>create function pcre_replace(s string, pattern string, repl string, flags string) returns string external name pcre."replace"; operation successful (6.187ms)
I cannot get replacements with references to captures work (but the code in pcre.c seems to support it):
sql>select pcre_replace('Apple Pear', '^([a-zA-Z]+)\\s*([a-zA-Z]+)$', '\\1 \\2', ''); +---------------------------+ | pcre_replace_single_value | +===========================+ | \1 \2 | +---------------------------+ 1 tuple (0.309ms)
The pattern is matched, but the back references are replaced literally. Am I not getting the syntax correct? From http://www.pcre.org/original/doc/html/pcresyntax.html I see I could use any of \1, \g1, \g{1] for back references, but none seem to work.
Roberto
I still find this to be an interested feature to add, as its very useful in string date manipulation. On Tue, Apr 26, 2016 at 5:28 PM, Roberto Cornacchia < roberto.cornacchia@gmail.com> wrote:
I found this thread from October 2015 where Sjoerd concludes that back references replacement is not implemented in the MonetDB wrapper: https://www.monetdb.org/pipermail/users-list/2015-October/008492.html
Probably that is still the case?
On 20 April 2016 at 16:35, Roberto Cornacchia < roberto.cornacchia@gmail.com> wrote:
After declaring a wrapper for pcre.replace:
sql>create function pcre_replace(s string, pattern string, repl string, flags string) returns string external name pcre."replace"; operation successful (6.187ms)
I cannot get replacements with references to captures work (but the code in pcre.c seems to support it):
sql>select pcre_replace('Apple Pear', '^([a-zA-Z]+)\\s*([a-zA-Z]+)$', '\\1 \\2', ''); +---------------------------+ | pcre_replace_single_value | +===========================+ | \1 \2 | +---------------------------+ 1 tuple (0.309ms)
The pattern is matched, but the back references are replaced literally. Am I not getting the syntax correct? From http://www.pcre.org/original/doc/html/pcresyntax.html I see I could use any of \1, \g1, \g{1] for back references, but none seem to work.
Roberto
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
An interesting workaround for now could be to implement a simple UDF in
python, to let python perform the regex processing:
https://www.monetdb.org/blog/embedded-pythonnumpy-monetdb
Unfortunately this is for now only available in a separate branch which is
kept in sync with default. For me this is no option because I need to rely
on a stable MonetDB version. But perhaps for other it is an option.
On 26 April 2016 at 16:57, imad hajj chahine
I still find this to be an interested feature to add, as its very useful in string date manipulation.
On Tue, Apr 26, 2016 at 5:28 PM, Roberto Cornacchia < roberto.cornacchia@gmail.com> wrote:
I found this thread from October 2015 where Sjoerd concludes that back references replacement is not implemented in the MonetDB wrapper: https://www.monetdb.org/pipermail/users-list/2015-October/008492.html
Probably that is still the case?
On 20 April 2016 at 16:35, Roberto Cornacchia < roberto.cornacchia@gmail.com> wrote:
After declaring a wrapper for pcre.replace:
sql>create function pcre_replace(s string, pattern string, repl string, flags string) returns string external name pcre."replace"; operation successful (6.187ms)
I cannot get replacements with references to captures work (but the code in pcre.c seems to support it):
sql>select pcre_replace('Apple Pear', '^([a-zA-Z]+)\\s*([a-zA-Z]+)$', '\\1 \\2', ''); +---------------------------+ | pcre_replace_single_value | +===========================+ | \1 \2 | +---------------------------+ 1 tuple (0.309ms)
The pattern is matched, but the back references are replaced literally. Am I not getting the syntax correct? From http://www.pcre.org/original/doc/html/pcresyntax.html I see I could use any of \1, \g1, \g{1] for back references, but none seem to work.
Roberto
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Another option is to use R. This is available in Jul2015 branch, and is
very easy (https://www.monetdb.org/content/embedded-r-monetdb)
CREATE FUNCTION r_replace(s STRING, p STRING, r STRING) RETURNS STRING
LANGUAGE R {
sub(p,r,s)
};
sql>select r_replace('Apple Pear', '^([a-zA-Z]+)\\s*([a-zA-Z]+)$', '\\2
\\1');
+------------------------+
| r_replace_single_value |
+========================+
| Pear Apple |
+------------------------+
It works out of the box on columns:
select r_replace(value, '^.*(\d).*$','\\1%\\2') from obj_string limit 100;
And luckily without loops:
X_10:bat[:oid,:str] := batrapi.eval(nil,"{\n sub(p,r,s)\n};",X_9,A0,A1);
Still not very fast - I've just tried on 11M strings, 45s - probably
because strings are fully copied, but for now this seems the easiest
workaround.
It would be great to have it working on pcre though.
Roberto
On 4 May 2016 at 10:31, Roberto Cornacchia
An interesting workaround for now could be to implement a simple UDF in python, to let python perform the regex processing: https://www.monetdb.org/blog/embedded-pythonnumpy-monetdb
Unfortunately this is for now only available in a separate branch which is kept in sync with default. For me this is no option because I need to rely on a stable MonetDB version. But perhaps for other it is an option.
On 26 April 2016 at 16:57, imad hajj chahine
wrote: I still find this to be an interested feature to add, as its very useful in string date manipulation.
On Tue, Apr 26, 2016 at 5:28 PM, Roberto Cornacchia < roberto.cornacchia@gmail.com> wrote:
I found this thread from October 2015 where Sjoerd concludes that back references replacement is not implemented in the MonetDB wrapper: https://www.monetdb.org/pipermail/users-list/2015-October/008492.html
Probably that is still the case?
On 20 April 2016 at 16:35, Roberto Cornacchia < roberto.cornacchia@gmail.com> wrote:
After declaring a wrapper for pcre.replace:
sql>create function pcre_replace(s string, pattern string, repl string, flags string) returns string external name pcre."replace"; operation successful (6.187ms)
I cannot get replacements with references to captures work (but the code in pcre.c seems to support it):
sql>select pcre_replace('Apple Pear', '^([a-zA-Z]+)\\s*([a-zA-Z]+)$', '\\1 \\2', ''); +---------------------------+ | pcre_replace_single_value | +===========================+ | \1 \2 | +---------------------------+ 1 tuple (0.309ms)
The pattern is matched, but the back references are replaced literally. Am I not getting the syntax correct? From http://www.pcre.org/original/doc/html/pcresyntax.html I see I could use any of \1, \g1, \g{1] for back references, but none seem to work.
Roberto
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
A last note about doing this with R:
I changed sub(p,r,s) into sub(p,r,s,perl=TRUE,useBytes=TRUE)
And my test on 11M strings went from45s to 10s
On 4 May 2016 at 10:57, Roberto Cornacchia
Another option is to use R. This is available in Jul2015 branch, and is very easy (https://www.monetdb.org/content/embedded-r-monetdb)
CREATE FUNCTION r_replace(s STRING, p STRING, r STRING) RETURNS STRING LANGUAGE R { sub(p,r,s) };
sql>select r_replace('Apple Pear', '^([a-zA-Z]+)\\s*([a-zA-Z]+)$', '\\2 \\1'); +------------------------+ | r_replace_single_value | +========================+ | Pear Apple | +------------------------+
It works out of the box on columns:
select r_replace(value, '^.*(\d).*$','\\1%\\2') from obj_string limit 100;
And luckily without loops:
X_10:bat[:oid,:str] := batrapi.eval(nil,"{\n sub(p,r,s)\n};",X_9,A0,A1);
Still not very fast - I've just tried on 11M strings, 45s - probably because strings are fully copied, but for now this seems the easiest workaround.
It would be great to have it working on pcre though. Roberto
On 4 May 2016 at 10:31, Roberto Cornacchia
wrote: An interesting workaround for now could be to implement a simple UDF in python, to let python perform the regex processing: https://www.monetdb.org/blog/embedded-pythonnumpy-monetdb
Unfortunately this is for now only available in a separate branch which is kept in sync with default. For me this is no option because I need to rely on a stable MonetDB version. But perhaps for other it is an option.
On 26 April 2016 at 16:57, imad hajj chahine
wrote:
I still find this to be an interested feature to add, as its very useful in string date manipulation.
On Tue, Apr 26, 2016 at 5:28 PM, Roberto Cornacchia < roberto.cornacchia@gmail.com> wrote:
I found this thread from October 2015 where Sjoerd concludes that back references replacement is not implemented in the MonetDB wrapper: https://www.monetdb.org/pipermail/users-list/2015-October/008492.html
Probably that is still the case?
On 20 April 2016 at 16:35, Roberto Cornacchia < roberto.cornacchia@gmail.com> wrote:
After declaring a wrapper for pcre.replace:
sql>create function pcre_replace(s string, pattern string, repl string, flags string) returns string external name pcre."replace"; operation successful (6.187ms)
I cannot get replacements with references to captures work (but the code in pcre.c seems to support it):
sql>select pcre_replace('Apple Pear', '^([a-zA-Z]+)\\s*([a-zA-Z]+)$', '\\1 \\2', ''); +---------------------------+ | pcre_replace_single_value | +===========================+ | \1 \2 | +---------------------------+ 1 tuple (0.309ms)
The pattern is matched, but the back references are replaced literally. Am I not getting the syntax correct? From http://www.pcre.org/original/doc/html/pcresyntax.html I see I could use any of \1, \g1, \g{1] for back references, but none seem to work.
Roberto
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Is it possible to construct aggregate functions using embedded R? Would the syntax be: "create aggregate function"? I need to pass in a pair of aligned vectors/columns to compute weighted percentiles.
I've held off on taking the R route for performance reasons, but I don't see how to pass multiple vectors into the new sub* aggregate implementation.
All of the aggregate functions in gdk_agg.c take a single column and I don't know if it is possible to alter the interface to pass in multiples.
Thanks,
Dave
-----Original Message-----
From: Roberto Cornacchia [roberto.cornacchia@gmail.commailto:roberto.cornacchia@gmail.com]
Sent: Wednesday, May 04, 2016 04:59 AM Eastern Standard Time
To: Communication channel for MonetDB users
Subject: Re: PCRE: references to captures in replace string
Another option is to use R. This is available in Jul2015 branch, and is very easy (https://www.monetdb.org/content/embedded-r-monetdbhttps://urldefense.proofpoint.com/v2/url?u=https-3A__www.monetdb.org_content_embedded-2Dr-2Dmonetdb&d=CwMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=8_GYjk1edsyLJlNaxMxYxBJsviF3JXYvwDK42uy5KWU&m=CqtgVJpkJoop7G_7izH9LywiM82AR0XCNvGu0ipkFuo&s=cMjL1PD_6AwNjME5KXxpxTCzfole4bSqgnp4uolKgoY&e=)
CREATE FUNCTION r_replace(s STRING, p STRING, r STRING) RETURNS STRING LANGUAGE R {
sub(p,r,s)
};
sql>select r_replace('Apple Pear', '^([a-zA-Z]+)\\s*([a-zA-Z]+)$', '\\2 \\1');
+------------------------+
| r_replace_single_value |
+========================+
| Pear Apple |
+------------------------+
It works out of the box on columns:
select r_replace(value, '^.*(\d).*$','\\1%\\2') from obj_string limit 100;
And luckily without loops:
X_10:bat[:oid,:str] := batrapi.eval(nil,"{\n sub(p,r,s)\n};",X_9,A0,A1);
Still not very fast - I've just tried on 11M strings, 45s - probably because strings are fully copied, but for now this seems the easiest workaround.
It would be great to have it working on pcre though.
Roberto
On 4 May 2016 at 10:31, Roberto Cornacchia
Dave,
A while ago I was playing with aggregates in R, this skeleton could be
helpful:
CREATE table test (customer int, d string, n int);
INSERT INTO test VALUES(1,'2015-01-01', 100);
INSERT INTO test VALUES(1, '2015-01-02', 100);
INSERT INTO test VALUES(2, '2015-01-03', 100);
INSERT INTO test VALUES(2, '2015-01-01', 100);
INSERT INTO test VALUES(2, '2015-01-02', 100);
CREATE AGGREGATE myaggr(d string, n int) RETURNS DOUBLE LANGUAGE R {
# df is a dataframe containing the input columns
my_aggr <- function(df) {
42.0
}
df <- cbind(d,n)
# aggr_group contains the groups, passed by MonetDB, as in the GROUP BY
clause
as.vector(by(df, aggr_group, my_aggr))
};
select customer, myaggr(d,n) from test group by customer;
+----------+--------------------------+
| customer | L1 |
+==========+==========================+
| 1 | 42 |
| 2 | 42 |
+----------+--------------------------+
On 4 May 2016 at 13:19, Anderson, David B
Is it possible to construct aggregate functions using embedded R? Would the syntax be: "create aggregate function"? I need to pass in a pair of aligned vectors/columns to compute weighted percentiles.
I've held off on taking the R route for performance reasons, but I don't see how to pass multiple vectors into the new sub* aggregate implementation.
All of the aggregate functions in gdk_agg.c take a single column and I don't know if it is possible to alter the interface to pass in multiples.
Thanks, Dave
-----Original Message----- *From: *Roberto Cornacchia [roberto.cornacchia@gmail.com] *Sent: *Wednesday, May 04, 2016 04:59 AM Eastern Standard Time *To: *Communication channel for MonetDB users *Subject: *Re: PCRE: references to captures in replace string
Another option is to use R. This is available in Jul2015 branch, and is very easy (https://www.monetdb.org/content/embedded-r-monetdb https://urldefense.proofpoint.com/v2/url?u=https-3A__www.monetdb.org_content_embedded-2Dr-2Dmonetdb&d=CwMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=8_GYjk1edsyLJlNaxMxYxBJsviF3JXYvwDK42uy5KWU&m=CqtgVJpkJoop7G_7izH9LywiM82AR0XCNvGu0ipkFuo&s=cMjL1PD_6AwNjME5KXxpxTCzfole4bSqgnp4uolKgoY&e= )
CREATE FUNCTION r_replace(s STRING, p STRING, r STRING) RETURNS STRING LANGUAGE R { sub(p,r,s) };
sql>select r_replace('Apple Pear', '^([a-zA-Z]+)\\s*([a-zA-Z]+)$', '\\2 \\1'); +------------------------+ | r_replace_single_value | +========================+ | Pear Apple | +------------------------+
It works out of the box on columns:
select r_replace(value, '^.*(\d).*$','\\1%\\2') from obj_string limit 100;
And luckily without loops:
X_10:bat[:oid,:str] := batrapi.eval(nil,"{\n sub(p,r,s)\n};",X_9,A0,A1);
Still not very fast - I've just tried on 11M strings, 45s - probably because strings are fully copied, but for now this seems the easiest workaround.
It would be great to have it working on pcre though. Roberto
On 4 May 2016 at 10:31, Roberto Cornacchia
wrote: An interesting workaround for now could be to implement a simple UDF in python, to let python perform the regex processing: https://www.monetdb.org/blog/embedded-pythonnumpy-monetdb https://urldefense.proofpoint.com/v2/url?u=https-3A__www.monetdb.org_blog_embedded-2Dpythonnumpy-2Dmonetdb&d=CwMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=8_GYjk1edsyLJlNaxMxYxBJsviF3JXYvwDK42uy5KWU&m=CqtgVJpkJoop7G_7izH9LywiM82AR0XCNvGu0ipkFuo&s=USK84Sl9Muw1-qalA5A9ju8ZlqTAQhLAjZpaBBU8-rw&e=
Unfortunately this is for now only available in a separate branch which is kept in sync with default. For me this is no option because I need to rely on a stable MonetDB version. But perhaps for other it is an option.
On 26 April 2016 at 16:57, imad hajj chahine
wrote:
I still find this to be an interested feature to add, as its very useful in string date manipulation.
On Tue, Apr 26, 2016 at 5:28 PM, Roberto Cornacchia < roberto.cornacchia@gmail.com> wrote:
I found this thread from October 2015 where Sjoerd concludes that back references replacement is not implemented in the MonetDB wrapper: https://www.monetdb.org/pipermail/users-list/2015-October/008492.html https://urldefense.proofpoint.com/v2/url?u=https-3A__www.monetdb.org_pipermail_users-2Dlist_2015-2DOctober_008492.html&d=CwMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=8_GYjk1edsyLJlNaxMxYxBJsviF3JXYvwDK42uy5KWU&m=CqtgVJpkJoop7G_7izH9LywiM82AR0XCNvGu0ipkFuo&s=-2C6SmdbU2YUQPzV0Y4tHbMAO_qimRAh1LYz_CmnnVk&e=
Probably that is still the case?
On 20 April 2016 at 16:35, Roberto Cornacchia < roberto.cornacchia@gmail.com> wrote:
After declaring a wrapper for pcre.replace:
sql>create function pcre_replace(s string, pattern string, repl string, flags string) returns string external name pcre."replace"; operation successful (6.187ms)
I cannot get replacements with references to captures work (but the code in pcre.c seems to support it):
sql>select pcre_replace('Apple Pear', '^([a-zA-Z]+)\\s*([a-zA-Z]+)$', '\\1 \\2', ''); +---------------------------+ | pcre_replace_single_value | +===========================+ | \1 \2 | +---------------------------+ 1 tuple (0.309ms)
The pattern is matched, but the back references are replaced literally. Am I not getting the syntax correct? From http://www.pcre.org/original/doc/html/pcresyntax.html https://urldefense.proofpoint.com/v2/url?u=http-3A__www.pcre.org_original_doc_html_pcresyntax.html&d=CwMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=8_GYjk1edsyLJlNaxMxYxBJsviF3JXYvwDK42uy5KWU&m=CqtgVJpkJoop7G_7izH9LywiM82AR0XCNvGu0ipkFuo&s=1w0JvVy_zYBkYl-zfHDgynjLIdhHUAILmVoBG7XimvA&e= I see I could use any of \1, \g1, \g{1] for back references, but none seem to work.
Roberto
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list https://urldefense.proofpoint.com/v2/url?u=https-3A__www.monetdb.org_mailman_listinfo_users-2Dlist&d=CwMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=8_GYjk1edsyLJlNaxMxYxBJsviF3JXYvwDK42uy5KWU&m=CqtgVJpkJoop7G_7izH9LywiM82AR0XCNvGu0ipkFuo&s=pDDshZs8eRHmb2WK-zmw9R7tisgZG07gKuSQ0h7p9Kk&e=
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list https://urldefense.proofpoint.com/v2/url?u=https-3A__www.monetdb.org_mailman_listinfo_users-2Dlist&d=CwMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=8_GYjk1edsyLJlNaxMxYxBJsviF3JXYvwDK42uy5KWU&m=CqtgVJpkJoop7G_7izH9LywiM82AR0XCNvGu0ipkFuo&s=pDDshZs8eRHmb2WK-zmw9R7tisgZG07gKuSQ0h7p9Kk&e=
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
participants (3)
-
Anderson, David B
-
imad hajj chahine
-
Roberto Cornacchia