Re: pymonetdb or Python UDF

17 Jun 2018

      Adam,

A solution in Python could use the following SQL snippets:

-- example data
drop table if exists d;
create table d as select 'key1=value1, key2=value2, key3=value3' as s
union all select 'key1=value4, key2=value5, key3=value6';

-- create utility function
create or replace function pcre_replace(origin string, pat string, repl
string, flags string)
  returns string external name pcre.replace;

-- split - if you know the key and the number of columns
select pcre_replace(s,'.*key1[=]\([a-zA-Z0-9-]*\),.*','\\1','') as value1,
  pcre_replace(s,'.*key2[=]\([a-zA-Z0-9-]*\),.*','\\1','') as value2,
  pcre_replace(s,'.*key3[=]\([a-zA-Z0-9-]*\).*','\\1','') as value3
  from d;

A more general solution in Python could be similar to the following R
solution:

## use DBI and MonetDB.R packages
library(DBI)
library(MonetDB.R)
## create a connection
con <- dbConnect(MonetDB.R::MonetDB.R(), dbname="testt")
## example data
dbWriteTable(con,"d",
             data.frame(s=c('key1=value1, key2=value2, key3=value3',
                            'key1=value4, key2=value5, key3=value6'),
                        stringsAsFactors=FALSE), overwrite=TRUE)
## utility function
is.na(dbExecute(con, "create or replace function pcre_replace(origin
string, pat string, repl string, flags string) returns string external
name pcre.replace"))
## function to split a column and create a new table
splitColumn <- function(con, tableName, columnName, newTableName) {
    ## read the first row
    d1 <- dbGetQuery(con, sprintf("select %s from %s limit 1",
columnName, tableName))
    ## get the number of splits
    n <- length(strsplit(d1$s[1],", ")[[1]])
    ## get the key
    key <- gsub("^(.*)1=.*","\\1",d1$s[1])
    ## create the regexs
    regex <- paste0(".*",key,1:n,"=\\([a-zA-Z0-9_-]*\\).*")
    ## create the new table
    dbExecute(con, sprintf("drop table if exists %s", newTableName))
    stmt <- sprintf("create table %s as select %s from %s",
                    newTableName,
                    paste0(sprintf("pcre_replace(%s, '%s','\\\\1','') as
value%i",
                                   columnName, regex, 1:n), collapse=", "),
                    tableName)
    is.na(dbExecute(con, stmt))
}
splitColumn(con, "d", "s", "d2")
## check the new table
print(dbGetQuery(con, "select * from d2"))
## clean up
dbDisconnect(con)

I hope that this is helpful.

-- Mark

On 06/14/2018 04:41 PM, Doherty, Adam wrote:
...
Hi Stefan:
Thanks for the reply.
I created a UDF which returns a table, however when I try to run the function I get the following error -
Error: Embedded Python is enabled but an error was thrown during initialization.
SQLState:  PY000
ErrorCode: 0
I have Python 2.7.5 installed on CentOS 7.4
Adam
-----Original Message-----
From: users-list [mailto:users-list-bounces+adam.doherty=esso.ca@monetdb.org] On Behalf Of Stefan Manegold
Sent: Thursday, June 14, 2018 02:39
To: Communication channel for MonetDB users 
Subject: Re: pymonetdb or Python UDF
Hi Adam,
if you know how to do it (efficiently) in Python,
it's merely a matter of wrapping that Python code 
into a suitable Python UDF;
cf., https://www.monetdb.org/blog/embedded-pythonnumpy-monetdb
Alternatively, you could consider using a (JIT-compiled) C(/C++) UDF;
cf., https://www.monetdb.org/node/437
Best,
Stefan
----- On Jun 13, 2018, at 6:31 PM, Doherty, Adam adam.doherty@esso.ca wrote:
...
Hello all:
This is more of curiosity but I have data in my events table currently trapped
in a nvarchar field.
I am wondering if I can build a table and do execute a select into with a python
UDF pulling data from the existing fields plus the values of my nvarchar field.
My string data has the following format –
“key1=value, key2=val, key3=value”…
I want to break out the data first by splitting on the , and then the =
Keys would be the columns in the table.
I can do this with a python script and pymonetdb but a UDF would be faster.
Adam Doherty
Process Control Systems Administrator
Kearl Technical, Imperial
505 Quarry Park Blvd SE
W3B.224
Calgary, AB T2C 5N1
adam.doherty@esso.ca P 587.476.4640 M 403.869.1636
imperialoil.ca | Twitter | YouTube
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list

Re: pymonetdb or Python UDF

Mark Clements