Hi, I have a weird behavior when i package some sql code in a function, the performance deteriorate. The same code running directly in select statement is blinking fast. ex: *select date_to_str("Expiry_Date",'%Y-%m-%d') from table*: exec time is 0.72s if i create a function: *CREATE FUNCTION DayBracket(pdate date) returns char(10)* *BEGIN* * return date_to_str(pdate,'%Y-%m-%d');* *end;* *grant execute on function DayBracket to public;* and execute *select DayBracket("Expiry_Date") from table*: exec time is 24s even when i set the limit to 1 its taking the same time, so i guess it should be something related to loading the function. The following function perform much better: *CREATE FUNCTION DayBracketOpt(pdate date) returns char(10)* *BEGIN* * return extract(year from pdate) || '-' || lpad(extract(month from pdate),2,'0') || '-' || lpad(dayofmonth(pdate),2,'0');* *end;* *grant execute on function DayBracketOpt to public;* I also tried to create a Python function: *CREATE FUNCTION pyDayBracket(pdate date) returns string* *LANGUAGE PYTHON {* * from time import strftime, strptime* * return [strftime("%Y-%m-%d", strptime(pdt,"%Y-%m-%d")) for pdt in pdate]* *};* it is taking too long and throwing an exception:buffer size mismatch PS: the pdate array type is detected in pythons as string not as date; Also how to delete a function with dependency, I need to replace some functions which are used in others functions and views. Any flags to turn off dependency check before dropping? Thank you.
Imad, we had the same problem (you can find our mail in list with
subject "Functions in Monetdb", date nov/22), and had no satisfactory
solution yet.
I think monetDb is not good for UDF yet... (though is great db engine!)
2016-12-21 10:40 GMT-03:00 imad hajj chahine
Hi,
I have a weird behavior when i package some sql code in a function, the performance deteriorate. The same code running directly in select statement is blinking fast. ex:
select date_to_str("Expiry_Date",'%Y-%m-%d') from table: exec time is 0.72s
if i create a function: CREATE FUNCTION DayBracket(pdate date) returns char(10) BEGIN return date_to_str(pdate,'%Y-%m-%d'); end; grant execute on function DayBracket to public;
and execute select DayBracket("Expiry_Date") from table: exec time is 24s
even when i set the limit to 1 its taking the same time, so i guess it should be something related to loading the function.
The following function perform much better: CREATE FUNCTION DayBracketOpt(pdate date) returns char(10) BEGIN return extract(year from pdate) || '-' || lpad(extract(month from pdate),2,'0') || '-' || lpad(dayofmonth(pdate),2,'0'); end; grant execute on function DayBracketOpt to public;
I also tried to create a Python function: CREATE FUNCTION pyDayBracket(pdate date) returns string LANGUAGE PYTHON { from time import strftime, strptime return [strftime("%Y-%m-%d", strptime(pdt,"%Y-%m-%d")) for pdt in pdate] }; it is taking too long and throwing an exception:buffer size mismatch PS: the pdate array type is detected in pythons as string not as date;
Also how to delete a function with dependency, I need to replace some functions which are used in others functions and views. Any flags to turn off dependency check before dropping?
Thank you.
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Hai
On 21 Dec 2016, at 15:26, Alberto Ferrari
wrote: Imad, we had the same problem (you can find our mail in list with subject "Functions in Monetdb", date nov/22), and had no satisfactory solution yet. I think monetDb is not good for UDF yet... (though is great db engine!)
Mostly, it’s because monetdb can’t automatically bulk execute user defined functions. So for UDFs, the overhead of a function call is multiplied by the number of values to process. While for built-in functions, they all have a bulk version, so that they are only called once with all values to process.
2016-12-21 10:40 GMT-03:00 imad hajj chahine
: Hi,
I have a weird behavior when i package some sql code in a function, the performance deteriorate. The same code running directly in select statement is blinking fast. ex:
select date_to_str("Expiry_Date",'%Y-%m-%d') from table: exec time is 0.72s
if i create a function: CREATE FUNCTION DayBracket(pdate date) returns char(10) BEGIN return date_to_str(pdate,'%Y-%m-%d'); end; grant execute on function DayBracket to public;
and execute select DayBracket("Expiry_Date") from table: exec time is 24s
even when i set the limit to 1 its taking the same time, so i guess it should be something related to loading the function.
Using LIMIT in the outer query doesn’t help, your UDF is till called with every value. The final tuple is only filtered after that.
The following function perform much better: CREATE FUNCTION DayBracketOpt(pdate date) returns char(10) BEGIN return extract(year from pdate) || '-' || lpad(extract(month from pdate),2,'0') || '-' || lpad(dayofmonth(pdate),2,'0'); end; grant execute on function DayBracketOpt to public;
Hmm, a bit unexpected. How much better is it? How many values are processed?
I also tried to create a Python function: CREATE FUNCTION pyDayBracket(pdate date) returns string LANGUAGE PYTHON { from time import strftime, strptime return [strftime("%Y-%m-%d", strptime(pdt,"%Y-%m-%d")) for pdt in pdate] }; it is taking too long and throwing an exception:buffer size mismatch PS: the pdate array type is detected in pythons as string not as date;
Also how to delete a function with dependency, I need to replace some functions which are used in others functions and views. Any flags to turn off dependency check before dropping?
I think you’ll have to drop the depending functions and views first. Regards, Jennie
Thank you.
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Hi Jennie,
Thank you for the answer, the unexpected version of the function take 1.1
sec to execute, the table contains 500k values.
You mentioned Bulk functions, I tried to achieve this with python with no
luck, can you please check my Python function syntax and let me know the
problem.
*CREATE FUNCTION pyDayBracket(pdate date) returns string*
*LANGUAGE PYTHON {*
* from time import strftime, strptime*
* return [strftime("%Y-%m-%d", strptime(pdt,"%Y-%m-%d")) for pdt in pdate]*
*};*
Throws a buffer size mismatch exception.
Also why python don't recognize the date type from monetdb?
Thank you.
On Wed, Dec 21, 2016 at 5:11 PM, Ying Zhang
Hai
On 21 Dec 2016, at 15:26, Alberto Ferrari
wrote: Imad, we had the same problem (you can find our mail in list with subject "Functions in Monetdb", date nov/22), and had no satisfactory solution yet. I think monetDb is not good for UDF yet... (though is great db engine!)
Mostly, it’s because monetdb can’t automatically bulk execute user defined functions. So for UDFs, the overhead of a function call is multiplied by the number of values to process. While for built-in functions, they all have a bulk version, so that they are only called once with all values to process.
2016-12-21 10:40 GMT-03:00 imad hajj chahine <
imad.hajj.chahine@gmail.com>:
Hi,
I have a weird behavior when i package some sql code in a function, the performance deteriorate. The same code running directly in select statement is blinking fast. ex:
select date_to_str("Expiry_Date",'%Y-%m-%d') from table: exec time is 0.72s
if i create a function: CREATE FUNCTION DayBracket(pdate date) returns char(10) BEGIN return date_to_str(pdate,'%Y-%m-%d'); end; grant execute on function DayBracket to public;
and execute select DayBracket("Expiry_Date") from table: exec time is 24s
even when i set the limit to 1 its taking the same time, so i guess it should be something related to loading the function.
Using LIMIT in the outer query doesn’t help, your UDF is till called with every value. The final tuple is only filtered after that.
The following function perform much better: CREATE FUNCTION DayBracketOpt(pdate date) returns char(10) BEGIN return extract(year from pdate) || '-' || lpad(extract(month from pdate),2,'0') || '-' || lpad(dayofmonth(pdate),2,'0'); end; grant execute on function DayBracketOpt to public;
Hmm, a bit unexpected. How much better is it? How many values are processed?
I also tried to create a Python function: CREATE FUNCTION pyDayBracket(pdate date) returns string LANGUAGE PYTHON { from time import strftime, strptime return [strftime("%Y-%m-%d", strptime(pdt,"%Y-%m-%d")) for pdt in pdate] }; it is taking too long and throwing an exception:buffer size mismatch PS: the pdate array type is detected in pythons as string not as date;
Also how to delete a function with dependency, I need to replace some functions which are used in others functions and views. Any flags to
turn
off dependency check before dropping?
I think you’ll have to drop the depending functions and views first.
Regards, Jennie
Thank you.
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Hey Imad,
Not sure what the problem is; your function works fine for me. Note that it won't be extremely efficient either, because you are using a loop in Python and hence calling a function for every value as well. Maybe the pandas version of this function would work better for you:
CREATE FUNCTION pydate(d DATE) RETURNS STRING LANGUAGE PYTHON {
import pandas as pd
return pd.to_datetime(d).strftime('%Y-%m-%d')
};
Note that if you just want to convert dates to strings you can use the following Python UDF as well (but then there's not much point in using a Python UDF at all):
CREATE FUNCTION pydate(d DATE) RETURNS STRING LANGUAGE PYTHON { return d };
I'm not sure what you mean with "Python doesn't recognize the date type from MonetDB". Dates aren't a fully supported type in Python UDFs; and any not fully supported type is converted to a string as a fallback. You can input and return dates from Python UDFs but they will be converted to/from strings when passed between MonetDB and Python.
Hope that helps,
Mark
----- Original Message -----
From: "imad hajj chahine"
Hai
On 21 Dec 2016, at 15:26, Alberto Ferrari
wrote: Imad, we had the same problem (you can find our mail in list with subject "Functions in Monetdb", date nov/22), and had no satisfactory solution yet. I think monetDb is not good for UDF yet... (though is great db engine!)
Mostly, it’s because monetdb can’t automatically bulk execute user defined functions. So for UDFs, the overhead of a function call is multiplied by the number of values to process. While for built-in functions, they all have a bulk version, so that they are only called once with all values to process.
2016-12-21 10:40 GMT-03:00 imad hajj chahine <
imad.hajj.chahine@gmail.com>:
Hi,
I have a weird behavior when i package some sql code in a function, the performance deteriorate. The same code running directly in select statement is blinking fast. ex:
select date_to_str("Expiry_Date",'%Y-%m-%d') from table: exec time is 0.72s
if i create a function: CREATE FUNCTION DayBracket(pdate date) returns char(10) BEGIN return date_to_str(pdate,'%Y-%m-%d'); end; grant execute on function DayBracket to public;
and execute select DayBracket("Expiry_Date") from table: exec time is 24s
even when i set the limit to 1 its taking the same time, so i guess it should be something related to loading the function.
Using LIMIT in the outer query doesn’t help, your UDF is till called with every value. The final tuple is only filtered after that.
The following function perform much better: CREATE FUNCTION DayBracketOpt(pdate date) returns char(10) BEGIN return extract(year from pdate) || '-' || lpad(extract(month from pdate),2,'0') || '-' || lpad(dayofmonth(pdate),2,'0'); end; grant execute on function DayBracketOpt to public;
Hmm, a bit unexpected. How much better is it? How many values are processed?
I also tried to create a Python function: CREATE FUNCTION pyDayBracket(pdate date) returns string LANGUAGE PYTHON { from time import strftime, strptime return [strftime("%Y-%m-%d", strptime(pdt,"%Y-%m-%d")) for pdt in pdate] }; it is taking too long and throwing an exception:buffer size mismatch PS: the pdate array type is detected in pythons as string not as date;
Also how to delete a function with dependency, I need to replace some functions which are used in others functions and views. Any flags to
turn
off dependency check before dropping?
I think you’ll have to drop the depending functions and views first.
Regards, Jennie
Thank you.
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
On 21 Dec 2016, at 16:24, imad hajj chahine
wrote: Hi Jennie,
Thank you for the answer, the unexpected version of the function take 1.1 sec to execute, the table contains 500k values.
You mean the faster version of the function took 1.1 sec? How long did the slower one take?
You mentioned Bulk functions, I tried to achieve this with python with no luck, can you please check my Python function syntax and let me know the problem.
I’m sorry that I’m not a python expert. But I saw Mark has already answered. Regards, Jennie
CREATE FUNCTION pyDayBracket(pdate date) returns string LANGUAGE PYTHON { from time import strftime, strptime return [strftime("%Y-%m-%d", strptime(pdt,"%Y-%m-%d")) for pdt in pdate] };
Throws a buffer size mismatch exception. Also why python don't recognize the date type from monetdb?
Thank you.
On Wed, Dec 21, 2016 at 5:11 PM, Ying Zhang
wrote: Hai On 21 Dec 2016, at 15:26, Alberto Ferrari
wrote: Imad, we had the same problem (you can find our mail in list with subject "Functions in Monetdb", date nov/22), and had no satisfactory solution yet. I think monetDb is not good for UDF yet... (though is great db engine!)
Mostly, it’s because monetdb can’t automatically bulk execute user defined functions. So for UDFs, the overhead of a function call is multiplied by the number of values to process. While for built-in functions, they all have a bulk version, so that they are only called once with all values to process.
2016-12-21 10:40 GMT-03:00 imad hajj chahine
: Hi,
I have a weird behavior when i package some sql code in a function, the performance deteriorate. The same code running directly in select statement is blinking fast. ex:
select date_to_str("Expiry_Date",'%Y-%m-%d') from table: exec time is 0.72s
if i create a function: CREATE FUNCTION DayBracket(pdate date) returns char(10) BEGIN return date_to_str(pdate,'%Y-%m-%d'); end; grant execute on function DayBracket to public;
and execute select DayBracket("Expiry_Date") from table: exec time is 24s
even when i set the limit to 1 its taking the same time, so i guess it should be something related to loading the function.
Using LIMIT in the outer query doesn’t help, your UDF is till called with every value. The final tuple is only filtered after that.
The following function perform much better: CREATE FUNCTION DayBracketOpt(pdate date) returns char(10) BEGIN return extract(year from pdate) || '-' || lpad(extract(month from pdate),2,'0') || '-' || lpad(dayofmonth(pdate),2,'0'); end; grant execute on function DayBracketOpt to public;
Hmm, a bit unexpected. How much better is it? How many values are processed?
I also tried to create a Python function: CREATE FUNCTION pyDayBracket(pdate date) returns string LANGUAGE PYTHON { from time import strftime, strptime return [strftime("%Y-%m-%d", strptime(pdt,"%Y-%m-%d")) for pdt in pdate] }; it is taking too long and throwing an exception:buffer size mismatch PS: the pdate array type is detected in pythons as string not as date;
Also how to delete a function with dependency, I need to replace some functions which are used in others functions and views. Any flags to turn off dependency check before dropping?
I think you’ll have to drop the depending functions and views first.
Regards, Jennie
Thank you.
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Thank you Mark and Jennie,
The purpose of this function is to create date brackets, like month and
quarter brackets not only to convert date to string.
I handled null values and now its working and returning Nat for null
values. Any idea on how to return null instead NaT to stay in the SQL
standards?
I also noticed that i cant aggregate over the python function, for instance
if i create a view with a column referring the Python UDF and i execute
select count(columnPythonUDF) from table1 an error occured.
Am I excepting too much from a Python UDF to behave like a normal SQL UDF,
as I didnt face this error when aggregating over an SQL UDF.
*CREATE FUNCTION pyDayBracket(d DATE) RETURNS STRING LANGUAGE PYTHON {*
* import pandas as pd*
* return pd.to_datetime(d, coerce=True).strftime('%Y-%m-%d')*
*};*
as for the execution time:
1. directly invoking str_to_date in select query took 0.73 sec
2. UDF with concat date parts tool 1.1 sec
3. UDF calling str_to_date took 24 sec
Thank you.
On Wed, Dec 21, 2016 at 6:17 PM, Ying Zhang
On 21 Dec 2016, at 16:24, imad hajj chahine
wrote: Hi Jennie,
Thank you for the answer, the unexpected version of the function take 1.1 sec to execute, the table contains 500k values.
You mean the faster version of the function took 1.1 sec? How long did the slower one take?
You mentioned Bulk functions, I tried to achieve this with python with
no luck, can you please check my Python function syntax and let me know the problem.
I’m sorry that I’m not a python expert. But I saw Mark has already answered.
Regards, Jennie
CREATE FUNCTION pyDayBracket(pdate date) returns string LANGUAGE PYTHON { from time import strftime, strptime return [strftime("%Y-%m-%d", strptime(pdt,"%Y-%m-%d")) for pdt in pdate] };
Throws a buffer size mismatch exception. Also why python don't recognize the date type from monetdb?
Thank you.
On Wed, Dec 21, 2016 at 5:11 PM, Ying Zhang
wrote: Hai On 21 Dec 2016, at 15:26, Alberto Ferrari
wrote: Imad, we had the same problem (you can find our mail in list with subject "Functions in Monetdb", date nov/22), and had no satisfactory solution yet. I think monetDb is not good for UDF yet... (though is great db engine!)
Mostly, it’s because monetdb can’t automatically bulk execute user defined functions. So for UDFs, the overhead of a function call is multiplied by the number of values to process. While for built-in functions, they all have a bulk version, so that they are only called once with all values to process.
2016-12-21 10:40 GMT-03:00 imad hajj chahine <
Hi,
I have a weird behavior when i package some sql code in a function,
imad.hajj.chahine@gmail.com>: the
performance deteriorate. The same code running directly in select statement is blinking fast. ex:
select date_to_str("Expiry_Date",'%Y-%m-%d') from table: exec time is 0.72s
if i create a function: CREATE FUNCTION DayBracket(pdate date) returns char(10) BEGIN return date_to_str(pdate,'%Y-%m-%d'); end; grant execute on function DayBracket to public;
and execute select DayBracket("Expiry_Date") from table: exec time is 24s
even when i set the limit to 1 its taking the same time, so i guess it should be something related to loading the function.
Using LIMIT in the outer query doesn’t help, your UDF is till called with every value. The final tuple is only filtered after that.
The following function perform much better: CREATE FUNCTION DayBracketOpt(pdate date) returns char(10) BEGIN return extract(year from pdate) || '-' || lpad(extract(month from pdate),2,'0') || '-' || lpad(dayofmonth(pdate),2,'0'); end; grant execute on function DayBracketOpt to public;
Hmm, a bit unexpected. How much better is it? How many values are processed?
I also tried to create a Python function: CREATE FUNCTION pyDayBracket(pdate date) returns string LANGUAGE PYTHON { from time import strftime, strptime return [strftime("%Y-%m-%d", strptime(pdt,"%Y-%m-%d")) for pdt in
pdate]
}; it is taking too long and throwing an exception:buffer size mismatch PS: the pdate array type is detected in pythons as string not as date;
Also how to delete a function with dependency, I need to replace some functions which are used in others functions and views. Any flags to turn off dependency check before dropping?
I think you’ll have to drop the depending functions and views first.
Regards, Jennie
Thank you.
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Hi Jennie, I also have the same problem with this date related function. *CREATE FUNCTION DateAdd(pdate date, pdays int) returns date* *BEGIN* * return select pdate + cast(pdays as interval day);* *end;* On Wed, Dec 21, 2016 at 7:45 PM, imad hajj chahine < imad.hajj.chahine@gmail.com> wrote:
Thank you Mark and Jennie,
The purpose of this function is to create date brackets, like month and quarter brackets not only to convert date to string. I handled null values and now its working and returning Nat for null values. Any idea on how to return null instead NaT to stay in the SQL standards? I also noticed that i cant aggregate over the python function, for instance if i create a view with a column referring the Python UDF and i execute select count(columnPythonUDF) from table1 an error occured. Am I excepting too much from a Python UDF to behave like a normal SQL UDF, as I didnt face this error when aggregating over an SQL UDF.
*CREATE FUNCTION pyDayBracket(d DATE) RETURNS STRING LANGUAGE PYTHON {* * import pandas as pd* * return pd.to_datetime(d, coerce=True).strftime('%Y-%m-%d')* *};*
as for the execution time:
1. directly invoking str_to_date in select query took 0.73 sec 2. UDF with concat date parts tool 1.1 sec 3. UDF calling str_to_date took 24 sec
Thank you.
On Wed, Dec 21, 2016 at 6:17 PM, Ying Zhang
wrote: On 21 Dec 2016, at 16:24, imad hajj chahine < imad.hajj.chahine@gmail.com> wrote:
Hi Jennie,
Thank you for the answer, the unexpected version of the function take 1.1 sec to execute, the table contains 500k values.
You mean the faster version of the function took 1.1 sec? How long did the slower one take?
You mentioned Bulk functions, I tried to achieve this with python with
no luck, can you please check my Python function syntax and let me know the problem.
I’m sorry that I’m not a python expert. But I saw Mark has already answered.
Regards, Jennie
CREATE FUNCTION pyDayBracket(pdate date) returns string LANGUAGE PYTHON { from time import strftime, strptime return [strftime("%Y-%m-%d", strptime(pdt,"%Y-%m-%d")) for pdt in pdate] };
Throws a buffer size mismatch exception. Also why python don't recognize the date type from monetdb?
Thank you.
On Wed, Dec 21, 2016 at 5:11 PM, Ying Zhang
wrote: Hai On 21 Dec 2016, at 15:26, Alberto Ferrari < aferrari@starconnecting.com> wrote:
Imad, we had the same problem (you can find our mail in list with subject "Functions in Monetdb", date nov/22), and had no satisfactory solution yet. I think monetDb is not good for UDF yet... (though is great db engine!)
Mostly, it’s because monetdb can’t automatically bulk execute user defined functions. So for UDFs, the overhead of a function call is multiplied by the number of values to process. While for built-in functions, they all have a bulk version, so that they are only called once with all values to process.
2016-12-21 10:40 GMT-03:00 imad hajj chahine <
Hi,
I have a weird behavior when i package some sql code in a function,
imad.hajj.chahine@gmail.com>: the
performance deteriorate. The same code running directly in select statement is blinking fast. ex:
select date_to_str("Expiry_Date",'%Y-%m-%d') from table: exec time is 0.72s
if i create a function: CREATE FUNCTION DayBracket(pdate date) returns char(10) BEGIN return date_to_str(pdate,'%Y-%m-%d'); end; grant execute on function DayBracket to public;
and execute select DayBracket("Expiry_Date") from table: exec time is 24s
even when i set the limit to 1 its taking the same time, so i guess it should be something related to loading the function.
Using LIMIT in the outer query doesn’t help, your UDF is till called with every value. The final tuple is only filtered after that.
The following function perform much better: CREATE FUNCTION DayBracketOpt(pdate date) returns char(10) BEGIN return extract(year from pdate) || '-' || lpad(extract(month from pdate),2,'0') || '-' || lpad(dayofmonth(pdate),2,'0'); end; grant execute on function DayBracketOpt to public;
Hmm, a bit unexpected. How much better is it? How many values are processed?
I also tried to create a Python function: CREATE FUNCTION pyDayBracket(pdate date) returns string LANGUAGE PYTHON { from time import strftime, strptime return [strftime("%Y-%m-%d", strptime(pdt,"%Y-%m-%d")) for pdt in
pdate]
}; it is taking too long and throwing an exception:buffer size mismatch PS: the pdate array type is detected in pythons as string not as date;
Also how to delete a function with dependency, I need to replace some functions which are used in others functions and views. Any flags to turn off dependency check before dropping?
I think you’ll have to drop the depending functions and views first.
Regards, Jennie
Thank you.
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
On 21 Dec 2016, at 18:45, imad hajj chahine
wrote: Thank you Mark and Jennie,
The purpose of this function is to create date brackets, like month and quarter brackets not only to convert date to string. I handled null values and now its working and returning Nat for null values. Any idea on how to return null instead NaT to stay in the SQL standards? I also noticed that i cant aggregate over the python function, for instance if i create a view with a column referring the Python UDF and i execute select count(columnPythonUDF) from table1 an error occured. Am I excepting too much from a Python UDF to behave like a normal SQL UDF, as I didnt face this error when aggregating over an SQL UDF.
CREATE FUNCTION pyDayBracket(d DATE) RETURNS STRING LANGUAGE PYTHON { import pandas as pd return pd.to_datetime(d, coerce=True).strftime('%Y-%m-%d') };
as for the execution time: • directly invoking str_to_date in select query took 0.73 sec • UDF with concat date parts tool 1.1 sec • UDF calling str_to_date took 24 sec
If you TRACE your queries, you shall be able to see where the time went. Also, the Tomograph tool (https://www.monetdb.org/Documentation/Manuals/MonetDB/Profiler/tomograph) might give you some explanations… Jennie
Thank you.
On Wed, Dec 21, 2016 at 6:17 PM, Ying Zhang
wrote: On 21 Dec 2016, at 16:24, imad hajj chahine
wrote: Hi Jennie,
Thank you for the answer, the unexpected version of the function take 1.1 sec to execute, the table contains 500k values.
You mean the faster version of the function took 1.1 sec? How long did the slower one take?
You mentioned Bulk functions, I tried to achieve this with python with no luck, can you please check my Python function syntax and let me know the problem.
I’m sorry that I’m not a python expert. But I saw Mark has already answered.
Regards, Jennie
CREATE FUNCTION pyDayBracket(pdate date) returns string LANGUAGE PYTHON { from time import strftime, strptime return [strftime("%Y-%m-%d", strptime(pdt,"%Y-%m-%d")) for pdt in pdate] };
Throws a buffer size mismatch exception. Also why python don't recognize the date type from monetdb?
Thank you.
On Wed, Dec 21, 2016 at 5:11 PM, Ying Zhang
wrote: Hai On 21 Dec 2016, at 15:26, Alberto Ferrari
wrote: Imad, we had the same problem (you can find our mail in list with subject "Functions in Monetdb", date nov/22), and had no satisfactory solution yet. I think monetDb is not good for UDF yet... (though is great db engine!)
Mostly, it’s because monetdb can’t automatically bulk execute user defined functions. So for UDFs, the overhead of a function call is multiplied by the number of values to process. While for built-in functions, they all have a bulk version, so that they are only called once with all values to process.
2016-12-21 10:40 GMT-03:00 imad hajj chahine
: Hi,
I have a weird behavior when i package some sql code in a function, the performance deteriorate. The same code running directly in select statement is blinking fast. ex:
select date_to_str("Expiry_Date",'%Y-%m-%d') from table: exec time is 0.72s
if i create a function: CREATE FUNCTION DayBracket(pdate date) returns char(10) BEGIN return date_to_str(pdate,'%Y-%m-%d'); end; grant execute on function DayBracket to public;
and execute select DayBracket("Expiry_Date") from table: exec time is 24s
even when i set the limit to 1 its taking the same time, so i guess it should be something related to loading the function.
Using LIMIT in the outer query doesn’t help, your UDF is till called with every value. The final tuple is only filtered after that.
The following function perform much better: CREATE FUNCTION DayBracketOpt(pdate date) returns char(10) BEGIN return extract(year from pdate) || '-' || lpad(extract(month from pdate),2,'0') || '-' || lpad(dayofmonth(pdate),2,'0'); end; grant execute on function DayBracketOpt to public;
Hmm, a bit unexpected. How much better is it? How many values are processed?
I also tried to create a Python function: CREATE FUNCTION pyDayBracket(pdate date) returns string LANGUAGE PYTHON { from time import strftime, strptime return [strftime("%Y-%m-%d", strptime(pdt,"%Y-%m-%d")) for pdt in pdate] }; it is taking too long and throwing an exception:buffer size mismatch PS: the pdate array type is detected in pythons as string not as date;
Also how to delete a function with dependency, I need to replace some functions which are used in others functions and views. Any flags to turn off dependency check before dropping?
I think you’ll have to drop the depending functions and views first.
Regards, Jennie
Thank you.
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Hi Jennie,
After examining the file /module/atoms/mtime.c, I understand why using the
extract version perform much better from the date_to_str version, as the
extract has a bulk version while the date_to_str doesn't.
Any reason why you skipped the creation of a bulk version from date_to_str?
Also the bulk version is missing for the dayofyear, dayofweek, weekofyear
functions.
Thank you.
On Wed, Dec 21, 2016 at 5:11 PM, Ying Zhang
Hai
On 21 Dec 2016, at 15:26, Alberto Ferrari
wrote: Imad, we had the same problem (you can find our mail in list with subject "Functions in Monetdb", date nov/22), and had no satisfactory solution yet. I think monetDb is not good for UDF yet... (though is great db engine!)
Mostly, it’s because monetdb can’t automatically bulk execute user defined functions. So for UDFs, the overhead of a function call is multiplied by the number of values to process. While for built-in functions, they all have a bulk version, so that they are only called once with all values to process.
2016-12-21 10:40 GMT-03:00 imad hajj chahine <
imad.hajj.chahine@gmail.com>:
Hi,
I have a weird behavior when i package some sql code in a function, the performance deteriorate. The same code running directly in select statement is blinking fast. ex:
select date_to_str("Expiry_Date",'%Y-%m-%d') from table: exec time is 0.72s
if i create a function: CREATE FUNCTION DayBracket(pdate date) returns char(10) BEGIN return date_to_str(pdate,'%Y-%m-%d'); end; grant execute on function DayBracket to public;
and execute select DayBracket("Expiry_Date") from table: exec time is 24s
even when i set the limit to 1 its taking the same time, so i guess it should be something related to loading the function.
Using LIMIT in the outer query doesn’t help, your UDF is till called with every value. The final tuple is only filtered after that.
The following function perform much better: CREATE FUNCTION DayBracketOpt(pdate date) returns char(10) BEGIN return extract(year from pdate) || '-' || lpad(extract(month from pdate),2,'0') || '-' || lpad(dayofmonth(pdate),2,'0'); end; grant execute on function DayBracketOpt to public;
Hmm, a bit unexpected. How much better is it? How many values are processed?
I also tried to create a Python function: CREATE FUNCTION pyDayBracket(pdate date) returns string LANGUAGE PYTHON { from time import strftime, strptime return [strftime("%Y-%m-%d", strptime(pdt,"%Y-%m-%d")) for pdt in pdate] }; it is taking too long and throwing an exception:buffer size mismatch PS: the pdate array type is detected in pythons as string not as date;
Also how to delete a function with dependency, I need to replace some functions which are used in others functions and views. Any flags to
turn
off dependency check before dropping?
I think you’ll have to drop the depending functions and views first.
Regards, Jennie
Thank you.
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
AFAIK, a lacking of time is the main/only reason. Whenever possible, we gradually add the missing bulk impl. for a function. Sometime ago (the first commit was on Dec 17, 2013), we added something called Manifold, which will try to automatically generate a bulk version for a function, if it doesn’t have an explicitly implemented bulk version. However, this may not be possible for every function. Moreover, the generated bulk version may not be as efficient as an explicit implementation.
On 27 Dec 2016, at 19:49, imad hajj chahine
wrote: Hi Jennie,
After examining the file /module/atoms/mtime.c, I understand why using the extract version perform much better from the date_to_str version, as the extract has a bulk version while the date_to_str doesn't. Any reason why you skipped the creation of a bulk version from date_to_str? Also the bulk version is missing for the dayofyear, dayofweek, weekofyear functions.
Thank you.
On Wed, Dec 21, 2016 at 5:11 PM, Ying Zhang
wrote: Hai On 21 Dec 2016, at 15:26, Alberto Ferrari
wrote: Imad, we had the same problem (you can find our mail in list with subject "Functions in Monetdb", date nov/22), and had no satisfactory solution yet. I think monetDb is not good for UDF yet... (though is great db engine!)
Mostly, it’s because monetdb can’t automatically bulk execute user defined functions. So for UDFs, the overhead of a function call is multiplied by the number of values to process. While for built-in functions, they all have a bulk version, so that they are only called once with all values to process.
2016-12-21 10:40 GMT-03:00 imad hajj chahine
: Hi,
I have a weird behavior when i package some sql code in a function, the performance deteriorate. The same code running directly in select statement is blinking fast. ex:
select date_to_str("Expiry_Date",'%Y-%m-%d') from table: exec time is 0.72s
if i create a function: CREATE FUNCTION DayBracket(pdate date) returns char(10) BEGIN return date_to_str(pdate,'%Y-%m-%d'); end; grant execute on function DayBracket to public;
and execute select DayBracket("Expiry_Date") from table: exec time is 24s
even when i set the limit to 1 its taking the same time, so i guess it should be something related to loading the function.
Using LIMIT in the outer query doesn’t help, your UDF is till called with every value. The final tuple is only filtered after that.
The following function perform much better: CREATE FUNCTION DayBracketOpt(pdate date) returns char(10) BEGIN return extract(year from pdate) || '-' || lpad(extract(month from pdate),2,'0') || '-' || lpad(dayofmonth(pdate),2,'0'); end; grant execute on function DayBracketOpt to public;
Hmm, a bit unexpected. How much better is it? How many values are processed?
I also tried to create a Python function: CREATE FUNCTION pyDayBracket(pdate date) returns string LANGUAGE PYTHON { from time import strftime, strptime return [strftime("%Y-%m-%d", strptime(pdt,"%Y-%m-%d")) for pdt in pdate] }; it is taking too long and throwing an exception:buffer size mismatch PS: the pdate array type is detected in pythons as string not as date;
Also how to delete a function with dependency, I need to replace some functions which are used in others functions and views. Any flags to turn off dependency check before dropping?
I think you’ll have to drop the depending functions and views first.
Regards, Jennie
Thank you.
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Thank you Alberto,
In my case i am using the same syntax as in a select query, so there must
be something other the calculation in the function itself.
As martin proposed to write a C or Python language, I tried this path with
Python with no luck.
Checking the query trace:
120633 profiler.starttrace();
2 X_32=0@0:void := querylog.define("trace\nselect
db.daybracket(\"Expiry_Date\")from \"db\".\"table1\" limit
1\n;":str,"default_pipe":str,93:int);
7 X_1=0:int := sql.mvc();
13 X_22=
Imad, we had the same problem (you can find our mail in list with subject "Functions in Monetdb", date nov/22), and had no satisfactory solution yet. I think monetDb is not good for UDF yet... (though is great db engine!)
2016-12-21 10:40 GMT-03:00 imad hajj chahine
: Hi,
I have a weird behavior when i package some sql code in a function, the performance deteriorate. The same code running directly in select statement is blinking fast. ex:
select date_to_str("Expiry_Date",'%Y-%m-%d') from table: exec time is 0.72s
if i create a function: CREATE FUNCTION DayBracket(pdate date) returns char(10) BEGIN return date_to_str(pdate,'%Y-%m-%d'); end; grant execute on function DayBracket to public;
and execute select DayBracket("Expiry_Date") from table: exec time is 24s
even when i set the limit to 1 its taking the same time, so i guess it should be something related to loading the function.
The following function perform much better: CREATE FUNCTION DayBracketOpt(pdate date) returns char(10) BEGIN return extract(year from pdate) || '-' || lpad(extract(month from pdate),2,'0') || '-' || lpad(dayofmonth(pdate),2,'0'); end; grant execute on function DayBracketOpt to public;
I also tried to create a Python function: CREATE FUNCTION pyDayBracket(pdate date) returns string LANGUAGE PYTHON { from time import strftime, strptime return [strftime("%Y-%m-%d", strptime(pdt,"%Y-%m-%d")) for pdt in pdate] }; it is taking too long and throwing an exception:buffer size mismatch PS: the pdate array type is detected in pythons as string not as date;
Also how to delete a function with dependency, I need to replace some functions which are used in others functions and views. Any flags to turn off dependency check before dropping?
Thank you.
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
participants (4)
-
Alberto Ferrari
-
imad hajj chahine
-
Mark Raasveldt
-
Ying Zhang