Hi Martin,
Thanks for your response.
The working plan with a 199,999 row table is as follows:
+----------------------------------------------------------------+
| mal |
+================================================================+
| function user.s3_1{autoCommit=true}(A0:str):void; |
| X_16 := nil:bat[:oid,:dbl]; |
| X_19 := nil:bat[:oid,:oid]; |
| X_20 := nil:bat[:oid,:str]; |
| X_27 := nil:bat[:oid,:dbl]; |
| barrier X_60 := language.dataflow(); |
| X_3 := sql.mvc(); |
| X_4:bat[:oid,:oid] := sql.tid(X_3,"sys","test1"); |
| X_7 := sql.bind(X_3,"sys","test1","x",0); |
| (X_10,r1_10) := sql.bind(X_3,"sys","test1","x",2); |
| X_13 := sql.bind(X_3,"sys","test1","x",1); |
| X_15 := sql.delta(X_7,X_10,r1_10,X_13); |
| X_16 := algebra.leftfetchjoin(X_4,X_15); |
| X_17 := algebra.project(X_16,A0); |
| X_19 := algebra.subslice(X_17,0:wrd,0:wrd); |
| X_20 := algebra.leftfetchjoin(X_19,X_17); |
| X_21 := sql.bind(X_3,"sys","test1","y",0); |
| (X_23,r1_29) := sql.bind(X_3,"sys","test1","y",2); |
| X_25 := sql.bind(X_3,"sys","test1","y",1); |
| X_26 := sql.delta(X_21,X_23,r1_29,X_25); |
| X_27 := algebra.leftfetchjoin(X_4,X_26); |
| language.pass(X_17); |
| language.pass(X_4); |
| exit X_60; |
| X_55 := algebra.project(X_27,nil:dbl); |
| X_57 := batcalc./(X_16,X_27); |
| X_58 := batmmath.log(X_57); |
| X_55 := X_58; |
| X_28 := algebra.leftfetchjoin(X_19,X_55); |
| X_29 := sql.resultSet(2,1,X_20); |
| sql.rsColumn(X_29,".L1","L1","char",14,0,X_20); |
| sql.rsColumn(X_29,"sys.L1","my_udf_x","double",53,0,X_28); |
| X_41 := io.stdout(); |
| sql.exportResult(X_41,X_29); |
| end s3_1; |
+----------------------------------------------------------------+
35 tuples (0.964ms)
The non-working plan with a 200,000 row table is shown below:
+------------------------------------------------------------------------------------------+
| mal |
+==========================================================================================+
| function user.s4_1{autoCommit=true}(A0:str):void; |
| X_13 := nil:bat[:oid,:str]; |
| X_16 := nil:bat[:oid,:dbl]; |
| barrier X_91 := language.dataflow(); |
| X_3 := sql.mvc(); |
| X_47:bat[:oid,:oid] := sql.tid(X_3,"sys","test",0,2); |
| X_50:bat[:oid,:dbl] := sql.bind(X_3,"sys","test","x",0,0,2); |
| (X_52:bat[:oid,:oid] ,X_53:bat[:oid,:dbl] ) := sql.bind(X_3,"sys","test","x",2,0,2); |
| X_62 := sql.delta(X_50,X_52,X_53); |
| X_64 := algebra.leftfetchjoin(X_47,X_62); |
| X_66 := algebra.project(X_64,A0); |
| X_68 := algebra.subslice(X_66,0:wrd,0:wrd); |
| X_49:bat[:oid,:oid] := sql.tid(X_3,"sys","test",1,2); |
| X_51:bat[:oid,:dbl] := sql.bind(X_3,"sys","test","x",0,1,2); |
| (X_54:bat[:oid,:oid] ,X_55:bat[:oid,:dbl] ) := sql.bind(X_3,"sys","test","x",2,1,2); |
| X_9 := sql.bind(X_3,"sys","test","x",1); |
| X_63 := sql.delta(X_51,X_54,X_55,X_9); |
| X_65 := algebra.leftfetchjoin(X_49,X_63); |
| X_67 := algebra.project(X_65,A0); |
| X_69 := algebra.subslice(X_67,0:wrd,0:wrd); |
| X_85 := mat.packIncrement(X_68,2); |
| X_70 := mat.packIncrement(X_85,X_69); |
| X_12 := algebra.subslice(X_70,0:wrd,0:wrd); |
| X_72 := algebra.leftfetchjoin(X_68,X_66); |
| X_73 := algebra.leftfetchjoin(X_69,X_67); |
| X_87 := mat.packIncrement(X_72,2); |
| X_71 := mat.packIncrement(X_87,X_73); |
| X_13 := algebra.leftfetchjoin(X_12,X_71); |
| X_56:bat[:oid,:dbl] := sql.bind(X_3,"sys","test","y",0,0,2); |
| (X_58:bat[:oid,:oid] ,X_59:bat[:oid,:dbl] ) := sql.bind(X_3,"sys","test","y",2,0,2); |
| X_74 := sql.delta(X_56,X_58,X_59); |
| X_76 := algebra.leftfetchjoin(X_47,X_74); |
| X_78 := algebra.project(X_76,nil:dbl); |
| X_83 := algebra.leftfetchjoin(X_68,X_78); |
| X_57:bat[:oid,:dbl] := sql.bind(X_3,"sys","test","y",0,1,2); |
| (X_60:bat[:oid,:oid] ,X_61:bat[:oid,:dbl] ) := sql.bind(X_3,"sys","test","y",2,1,2); |
| X_15 := sql.bind(X_3,"sys","test","y",1); |
| X_75 := sql.delta(X_57,X_60,X_61,X_15); |
| X_77 := algebra.leftfetchjoin(X_49,X_75); |
| X_79 := algebra.project(X_77,nil:dbl); |
| X_84 := algebra.leftfetchjoin(X_69,X_79); |
| X_89 := mat.packIncrement(X_83,2); |
| X_82 := mat.packIncrement(X_89,X_84); |
| X_16 := algebra.leftfetchjoin(X_12,X_82); |
| language.pass(X_66); |
| language.pass(X_67); |
| language.pass(X_47); |
| language.pass(X_68); |
| language.pass(X_49); |
| language.pass(X_69); |
| language.pass(X_12); |
| exit X_91; |
| X_86:bat[:oid,:oid] := nil:bat[:oid,:oid]; |
| X_88:bat[:oid,:str] := nil:bat[:oid,:str]; |
| X_90:bat[:oid,:dbl] := nil:bat[:oid,:dbl]; |
| X_17 := sql.resultSet(2,1,X_13); |
| sql.rsColumn(X_17,".L1","L1","char",14,0,X_13); |
| sql.rsColumn(X_17,"sys.L1","my_udf_x","double",53,0,X_16); |
| X_29 := io.stdout(); |
| sql.exportResult(X_29,X_17); |
| end s4_1; |
+------------------------------------------------------------------------------------------+
I hope this sheds some light on the issue, I don't have any background in MAL to interpret these.
Best regards,
Alastair
________________________________________
From: users-list
Hi Stefan,
Thanks for you response.
I am not actually sure because I changed the test data script to output constant values of 0.1 and 0.2 for each row.
The problem still remains with these values in the test data, at 200000 records, I get a null.
If it is any clue, running mclient with --interactive shows that the time taken for this statement drops when going from 199,999 -> 200,000 as it is have decided to execute a different plan.
Best regards,
Alastair ________________________________________ From: users-list
on behalf of Stefan Manegold Sent: 13 November 2013 13:14 To: Communication channel for MonetDB users Subject: Re: User Defined Functions returning null what are the x & y values of your the last of your 200000 tuples, i.e., the one omitted if you only load the first 199999 tuples?
----- Original Message -----
Hi all,
I am trying to evaluate the performance of MonetDB for an analytical workload.
I am using MonetDB-11.15.17 built from source on Ubuntu 13.04.
I have encountered a problem with user using defined functions in SQL which I have reduced to this test case. I appears that whenever I have over 200,000 rows in the table, the UDF returns null.
First create a test data set with the following python:
import random with open("test_data.csv","w") as f: for i in range(0,1000000): f.write("%s,%s\n" % (random.random(),random.random()))
Then I use the following to load the data, and try using a simple UDF in SQL.
CREATE TABLE TEST ( x float, y float );
COPY 200000 RECORDS INTO TEST from '/home/alastair/test_data.csv' USING DELIMITERS ',','\n','"' NULL AS '';
CREATE FUNCTION MY_UDF (x float, y float) RETURNS float BEGIN DECLARE ret float; set ret = LOG(x/y); RETURN ret; END;
SELECT 'udf alone',MY_UDF(0.1,0.2);
SELECT 'udf from table',MY_UDF(t.x,t.y) FROM TEST as t limit 1; -- this returns NULL with 200000 records
DROP FUNCTION MY_UDF; DROP TABLE TEST;
The value returned my MY_UDF is null with 200,000 records, changing to 199,999 records in COPY INTO produces the correct result.
If I use a UDF built in C, the problem does not exist at all.
I feel like I might have missed something silly but I can't think of anything.
Best regards,
Alastair
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list