Hi Daniel,
[ the Users mailing list is probably more suited for your question ]
The main reason why your first approach is slower is that MonetDB is column-oriented.
It doesn't evaluate your condition per input tuple, but per input column.
See the EXPLAIN of your example, using one of my existing databases (TABLE=
spinque.info, FIELD=attribute, PATTERN=lock):
| C_9:bat[:oid] := sql.tid(X_8:int, "spinque":str, "info":str); |
| X_12:bat[:str] := sql.bind(X_8:int, "spinque":str, "info":str, "attribute":str, 0:int); |
| (X_17:bat[:oid], X_18:bat[:str]) := sql.bind(X_8:int, "spinque":str, "info":str, "attribute":str, 2:int); |
| X_15:bat[:str] := sql.bind(X_8:int, "spinque":str, "info":str, "attribute":str, 1:int); |
| X_21:bat[:str] := sql.projectdelta(C_9:bat[:oid], X_12:bat[:str], X_17:bat[:oid], X_18:bat[:str], X_15:bat[:str]); |
| X_26:bat[:bit] := algebra.project(X_21:bat[:str], false:bit); |
Just binding input columns to variables
| X_32:bat[:bit] := batcalc.ifthenelse(X_26:bat[:bit], false:bit, true:bit); Evaluating your first constant condition (user_case_sensitive_pref=1)
| X_36:bat[:bit] := batalgebra.like(X_21:bat[:str], "%lock%":str); |
Evaluating the "then" branch
| X_40:bat[:bit] := batalgebra.ilike(X_21:bat[:str], "%lock%":str); |
Evaluating the "else" branch
| X_42:bat[:bit] := batcalc.ifthenelse(X_32:bat[:bit], X_36:bat[:bit], X_40:bat[:bit]); |
Evaluating the condition. This produces a boolean column with either the value from the "then" branch or the value from the "else" branch
| C_45:bat[:oid] := algebra.thetaselect(X_42:bat[:bit], true:bit, "==":str); |
Evaluating where, in the original column, the condition is satisfied
| X_47:bat[:str] := algebra.projection(C_45:bat[:oid], X_21:bat[:str]); |
Fetching the values, in the original column, where the condition is satisfied
| sql.resultSet(X_59:bat[:str], X_61:bat[:str], X_63:bat[:str], X_65:bat[:int], X_67:bat[:int], X_47:bat[:str]); |
This is really an overkill for your statement. Firstly, because it always evaluates fully both the output branches of your condition. Secondly, because of the overhead needed for this approach.
Without a specialised "contains" UDF, this is how MonetDB handles it in a generic way. It is the price to pay to get the benefits that column decomposition brings in other cases.
In this case, you would get better performance by implementing a BAT function (not a function that works on a single value, but on a column of values).
However, this requires you to recompile MonetDB to include your new function.
Best regards,
Roberto