Hi Daniel,
[ the Users mailing list is probably more suited for your question ]
The main reason why your first approach is slower is that MonetDB is
column-oriented.
It doesn't evaluate your condition per input tuple, but per input column.
See the EXPLAIN of your example, using one of my existing databases (TABLE=
spinque.info, FIELD=attribute, PATTERN=lock):
| C_9:bat[:oid] := sql.tid(X_8:int, "spinque":str, "info":str);
|
| X_12:bat[:str] := sql.bind(X_8:int, "spinque":str, "info":str,
"attribute":str, 0:int);
|
| (X_17:bat[:oid], X_18:bat[:str]) := sql.bind(X_8:int, "spinque":str,
"info":str, "attribute":str, 2:int);
|
| X_15:bat[:str] := sql.bind(X_8:int, "spinque":str, "info":str,
"attribute":str, 1:int);
|
| X_21:bat[:str] := sql.projectdelta(C_9:bat[:oid], X_12:bat[:str],
X_17:bat[:oid], X_18:bat[:str], X_15:bat[:str]);
|
| X_26:bat[:bit] := algebra.project(X_21:bat[:str], false:bit);
|
Just binding input columns to variables
| X_32:bat[:bit] := batcalc.ifthenelse(X_26:bat[:bit], false:bit,
true:bit);
Evaluating your first constant condition (user_case_sensitive_pref=1)
| X_36:bat[:bit] := batalgebra.like(X_21:bat[:str], "%lock%":str);
|
Evaluating the "then" branch
| X_40:bat[:bit] := batalgebra.ilike(X_21:bat[:str], "%lock%":str);
|
Evaluating the "else" branch
| X_42:bat[:bit] := batcalc.ifthenelse(X_32:bat[:bit], X_36:bat[:bit],
X_40:bat[:bit]);
|
Evaluating the condition. This produces a boolean column with either the
value from the "then" branch or the value from the "else" branch
| C_45:bat[:oid] := algebra.thetaselect(X_42:bat[:bit], true:bit,
"==":str);
|
Evaluating where, in the original column, the condition is satisfied
| X_47:bat[:str] := algebra.projection(C_45:bat[:oid], X_21:bat[:str]);
|
Fetching the values, in the original column, where the condition is
satisfied
| sql.resultSet(X_59:bat[:str], X_61:bat[:str], X_63:bat[:str],
X_65:bat[:int], X_67:bat[:int], X_47:bat[:str]);
|
This is really an overkill for your statement. Firstly, because it always
evaluates fully both the output branches of your condition. Secondly,
because of the overhead needed for this approach.
Without a specialised "contains" UDF, this is how MonetDB handles it in a
generic way. It is the price to pay to get the benefits that column
decomposition brings in other cases.
In this case, you would get better performance by implementing a BAT
function (not a function that works on a single value, but on a column of
values).
You would write that in C using the GDK api (
https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/UserDefinedFuncti...),
and evaluate the condition i the loop, per value.
However, this requires you to recompile MonetDB to include your new
function.
You could also try the relatively new JIT C UDF mechanism, which allows you
to write your function in C but with a simpler API and without recompiling:
https://www.monetdb.org/blog/JIT_C_C%252B%252B_UDFs_in_MonetDB
Best regards,
Roberto
On Mon, 7 Oct 2019 at 13:17, Daniel Zvinca
Hello,
I am exploring MonetDB queries performance that are generated based on user parameters/preferences. To be more specific the where clause is generated based on a user interface mapping user input parameters straight into SQL statements. See below one example of this transformation.
FIELD contains PATTERN (case sensitive/insensitive)
is transformed into
select ... from TABLE where (case when (user_case_sensitive_pref=1) then ("FIELD" like '%PATTERN%') else ("FIELD" ilike '%PATTERN%') end);
Performance of this in comparison with query generated from within the code:
char * query; if ( user_case_sensitive_pref ) query = "select ... from TABLE where \"FIELD\" like '%PATTERN%'"; else query = "select ... from TABLE where \"FIELD\" ilike '%PATTERN%'";
query_execute( query);
is huge (second approach takes 0.12 sec, vs 6 sec for first one). But it requires a far more complex implementation.
First I thought this has to do with user variables but even if the statement is
select ... from TABLE where (case when TRUE then ("FIELD" like '%PATTERN%') else ("FIELD" ilike '%PATTERN%') end);
timing is similarly slow.
This narrows down to "case" performance issue, as far as I can see.
Am I wrong?
Thank you, Dan
_______________________________________________ developers-list mailing list developers-list@monetdb.org https://www.monetdb.org/mailman/listinfo/developers-list