Re: case performance

7 Oct 2019

      Hi Daniel,

[ the Users mailing list is probably more suited for your question ]

The main reason why your first approach is slower is that MonetDB is
column-oriented.
It doesn't evaluate your condition per input tuple, but per input column.

See the EXPLAIN of your example, using one of my existing databases (TABLE=
spinque.info, FIELD=attribute, PATTERN=lock):

|     C_9:bat[:oid] := sql.tid(X_8:int, "spinque":str, "info":str);

                                                            |
|     X_12:bat[:str] := sql.bind(X_8:int, "spinque":str, "info":str,
"attribute":str, 0:int);
                                                                    |
|     (X_17:bat[:oid], X_18:bat[:str]) := sql.bind(X_8:int, "spinque":str,
"info":str, "attribute":str, 2:int);
                                                              |
|     X_15:bat[:str] := sql.bind(X_8:int, "spinque":str, "info":str,
"attribute":str, 1:int);
                                                                    |
|     X_21:bat[:str] := sql.projectdelta(C_9:bat[:oid], X_12:bat[:str],
X_17:bat[:oid], X_18:bat[:str], X_15:bat[:str]);
                                                                 |
|     X_26:bat[:bit] := algebra.project(X_21:bat[:str], false:bit);

                                                            |
Just binding input columns to variables

|     X_32:bat[:bit] := batcalc.ifthenelse(X_26:bat[:bit], false:bit,
true:bit);
Evaluating your first constant condition (user_case_sensitive_pref=1)

|     X_36:bat[:bit] := batalgebra.like(X_21:bat[:str], "%lock%":str);

                                                             |
Evaluating the "then" branch

|     X_40:bat[:bit] := batalgebra.ilike(X_21:bat[:str], "%lock%":str);

                                                            |
Evaluating the "else" branch

|     X_42:bat[:bit] := batcalc.ifthenelse(X_32:bat[:bit], X_36:bat[:bit],
X_40:bat[:bit]);
                                                              |
Evaluating the condition. This produces a boolean column with either the
value from the "then" branch or the value from the "else" branch

|     C_45:bat[:oid] := algebra.thetaselect(X_42:bat[:bit], true:bit,
"==":str);
                                                                   |
Evaluating where, in the original column, the condition is satisfied

|     X_47:bat[:str] := algebra.projection(C_45:bat[:oid], X_21:bat[:str]);

                                                            |
Fetching the values, in the original column, where the condition is
satisfied

|     sql.resultSet(X_59:bat[:str], X_61:bat[:str], X_63:bat[:str],
X_65:bat[:int], X_67:bat[:int], X_47:bat[:str]);
                                                                     |

This is really an overkill for your statement. Firstly, because it always
evaluates fully both the output branches of your condition. Secondly,
because of the overhead needed for this approach.
Without a specialised "contains" UDF, this is how MonetDB handles it in a
generic way. It is the price to pay to get the benefits that column
decomposition brings in other cases.

In this case, you would get better performance by implementing a BAT
function (not a function that works on a single value, but on a column of
values).
You would write that in C using the GDK api (
https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/UserDefinedFuncti...),
and evaluate the condition i the loop, per value.
However, this requires you to recompile MonetDB to include your new
function.

You could also try the relatively new JIT C UDF mechanism, which allows you
to write your function in C but with a simpler API and without recompiling:
https://www.monetdb.org/blog/JIT_C_C%252B%252B_UDFs_in_MonetDB

Best regards,
Roberto

On Mon, 7 Oct 2019 at 13:17, Daniel Zvinca  wrote:
...
Hello,
I am exploring MonetDB queries performance that are generated based on
user parameters/preferences.
To be more specific the where clause is generated based on a user
interface mapping user input parameters straight into SQL statements.
See below one example of this transformation.
FIELD contains PATTERN (case sensitive/insensitive)
is transformed into
select ... from TABLE where
(case when (user_case_sensitive_pref=1) then ("FIELD" like '%PATTERN%')
else  ("FIELD" ilike '%PATTERN%')
end);
Performance of this in comparison with query generated from within the
code:
char * query;
if ( user_case_sensitive_pref )
  query = "select ... from TABLE where  \"FIELD\" like '%PATTERN%'";
else
  query = "select ... from TABLE where  \"FIELD\" ilike '%PATTERN%'";
query_execute( query);
is huge (second approach takes 0.12 sec, vs 6 sec for first one).
But it requires a far more complex implementation.
First I thought this has to do with user variables but even if the
statement is
select ... from TABLE where
(case when TRUE then ("FIELD" like '%PATTERN%')
else  ("FIELD" ilike '%PATTERN%')
end);
timing is similarly slow.
This narrows down to "case" performance issue, as far as I can see.
Am I wrong?
Thank you,
Dan
_______________________________________________
developers-list mailing list
developers-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/developers-list

Re: case performance

Roberto Cornacchia