Re: case performance

8 Oct 2019

      Your last reply, Roberto, makes me wonder if it would be an idea to
implement a scalar capi as well (for now is bat only). Then for sure the
loop will be entirely done in C.

BAT operations are faster for most of the operations, but when formulas
include multiple conditional evaluations, the parallelism and vectorization
advantage is clearly lost and loop × scalar might be the winner, don't you
think?

On Mon, Oct 7, 2019, 18:39 Roberto Cornacchia 
wrote:
...
Just to mention, you can also force a tuple-at-the-time evaluation in pure
SQL:
-- BEGIN
declare cs boolean;
set cs=true;
START TRANSACTION;
CREATE OR REPLACE FUNCTION mycontains(s STRING, p STRING, casesensitive
boolean)
RETURNS BOOLEAN
BEGIN
  IF casesensitive
  THEN RETURN SELECT s like '%'||p||'%';
  ELSE RETURN SELECT s ilike '%'||p||'%';
  END IF;
END;
CREATE TABLE t(s STRING);
INSERT INTO t VALUES ('apple'),('pear'),('banana'),('orange');
explain select * from t where mycontains(s,'an',cs);;
-- END
If you look at this explain, you'll see it makes a loop that calls the
mycontains() function. This does avoid evaluating both branches. However
the explicit interpreted loop won't make it very fast, perhaps see on your
data. (Note: when the function is simpler, the loop is actually done in C)
On Mon, 7 Oct 2019 at 16:53, Daniel Zvinca 
wrote:
...
I wasn't sure if this question should be posted here or on regular user
list, considering the way statements are generated and possible connections
with internal functionality.
I understand the explanations. I should have checked the explain feature
myself. I think that I expected the optimizer to notice a constant
expression and evaluate it before choosing which real columnar operation
needs to be performed (obviously only one). Instead all members are
evaluated, the whole explain statement is self-explanatory, indeed. (a
delayed bat evaluation mechanism would have helped, I guess, for this case)
I use capi intensively, but for string columns the capi memory management
can be a serious limitation for large sets.
Anyway, your answer is very helpful for me, it makes me consider
adjusting my approach from a straight translation into one SQL statement
into using an intermediary language that would build the optimal statement
instead of stretching the SQL to get what I need.
Thank you,
On Mon, Oct 7, 2019 at 4:44 PM Sjoerd Mullender 
wrote:
...
On 07/10/2019 14.33, Roberto Cornacchia wrote:
...
Hi Daniel,
[...]
...
In this case, you would get better performance by implementing a BAT
function (not a function that works on a single value, but on a column
of values).
You would write that in C using the GDK api
(
https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/UserDefinedFuncti...
),
and evaluate the condition i the loop, per value.
However, this requires you to recompile MonetDB to include your new
function.
Recompilation of MonetDB should not be necessary.  See [1].
[1] https://www.monetdb.org/hg/MonetDB-extend/
--
Sjoerd Mullender
_______________________________________________
developers-list mailing list
developers-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/developers-list
_______________________________________________
developers-list mailing list
developers-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/developers-list
_______________________________________________
developers-list mailing list
developers-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/developers-list