Just to mention, you can also force a tuple-at-the-time evaluation in pure
SQL:
-- BEGIN
declare cs boolean;
set cs=true;
START TRANSACTION;
CREATE OR REPLACE FUNCTION mycontains(s STRING, p STRING, casesensitive
boolean)
RETURNS BOOLEAN
BEGIN
IF casesensitive
THEN RETURN SELECT s like '%'||p||'%';
ELSE RETURN SELECT s ilike '%'||p||'%';
END IF;
END;
CREATE TABLE t(s STRING);
INSERT INTO t VALUES ('apple'),('pear'),('banana'),('orange');
explain select * from t where mycontains(s,'an',cs);;
-- END
If you look at this explain, you'll see it makes a loop that calls the
mycontains() function. This does avoid evaluating both branches. However
the explicit interpreted loop won't make it very fast, perhaps see on your
data. (Note: when the function is simpler, the loop is actually done in C)
On Mon, 7 Oct 2019 at 16:53, Daniel Zvinca
I wasn't sure if this question should be posted here or on regular user list, considering the way statements are generated and possible connections with internal functionality.
I understand the explanations. I should have checked the explain feature myself. I think that I expected the optimizer to notice a constant expression and evaluate it before choosing which real columnar operation needs to be performed (obviously only one). Instead all members are evaluated, the whole explain statement is self-explanatory, indeed. (a delayed bat evaluation mechanism would have helped, I guess, for this case)
I use capi intensively, but for string columns the capi memory management can be a serious limitation for large sets.
Anyway, your answer is very helpful for me, it makes me consider adjusting my approach from a straight translation into one SQL statement into using an intermediary language that would build the optimal statement instead of stretching the SQL to get what I need.
Thank you,
On Mon, Oct 7, 2019 at 4:44 PM Sjoerd Mullender
wrote: On 07/10/2019 14.33, Roberto Cornacchia wrote:
Hi Daniel,
[...]
In this case, you would get better performance by implementing a BAT function (not a function that works on a single value, but on a column of values). You would write that in C using the GDK api ( https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/UserDefinedFuncti... ), and evaluate the condition i the loop, per value. However, this requires you to recompile MonetDB to include your new function.
Recompilation of MonetDB should not be necessary. See [1].
[1] https://www.monetdb.org/hg/MonetDB-extend/
-- Sjoerd Mullender
_______________________________________________ developers-list mailing list developers-list@monetdb.org https://www.monetdb.org/mailman/listinfo/developers-list
_______________________________________________ developers-list mailing list developers-list@monetdb.org https://www.monetdb.org/mailman/listinfo/developers-list