I personally would put it the other way around. The scalar version should in principle always be there, as it guarantees that every query works.
Then you look at optimizing performance.
1) Leave it as is and let it be wrapped in a MAL loop.
2) If it is a C function, it will likely be wrapped in a C loop by manifold.
3) For best performance, you may want to implement the loop yourself with a BAT implementation.
Which option to choose depends on the nature of the function.
If every scalar call requires costly initialisations or allocations, then the first two options can be very slow. In that case you want to choose option 3 and do all initialisations and allocations outside the loop.
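To make that concrete, here is a minimal sketch in plain C. It is not MonetDB code: the arrays stand in for BATs, the names (build_table, count_in_set, count_in_set_bulk) are hypothetical, and I assume the character set is the same for the whole column. The scalar version rebuilds its lookup table on every call; the bulk version builds it once, outside the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Costly per-call setup: build a 256-entry membership table for `set`. */
static void build_table(unsigned char table[256], const char *set)
{
    memset(table, 0, 256);
    for (const unsigned char *p = (const unsigned char *)set; *p; p++)
        table[*p] = 1;
}

/* Scalar version (options 1 and 2): the table is rebuilt on every call. */
size_t count_in_set(const char *s, const char *set)
{
    unsigned char table[256];
    size_t n = 0;

    build_table(table, set);
    for (const unsigned char *p = (const unsigned char *)s; *p; p++)
        n += table[*p];
    return n;
}

/* Bulk version (option 3): the table is built once, outside the loop.
 * Assumes `set` is constant for all n input strings. */
void count_in_set_bulk(const char *const *strs, size_t n,
                       const char *set, size_t *out)
{
    unsigned char table[256];

    assert(n == 0 || (strs != NULL && out != NULL));
    build_table(table, set);              /* hoisted initialisation */
    for (size_t i = 0; i < n; i++) {
        size_t c = 0;
        for (const unsigned char *p = (const unsigned char *)strs[i]; *p; p++)
            c += table[*p];
        out[i] = c;
    }
}
```

With n tuples, the scalar route pays for build_table() n times and the bulk route pays for it once. The more expensive the setup is relative to the per-tuple work, the more option 3 wins.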
Your function is essentially a tight loop around a strstr() call, so in principle I don't expect much difference between 2) and 3).
There is a catch, though. You have a boolean parameter to control case sensitivity.
So in case of a BAT implementation, you have 3 input columns (*one of them for the boolean, even if it is constant*) and your loop looks like this (pseudocode):
loop(string,pattern,cs)
    if (cs)
        strstr(string,pattern);
    else
        strcasestr(string,pattern);
Conditions inside loops are not your best friends.
Given that in most use cases your boolean column will be constant (either all true or all false), you can optimize this by doing:
if (cs.sorted && cs.revsorted) /* cs is constant */ {
    if (cs[0]) {
        loop(string,pattern,cs)
            strstr(string,pattern);
    } else {
        loop(string,pattern,cs)
            strcasestr(string,pattern);
    }
} else { /* cs not constant, check at every tuple */
    loop(string,pattern,cs)
        if (cs)
            strstr(string,pattern);
        else
            strcasestr(string,pattern);
}
The point here is that option 3) is the only one that allows you to play this way.
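For reference, the same dispatch as a self-contained C sketch. Again this is not MonetDB code: plain arrays stand in for the three BATs, contains_bulk and contains_nocase are hypothetical names, and contains_nocase is a portable ASCII-only stand-in for strcasestr(), which is not standard C. The one-pass scan over the flag column stands in for the sorted/revsorted property check in the pseudocode.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for strcasestr(): true if pat occurs in s, ignoring ASCII case. */
static bool contains_nocase(const char *s, const char *pat)
{
    size_t plen = strlen(pat);

    for (;; s++) {
        size_t i = 0;
        while (i < plen && s[i] != '\0' &&
               tolower((unsigned char)s[i]) == tolower((unsigned char)pat[i]))
            i++;
        if (i == plen)
            return true;        /* whole pattern matched at this offset */
        if (*s == '\0')
            return false;       /* reached the end without a match */
    }
}

/* out[i] is true if pat[i] occurs in str[i]; cs[i] selects case sensitivity. */
void contains_bulk(const char *const *str, const char *const *pat,
                   const bool *cs, size_t n, bool *out)
{
    assert(n == 0 || (str && pat && cs && out));

    /* Stand-in for the "sorted && revsorted" test: is the flag column constant? */
    bool constant = true;
    for (size_t i = 1; i < n; i++)
        if (cs[i] != cs[0]) { constant = false; break; }

    if (constant && n > 0 && cs[0]) {
        /* all case sensitive: tight loop, no branch on cs */
        for (size_t i = 0; i < n; i++)
            out[i] = strstr(str[i], pat[i]) != NULL;
    } else if (constant && n > 0) {
        /* all case insensitive: tight loop, no branch on cs */
        for (size_t i = 0; i < n; i++)
            out[i] = contains_nocase(str[i], pat[i]);
    } else {
        /* cs not constant (or empty input): check at every tuple */
        for (size_t i = 0; i < n; i++)
            out[i] = cs[i] ? strstr(str[i], pat[i]) != NULL
                           : contains_nocase(str[i], pat[i]);
    }
}
```

Note that the explicit scan costs one extra pass over the flag column. A property that is already known for the column, as in the pseudocode, makes that test essentially free.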
I normally use option 3) whenever I see a chance to get some work out of the loop. In all other cases there is no real need.
> BAT operations are faster for most of the operations, but when formulas include multiple conditional evaluations, the parallelism and vectorization advantage is clearly lost and loop × scalar might be the winner, don't you think?
Notice that when you manage to get back to a very tight loop, you get your vectorization opportunities back.
As for parallelism opportunities, I doubt that option 2) would allow any. Parallelism in MonetDB is handled at the MAL level, so option 1), if any, would be your best shot for this. That holds especially if your function can be written in SQL or MAL (such functions are rare) and is simple enough to be inlined.