RE: Inline function much faster than UDF in C?

12 Dec 2013

      Hi Stefan,

Thanks once again to you and Sjoerd, you have both been extremely helpful!

I have more simple question about integration of the BAT function version.

When I add the scalar version to the SQL catalog I have something like this:

    module udf;
    command myudf(one:dbl,two:dbl):dbl
    address myudf_scalar_function
    comment "my udf";

and then in SQL:

    create function myudf(one double,two double)
    returns double external name udf.myudf;

If I have the BAT version registered in a MAL module like so:

    module batudf;    
    command myudf(one:bat[:oid,:dbl],two[:oid,:dbl]):bat[:oid,:dbl]
    address myudf_bat_function
    comment "my udf";

should I also register the BAT version in the SQL catalog?  It seems that the UDFBATfuse functions in the example are not added.
Does the SQL engine have another mechanism for finding the BAT version?

Best regards and thanks,

Alastair

________________________________________
From: users-list  on behalf of Stefan Manegold 
Sent: 12 December 2013 13:44
To: Communication channel for MonetDB users
Subject: Re: Inline function much faster than UDF in C?
...
Hi Stefan,
Thanks again for your advice, I am looking through the BATfuse example and
attempting to extrapolate at the minute.
Is the BAT version automagically inserted by optimiser into the query plan?
Yes, indeed ;-)
...
In the BAT fuse example, are the tests related to tail ordering important or
just a hint for future optimisations?
The latter.
...
I'm not sure if this would be useful for my application of a general floating
point function to arbitrarily ordered input parameters.
Not necessarily, in particular if you only aggregate the result afterwards;
hence, it should be safe to simple indicate that the result of your function
is not (know to be) ordered.
...
I'm very new to BAT terminology and BATs in general.  Is it safe for me to
assume the head is dense, sorted and the seqbase is the same as the input?
Yes. In fact, dense (and thus unique and sorted) is nowadays the only option
for BAT heads. And if, as in your case, the results BAT has exactly one tuple
("BUN") for each input tuple, then also the dense head of the result should
(in fact must) have the same seqbase as the dense head of the input.
...
I guess that these BAT functions are called from the outputs of "group by" in
order to calculate the SUM aggregate for each group?
Almost: the result of your function is use together with the result of a group-by
function to then calculate the aggregates for group.

Stefan
...
Best regards and thanks,
Alastair
________________________________________
From: users-list
 on behalf
of Stefan Manegold 
Sent: 10 December 2013 19:18
To: Communication channel for MonetDB users
Subject: Re: Inline function much faster than UDF in C?
Hi Alastair,
please be aware that your function is not an aggregate by itself.
Rather, you aggregate (via standard SUM()) over the result of your function.
For documentation and exmaples of how to implement scalar and bulk versions
of UDFs, please see
http://dev.monetdb.org/hg/MonetDB/file/tip/sql/backends/monet5/UDF/
Best,
Stefan
----- Original Message -----
...
Hi Sjoerd,
Thanks for your response.  I think you are almost certainly right about the
slow loop in MAL.
I hope that operating in bulk also means I create an implementation using
all
available SIMD lanes.
Are there any examples in the source of implementing a bulk version of this
type of aggregate that I could use as a guide?
Best regards,
Alastair
________________________________________
From: users-list
 on behalf
of Sjoerd Mullender 
Sent: 09 December 2013 13:08
To: users-list@monetdb.org
Subject: Re: Inline function much faster than UDF in C?
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
To get the most out of a C implementation, you need both a scalar
version (one value at the time), and a bulk version (complete BATs as
input).  If you only have the former, the code generator will generate
a slow loop in MAL calling the scalar function.  And then it is very
well possible that the bulk versions of the inline expression in your
second attempt is a lot faster.
On 2013-12-09 13:47, Alastair McKinley wrote:
...
Hi everyone,
I have only been using MonetDB for a few weeks now and I have come
across a very unexpected result when using User Defined Functions
in C.
I have a query which does essentially this:
TABLE small_table ( int col1; float p; );
TABLE big_table ( int col1; int col2; float p; );
SELECT col1,1+SUM(MY_FUNCTION(big.p,small.p)) FROM small_table
small inner join big_table big on big.col2 == small.col1 group by
big.col1;
My function is f(x,y) = 0.7213*(x*ln(x/x+y) + y*ln(y/x+y))
I wrote an optimised version of this function using sse intrinsics
in C and compiled it into MonetDB.
To my surprise, I discovered that writing this function inline in
the query like so:
1+SUM(0.7213475*(big.p*LOG(big.p/(big.p+small.p))+small.p*LOG(small.p/(big.p+small.p))))
is a lot faster!
The query takes approx 2 seconds with the C version and less than
0.5 seconds with the inline version in SQL!
Can anyone explain this strange result?
Best regards,
Alastair
_______________________________________________ users-list mailing
list users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list
- --
Sjoerd Mullender
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.15 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
iQCVAwUBUqXA1j7g04AjvIQpAQLGnwQAjDYkXF/z+AIcGG4/2+WZtjF5S0Z1jvLJ
/sARm3pjN9e6d9U0k/KR5MmGZ3DWJNPDqYrGFqZIWp2+B46YMB72DoXW9hAudcbP
P4azUiyvG0BccaHjf3gco5CA8U2DA6nGNXpX//NOr0ievVRbdMT/sxRCPNGbXBVk
L0dFS9GqaR4=
=sPj/
-----END PGP SIGNATURE-----
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list
--
| Stefan.Manegold@CWI.nl | DB Architectures   (DA) |
| www.CWI.nl/~manegold/  | Science Park 123 (L321) |
| +31 (0)20 592-4212     | 1098 XG Amsterdam  (NL) |
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list
--
| Stefan.Manegold@CWI.nl | DB Architectures   (DA) |
| www.CWI.nl/~manegold/  | Science Park 123 (L321) |
| +31 (0)20 592-4212     | 1098 XG Amsterdam  (NL) |

_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list