Inline function much faster than UDF in C?

Alastair McKinley

9 Dec 2013 9 Dec '13

12:47 p.m.

Hi everyone, I have only been using MonetDB for a few weeks now and I have come across a very unexpected result when using User Defined Functions in C. I have a query which does essentially this: TABLE small_table ( int col1; float p; ); TABLE big_table ( int col1; int col2; float p; ); SELECT col1,1+SUM(MY_FUNCTION(big.p,small.p)) FROM small_table small inner join big_table big on big.col2 == small.col1 group by big.col1; My function is f(x,y) = 0.7213*(x*ln(x/x+y) + y*ln(y/x+y)) I wrote an optimised version of this function using sse intrinsics in C and compiled it into MonetDB. To my surprise, I discovered that writing this function inline in the query like so: 1+SUM(0.7213475*(big.p*LOG(big.p/(big.p+small.p))+small.p*LOG(small.p/(big.p+small.p)))) is a lot faster! The query takes approx 2 seconds with the C version and less than 0.5 seconds with the inline version in SQL! Can anyone explain this strange result? Best regards, Alastair

Attachments:

attachment.html (text/html — 2.0 KB)

Show replies by date

Martin Kersten

9 Dec 9 Dec

1:05 p.m.

Hi You stumbled upon a well-known phenomenon in main-memory database processing. Function calls are not for free and vectorised processing is fast. Furthermore, common-sub expression elimination optimizers can lead to further improvements. regards, Martin On 09/12/13 13:47, Alastair McKinley wrote:

...

Hi everyone,

I have only been using MonetDB for a few weeks now and I have come across a very unexpected result when using User Defined Functions in C.

I have a query which does essentially this:

TABLE small_table ( int col1; float p; );

TABLE big_table ( int col1; int col2; float p; );

SELECT col1,1+SUM(MY_FUNCTION(big.p,small.p)) FROM small_table small inner join big_table big on big.col2 == small.col1 group by big.col1;

My function is f(x,y) = 0.7213*(x*ln(x/x+y) + y*ln(y/x+y))

I wrote an optimised version of this function using sse intrinsics in C and compiled it into MonetDB.

To my surprise, I discovered that writing this function inline in the query like so:

1+SUM(0.7213475*(big.p*LOG(big.p/(big.p+small.p))+small.p*LOG(small.p/(big.p+small.p))))

is a lot faster!

The query takes approx 2 seconds with the C version and less than 0.5 seconds with the inline version in SQL!

Can anyone explain this strange result?

Best regards,

Alastair

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

Stefan Manegold

1:05 p.m.

Hi Alastair, please profile your queries as described here http://www.monetdb.org/Documentation/Manuals/SQLreference/Trace http://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/QueryTiming to see which query spends how much time where, and share the results, so that we can interpret them. Thanks! Stefan ----- Original Message -----

...

Hi everyone,

I have only been using MonetDB for a few weeks now and I have come across a very unexpected result when using User Defined Functions in C.

I have a query which does essentially this:

TABLE small_table ( int col1; float p; );

TABLE big_table ( int col1; int col2; float p; );

SELECT col1,1+SUM(MY_FUNCTION(big.p,small.p)) FROM small_table small inner join big_table big on big.col2 == small.col1 group by big.col1;

My function is f(x,y) = 0.7213*(x*ln(x/x+y) + y*ln(y/x+y))

I wrote an optimised version of this function using sse intrinsics in C and compiled it into MonetDB.

To my surprise, I discovered that writing this function inline in the query like so:

1+SUM(0.7213475*(big.p*LOG(big.p/(big.p+small.p))+small.p*LOG(small.p/(big.p+small.p))))

is a lot faster!

The query takes approx 2 seconds with the C version and less than 0.5 seconds with the inline version in SQL!

Can anyone explain this strange result?

Best regards,

Alastair

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

Alastair McKinley

10 Dec 10 Dec

6:13 p.m.

Hi Stefan, Thanks for your reply. Just to clarify things, should I prepend the statement with TRACE and then "select * from tracelog()"; ? Should I then use mclient --log to capture the output? Best regards, Alastair ________________________________________ From: users-list on behalf of Stefan Manegold Sent: 09 December 2013 13:05 To: Communication channel for MonetDB users Subject: Re: Inline function much faster than UDF in C? Hi Alastair, please profile your queries as described here http://www.monetdb.org/Documentation/Manuals/SQLreference/Trace http://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/QueryTiming to see which query spends how much time where, and share the results, so that we can interpret them. Thanks! Stefan ----- Original Message -----

...

Hi everyone,

I have only been using MonetDB for a few weeks now and I have come across a very unexpected result when using User Defined Functions in C.

I have a query which does essentially this:

TABLE small_table ( int col1; float p; );

TABLE big_table ( int col1; int col2; float p; );

SELECT col1,1+SUM(MY_FUNCTION(big.p,small.p)) FROM small_table small inner join big_table big on big.col2 == small.col1 group by big.col1;

My function is f(x,y) = 0.7213*(x*ln(x/x+y) + y*ln(y/x+y))

I wrote an optimised version of this function using sse intrinsics in C and compiled it into MonetDB.

To my surprise, I discovered that writing this function inline in the query like so:

1+SUM(0.7213475*(big.p*LOG(big.p/(big.p+small.p))+small.p*LOG(small.p/(big.p+small.p))))

is a lot faster!

The query takes approx 2 seconds with the C version and less than 0.5 seconds with the inline version in SQL!

Can anyone explain this strange result?

Best regards,

Alastair

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

Stefan Manegold

7:14 p.m.

Hi Alastair, the documentation at http://www.monetdb.org/Documentation/Manuals/SQLreference/Trace is indeed misleading --- sorry! Simply prefixing your SQL statement with TRACE in mclient will create both the normal result set of the query and a second result set containing the performance trace. Simply redirecting both result sets, i.e., the output of mclient into a file will do the job. Stefan ----- Original Message -----

...

Hi Stefan,

Thanks for your reply.

Just to clarify things, should I prepend the statement with TRACE and then "select * from tracelog()"; ?

Should I then use mclient --log to capture the output?

Best regards,

Alastair ________________________________________ From: users-list on behalf of Stefan Manegold Sent: 09 December 2013 13:05 To: Communication channel for MonetDB users Subject: Re: Inline function much faster than UDF in C?

Hi Alastair,

please profile your queries as described here http://www.monetdb.org/Documentation/Manuals/SQLreference/Trace http://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/QueryTiming

to see which query spends how much time where, and share the results, so that we can interpret them.

Thanks! Stefan

----- Original Message -----

...
Hi everyone,

I have only been using MonetDB for a few weeks now and I have come across a very unexpected result when using User Defined Functions in C.

I have a query which does essentially this:

TABLE small_table ( int col1; float p; );

TABLE big_table ( int col1; int col2; float p; );

SELECT col1,1+SUM(MY_FUNCTION(big.p,small.p)) FROM small_table small inner join big_table big on big.col2 == small.col1 group by big.col1;

My function is f(x,y) = 0.7213*(x*ln(x/x+y) + y*ln(y/x+y))

I wrote an optimised version of this function using sse intrinsics in C and compiled it into MonetDB.

To my surprise, I discovered that writing this function inline in the query like so:

1+SUM(0.7213475*(big.p*LOG(big.p/(big.p+small.p))+small.p*LOG(small.p/(big.p+small.p))))

is a lot faster!

The query takes approx 2 seconds with the C version and less than 0.5 seconds with the inline version in SQL!

Can anyone explain this strange result?

Best regards,

Alastair

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

Sjoerd Mullender

9 Dec 9 Dec

1:08 p.m.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 To get the most out of a C implementation, you need both a scalar version (one value at the time), and a bulk version (complete BATs as input). If you only have the former, the code generator will generate a slow loop in MAL calling the scalar function. And then it is very well possible that the bulk versions of the inline expression in your second attempt is a lot faster. On 2013-12-09 13:47, Alastair McKinley wrote:

...

Hi everyone,

I have only been using MonetDB for a few weeks now and I have come across a very unexpected result when using User Defined Functions in C.

I have a query which does essentially this:

TABLE small_table ( int col1; float p; );

TABLE big_table ( int col1; int col2; float p; );

SELECT col1,1+SUM(MY_FUNCTION(big.p,small.p)) FROM small_table small inner join big_table big on big.col2 == small.col1 group by big.col1;

My function is f(x,y) = 0.7213*(x*ln(x/x+y) + y*ln(y/x+y))

I wrote an optimised version of this function using sse intrinsics in C and compiled it into MonetDB.

To my surprise, I discovered that writing this function inline in the query like so:

1+SUM(0.7213475*(big.p*LOG(big.p/(big.p+small.p))+small.p*LOG(small.p/(big.p+small.p))))

is a lot faster!

The query takes approx 2 seconds with the C version and less than 0.5 seconds with the inline version in SQL!

Can anyone explain this strange result?

Best regards,

Alastair

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

- -- Sjoerd Mullender -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.15 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQCVAwUBUqXA1j7g04AjvIQpAQLGnwQAjDYkXF/z+AIcGG4/2+WZtjF5S0Z1jvLJ /sARm3pjN9e6d9U0k/KR5MmGZ3DWJNPDqYrGFqZIWp2+B46YMB72DoXW9hAudcbP P4azUiyvG0BccaHjf3gco5CA8U2DA6nGNXpX//NOr0ievVRbdMT/sxRCPNGbXBVk L0dFS9GqaR4= =sPj/ -----END PGP SIGNATURE-----

Alastair McKinley

10 Dec 10 Dec

6:18 p.m.

Hi Sjoerd, Thanks for your response. I think you are almost certainly right about the slow loop in MAL. I hope that operating in bulk also means I create an implementation using all available SIMD lanes. Are there any examples in the source of implementing a bulk version of this type of aggregate that I could use as a guide? Best regards, Alastair ________________________________________ From: users-list on behalf of Sjoerd Mullender Sent: 09 December 2013 13:08 To: users-list@monetdb.org Subject: Re: Inline function much faster than UDF in C? -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 To get the most out of a C implementation, you need both a scalar version (one value at the time), and a bulk version (complete BATs as input). If you only have the former, the code generator will generate a slow loop in MAL calling the scalar function. And then it is very well possible that the bulk versions of the inline expression in your second attempt is a lot faster. On 2013-12-09 13:47, Alastair McKinley wrote:

...

Hi everyone,

I have only been using MonetDB for a few weeks now and I have come across a very unexpected result when using User Defined Functions in C.

I have a query which does essentially this:

TABLE small_table ( int col1; float p; );

TABLE big_table ( int col1; int col2; float p; );

SELECT col1,1+SUM(MY_FUNCTION(big.p,small.p)) FROM small_table small inner join big_table big on big.col2 == small.col1 group by big.col1;

My function is f(x,y) = 0.7213*(x*ln(x/x+y) + y*ln(y/x+y))

I wrote an optimised version of this function using sse intrinsics in C and compiled it into MonetDB.

To my surprise, I discovered that writing this function inline in the query like so:

1+SUM(0.7213475*(big.p*LOG(big.p/(big.p+small.p))+small.p*LOG(small.p/(big.p+small.p))))

is a lot faster!

The query takes approx 2 seconds with the C version and less than 0.5 seconds with the inline version in SQL!

Can anyone explain this strange result?

Best regards,

Alastair

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

Stefan Manegold

7:18 p.m.

Hi Alastair, please be aware that your function is not an aggregate by itself. Rather, you aggregate (via standard SUM()) over the result of your function. For documentation and exmaples of how to implement scalar and bulk versions of UDFs, please see http://dev.monetdb.org/hg/MonetDB/file/tip/sql/backends/monet5/UDF/ Best, Stefan ----- Original Message -----

...

Hi Sjoerd,

Thanks for your response. I think you are almost certainly right about the slow loop in MAL. I hope that operating in bulk also means I create an implementation using all available SIMD lanes.

Are there any examples in the source of implementing a bulk version of this type of aggregate that I could use as a guide?

Best regards,

Alastair

________________________________________ From: users-list on behalf of Sjoerd Mullender Sent: 09 December 2013 13:08 To: users-list@monetdb.org Subject: Re: Inline function much faster than UDF in C?

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

To get the most out of a C implementation, you need both a scalar version (one value at the time), and a bulk version (complete BATs as input). If you only have the former, the code generator will generate a slow loop in MAL calling the scalar function. And then it is very well possible that the bulk versions of the inline expression in your second attempt is a lot faster.

On 2013-12-09 13:47, Alastair McKinley wrote:

...
Hi everyone,

I have only been using MonetDB for a few weeks now and I have come across a very unexpected result when using User Defined Functions in C.

I have a query which does essentially this:

TABLE small_table ( int col1; float p; );

TABLE big_table ( int col1; int col2; float p; );

SELECT col1,1+SUM(MY_FUNCTION(big.p,small.p)) FROM small_table small inner join big_table big on big.col2 == small.col1 group by big.col1;

My function is f(x,y) = 0.7213*(x*ln(x/x+y) + y*ln(y/x+y))

I wrote an optimised version of this function using sse intrinsics in C and compiled it into MonetDB.

To my surprise, I discovered that writing this function inline in the query like so:

1+SUM(0.7213475*(big.p*LOG(big.p/(big.p+small.p))+small.p*LOG(small.p/(big.p+small.p))))

is a lot faster!

The query takes approx 2 seconds with the C version and less than 0.5 seconds with the inline version in SQL!

Can anyone explain this strange result?

Best regards,

Alastair

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

- -- Sjoerd Mullender -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.15 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQCVAwUBUqXA1j7g04AjvIQpAQLGnwQAjDYkXF/z+AIcGG4/2+WZtjF5S0Z1jvLJ /sARm3pjN9e6d9U0k/KR5MmGZ3DWJNPDqYrGFqZIWp2+B46YMB72DoXW9hAudcbP P4azUiyvG0BccaHjf3gco5CA8U2DA6nGNXpX//NOr0ievVRbdMT/sxRCPNGbXBVk L0dFS9GqaR4= =sPj/ -----END PGP SIGNATURE----- _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

Alastair McKinley

12 Dec 12 Dec

1:17 p.m.

Hi Stefan, Thanks again for your advice, I am looking through the BATfuse example and attempting to extrapolate at the minute. Is the BAT version automagically inserted by optimiser into the query plan? In the BAT fuse example, are the tests related to tail ordering important or just a hint for future optimisations? I'm not sure if this would be useful for my application of a general floating point function to arbitrarily ordered input parameters. I'm very new to BAT terminology and BATs in general. Is it safe for me to assume the head is dense, sorted and the seqbase is the same as the input? I guess that these BAT functions are called from the outputs of "group by" in order to calculate the SUM aggregate for each group? Best regards and thanks, Alastair ________________________________________ From: users-list on behalf of Stefan Manegold Sent: 10 December 2013 19:18 To: Communication channel for MonetDB users Subject: Re: Inline function much faster than UDF in C? Hi Alastair, please be aware that your function is not an aggregate by itself. Rather, you aggregate (via standard SUM()) over the result of your function. For documentation and exmaples of how to implement scalar and bulk versions of UDFs, please see http://dev.monetdb.org/hg/MonetDB/file/tip/sql/backends/monet5/UDF/ Best, Stefan ----- Original Message -----

...

Hi Sjoerd,

Thanks for your response. I think you are almost certainly right about the slow loop in MAL. I hope that operating in bulk also means I create an implementation using all available SIMD lanes.

Are there any examples in the source of implementing a bulk version of this type of aggregate that I could use as a guide?

Best regards,

Alastair

________________________________________ From: users-list on behalf of Sjoerd Mullender Sent: 09 December 2013 13:08 To: users-list@monetdb.org Subject: Re: Inline function much faster than UDF in C?

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

To get the most out of a C implementation, you need both a scalar version (one value at the time), and a bulk version (complete BATs as input). If you only have the former, the code generator will generate a slow loop in MAL calling the scalar function. And then it is very well possible that the bulk versions of the inline expression in your second attempt is a lot faster.

On 2013-12-09 13:47, Alastair McKinley wrote:

...
Hi everyone,

I have only been using MonetDB for a few weeks now and I have come across a very unexpected result when using User Defined Functions in C.

I have a query which does essentially this:

TABLE small_table ( int col1; float p; );

TABLE big_table ( int col1; int col2; float p; );

SELECT col1,1+SUM(MY_FUNCTION(big.p,small.p)) FROM small_table small inner join big_table big on big.col2 == small.col1 group by big.col1;

My function is f(x,y) = 0.7213*(x*ln(x/x+y) + y*ln(y/x+y))

I wrote an optimised version of this function using sse intrinsics in C and compiled it into MonetDB.

To my surprise, I discovered that writing this function inline in the query like so:

1+SUM(0.7213475*(big.p*LOG(big.p/(big.p+small.p))+small.p*LOG(small.p/(big.p+small.p))))

is a lot faster!

The query takes approx 2 seconds with the C version and less than 0.5 seconds with the inline version in SQL!

Can anyone explain this strange result?

Best regards,

Alastair

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

- -- Sjoerd Mullender -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.15 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQCVAwUBUqXA1j7g04AjvIQpAQLGnwQAjDYkXF/z+AIcGG4/2+WZtjF5S0Z1jvLJ /sARm3pjN9e6d9U0k/KR5MmGZ3DWJNPDqYrGFqZIWp2+B46YMB72DoXW9hAudcbP P4azUiyvG0BccaHjf3gco5CA8U2DA6nGNXpX//NOr0ievVRbdMT/sxRCPNGbXBVk L0dFS9GqaR4= =sPj/ -----END PGP SIGNATURE----- _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

Stefan Manegold

1:44 p.m.

...

Hi Stefan,

Thanks again for your advice, I am looking through the BATfuse example and attempting to extrapolate at the minute.

Is the BAT version automagically inserted by optimiser into the query plan?

Yes, indeed ;-)

...

In the BAT fuse example, are the tests related to tail ordering important or just a hint for future optimisations?

The latter.

...

I'm not sure if this would be useful for my application of a general floating point function to arbitrarily ordered input parameters.

Not necessarily, in particular if you only aggregate the result afterwards; hence, it should be safe to simple indicate that the result of your function is not (know to be) ordered.

...

I'm very new to BAT terminology and BATs in general. Is it safe for me to assume the head is dense, sorted and the seqbase is the same as the input?

Yes. In fact, dense (and thus unique and sorted) is nowadays the only option for BAT heads. And if, as in your case, the results BAT has exactly one tuple ("BUN") for each input tuple, then also the dense head of the result should (in fact must) have the same seqbase as the dense head of the input.

...

I guess that these BAT functions are called from the outputs of "group by" in order to calculate the SUM aggregate for each group?

Almost: the result of your function is use together with the result of a group-by function to then calculate the aggregates for group. Stefan

...

Best regards and thanks,

Alastair

________________________________________ From: users-list on behalf of Stefan Manegold Sent: 10 December 2013 19:18 To: Communication channel for MonetDB users Subject: Re: Inline function much faster than UDF in C?

Hi Alastair,

please be aware that your function is not an aggregate by itself. Rather, you aggregate (via standard SUM()) over the result of your function.

For documentation and exmaples of how to implement scalar and bulk versions of UDFs, please see http://dev.monetdb.org/hg/MonetDB/file/tip/sql/backends/monet5/UDF/

Best, Stefan

----- Original Message -----

...
Hi Sjoerd,

Thanks for your response. I think you are almost certainly right about the slow loop in MAL. I hope that operating in bulk also means I create an implementation using all available SIMD lanes.

Are there any examples in the source of implementing a bulk version of this type of aggregate that I could use as a guide?

Best regards,

Alastair

________________________________________ From: users-list on behalf of Sjoerd Mullender Sent: 09 December 2013 13:08 To: users-list@monetdb.org Subject: Re: Inline function much faster than UDF in C?

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

To get the most out of a C implementation, you need both a scalar version (one value at the time), and a bulk version (complete BATs as input). If you only have the former, the code generator will generate a slow loop in MAL calling the scalar function. And then it is very well possible that the bulk versions of the inline expression in your second attempt is a lot faster.

On 2013-12-09 13:47, Alastair McKinley wrote:

...
Hi everyone,

I have only been using MonetDB for a few weeks now and I have come across a very unexpected result when using User Defined Functions in C.

I have a query which does essentially this:

TABLE small_table ( int col1; float p; );

TABLE big_table ( int col1; int col2; float p; );

SELECT col1,1+SUM(MY_FUNCTION(big.p,small.p)) FROM small_table small inner join big_table big on big.col2 == small.col1 group by big.col1;

My function is f(x,y) = 0.7213*(x*ln(x/x+y) + y*ln(y/x+y))

I wrote an optimised version of this function using sse intrinsics in C and compiled it into MonetDB.

To my surprise, I discovered that writing this function inline in the query like so:

1+SUM(0.7213475*(big.p*LOG(big.p/(big.p+small.p))+small.p*LOG(small.p/(big.p+small.p))))

is a lot faster!

The query takes approx 2 seconds with the C version and less than 0.5 seconds with the inline version in SQL!

Can anyone explain this strange result?

Best regards,

Alastair

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

- -- Sjoerd Mullender -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.15 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQCVAwUBUqXA1j7g04AjvIQpAQLGnwQAjDYkXF/z+AIcGG4/2+WZtjF5S0Z1jvLJ /sARm3pjN9e6d9U0k/KR5MmGZ3DWJNPDqYrGFqZIWp2+B46YMB72DoXW9hAudcbP P4azUiyvG0BccaHjf3gco5CA8U2DA6nGNXpX//NOr0ievVRbdMT/sxRCPNGbXBVk L0dFS9GqaR4= =sPj/ -----END PGP SIGNATURE----- _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

Sjoerd Mullender

1:59 p.m.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 2013-12-12 14:44, Stefan Manegold wrote:

...

...
Hi Stefan,

Thanks again for your advice, I am looking through the BATfuse example and attempting to extrapolate at the minute.

Is the BAT version automagically inserted by optimiser into the query plan?

Yes, indeed ;-)

...
In the BAT fuse example, are the tests related to tail ordering important or just a hint for future optimisations?

The latter.

Note that if a property is set, it *must* be true, if it is not set, it is not known to be true. I.e., if the sorted property is set, the column must in actual fact be sorted, but if the sorted property is not set, the column may or may not be sorted. In other words, it is safe to clear the properties (sorted, revsorted, dense, key, nil, nonil; the "set" property is not actually used and should always be 0). However, the head column should be dense (i.e. sorted, nonil, and dense all set) with a seqbase that is the same as the input head column. Best is to use TYPE_void as the type of the head column.

...

...
I'm not sure if this would be useful for my application of a general floating point function to arbitrarily ordered input parameters.

Not necessarily, in particular if you only aggregate the result afterwards; hence, it should be safe to simple indicate that the result of your function is not (know to be) ordered.

If there is a simple way to determine that a column is (reverse) sorted if the input is (reverse) sorted, then by all means set the appropriate property (like multiplying all values with the same constant, or when applying an order preserving function to all values), but if you need to look at all values to determine whether the result is sorted, then don't bother.

...

...
I'm very new to BAT terminology and BATs in general. Is it safe for me to assume the head is dense, sorted and the seqbase is the same as the input?

Yes. In fact, dense (and thus unique and sorted) is nowadays the only option for BAT heads. And if, as in your case, the results BAT has exactly one tuple ("BUN") for each input tuple, then also the dense head of the result should (in fact must) have the same seqbase as the dense head of the input.

...
I guess that these BAT functions are called from the outputs of "group by" in order to calculate the SUM aggregate for each group?

Almost: the result of your function is use together with the result of a group-by function to then calculate the aggregates for group.

Stefan

...
Best regards and thanks,

Alastair

________________________________________ From: users-list on behalf of Stefan Manegold Sent: 10 December 2013 19:18 To: Communication channel for MonetDB users Subject: Re: Inline function much faster than UDF in C?

Hi Alastair,

please be aware that your function is not an aggregate by itself. Rather, you aggregate (via standard SUM()) over the result of your function.

For documentation and exmaples of how to implement scalar and bulk versions of UDFs, please see http://dev.monetdb.org/hg/MonetDB/file/tip/sql/backends/monet5/UDF/

Best,

...
Stefan

----- Original Message -----

...
Hi Sjoerd,

Thanks for your response. I think you are almost certainly right about the slow loop in MAL. I hope that operating in bulk also means I create an implementation using all available SIMD lanes.

Are there any examples in the source of implementing a bulk version of this type of aggregate that I could use as a guide?

Best regards,

Alastair

________________________________________ From: users-list on behalf of Sjoerd Mullender Sent: 09 December 2013 13:08 To: users-list@monetdb.org Subject: Re: Inline function much faster than UDF in C?

To get the most out of a C implementation, you need both a scalar version (one value at the time), and a bulk version (complete BATs as input). If you only have the former, the code generator will generate a slow loop in MAL calling the scalar function. And then it is very well possible that the bulk versions of the inline expression in your second attempt is a lot faster.

On 2013-12-09 13:47, Alastair McKinley wrote:

...
...
...
...
Hi everyone,

I have only been using MonetDB for a few weeks now and I have come across a very unexpected result when using User Defined Functions in C.

I have a query which does essentially this:

TABLE small_table ( int col1; float p; );

TABLE big_table ( int col1; int col2; float p; );

SELECT col1,1+SUM(MY_FUNCTION(big.p,small.p)) FROM small_table small inner join big_table big on big.col2 == small.col1 group by big.col1;

My function is f(x,y) = 0.7213*(x*ln(x/x+y) + y*ln(y/x+y))

I wrote an optimised version of this function using sse intrinsics in C and compiled it into MonetDB.

To my surprise, I discovered that writing this function inline in the query like so:

1+SUM(0.7213475*(big.p*LOG(big.p/(big.p+small.p))+small.p*LOG(small.p/(big.p+small.p))))

is a lot faster!

...
The query takes approx 2 seconds with the C version and less than 0.5 seconds with the inline version in SQL!

Can anyone explain this strange result?

Best regards,

Alastair

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

...
...
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

- -- Sjoerd Mullender -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.15 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQCVAwUBUqnBJT7g04AjvIQpAQJT1gQAjSHiBvrYL7LoPb2/uC8PjO9DnEUd2jup FOYV/GwENoTbubzg+EqAxJHEpbLved5A07t+TvEMKlo6SAx0TFQO98yjHGYFoDSy CFLRQbpuOhCsM8htdIIolDStRpCshRYTw40cjAE4syuebN8iel4MCRkXmSRVlfGg gXTzy8DH6Yg= =BIxN -----END PGP SIGNATURE-----

Alastair McKinley

2:49 p.m.

Hi Stefan, Thanks once again to you and Sjoerd, you have both been extremely helpful! I have more simple question about integration of the BAT function version. When I add the scalar version to the SQL catalog I have something like this: module udf; command myudf(one:dbl,two:dbl):dbl address myudf_scalar_function comment "my udf"; and then in SQL: create function myudf(one double,two double) returns double external name udf.myudf; If I have the BAT version registered in a MAL module like so: module batudf; command myudf(one:bat[:oid,:dbl],two[:oid,:dbl]):bat[:oid,:dbl] address myudf_bat_function comment "my udf"; should I also register the BAT version in the SQL catalog? It seems that the UDFBATfuse functions in the example are not added. Does the SQL engine have another mechanism for finding the BAT version? Best regards and thanks, Alastair ________________________________________ From: users-list on behalf of Stefan Manegold Sent: 12 December 2013 13:44 To: Communication channel for MonetDB users Subject: Re: Inline function much faster than UDF in C?

...

Hi Stefan,

Thanks again for your advice, I am looking through the BATfuse example and attempting to extrapolate at the minute.

Is the BAT version automagically inserted by optimiser into the query plan?

Yes, indeed ;-)

...

In the BAT fuse example, are the tests related to tail ordering important or just a hint for future optimisations?

The latter.

...

I'm not sure if this would be useful for my application of a general floating point function to arbitrarily ordered input parameters.

Not necessarily, in particular if you only aggregate the result afterwards; hence, it should be safe to simple indicate that the result of your function is not (know to be) ordered.

...

I'm very new to BAT terminology and BATs in general. Is it safe for me to assume the head is dense, sorted and the seqbase is the same as the input?

...

I guess that these BAT functions are called from the outputs of "group by" in order to calculate the SUM aggregate for each group?

Almost: the result of your function is use together with the result of a group-by function to then calculate the aggregates for group. Stefan

...

Best regards and thanks,

Alastair

________________________________________ From: users-list on behalf of Stefan Manegold Sent: 10 December 2013 19:18 To: Communication channel for MonetDB users Subject: Re: Inline function much faster than UDF in C?

Hi Alastair,

please be aware that your function is not an aggregate by itself. Rather, you aggregate (via standard SUM()) over the result of your function.

For documentation and exmaples of how to implement scalar and bulk versions of UDFs, please see http://dev.monetdb.org/hg/MonetDB/file/tip/sql/backends/monet5/UDF/

Best, Stefan

----- Original Message -----

...
Hi Sjoerd,

Thanks for your response. I think you are almost certainly right about the slow loop in MAL. I hope that operating in bulk also means I create an implementation using all available SIMD lanes.

Are there any examples in the source of implementing a bulk version of this type of aggregate that I could use as a guide?

Best regards,

Alastair

________________________________________ From: users-list on behalf of Sjoerd Mullender Sent: 09 December 2013 13:08 To: users-list@monetdb.org Subject: Re: Inline function much faster than UDF in C?

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

To get the most out of a C implementation, you need both a scalar version (one value at the time), and a bulk version (complete BATs as input). If you only have the former, the code generator will generate a slow loop in MAL calling the scalar function. And then it is very well possible that the bulk versions of the inline expression in your second attempt is a lot faster.

On 2013-12-09 13:47, Alastair McKinley wrote:

...
Hi everyone,

I have only been using MonetDB for a few weeks now and I have come across a very unexpected result when using User Defined Functions in C.

I have a query which does essentially this:

TABLE small_table ( int col1; float p; );

TABLE big_table ( int col1; int col2; float p; );

SELECT col1,1+SUM(MY_FUNCTION(big.p,small.p)) FROM small_table small inner join big_table big on big.col2 == small.col1 group by big.col1;

My function is f(x,y) = 0.7213*(x*ln(x/x+y) + y*ln(y/x+y))

I wrote an optimised version of this function using sse intrinsics in C and compiled it into MonetDB.

To my surprise, I discovered that writing this function inline in the query like so:

1+SUM(0.7213475*(big.p*LOG(big.p/(big.p+small.p))+small.p*LOG(small.p/(big.p+small.p))))

is a lot faster!

The query takes approx 2 seconds with the C version and less than 0.5 seconds with the inline version in SQL!

Can anyone explain this strange result?

Best regards,

Alastair

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

- -- Sjoerd Mullender -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.15 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQCVAwUBUqXA1j7g04AjvIQpAQLGnwQAjDYkXF/z+AIcGG4/2+WZtjF5S0Z1jvLJ /sARm3pjN9e6d9U0k/KR5MmGZ3DWJNPDqYrGFqZIWp2+B46YMB72DoXW9hAudcbP P4azUiyvG0BccaHjf3gco5CA8U2DA6nGNXpX//NOr0ievVRbdMT/sxRCPNGbXBVk L0dFS9GqaR4= =sPj/ -----END PGP SIGNATURE----- _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

Alastair McKinley

13 Dec 13 Dec

5:06 p.m.

Hi Stefan, I managed to get my query to call my BAT function, but have run into another issue. I get an error "b and g must be aligned". I think it is from gdk/gdk_aggr.c around line 89 in MonetDB-11.15.17. I guess this is a property of my result BAT? I am currently using the UDFFuseBAT as a template so I doing this with my output BAT. bres = BATnew(TYPE_void,TYPE_dbl,n); if (bres == NULL) throw(MAL, "batudf.myudfbat", MAL_MALLOC_FAIL); msg = doMYUDFBAT_dbl(bres,bone,btwo,n); if (msg != MAL_SUCCEED) { BBPreleaseref(bres->batCacheid); throw(MAL, "batudf.udfcalcjsbat", "error in udf calcjs bat"); } else { /* set number of tuples in result BAT */ BATsetcount(bres, n); /* set result properties */ bres->hdense = TRUE; /* result head is dense */ BATseqbase(bres, bone->hseqbase); /* result head has same seqbase as input */ bres->hsorted = 1; /* result head is sorted */ bres->hrevsorted = (BATcount(bres) <= 1); BATkey(bres, TRUE); /* result head is key (unique) */ *ret = bres; return MAL_SUCCEED; } In the MAL wrapper I am doing this: /* do it */ msg = doMYUDF(&bres,bone,btwo); /* release input BAT-descritors */ BBPreleaseref(bone->batCacheid); BBPreleaseref(btwo->batCacheid); if (msg == MAL_SUCCEED) { /* register result BAT in buffer pool */ BBPkeepref((*ires = bres->batCacheid)); } return msg; Am I missing something obvious? I can post the entire source code if necessary. Best regards and thanks, Alastair ________________________________________ From: users-list on behalf of Stefan Manegold Sent: 12 December 2013 13:44 To: Communication channel for MonetDB users Subject: Re: Inline function much faster than UDF in C?

...

Hi Stefan,

Thanks again for your advice, I am looking through the BATfuse example and attempting to extrapolate at the minute.

Is the BAT version automagically inserted by optimiser into the query plan?

Yes, indeed ;-)

...

In the BAT fuse example, are the tests related to tail ordering important or just a hint for future optimisations?

The latter.

...

I'm not sure if this would be useful for my application of a general floating point function to arbitrarily ordered input parameters.

Not necessarily, in particular if you only aggregate the result afterwards; hence, it should be safe to simple indicate that the result of your function is not (know to be) ordered.

...

I'm very new to BAT terminology and BATs in general. Is it safe for me to assume the head is dense, sorted and the seqbase is the same as the input?

...

I guess that these BAT functions are called from the outputs of "group by" in order to calculate the SUM aggregate for each group?

Almost: the result of your function is use together with the result of a group-by function to then calculate the aggregates for group. Stefan

...

Best regards and thanks,

Alastair

________________________________________ From: users-list on behalf of Stefan Manegold Sent: 10 December 2013 19:18 To: Communication channel for MonetDB users Subject: Re: Inline function much faster than UDF in C?

Hi Alastair,

please be aware that your function is not an aggregate by itself. Rather, you aggregate (via standard SUM()) over the result of your function.

For documentation and exmaples of how to implement scalar and bulk versions of UDFs, please see http://dev.monetdb.org/hg/MonetDB/file/tip/sql/backends/monet5/UDF/

Best, Stefan

----- Original Message -----

...
Hi Sjoerd,

Thanks for your response. I think you are almost certainly right about the slow loop in MAL. I hope that operating in bulk also means I create an implementation using all available SIMD lanes.

Are there any examples in the source of implementing a bulk version of this type of aggregate that I could use as a guide?

Best regards,

Alastair

________________________________________ From: users-list on behalf of Sjoerd Mullender Sent: 09 December 2013 13:08 To: users-list@monetdb.org Subject: Re: Inline function much faster than UDF in C?

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

To get the most out of a C implementation, you need both a scalar version (one value at the time), and a bulk version (complete BATs as input). If you only have the former, the code generator will generate a slow loop in MAL calling the scalar function. And then it is very well possible that the bulk versions of the inline expression in your second attempt is a lot faster.

On 2013-12-09 13:47, Alastair McKinley wrote:

...
Hi everyone,

I have only been using MonetDB for a few weeks now and I have come across a very unexpected result when using User Defined Functions in C.

I have a query which does essentially this:

TABLE small_table ( int col1; float p; );

TABLE big_table ( int col1; int col2; float p; );

SELECT col1,1+SUM(MY_FUNCTION(big.p,small.p)) FROM small_table small inner join big_table big on big.col2 == small.col1 group by big.col1;

My function is f(x,y) = 0.7213*(x*ln(x/x+y) + y*ln(y/x+y))

I wrote an optimised version of this function using sse intrinsics in C and compiled it into MonetDB.

To my surprise, I discovered that writing this function inline in the query like so:

1+SUM(0.7213475*(big.p*LOG(big.p/(big.p+small.p))+small.p*LOG(small.p/(big.p+small.p))))

is a lot faster!

The query takes approx 2 seconds with the C version and less than 0.5 seconds with the inline version in SQL!

Can anyone explain this strange result?

Best regards,

Alastair

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

- -- Sjoerd Mullender -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.15 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQCVAwUBUqXA1j7g04AjvIQpAQLGnwQAjDYkXF/z+AIcGG4/2+WZtjF5S0Z1jvLJ /sARm3pjN9e6d9U0k/KR5MmGZ3DWJNPDqYrGFqZIWp2+B46YMB72DoXW9hAudcbP P4azUiyvG0BccaHjf3gco5CA8U2DA6nGNXpX//NOr0ievVRbdMT/sxRCPNGbXBVk L0dFS9GqaR4= =sPj/ -----END PGP SIGNATURE----- _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

4104

Age (days ago)

4108

Last active (days ago)

List overview

Download

12 comments

4 participants

participants (4)

Alastair McKinley
Martin Kersten
Sjoerd Mullender
Stefan Manegold