UDF with more than 3 parameters cannot benefit from PYTHON_MAP

15 Nov 2017

Hi,

I created a table with 9 columns and passed 6 of those columns to a UDF.
This UDF is fully vectorizable.  However, we found that only a single
thread was used during execution.  To investigate the cause, we created
two simple functions, python_min_map_3 and python_min_map_4, which take 3
and 4 parameters respectively.  We used the July 2017 release of MonetDB
on Linux.

If PYTHON_MAP takes effect, the function runs in parallel on separate
segments of the input, and the query returns one result per segment.  The
number of results therefore depends on how many threads are used.
Otherwise, a single value is returned (because numpy.min reduces the
whole column to one number).
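
To illustrate the behaviour we expected, here is a minimal sketch in plain
NumPy, outside MonetDB.  The helper simulate_python_map, the sample data and
the chunk count are hypothetical; how MonetDB actually partitions a column
across threads is an assumption here, not something we verified.

```python
import numpy

def simulate_python_map(column, n_chunks):
    """Apply numpy.min to each chunk separately (hypothetical helper).

    This only mimics the idea of a mapped UDF being called once per
    segment; it is not how MonetDB itself splits the data.
    """
    chunks = numpy.array_split(column, n_chunks)
    return [float(numpy.min(chunk)) for chunk in chunks]

# Made-up sample column with 8 values.
x0 = numpy.array([5.0, 3.0, 8.0, 1.0, 9.0, 2.0, 7.0, 4.0])

mapped = simulate_python_map(x0, 4)  # one minimum per chunk
single = simulate_python_map(x0, 1)  # one minimum for the whole column

print(mapped)
print(single)
```

With 4 chunks the call yields four minima, one per chunk; with 1 chunk it
yields a single minimum, which is what we see when the function is not
parallelized.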

The results are:

- python_min_map_3 returns a couple of numbers
- python_min_map_4 returns a single number

When we increased the number of arguments further, a single number was
still returned.  Is there a restriction that PYTHON_MAP only works when
the function has no more than 3 arguments?  Our example code is below.

CREATE FUNCTION python_min_map_3(x0 FLOAT, x1 FLOAT, x2 FLOAT)
RETURNS FLOAT LANGUAGE PYTHON_MAP {
    return numpy.min(x0)
};

select python_min_map_3(x0, x1, x2) from table_0;

CREATE FUNCTION python_min_map_4(x0 FLOAT, x1 FLOAT, x2 FLOAT, x3 FLOAT)
RETURNS FLOAT LANGUAGE PYTHON_MAP {
    return numpy.min(x0)
};

select python_min_map_4(x0, x1, x2, x3) from table_0;

NumPy in MonetDB reference:
https://www.monetdb.org/blog/embedded-pythonnumpy-monetdb

Best regards,

Hanfeng Chen
