Python UDFs with blob column
Hello,

I have recently been working with Python UDFs in MonetDB and noticed that returning a blob column from an array of Python objects is not allowed. A look at the code confirmed that this feature is not implemented yet: "FIXME: check for byte array/or pickle object to string". I have therefore written an implementation of this feature.

Could you please tell me how I can submit my code so that you can review it and consider applying it?

Thanks.

Best regards,
Florestan
Hi Florestan,
On 12 Jun 2018, at 16:56, Florestan De Moor wrote:
> I've been recently working with Python UDFs in MonetDB and I've noticed that returning a blob column from an array of python objects is not allowed. A look at the code confirmed that this feature is not implemented yet: "FIXME: check for byte array/or pickle object to string". I have thus written an implementation of this feature.

Great!

> Could you please tell me how I can submit my code for you to review it and consider to apply it? Thanks.

I suggest you send a patch (hg diff) to this mailing list.

Best,
Hannes
Hi Hannes,

Thank you for your answer! Here is the patch in attachment.

Best,
Florestan

On 12/06/2018 11:39, Hannes Mühleisen wrote:
> I suggest you send a patch (hg diff) to this mailing list.
_______________________________________________ developers-list mailing list developers-list@monetdb.org https://www.monetdb.org/mailman/listinfo/developers-list
Hello Florestan,

Your patch has been applied and will be part of the next feature release (https://dev.monetdb.org/hg/MonetDB/rev/7aaeaa80867f). Thank you very much for the contribution.

Best regards,
Panos.

Florestan De Moor @ 2018-06-12 16:33 GMT:
> Thank you for your answer! Here is the patch in attachment.
diff -r d18b0a317120 sql/backends/monet5/UDF/pyapi/conversion.c
--- a/sql/backends/monet5/UDF/pyapi/conversion.c	Tue Jun 12 09:45:28 2018 +0200
+++ b/sql/backends/monet5/UDF/pyapi/conversion.c	Tue Jun 12 12:23:25 2018 -0400
@@ -825,24 +825,39 @@
 		bool *mask = NULL;
 		char *data = NULL;
 		blob *ele_blob;
-		size_t blob_fixed_size = -1;
+		size_t blob_fixed_size = ret->memory_size;
+
+		PyObject *pickle_module = NULL, *pickle = NULL;
+		bool gstate = 0;
+
 		if (ret->result_type == NPY_OBJECT) {
-			// FIXME: check for byte array/or pickle object to string
-			msg = createException(MAL, "pyapi.eval",
-				SQLSTATE(PY000) "Python object to BLOB not supported yet.");
-			goto wrapup;
+			// Python objects, we may need to pickle them, so we
+			// may execute Python code, we have to obtain the GIL
+			gstate = Python_ObtainGIL();
+			pickle_module = PyImport_ImportModule("pickle");
+			if (pickle_module == NULL) {
+				msg = createException(MAL, "pyapi.eval",
+					SQLSTATE(PY000) "Can't load pickle module to pickle python object to blob");
+				Python_ReleaseGIL(gstate);
+				goto wrapup;
+			}
+			blob_fixed_size = 0; // Size depends on the objects
 		}
+
 		if (ret->mask_data != NULL) {
 			mask = (bool *)ret->mask_data;
 		}
 		if (ret->array_data == NULL) {
 			msg = createException(MAL, "pyapi.eval",
 				SQLSTATE(PY000) "No return value stored in the structure.");
+			if (ret->result_type == NPY_OBJECT) {
+				Py_XDECREF(pickle_module);
+				Python_ReleaseGIL(gstate);
+			}
 			goto wrapup;
 		}
 		data = (char *)ret->array_data;
 		data += (index_offset * ret->count) * ret->memory_size;
-		blob_fixed_size = ret->memory_size;
 		b = COLnew(seqbase, TYPE_sqlblob, (BUN)ret->count, TRANSIENT);
 		b->tnil = 0;
 		b->tnonil = 1;
@@ -850,26 +865,68 @@
 		b->tsorted = 0;
 		b->trevsorted = 0;
 		for (iu = 0; iu < ret->count; iu++) {
+
+			char* memcpy_data;
 			size_t blob_len = 0;
+
+			if (ret->result_type == NPY_OBJECT) {
+				PyObject *object = *((PyObject **)&data[0]);
+				if (PyByteArray_Check(object)) {
+					memcpy_data = PyByteArray_AsString(object);
+					blob_len = pyobject_get_size(object);
+				} else {
+					pickle = PyObject_CallMethod(pickle_module, "dumps", "O", object);
+					if (pickle == NULL) {
+						msg = createException(MAL, "pyapi.eval",
+							SQLSTATE(PY000) "Can't pickle object to blob");
+						Py_XDECREF(pickle_module);
+						Python_ReleaseGIL(gstate);
+						goto wrapup;
+					}
+					memcpy_data = PyBytes_AsString(pickle);
+					blob_len = pyobject_get_size(pickle);
+					Py_XDECREF(pickle);
+				}
+				if (memcpy_data == NULL) {
+					msg = createException(MAL, "pyapi.eval",
+						SQLSTATE(PY000) "Can't get blob pickled object as char*");
+					Py_XDECREF(pickle_module);
+					Python_ReleaseGIL(gstate);
+					goto wrapup;
+				}
+			} else {
+				memcpy_data = data;
+			}
+
 			if (mask && mask[iu]) {
 				ele_blob = (blob *)GDKmalloc(offsetof(blob, data));
 				ele_blob->nitems = ~(size_t)0;
 			} else {
 				if (blob_fixed_size > 0) {
 					blob_len = blob_fixed_size;
-				} else {
-					assert(0);
 				}
 				ele_blob = GDKmalloc(blobsize(blob_len));
 				ele_blob->nitems = blob_len;
-				memcpy(ele_blob->data, data, blob_len);
+				memcpy(ele_blob->data, memcpy_data, blob_len);
 			}
-			if (BUNappend(b, ele_blob, false) != GDK_SUCCEED) {
+			if (BUNappend(b, ele_blob, FALSE) != GDK_SUCCEED) {
+				if (ret->result_type == NPY_OBJECT) {
+					Py_XDECREF(pickle_module);
+					Python_ReleaseGIL(gstate);
+				}
 				goto bunins_failed;
 			}
 			GDKfree(ele_blob);
 			data += ret->memory_size;
+
 		}
+
+		// We are done, we can release the GIL
+		if (ret->result_type == NPY_OBJECT) {
+			Py_XDECREF(pickle_module);
+			Python_ReleaseGIL(gstate);
+		}
+
 		BATsetcount(b, (BUN)ret->count);
 		BATsettrivprop(b);
 	} else {
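For readers who want the gist of the patch without reading the C code, the per-element rule it applies can be sketched in plain Python. This is only an illustration, not MonetDB code: the function name objects_to_blobs and the use of None to stand for a NULL blob are assumptions made for the example. As in the patch, only bytearray objects are copied as raw bytes; every other object, including bytes, goes through pickle.dumps. One simplification: the sketch skips masked entries up front, whereas the patch converts the object first and checks the mask afterwards.

```python
import pickle


def objects_to_blobs(values, mask=None):
    """Illustrative stand-in for the patch's object-to-blob conversion.

    values: sequence of arbitrary Python objects (one per row).
    mask:   optional sequence of booleans; True marks a NULL row.
    Returns a list with one bytes object (or None for NULL) per row.
    """
    blobs = []
    for i, obj in enumerate(values):
        if mask is not None and mask[i]:
            # Masked rows become NULL blobs (None is a placeholder here).
            blobs.append(None)
        elif isinstance(obj, bytearray):
            # Byte arrays are stored as-is, without pickling.
            blobs.append(bytes(obj))
        else:
            # Anything else is serialized with pickle.dumps.
            blobs.append(pickle.dumps(obj))
    return blobs


if __name__ == "__main__":
    rows = [bytearray(b"\x01\x02"), {"a": 1}, None]
    for blob in objects_to_blobs(rows, mask=[False, False, True]):
        print(blob)
```

A consumer reading such a column back would need to know which rows hold raw bytes and which hold pickled objects, since the blob itself does not record the difference.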
participants (3)
- Florestan De Moor
- Hannes Mühleisen
- Panagiotis Koutsourakis