On Wed, Mar 04, 2015 at 04:40:36PM +0100, Roberto Cornacchia wrote:
Hello,
I have a function definition
command batstemmer.stem(terms:bat[:oid,:str], stemmer_name:str):bat[:oid,:str] address CMDbatstem comment "Wrapper for snowball stemmer";
which internally uses the snowball stemmer (http://snowball.tartarus.org/).
When the bat to be stemmed is large enough, mitosis will split it into chunks and call the function "stem" on each chunk, possibly in parallel.
Problem is, the snowball stemmer implementation appears to be thread-unsafe, which causes a SIGSEGV.
Indeed, using the no_mitosis_pipe solves the issue. However, this solution is suboptimal.
Another solution I found is to mark the MAL signature as {unsafe}. This works, although it does something a bit silly: it splits the table into chunks, then repacks everything, and finally runs my function on the repacked bat (basically wasting effort on a useless split + repack).
Now, my question is: is there a more focussed property to use? {unsafe} implies thread-unsafe, but it is actually stronger than that. For example, it also implies that there might be side-effects. Therefore, the result cannot be recycled. In my case, instead, the result is perfectly safe to be reused.
Thanks for any tip.
Roberto
The easiest solution is to add a mutex in the wrapper function. {unsafe} indeed flags a function for the dataflow optimizer (i.e., the function isn't run concurrently).

Niels
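A minimal sketch of the mutex approach suggested above, in plain C with POSIX threads. Everything named here is hypothetical: unsafe_stem is a toy stand-in for the thread-unsafe library call (it is not the snowball API), and stem_locked is an illustrative wrapper, not the actual CMDbatstem code. Inside the server, MonetDB's own lock primitives may be the better fit; pthread is used only to keep the sketch self-contained.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a thread-unsafe library routine:
 * it returns a pointer into a shared static buffer, so two
 * concurrent callers would overwrite each other's result. */
static char shared_buf[64];

static const char *unsafe_stem(const char *term)
{
    size_t n = strlen(term);
    if (n >= sizeof shared_buf)
        n = sizeof shared_buf - 1;
    memcpy(shared_buf, term, n);
    shared_buf[n] = '\0';
    /* Toy rule for illustration only: drop one trailing 's'. */
    if (n > 1 && shared_buf[n - 1] == 's')
        shared_buf[n - 1] = '\0';
    return shared_buf;
}

/* One global lock serializes every call into the unsafe routine. */
static pthread_mutex_t stem_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread-safe wrapper: copies the result into the caller's buffer
 * while still holding the lock, so no other thread can clobber it.
 * Returns 0 on success, -1 if the output buffer is too small. */
int stem_locked(const char *term, char *out, size_t outlen)
{
    int rc = -1;

    pthread_mutex_lock(&stem_lock);
    const char *stem = unsafe_stem(term);
    size_t n = strlen(stem);
    if (n < outlen) {
        memcpy(out, stem, n + 1);   /* include the terminating NUL */
        rc = 0;
    }
    pthread_mutex_unlock(&stem_lock);

    return rc;
}
```

With a lock like this, mitosis can still split the bat and the function needs no {unsafe} marking, but the stemming calls themselves run one at a time. Whether to take the lock per term or once per chunk is a trade-off between lock overhead and how long other threads wait.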
-- Niels Nes, Manager ITF, Centrum Wiskunde & Informatica (CWI) Science Park 123, 1098 XG Amsterdam, The Netherlands room L3.14, phone ++31 20 592-4098 sip:4098@sip.cwi.nl url: https://www.cwi.nl/people/niels e-mail: Niels.Nes@cwi.nl