That is what I feared. Yes, I have some loopback queries in the function. I was hoping that each process would handle its own queries, but unfortunately that’s not the case. Well, thanks for your support! Regards, Stefano
On 04 Jul 2016, at 11:42, Mark Raasveldt wrote:
Hey Stefano,
Are you perhaps using a lot of loopback queries in your UDF? For locking reasons, loopback queries are sent back to the main process and executed there. After they are executed, the result data is copied back to the forked process. If loopback queries are your primary bottleneck, then parallelizing the Python function does not do much. In fact, it might degrade performance, because the data returned by each loopback query has to be copied between processes, which does not happen when the code is executed in the main process. The main advantage of parallelizing Python functions in this case would be sending multiple queries to the database at the same time; however, if the individual queries are already parallelized, you should not see a very big gain here either.
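To make the effect concrete, here is a toy model in Python. It is an illustration under stated assumptions, not MonetDB's actual implementation: worker threads stand in for the forked processes, and a single loop stands in for the main process, which is the only place loopback queries run. The names `execute_in_main`, `udf_worker` and `run`, and the 50 ms `QUERY_COST`, are hypothetical.

```python
import queue
import threading
import time

# Toy model (assumption, not MonetDB code): worker threads stand in for the
# forked mserver5 processes, and the loop in run() stands in for the main
# process, which alone is allowed to execute loopback queries.

QUERY_COST = 0.05  # hypothetical time (s) the main process spends per query


def execute_in_main(sql):
    """Pretend to run a loopback query; the result must be copied back."""
    time.sleep(QUERY_COST)
    return list(range(10))  # dummy result rows


def udf_worker(wid, requests, reply, n_queries):
    """One parallel UDF instance: every loopback query is shipped to main."""
    for i in range(n_queries):
        requests.put((wid, f"SELECT {i}"))  # send the query text to main
        rows = reply.get()                   # block until the result comes back
        _ = sum(rows)                        # trivial local Python work
    requests.put((wid, None))                # tell main this worker is done


def run(n_workers=3, n_queries=4):
    requests = queue.Queue()
    replies = [queue.Queue() for _ in range(n_workers)]
    workers = [
        threading.Thread(target=udf_worker, args=(w, requests, replies[w], n_queries))
        for w in range(n_workers)
    ]
    start = time.perf_counter()
    for t in workers:
        t.start()
    executed, active = 0, n_workers
    while active:  # the "main process" serves queries strictly one at a time
        wid, sql = requests.get()
        if sql is None:
            active -= 1
            continue
        replies[wid].put(execute_in_main(sql))
        executed += 1
    for t in workers:
        t.join()
    return executed, time.perf_counter() - start


if __name__ == "__main__":
    executed, elapsed = run()
    print(f"{executed} loopback queries took {elapsed:.2f}s")
```

Because every query funnels through that one loop, the wall-clock time grows with the total number of queries (at least 12 × 50 ms with the defaults), no matter how many workers are started.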
Regards,
Mark
----- Original Message -----
From: "Stefano Fioravanzo"
To: "users-list"
Sent: Monday, July 4, 2016 11:28:18 AM
Subject: Re: python map

Hello,
Yes, I am using a table-producing function, but the returned table has just one column. I have noticed that in this case it behaves just as it does without a table-producing function (but to be sure, I have continued my tests with the return type set to STRING).
About the mserver5 processes: I am now using 3 threads. When I fire the query, 3 new mserver5 processes are created, so I have 4 processes in total (the main mserver5 process plus the 3 new ones). The 3 processes start running in parallel (at the beginning of the function I have a ‘print “function started”’, and I can see it 3 times in the log), but then the main computations are carried out by the main mserver5 process, not by the 3 new ones. The main mserver5 process is always running at 100%; sometimes the other 3 run a little for a fraction of a second, but that’s all.
Regards,
Stefano
On 01 Jul 2016, at 15:53, Mark Raasveldt wrote:
Hey Stefano,
Hm, according to the MAL plan you sent, you are still using a table-producing function, are you not? ("explain select * from calcola_dati_volantino_simulato((select * from temp_input_table));"). How exactly are you getting that to execute in parallel, or is the query shown there not correct?
As for it not actually being executed in parallel, it might not be using multiple processes. In that case the Python GIL could be preventing your functions from executing in parallel. MonetDB will only use multiple processes if your operating system supports the fork() operation. You can check whether it is using multiple processes by running the shell command "pgrep mserver5" while a query is running: if multiple processes show up, MonetDB is using multiple processes; otherwise it is executing everything in the same process, which means that the GIL could be blocking concurrent execution.
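If you want to script that check, a small Python helper along these lines could work. It is a sketch that assumes a Unix-like system with `pgrep` on the PATH; the function names are made up for this example, and the check only makes sense while the query is executing.

```python
import shutil
import subprocess


def parse_pgrep_output(output):
    """Turn pgrep's one-PID-per-line output into a list of ints."""
    return [int(tok) for tok in output.split()]


def mserver5_pids(name="mserver5"):
    """Return the PIDs of running processes whose name matches `name`.

    Assumes a Unix-like system with `pgrep` available. pgrep exits with
    status 1 when nothing matches, which is not treated as an error here.
    """
    if shutil.which("pgrep") is None:
        raise RuntimeError("pgrep not found; this check assumes a Unix-like system")
    result = subprocess.run(["pgrep", name], capture_output=True, text=True)
    if result.returncode not in (0, 1):
        raise RuntimeError(result.stderr.strip() or "pgrep failed")
    return parse_pgrep_output(result.stdout)


def uses_multiple_processes(pids):
    """More than one mserver5 PID suggests forked (multi-process) execution."""
    return len(pids) > 1
```

Calling `uses_multiple_processes(mserver5_pids())` during a query returns True when forked workers are present; a single PID means everything runs in one process, where the GIL can serialize the Python calls.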
Regards,
Mark
----- Original Message -----
From: "Stefano Fioravanzo"
To: "users-list"
Sent: Thursday, June 30, 2016 3:01:58 PM
Subject: Re: python map

Hey Mark,
I have tested the options you gave me for running queries with python_map. For now, I have decided to create an empty table and fill it later using the parallelized function. I made a few tests and this approach should work. Here you can find the explain output of the query (I have set just 3 threads for now): http://pastie.org/private/ar4kbpnn7qiadulxp5g . The operator ‘batpyapimap.eval’ is called 3 times, so everything should be alright, except that the 3 Python calls do not run in parallel but one after the other. I can see clearly from the logs I print that the Python function fires up the first time with the first slice of input parameters; when it completes, it fires up a second time, and so on. Why is that? Currently I am not gaining anything, because there is no actual parallelization.
Thank you,
Stefano
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list