Hello, I have a long and complex embedded Python function. The function depends only on the individual rows of the input columns, so I would like to parallelize it. I've read the post on embedded Python and python_map. I understand that using python_map should automatically split the input columns (in an unspecified way), fire up several Python processes (how many?), each executing on its own input slice, and finally pack all the partial results together. Well, I have tried to use python_map, but nothing different happened. Is there something more I have to do apart from setting LANGUAGE PYTHON_MAP? Regards, Stefano
Hey Stefano,
Indeed, all you have to do to parallelize the function is create it with LANGUAGE PYTHON_MAP instead of LANGUAGE PYTHON. It should be noted, however, that the parallelization uses a heuristic based on table size to decide whether parallelizing is worth the effort. If you are doing expensive operations on a small table, the function might indeed not be parallelized, because the heuristic incorrectly assumes the parallelization is not worth it. You can force parallelization of queries regardless of table size by starting mserver5 with the --forcemito option.
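For concreteness, the function used in the example plan below would be declared roughly like this (a sketch reconstructed from the plan; the INTEGER return type is my assumption):

```sql
-- Same body as in the MAL plan; only the language tag changes
-- between the serial and the parallelizable version.
CREATE FUNCTION python_function(i INTEGER) RETURNS INTEGER
LANGUAGE PYTHON_MAP { return i * 2 };

SELECT python_function(i) FROM integers;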
If you want to check whether your function is parallelized, you can look at the explain output (type 'explain <SQL query>;' instead of '<SQL query>;'). This presents you with the generated MAL plan, which specifies how the SQL query is executed. An example of a MAL plan with parallel Python execution is given below (note that 'batpyapimap.eval', the Python MAL operator, is called multiple times).
The number of Python processes launched should be equal to the number of cores on your machine, as reported at mserver5 startup (e.g. "# Serving database 'demo', using 4 threads"). You can change the number of threads with the option '--set gdk_nr_threads=n' if you want to use a different amount.
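Conceptually, what PYTHON_MAP does can be sketched in plain Python with the multiprocessing module (a rough illustration only; MonetDB's actual splitting and packing happen inside the server):

```python
from multiprocessing import Pool

def python_function(slice_):
    # Stands in for the UDF body from the plan below ("return i * 2"),
    # applied to one slice of the input column.
    return [x * 2 for x in slice_]

def map_udf(column, n_workers=4):
    # Split the input column into contiguous slices, one per worker ...
    size = -(-len(column) // n_workers)  # ceiling division
    slices = [column[j:j + size] for j in range(0, len(column), size)]
    # ... run the same function on each slice in a separate process ...
    with Pool(len(slices)) as pool:
        partials = pool.map(python_function, slices)
    # ... and pack the partial results back together in order
    # (the analogue of mat.packIncrement in the plan).
    return [x for part in partials for x in part]
```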
Hope that helps.
Mark
+--------------------------------------------------------------------------------------------------+
| mal |
+==================================================================================================+
| function user.s8_1():void; |
| X_30:void := querylog.define("explain select python_function(i) from integers;","default_pip |
: e",46); :
| barrier X_93 := language.dataflow(); |
| X_13 := bat.new(nil:oid,nil:str); |
| X_21 := bat.append(X_13,"sys.L"); |
| X_16 := bat.new(nil:oid,nil:str); |
| X_23 := bat.append(X_16,"python_function_i"); |
| X_17 := bat.new(nil:oid,nil:str); |
| X_25 := bat.append(X_17,"int"); |
| X_18 := bat.new(nil:oid,nil:int); |
| X_27 := bat.append(X_18,32); |
| X_20 := bat.new(nil:oid,nil:int); |
| X_29 := bat.append(X_20,0); |
| X_1 := sql.mvc(); |
| C_48:bat[:oid] := sql.tid(X_1,"sys","integers",0,4); |
| X_53:bat[:int] := sql.bind(X_1,"sys","integers","i",0,0,4); |
| (C_57:bat[:oid],X_58:bat[:int]) := sql.bind(X_1,"sys","integers","i",2,0,4); |
| X_66 := sql.delta(X_53,C_57,X_58); |
| X_70 := algebra.projection(C_48,X_66); |
| X_74 := batpyapimap.eval(nil,"{ return i * 2 };",X_70); |
| C_49:bat[:oid] := sql.tid(X_1,"sys","integers",1,4); |
| X_54:bat[:int] := sql.bind(X_1,"sys","integers","i",0,1,4); |
| (C_59:bat[:oid],X_60:bat[:int]) := sql.bind(X_1,"sys","integers","i",2,1,4); |
| X_67 := sql.delta(X_54,C_59,X_60); |
| X_71 := algebra.projection(C_49,X_67); |
| X_75 := batpyapimap.eval(nil,"{ return i * 2 };",X_71); |
| C_50:bat[:oid] := sql.tid(X_1,"sys","integers",2,4); |
| X_55:bat[:int] := sql.bind(X_1,"sys","integers","i",0,2,4); |
| (C_61:bat[:oid],X_62:bat[:int]) := sql.bind(X_1,"sys","integers","i",2,2,4); |
| X_68 := sql.delta(X_55,C_61,X_62); |
| X_72 := algebra.projection(C_50,X_68); |
| X_76 := batpyapimap.eval(nil,"{ return i * 2 };",X_72); |
| C_52:bat[:oid] := sql.tid(X_1,"sys","integers",3,4); |
| X_56:bat[:int] := sql.bind(X_1,"sys","integers","i",0,3,4); |
| (C_63:bat[:oid],X_64:bat[:int]) := sql.bind(X_1,"sys","integers","i",2,3,4); |
| X_7:bat[:int] := sql.bind(X_1,"sys","integers","i",1); |
| X_69 := sql.delta(X_56,C_63,X_64,X_7); |
| X_73 := algebra.projection(C_52,X_69); |
| X_77 := batpyapimap.eval(nil,"{ return i * 2 };",X_73); |
| X_87 := mat.packIncrement(X_74,4); |
| X_89 := mat.packIncrement(X_87,X_75); |
| X_90 := mat.packIncrement(X_89,X_76); |
| X_9:bat[:int] := mat.packIncrement(X_90,X_77); |
| exit X_93; |
| sql.resultSet(X_21,X_23,X_25,X_27,X_29,X_9); |
| end user.s8_1; |
+--------------------------------------------------------------------------------------------------+
----- Original Message -----
From: "Stefano Fioravanzo"
Hi Mark, I have tried what you said, but with no results. Here (http://pastie.org/private/cdd0rzhpdyojuq6i0drkra) is the explain of my query without forcing parallelization (I have deleted a few lines because it was too long). So I started mserver5 with --forcemito: mserver5 --set embedded_r=true --set embedded_py=true --dbpath=<path> --daemon=yes --set mapi_open=true --debug=2 --forcemito. I ran the explain again and... nothing changed; the explain is exactly the same as before. The server starts with "Serving database 'dbname', using 16 threads", so at least MonetDB knows it can use up to 16 threads. The input columns of the Python function have 550 rows, which is not much, but the function takes a long time to execute, so I really need to run it in parallel. Thanks, Stefano
Hey Stefano,
Ah, I see that you are using a table-producing function. Unfortunately, it is currently not possible to parallelize table-producing functions. I will have a look to see whether I can get table-producing functions to execute in parallel as well. If you want parallelization right now, you could rewrite your function as a scalar function if possible (e.g. by pickling each row to a string and returning a set of rows; see https://dev.monetdb.org/hg/MonetDB/file/default/sql/backends/monet5/Tests/py... for a test case involving pickling).
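A minimal sketch of that workaround (the helper names and row structure here are made up for illustration): pickle each output row into a single string inside the scalar UDF, and unpickle it on the consumer side:

```python
import pickle

def pack_row(row):
    # Scalar-UDF side: serialize one output row (here a tuple) into a
    # string, so the UDF can return a plain STRING column.
    # Protocol 0 produces ASCII-safe output.
    return pickle.dumps(row, protocol=0).decode("ascii")

def unpack_row(s):
    # Consumer side: restore the original row from the string.
    return pickle.loads(s.encode("ascii"))
```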
Regards,
Mark
----- Original Message -----
From: "Stefano Fioravanzo"
Yeah, well, the solution you proposed could work, I guess, but the table created by this Python function is very large, and the whole process would be hard to maintain. Parallel execution would really be the solution. I tried using joblib inside the embedded function to fork the process, but with no success. Do you know of some way to do it? Note: inside the function I am using loopback queries, so even if I were able to fork the process, I would still need to replicate the _conn object (or open new connections, which I think is not possible)... so yeah, there's this other problem too. Thanks, Stefano
Hey Mark, I have tested the options you gave me to run queries with python_map. For now, I have decided to create an empty table and fill it later using the parallelized function. I made a few tests and this approach should work. Here you can find the explain of the query (I have set just 3 threads for now): http://pastie.org/private/ar4kbpnn7qiadulxp5g . The operator 'batpyapimap.eval' is called 3 times, so everything should be alright, except for the fact that the 3 Python calls do not run in parallel but one after the other. I can see clearly from the logs I print that the Python function fires up the first time with the first slice of input parameters; when it completes, it fires up a second time, and so on... Why is that? Currently I am not gaining anything, because there is no actual parallelization. Thank you, Stefano
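Stefano's approach can be sketched like this (a hypothetical fragment; the table, column, and function names are all made up):

```sql
-- "Fill an empty table with a parallelized scalar function":
-- create the target table first, then populate it via the UDF.
CREATE TABLE results (v STRING);
INSERT INTO results SELECT my_scalar_map_udf(i) FROM input_table;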
Hey Stefano,
Hm, according to the MAL plan you sent, you are still using a table-producing function, are you not? (explain select * from calcola_dati_volantino_simulato((select * from temp_input_table));) How exactly are you getting that to execute in parallel, or is the query shown there not correct?
As for it not actually being executed in parallel: it might not be using multiple processes, in which case the Python GIL could be blocking your functions from executing in parallel. MonetDB will only use multiple processes if your operating system supports the fork() operation. You can check whether it is using multiple processes by executing the shell command "pgrep mserver5" while a query is running; if multiple processes show up, then MonetDB is using multiple processes, otherwise it is executing everything in the same process, which means the GIL could be blocking concurrent execution.
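Besides the pgrep check, you can verify from Python itself whether the platform supports fork() at all (my own illustration, not part of MonetDB):

```python
import os

def supports_fork():
    # os.fork only exists on POSIX systems (Linux, macOS, ...).
    # On Windows this returns False, so everything would run in one
    # process, where the GIL serializes the Python calls.
    return hasattr(os, "fork")
```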
Regards,
Mark
----- Original Message -----
From: "Stefano Fioravanzo"
Hello, yes, I am using a table-producing function, but the returned table has just one column. I have noticed that in this case it works just like a non-table-producing function. (But to be sure, I have continued my tests with the return type set to STRING.) About the mserver5 processes: I am now using 3 threads. When I fire the query, 3 new mserver5 processes are created, so I have 4 processes in total (the main mserver5 process plus the 3 new ones). The 3 processes start to run in parallel (at the beginning of the function I have a print "function started", and I can see 3 of them in the log), but then the main computation is carried out by the main mserver5 process, not the 3 new ones. The main mserver5 process is always executing at 100%; sometimes the other 3 execute a little for a fraction of a second, but that's all. Regards, Stefano
Hey Stefano,
Are you perhaps using a lot of loopback queries in your UDF? For locking reasons, loopback queries are sent back to the main process and executed there. After they are executed, the result data is copied back to the forked process. If loopback queries are your primary bottleneck, then parallelizing the Python function does not do much. In fact, it might degrade performance, because the return data of the loopback query has to be copied between processes, which does not happen when the code is executed in the main process. The main advantage of parallelizing Python functions in this case would be sending multiple queries to the database at the same time; however, if the individual queries are already parallelized, you should not see a very big gain there either.
Regards,
Mark
----- Original Message -----
From: "Stefano Fioravanzo"
That is what I feared. Yes, I have some loopback queries in the function; I was hoping that each process would manage its queries on its own, but unfortunately that's not the case. Well, thanks for your support! Regards, Stefano
participants (2)
- Mark Raasveldt
- Stefano Fioravanzo