NThreads Parameter and the Performance Related to the Number of Database Instances
Hello MonetDB Users,

In short:
1) What does the nthreads setting mean?
2) Why does performance increase as you increase the number of database instances of a single farm? Is there any way to avoid this?

In long:

What does the nthreads setting mean? From the manpage [1] it's the number of worker threads that perform main processing. Is this the total number of threads for that database instance? Or is it per query? I've compared the performance of different settings. I find it strange that nthreads=8 would perform the best because I have 12 cores. I confirmed the number of cores by checking nproc. Here are my plots showing that 8 threads perform best on my system:

Throughput: https://cs.uwaterloo.ca/~jmate/nthreads-and-database-instances/throughput-vs...
Average Response Time: https://cs.uwaterloo.ca/~jmate/nthreads-and-database-instances/rsptm-avg-vs-...
99th Percentile Response Time: https://cs.uwaterloo.ca/~jmate/nthreads-and-database-instances/rsptm-p99-vs-...

The second thing is: why does the performance increase if I distribute the concurrent clients over multiple database instances on the same data farm? I expected the opposite; performance should decrease. I thought I would increase the overhead by adding more database instances to the same farm. Is there any way to avoid this? Below are the plots I have. I used 40 concurrent TPC-H clients for all:

Throughput: https://cs.uwaterloo.ca/~jmate/nthreads-and-database-instances/throughput-vs...
Average Response Time: https://cs.uwaterloo.ca/~jmate/nthreads-and-database-instances/rsptm-avg-vs-...
99th Percentile Response Time: https://cs.uwaterloo.ca/~jmate/nthreads-and-database-instances/rsptm-p99-vs-...

Context:

To give you guys some context on my project: I'm a Master's student doing my research on DBaaS tenant placement. I am evaluating some placement algorithms by using the TPC-H workload and MonetDB as the database. I am using separate database instances on a single farm for isolation.

The workload I'm testing with runs read-only queries from the TPC-H benchmark. Each TPC-H client is a thread with its own persistent connection to the database, running the following pseudocode:

    while true
        for queryNum 1 ... 22   # except query 15 which creates a tmp table
            run query queryNum

Each database instance has 100MB of data.

Thank you for your time,
Joseph

References:
[1] https://www.monetdb.org/Documentation/monetdb-man-page
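The client loop in the pseudocode above could look roughly like the following Python sketch. It is only an illustration: `run_query` is a hypothetical callable standing in for whatever executes one TPC-H query over the client's persistent connection (it is not a MonetDB API), and timing each query with `time.perf_counter` is an assumption about how response times were collected.

```python
import itertools
import time
from typing import Callable, Iterator, List, Tuple

# Query 15 is skipped because it creates a temporary table (per the post).
QUERY_NUMBERS: List[int] = [q for q in range(1, 23) if q != 15]


def client_loop(run_query: Callable[[int], None],
                clock: Callable[[], float] = time.perf_counter
                ) -> Iterator[Tuple[int, float]]:
    """Yield (query number, response time in seconds) forever.

    run_query is a hypothetical stand-in for executing one TPC-H query
    on the client's persistent connection; it is not a MonetDB API.
    """
    for query_num in itertools.cycle(QUERY_NUMBERS):
        start = clock()
        run_query(query_num)
        yield query_num, clock() - start
```

Because the loop never ends on its own, a caller would bound it, for example with `itertools.islice(client_loop(my_run_query), 1000)`, where `my_run_query` is whatever function actually sends the query.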
Hi

On 09/04/15 07:53, Joseph Mate wrote:

Hello MonetDB Users,

In short: 1) What does the nthreads setting mean?

It is the minimal number of worker threads per mserver5 instance and the basis for multi-core parallel processing. The number of worker threads can be set as a commandline parameter.

2) Why does performance increase as you increase the number of database instances of a single farm? Is there any way to avoid this?

No immediate answer. Your experiment seems a mixture of hot/cold runs, which affect file system caches. The small database size leads to keeping all data more-or-less in memory.

regards, Martin
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
On 15-04-09 03:10 AM, Martin Kersten wrote:

In short: 1) What does the nthreads setting mean?

It is the minimal number of worker threads per mserver5 instance and the basis for multi-core parallel processing. The number of worker threads can be set as a commandline parameter.

Can multiple worker threads work on a single query?

2) Why does performance increase as you increase the number of database instances of a single farm? Is there any way to avoid this?

No immediate answer. Your experiment seems a mixture of hot/cold runs, which affect file system caches. The small database size leads to keeping all data more-or-less in memory.

I forgot to mention that I allowed the TPC-H workload to run for five minutes before taking measurements to ensure measurements were taken during steady state.

Thank you!
Joseph
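To make the five-minute warm-up concrete, here is a minimal Python sketch of how throughput, average response time, and 99th percentile response time could be computed from steady-state samples only. The thread does not say how the plotted numbers were actually derived, so the sample format, the nearest-rank percentile, and the measurement window (end of warm-up to the last completed query) are all assumptions.

```python
from typing import Dict, Iterable, Tuple

WARMUP_SECONDS = 300.0  # five-minute warm-up, as described in the thread


def steady_state_metrics(samples: Iterable[Tuple[float, float]],
                         warmup: float = WARMUP_SECONDS) -> Dict[str, float]:
    """Summarise (completion time, response time) pairs, both in seconds.

    Completion times are measured from the start of the run. Samples that
    completed during the warm-up are discarded (an assumed policy).
    """
    kept = sorted((done, rt) for done, rt in samples if done >= warmup)
    if not kept:
        raise ValueError("no samples after the warm-up period")
    response_times = sorted(rt for _, rt in kept)
    n = len(response_times)
    window = kept[-1][0] - warmup  # assumed measurement window
    # Nearest-rank 99th percentile: rank = ceil(0.99 * n), in integer arithmetic.
    rank = (99 * n + 99) // 100
    return {
        "throughput_qps": n / window if window > 0 else float("nan"),
        "avg_response_s": sum(response_times) / n,
        "p99_response_s": response_times[rank - 1],
    }
```

With 40 clients, the per-client sample lists would simply be concatenated before calling `steady_state_metrics`.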
On 10/04/15 21:32, Joseph Mate wrote:

Can multiple worker threads work on a single query?

Yes, they do.

regards, Martin
Hi Joseph,

a few questions and comments on your observations:

Which version of MonetDB are you using? With some older versions nthreads was indeed (incorrectly) the number of threads per client context, however, with recent versions it's the number of total worker threads of the server, i.e., the number of threads that do the actual query evaluation.

Performance, specifically parallel performance and in particular multi-core performance, is anything but simple, especially in a data management context where data access plays a more important role than pure compute power. Thus, more cores/threads does not necessarily mean higher performance. A first question in order to be able to interpret your results is whether your machine has 12 physical cores or 12 virtual cores (due to hyperthreading), i.e., only 6 physical cores. If the latter, did you also try with 6 threads? Hyperthreading helps to improve performance only if some threads stall waiting for resources, so that other threads can use the idle instruction units.

Also, is your machine a single socket or a multi-socket machine, i.e., could NUMA effects play a role?

Finally, is your machine idle except for the database server? Where do your clients run? On the same machine? Then they might also impact your results, in particular when trying to use all 12 cores for the server, which is then competing with the clients. In particular with your tiny 100MB database and thus fast query responses, clients will be rather busy constantly receiving results and issuing new queries.

One explanation could be that with 8 server threads, an "equilibrium" is reached such that 8 cores are busy with server threads, and 4 with clients. The peak performance with exactly 4 clients IMHO supports this idea.

Did you check your system load?

To fully understand the behavior, you might also want to use numbers of server threads and clients that are not powers of two, and not only because your machine has a non-power-of-two number of cores.

For the multi DB instance, I don't have any idea, yet. However, multiple DB instances also means multiple servers, thus I assume your thread numbers are per server? Also, your data will be replicated and data access on disk and in memory will be different...

Best,
Stefan
--
| Stefan.Manegold@CWI.nl | DB Architectures (DA) |
| www.CWI.nl/~manegold/ | Science Park 123 (L321) |
| +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
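Stefan's suggestion to also try thread counts that are not powers of two could be scripted along these lines. This is a hedged sketch: the function names are made up for illustration, the upper bound of 12 comes from the core count reported in the thread, and the `monetdb set nthreads=N <database>` form is the one Joseph uses elsewhere in this thread. The code only builds command strings; running them, and whatever is needed for the server to pick up the new value, is deliberately left out.

```python
from typing import List


def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ..."""
    return n > 0 and (n & (n - 1)) == 0


def nthreads_sweep_commands(database: str, max_threads: int) -> List[str]:
    """One `monetdb set nthreads=N <database>` command per N in 1..max_threads."""
    return [f"monetdb set nthreads={n} {database}"
            for n in range(1, max_threads + 1)]


# Thread counts that a powers-of-two-only sweep would skip on a 12-core box.
skipped = [n for n in range(1, 13) if not is_power_of_two(n)]
```

Here `skipped` includes 6 and 12, the two values the discussion about physical versus virtual cores singles out.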
On 15-04-09 03:17 AM, Stefan Manegold wrote:
Hi Joseph,
a few questions and comments on your observations:
Which version of MonetDB are you using?
monetdbd -v
MonetDB Database Server v1.7 (Oct2014-SP2)
With some older versions nthreads was indeed (incorrectly) the number of threads per client context, however, with recent versions it's the number of total worker threads of the server, i.e., the number of threads that do the actual query evaluation.
Performance, specifically parallel performance and in particular multi-core performance, is anything but simple, especially in a data management context where data access plays a more important role than pure compute power. Thus, more cores/threads does not necessarily mean higher performance. A first question in order to be able to interpret your results is whether your machine has 12 physical cores or 12 virtual cores (due to hyperthreading), i.e., only 6 physical cores. If the latter, did you also try with 6 threads? Hyperthreading helps to improve performance only if some threads stall waiting for resources, so that other threads can use the idle instruction units.
I confirmed I have 12 physical cores because /proc/cpuinfo says cpu_cores=siblings [1][2]. We have two processors, each with six cores.
Also, is your machine a single socket or a multi-socket machine, i.e., could NUMA effects play a role?
Looks like I am susceptible to NUMA as I have two processors, each with 6 cores:

dmesg | grep -i numa
NUMA: Using 31 for the hash shift.
pci_bus 0000:00: on NUMA node 0
pci_bus 0000:7f: on NUMA node 0
pci_bus 0000:80: on NUMA node 1
pci_bus 0000:ff: on NUMA node 1
Finally, is your machine idle except for the database server?
I have completely reserved the database server. However, there are small tasks running in the background, like sar recording hardware stats, and the grid scheduler stuff, like pbs_mom, that kills your processes if you run for too long or consume too much virtual memory. The scheduler does not use virtualization (i.e., Xen, VMware, etc.).
Where do your clients run? On the same machine? Then they might also impact your results, in particular when trying to use all 12 cores for the server, which is then competing with the clients. In particular with your tiny 100MB database and thus fast query responses, clients will be rather busy constantly receiving results and issuing new queries.
One explanation could be that with 8 server threads, an "equilibrium" is reached such that 8 cores are busy with server threads, and 4 with clients. The peak performance with exactly 4 clients IMHO supports this idea.
The clients are running on a different machine from the database server.
Did you check your system load?
To fully understand the behavior, you might also want to use numbers of server threads and clients that are not powers of two, and not only because your machine has a non-power-of-two number of cores.
This is good advice. Thank you.
For the multi DB instance, I don't have any idea, yet. However, multiple DB instances also means multiple servers, thus I assume your thread numbers are per server? Also, your data will be replicated and data access on disk and in memory will be different...
To give some context on this test: I'm trying to determine how significant an impact there would be in multi-tenant scenarios. I was hoping that there wouldn't be too much extra overhead from co-locating tenants on the same server. However, I did not expect performance to improve when co-locating tenants on the same server, as there is additional overhead to managing multiple tenants versus one tenant. Having multiple database instances on the same server somehow improved the performance.

The clients were kept constant throughout all the database instance experiments. For each number of database instances, I evenly distributed the clients (except in the cases where 40 clients was not evenly divisible by the number of database instances).

For example, the 2 database instances experiment:
db1: 20 clients
db2: 20 clients

For example, the 3 database instances experiment:
db1: 14 clients
db2: 13 clients
db3: 13 clients

There is a single machine generating the queries to run and submitting them to a different machine that is the single database server. Each database instance is just a monetdb create on a monetdbd farm. I.e., for 2 db instances:

monetdbd create /tmp/mfarm
monetdbd start /tmp/mfarm
monetdb create dbinstance1
monetdb set nthreads=4 dbinstance1
monetdb release dbinstance1
monetdb create dbinstance2
monetdb set nthreads=4 dbinstance2
monetdb release dbinstance2

The machines have 32GB of memory and everything will be in memory by the time I start taking measurements. I don't think disk is an issue. I agree that the memory access will be different because each database instance has its own 100MB copy of the data. However, I wasn't expecting the average response time of a database server with many instances to be 2/3 of the average response time of a database server with one instance.
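The even split of the 40 clients over the database instances described above can be written down in a few lines of Python. This is only a sketch of the arithmetic: the function name is invented here, and giving the leftover clients to the first instances is an assumption that happens to match the 14/13/13 example.

```python
from typing import List


def distribute_clients(total_clients: int, instances: int) -> List[int]:
    """Split clients as evenly as possible; earlier instances get the extras."""
    if instances < 1:
        raise ValueError("need at least one database instance")
    base, extra = divmod(total_clients, instances)
    return [base + 1 if i < extra else base for i in range(instances)]
```

For instance, `distribute_clients(40, 3)` gives `[14, 13, 13]`, and the counts always sum to the total number of clients.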
Thank you!
Joseph

References:
[1] http://www.linuxforums.org/articles/finding-server-is-multi-processor-multi-...
[2] grep -P "(processor)|(physical_id)|(siblings)|(core_id)|(cpu cores)" /proc/cpuinfo
processor : 0
siblings : 6
cpu cores : 6
processor : 1
siblings : 6
cpu cores : 6
processor : 2
siblings : 6
cpu cores : 6
processor : 3
siblings : 6
cpu cores : 6
processor : 4
siblings : 6
cpu cores : 6
processor : 5
siblings : 6
cpu cores : 6
processor : 6
siblings : 6
cpu cores : 6
processor : 7
siblings : 6
cpu cores : 6
processor : 8
siblings : 6
cpu cores : 6
processor : 9
siblings : 6
cpu cores : 6
processor : 10
siblings : 6
cpu cores : 6
processor : 11
siblings : 6
cpu cores : 6
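The check Joseph describes (no hyperthreading when siblings equals cpu cores for every processor) can be sketched in Python as follows. The function names are hypothetical, and the parser assumes nothing beyond the `key : value` layout shown in the /proc/cpuinfo output above.

```python
from typing import Dict, List


def parse_cpuinfo(text: str) -> List[Dict[str, str]]:
    """Split /proc/cpuinfo-style text into one dict per logical processor."""
    processors: List[Dict[str, str]] = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without a "key : value" pair
        key, value = key.strip(), value.strip()
        if key == "processor":
            processors.append({})  # a new logical processor starts here
        if processors:
            processors[-1][key] = value
    return processors


def hyperthreading_enabled(text: str) -> bool:
    """True if any processor reports more siblings than cpu cores."""
    return any(int(p["siblings"]) > int(p["cpu cores"])
               for p in parse_cpuinfo(text)
               if "siblings" in p and "cpu cores" in p)
```

On a Linux machine this could be called as `hyperthreading_enabled(open("/proc/cpuinfo").read())`.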
participants (3): Joseph Mate, Martin Kersten, Stefan Manegold