FW: Monetdb on Solaris/Sparc: high system time problem
Dear Martin,
Thanks for your prompt reply. We have verified that the order in which the queries are executed doesn't affect their behavior. The same problem arises for the same queries regardless of execution order, which implies that it's related to the queries themselves. In addition, the queries' execution time and utilization are pretty deterministic.
We have also verified that there is plenty of memory available and there are no other processes competing for resources (i.e., disk bandwidth, memory).
We have used dtrace to profile which functions MonetDB executes in each of the two phases (normal and problematic). In the normal phase, which lasts a few tens of seconds and behaves like what we see on Linux, MonetDB mainly executes the following functions:
- BUNfastins
- re_match_no_ignore
- strstr
When MonetDB enters the second (problematic) phase, the User/System time is 4% to 21% for a couple of minutes without any disk activity or swapping. In this phase, MonetDB mainly executes the following functions:
- MT_sleep_ms
- memset
- __pollsys
- _pollsys
- _save_nv_regs
- pselect
- select
Do these function names help you to understand the problematic behavior?
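For anyone who wants to compare the two phases side by side, here is a minimal sketch of how such a profile could be summarized. It assumes the profiler output has been saved as text with one `<function> <count>` pair per line, optionally with a module prefix such as DTrace's module`function notation; the helper name `top_functions` and that input layout are assumptions for illustration, not something taken from our setup.

```python
from collections import Counter


def top_functions(lines, limit=10):
    """Sum sample counts per function from '<function> <count>' lines."""
    totals = Counter()
    for line in lines:
        parts = line.split()
        # Skip blank lines, headers, and anything without a trailing integer.
        if len(parts) < 2 or not parts[-1].isdigit():
            continue
        # Drop an optional module prefix (module`function) and keep the function.
        name = parts[-2].split("`")[-1]
        totals[name] += int(parts[-1])
    # Most frequently sampled functions first.
    return totals.most_common(limit)
```

Running it once on a capture from the normal phase and once on a capture from the problematic phase gives two ranked lists that can be compared directly.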
Thanks in advance,
Javier
________________________________
From: Martin Kersten
Dear Javier,
Query 7,8,9 are not that different from the rest of the set to search for a particular malicious call. If significant, I would have expected eq 8,9 and 11.
Do the problems occur for both sizes of your test, i.e. sf10, sf100? Do you have a gdb backtrace of all threads in the worst case scenario?
You might run the queries with "set optimizer='sequential_pipe'" or the minimal pipe. It would demonstrate if it is a result of concurrent behavior within the OS, e.g. thrashing on some system resource.
regards, Martin
On 1/21/13 12:32 AM, Javier Picorel wrote:
Dear Martin,
Thanks for your prompt reply. We have verified that the order in which the queries are executed doesn't affect their behavior. The same problem arises for the same queries regardless of execution order, which implies that it's related to the queries themselves. In addition, the queries' execution time and utilization are pretty deterministic. We have also verified that there is plenty of memory available and there are no other processes competing for resources (i.e., disk bandwidth, memory).
We have used dtrace to profile which functions MonetDB executes in each of the two phases (normal and problematic). In the normal phase, which lasts a few tens of seconds and behaves like what we see on Linux, MonetDB mainly executes the following functions:
- BUNfastins
- re_match_no_ignore
- strstr
When MonetDB enters the second (problematic) phase, the User/System time is 4% to 21% for a couple of minutes without any disk activity or swapping. In this phase, MonetDB mainly executes the following functions:
- MT_sleep_ms
- memset
- __pollsys
- _pollsys
- _save_nv_regs
- pselect
- select
Do these function names help you to understand the problematic behavior?
Thanks in advance, Javier
------------------------------------------------------------------------
From: Martin Kersten <martin@monetdb.org>
Subject: Re: Monetdb on Solaris/Sparc: high system time problem
Date: January 17, 2013 6:56:00 PM GMT+01:00
To: users-list@monetdb.org
Reply-To: Communication channel for MonetDB users <users-list@monetdb.org>
Dear Javier,
Thank you for your report. It is hard to assess the consequences/impact of your particular setup. Your report suggests that something else was also running on your platform during this sequence, at that particular time. It could mean that MonetDB was competing with another process for resources, i.e. memory and disk bandwidth. It could also indicate a kernel thrashing on shared kernel resources.
If the problem persists during repeated runs under controlled system settings and it is immune to query order, then we might look further into the issue.
regards, Martin
On 1/17/13 3:44 PM, Javier Picorel wrote:
Dear all,
I'm running MonetDB (Dec2011-SP2) on Sparc Solaris 10 u9. I'm testing the TPC-H queries using SF10 and SF100 as the scale factors. My machine has 4 cores and 32GB of RAM and my farm is on a disk array.
Unfortunately, there are some queries (e.g., Qry 7, 8, 9) that enter a phase in which the User/System time is 4% to 21% for a couple of minutes. Interestingly, the same issue doesn't show up on a Linux machine with a similar setup, where the queries' system time is pretty low (<5%).
The vmstat output doesn't show any disk activity during the phase where the process is mostly in system time; there are only high values in the "re" and "mf" fields. There is no swapping going on ("po" and "pi" values are 0 in vmstat).
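To make that observation easy to check on a saved vmstat log, here is a minimal sketch. It assumes the Solaris vmstat layout, in which the column-name row contains "re", "mf", "pi" and "po" and the name "sy" appears twice (system calls first, CPU system time last). The function names and the 20% threshold are assumptions chosen for illustration.

```python
def parse_vmstat(text):
    """Return one dict per sample row, keyed by the column names."""
    header, rows = None, []
    for line in text.splitlines():
        fields = line.split()
        if "re" in fields and "mf" in fields:
            # Column-name row. "sy" occurs twice; rename the last one,
            # which is the CPU system-time percentage.
            header = list(fields)
            if "sy" in header:
                last_sy = len(header) - 1 - header[::-1].index("sy")
                header[last_sy] = "cpu_sy"
        elif header and len(fields) == len(header) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(header, map(int, fields))))
    return rows


def system_bound_without_paging(row, sy_threshold=20):
    """True when CPU system time is high while page-in and page-out are zero."""
    # sy_threshold is an assumed cut-off, not a measured value.
    return row["cpu_sy"] >= sy_threshold and row["pi"] == 0 and row["po"] == 0
```

Filtering the parsed rows with `system_bound_without_paging` and then looking at their "re" and "mf" values shows whether the high system time coincides with page reclaims and minor faults rather than with swapping.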
These are the parameters of the DB:
type      default  database
shared    default  yes
nthreads  default  4
optpipe   default_pipe
master    default  no
slave     default  <unknown>
readonly  default  no
nclients  default  64
I test the queries by running just one at a time. How can I mitigate this OS time?
Thanks in advance, Javier
_______________________________________________
users-list mailing list
users-list@monetdb.org
http://mail.monetdb.org/mailman/listinfo/users-list