Timing a query execution is supported in multiple ways; which one to use largely depends on what you want to measure: point-to-point wall-clock time, or the actual behaviour of kernel operations.
The baseline (1) is to use simple command-line tools, such as `time` on Linux, to assess the performance of running a script against `mclient`. Beware that `/bin/time` and `/usr/bin/time` are not the same; both, however, mainly measure and report the wall-clock time spent by the given command/process. See their respective man pages for details.
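For example, a minimal sketch (the database name `demo` and the script `queries.sql` are placeholders for your own setup):

```sh
time mclient -d demo queries.sql
```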
The next approach (2) is to use the `--interactive` option of the `mclient` tool, which reports the timing of each individual SQL query in a script in easily human-consumable terms. It returns the wall-clock time between sending the query to the server and receiving the first block of answers. Finally (3), the query history can be maintained in a separate log for later analysis (see below).
Thus, (1) includes everything from loading the `mclient` binary and starting the `mclient` process, parsing the query in `mclient`, sending it to the server, having the server execute the query and serialise the result, and sending the result back to the client, to the client receiving, parsing, and rendering the result, and sending that result to `/dev/null` ("for free"), to a file (I/O), or to a terminal (scrolling). (2) merely includes the time the server spends on receiving and executing the query and creating the result; the above-mentioned client-side costs of receiving, parsing, and rendering the result are excluded. The same holds for (3).
A detailed timing of an SQL query can be obtained by prepending the query with the modifier `TRACE`. It produces a queryable table with a breakdown of all relational algebra operations (see the TRACE command).
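For instance (a sketch; the table name is illustrative, and we assume the trace of the last query can be inspected via the `sys.tracelog()` function):

```sql
-- prepend TRACE to obtain a per-operation breakdown of the execution
TRACE SELECT count(*) FROM mytable;
-- the trace itself is queryable
SELECT * FROM sys.tracelog();
```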
The profiling tool `stethoscope` provides further detail for those interested in the inner workings of the system. It offers a hook into many system parameters, e.g. input/output, CPU cycles, and thread activity.
Timing a database query should be done with care. You will often notice differences in response time when the same query is run multiple times. The underlying cause can be that the data itself resides on disk (slow) or is already available in the memory caches (fast), or that a single user runs queries (fast) versus having to compete with other users (slow), including competing with other processes on your machine fighting over CPU, memory, and I/O resources. As a precaution you might want to flush the system caches first. On Windows this can be done with a cache-flushing tool; you'll need to press its "Flush Cache WS" and "Flush All Standby" buttons. On Linux you have to create a little job that consumes all memory.
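On Linux, one commonly used alternative (a sketch, not MonetDB-specific; it requires root privileges and drops the kernel page cache for the whole machine) is:

```sh
sync
echo 3 | sudo tee /proc/sys/vm/drop_caches
```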
The SQL implementation comes with a simple query profiler to detect expensive queries. It is centred around two predefined internal tables that store the definitions of all executed SQL queries and their execution time.
Query logging can be started by calling the procedure `querylog_enable()`, which saves some major compilation information of a query in the `querylog_catalog` table:
name | type | description |
---|---|---|
"id" | OID | |
"owner" | STRING | The SQL user who has executed this query |
"defined" | TIMESTAMP | Time when the query was started |
"query" | STRING | The query that has been executed |
"pipe" | STRING | The optimiser pipe line that has been used |
"plan" | STRING | Name of its MAL plan |
"mal" | INTEGER | Size of its MAL plan in the number of statements |
"optimize" | BIGINT | Time in microseconds for the optimiser pipeline |
Query logging can be stopped by calling the procedure `querylog_disable()`.
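A minimal logging session could look as follows (the query in the middle is just an illustration):

```sql
CALL sys.querylog_enable();          -- start logging
SELECT count(*) FROM sys.tables;     -- any query issued now is logged
CALL sys.querylog_disable();         -- stop logging
SELECT * FROM sys.querylog_catalog;  -- inspect the logged query definitions
```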
The query performance is stored in the table `querylog_calls`. The owner of the query definition is also the one who is referenced implicitly by the `id` of a call event. The key timing attributes are `run`, i.e. the time to execute the query and produce the result set, and `ship`, i.e. the time to render the result set and send it to the client. All times are in microseconds. The remaining parameters illustrate resource claims. The `tuples` attribute denotes the size of the result set in number of rows. The `cpu` load is derived from the operating system statistics (Linux only) and is given as a percentage. The same holds for the `io` waiting time.

The `sys.querylog_calls` table structure:
name | type | description |
---|---|---|
"id" | OID | sys.querylog_catalog.id |
"start" | TIMESTAMP | time the statement was started |
"stop" | TIMESTAMP | time the statement was completely finished |
"arguments" | STRING | actual call structure |
"tuples" | BIGINT | number of tuples in the result set |
"run" | BIGINT | time spent (in usec) until the result export |
"ship" | BIGINT | time spent (in usec) to ship the result set |
"cpu" | INT | average cpu load percentage during execution |
"io" | INT | percentage time waiting for IO to finish |
The view `sys.querylog_history` combines some useful information from both tables:
create view sys.querylog_history as
select qd.*, ql."start",ql."stop", ql.arguments, ql.tuples, ql.run, ql.ship, ql.cpu, ql.io
from sys.querylog_catalog() qd, sys.querylog_calls() ql
where qd.id = ql.id and qd.owner = user;
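Using this view, one can, for instance, aggregate the logged calls per query text to spot structurally expensive queries (a sketch):

```sql
SELECT query, count(*) AS calls, avg(run) AS avg_run_usec
FROM sys.querylog_history
GROUP BY query
ORDER BY avg_run_usec DESC
LIMIT 5;
```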
Procedure | Description |
---|---|
sys.querylog_disable() | stop logging queries |
sys.querylog_empty() | empty the query log |
sys.querylog_enable() | start logging queries |
sys.querylog_enable(threshold_in_ms integer) | start logging only those queries whose execution time exceeds threshold_in_ms |
The `sys.querylog_enable()` procedure also takes an optional parameter, `threshold`, an integer in milliseconds. When the query log is enabled with this parameter, it will only log those queries whose execution time exceeds the threshold. This feature can be handy to prevent the database from being swamped by too many short-running queries, hence reducing the overhead incurred by the query log (see below), while helping the DBA detect expensive queries.
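For example, to log only queries that take longer than 100 milliseconds (the value is illustrative):

```sql
CALL sys.querylog_enable(100);
```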
Disabling the query log will not remove existing query logs; it only prevents subsequent queries from being logged. Once the query log is re-enabled, information about subsequently executed queries will be appended to the existing query logs.
Query logs are stored in persistent tables, i.e. they will survive a MonetDB server restart. They can only be removed by calling `sys.querylog_empty()`.
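For instance, to wipe the accumulated logs (a one-off, irreversible action):

```sql
CALL sys.querylog_empty();
```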
A downside of this implementation is its relatively high overhead, because every read query will trigger a write transaction.