Hi, Stefan Manegold wrote:
0) For easier readability of the generated MAL plans, you might want to run your mserver5 in read-only mode, i.e., either set the database read-only via
monetdb set readonly=yes
or start mserver5 directly with extra commandline option "--readonly".
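For reference, a minimal sketch of both variants; the database name "mydb" and the dbfarm path are only placeholders:

    # set the database read-only via the monetdb control tool
    monetdb set readonly=yes mydb

    # or start mserver5 directly in read-only mode
    mserver5 --dbpath=/path/to/dbfarm/mydb --readonly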
A side note: Is it correct that when I start mserver directly I can only use the demo database and monetdb user? If I try another database and/or another user I get an error message about wrong credentials.
1) In case you have a multi-core/-CPU system, please first disable some of MonetDB's inherent plan parallelisation (a.k.a. "mitosis") by setting (in/via mclient) the respective optimizer pipeline as follows:
set optimizer='no_mitosis_pipe';
and then try to run your query, again, and report whether it works better or not.
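For reference, a rough sketch of such a session in mclient; "demo" is only an example database name, and the EXPLAIN prefix prints the MAL plan without executing the query:

    $ mclient -d demo
    sql> set optimizer='no_mitosis_pipe';
    sql> EXPLAIN SELECT count(*) FROM ( ... ) AS solutions;
    sql> SELECT count(*) FROM ( ... ) AS solutions;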
I observe no difference in runtime or data written to disk when I use the no_mitosis_pipe optimizer. In fact, except for a few differently named BATs in the plan, there is only one difference I could spot:

    --- q7-with-mitosis.plan     2011-12-08 13:58:37.181199000 +0100
    +++ q7-without-mitosis.plan  2011-12-08 13:58:56.297199000 +0100
    - X_178:bat[:oid,:lng] := batcalc.-(X_175,50:lng,true,false);
    + X_178:bat[:oid,:lng] := batcalc.-(X_175,50:lng);
2) To analyze / profile the smaller case (that runs to completion), please run it in mclient prefixed with keyword "TRACE", both with the default optimizer pipeline and with mitosis disabled as described above, and report both results (i.e., the detailed profiling traces).
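For reference, a rough sketch of the TRACE variant, again with "demo" as a placeholder database name (run it once with the default pipeline and once after set optimizer='no_mitosis_pipe';):

    $ mclient -d demo
    sql> TRACE SELECT count(*) FROM ( ... ) AS solutions;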
The problem is that these traces are *very* large. I have 3 queries which fill up the disk on the large dataset. 2 of these also cause considerable disk I/O when evaluated on the small dataset (6 MB and 100 MB written, respectively). The traces of these 2 queries take up 537 MB and 3.7 GB. The 3rd query runs quickly on the small dataset and, according to iostat, only writes 92 kB to disk. Its trace is 206 kB, so I could post that if it would help.

I've attached the MAL plans with and without mitosis for the first query, which writes 6 MB to disk. I can also upload the traces to the web, but I'm not sure that is useful because they are so large. Please let me know if it would help. I've also sorted the traces and can provide the statements taking up the most time. BTW, ticks is microseconds, correct? The output of stethoscope suggests it.
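For what it's worth, this is roughly how I pull the most expensive statements out of a saved trace; it assumes the ticks value ends up in the first whitespace-separated column (and "q7.trace" is just an example file name), so the sort key may need adjusting:

    $ sort -nrk1 q7.trace | head -20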
3) To check which MAL statement causes the excessive (intermediate) result creation, you can use the "stethoscope" tool to "listen" to the server. Before running your query via mclient, start (in a separate terminal) the stethoscope tool as follows:
Except for the quick query, these also produce very large results.
Do not redirect or pipe the output to a file, as this might lose some data due to a yet unresolved bug / design flaw. Rather, once the query finishes (or crashes), cut & paste the stethoscope output from the terminal into a file.
Once stethoscope is running and has connected to the server, run your query as usual (preferably again with both optimizer settings, but in separate sessions), and share the final stethoscope output with us.
For the quick query, cutting and pasting from the terminal and redirecting to a file produced the same number of lines. But if I understand correctly, you are only interested in the last couple of statements in the case of a crash? Can you also provide a command line to use for stethoscope, i.e., which columns to display and whether I can filter for certain events? Currently I'm using +trwbs. I will try to simplify the queries while keeping the disk I/O behavior; I hope that this will result in smaller traces.

Thanks,
Viktor
Kind regards, Stefan
On Tue, Dec 06, 2011 at 04:53:50PM +0100, Viktor Rosenfeld wrote:
Hi,
I'm observing disk I/O that I am unable to explain when evaluating some queries. I have two datasets, one relatively small with the BATs taking up 1.6 MB on disk and one fairly large taking up 283 MB on disk.
According to iostat, when I evaluate the attached query on the small dataset, 100 MB are written to disk, but none are read. This is on a Linux machine with 4 GB RAM running the Aug2011-SP3 release.
I don't understand why data is written to disk that (a) is never read again and (b) would easily fit into main memory.
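For reference, this is roughly how I measure the write volume: iostat in one terminal, the query in another ("query.sql" is just a placeholder file):

    $ iostat -m 1                  # per-device throughput in MB, one report per second
    $ mclient -d demo < query.sql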
I am unable to evaluate the queries on the large dataset because MonetDB writes so much data that the disk fills up after a few minutes (more than 22 GB).
Please let me know if I can provide more information.
Cheers, Viktor
SELECT count(*) FROM (
  SELECT DISTINCT
    node1.id AS id1, node2.id AS id2, node3.id AS id3,
    node4.id AS id4, node5.id AS id5, node6.id AS id6,
    node1.toplevel_corpus
  FROM
    node AS node1, rank AS rank1, component AS component1, node_annotation AS node_annotation1,
    node AS node2, rank AS rank2, component AS component2, edge_annotation AS edge_annotation2,
    node AS node3, rank AS rank3, component AS component3, node_annotation AS node_annotation3,
    node AS node4, rank AS rank4, component AS component4, edge_annotation AS edge_annotation4,
    node AS node5, rank AS rank5, component AS component5, node_annotation AS node_annotation5,
    node AS node6, rank AS rank6, component AS component6, node_annotation AS node_annotation6
  WHERE
    component2.name IS NULL AND component2.type = 'd' AND
    component3.name IS NULL AND component3.type = 'd' AND
    component4.name IS NULL AND component4.type = 'd' AND
    edge_annotation2.name = 'func' AND edge_annotation2.rank_ref = rank2.pre AND edge_annotation2.value = 'OA' AND
    edge_annotation4.name = 'func' AND edge_annotation4.rank_ref = rank4.pre AND edge_annotation4.value = 'SB' AND
    node1.corpus_ref = node2.corpus_ref AND
    node1.corpus_ref = node3.corpus_ref AND
    node1.corpus_ref = node4.corpus_ref AND
    node1.corpus_ref = node5.corpus_ref AND
    node1.corpus_ref = node6.corpus_ref AND
    node1.id = node5.id AND
    node1.id = node6.id AND
    node2.corpus_ref = node4.corpus_ref AND
    node2.corpus_ref = node5.corpus_ref AND
    node2.corpus_ref = node6.corpus_ref AND
    node2.right_token BETWEEN SYMMETRIC node3.left_token - 1 AND node3.left_token - 50 AND
    node2.text_ref = node3.text_ref AND
    node3.corpus_ref = node5.corpus_ref AND
    node3.corpus_ref = node6.corpus_ref AND
    node3.right_token BETWEEN SYMMETRIC node4.left_token - 1 AND node4.left_token - 50 AND
    node3.text_ref = node4.text_ref AND
    node4.corpus_ref = node5.corpus_ref AND
    node4.corpus_ref = node6.corpus_ref AND
    node5.corpus_ref = node6.corpus_ref AND
    node_annotation1.name = 'cat' AND node_annotation1.node_ref = node1.id AND node_annotation1.value = 'S' AND
    node_annotation3.name = 'pos' AND node_annotation3.node_ref = node3.id AND node_annotation3.value = 'VVFIN' AND
    node_annotation5.name = 'cat' AND node_annotation5.node_ref = node5.id AND node_annotation5.value = 'S' AND
    node_annotation6.name = 'cat' AND node_annotation6.node_ref = node6.id AND node_annotation6.value = 'S' AND
    rank1.component_ref = component1.id AND rank1.node_ref = node1.id AND rank1.pre = rank4.parent AND
    rank2.component_ref = component2.id AND rank2.node_ref = node2.id AND
    rank3.component_ref = component3.id AND rank3.node_ref = node3.id AND
    rank4.component_ref = component4.id AND rank4.node_ref = node4.id AND
    rank5.component_ref = component5.id AND rank5.node_ref = node5.id AND rank5.pre = rank2.parent AND
    rank6.component_ref = component6.id AND rank6.node_ref = node6.id AND rank6.pre = rank3.parent
) AS solutions;
--
| Stefan.Manegold @ CWI.nl | DB Architectures (INS1)   |
| http://CWI.nl/~manegold/ | Science Park 123 (L321)   |
| Tel.: +31 (0)20 592-4212 | 1098 XG Amsterdam (NL)    |