Hi, Stefan Manegold wrote:
0) For easier readability of the generated MAL plans, you might want to run your mserver5 in read-only mode, i.e., either set the database read-only via
monetdb set readonly=yes
or start mserver5 directly with extra commandline option "--readonly".
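For reference, a minimal sketch of both variants; the database name "mydb" and the dbfarm path are only placeholders:

    # set the database read-only via the monetdb control tool
    monetdb set readonly=yes mydb

    # or start mserver5 directly in read-only mode
    mserver5 --dbpath=/path/to/dbfarm/mydb --readonly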
A side note: Is it correct that when I start mserver directly I can only use the demo database and monetdb user? If I try another database and/or another user I get an error message about wrong credentials.
1) In case you have a multi-core/-CPU system, please first disable some of MonetDB's inherent plan parallelisation (a.k.a. "mitosis") by setting (in/via mclient) the respective optimizer pipeline as follows:
set optimizer='no_mitosis_pipe';
and then try to run your query, again, and report whether it works better or not.
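For reference, a rough sketch of such a session in mclient; "demo" is only an example database name, and the EXPLAIN prefix prints the MAL plan without executing the query:

    $ mclient -d demo
    sql> set optimizer='no_mitosis_pipe';
    sql> EXPLAIN SELECT count(*) FROM ( ... ) AS solutions;
    sql> SELECT count(*) FROM ( ... ) AS solutions;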
I observe no difference in runtime or data written to disk when I use the no_mitosis_pipe optimizer. In fact, except for a few differently named BATs in the plan, there is only one difference I could spot:

    --- q7-with-mitosis.plan     2011-12-08 13:58:37.181199000 +0100
    +++ q7-without-mitosis.plan  2011-12-08 13:58:56.297199000 +0100
    - X_178:bat[:oid,:lng] := batcalc.-(X_175,50:lng,true,false);
    + X_178:bat[:oid,:lng] := batcalc.-(X_175,50:lng);
2) To analyze / profile the smaller case (that runs to completion), please run it in mclient prefixed with keyword "TRACE", both with the default optimizer pipeline and with mitosis disabled as described above, and report both results (i.e., the detailed profiling traces).
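For reference, a rough sketch of the TRACE variant, again with "demo" as a placeholder database name (run it once with the default pipeline and once after set optimizer='no_mitosis_pipe';):

    $ mclient -d demo
    sql> TRACE SELECT count(*) FROM ( ... ) AS solutions;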
The problem is that these traces are *very* large. I have 3 queries which fill up the disk on the large dataset. 2 of these also cause considerable disk I/O when evaluated on the small dataset (6 MB and 100 MB written, respectively). The traces of these 2 queries take up 537 MB and 3.7 GB. The 3rd query runs quickly on the small dataset and, according to iostat, only writes 92 kB to disk. Its trace is 206 kB, so I could post that if it would help.

I've attached the MAL plans with and without mitosis for the first query, which writes 6 MB to disk. I can also upload the traces to the web, but I'm not sure that is useful because they are so large. Please let me know if it would help. I've also sorted the traces and can provide the statements taking up the most time. BTW, ticks is microseconds, correct? The output of stethoscope suggests it.
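For what it's worth, this is roughly how I pull the most expensive statements out of a saved trace; it assumes the ticks value ends up in the first whitespace-separated column (and "q7.trace" is just an example file name), so the sort key may need adjusting:

    $ sort -nrk1 q7.trace | head -20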
3) To check which MAL statement causes the excessive (intermediate) result creation, you can use the "stethoscope" tool to "listen" to the server. Before running your query via mclient, start (in a separate terminal) the stethoscope tool as follows:
Except for the quick query, these also produce very large results.
Do not redirect or pipe the output to a file, as this might lose some data due to a yet unresolved bug / design flaw. Rather, once the query finishes (or crashes), cut & paste the stethoscope output from the terminal into a file.
Once stethoscope is running and has connected to the server, run your query as usual (preferably again with both optimizer settings, but in separate sessions), and share the final stethoscope output with us.
For the quick query, cutting and pasting from the terminal and redirecting to a file produced the same number of lines. But if I understand correctly, you are only interested in the last couple of statements in the case of a crash? Can you also provide a command line to use for stethoscope, i.e., which columns to display and whether I can filter for certain events? Currently I'm using +trwbs. I will try to simplify the queries while keeping the disk I/O behavior; I hope that this will result in smaller traces.

Thanks,
Viktor
Kind regards, Stefan
On Tue, Dec 06, 2011 at 04:53:50PM +0100, Viktor Rosenfeld wrote:
Hi,
I'm observing disk I/O that I am unable to explain when evaluating some queries. I have two datasets, one relatively small with the BATs taking up 1.6 MB on disk and one fairly large taking up 283 MB on disk.
According to iostat, when I evaluate the attached query on the small dataset, 100 MB are written to disk, but none are read. This is on a Linux machine with 4 GB RAM running the Aug2011-SP3 release.
I don't understand why data is written to disk that (a) is never read again and (b) would easily fit into main memory.
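For reference, this is roughly how I measure the write volume: iostat in one terminal, the query in another ("query.sql" is just a placeholder file):

    $ iostat -m 1                  # per-device throughput in MB, one report per second
    $ mclient -d demo < query.sql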
I am unable to evaluate the queries on the large dataset because MonetDB writes so much data that the disk fills up after a few minutes (more than 22 GB).
Please let me know if I can provide more information.
Cheers, Viktor
SELECT count(*) FROM (
  SELECT DISTINCT
    node1.id AS id1, node2.id AS id2, node3.id AS id3,
    node4.id AS id4, node5.id AS id5, node6.id AS id6,
    node1.toplevel_corpus
  FROM
    node AS node1, rank AS rank1, component AS component1, node_annotation AS node_annotation1,
    node AS node2, rank AS rank2, component AS component2, edge_annotation AS edge_annotation2,
    node AS node3, rank AS rank3, component AS component3, node_annotation AS node_annotation3,
    node AS node4, rank AS rank4, component AS component4, edge_annotation AS edge_annotation4,
    node AS node5, rank AS rank5, component AS component5, node_annotation AS node_annotation5,
    node AS node6, rank AS rank6, component AS component6, node_annotation AS node_annotation6
  WHERE
    component2.name IS NULL AND component2.type = 'd' AND
    component3.name IS NULL AND component3.type = 'd' AND
    component4.name IS NULL AND component4.type = 'd' AND
    edge_annotation2.name = 'func' AND edge_annotation2.rank_ref = rank2.pre AND edge_annotation2.value = 'OA' AND
    edge_annotation4.name = 'func' AND edge_annotation4.rank_ref = rank4.pre AND edge_annotation4.value = 'SB' AND
    node1.corpus_ref = node2.corpus_ref AND
    node1.corpus_ref = node3.corpus_ref AND
    node1.corpus_ref = node4.corpus_ref AND
    node1.corpus_ref = node5.corpus_ref AND
    node1.corpus_ref = node6.corpus_ref AND
    node1.id = node5.id AND
    node1.id = node6.id AND
    node2.corpus_ref = node4.corpus_ref AND
    node2.corpus_ref = node5.corpus_ref AND
    node2.corpus_ref = node6.corpus_ref AND
    node2.right_token BETWEEN SYMMETRIC node3.left_token - 1 AND node3.left_token - 50 AND
    node2.text_ref = node3.text_ref AND
    node3.corpus_ref = node5.corpus_ref AND
    node3.corpus_ref = node6.corpus_ref AND
    node3.right_token BETWEEN SYMMETRIC node4.left_token - 1 AND node4.left_token - 50 AND
    node3.text_ref = node4.text_ref AND
    node4.corpus_ref = node5.corpus_ref AND
    node4.corpus_ref = node6.corpus_ref AND
    node5.corpus_ref = node6.corpus_ref AND
    node_annotation1.name = 'cat' AND node_annotation1.node_ref = node1.id AND node_annotation1.value = 'S' AND
    node_annotation3.name = 'pos' AND node_annotation3.node_ref = node3.id AND node_annotation3.value = 'VVFIN' AND
    node_annotation5.name = 'cat' AND node_annotation5.node_ref = node5.id AND node_annotation5.value = 'S' AND
    node_annotation6.name = 'cat' AND node_annotation6.node_ref = node6.id AND node_annotation6.value = 'S' AND
    rank1.component_ref = component1.id AND rank1.node_ref = node1.id AND rank1.pre = rank4.parent AND
    rank2.component_ref = component2.id AND rank2.node_ref = node2.id AND
    rank3.component_ref = component3.id AND rank3.node_ref = node3.id AND
    rank4.component_ref = component4.id AND rank4.node_ref = node4.id AND
    rank5.component_ref = component5.id AND rank5.node_ref = node5.id AND rank5.pre = rank2.parent AND
    rank6.component_ref = component6.id AND rank6.node_ref = node6.id AND rank6.pre = rank3.parent
) AS solutions;
--
| Stefan.Manegold @ CWI.nl | DB Architectures (INS1)   |
| http://CWI.nl/~manegold/ | Science Park 123 (L321)   |
| Tel.: +31 (0)20 592-4212 | 1098 XG Amsterdam (NL)    |