Hi, I forgot to attach the query to my last mail. This one writes 6 MB to disk. Cheers, Viktor Stefan Manegold wrote:
Hi Viktor,
thanks for reporting, and sorry for the inconveniences!
Here are some steps you could perform and report about to help us with analysing and understanding the problem:
0) For easier readbility of the generated MAL plans, you might what to run your mserver5 in read-only mode, i.e., either set the database read-only via
monetdb set readonly=yes
or start mserver5 directly with extra commandline option "--readonly".
1) In case you have a multi-core/-CPU system, please first disable some of MonetDB inherent plan parallelisation (aka. "mitosis") by setting (in/via mclient) therespective optimizer pipeline as follows:
set optimizer='no_mitosis_pipe';
and then try to run your query, again, and report whether it works better or not.
In some cases, in particular with complex queries like your's MonetDB's plan parallelisation can be sub-optimal and, e.g., result in unexpectely/unnecessarily large intermediate results.
You can check the details of active optimizer pipeline via
select optimizer;
You can reset the optimizer pipeline to the default via
set optimizer='default_pipe';
You can check the generated MAL plans by prefixing your query in mclient with keyword "EXPLAIN".
2) To analyze / profile the smaller case (that runs to completion), please run it in mclient prefixed with keyword "TRACE", both with the default optimizer pipeline and with mitosis disabled as described above, and report both results (i.e., the detailed profiling traces).
3) To check which MAL statement causes the excessive (intermediate) result creation, you can use the "stethoscope" tool to "listen" to teh server. Before running your query via mclient, start (in a separate terminal) the stethoscope tool as follows:
stethoscope -u <username> -P <password> -D
Do not redirect or pipe the output to a files, as this might loose some data due to a yet unresolved bug / design flaw. Rather, once the query finishes (or crashes) cut&paste the stethoscope output from the terminal to a file.
Once stethoscope is running and has connected to the server, run your query as usually (perferably again with both optimizer setting, but in separate sessions), and share the final stethoscope output with us.
Kind regards, Stefan
On Tue, Dec 06, 2011 at 04:53:50PM +0100, Viktor Rosenfeld wrote:
Hi,
I'm observing disk I/O when evaluating some queries which I am unable to explain. I have two datasets, one relatively small with the BATs taking up 1.6 MB on disk and one fairly large with 283 MB on disk.
According to iostat, when I evaluate the attached query on the small dataset, 100 MB are written to disk, but none are read. This is on a Linux machine with 4 GB RAM running the Aug2011-SP3 release.
I don't understand why data is written on disk, that
(a) is never read again and (b) would easily fit into main memory.
I am unable to evaluate the queries on the large dataset because MonetDB writes so much data that the disk fills up after a few minutes (more than 22GB).
Please let me know if I can provide more information.
Cheers, Viktor
SELECT count(*) FROM ( SELECT DISTINCT node1.id AS id1, node2.id AS id2, node3.id AS id3, node4.id AS id4, node5.id AS id5, node6.id AS id6, node1.toplevel_corpus FROM node AS node1, rank AS rank1, component AS component1, node_annotation AS node_annotation1, node AS node2, rank AS rank2, component AS component2, edge_annotation AS edge_annotation2, node AS node3, rank AS rank3, component AS component3, node_annotation AS node_annotation3, node AS node4, rank AS rank4, component AS component4, edge_annotation AS edge_annotation4, node AS node5, rank AS rank5, component AS component5, node_annotation AS node_annotation5, node AS node6, rank AS rank6, component AS component6, node_annotation AS node_annotation6 WHERE component2.name IS NULL AND component2.type = 'd' AND component3.name IS NULL AND component3.type = 'd' AND component4.name IS NULL AND component4.type = 'd' AND edge_annotation2.name = 'func' AND edge_annotation2.rank_ref = rank2.pre AND edge_annotation2.value = 'OA' AND edge_annotation4.name = 'func' AND edge_annotation4.rank_ref = rank4.pre AND edge_annotation4.value = 'SB' AND node1.corpus_ref = node2.corpus_ref AND node1.corpus_ref = node3.corpus_ref AND node1.corpus_ref = node4.corpus_ref AND node1.corpus_ref = node5.corpus_ref AND node1.corpus_ref = node6.corpus_ref AND node1.id = node5.id AND node1.id = node6.id AND node2.corpus_ref = node4.corpus_ref AND node2.corpus_ref = node5.corpus_ref AND node2.corpus_ref = node6.corpus_ref AND node2.right_token BETWEEN SYMMETRIC node3.left_token - 1 AND node3.left_token - 50 AND node2.text_ref = node3.text_ref AND node3.corpus_ref = node5.corpus_ref AND node3.corpus_ref = node6.corpus_ref AND node3.right_token BETWEEN SYMMETRIC node4.left_token - 1 AND node4.left_token - 50 AND node3.text_ref = node4.text_ref AND node4.corpus_ref = node5.corpus_ref AND node4.corpus_ref = node6.corpus_ref AND node5.corpus_ref = node6.corpus_ref AND node_annotation1.name = 'cat' AND node_annotation1.node_ref = node1.id AND node_annotation1.value = 'S' AND node_annotation3.name = 'pos' AND node_annotation3.node_ref = node3.id AND node_annotation3.value = 'VVFIN' AND node_annotation5.name = 'cat' AND node_annotation5.node_ref = node5.id AND node_annotation5.value = 'S' AND node_annotation6.name = 'cat' AND node_annotation6.node_ref = node6.id AND node_annotation6.value = 'S' AND rank1.component_ref = component1.id AND rank1.node_ref = node1.id AND rank1.pre = rank4.parent AND rank2.component_ref = component2.id AND rank2.node_ref = node2.id AND rank3.component_ref = component3.id AND rank3.node_ref = node3.id AND rank4.component_ref = component4.id AND rank4.node_ref = node4.id AND rank5.component_ref = component5.id AND rank5.node_ref = node5.id AND rank5.pre = rank2.parent AND rank6.component_ref = component6.id AND rank6.node_ref = node6.id AND rank6.pre = rank3.parent ) AS solutions;
------------------------------------------------------------------------------ Cloud Services Checklist: Pricing and Packaging Optimization This white paper is intended to serve as a reference, checklist and point of discussion for anyone considering optimizing the pricing and packaging model of a cloud services business. Read Now! http://www.accelacomm.com/jaw/sfnl/114/51491232/
_______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- | Stefan.Manegold @ CWI.nl | DB Architectures (INS1) | | http://CWI.nl/~manegold/ | Science Park 123 (L321) | | Tel.: +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
------------------------------------------------------------------------------ Cloud Services Checklist: Pricing and Packaging Optimization This white paper is intended to serve as a reference, checklist and point of discussion for anyone considering optimizing the pricing and packaging model of a cloud services business. Read Now! http://www.accelacomm.com/jaw/sfnl/114/51491232/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users