[MonetDB-users] Strange disk I/O behavior
Hi, I'm observing disk I/O when evaluating some queries which I am unable to explain. I have two datasets, one relatively small with the BATs taking up 1.6 MB on disk and one fairly large with 283 MB on disk. According to iostat, when I evaluate the attached query on the small dataset, 100 MB are written to disk, but none are read. This is on a Linux machine with 4 GB RAM running the Aug2011-SP3 release. I don't understand why data is written on disk, that (a) is never read again and (b) would easily fit into main memory. I am unable to evaluate the queries on the large dataset because MonetDB writes so much data that the disk fills up after a few minutes (more than 22GB). Please let me know if I can provide more information. Cheers, Viktor
Hi Viktor,
thanks for reporting, and sorry for the inconveniences!
Here are some steps you could perform and report about to help us with
analysing and understanding the problem:
0)
For easier readbility of the generated MAL plans, you might what to run your
mserver5 in read-only mode, i.e., either set the database read-only via
monetdb set readonly=yes
Hi,
I'm observing disk I/O when evaluating some queries which I am unable to explain. I have two datasets, one relatively small with the BATs taking up 1.6 MB on disk and one fairly large with 283 MB on disk.
According to iostat, when I evaluate the attached query on the small dataset, 100 MB are written to disk, but none are read. This is on a Linux machine with 4 GB RAM running the Aug2011-SP3 release.
I don't understand why data is written on disk, that
(a) is never read again and (b) would easily fit into main memory.
I am unable to evaluate the queries on the large dataset because MonetDB writes so much data that the disk fills up after a few minutes (more than 22GB).
Please let me know if I can provide more information.
Cheers, Viktor
SELECT count(*) FROM ( SELECT DISTINCT node1.id AS id1, node2.id AS id2, node3.id AS id3, node4.id AS id4, node5.id AS id5, node6.id AS id6, node1.toplevel_corpus FROM node AS node1, rank AS rank1, component AS component1, node_annotation AS node_annotation1, node AS node2, rank AS rank2, component AS component2, edge_annotation AS edge_annotation2, node AS node3, rank AS rank3, component AS component3, node_annotation AS node_annotation3, node AS node4, rank AS rank4, component AS component4, edge_annotation AS edge_annotation4, node AS node5, rank AS rank5, component AS component5, node_annotation AS node_annotation5, node AS node6, rank AS rank6, component AS component6, node_annotation AS node_annotation6 WHERE component2.name IS NULL AND component2.type = 'd' AND component3.name IS NULL AND component3.type = 'd' AND component4.name IS NULL AND component4.type = 'd' AND edge_annotation2.name = 'func' AND edge_annotation2.rank_ref = rank2.pre AND edge_annotation2.value = 'OA' AND edge_annotation4.name = 'func' AND edge_annotation4.rank_ref = rank4.pre AND edge_annotation4.value = 'SB' AND node1.corpus_ref = node2.corpus_ref AND node1.corpus_ref = node3.corpus_ref AND node1.corpus_ref = node4.corpus_ref AND node1.corpus_ref = node5.corpus_ref AND node1.corpus_ref = node6.corpus_ref AND node1.id = node5.id AND node1.id = node6.id AND node2.corpus_ref = node4.corpus_ref AND node2.corpus_ref = node5.corpus_ref AND node2.corpus_ref = node6.corpus_ref AND node2.right_token BETWEEN SYMMETRIC node3.left_token - 1 AND node3.left_token - 50 AND node2.text_ref = node3.text_ref AND node3.corpus_ref = node5.corpus_ref AND node3.corpus_ref = node6.corpus_ref AND node3.right_token BETWEEN SYMMETRIC node4.left_token - 1 AND node4.left_token - 50 AND node3.text_ref = node4.text_ref AND node4.corpus_ref = node5.corpus_ref AND node4.corpus_ref = node6.corpus_ref AND node5.corpus_ref = node6.corpus_ref AND node_annotation1.name = 'cat' AND node_annotation1.node_ref = node1.id AND node_annotation1.value = 'S' AND node_annotation3.name = 'pos' AND node_annotation3.node_ref = node3.id AND node_annotation3.value = 'VVFIN' AND node_annotation5.name = 'cat' AND node_annotation5.node_ref = node5.id AND node_annotation5.value = 'S' AND node_annotation6.name = 'cat' AND node_annotation6.node_ref = node6.id AND node_annotation6.value = 'S' AND rank1.component_ref = component1.id AND rank1.node_ref = node1.id AND rank1.pre = rank4.parent AND rank2.component_ref = component2.id AND rank2.node_ref = node2.id AND rank3.component_ref = component3.id AND rank3.node_ref = node3.id AND rank4.component_ref = component4.id AND rank4.node_ref = node4.id AND rank5.component_ref = component5.id AND rank5.node_ref = node5.id AND rank5.pre = rank2.parent AND rank6.component_ref = component6.id AND rank6.node_ref = node6.id AND rank6.pre = rank3.parent ) AS solutions;
------------------------------------------------------------------------------ Cloud Services Checklist: Pricing and Packaging Optimization This white paper is intended to serve as a reference, checklist and point of discussion for anyone considering optimizing the pricing and packaging model of a cloud services business. Read Now! http://www.accelacomm.com/jaw/sfnl/114/51491232/
_______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- | Stefan.Manegold @ CWI.nl | DB Architectures (INS1) | | http://CWI.nl/~manegold/ | Science Park 123 (L321) | | Tel.: +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
On 06-12-2011 18:24:51 +0100, Stefan Manegold wrote:
3) To check which MAL statement causes the excessive (intermediate) result creation, you can use the "stethoscope" tool to "listen" to teh server. Before running your query via mclient, start (in a separate terminal) the stethoscope tool as follows:
stethoscope -u <username> -P <password> -D
Do not redirect or pipe the output to a files, as this might loose some data due to a yet unresolved bug / design flaw. Rather, once the query finishes (or crashes) cut&paste the stethoscope output from the terminal to a file.
In fact, redirecting the output to a file probably makes the data loss less, since writing to file is often faster than writing to a terminal. It is unavoidable that data gets lost when there is a large amount of events being generated, and you can't tell other than when you figure parts must be missing.
On Tue, Dec 06, 2011 at 06:36:27PM +0100, Fabian Groffen wrote:
On 06-12-2011 18:24:51 +0100, Stefan Manegold wrote:
3) To check which MAL statement causes the excessive (intermediate) result creation, you can use the "stethoscope" tool to "listen" to teh server. Before running your query via mclient, start (in a separate terminal) the stethoscope tool as follows:
stethoscope -u <username> -P <password> -D
Do not redirect or pipe the output to a files, as this might loose some data due to a yet unresolved bug / design flaw. Rather, once the query finishes (or crashes) cut&paste the stethoscope output from the terminal to a file.
In fact, redirecting the output to a file probably makes the data loss less, since writing to file is often faster than writing to a terminal.
It is unavoidable that data gets lost when there is a large amount of events being generated, and you can't tell other than when you figure parts must be missing.
Well, the most important information we need in this case is the very last MAL statement issued that keeps the server busy (with creating a very large intermediate result); hence, loss of some previous statements is actually really a big problem in this case. Stefan -- | Stefan.Manegold @ CWI.nl | DB Architectures (INS1) | | http://CWI.nl/~manegold/ | Science Park 123 (L321) | | Tel.: +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
Hi, Stefan Manegold wrote:
0) For easier readbility of the generated MAL plans, you might what to run your mserver5 in read-only mode, i.e., either set the database read-only via
monetdb set readonly=yes
or start mserver5 directly with extra commandline option "--readonly".
A side note: Is it correct that when I start mserver directly I can only use the demo database and monetdb user? If I try another database and/or another user I get an error message about wrong credentials.
1) In case you have a multi-core/-CPU system, please first disable some of MonetDB inherent plan parallelisation (aka. "mitosis") by setting (in/via mclient) therespective optimizer pipeline as follows:
set optimizer='no_mitosis_pipe';
and then try to run your query, again, and report whether it works better or not.
I observe no difference in runtime or data written to disk when I use the no_mitosis_pipe optimizer. In fact, except for a few differently named BATs in the plan, there is only one difference I could spot: --- q7-with-mitosis.plan 2011-12-08 13:58:37.181199000 +0100 +++ q7-without-mitosis.plan 2011-12-08 13:58:56.297199000 +0100 - X_178:bat[:oid,:lng] := batcalc.-(X_175,50:lng,true,false); + X_178:bat[:oid,:lng] := batcalc.-(X_175,50:lng);
2) To analyze / profile the smaller case (that runs to completion), please run it in mclient prefixed with keyword "TRACE", both with the default optimizer pipeline and with mitosis disabled as described above, and report both results (i.e., the detailed profiling traces).
The problem is that these traces are *very* large. I have 3 queries which fill up the disk on the large dataset. 2 of these also cause considerable disk I/O when evaluated on the small dataset (6 MB and 100 MB respectively). The traces of these 2 queries take up 537 MB and 3.7 GB. The 3rd query runs quickly on the small dataset and according to iostat only writes 92 kB to disk. The trace is 206 kB big, so I could post this if it would help. I've attached the MAL plans with and without mitosis for the first query which writes 6 MB to disk. I can also upload the traces to the web, but I'm not sure if that is useful because they are so large. Please let me know if it would help. I've also sorted the traces and can provide the statements taking up the most time. BTW, ticks is microseconds, correct? The output of stethoscope suggests it.
3) To check which MAL statement causes the excessive (intermediate) result creation, you can use the "stethoscope" tool to "listen" to teh server. Before running your query via mclient, start (in a separate terminal) the stethoscope tool as follows:
Except for the quick query these also produce very large results.
Do not redirect or pipe the output to a files, as this might loose some data due to a yet unresolved bug / design flaw. Rather, once the query finishes (or crashes) cut&paste the stethoscope output from the terminal to a file.
Once stethoscope is running and has connected to the server, run your query as usually (perferably again with both optimizer setting, but in separate sessions), and share the final stethoscope output with us.
For the quick query, cutting and pasting from a terminal and redirecting to a file produced the same number of lines. But if I understand correctly, you are only interested in the last couple of statements in the case of a crash? Can you also provide a command line to use for stethoscope, i.e. which columns to display and if I could filter for some events? Currently I'm using +trwbs. I will try to simplify the queries while keeping the disk I/O behavior. I hope that this will result in smaller traces. Thanks, Viktor
Kind regards, Stefan
On Tue, Dec 06, 2011 at 04:53:50PM +0100, Viktor Rosenfeld wrote:
Hi,
I'm observing disk I/O when evaluating some queries which I am unable to explain. I have two datasets, one relatively small with the BATs taking up 1.6 MB on disk and one fairly large with 283 MB on disk.
According to iostat, when I evaluate the attached query on the small dataset, 100 MB are written to disk, but none are read. This is on a Linux machine with 4 GB RAM running the Aug2011-SP3 release.
I don't understand why data is written on disk, that
(a) is never read again and (b) would easily fit into main memory.
I am unable to evaluate the queries on the large dataset because MonetDB writes so much data that the disk fills up after a few minutes (more than 22GB).
Please let me know if I can provide more information.
Cheers, Viktor
SELECT count(*) FROM ( SELECT DISTINCT node1.id AS id1, node2.id AS id2, node3.id AS id3, node4.id AS id4, node5.id AS id5, node6.id AS id6, node1.toplevel_corpus FROM node AS node1, rank AS rank1, component AS component1, node_annotation AS node_annotation1, node AS node2, rank AS rank2, component AS component2, edge_annotation AS edge_annotation2, node AS node3, rank AS rank3, component AS component3, node_annotation AS node_annotation3, node AS node4, rank AS rank4, component AS component4, edge_annotation AS edge_annotation4, node AS node5, rank AS rank5, component AS component5, node_annotation AS node_annotation5, node AS node6, rank AS rank6, component AS component6, node_annotation AS node_annotation6 WHERE component2.name IS NULL AND component2.type = 'd' AND component3.name IS NULL AND component3.type = 'd' AND component4.name IS NULL AND component4.type = 'd' AND edge_annotation2.name = 'func' AND edge_annotation2.rank_ref = rank2.pre AND edge_annotation2.value = 'OA' AND edge_annotation4.name = 'func' AND edge_annotation4.rank_ref = rank4.pre AND edge_annotation4.value = 'SB' AND node1.corpus_ref = node2.corpus_ref AND node1.corpus_ref = node3.corpus_ref AND node1.corpus_ref = node4.corpus_ref AND node1.corpus_ref = node5.corpus_ref AND node1.corpus_ref = node6.corpus_ref AND node1.id = node5.id AND node1.id = node6.id AND node2.corpus_ref = node4.corpus_ref AND node2.corpus_ref = node5.corpus_ref AND node2.corpus_ref = node6.corpus_ref AND node2.right_token BETWEEN SYMMETRIC node3.left_token - 1 AND node3.left_token - 50 AND node2.text_ref = node3.text_ref AND node3.corpus_ref = node5.corpus_ref AND node3.corpus_ref = node6.corpus_ref AND node3.right_token BETWEEN SYMMETRIC node4.left_token - 1 AND node4.left_token - 50 AND node3.text_ref = node4.text_ref AND node4.corpus_ref = node5.corpus_ref AND node4.corpus_ref = node6.corpus_ref AND node5.corpus_ref = node6.corpus_ref AND node_annotation1.name = 'cat' AND node_annotation1.node_ref = node1.id AND node_annotation1.value = 'S' AND node_annotation3.name = 'pos' AND node_annotation3.node_ref = node3.id AND node_annotation3.value = 'VVFIN' AND node_annotation5.name = 'cat' AND node_annotation5.node_ref = node5.id AND node_annotation5.value = 'S' AND node_annotation6.name = 'cat' AND node_annotation6.node_ref = node6.id AND node_annotation6.value = 'S' AND rank1.component_ref = component1.id AND rank1.node_ref = node1.id AND rank1.pre = rank4.parent AND rank2.component_ref = component2.id AND rank2.node_ref = node2.id AND rank3.component_ref = component3.id AND rank3.node_ref = node3.id AND rank4.component_ref = component4.id AND rank4.node_ref = node4.id AND rank5.component_ref = component5.id AND rank5.node_ref = node5.id AND rank5.pre = rank2.parent AND rank6.component_ref = component6.id AND rank6.node_ref = node6.id AND rank6.pre = rank3.parent ) AS solutions;
------------------------------------------------------------------------------ Cloud Services Checklist: Pricing and Packaging Optimization This white paper is intended to serve as a reference, checklist and point of discussion for anyone considering optimizing the pricing and packaging model of a cloud services business. Read Now! http://www.accelacomm.com/jaw/sfnl/114/51491232/
_______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- | Stefan.Manegold @ CWI.nl | DB Architectures (INS1) | | http://CWI.nl/~manegold/ | Science Park 123 (L321) | | Tel.: +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
------------------------------------------------------------------------------ Cloud Services Checklist: Pricing and Packaging Optimization This white paper is intended to serve as a reference, checklist and point of discussion for anyone considering optimizing the pricing and packaging model of a cloud services business. Read Now! http://www.accelacomm.com/jaw/sfnl/114/51491232/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
Hi Viktor, On Thu, Dec 08, 2011 at 04:51:03PM +0100, Viktor Rosenfeld wrote:
Hi,
Stefan Manegold wrote:
0) For easier readbility of the generated MAL plans, you might what to run your mserver5 in read-only mode, i.e., either set the database read-only via
monetdb set readonly=yes
or start mserver5 directly with extra commandline option "--readonly".
A side note: Is it correct that when I start mserver directly I can only use the demo database and monetdb user? If I try another database and/or another user I get an error message about wrong credentials.
Each mserver5 process serves only one database, the default is "demo". You can tell mserver5 via a --dbname commandline option, which database to serve. See `man mserver5` and `mserver5 --help` for more details. Please note that I mentioned the direct use of mserver5 only in case you are already familiar with that. In general, we strongly recomment to have monetdbd (merovingian) manage your database and respective server processes.
1) In case you have a multi-core/-CPU system, please first disable some of MonetDB inherent plan parallelisation (aka. "mitosis") by setting (in/via mclient) therespective optimizer pipeline as follows:
set optimizer='no_mitosis_pipe';
and then try to run your query, again, and report whether it works better or not.
I observe no difference in runtime or data written to disk when I use the no_mitosis_pipe optimizer. In fact, except for a few differently named BATs in the plan, there is only one difference I could spot:
--- q7-with-mitosis.plan 2011-12-08 13:58:37.181199000 +0100 +++ q7-without-mitosis.plan 2011-12-08 13:58:56.297199000 +0100 - X_178:bat[:oid,:lng] := batcalc.-(X_175,50:lng,true,false); + X_178:bat[:oid,:lng] := batcalc.-(X_175,50:lng);
Ah, I now releaize that you have point predicates in you query/ies, right? In that case, mitosis does not trigger since we want to (build and) exploit hash indices in the base column(s).
2) To analyze / profile the smaller case (that runs to completion), please run it in mclient prefixed with keyword "TRACE", both with the default optimizer pipeline and with mitosis disabled as described above, and report both results (i.e., the detailed profiling traces).
The problem is that these traces are *very* large. I have 3 queries which fill up the disk on the large dataset. 2 of these also cause considerable disk I/O when evaluated on the small dataset (6 MB and 100 MB respectively). The traces of these 2 queries take up 537 MB and 3.7 GB.
The 3rd query runs quickly on the small dataset and according to iostat only writes 92 kB to disk. The trace is 206 kB big, so I could post this if it would help.
I've attached the MAL plans with and without mitosis for the first query which writes 6 MB to disk.
Briefly looking at your plans, I notice that they contain BAT iterators in MAL; that's why the traces blow up. I don't have the time right now to explain what that means (the "experts will know" ;-)). It appears that they are triggered by the "between SYMMETRIC" construct. I filed a bug report / feature request abut this: http://bugs.monetdb.org/show_bug.cgi?id=2945 For now, if you would be able to omit the "SYMMETRIC", I expect at least the traces not to blow up. I cannot tell, whether also the memory usage / IO explosion would be avoided by this ...
I can also upload the traces to the web, but I'm not sure if that is useful because they are so large. Please let me know if it would help. I've also sorted the traces and can provide the statements taking up the most time.
Given the above, there is no need to share these huge traces. In case you manage to avoid the "SYMMETRIC", or once we have fixed it's translation to (efficient) MAL, and then the queries still "explode", the new (the smaller) traces would be helpful to dissect the problem further.
BTW, ticks is microseconds, correct? The output of stethoscope suggests it.
yes.
3) To check which MAL statement causes the excessive (intermediate) result creation, you can use the "stethoscope" tool to "listen" to teh server. Before running your query via mclient, start (in a separate terminal) the stethoscope tool as follows:
Except for the quick query these also produce very large results.
Do not redirect or pipe the output to a files, as this might loose some data due to a yet unresolved bug / design flaw. Rather, once the query finishes (or crashes) cut&paste the stethoscope output from the terminal to a file.
Once stethoscope is running and has connected to the server, run your query as usually (perferably again with both optimizer setting, but in separate sessions), and share the final stethoscope output with us.
For the quick query, cutting and pasting from a terminal and redirecting to a file produced the same number of lines. But if I understand correctly, you are only interested in the last couple of statements in the case of a crash?
Yes, indeed.
Can you also provide a command line to use for stethoscope, i.e. which columns to display and if I could filter for some events? Currently I'm using +trwbs.
For now, just use the default.
I will try to simplify the queries while keeping the disk I/O behavior. I hope that this will result in smaller traces.
Thanks, Viktor
Stefan -- | Stefan.Manegold @ CWI.nl | DB Architectures (INS1) | | http://CWI.nl/~manegold/ | Science Park 123 (L321) | | Tel.: +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
Hi, Stefan Manegold wrote:
Ah, I now releaize that you have point predicates in you query/ies, right? In that case, mitosis does not trigger since we want to (build and) exploit hash indices in the base column(s).
What do you mean by point predicates?
Briefly looking at your plans, I notice that they contain BAT iterators in MAL; that's why the traces blow up. I don't have the time right now to explain what that means (the "experts will know" ;-)). It appears that they are triggered by the "between SYMMETRIC" construct. I filed a bug report / feature request abut this: http://bugs.monetdb.org/show_bug.cgi?id=2945 For now, if you would be able to omit the "SYMMETRIC", I expect at least the traces not to blow up. I cannot tell, whether also the memory usage / IO explosion would be avoided by this ...
I have done a more detailed analysis with the 3 problematic queries. (They are called 7 through 9 in my benchmark set and I kept that numbering for simplicity.) I've uploaded the original SQL queries: http://www.informatik.hu-berlin.de/~rosenfel/monet/q7-count.sql http://www.informatik.hu-berlin.de/~rosenfel/monet/q8-count.sql http://www.informatik.hu-berlin.de/~rosenfel/monet/q9-count.sql All three queries fill up the hard disk on the large dataset. Only queries 7 and 9 use a BETWEEN SYMMETRIC predicate. Query 8 uses an inequality predicate on node.id (see below). The table below shows their evaluation time and disk I/O behavior on the small dataset. | Query | Time (ms) | Data written | |-------+-----------+--------------| | 7 | 962 | 6M | | 8 | 203 | 92k | | 9 | 9646 | 100M | As per Stefan's suggestion, I removed the SYMMETRIC predicate. I also tried 2 other strategies: - rewriting the BETWEEN predicate with an equivalent >= and <= comparison and - substituting the BETWEEN predicate with a single < comparison. (The last strategy is not semantically equivalent, but nevertheless correct for my application for these queries. However, I use the BETWEEN predicate in other places where this option is not available.) The next table shows the evaluation times (in ms) on the small dataset for query 7 and 9 using these 4 strategies: | Query | BETWEEN SYMMETRIC | BETWEEN | >= AND <= | < | |-------+-------------------+---------+-----------+-----| | 7 | 1002 | 342 | 335 | 66 | | 9 | 9908 | 6040 | 5990 | 185 | The next table shows the disk I/O behavior of query 9 using these 4 strategies on the small dataset. | Strategy | Read (MB) | Write (MB) | |-------------------+-----------+------------| | BETWEEN SYMMETRIC | 0 | 100 | | BETWEEN | 0 | 84 | | >= AND <= | 0 | 84 | | < | 0 | 0 | Using BETWEEN and using >= and <= has the same runtime behavior. Indeed, the MAL plans are identical. I have uploaded the MAL plans for query 9 for all 4 strategies. http://www.informatik.hu-berlin.de/~rosenfel/monet/q9-with-mitosis.plan (BETWEEN SYMMETRIC) http://www.informatik.hu-berlin.de/~rosenfel/monet/q9-no-symmetric.plan (BETWEEN) http://www.informatik.hu-berlin.de/~rosenfel/monet/q9-workaround.plan (>= AND <=) http://www.informatik.hu-berlin.de/~rosenfel/monet/q9-no-precedence-restrict... (<) Without the SYMMETRIC predicate the trace do not explode any longer. I have uploaded these for query 9. http://www.informatik.hu-berlin.de/~rosenfel/monet/q9-no-symmetric.trace (BETWEEN) http://www.informatik.hu-berlin.de/~rosenfel/monet/q9-workaround.trace (>= AND <=) http://www.informatik.hu-berlin.de/~rosenfel/monet/q9-no-precedence-restrict... (<) Out of curiosity, I also generated plans for a much simpler query. Starting from the query below, I generated MAL plans for all 4 strategies: SELECT count(*) FROM ( SELECT node1.id AS id1, node2.id AS id2, node1.toplevel_corpus FROM node AS node1, node AS node2 WHERE node1.right_token BETWEEN SYMMETRIC node2.left_token - 1 AND node2.left_token - 50 AND node1.text_ref = node2.text_ref ) AS solutions; http://www.informatik.hu-berlin.de/~rosenfel/monet/between-symmetric.plan (BETWEEN SYMMETRIC) http://www.informatik.hu-berlin.de/~rosenfel/monet/between-no-symmetric.plan (BETWEEN) http://www.informatik.hu-berlin.de/~rosenfel/monet/between-workaround.plan (>= AND <=) http://www.informatik.hu-berlin.de/~rosenfel/monet/no-precedence-restriction... (<) Interestingly, the BETWEEN variant and the >= AND <= variant do not produce the same MAL plan for the simple query as they do for query 9. Without the SYMMETRIC predicate, queries 7 and 9 still fill up my disk when I evaluate them on the large dataset. However, using only the < comparison they finish. The next table shows their runtime and disk I/O behavior for the < strategy. | Query | Time (ms) | Read (MB) | Written (MB) | |-------+-----------+-----------+--------------| | 7 | 990 | 0 | 3 | | 9 | 56903 | 266 | 1621 | There was considerable variation in the evaluation times, which I usually do not observe for other queries. - Query 7 variation: 685 - 1833 - Query 9 variation: 48253 - 64511 There was also some variation in the amount of disk I/O for query 9: - Query 9 reads: 257, 296, 254, 237, 285 - Query 9 writes: 1736, 1545, 1545, 1736, 1545 (However, due to the long runtime, there might have been competing I/O from other processes.) Query 8 does not use a BETWEEN SYMMETRIC predicate, but does a few comparisons of the type nodeX.id <> nodeY.id where nodeX and nodeY are aliases for the node table. This seems to be the reason for filling up the disk. If I remove these predicates, query 8 finishes on the large dataset, although it still generates quite a lot of I/O: | Query | Time (ms) | Read (MB) | Written (MB) | |-------+-----------+-----------+--------------| | 8 | 6443 | 0 | 107 | I've uploaded the traces and plans for query 8 with and without the <> comparisons (the traces are from the small dataset): http://www.informatik.hu-berlin.de/~rosenfel/monet/q8-no-identical-sibling.p... (with <>) http://www.informatik.hu-berlin.de/~rosenfel/monet/q8-no-identical-sibling.t... (with <>) http://www.informatik.hu-berlin.de/~rosenfel/monet/q8-identical-sibling.plan (without <>) http://www.informatik.hu-berlin.de/~rosenfel/monet/q8-identical-sibling.trac... (without <>) I think the next step would be to run stethoscope on the query variations (except BETWEEN SYMMETRIC) that still fill up the disk and report the last few statements before the crash. Would this be helpful? I'm still puzzled though why data is being written to disk in the first place. The query runtime appears to be roughly proportional to the amount written (which isn't surprising). Thanks, Viktor
Hi Viktor, thanks for the detailed report. I'll read it later once I find the time; might be only end of next week, though ... With "point predicates" I refer to predicates the check for equality with a scalar value: column-expression = scalar-value, i.e., predicates that can benefit from a hash index. And my first guess with your Q8 and the inequality (self-?)join(?) is that this creates almost a Cartesian product of the one/two inputs, with large inputs, this can indeed become hugh ... More hopefully soon, Stefan On Fri, Dec 16, 2011 at 04:06:58PM +0100, Viktor Rosenfeld wrote:
Hi,
Stefan Manegold wrote:
Ah, I now releaize that you have point predicates in you query/ies, right? In that case, mitosis does not trigger since we want to (build and) exploit hash indices in the base column(s).
What do you mean by point predicates?
Briefly looking at your plans, I notice that they contain BAT iterators in MAL; that's why the traces blow up. I don't have the time right now to explain what that means (the "experts will know" ;-)). It appears that they are triggered by the "between SYMMETRIC" construct. I filed a bug report / feature request abut this: http://bugs.monetdb.org/show_bug.cgi?id=2945 For now, if you would be able to omit the "SYMMETRIC", I expect at least the traces not to blow up. I cannot tell, whether also the memory usage / IO explosion would be avoided by this ...
I have done a more detailed analysis with the 3 problematic queries. (They are called 7 through 9 in my benchmark set and I kept that numbering for simplicity.)
I've uploaded the original SQL queries:
http://www.informatik.hu-berlin.de/~rosenfel/monet/q7-count.sql http://www.informatik.hu-berlin.de/~rosenfel/monet/q8-count.sql http://www.informatik.hu-berlin.de/~rosenfel/monet/q9-count.sql
All three queries fill up the hard disk on the large dataset. Only queries 7 and 9 use a BETWEEN SYMMETRIC predicate. Query 8 uses an inequality predicate on node.id (see below).
The table below shows their evaluation time and disk I/O behavior on the small dataset.
| Query | Time (ms) | Data written | |-------+-----------+--------------| | 7 | 962 | 6M | | 8 | 203 | 92k | | 9 | 9646 | 100M |
As per Stefan's suggestion, I removed the SYMMETRIC predicate. I also tried 2 other strategies:
- rewriting the BETWEEN predicate with an equivalent >= and <= comparison and - substituting the BETWEEN predicate with a single < comparison.
(The last strategy is not semantically equivalent, but nevertheless correct for my application for these queries. However, I use the BETWEEN predicate in other places where this option is not available.)
The next table shows the evaluation times (in ms) on the small dataset for query 7 and 9 using these 4 strategies:
| Query | BETWEEN SYMMETRIC | BETWEEN | >= AND <= | < | |-------+-------------------+---------+-----------+-----| | 7 | 1002 | 342 | 335 | 66 | | 9 | 9908 | 6040 | 5990 | 185 |
The next table shows the disk I/O behavior of query 9 using these 4 strategies on the small dataset.
| Strategy | Read (MB) | Write (MB) | |-------------------+-----------+------------| | BETWEEN SYMMETRIC | 0 | 100 | | BETWEEN | 0 | 84 | | >= AND <= | 0 | 84 | | < | 0 | 0 |
Using BETWEEN and using >= and <= has the same runtime behavior. Indeed, the MAL plans are identical. I have uploaded the MAL plans for query 9 for all 4 strategies.
http://www.informatik.hu-berlin.de/~rosenfel/monet/q9-with-mitosis.plan (BETWEEN SYMMETRIC) http://www.informatik.hu-berlin.de/~rosenfel/monet/q9-no-symmetric.plan (BETWEEN) http://www.informatik.hu-berlin.de/~rosenfel/monet/q9-workaround.plan (>= AND <=) http://www.informatik.hu-berlin.de/~rosenfel/monet/q9-no-precedence-restrict... (<)
Without the SYMMETRIC predicate the trace do not explode any longer. I have uploaded these for query 9.
http://www.informatik.hu-berlin.de/~rosenfel/monet/q9-no-symmetric.trace (BETWEEN) http://www.informatik.hu-berlin.de/~rosenfel/monet/q9-workaround.trace (>= AND <=) http://www.informatik.hu-berlin.de/~rosenfel/monet/q9-no-precedence-restrict... (<)
Out of curiosity, I also generated plans for a much simpler query. Starting from the query below, I generated MAL plans for all 4 strategies:
SELECT count(*) FROM ( SELECT node1.id AS id1, node2.id AS id2, node1.toplevel_corpus FROM node AS node1, node AS node2 WHERE node1.right_token BETWEEN SYMMETRIC node2.left_token - 1 AND node2.left_token - 50 AND node1.text_ref = node2.text_ref ) AS solutions;
http://www.informatik.hu-berlin.de/~rosenfel/monet/between-symmetric.plan (BETWEEN SYMMETRIC) http://www.informatik.hu-berlin.de/~rosenfel/monet/between-no-symmetric.plan (BETWEEN) http://www.informatik.hu-berlin.de/~rosenfel/monet/between-workaround.plan (>= AND <=) http://www.informatik.hu-berlin.de/~rosenfel/monet/no-precedence-restriction... (<)
Interestingly, the BETWEEN variant and the >= AND <= variant do not produce the same MAL plan for the simple query as they do for query 9.
Without the SYMMETRIC predicate, queries 7 and 9 still fill up my disk when I evaluate them on the large dataset. However, using only the < comparison they finish. The next table shows their runtime and disk I/O behavior for the < strategy.
| Query | Time (ms) | Read (MB) | Written (MB) | |-------+-----------+-----------+--------------| | 7 | 990 | 0 | 3 | | 9 | 56903 | 266 | 1621 |
There was considerable variation in the evaluation times, which I usually do not observe for other queries.
- Query 7 variation: 685 - 1833 - Query 9 variation: 48253 - 64511
There was also some variation in the amount of disk I/O for query 9:
- Query 9 reads: 257, 296, 254, 237, 285 - Query 9 writes: 1736, 1545, 1545, 1736, 1545
(However, due to the long runtime, there might have been competing I/O from other processes.)
Query 8 does not use a BETWEEN SYMMETRIC predicate, but does a few comparisons of the type nodeX.id <> nodeY.id where nodeX and nodeY are aliases for the node table. This seems to be the reason for filling up the disk. If I remove these predicates, query 8 finishes on the large dataset, although it still generates quite a lot of I/O:
| Query | Time (ms) | Read (MB) | Written (MB) | |-------+-----------+-----------+--------------| | 8 | 6443 | 0 | 107 |
I've uploaded the traces and plans for query 8 with and without the <> comparisons (the traces are from the small dataset):
http://www.informatik.hu-berlin.de/~rosenfel/monet/q8-no-identical-sibling.p... (with <>) http://www.informatik.hu-berlin.de/~rosenfel/monet/q8-no-identical-sibling.t... (with <>) http://www.informatik.hu-berlin.de/~rosenfel/monet/q8-identical-sibling.plan (without <>) http://www.informatik.hu-berlin.de/~rosenfel/monet/q8-identical-sibling.trac... (without <>)
I think the next step would be to run stethoscope on the query variations (except BETWEEN SYMMETRIC) that still fill up the disk and report the last few statements before the crash. Would this be helpful?
I'm still puzzled though why data is being written to disk in the first place. The query runtime appears to be roughly proportional to the amount written (which isn't surprising).
Thanks, Viktor
------------------------------------------------------------------------------ Learn Windows Azure Live! Tuesday, Dec 13, 2011 Microsoft is holding a special Learn Windows Azure training event for developers. It will provide a great way to learn Windows Azure and what it provides. You can attend the event by watching it streamed LIVE online. Learn more at http://p.sf.net/sfu/ms-windowsazure _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- | Stefan.Manegold @ CWI.nl | DB Architectures (INS1) | | http://CWI.nl/~manegold/ | Science Park 123 (L321) | | Tel.: +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
Hi, Stefan Manegold wrote:
Hi Viktor,
And my first guess with your Q8 and the inequality (self-?)join(?) is that this creates almost a Cartesian product of the one/two inputs, with large inputs, this can indeed become hugh ...
The self-join on node.id should not produce a big result, because it is restricted by an equality join on rank.parent. I.e. the pattern looks like this: rank1.parent = rank2.parent AND node1.id <> node2.id AND rank1.node_ref = node1.id AND rank2.node_ref = node2.id Cheers, Viktor
Hi Stephan, has there been any progress on this problem? Or is there something I can do to move it along? Also, I still don't fully understand why the data is written to disk at all especially since it would easily fit into main memory. Is it because BATs are memory-mapped to files? Cheers, Viktor Stefan Manegold wrote:
Hi Viktor,
thanks for the detailed report. I'll read it later once I find the time; might be only end of next week, though ...
With "point predicates" I refer to predicates the check for equality with a scalar value: column-expression = scalar-value, i.e., predicates that can benefit from a hash index.
And my first guess with your Q8 and the inequality (self-?)join(?) is that this creates almost a Cartesian product of the one/two inputs, with large inputs, this can indeed become hugh ...
More hopefully soon, Stefan
On Fri, Dec 16, 2011 at 04:06:58PM +0100, Viktor Rosenfeld wrote:
Hi,
Stefan Manegold wrote:
Ah, I now releaize that you have point predicates in you query/ies, right? In that case, mitosis does not trigger since we want to (build and) exploit hash indices in the base column(s).
What do you mean by point predicates?
Briefly looking at your plans, I notice that they contain BAT iterators in MAL; that's why the traces blow up. I don't have the time right now to explain what that means (the "experts will know" ;-)). It appears that they are triggered by the "between SYMMETRIC" construct. I filed a bug report / feature request abut this: http://bugs.monetdb.org/show_bug.cgi?id=2945 For now, if you would be able to omit the "SYMMETRIC", I expect at least the traces not to blow up. I cannot tell, whether also the memory usage / IO explosion would be avoided by this ...
I have done a more detailed analysis with the 3 problematic queries. (They are called 7 through 9 in my benchmark set and I kept that numbering for simplicity.)
I've uploaded the original SQL queries:
http://www.informatik.hu-berlin.de/~rosenfel/monet/q7-count.sql http://www.informatik.hu-berlin.de/~rosenfel/monet/q8-count.sql http://www.informatik.hu-berlin.de/~rosenfel/monet/q9-count.sql
All three queries fill up the hard disk on the large dataset. Only queries 7 and 9 use a BETWEEN SYMMETRIC predicate. Query 8 uses an inequality predicate on node.id (see below).
The table below shows their evaluation time and disk I/O behavior on the small dataset.
| Query | Time (ms) | Data written | |-------+-----------+--------------| | 7 | 962 | 6M | | 8 | 203 | 92k | | 9 | 9646 | 100M |
As per Stefan's suggestion, I removed the SYMMETRIC predicate. I also tried 2 other strategies:
- rewriting the BETWEEN predicate with an equivalent >= and <= comparison and - substituting the BETWEEN predicate with a single < comparison.
(The last strategy is not semantically equivalent, but nevertheless correct for my application for these queries. However, I use the BETWEEN predicate in other places where this option is not available.)
The next table shows the evaluation times (in ms) on the small dataset for query 7 and 9 using these 4 strategies:
| Query | BETWEEN SYMMETRIC | BETWEEN | >= AND <= | < | |-------+-------------------+---------+-----------+-----| | 7 | 1002 | 342 | 335 | 66 | | 9 | 9908 | 6040 | 5990 | 185 |
The next table shows the disk I/O behavior of query 9 using these 4 strategies on the small dataset.
| Strategy | Read (MB) | Write (MB) | |-------------------+-----------+------------| | BETWEEN SYMMETRIC | 0 | 100 | | BETWEEN | 0 | 84 | | >= AND <= | 0 | 84 | | < | 0 | 0 |
Using BETWEEN and using >= and <= has the same runtime behavior. Indeed, the MAL plans are identical. I have uploaded the MAL plans for query 9 for all 4 strategies.
http://www.informatik.hu-berlin.de/~rosenfel/monet/q9-with-mitosis.plan (BETWEEN SYMMETRIC) http://www.informatik.hu-berlin.de/~rosenfel/monet/q9-no-symmetric.plan (BETWEEN) http://www.informatik.hu-berlin.de/~rosenfel/monet/q9-workaround.plan (>= AND <=) http://www.informatik.hu-berlin.de/~rosenfel/monet/q9-no-precedence-restrict... (<)
Without the SYMMETRIC predicate the trace do not explode any longer. I have uploaded these for query 9.
http://www.informatik.hu-berlin.de/~rosenfel/monet/q9-no-symmetric.trace (BETWEEN) http://www.informatik.hu-berlin.de/~rosenfel/monet/q9-workaround.trace (>= AND <=) http://www.informatik.hu-berlin.de/~rosenfel/monet/q9-no-precedence-restrict... (<)
Out of curiosity, I also generated plans for a much simpler query. Starting from the query below, I generated MAL plans for all 4 strategies:
SELECT count(*) FROM ( SELECT node1.id AS id1, node2.id AS id2, node1.toplevel_corpus FROM node AS node1, node AS node2 WHERE node1.right_token BETWEEN SYMMETRIC node2.left_token - 1 AND node2.left_token - 50 AND node1.text_ref = node2.text_ref ) AS solutions;
http://www.informatik.hu-berlin.de/~rosenfel/monet/between-symmetric.plan (BETWEEN SYMMETRIC) http://www.informatik.hu-berlin.de/~rosenfel/monet/between-no-symmetric.plan (BETWEEN) http://www.informatik.hu-berlin.de/~rosenfel/monet/between-workaround.plan (>= AND <=) http://www.informatik.hu-berlin.de/~rosenfel/monet/no-precedence-restriction... (<)
Interestingly, the BETWEEN variant and the >= AND <= variant do not produce the same MAL plan for the simple query as they do for query 9.
Without the SYMMETRIC predicate, queries 7 and 9 still fill up my disk when I evaluate them on the large dataset. However, using only the < comparison they finish. The next table shows their runtime and disk I/O behavior for the < strategy.
| Query | Time (ms) | Read (MB) | Written (MB) | |-------+-----------+-----------+--------------| | 7 | 990 | 0 | 3 | | 9 | 56903 | 266 | 1621 |
There was considerable variation in the evaluation times, which I usually do not observe for other queries.
- Query 7 variation: 685 - 1833 - Query 9 variation: 48253 - 64511
There was also some variation in the amount of disk I/O for query 9:
- Query 9 reads: 257, 296, 254, 237, 285 - Query 9 writes: 1736, 1545, 1545, 1736, 1545
(However, due to the long runtime, there might have been competing I/O from other processes.)
Query 8 does not use a BETWEEN SYMMETRIC predicate, but does a few comparisons of the type nodeX.id <> nodeY.id where nodeX and nodeY are aliases for the node table. This seems to be the reason for filling up the disk. If I remove these predicates, query 8 finishes on the large dataset, although it still generates quite a lot of I/O:
| Query | Time (ms) | Read (MB) | Written (MB) | |-------+-----------+-----------+--------------| | 8 | 6443 | 0 | 107 |
I've uploaded the traces and plans for query 8 with and without the <> comparisons (the traces are from the small dataset):
http://www.informatik.hu-berlin.de/~rosenfel/monet/q8-no-identical-sibling.p... (with <>) http://www.informatik.hu-berlin.de/~rosenfel/monet/q8-no-identical-sibling.t... (with <>) http://www.informatik.hu-berlin.de/~rosenfel/monet/q8-identical-sibling.plan (without <>) http://www.informatik.hu-berlin.de/~rosenfel/monet/q8-identical-sibling.trac... (without <>)
I think the next step would be to run stethoscope on the query variations (except BETWEEN SYMMETRIC) that still fill up the disk and report the last few statements before the crash. Would this be helpful?
I'm still puzzled though why data is being written to disk in the first place. The query runtime appears to be roughly proportional to the amount written (which isn't surprising).
Thanks, Viktor
------------------------------------------------------------------------------ Learn Windows Azure Live! Tuesday, Dec 13, 2011 Microsoft is holding a special Learn Windows Azure training event for developers. It will provide a great way to learn Windows Azure and what it provides. You can attend the event by watching it streamed LIVE online. Learn more at http://p.sf.net/sfu/ms-windowsazure _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- | Stefan.Manegold @ CWI.nl | DB Architectures (INS1) | | http://CWI.nl/~manegold/ | Science Park 123 (L321) | | Tel.: +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
------------------------------------------------------------------------------ Learn Windows Azure Live! Tuesday, Dec 13, 2011 Microsoft is holding a special Learn Windows Azure training event for developers. It will provide a great way to learn Windows Azure and what it provides. You can attend the event by watching it streamed LIVE online. Learn more at http://p.sf.net/sfu/ms-windowsazure _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
Hi Viktor, unfortunately, we did not manage to dive deeper into these problems, yet. I hope to find some time by end of next week --- too much travelling and other obligations in between. Wrt. to the I/O, I did locate and fix some issues in the Dec2011 branch; cf., http://dev.monetdb.org/hg/MonetDB/rev/dc8066d67faa http://dev.monetdb.org/hg/MonetDB/rev/bb56535aa6df http://dev.monetdb.org/hg/MonetDB/rev/d30146b4d610 http://dev.monetdb.org/hg/MonetDB/rev/cf2ff66d08fa It would be great, if you could test your scenario(s) with the latest Dec2011 branch, either from HG or from http://monetdb.cwi.nl/testweb/web/42751:3ab84718f7e1/ or http://monetdb.cwi.nl/testweb/web/MonetDB-Dec2011-latest.tar.bz2 and report. (Dec2011 has not yet been released, yet, but we hope to do so still this month.) Please note that efficient handling of "BETWEEN SYMMETRIC" has not been implemented, yet; cf., http://bugs.monetdb.org/show_bug.cgi?id=2945 regards, Stefan On Wed, Jan 11, 2012 at 11:19:25AM +0100, Viktor Rosenfeld wrote:
Hi Stephan,
has there been any progress on this problem? Or is there something I can do to move it along?
Also, I still don't fully understand why the data is written to disk at all especially since it would easily fit into main memory. Is it because BATs are memory-mapped to files?
Cheers, Viktor
Stefan Manegold wrote:
Hi Viktor,
thanks for the detailed report. I'll read it later once I find the time; might be only end of next week, though ...
With "point predicates" I refer to predicates the check for equality with a scalar value: column-expression = scalar-value, i.e., predicates that can benefit from a hash index.
And my first guess with your Q8 and the inequality (self-?)join(?) is that this creates almost a Cartesian product of the one/two inputs, with large inputs, this can indeed become hugh ...
More hopefully soon, Stefan
On Fri, Dec 16, 2011 at 04:06:58PM +0100, Viktor Rosenfeld wrote:
Hi,
Stefan Manegold wrote:
Ah, I now releaize that you have point predicates in you query/ies, right? In that case, mitosis does not trigger since we want to (build and) exploit hash indices in the base column(s).
What do you mean by point predicates?
Briefly looking at your plans, I notice that they contain BAT iterators in MAL; that's why the traces blow up. I don't have the time right now to explain what that means (the "experts will know" ;-)). It appears that they are triggered by the "between SYMMETRIC" construct. I filed a bug report / feature request abut this: http://bugs.monetdb.org/show_bug.cgi?id=2945 For now, if you would be able to omit the "SYMMETRIC", I expect at least the traces not to blow up. I cannot tell, whether also the memory usage / IO explosion would be avoided by this ...
I have done a more detailed analysis with the 3 problematic queries. (They are called 7 through 9 in my benchmark set and I kept that numbering for simplicity.)
I've uploaded the original SQL queries:
http://www.informatik.hu-berlin.de/~rosenfel/monet/q7-count.sql http://www.informatik.hu-berlin.de/~rosenfel/monet/q8-count.sql http://www.informatik.hu-berlin.de/~rosenfel/monet/q9-count.sql
All three queries fill up the hard disk on the large dataset. Only queries 7 and 9 use a BETWEEN SYMMETRIC predicate. Query 8 uses an inequality predicate on node.id (see below).
The table below shows their evaluation time and disk I/O behavior on the small dataset.
| Query | Time (ms) | Data written | |-------+-----------+--------------| | 7 | 962 | 6M | | 8 | 203 | 92k | | 9 | 9646 | 100M |
As per Stefan's suggestion, I removed the SYMMETRIC predicate. I also tried 2 other strategies:
- rewriting the BETWEEN predicate with an equivalent >= and <= comparison and - substituting the BETWEEN predicate with a single < comparison.
(The last strategy is not semantically equivalent, but nevertheless correct for my application for these queries. However, I use the BETWEEN predicate in other places where this option is not available.)
The next table shows the evaluation times (in ms) on the small dataset for query 7 and 9 using these 4 strategies:
| Query | BETWEEN SYMMETRIC | BETWEEN | >= AND <= | < | |-------+-------------------+---------+-----------+-----| | 7 | 1002 | 342 | 335 | 66 | | 9 | 9908 | 6040 | 5990 | 185 |
The next table shows the disk I/O behavior of query 9 using these 4 strategies on the small dataset.
| Strategy | Read (MB) | Write (MB) | |-------------------+-----------+------------| | BETWEEN SYMMETRIC | 0 | 100 | | BETWEEN | 0 | 84 | | >= AND <= | 0 | 84 | | < | 0 | 0 |
Using BETWEEN and using >= and <= has the same runtime behavior. Indeed, the MAL plans are identical. I have uploaded the MAL plans for query 9 for all 4 strategies.
http://www.informatik.hu-berlin.de/~rosenfel/monet/q9-with-mitosis.plan (BETWEEN SYMMETRIC) http://www.informatik.hu-berlin.de/~rosenfel/monet/q9-no-symmetric.plan (BETWEEN) http://www.informatik.hu-berlin.de/~rosenfel/monet/q9-workaround.plan (>= AND <=) http://www.informatik.hu-berlin.de/~rosenfel/monet/q9-no-precedence-restrict... (<)
Without the SYMMETRIC predicate the trace do not explode any longer. I have uploaded these for query 9.
http://www.informatik.hu-berlin.de/~rosenfel/monet/q9-no-symmetric.trace (BETWEEN) http://www.informatik.hu-berlin.de/~rosenfel/monet/q9-workaround.trace (>= AND <=) http://www.informatik.hu-berlin.de/~rosenfel/monet/q9-no-precedence-restrict... (<)
Out of curiosity, I also generated plans for a much simpler query. Starting from the query below, I generated MAL plans for all 4 strategies:
SELECT count(*) FROM ( SELECT node1.id AS id1, node2.id AS id2, node1.toplevel_corpus FROM node AS node1, node AS node2 WHERE node1.right_token BETWEEN SYMMETRIC node2.left_token - 1 AND node2.left_token - 50 AND node1.text_ref = node2.text_ref ) AS solutions;
http://www.informatik.hu-berlin.de/~rosenfel/monet/between-symmetric.plan (BETWEEN SYMMETRIC) http://www.informatik.hu-berlin.de/~rosenfel/monet/between-no-symmetric.plan (BETWEEN) http://www.informatik.hu-berlin.de/~rosenfel/monet/between-workaround.plan (>= AND <=) http://www.informatik.hu-berlin.de/~rosenfel/monet/no-precedence-restriction... (<)
Interestingly, the BETWEEN variant and the >= AND <= variant do not produce the same MAL plan for the simple query as they do for query 9.
Without the SYMMETRIC predicate, queries 7 and 9 still fill up my disk when I evaluate them on the large dataset. However, using only the < comparison they finish. The next table shows their runtime and disk I/O behavior for the < strategy.
| Query | Time (ms) | Read (MB) | Written (MB) | |-------+-----------+-----------+--------------| | 7 | 990 | 0 | 3 | | 9 | 56903 | 266 | 1621 |
There was considerable variation in the evaluation times, which I usually do not observe for other queries.
- Query 7 variation: 685 - 1833 - Query 9 variation: 48253 - 64511
There was also some variation in the amount of disk I/O for query 9:
- Query 9 reads: 257, 296, 254, 237, 285 - Query 9 writes: 1736, 1545, 1545, 1736, 1545
(However, due to the long runtime, there might have been competing I/O from other processes.)
Query 8 does not use a BETWEEN SYMMETRIC predicate, but does a few comparisons of the type nodeX.id <> nodeY.id where nodeX and nodeY are aliases for the node table. This seems to be the reason for filling up the disk. If I remove these predicates, query 8 finishes on the large dataset, although it still generates quite a lot of I/O:
| Query | Time (ms) | Read (MB) | Written (MB) | |-------+-----------+-----------+--------------| | 8 | 6443 | 0 | 107 |
I've uploaded the traces and plans for query 8 with and without the <> comparisons (the traces are from the small dataset):
http://www.informatik.hu-berlin.de/~rosenfel/monet/q8-no-identical-sibling.p... (with <>) http://www.informatik.hu-berlin.de/~rosenfel/monet/q8-no-identical-sibling.t... (with <>) http://www.informatik.hu-berlin.de/~rosenfel/monet/q8-identical-sibling.plan (without <>) http://www.informatik.hu-berlin.de/~rosenfel/monet/q8-identical-sibling.trac... (without <>)
I think the next step would be to run stethoscope on the query variations (except BETWEEN SYMMETRIC) that still fill up the disk and report the last few statements before the crash. Would this be helpful?
I'm still puzzled though why data is being written to disk in the first place. The query runtime appears to be roughly proportional to the amount written (which isn't surprising).
Thanks, Viktor
------------------------------------------------------------------------------ Learn Windows Azure Live! Tuesday, Dec 13, 2011 Microsoft is holding a special Learn Windows Azure training event for developers. It will provide a great way to learn Windows Azure and what it provides. You can attend the event by watching it streamed LIVE online. Learn more at http://p.sf.net/sfu/ms-windowsazure _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- | Stefan.Manegold @ CWI.nl | DB Architectures (INS1) | | http://CWI.nl/~manegold/ | Science Park 123 (L321) | | Tel.: +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
------------------------------------------------------------------------------ Learn Windows Azure Live! Tuesday, Dec 13, 2011 Microsoft is holding a special Learn Windows Azure training event for developers. It will provide a great way to learn Windows Azure and what it provides. You can attend the event by watching it streamed LIVE online. Learn more at http://p.sf.net/sfu/ms-windowsazure _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- | Stefan.Manegold @ CWI.nl | DB Architectures (INS1) | | http://CWI.nl/~manegold/ | Science Park 123 (L321) | | Tel.: +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
Hi Stefan, the latest Dec2011 branch is a big improvement. None of my test queries generate I/O anymore, except for one which writes 3.7 GB to disk. With the Aug2011 release the other test queries wrote between 10 and 520 MB. My laptop has 4 GB RAM so from your patches I think the cut-off for mmap is 667 MB. On my laptop most queries are sped up by factor 5 to 10. On a big-iron server the speedup is less pronounced, around 1.2 to 1.5. Queries which are dominated by regular expression precidates (pcre.match function) are not affected so much. On my laptop the speedup is between 1.2 and 1.8 and on the server there is no noticable difference. Also, it appears that there were a few other changes in the Dec2011 release which help with some queries. I tested the queries without using BETWEEN or BETWEEN SYMMETRIC predicates for precedence operations and without inequality joins for sibling operations (i.e. I used the workarounds described in one of previous mails.) Thanks, Viktor Stefan Manegold wrote:
Hi Viktor,
unfortunately, we did not manage to dive deeper into these problems, yet. I hope to find some time by end of next week --- too much travelling and other obligations in between.
Wrt. to the I/O, I did locate and fix some issues in the Dec2011 branch; cf., http://dev.monetdb.org/hg/MonetDB/rev/dc8066d67faa http://dev.monetdb.org/hg/MonetDB/rev/bb56535aa6df http://dev.monetdb.org/hg/MonetDB/rev/d30146b4d610 http://dev.monetdb.org/hg/MonetDB/rev/cf2ff66d08fa
It would be great, if you could test your scenario(s) with the latest Dec2011 branch, either from HG or from http://monetdb.cwi.nl/testweb/web/42751:3ab84718f7e1/ or http://monetdb.cwi.nl/testweb/web/MonetDB-Dec2011-latest.tar.bz2 and report.
(Dec2011 has not yet been released, yet, but we hope to do so still this month.)
Please note that efficient handling of "BETWEEN SYMMETRIC" has not been implemented, yet; cf., http://bugs.monetdb.org/show_bug.cgi?id=2945
regards, Stefan
On Wed, Jan 11, 2012 at 11:19:25AM +0100, Viktor Rosenfeld wrote:
Hi Stephan,
has there been any progress on this problem? Or is there something I can do to move it along?
Also, I still don't fully understand why the data is written to disk at all especially since it would easily fit into main memory. Is it because BATs are memory-mapped to files?
Cheers, Viktor
Stefan Manegold wrote:
Hi Viktor,
thanks for the detailed report. I'll read it later once I find the time; might be only end of next week, though ...
With "point predicates" I refer to predicates the check for equality with a scalar value: column-expression = scalar-value, i.e., predicates that can benefit from a hash index.
And my first guess with your Q8 and the inequality (self-?)join(?) is that this creates almost a Cartesian product of the one/two inputs, with large inputs, this can indeed become hugh ...
More hopefully soon, Stefan
On Fri, Dec 16, 2011 at 04:06:58PM +0100, Viktor Rosenfeld wrote:
Hi,
Stefan Manegold wrote:
Ah, I now releaize that you have point predicates in you query/ies, right? In that case, mitosis does not trigger since we want to (build and) exploit hash indices in the base column(s).
What do you mean by point predicates?
Briefly looking at your plans, I notice that they contain BAT iterators in MAL; that's why the traces blow up. I don't have the time right now to explain what that means (the "experts will know" ;-)). It appears that they are triggered by the "between SYMMETRIC" construct. I filed a bug report / feature request abut this: http://bugs.monetdb.org/show_bug.cgi?id=2945 For now, if you would be able to omit the "SYMMETRIC", I expect at least the traces not to blow up. I cannot tell, whether also the memory usage / IO explosion would be avoided by this ...
I have done a more detailed analysis with the 3 problematic queries. (They are called 7 through 9 in my benchmark set and I kept that numbering for simplicity.)
I've uploaded the original SQL queries:
http://www.informatik.hu-berlin.de/~rosenfel/monet/q7-count.sql http://www.informatik.hu-berlin.de/~rosenfel/monet/q8-count.sql http://www.informatik.hu-berlin.de/~rosenfel/monet/q9-count.sql
All three queries fill up the hard disk on the large dataset. Only queries 7 and 9 use a BETWEEN SYMMETRIC predicate. Query 8 uses an inequality predicate on node.id (see below).
The table below shows their evaluation time and disk I/O behavior on the small dataset.
| Query | Time (ms) | Data written | |-------+-----------+--------------| | 7 | 962 | 6M | | 8 | 203 | 92k | | 9 | 9646 | 100M |
As per Stefan's suggestion, I removed the SYMMETRIC predicate. I also tried 2 other strategies:
- rewriting the BETWEEN predicate with an equivalent >= and <= comparison and - substituting the BETWEEN predicate with a single < comparison.
(The last strategy is not semantically equivalent, but nevertheless correct for my application for these queries. However, I use the BETWEEN predicate in other places where this option is not available.)
The next table shows the evaluation times (in ms) on the small dataset for query 7 and 9 using these 4 strategies:
| Query | BETWEEN SYMMETRIC | BETWEEN | >= AND <= | < | |-------+-------------------+---------+-----------+-----| | 7 | 1002 | 342 | 335 | 66 | | 9 | 9908 | 6040 | 5990 | 185 |
The next table shows the disk I/O behavior of query 9 using these 4 strategies on the small dataset.
| Strategy | Read (MB) | Write (MB) | |-------------------+-----------+------------| | BETWEEN SYMMETRIC | 0 | 100 | | BETWEEN | 0 | 84 | | >= AND <= | 0 | 84 | | < | 0 | 0 |
Using BETWEEN and using >= and <= has the same runtime behavior. Indeed, the MAL plans are identical. I have uploaded the MAL plans for query 9 for all 4 strategies.
http://www.informatik.hu-berlin.de/~rosenfel/monet/q9-with-mitosis.plan (BETWEEN SYMMETRIC) http://www.informatik.hu-berlin.de/~rosenfel/monet/q9-no-symmetric.plan (BETWEEN) http://www.informatik.hu-berlin.de/~rosenfel/monet/q9-workaround.plan (>= AND <=) http://www.informatik.hu-berlin.de/~rosenfel/monet/q9-no-precedence-restrict... (<)
Without the SYMMETRIC predicate the trace do not explode any longer. I have uploaded these for query 9.
http://www.informatik.hu-berlin.de/~rosenfel/monet/q9-no-symmetric.trace (BETWEEN) http://www.informatik.hu-berlin.de/~rosenfel/monet/q9-workaround.trace (>= AND <=) http://www.informatik.hu-berlin.de/~rosenfel/monet/q9-no-precedence-restrict... (<)
Out of curiosity, I also generated plans for a much simpler query. Starting from the query below, I generated MAL plans for all 4 strategies:
SELECT count(*) FROM ( SELECT node1.id AS id1, node2.id AS id2, node1.toplevel_corpus FROM node AS node1, node AS node2 WHERE node1.right_token BETWEEN SYMMETRIC node2.left_token - 1 AND node2.left_token - 50 AND node1.text_ref = node2.text_ref ) AS solutions;
http://www.informatik.hu-berlin.de/~rosenfel/monet/between-symmetric.plan (BETWEEN SYMMETRIC) http://www.informatik.hu-berlin.de/~rosenfel/monet/between-no-symmetric.plan (BETWEEN) http://www.informatik.hu-berlin.de/~rosenfel/monet/between-workaround.plan (>= AND <=) http://www.informatik.hu-berlin.de/~rosenfel/monet/no-precedence-restriction... (<)
Interestingly, the BETWEEN variant and the >= AND <= variant do not produce the same MAL plan for the simple query as they do for query 9.
Without the SYMMETRIC predicate, queries 7 and 9 still fill up my disk when I evaluate them on the large dataset. However, using only the < comparison they finish. The next table shows their runtime and disk I/O behavior for the < strategy.
| Query | Time (ms) | Read (MB) | Written (MB) | |-------+-----------+-----------+--------------| | 7 | 990 | 0 | 3 | | 9 | 56903 | 266 | 1621 |
There was considerable variation in the evaluation times, which I usually do not observe for other queries.
- Query 7 variation: 685 - 1833 - Query 9 variation: 48253 - 64511
There was also some variation in the amount of disk I/O for query 9:
- Query 9 reads: 257, 296, 254, 237, 285 - Query 9 writes: 1736, 1545, 1545, 1736, 1545
(However, due to the long runtime, there might have been competing I/O from other processes.)
Query 8 does not use a BETWEEN SYMMETRIC predicate, but does a few comparisons of the type nodeX.id <> nodeY.id where nodeX and nodeY are aliases for the node table. This seems to be the reason for filling up the disk. If I remove these predicates, query 8 finishes on the large dataset, although it still generates quite a lot of I/O:
| Query | Time (ms) | Read (MB) | Written (MB) | |-------+-----------+-----------+--------------| | 8 | 6443 | 0 | 107 |
I've uploaded the traces and plans for query 8 with and without the <> comparisons (the traces are from the small dataset):
http://www.informatik.hu-berlin.de/~rosenfel/monet/q8-no-identical-sibling.p... (with <>) http://www.informatik.hu-berlin.de/~rosenfel/monet/q8-no-identical-sibling.t... (with <>) http://www.informatik.hu-berlin.de/~rosenfel/monet/q8-identical-sibling.plan (without <>) http://www.informatik.hu-berlin.de/~rosenfel/monet/q8-identical-sibling.trac... (without <>)
I think the next step would be to run stethoscope on the query variations (except BETWEEN SYMMETRIC) that still fill up the disk and report the last few statements before the crash. Would this be helpful?
I'm still puzzled though why data is being written to disk in the first place. The query runtime appears to be roughly proportional to the amount written (which isn't surprising).
Thanks, Viktor
------------------------------------------------------------------------------ Learn Windows Azure Live! Tuesday, Dec 13, 2011 Microsoft is holding a special Learn Windows Azure training event for developers. It will provide a great way to learn Windows Azure and what it provides. You can attend the event by watching it streamed LIVE online. Learn more at http://p.sf.net/sfu/ms-windowsazure _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- | Stefan.Manegold @ CWI.nl | DB Architectures (INS1) | | http://CWI.nl/~manegold/ | Science Park 123 (L321) | | Tel.: +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
------------------------------------------------------------------------------ Learn Windows Azure Live! Tuesday, Dec 13, 2011 Microsoft is holding a special Learn Windows Azure training event for developers. It will provide a great way to learn Windows Azure and what it provides. You can attend the event by watching it streamed LIVE online. Learn more at http://p.sf.net/sfu/ms-windowsazure _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- | Stefan.Manegold @ CWI.nl | DB Architectures (INS1) | | http://CWI.nl/~manegold/ | Science Park 123 (L321) | | Tel.: +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
------------------------------------------------------------------------------ Ridiculously easy VDI. With Citrix VDI-in-a-Box, you don't need a complex infrastructure or vast IT resources to deliver seamless, secure access to virtual desktops. With this all-in-one solution, easily deploy virtual desktops for less than the cost of PCs and save 60% on VDI infrastructure costs. Try it free! http://p.sf.net/sfu/Citrix-VDIinabox _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
Hi, I forgot to attach the query to my last mail. This one writes 6 MB to disk. Cheers, Viktor Stefan Manegold wrote:
Hi Viktor,
thanks for reporting, and sorry for the inconveniences!
Here are some steps you could perform and report about to help us with analysing and understanding the problem:
0) For easier readbility of the generated MAL plans, you might what to run your mserver5 in read-only mode, i.e., either set the database read-only via
monetdb set readonly=yes
or start mserver5 directly with extra commandline option "--readonly".
1) In case you have a multi-core/-CPU system, please first disable some of MonetDB inherent plan parallelisation (aka. "mitosis") by setting (in/via mclient) therespective optimizer pipeline as follows:
set optimizer='no_mitosis_pipe';
and then try to run your query, again, and report whether it works better or not.
In some cases, in particular with complex queries like your's MonetDB's plan parallelisation can be sub-optimal and, e.g., result in unexpectely/unnecessarily large intermediate results.
You can check the details of active optimizer pipeline via
select optimizer;
You can reset the optimizer pipeline to the default via
set optimizer='default_pipe';
You can check the generated MAL plans by prefixing your query in mclient with keyword "EXPLAIN".
2) To analyze / profile the smaller case (that runs to completion), please run it in mclient prefixed with keyword "TRACE", both with the default optimizer pipeline and with mitosis disabled as described above, and report both results (i.e., the detailed profiling traces).
3) To check which MAL statement causes the excessive (intermediate) result creation, you can use the "stethoscope" tool to "listen" to teh server. Before running your query via mclient, start (in a separate terminal) the stethoscope tool as follows:
stethoscope -u <username> -P <password> -D
Do not redirect or pipe the output to a files, as this might loose some data due to a yet unresolved bug / design flaw. Rather, once the query finishes (or crashes) cut&paste the stethoscope output from the terminal to a file.
Once stethoscope is running and has connected to the server, run your query as usually (perferably again with both optimizer setting, but in separate sessions), and share the final stethoscope output with us.
Kind regards, Stefan
On Tue, Dec 06, 2011 at 04:53:50PM +0100, Viktor Rosenfeld wrote:
Hi,
I'm observing disk I/O when evaluating some queries which I am unable to explain. I have two datasets, one relatively small with the BATs taking up 1.6 MB on disk and one fairly large with 283 MB on disk.
According to iostat, when I evaluate the attached query on the small dataset, 100 MB are written to disk, but none are read. This is on a Linux machine with 4 GB RAM running the Aug2011-SP3 release.
I don't understand why data is written on disk, that
(a) is never read again and (b) would easily fit into main memory.
I am unable to evaluate the queries on the large dataset because MonetDB writes so much data that the disk fills up after a few minutes (more than 22GB).
Please let me know if I can provide more information.
Cheers, Viktor
SELECT count(*) FROM ( SELECT DISTINCT node1.id AS id1, node2.id AS id2, node3.id AS id3, node4.id AS id4, node5.id AS id5, node6.id AS id6, node1.toplevel_corpus FROM node AS node1, rank AS rank1, component AS component1, node_annotation AS node_annotation1, node AS node2, rank AS rank2, component AS component2, edge_annotation AS edge_annotation2, node AS node3, rank AS rank3, component AS component3, node_annotation AS node_annotation3, node AS node4, rank AS rank4, component AS component4, edge_annotation AS edge_annotation4, node AS node5, rank AS rank5, component AS component5, node_annotation AS node_annotation5, node AS node6, rank AS rank6, component AS component6, node_annotation AS node_annotation6 WHERE component2.name IS NULL AND component2.type = 'd' AND component3.name IS NULL AND component3.type = 'd' AND component4.name IS NULL AND component4.type = 'd' AND edge_annotation2.name = 'func' AND edge_annotation2.rank_ref = rank2.pre AND edge_annotation2.value = 'OA' AND edge_annotation4.name = 'func' AND edge_annotation4.rank_ref = rank4.pre AND edge_annotation4.value = 'SB' AND node1.corpus_ref = node2.corpus_ref AND node1.corpus_ref = node3.corpus_ref AND node1.corpus_ref = node4.corpus_ref AND node1.corpus_ref = node5.corpus_ref AND node1.corpus_ref = node6.corpus_ref AND node1.id = node5.id AND node1.id = node6.id AND node2.corpus_ref = node4.corpus_ref AND node2.corpus_ref = node5.corpus_ref AND node2.corpus_ref = node6.corpus_ref AND node2.right_token BETWEEN SYMMETRIC node3.left_token - 1 AND node3.left_token - 50 AND node2.text_ref = node3.text_ref AND node3.corpus_ref = node5.corpus_ref AND node3.corpus_ref = node6.corpus_ref AND node3.right_token BETWEEN SYMMETRIC node4.left_token - 1 AND node4.left_token - 50 AND node3.text_ref = node4.text_ref AND node4.corpus_ref = node5.corpus_ref AND node4.corpus_ref = node6.corpus_ref AND node5.corpus_ref = node6.corpus_ref AND node_annotation1.name = 'cat' AND node_annotation1.node_ref = node1.id AND node_annotation1.value = 'S' AND node_annotation3.name = 'pos' AND node_annotation3.node_ref = node3.id AND node_annotation3.value = 'VVFIN' AND node_annotation5.name = 'cat' AND node_annotation5.node_ref = node5.id AND node_annotation5.value = 'S' AND node_annotation6.name = 'cat' AND node_annotation6.node_ref = node6.id AND node_annotation6.value = 'S' AND rank1.component_ref = component1.id AND rank1.node_ref = node1.id AND rank1.pre = rank4.parent AND rank2.component_ref = component2.id AND rank2.node_ref = node2.id AND rank3.component_ref = component3.id AND rank3.node_ref = node3.id AND rank4.component_ref = component4.id AND rank4.node_ref = node4.id AND rank5.component_ref = component5.id AND rank5.node_ref = node5.id AND rank5.pre = rank2.parent AND rank6.component_ref = component6.id AND rank6.node_ref = node6.id AND rank6.pre = rank3.parent ) AS solutions;
------------------------------------------------------------------------------ Cloud Services Checklist: Pricing and Packaging Optimization This white paper is intended to serve as a reference, checklist and point of discussion for anyone considering optimizing the pricing and packaging model of a cloud services business. Read Now! http://www.accelacomm.com/jaw/sfnl/114/51491232/
_______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- | Stefan.Manegold @ CWI.nl | DB Architectures (INS1) | | http://CWI.nl/~manegold/ | Science Park 123 (L321) | | Tel.: +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
------------------------------------------------------------------------------ Cloud Services Checklist: Pricing and Packaging Optimization This white paper is intended to serve as a reference, checklist and point of discussion for anyone considering optimizing the pricing and packaging model of a cloud services business. Read Now! http://www.accelacomm.com/jaw/sfnl/114/51491232/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
participants (3)
-
Fabian Groffen
-
Stefan Manegold
-
Viktor Rosenfeld