[MonetDB-users] Performance question between schema
With Monet DB I've encountered a counter- intuitive performance characteristic. I have a dataset of 33554432 (2^25) triples, (a, b, c). The values a and b are uniformly distributed on [0, 16), and c is unique to each triple, on [0, 2^25). I store them in two ways: Schema 1: One table with three columns (a, b, c) Schema 2: Two tables with two columns each, (a, c) and (b, c) I'm running SQL queries to select the "c" value from the triples based on a=X and b=Y. With schema 1, this is a single select. With schema 2, this is an inner join between two selects. What surprised me is that the query against schema 2 is on average >50 ms faster to return than the query against schema 1. To me this was counter- intuitive since schema 1 stores less data, and the data are pre-joined (in a row-oriented sense) for the type of filter applied. However, I'm guessing this has to do with how MonetDB stores data in columns. Is the query against schema 1 actually doing three joins, "a" -> "b" -> "c", on an artificial join key? For example, -- Schema 1) create table abc (a bigint, b bigint, c bigint); -- Schema 2) create table ac (a bigint, c bigint); create table bc (b bigint, c bigint); copy 33554432 records into abc from '/home/hadoop/abc.monetdb'; copy 33554432 records into ac from '/home/hadoop/ac.monetdb'; copy 33554432 records into bc from '/home/hadoop/bc.monetdb'; -- Query Schema 1) select abc.c from abc where abc.a=3 and abc.b=7 order by abc.c asc -- Times to return (JDBC) -- Query 1 of 1, Rows read: 100, Elapsed time (seconds) - Total: 0.287, SQL query: 0.287, Building output: 0 -- Query 1 of 1, Rows read: 100, Elapsed time (seconds) - Total: 0.288, SQL query: 0.288, Building output: 0 -- Query 1 of 1, Rows read: 100, Elapsed time (seconds) - Total: 0.35, SQL query: 0.35, Building output: 0 -- Query 1 of 1, Rows read: 100, Elapsed time (seconds) - Total: 0.372, SQL query: 0.372, Building output: 0 -- Query 1 of 1, Rows read: 100, Elapsed time (seconds) - Total: 0.405, SQL query: 0.405, Building output: 0 -- Query 1 of 1, Rows read: 100, Elapsed time (seconds) - Total: 0.285, SQL query: 0.285, Building output: 0 -- Query Schema 2) select ac.c from ac inner join bc on bc.b=7 and bc.c = ac.c where ac.a = 3 order by ac.c asc -- Times to return (JDBC) -- Query 1 of 1, Rows read: 100, Elapsed time (seconds) - Total: 0.218, SQL query: 0.218, Building output: 0 -- Query 1 of 1, Rows read: 100, Elapsed time (seconds) - Total: 0.216, SQL query: 0.216, Building output: 0 -- Query 1 of 1, Rows read: 100, Elapsed time (seconds) - Total: 0.318, SQL query: 0.318, Building output: 0 -- Query 1 of 1, Rows read: 100, Elapsed time (seconds) - Total: 0.322, SQL query: 0.322, Building output: 0 -- Query 1 of 1, Rows read: 100, Elapsed time (seconds) - Total: 0.218, SQL query: 0.218, Building output: 0 -- Query 1 of 1, Rows read: 100, Elapsed time (seconds) - Total: 0.219, SQL query: 0.219, Building output: 0 -- Query 1 of 1, Rows read: 100, Elapsed time (seconds) - Total: 0.222, SQL query: 0.222, Building output: 0
You might want to use the EXPLAIN <sqlstmt> or the TRACE <sqlstmt> to assess where the time goes. On 1/26/11 6:35 PM, brien colwell wrote:
With Monet DB I've encountered a counter- intuitive performance characteristic. I have a dataset of 33554432 (2^25) triples, (a, b, c). The values a and b are uniformly distributed on [0, 16), and c is unique to each triple, on [0, 2^25).
I store them in two ways: Schema 1: One table with three columns (a, b, c) Schema 2: Two tables with two columns each, (a, c) and (b, c)
I'm running SQL queries to select the "c" value from the triples based on a=X and b=Y. With schema 1, this is a single select. With schema 2, this is an inner join between two selects.
What surprised me is that the query against schema 2 is on average>50 ms faster to return than the query against schema 1. To me this was counter- intuitive since schema 1 stores less data, and the data are pre-joined (in a row-oriented sense) for the type of filter applied.
However, I'm guessing this has to do with how MonetDB stores data in columns. Is the query against schema 1 actually doing three joins, "a" -> "b" -> "c", on an artificial join key?
For example,
-- Schema 1) create table abc (a bigint, b bigint, c bigint); -- Schema 2) create table ac (a bigint, c bigint); create table bc (b bigint, c bigint);
copy 33554432 records into abc from '/home/hadoop/abc.monetdb';
copy 33554432 records into ac from '/home/hadoop/ac.monetdb';
copy 33554432 records into bc from '/home/hadoop/bc.monetdb';
-- Query Schema 1) select abc.c from abc where abc.a=3 and abc.b=7 order by abc.c asc -- Times to return (JDBC) -- Query 1 of 1, Rows read: 100, Elapsed time (seconds) - Total: 0.287, SQL query: 0.287, Building output: 0 -- Query 1 of 1, Rows read: 100, Elapsed time (seconds) - Total: 0.288, SQL query: 0.288, Building output: 0 -- Query 1 of 1, Rows read: 100, Elapsed time (seconds) - Total: 0.35, SQL query: 0.35, Building output: 0 -- Query 1 of 1, Rows read: 100, Elapsed time (seconds) - Total: 0.372, SQL query: 0.372, Building output: 0 -- Query 1 of 1, Rows read: 100, Elapsed time (seconds) - Total: 0.405, SQL query: 0.405, Building output: 0 -- Query 1 of 1, Rows read: 100, Elapsed time (seconds) - Total: 0.285, SQL query: 0.285, Building output: 0
-- Query Schema 2) select ac.c from ac inner join bc on bc.b=7 and bc.c = ac.c where ac.a = 3 order by ac.c asc -- Times to return (JDBC) -- Query 1 of 1, Rows read: 100, Elapsed time (seconds) - Total: 0.218, SQL query: 0.218, Building output: 0 -- Query 1 of 1, Rows read: 100, Elapsed time (seconds) - Total: 0.216, SQL query: 0.216, Building output: 0 -- Query 1 of 1, Rows read: 100, Elapsed time (seconds) - Total: 0.318, SQL query: 0.318, Building output: 0 -- Query 1 of 1, Rows read: 100, Elapsed time (seconds) - Total: 0.322, SQL query: 0.322, Building output: 0 -- Query 1 of 1, Rows read: 100, Elapsed time (seconds) - Total: 0.218, SQL query: 0.218, Building output: 0 -- Query 1 of 1, Rows read: 100, Elapsed time (seconds) - Total: 0.219, SQL query: 0.219, Building output: 0 -- Query 1 of 1, Rows read: 100, Elapsed time (seconds) - Total: 0.222, SQL query: 0.222, Building output: 0
------------------------------------------------------------------------------ Special Offer-- Download ArcSight Logger for FREE (a $49 USD value)! Finally, a world-class log management solution at an even better price-free! Download using promo code Free_Logger_4_Dev2Dev. Offer expires February 28th, so secure your free ArcSight Logger TODAY! http://p.sf.net/sfu/arcsight-sfd2d _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
participants (2)
-
brien colwell
-
Martin Kersten