[MonetDB-users] Performance question between schema

26 Jan 2011

      With Monet DB I've encountered a counter- intuitive performance
characteristic. I have a dataset of 33554432 (2^25) triples, (a, b,
c). The values a and b are uniformly distributed on [0, 16), and c is
unique to each triple, on [0, 2^25).

I store them in two ways:
Schema 1: One table with three columns (a, b, c)
Schema 2: Two tables with two columns each, (a, c) and (b, c)

I'm running SQL queries to select the "c" value from the triples based
on a=X and b=Y. With schema 1, this is a single select. With schema 2,
this is an inner join between two selects.

What surprised me is that the query against schema 2 is on average >50
ms faster to return than the query against schema 1. To me this was
counter- intuitive since schema 1 stores less data, and the data are
pre-joined (in a row-oriented sense) for the type of filter applied.

However, I'm guessing this has to do with how MonetDB stores data in
columns. Is the query against schema 1 actually doing three joins, "a"
-> "b" -> "c", on an artificial join key?

For example,

-- Schema 1)
create table abc (a bigint, b bigint, c bigint);
-- Schema 2)
create table ac (a bigint, c bigint);
create table bc (b bigint, c bigint);

copy 33554432 records into abc
from '/home/hadoop/abc.monetdb';

copy 33554432 records into ac
from '/home/hadoop/ac.monetdb';

copy 33554432 records into bc
from '/home/hadoop/bc.monetdb';

-- Query Schema 1)
select abc.c from abc
where abc.a=3 and abc.b=7
order by abc.c asc
-- Times to return (JDBC)
-- Query 1 of 1, Rows read: 100, Elapsed time (seconds) - Total:
0.287, SQL query: 0.287, Building output: 0
-- Query 1 of 1, Rows read: 100, Elapsed time (seconds) - Total:
0.288, SQL query: 0.288, Building output: 0
-- Query 1 of 1, Rows read: 100, Elapsed time (seconds) - Total: 0.35,
SQL query: 0.35, Building output: 0
-- Query 1 of 1, Rows read: 100, Elapsed time (seconds) - Total:
0.372, SQL query: 0.372, Building output: 0
-- Query 1 of 1, Rows read: 100, Elapsed time (seconds) - Total:
0.405, SQL query: 0.405, Building output: 0
-- Query 1 of 1, Rows read: 100, Elapsed time (seconds) - Total:
0.285, SQL query: 0.285, Building output: 0

-- Query Schema 2)
select ac.c from ac
inner join bc
on bc.b=7 and bc.c = ac.c
where ac.a = 3
order by ac.c asc
-- Times to return (JDBC)
-- Query 1 of 1, Rows read: 100, Elapsed time (seconds) - Total:
0.218, SQL query: 0.218, Building output: 0
-- Query 1 of 1, Rows read: 100, Elapsed time (seconds) - Total:
0.216, SQL query: 0.216, Building output: 0
-- Query 1 of 1, Rows read: 100, Elapsed time (seconds) - Total:
0.318, SQL query: 0.318, Building output: 0
-- Query 1 of 1, Rows read: 100, Elapsed time (seconds) - Total:
0.322, SQL query: 0.322, Building output: 0
-- Query 1 of 1, Rows read: 100, Elapsed time (seconds) - Total:
0.218, SQL query: 0.218, Building output: 0
-- Query 1 of 1, Rows read: 100, Elapsed time (seconds) - Total:
0.219, SQL query: 0.219, Building output: 0
-- Query 1 of 1, Rows read: 100, Elapsed time (seconds) - Total:
0.222, SQL query: 0.222, Building output: 0

brien colwell

Martin Kersten

tags

participants (2)