[MonetDB-users] Problem with running TPC-H query No.5
I am using MonetDB for benchmarking on the TPC-H workload. I have successfully run queries 1, 3 and 10. However, when I try to run query 5: select n_name, sum(l_extendedprice * (1 - l_discount)) as revenue from customer, orders, lineitem, supplier, nation, region where c_custkey = o_custkey and l_orderkey = o_orderkey and l_suppkey = s_suppkey and c_nationkey = s_nationkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'ASIA' and o_orderdate >= date '1994-01-01' and o_orderdate < date '1994-01-01' + interval '1' year group by n_name order by revenue desc; it takes about two (2) hours to evaluate it. It is obvious that the query cannot be executed inside the main memory, so the system thrashes through paging. Still, this behaviour does not appear for the rest of the queries, as they are computed within one or two seconds. In all cases, the results are correct. The specifications of my system are: Processor: Intel Core 2 Duo @ 1.86 Mhz RAM: 2 Gb Operating System: Linux Ubuntu Gutsy 7.10 MonetDB version: Mserver5 (recently downloaded from your site). Can you give me some indications about why this query takes too long and how can I bypass this response delay? Thanks.
On Mon, Dec 17, 2007 at 11:56:14AM +0000, Konstantinos Krikellas wrote:
I am using MonetDB for benchmarking on the TPC-H workload. I have successfully run queries 1, 3 and 10. However, when I try to run query 5:
For this query to be quick you need all keys, ie primary and foreign. Does you tpch-h schema include those? Niels
select n_name, sum(l_extendedprice * (1 - l_discount)) as revenue from customer, orders, lineitem, supplier, nation, region where c_custkey = o_custkey and l_orderkey = o_orderkey and l_suppkey = s_suppkey and c_nationkey = s_nationkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'ASIA' and o_orderdate >= date '1994-01-01' and o_orderdate < date '1994-01-01' + interval '1' year group by n_name order by revenue desc; it takes about two (2) hours to evaluate it. It is obvious that the query cannot be executed inside the main memory, so the system thrashes through paging. Still, this behaviour does not appear for the rest of the queries, as they are computed within one or two seconds. In all cases, the results are correct. The specifications of my system are: Processor: Intel Core 2 Duo @ 1.86 Mhz RAM: 2 Gb Operating System: Linux Ubuntu Gutsy 7.10 MonetDB version: Mserver5 (recently downloaded from your site). Can you give me some indications about why this query takes too long and how can I bypass this response delay? Thanks.
------------------------------------------------------------------------- SF.Net email is sponsored by: Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- Niels Nes, Centre for Mathematics and Computer Science (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
For this query to be quick you need all keys, ie primary and foreign. Does you tpch-h schema include those?
Niels
Yes, I have correctly set up the database schema. The population parameter was set to 1 during table initialization. The strange thing is that the other tested queries are executed as expected. i suspect that this is an optimization problem, as if we look at the query: select n_name, sum(l_extendedprice * (1 - l_discount)) as revenue from customer, orders, lineitem, supplier, nation, region where c_custkey = o_custkey and l_orderkey = o_orderkey and l_suppkey = s_suppkey and c_nationkey = s_nationkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'ASIA' and o_orderdate >= date '1994-01-01' and o_orderdate < date '1994-01-01' + interval '1' year group by n_name order by revenue desc; , there is a chain among the joined tables, as supplier joins with nation and lineitem, while lineitem joins with customer and customer joins with supplier again. So, if the firstly executed join is customer with supplier on nationkey, the result will be much bigger than the main memory capacity. Still, the selection predicates on tables orders make joining first orders with lineitem and then the result with customer a more reasonable choice, with much smaller needs in memory. I have also attached the EXPLAIN output. Could that be the case? Thanks.
On Tue, Dec 18, 2007 at 12:09:13PM +0000, Konstantinos Krikellas wrote:
For this query to be quick you need all keys, ie primary and foreign. Does you tpch-h schema include those?
Niels
Yes, I have correctly set up the database schema. The population parameter was set to 1 during table initialization. The strange thing is that the other tested queries are executed as expected. i suspect that this is an optimization problem, as
I did the same explain and found 7 idx's used. You only have 4, ie I don't think you have all foreign keys defined. See sql/src/benchmark/tpch/c.sql-dec-primary-foreign (also included); Niels START TRANSACTION; CREATE TABLE region (r_regionkey INT NOT NULL, r_name VARCHAR(25) NOT NULL, r_comment VARCHAR(152) NOT NULL, PRIMARY KEY (r_regionkey)) ; CREATE TABLE nation (n_nationkey INT NOT NULL, n_name VARCHAR(25) NOT NULL, n_regionkey INT NOT NULL, n_comment VARCHAR(152) NOT NULL, PRIMARY KEY (n_nationkey), FOREIGN KEY (n_regionkey) REFERENCES region (r_regionkey)) ; CREATE TABLE supplier (s_suppkey INT NOT NULL, s_name VARCHAR(25) NOT NULL, s_address VARCHAR(40) NOT NULL, s_nationkey INT NOT NULL, s_phone VARCHAR(15) NOT NULL, s_acctbal DECIMAL(15,2) NOT NULL, s_comment VARCHAR(101) NOT NULL, PRIMARY KEY (s_suppkey), FOREIGN KEY (s_nationkey) REFERENCES nation (n_nationkey)) ; CREATE TABLE customer (c_custkey INT NOT NULL, c_name VARCHAR(25) NOT NULL, c_address VARCHAR(40) NOT NULL, c_nationkey INT NOT NULL, c_phone VARCHAR(15) NOT NULL, c_acctbal DECIMAL(15,2) NOT NULL, c_mktsegment VARCHAR(10) NOT NULL, c_comment VARCHAR(117) NOT NULL, PRIMARY KEY (c_custkey), FOREIGN KEY (c_nationkey) REFERENCES nation (n_nationkey)) ; CREATE TABLE part (p_partkey INT NOT NULL, p_name VARCHAR(55) NOT NULL, p_mfgr VARCHAR(25) NOT NULL, p_brand VARCHAR(10) NOT NULL, p_type VARCHAR(25) NOT NULL, p_size INT NOT NULL, p_container VARCHAR(10) NOT NULL, p_retailprice DECIMAL(15,2) NOT NULL, p_comment VARCHAR(23) NOT NULL, PRIMARY KEY (p_partkey)) ; CREATE TABLE partsupp (ps_partkey INT NOT NULL, ps_suppkey INT NOT NULL, ps_availqty INT NOT NULL, ps_supplycost DECIMAL(15,2) NOT NULL, ps_comment VARCHAR(199) NOT NULL, PRIMARY KEY (ps_partkey, ps_suppkey), FOREIGN KEY (ps_partkey) REFERENCES part (p_partkey), FOREIGN KEY (ps_suppkey) REFERENCES supplier (s_suppkey)) ; CREATE TABLE orders (o_orderkey INT NOT NULL, o_custkey INT NOT NULL, o_orderstatus VARCHAR(1) NOT NULL, o_totalprice DECIMAL(15,2) NOT NULL, o_orderdate DATE NOT NULL, o_orderpriority VARCHAR(15) NOT NULL, o_clerk VARCHAR(15) NOT NULL, o_shippriority INT NOT NULL, o_comment VARCHAR(79) NOT NULL, PRIMARY KEY (o_orderkey), FOREIGN KEY (o_custkey) REFERENCES customer (c_custkey)) ; CREATE TABLE lineitem (l_orderkey INT NOT NULL, l_partkey INT NOT NULL, l_suppkey INT NOT NULL, l_linenumber INT NOT NULL, l_quantity INTEGER NOT NULL, l_extendedprice DECIMAL(15,2) NOT NULL, l_discount DECIMAL(15,2) NOT NULL, l_tax DECIMAL(15,2) NOT NULL, l_returnflag VARCHAR(1) NOT NULL, l_linestatus VARCHAR(1) NOT NULL, l_shipdate DATE NOT NULL, l_commitdate DATE NOT NULL, l_receiptdate DATE NOT NULL, l_shipinstruct VARCHAR(25) NOT NULL, l_shipmode VARCHAR(10) NOT NULL, l_comment VARCHAR(44) NOT NULL, PRIMARY KEY (l_orderkey,l_linenumber), FOREIGN KEY (l_orderkey) REFERENCES orders (o_orderkey), FOREIGN KEY (l_partkey) REFERENCES part (p_partkey), FOREIGN KEY (l_suppkey) REFERENCES supplier (s_suppkey), FOREIGN KEY (l_partkey,l_suppkey) REFERENCES partsupp (ps_partkey,ps_suppkey)) ; COMMIT;
if we look at the query: select n_name, sum(l_extendedprice * (1 - l_discount)) as revenue from customer, orders, lineitem, supplier, nation, region where c_custkey = o_custkey and l_orderkey = o_orderkey and l_suppkey = s_suppkey and c_nationkey = s_nationkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'ASIA' and o_orderdate >= date '1994-01-01' and o_orderdate < date '1994-01-01' + interval '1' year group by n_name order by revenue desc; , there is a chain among the joined tables, as supplier joins with nation and lineitem, while lineitem joins with customer and customer joins with supplier again. So, if the firstly executed join is customer with supplier on nationkey, the result will be much bigger than the main memory capacity. Still, the selection predicates on tables orders make joining first orders with lineitem and then the result with customer a more reasonable choice, with much smaller needs in memory. I have also attached the EXPLAIN output. Could that be the case? Thanks.
function user.s0_1():void; _1:bat[:oid,:int]{rows=6001215:lng,bid=5468} := sql.bind("sys","lineitem","l_orderkey",0); _6:bat[:oid,:int]{rows=0:lng,bid=9790} := sql.bind("sys","lineitem","l_orderkey",1); constraints.emptySet(_6); _6:bat[:oid,:int]{rows=0:lng,bid=9790} := nil; _8:bat[:oid,:int]{rows=0:lng,bid=9791} := sql.bind("sys","lineitem","l_orderkey",2); constraints.emptySet(_8); _8:bat[:oid,:int]{rows=0:lng,bid=9791} := nil; _10:bat[:oid,:int]{rows=1500000:lng,bid=5374} := sql.bind("sys","orders","o_orderkey",0); _13:bat[:oid,:int]{rows=0:lng,bid=9765} := sql.bind("sys","orders","o_orderkey",1); constraints.emptySet(_13); _13:bat[:oid,:int]{rows=0:lng,bid=9765} := nil; _14:bat[:oid,:int]{rows=0:lng,bid=9766} := sql.bind("sys","orders","o_orderkey",2); constraints.emptySet(_14); _14:bat[:oid,:int]{rows=0:lng,bid=9766} := nil; _15:bat[:oid,:date]{notnil=true,rows=1500000:lng,bid=5371} := sql.bind("sys","orders","o_orderdate",0); _25:bat[:oid,:oid]{rows=1500000:lng,bid=5485} := sql.bind_idxbat("sys","orders","orders_customer_fk",0); _38:bat[:oid,:int]{notnil=true,rows=150000:lng,bid=5205} := sql.bind("sys","customer","c_nationkey",0); _41:bat[:oid,:int]{notnil=true,rows=0:lng,bid=9748} := sql.bind("sys","customer","c_nationkey",1); constraints.emptySet(_41); _41:bat[:oid,:int]{notnil=true,rows=0:lng,bid=9748} := nil; _42:bat[:oid,:int]{notnil=true,rows=0:lng,bid=9749} := sql.bind("sys","customer","c_nationkey",2); constraints.emptySet(_42); _42:bat[:oid,:int]{notnil=true,rows=0:lng,bid=9749} := nil; _43:bat[:oid,:int]{notnil=true,rows=10000:lng,bid=5572} := sql.bind("sys","supplier","s_nationkey",0); _46:bat[:oid,:int]{notnil=true,rows=0:lng,bid=9729} := sql.bind("sys","supplier","s_nationkey",1); constraints.emptySet(_46); _46:bat[:oid,:int]{notnil=true,rows=0:lng,bid=9729} := nil; _47:bat[:oid,:int]{notnil=true,rows=0:lng,bid=9730} := sql.bind("sys","supplier","s_nationkey",2); constraints.emptySet(_47); _47:bat[:oid,:int]{notnil=true,rows=0:lng,bid=9730} := nil; _54:bat[:oid,:oid]{rows=10000:lng,bid=5022} := sql.bind_idxbat("sys","supplier","supplier_nation_fk",0); _64:bat[:oid,:oid]{rows=6001215:lng,bid=5566} := sql.bind_idxbat("sys","lineitem","lineitem_supplier_fk",0); _87:bat[:oid,:oid]{rows=25:lng,bid=5150} := sql.bind_idxbat("sys","nation","nation_region_fk",0); _92:bat[:oid,:str]{notnil=true,rows=5:lng,bid=5334} := sql.bind("sys","region","r_name",0); _97:bat[:oid,:str]{notnil=true,rows=0:lng,bid=9678} := sql.bind("sys","region","r_name",1); constraints.emptySet(_97); _97:bat[:oid,:str]{notnil=true,rows=0:lng,bid=9678} := nil; _98:bat[:oid,:str]{notnil=true,rows=0:lng,bid=9679} := sql.bind("sys","region","r_name",2); constraints.emptySet(_98); _98:bat[:oid,:str]{notnil=true,rows=0:lng,bid=9679} := nil; _104:bat[:oid,:lng]{notnil=true,rows=6001215:lng,bid=5609} := sql.bind("sys","lineitem","l_extendedprice",0); _108:bat[:oid,:lng]{notnil=true,rows=6001215:lng,bid=5461} := sql.bind("sys","lineitem","l_discount",0); _113:bat[:oid,:str]{notnil=true,rows=25:lng,bid=5332} := sql.bind("sys","nation","n_name",0); _115:bat[:oid,:str]{notnil=true,rows=0:lng,bid=9665} := sql.bind("sys","nation","n_name",1); constraints.emptySet(_115); _115:bat[:oid,:str]{notnil=true,rows=0:lng,bid=9665} := nil; _116:bat[:oid,:str]{notnil=true,rows=0:lng,bid=9666} := sql.bind("sys","nation","n_name",2); constraints.emptySet(_116); _116:bat[:oid,:str]{notnil=true,rows=0:lng,bid=9666} := nil; _19 := mtime.addmonths(1994-01-01,12); _20{rows=750001:lng} := algebra.uselect(_15,1994-01-01,_19,true,false); _15:bat[:oid,:date]{notnil=true,rows=1500000:lng,bid=5371} := nil; _23{rows=750001:lng} := algebra.semijoin(_10,_20); _10:bat[:oid,:int]{rows=1500000:lng,bid=5374} := nil; _24{rows=750001:lng} := bat.reverse(_23); _23{rows=750001:lng} := nil; _27{rows=750001:lng} := algebra.semijoin(_25,_20); _25:bat[:oid,:oid]{rows=1500000:lng,bid=5485} := nil; _20{rows=750001:lng} := nil; _29{rows=750001:lng} := algebra.markT(_27,0@0); _30{rows=750001:lng} := algebra.joinPath(_1,_24,_29); _1:bat[:oid,:int]{rows=6001215:lng,bid=5468} := nil; _24{rows=750001:lng} := nil; _29{rows=750001:lng} := nil; _31{rows=750001:lng} := bat.reverse(_30); _33{rows=750001:lng} := algebra.markT(_31,1@0); _31{rows=750001:lng} := nil; _34{rows=750001:lng} := bat.reverse(_33); _33{rows=750001:lng} := nil; _35{rows=750001:lng} := bat.reverse(_27); _27{rows=750001:lng} := nil; _36{rows=750001:lng} := algebra.markT(_35,0@0); _35{rows=750001:lng} := nil; _37{rows=750001:lng} := bat.reverse(_36); _36{rows=750001:lng} := nil; _48{rows=10000:lng} := bat.reverse(_43); _43:bat[:oid,:int]{notnil=true,rows=10000:lng,bid=5572} := nil; _49{rows=10000:lng} := algebra.joinPath(_34,_37,_38,_48); _34{rows=750001:lng} := nil; _37{rows=750001:lng} := nil; _38:bat[:oid,:int]{notnil=true,rows=150000:lng,bid=5205} := nil; _48{rows=10000:lng} := nil; _50{rows=10000:lng} := bat.reverse(_49); _52{rows=10000:lng} := algebra.markT(_50,2@0); _50{rows=10000:lng} := nil; _53{rows=10000:lng} := bat.reverse(_52); _52{rows=10000:lng} := nil; _56{rows=10000:lng} := algebra.join(_53,_54); _54:bat[:oid,:oid]{rows=10000:lng,bid=5022} := nil; _58{rows=10000:lng} := algebra.markT(_56,3@0); _59{rows=10000:lng} := bat.reverse(_58); _58{rows=10000:lng} := nil; _60{rows=10000:lng} := algebra.markT(_49,2@0); _49{rows=10000:lng} := nil; _61{rows=10000:lng} := bat.reverse(_60); _60{rows=10000:lng} := nil; _62{rows=750001:lng} := algebra.markT(_30,1@0); _30{rows=750001:lng} := nil; _63{rows=750001:lng} := bat.reverse(_62); _62{rows=750001:lng} := nil; _66{rows=10000:lng} := algebra.joinPath(_59,_61,_63,_64); _67{rows=10000:lng} := algebra.join(_59,_53); _68{rows=10000:lng} := bat.mirror(_66); _66{rows=10000:lng} := nil; _69{rows=10000:lng} := bat.mirror(_67); _67{rows=10000:lng} := nil; _70{rows=10000:lng} := algebra.join(_68,_69); _68{rows=10000:lng} := nil; _69{rows=10000:lng} := nil; _71{rows=10000:lng} := algebra.markH(_70,0@0); _72{rows=10000:lng} := algebra.markT(_70,0@0); _70{rows=10000:lng} := nil; _73{rows=10000:lng} := bat.reverse(_72); _72{rows=10000:lng} := nil; _74{rows=10000:lng} := algebra.joinPath(_73,_59,_61,_63,_64); _73{rows=10000:lng} := nil; _64:bat[:oid,:oid]{rows=6001215:lng,bid=5566} := nil; _75{rows=10000:lng} := algebra.joinPath(_71,_59,_53); _53{rows=10000:lng} := nil; _76:bat[:oid,:bit]{rows=10000:lng} := batcalc.==(_74,_75); _74{rows=10000:lng} := nil; _75{rows=10000:lng} := nil; _77{rows=5001:lng} := algebra.uselect(_76,true); _76:bat[:oid,:bit]{rows=10000:lng} := nil; _78{rows=5001:lng} := bat.reverse(_77); _77{rows=5001:lng} := nil; _79{rows=5001:lng} := algebra.join(_78,_71); _78{rows=5001:lng} := nil; _71{rows=10000:lng} := nil; _80{rows=5001:lng} := bat.reverse(_79); _79{rows=5001:lng} := nil; _82{rows=5001:lng} := algebra.markT(_80,4@0); _80{rows=5001:lng} := nil; _83{rows=5001:lng} := bat.reverse(_82); _82{rows=5001:lng} := nil; _84{rows=10000:lng} := bat.reverse(_56); _56{rows=10000:lng} := nil; _85{rows=10000:lng} := algebra.markT(_84,3@0); _84{rows=10000:lng} := nil; _86{rows=10000:lng} := bat.reverse(_85); _85{rows=10000:lng} := nil; _90{rows=25:lng} := algebra.joinPath(_83,_86,_87); _87:bat[:oid,:oid]{rows=25:lng,bid=5150} := nil; _91{rows=25:lng} := bat.reverse(_90); _90{rows=25:lng} := nil; _96{rows=5:lng} := algebra.uselect(_92,"ASIA"); _92:bat[:oid,:str]{notnil=true,rows=5:lng,bid=5334} := nil; _99{rows=5:lng} := algebra.semijoin(_91,_96); _91{rows=25:lng} := nil; _96{rows=5:lng} := nil; _100{rows=5:lng} := bat.reverse(_99); _99{rows=5:lng} := nil; _102{rows=5:lng} := algebra.markT(_100,5@0); _100{rows=5:lng} := nil; _103{rows=5:lng} := bat.reverse(_102); _102{rows=5:lng} := nil; _106{rows=5:lng} := algebra.joinPath(_103,_83,_59,_61,_63,_104); _104:bat[:oid,:lng]{notnil=true,rows=6001215:lng,bid=5609} := nil; _110{rows=5:lng} := algebra.joinPath(_103,_83,_59,_61,_63,_108); _59{rows=10000:lng} := nil; _61{rows=10000:lng} := nil; _63{rows=750001:lng} := nil; _108:bat[:oid,:lng]{notnil=true,rows=6001215:lng,bid=5461} := nil; _152 := algebra.reuse(_110); _111:bat[:oid,:lng]{rows=5:lng} := batcalc.-(_152,100,_110); _152 := nil; _110{rows=5:lng} := nil; _111:bat[:oid,:lng]{rows=5:lng} := batcalc.*(_111,_106,_111); _106{rows=5:lng} := nil; _112 := algebra.selectNotNil(_111); _111:bat[:oid,:lng]{rows=5:lng} := nil; _117{rows=5:lng} := algebra.joinPath(_103,_83,_86,_113); (ext274,grp272):= group.new(_117); _117{rows=5:lng} := nil; _120 := bat.mirror(ext274); ext274 := nil; _121:bat[:oid,:lng] := aggr.sum(_112,grp272,_120); _112 := nil; grp272 := nil; _122 := algebra.sortTail(_121); _123 := algebra.sortReverseTail(_122); _122 := nil; _124 := algebra.joinPath(_120,_103,_83,_86,_113); _120 := nil; _103{rows=5:lng} := nil; _83{rows=5001:lng} := nil; _86{rows=10000:lng} := nil; _113:bat[:oid,:str]{notnil=true,rows=25:lng,bid=5332} := nil; _125 := sql.resultSet(2,1,_123); _123 := nil; sql.rsColumn(_125,"sys.nation","n_name","char",25,0,_124); _124 := nil; sql.rsColumn(_125,"sys.","revenue","decimal",19,4,_121); _121:bat[:oid,:lng] := nil; sql.exportResult(_125,""); end s0_1;
------------------------------------------------------------------------- SF.Net email is sponsored by: Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- Niels Nes, Centre for Mathematics and Computer Science (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
participants (2)
-
Konstantinos Krikellas
-
Niels Nes