
On Fri, Feb 10, 2012 at 09:51:45AM -0800, zack_pcd wrote:
The copy into query finished in 317 minutes. The volume we use is 63% full. However the copy-into query does not start writing in the first few hours (I don't see the file getting created). In fact, I see almost no disk usage, and the query is only using 1 or 2 cores out of 32 (nthreads=32). I'll try to run a trace of the copy-into as well but that will take a while. I'm using the default_pipe in the dec2011 release. I don't think I've actually changed any of the default settings.
Trace and explain of the count distinct query:
Could you also send the plan and explain of the copy into statements? Niels
sql>explain select count(distinct uuid) from f_sorted where yyyymm='201110'; +-------------------------------------------------------------------------------------+ | mal | +=====================================================================================+ | function user.s7_1{autoCommit=true}():void; | | X_2 := sql.mvc(); | | X_23:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","yyyymm",0); | | X_25 := algebra.uselect(X_23,"201110":str); | | X_21:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","yyyymm",2); | | X_26 := algebra.kdifference(X_25,X_21); | | X_22 := algebra.uselect(X_21,"201110":str); | | X_27 := algebra.kunion(X_26,X_22); | | X_19:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","yyyymm",1); | | X_20 := algebra.uselect(X_19,"201110":str); | | X_28 := algebra.kunion(X_27,X_20); | | X_16:bat[:oid,:oid] := sql.bind_dbat(X_2,"sys","f_sorted",1); | | X_17 := bat.reverse(X_16); | | X_29 := algebra.kdifference(X_28,X_17); | | X_30 := algebra.markT(X_29,0@0:oid); | | X_31 := bat.reverse(X_30); | | X_10:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","uuid",0); | | X_8:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","uuid",2); | | X_12 := algebra.kdifference(X_10,X_8); | | X_13 := algebra.kunion(X_12,X_8); | | X_4:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","uuid",1); | | X_14 := algebra.kunion(X_13,X_4); | | X_32 := algebra.leftjoin(X_31,X_14); | | X_33 := algebra.selectNotNil(X_32); | | X_34 := bat.reverse(X_33); | | X_35 := algebra.kunique(X_34); | | X_36 := bat.reverse(X_35); | | X_37 := aggr.count(X_36); | | sql.exportValue(1,"sys.f_sorted","L7":str,"wrd",64,0,6,X_37,""); | | end s7_1; | +-------------------------------------------------------------------------------------+ 30 tuples (1.278ms) sql>trace select count(distinct uuid) from f_sorted where yyyymm='201110'; +---------+ | L10 | +=========+ | 5941581 | +---------+ 1 tuple (27.3s) +----------+--------------------------------------------------------------------------------------------------------------+ | ticks | stmt | +==========+==============================================================================================================+ | 6 | X_3 := sql.mvc(); | | 12 | X_23:bat[:oid,:str] =
[102612097] := sql.bind(X_3=0,"sys","f_sorted","yyyymm",0); | | 2497718 | X_24= [23505343] := algebra.uselect(X_23=nil:bat[:oid,:str],A0="201110"); | | 21 | X_21:bat[:oid,:str] = [0] := sql.bind(X_3=0,"sys","f_sorted","yyyymm",2); | | 28 | X_25= [23505343] := algebra.kdifference(X_24=nil,X_21= :bat[:oid,:str][0]); | | 29 | X_22= [0] := algebra.uselect(X_21=nil:bat[:oid,:str],A0="201110"); | | 10 | X_26= [23505343] := algebra.kunion(X_25=nil,X_22=nil); | | 5 | X_19:bat[:oid,:str] = [0] := sql.bind(X_3=0,"sys","f_sorted","yyyymm",1); | | 11 | X_20= [0] := algebra.uselect(X_19=nil:bat[:oid,:str],A0="201110"); | | 9 | X_27= [23505343] := algebra.kunion(X_26=nil,X_20=nil); | | 6 | X_17:bat[:oid,:oid] = [0] := sql.bind_dbat(X_3=0,"sys","f_sorted",1); | | 4 | X_18= [0] := bat.reverse(X_17=nil:bat[:oid,:oid]); | | 5 | X_28= [23505343] := algebra.kdifference(X_27=nil,X_18=nil); | | 7 | X_29= [23505343] := algebra.markT(X_28=nil,0@0:oid); | | 3 | X_30= [23505343] := bat.reverse(X_29=nil); | | 12 | X_11:bat[:oid,:str] = [102612097] := sql.bind(X_3=0,"sys","f_sorted","uuid",0); | | 3 | X_9:bat[:oid,:str] = [0] := sql.bind(X_3=0,"sys","f_sorted","uuid",2); | | 9 | X_13= [102612097] := algebra.kdifference(X_11=nil:bat[:oid,:str],X_9= :bat[:oid,:str][0]); | | 9 | X_14= [102612097] := algebra.kunion(X_13=nil,X_9=nil:bat[:oid,:str]); | | 4 | X_5:bat[:oid,:str] = [0] := sql.bind(X_3=0,"sys","f_sorted","uuid",1); | | 7 | X_15= [102612097] := algebra.kunion(X_14=nil,X_5=nil:bat[:oid,:str]); | | 885318 | X_31= [23505343] := algebra.leftjoin(X_30=nil,X_15=nil); | | 12 | X_32= [23505343] := algebra.selectNotNil(X_31=nil); | | 4 | X_33= [23505343] := bat.reverse(X_32=nil); | | 23833624 | X_34= [5941581] := algebra.kunique(X_33=nil); | | 14 | X_35= [5941581] := bat.reverse(X_34=nil); | | 87995 | X_36 := aggr.count(X_35=nil); | | 25 | sql.exportValue(1,"sys.f_sorted","L10":str,"wrd",64,0,6,X_36=5941581:wrd,""); | | 27306480 | X_5:void := user.s8_1("201110"); | +----------+--------------------------------------------------------------------------------------------------------------+ 29 tuples (27.3s) Stefan Manegold wrote:
Could you possibly also share the EXPLAIN, or even TRACE of your count(distinct uuid) query with us?
Thanks!
Stefan
On Thu, Feb 09, 2012 at 05:04:51PM -0800, zack_pcd wrote:
I am seeing the following: select count(*) from f_sorted; +-----------+ | L7 | +===========+ | 102612097 | +-----------+ 1 tuple (0.988ms)
select count(*) from f_sorted where yyyymm='201110'; +----------+ | L10 | +==========+ | 23505343 | +----------+ 1 tuple (1.0s)
select count(distinct uuid) from f_sorted where yyyymm='201110'; +---------+ | L6 | +=========+ | 5941581 | +---------+ 1 tuple (23.8s)
Very fast thus far! However:
copy (select distinct(uuid) from f_sorted where yyyymm='201110') into '/home/user/filename' USING DELIMITERS ',','\n','"' NULL as 'null';
This has been running for hours and has yet to start writing to the file.
I get this when I explain the query: +----------------------------------------------------------------------------------------+ | mal | +========================================================================================+ | function user.s5_3{autoCommit=true}():void; | | X_2 := sql.mvc(); | | X_22:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","yyyymm",0); | | X_23 := algebra.uselect(X_22,"201110":str); | | X_20:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","yyyymm",2); | | X_24 := algebra.kdifference(X_23,X_20); | | X_21 := algebra.uselect(X_20,"201110":str); | | X_25 := algebra.kunion(X_24,X_21); | | X_18:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","yyyymm",1); | | X_19 := algebra.uselect(X_18,"201110":str); | | X_26 := algebra.kunion(X_25,X_19); | | X_16:bat[:oid,:oid] := sql.bind_dbat(X_2,"sys","f_sorted",1); | | X_17 := bat.reverse(X_16); | | X_27 := algebra.kdifference(X_26,X_17); | | X_28 := algebra.markT(X_27,0@0:oid); | | X_29 := bat.reverse(X_28); | | X_10:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","uuid",0); | | X_8:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","uuid",2); | | X_12 := algebra.kdifference(X_10,X_8); | | X_13 := algebra.kunion(X_12,X_8); | | X_5:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","uuid",1); | | X_14 := algebra.kunion(X_13,X_5); | | X_30 := algebra.leftjoin(X_29,X_14); | | (ext44,grp42) := group.done(X_30); | | X_33 := bat.mirror(ext44); | | X_34 := algebra.leftjoin(X_33,X_30); | | X_35 := sql.resultSet(1,1,X_34); | | sql.rsColumn(X_35,"sys.f_sorted","uuid":str,"varchar",2048,0,X_34); | | X_43 := io.stdout(); | | sql.exportResult(X_43,X_35); | | end s5_3; | +----------------------------------------------------------------------------------------+ 31 tuples (1.350ms)
Is it doing something wrong? I don't expect the query to take so long (hours) to produce the output, especially since the distinct count is also very fast.
The copy-into *seems* to be irrelevant to the problem, but I decided to include it just in case, as that's how I am using the query. -- View this message in context: http://old.nabble.com/select-distinct-...-where-algorithm-tp33297578p3329757... Sent from the monetdb-users mailing list archive at Nabble.com.
------------------------------------------------------------------------------ Virtualization & Cloud Management Using Capacity Planning Cloud computing makes use of virtualization - but cloud computing also focuses on allowing computing to be delivered as a service. http://www.accelacomm.com/jaw/sfnl/114/51521223/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- | Stefan.Manegold @ CWI.nl | DB Architectures (INS1) | | http://CWI.nl/~manegold/ | Science Park 123 (L321) | | Tel.: +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
------------------------------------------------------------------------------ Virtualization & Cloud Management Using Capacity Planning Cloud computing makes use of virtualization - but cloud computing also focuses on allowing computing to be delivered as a service. http://www.accelacomm.com/jaw/sfnl/114/51521223/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- View this message in context: http://old.nabble.com/select-distinct-...-where-algorithm-tp33297578p3330212... Sent from the monetdb-users mailing list archive at Nabble.com.
------------------------------------------------------------------------------ Virtualization & Cloud Management Using Capacity Planning Cloud computing makes use of virtualization - but cloud computing also focuses on allowing computing to be delivered as a service. http://www.accelacomm.com/jaw/sfnl/114/51521223/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- Niels Nes, Centrum Wiskunde & Informatica (CWI) Science Park 123, 1098 XG Amsterdam, The Netherlands room L3.14, phone ++31 20 592-4098 sip:4098@sip.cwi.nl url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl