
The copy into query finished in 317 minutes.
The volume we use is 63% full. However the copy-into query does not start
writing in the first few hours (I don't see the file getting created).
In fact, I see almost no disk usage, and the query is only using 1 or 2
cores out of 32 (nthreads=32).
I'll try to run a trace of the copy-into as well but that will take a while.
I'm using the default_pipe in the dec2011 release. I don't think I've
actually changed any of the default settings.
Trace and explain of the count distinct query:
sql>explain select count(distinct uuid) from f_sorted where yyyymm='201110';
+-------------------------------------------------------------------------------------+
| mal
|
+=====================================================================================+
| function user.s7_1{autoCommit=true}():void;
|
| X_2 := sql.mvc();
|
| X_23:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","yyyymm",0);
|
| X_25 := algebra.uselect(X_23,"201110":str);
|
| X_21:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","yyyymm",2);
|
| X_26 := algebra.kdifference(X_25,X_21);
|
| X_22 := algebra.uselect(X_21,"201110":str);
|
| X_27 := algebra.kunion(X_26,X_22);
|
| X_19:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","yyyymm",1);
|
| X_20 := algebra.uselect(X_19,"201110":str);
|
| X_28 := algebra.kunion(X_27,X_20);
|
| X_16:bat[:oid,:oid] := sql.bind_dbat(X_2,"sys","f_sorted",1);
|
| X_17 := bat.reverse(X_16);
|
| X_29 := algebra.kdifference(X_28,X_17);
|
| X_30 := algebra.markT(X_29,0@0:oid);
|
| X_31 := bat.reverse(X_30);
|
| X_10:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","uuid",0); |
| X_8:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","uuid",2); |
| X_12 := algebra.kdifference(X_10,X_8);
|
| X_13 := algebra.kunion(X_12,X_8);
|
| X_4:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","uuid",1); |
| X_14 := algebra.kunion(X_13,X_4);
|
| X_32 := algebra.leftjoin(X_31,X_14);
|
| X_33 := algebra.selectNotNil(X_32);
|
| X_34 := bat.reverse(X_33);
|
| X_35 := algebra.kunique(X_34);
|
| X_36 := bat.reverse(X_35);
|
| X_37 := aggr.count(X_36);
|
| sql.exportValue(1,"sys.f_sorted","L7":str,"wrd",64,0,6,X_37,"");
|
| end s7_1;
|
+-------------------------------------------------------------------------------------+
30 tuples (1.278ms)
sql>trace select count(distinct uuid) from f_sorted where yyyymm='201110';
+---------+
| L10 |
+=========+
| 5941581 |
+---------+
1 tuple (27.3s)
+----------+--------------------------------------------------------------------------------------------------------------+
| ticks | stmt
|
+==========+==============================================================================================================+
| 6 | X_3 := sql.mvc();
|
| 12 | X_23:bat[:oid,:str] =
Could you possibly also share the EXPLAIN, or even TRACE of your count(distinct uuid) query with us?
Thanks!
Stefan
On Thu, Feb 09, 2012 at 05:04:51PM -0800, zack_pcd wrote:
I am seeing the following: select count(*) from f_sorted; +-----------+ | L7 | +===========+ | 102612097 | +-----------+ 1 tuple (0.988ms)
select count(*) from f_sorted where yyyymm='201110'; +----------+ | L10 | +==========+ | 23505343 | +----------+ 1 tuple (1.0s)
select count(distinct uuid) from f_sorted where yyyymm='201110'; +---------+ | L6 | +=========+ | 5941581 | +---------+ 1 tuple (23.8s)
Very fast thus far! However:
copy (select distinct(uuid) from f_sorted where yyyymm='201110') into '/home/user/filename' USING DELIMITERS ',','\n','"' NULL as 'null';
This has been running for hours and has yet to start writing to the file.
I get this when I explain the query: +----------------------------------------------------------------------------------------+ | mal | +========================================================================================+ | function user.s5_3{autoCommit=true}():void; | | X_2 := sql.mvc(); | | X_22:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","yyyymm",0); | | X_23 := algebra.uselect(X_22,"201110":str); | | X_20:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","yyyymm",2); | | X_24 := algebra.kdifference(X_23,X_20); | | X_21 := algebra.uselect(X_20,"201110":str); | | X_25 := algebra.kunion(X_24,X_21); | | X_18:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","yyyymm",1); | | X_19 := algebra.uselect(X_18,"201110":str); | | X_26 := algebra.kunion(X_25,X_19); | | X_16:bat[:oid,:oid] := sql.bind_dbat(X_2,"sys","f_sorted",1); | | X_17 := bat.reverse(X_16); | | X_27 := algebra.kdifference(X_26,X_17); | | X_28 := algebra.markT(X_27,0@0:oid); | | X_29 := bat.reverse(X_28); | | X_10:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","uuid",0); | | X_8:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","uuid",2); | | X_12 := algebra.kdifference(X_10,X_8); | | X_13 := algebra.kunion(X_12,X_8); | | X_5:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","uuid",1); | | X_14 := algebra.kunion(X_13,X_5); | | X_30 := algebra.leftjoin(X_29,X_14); | | (ext44,grp42) := group.done(X_30); | | X_33 := bat.mirror(ext44); | | X_34 := algebra.leftjoin(X_33,X_30); | | X_35 := sql.resultSet(1,1,X_34); | | sql.rsColumn(X_35,"sys.f_sorted","uuid":str,"varchar",2048,0,X_34); | | X_43 := io.stdout(); | | sql.exportResult(X_43,X_35); | | end s5_3; | +----------------------------------------------------------------------------------------+ 31 tuples (1.350ms)
Is it doing something wrong? I don't expect the query to take so long (hours) to produce the output, especially since the distinct count is also very fast.
The copy-into *seems* to be irrelevant to the problem, but I decided to include it just in case, as that's how I am using the query. -- View this message in context: http://old.nabble.com/select-distinct-...-where-algorithm-tp33297578p3329757... Sent from the monetdb-users mailing list archive at Nabble.com.
------------------------------------------------------------------------------ Virtualization & Cloud Management Using Capacity Planning Cloud computing makes use of virtualization - but cloud computing also focuses on allowing computing to be delivered as a service. http://www.accelacomm.com/jaw/sfnl/114/51521223/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- | Stefan.Manegold @ CWI.nl | DB Architectures (INS1) | | http://CWI.nl/~manegold/ | Science Park 123 (L321) | | Tel.: +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
------------------------------------------------------------------------------ Virtualization & Cloud Management Using Capacity Planning Cloud computing makes use of virtualization - but cloud computing also focuses on allowing computing to be delivered as a service. http://www.accelacomm.com/jaw/sfnl/114/51521223/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- View this message in context: http://old.nabble.com/select-distinct-...-where-algorithm-tp33297578p3330212... Sent from the monetdb-users mailing list archive at Nabble.com.