[MonetDB-users] select distinct ... where algorithm

I am seeing the following: select count(*) from f_sorted; +-----------+ | L7 | +===========+ | 102612097 | +-----------+ 1 tuple (0.988ms) select count(*) from f_sorted where yyyymm='201110'; +----------+ | L10 | +==========+ | 23505343 | +----------+ 1 tuple (1.0s) select count(distinct uuid) from f_sorted where yyyymm='201110'; +---------+ | L6 | +=========+ | 5941581 | +---------+ 1 tuple (23.8s) Very fast thus far! However: copy (select distinct(uuid) from f_sorted where yyyymm='201110') into '/home/user/filename' USING DELIMITERS ',','\n','"' NULL as 'null'; This has been running for hours and has yet to start writing to the file. I get this when I explain the query: +----------------------------------------------------------------------------------------+ | mal | +========================================================================================+ | function user.s5_3{autoCommit=true}():void; | | X_2 := sql.mvc(); | | X_22:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","yyyymm",0); | | X_23 := algebra.uselect(X_22,"201110":str); | | X_20:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","yyyymm",2); | | X_24 := algebra.kdifference(X_23,X_20); | | X_21 := algebra.uselect(X_20,"201110":str); | | X_25 := algebra.kunion(X_24,X_21); | | X_18:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","yyyymm",1); | | X_19 := algebra.uselect(X_18,"201110":str); | | X_26 := algebra.kunion(X_25,X_19); | | X_16:bat[:oid,:oid] := sql.bind_dbat(X_2,"sys","f_sorted",1); | | X_17 := bat.reverse(X_16); | | X_27 := algebra.kdifference(X_26,X_17); | | X_28 := algebra.markT(X_27,0@0:oid); | | X_29 := bat.reverse(X_28); | | X_10:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","uuid",0); | | X_8:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","uuid",2); | | X_12 := algebra.kdifference(X_10,X_8); | | X_13 := algebra.kunion(X_12,X_8); | | X_5:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","uuid",1); | | X_14 := algebra.kunion(X_13,X_5); | | X_30 := algebra.leftjoin(X_29,X_14); | | (ext44,grp42) := group.done(X_30); | | X_33 := bat.mirror(ext44); | | X_34 := algebra.leftjoin(X_33,X_30); | | X_35 := sql.resultSet(1,1,X_34); | | sql.rsColumn(X_35,"sys.f_sorted","uuid":str,"varchar",2048,0,X_34); | | X_43 := io.stdout(); | | sql.exportResult(X_43,X_35); | | end s5_3; | +----------------------------------------------------------------------------------------+ 31 tuples (1.350ms) Is it doing something wrong? I don't expect the query to take so long (hours) to produce the output, especially since the distinct count is also very fast. The copy-into *seems* to be irrelevant to the problem, but I decided to include it just in case, as that's how I am using the query. -- View this message in context: http://old.nabble.com/select-distinct-...-where-algorithm-tp33297578p3329757... Sent from the monetdb-users mailing list archive at Nabble.com.

select count(distinct uuid) from f_sorted where yyyymm='201110'; +---------+ | L6 | +=========+ | 5941581 | +---------+ 1 tuple (23.8s)
Very fast thus far! However:
copy (select distinct(uuid) from f_sorted where yyyymm='201110') into '/home/user/filename' USING DELIMITERS ',','\n','"' NULL as 'null';
This has been running for hours and has yet to start writing to the file. Indeed, it should finish quickly. This indicates that there other issues involved in your environment, e.g. competition of other resource hungry
processors. Or, the (dbfarm) disk on which you want to write is (almost) full.

Could you possibly also share the EXPLAIN, or even TRACE of your count(distinct uuid) query with us? Thanks! Stefan On Thu, Feb 09, 2012 at 05:04:51PM -0800, zack_pcd wrote:
I am seeing the following: select count(*) from f_sorted; +-----------+ | L7 | +===========+ | 102612097 | +-----------+ 1 tuple (0.988ms)
select count(*) from f_sorted where yyyymm='201110'; +----------+ | L10 | +==========+ | 23505343 | +----------+ 1 tuple (1.0s)
select count(distinct uuid) from f_sorted where yyyymm='201110'; +---------+ | L6 | +=========+ | 5941581 | +---------+ 1 tuple (23.8s)
Very fast thus far! However:
copy (select distinct(uuid) from f_sorted where yyyymm='201110') into '/home/user/filename' USING DELIMITERS ',','\n','"' NULL as 'null';
This has been running for hours and has yet to start writing to the file.
I get this when I explain the query: +----------------------------------------------------------------------------------------+ | mal | +========================================================================================+ | function user.s5_3{autoCommit=true}():void; | | X_2 := sql.mvc(); | | X_22:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","yyyymm",0); | | X_23 := algebra.uselect(X_22,"201110":str); | | X_20:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","yyyymm",2); | | X_24 := algebra.kdifference(X_23,X_20); | | X_21 := algebra.uselect(X_20,"201110":str); | | X_25 := algebra.kunion(X_24,X_21); | | X_18:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","yyyymm",1); | | X_19 := algebra.uselect(X_18,"201110":str); | | X_26 := algebra.kunion(X_25,X_19); | | X_16:bat[:oid,:oid] := sql.bind_dbat(X_2,"sys","f_sorted",1); | | X_17 := bat.reverse(X_16); | | X_27 := algebra.kdifference(X_26,X_17); | | X_28 := algebra.markT(X_27,0@0:oid); | | X_29 := bat.reverse(X_28); | | X_10:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","uuid",0); | | X_8:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","uuid",2); | | X_12 := algebra.kdifference(X_10,X_8); | | X_13 := algebra.kunion(X_12,X_8); | | X_5:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","uuid",1); | | X_14 := algebra.kunion(X_13,X_5); | | X_30 := algebra.leftjoin(X_29,X_14); | | (ext44,grp42) := group.done(X_30); | | X_33 := bat.mirror(ext44); | | X_34 := algebra.leftjoin(X_33,X_30); | | X_35 := sql.resultSet(1,1,X_34); | | sql.rsColumn(X_35,"sys.f_sorted","uuid":str,"varchar",2048,0,X_34); | | X_43 := io.stdout(); | | sql.exportResult(X_43,X_35); | | end s5_3; | +----------------------------------------------------------------------------------------+ 31 tuples (1.350ms)
Is it doing something wrong? I don't expect the query to take so long (hours) to produce the output, especially since the distinct count is also very fast.
The copy-into *seems* to be irrelevant to the problem, but I decided to include it just in case, as that's how I am using the query. -- View this message in context: http://old.nabble.com/select-distinct-...-where-algorithm-tp33297578p3329757... Sent from the monetdb-users mailing list archive at Nabble.com.
------------------------------------------------------------------------------ Virtualization & Cloud Management Using Capacity Planning Cloud computing makes use of virtualization - but cloud computing also focuses on allowing computing to be delivered as a service. http://www.accelacomm.com/jaw/sfnl/114/51521223/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- | Stefan.Manegold @ CWI.nl | DB Architectures (INS1) | | http://CWI.nl/~manegold/ | Science Park 123 (L321) | | Tel.: +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |

The copy into query finished in 317 minutes.
The volume we use is 63% full. However the copy-into query does not start
writing in the first few hours (I don't see the file getting created).
In fact, I see almost no disk usage, and the query is only using 1 or 2
cores out of 32 (nthreads=32).
I'll try to run a trace of the copy-into as well but that will take a while.
I'm using the default_pipe in the dec2011 release. I don't think I've
actually changed any of the default settings.
Trace and explain of the count distinct query:
sql>explain select count(distinct uuid) from f_sorted where yyyymm='201110';
+-------------------------------------------------------------------------------------+
| mal
|
+=====================================================================================+
| function user.s7_1{autoCommit=true}():void;
|
| X_2 := sql.mvc();
|
| X_23:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","yyyymm",0);
|
| X_25 := algebra.uselect(X_23,"201110":str);
|
| X_21:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","yyyymm",2);
|
| X_26 := algebra.kdifference(X_25,X_21);
|
| X_22 := algebra.uselect(X_21,"201110":str);
|
| X_27 := algebra.kunion(X_26,X_22);
|
| X_19:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","yyyymm",1);
|
| X_20 := algebra.uselect(X_19,"201110":str);
|
| X_28 := algebra.kunion(X_27,X_20);
|
| X_16:bat[:oid,:oid] := sql.bind_dbat(X_2,"sys","f_sorted",1);
|
| X_17 := bat.reverse(X_16);
|
| X_29 := algebra.kdifference(X_28,X_17);
|
| X_30 := algebra.markT(X_29,0@0:oid);
|
| X_31 := bat.reverse(X_30);
|
| X_10:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","uuid",0); |
| X_8:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","uuid",2); |
| X_12 := algebra.kdifference(X_10,X_8);
|
| X_13 := algebra.kunion(X_12,X_8);
|
| X_4:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","uuid",1); |
| X_14 := algebra.kunion(X_13,X_4);
|
| X_32 := algebra.leftjoin(X_31,X_14);
|
| X_33 := algebra.selectNotNil(X_32);
|
| X_34 := bat.reverse(X_33);
|
| X_35 := algebra.kunique(X_34);
|
| X_36 := bat.reverse(X_35);
|
| X_37 := aggr.count(X_36);
|
| sql.exportValue(1,"sys.f_sorted","L7":str,"wrd",64,0,6,X_37,"");
|
| end s7_1;
|
+-------------------------------------------------------------------------------------+
30 tuples (1.278ms)
sql>trace select count(distinct uuid) from f_sorted where yyyymm='201110';
+---------+
| L10 |
+=========+
| 5941581 |
+---------+
1 tuple (27.3s)
+----------+--------------------------------------------------------------------------------------------------------------+
| ticks | stmt
|
+==========+==============================================================================================================+
| 6 | X_3 := sql.mvc();
|
| 12 | X_23:bat[:oid,:str] =
Could you possibly also share the EXPLAIN, or even TRACE of your count(distinct uuid) query with us?
Thanks!
Stefan
On Thu, Feb 09, 2012 at 05:04:51PM -0800, zack_pcd wrote:
I am seeing the following: select count(*) from f_sorted; +-----------+ | L7 | +===========+ | 102612097 | +-----------+ 1 tuple (0.988ms)
select count(*) from f_sorted where yyyymm='201110'; +----------+ | L10 | +==========+ | 23505343 | +----------+ 1 tuple (1.0s)
select count(distinct uuid) from f_sorted where yyyymm='201110'; +---------+ | L6 | +=========+ | 5941581 | +---------+ 1 tuple (23.8s)
Very fast thus far! However:
copy (select distinct(uuid) from f_sorted where yyyymm='201110') into '/home/user/filename' USING DELIMITERS ',','\n','"' NULL as 'null';
This has been running for hours and has yet to start writing to the file.
I get this when I explain the query: +----------------------------------------------------------------------------------------+ | mal | +========================================================================================+ | function user.s5_3{autoCommit=true}():void; | | X_2 := sql.mvc(); | | X_22:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","yyyymm",0); | | X_23 := algebra.uselect(X_22,"201110":str); | | X_20:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","yyyymm",2); | | X_24 := algebra.kdifference(X_23,X_20); | | X_21 := algebra.uselect(X_20,"201110":str); | | X_25 := algebra.kunion(X_24,X_21); | | X_18:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","yyyymm",1); | | X_19 := algebra.uselect(X_18,"201110":str); | | X_26 := algebra.kunion(X_25,X_19); | | X_16:bat[:oid,:oid] := sql.bind_dbat(X_2,"sys","f_sorted",1); | | X_17 := bat.reverse(X_16); | | X_27 := algebra.kdifference(X_26,X_17); | | X_28 := algebra.markT(X_27,0@0:oid); | | X_29 := bat.reverse(X_28); | | X_10:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","uuid",0); | | X_8:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","uuid",2); | | X_12 := algebra.kdifference(X_10,X_8); | | X_13 := algebra.kunion(X_12,X_8); | | X_5:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","uuid",1); | | X_14 := algebra.kunion(X_13,X_5); | | X_30 := algebra.leftjoin(X_29,X_14); | | (ext44,grp42) := group.done(X_30); | | X_33 := bat.mirror(ext44); | | X_34 := algebra.leftjoin(X_33,X_30); | | X_35 := sql.resultSet(1,1,X_34); | | sql.rsColumn(X_35,"sys.f_sorted","uuid":str,"varchar",2048,0,X_34); | | X_43 := io.stdout(); | | sql.exportResult(X_43,X_35); | | end s5_3; | +----------------------------------------------------------------------------------------+ 31 tuples (1.350ms)
Is it doing something wrong? I don't expect the query to take so long (hours) to produce the output, especially since the distinct count is also very fast.
The copy-into *seems* to be irrelevant to the problem, but I decided to include it just in case, as that's how I am using the query. -- View this message in context: http://old.nabble.com/select-distinct-...-where-algorithm-tp33297578p3329757... Sent from the monetdb-users mailing list archive at Nabble.com.
------------------------------------------------------------------------------ Virtualization & Cloud Management Using Capacity Planning Cloud computing makes use of virtualization - but cloud computing also focuses on allowing computing to be delivered as a service. http://www.accelacomm.com/jaw/sfnl/114/51521223/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- | Stefan.Manegold @ CWI.nl | DB Architectures (INS1) | | http://CWI.nl/~manegold/ | Science Park 123 (L321) | | Tel.: +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
------------------------------------------------------------------------------ Virtualization & Cloud Management Using Capacity Planning Cloud computing makes use of virtualization - but cloud computing also focuses on allowing computing to be delivered as a service. http://www.accelacomm.com/jaw/sfnl/114/51521223/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- View this message in context: http://old.nabble.com/select-distinct-...-where-algorithm-tp33297578p3330212... Sent from the monetdb-users mailing list archive at Nabble.com.

Hi Zack, thank you very much. The fact that MonetDB only uses 1-2 cores (with this query) is to be expected, as we currently cannot run "distinct" efficiently i parallel. Also the fact that MonetDB writes the result only at the end is to be expected due to MonetDB's operator-at-a-time "bulk processing" execution paradigm. For analyzing this issue in more detail, it would be very helpful for us, if you could also re-run you copy into query with TRACE and send us the result --- provided you could block you machine for another 5-6 hours ... Thanks! Stefan On Fri, Feb 10, 2012 at 09:51:45AM -0800, zack_pcd wrote:
The copy into query finished in 317 minutes. The volume we use is 63% full. However the copy-into query does not start writing in the first few hours (I don't see the file getting created). In fact, I see almost no disk usage, and the query is only using 1 or 2 cores out of 32 (nthreads=32). I'll try to run a trace of the copy-into as well but that will take a while. I'm using the default_pipe in the dec2011 release. I don't think I've actually changed any of the default settings.
Trace and explain of the count distinct query:
sql>explain select count(distinct uuid) from f_sorted where yyyymm='201110'; +-------------------------------------------------------------------------------------+ | mal | +=====================================================================================+ | function user.s7_1{autoCommit=true}():void; | | X_2 := sql.mvc(); | | X_23:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","yyyymm",0); | | X_25 := algebra.uselect(X_23,"201110":str); | | X_21:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","yyyymm",2); | | X_26 := algebra.kdifference(X_25,X_21); | | X_22 := algebra.uselect(X_21,"201110":str); | | X_27 := algebra.kunion(X_26,X_22); | | X_19:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","yyyymm",1); | | X_20 := algebra.uselect(X_19,"201110":str); | | X_28 := algebra.kunion(X_27,X_20); | | X_16:bat[:oid,:oid] := sql.bind_dbat(X_2,"sys","f_sorted",1); | | X_17 := bat.reverse(X_16); | | X_29 := algebra.kdifference(X_28,X_17); | | X_30 := algebra.markT(X_29,0@0:oid); | | X_31 := bat.reverse(X_30); | | X_10:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","uuid",0); | | X_8:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","uuid",2); | | X_12 := algebra.kdifference(X_10,X_8); | | X_13 := algebra.kunion(X_12,X_8); | | X_4:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","uuid",1); | | X_14 := algebra.kunion(X_13,X_4); | | X_32 := algebra.leftjoin(X_31,X_14); | | X_33 := algebra.selectNotNil(X_32); | | X_34 := bat.reverse(X_33); | | X_35 := algebra.kunique(X_34); | | X_36 := bat.reverse(X_35); | | X_37 := aggr.count(X_36); | | sql.exportValue(1,"sys.f_sorted","L7":str,"wrd",64,0,6,X_37,""); | | end s7_1; | +-------------------------------------------------------------------------------------+ 30 tuples (1.278ms) sql>trace select count(distinct uuid) from f_sorted where yyyymm='201110'; +---------+ | L10 | +=========+ | 5941581 | +---------+ 1 tuple (27.3s) +----------+--------------------------------------------------------------------------------------------------------------+ | ticks | stmt | +==========+==============================================================================================================+ | 6 | X_3 := sql.mvc(); | | 12 | X_23:bat[:oid,:str] =
[102612097] := sql.bind(X_3=0,"sys","f_sorted","yyyymm",0); | | 2497718 | X_24= [23505343] := algebra.uselect(X_23=nil:bat[:oid,:str],A0="201110"); | | 21 | X_21:bat[:oid,:str] = [0] := sql.bind(X_3=0,"sys","f_sorted","yyyymm",2); | | 28 | X_25= [23505343] := algebra.kdifference(X_24=nil,X_21= :bat[:oid,:str][0]); | | 29 | X_22= [0] := algebra.uselect(X_21=nil:bat[:oid,:str],A0="201110"); | | 10 | X_26= [23505343] := algebra.kunion(X_25=nil,X_22=nil); | | 5 | X_19:bat[:oid,:str] = [0] := sql.bind(X_3=0,"sys","f_sorted","yyyymm",1); | | 11 | X_20= [0] := algebra.uselect(X_19=nil:bat[:oid,:str],A0="201110"); | | 9 | X_27= [23505343] := algebra.kunion(X_26=nil,X_20=nil); | | 6 | X_17:bat[:oid,:oid] = [0] := sql.bind_dbat(X_3=0,"sys","f_sorted",1); | | 4 | X_18= [0] := bat.reverse(X_17=nil:bat[:oid,:oid]); | | 5 | X_28= [23505343] := algebra.kdifference(X_27=nil,X_18=nil); | | 7 | X_29= [23505343] := algebra.markT(X_28=nil,0@0:oid); | | 3 | X_30= [23505343] := bat.reverse(X_29=nil); | | 12 | X_11:bat[:oid,:str] = [102612097] := sql.bind(X_3=0,"sys","f_sorted","uuid",0); | | 3 | X_9:bat[:oid,:str] = [0] := sql.bind(X_3=0,"sys","f_sorted","uuid",2); | | 9 | X_13= [102612097] := algebra.kdifference(X_11=nil:bat[:oid,:str],X_9= :bat[:oid,:str][0]); | | 9 | X_14= [102612097] := algebra.kunion(X_13=nil,X_9=nil:bat[:oid,:str]); | | 4 | X_5:bat[:oid,:str] = [0] := sql.bind(X_3=0,"sys","f_sorted","uuid",1); | | 7 | X_15= [102612097] := algebra.kunion(X_14=nil,X_5=nil:bat[:oid,:str]); | | 885318 | X_31= [23505343] := algebra.leftjoin(X_30=nil,X_15=nil); | | 12 | X_32= [23505343] := algebra.selectNotNil(X_31=nil); | | 4 | X_33= [23505343] := bat.reverse(X_32=nil); | | 23833624 | X_34= [5941581] := algebra.kunique(X_33=nil); | | 14 | X_35= [5941581] := bat.reverse(X_34=nil); | | 87995 | X_36 := aggr.count(X_35=nil); | | 25 | sql.exportValue(1,"sys.f_sorted","L10":str,"wrd",64,0,6,X_36=5941581:wrd,""); | | 27306480 | X_5:void := user.s8_1("201110"); | +----------+--------------------------------------------------------------------------------------------------------------+ 29 tuples (27.3s) Stefan Manegold wrote:
Could you possibly also share the EXPLAIN, or even TRACE of your count(distinct uuid) query with us?
Thanks!
Stefan
On Thu, Feb 09, 2012 at 05:04:51PM -0800, zack_pcd wrote:
I am seeing the following: select count(*) from f_sorted; +-----------+ | L7 | +===========+ | 102612097 | +-----------+ 1 tuple (0.988ms)
select count(*) from f_sorted where yyyymm='201110'; +----------+ | L10 | +==========+ | 23505343 | +----------+ 1 tuple (1.0s)
select count(distinct uuid) from f_sorted where yyyymm='201110'; +---------+ | L6 | +=========+ | 5941581 | +---------+ 1 tuple (23.8s)
Very fast thus far! However:
copy (select distinct(uuid) from f_sorted where yyyymm='201110') into '/home/user/filename' USING DELIMITERS ',','\n','"' NULL as 'null';
This has been running for hours and has yet to start writing to the file.
I get this when I explain the query: +----------------------------------------------------------------------------------------+ | mal | +========================================================================================+ | function user.s5_3{autoCommit=true}():void; | | X_2 := sql.mvc(); | | X_22:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","yyyymm",0); | | X_23 := algebra.uselect(X_22,"201110":str); | | X_20:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","yyyymm",2); | | X_24 := algebra.kdifference(X_23,X_20); | | X_21 := algebra.uselect(X_20,"201110":str); | | X_25 := algebra.kunion(X_24,X_21); | | X_18:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","yyyymm",1); | | X_19 := algebra.uselect(X_18,"201110":str); | | X_26 := algebra.kunion(X_25,X_19); | | X_16:bat[:oid,:oid] := sql.bind_dbat(X_2,"sys","f_sorted",1); | | X_17 := bat.reverse(X_16); | | X_27 := algebra.kdifference(X_26,X_17); | | X_28 := algebra.markT(X_27,0@0:oid); | | X_29 := bat.reverse(X_28); | | X_10:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","uuid",0); | | X_8:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","uuid",2); | | X_12 := algebra.kdifference(X_10,X_8); | | X_13 := algebra.kunion(X_12,X_8); | | X_5:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","uuid",1); | | X_14 := algebra.kunion(X_13,X_5); | | X_30 := algebra.leftjoin(X_29,X_14); | | (ext44,grp42) := group.done(X_30); | | X_33 := bat.mirror(ext44); | | X_34 := algebra.leftjoin(X_33,X_30); | | X_35 := sql.resultSet(1,1,X_34); | | sql.rsColumn(X_35,"sys.f_sorted","uuid":str,"varchar",2048,0,X_34); | | X_43 := io.stdout(); | | sql.exportResult(X_43,X_35); | | end s5_3; | +----------------------------------------------------------------------------------------+ 31 tuples (1.350ms)
Is it doing something wrong? I don't expect the query to take so long (hours) to produce the output, especially since the distinct count is also very fast.
The copy-into *seems* to be irrelevant to the problem, but I decided to include it just in case, as that's how I am using the query. -- View this message in context: http://old.nabble.com/select-distinct-...-where-algorithm-tp33297578p3329757... Sent from the monetdb-users mailing list archive at Nabble.com.
------------------------------------------------------------------------------ Virtualization & Cloud Management Using Capacity Planning Cloud computing makes use of virtualization - but cloud computing also focuses on allowing computing to be delivered as a service. http://www.accelacomm.com/jaw/sfnl/114/51521223/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- | Stefan.Manegold @ CWI.nl | DB Architectures (INS1) | | http://CWI.nl/~manegold/ | Science Park 123 (L321) | | Tel.: +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
------------------------------------------------------------------------------ Virtualization & Cloud Management Using Capacity Planning Cloud computing makes use of virtualization - but cloud computing also focuses on allowing computing to be delivered as a service. http://www.accelacomm.com/jaw/sfnl/114/51521223/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- View this message in context: http://old.nabble.com/select-distinct-...-where-algorithm-tp33297578p3330212... Sent from the monetdb-users mailing list archive at Nabble.com.
------------------------------------------------------------------------------ Virtualization & Cloud Management Using Capacity Planning Cloud computing makes use of virtualization - but cloud computing also focuses on allowing computing to be delivered as a service. http://www.accelacomm.com/jaw/sfnl/114/51521223/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- | Stefan.Manegold @ CWI.nl | DB Architectures (INS1) | | http://CWI.nl/~manegold/ | Science Park 123 (L321) | | Tel.: +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |

On Fri, Feb 10, 2012 at 09:51:45AM -0800, zack_pcd wrote:
The copy into query finished in 317 minutes. The volume we use is 63% full. However the copy-into query does not start writing in the first few hours (I don't see the file getting created). In fact, I see almost no disk usage, and the query is only using 1 or 2 cores out of 32 (nthreads=32). I'll try to run a trace of the copy-into as well but that will take a while. I'm using the default_pipe in the dec2011 release. I don't think I've actually changed any of the default settings.
Trace and explain of the count distinct query:
Could you also send the plan and explain of the copy into statements? Niels
sql>explain select count(distinct uuid) from f_sorted where yyyymm='201110'; +-------------------------------------------------------------------------------------+ | mal | +=====================================================================================+ | function user.s7_1{autoCommit=true}():void; | | X_2 := sql.mvc(); | | X_23:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","yyyymm",0); | | X_25 := algebra.uselect(X_23,"201110":str); | | X_21:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","yyyymm",2); | | X_26 := algebra.kdifference(X_25,X_21); | | X_22 := algebra.uselect(X_21,"201110":str); | | X_27 := algebra.kunion(X_26,X_22); | | X_19:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","yyyymm",1); | | X_20 := algebra.uselect(X_19,"201110":str); | | X_28 := algebra.kunion(X_27,X_20); | | X_16:bat[:oid,:oid] := sql.bind_dbat(X_2,"sys","f_sorted",1); | | X_17 := bat.reverse(X_16); | | X_29 := algebra.kdifference(X_28,X_17); | | X_30 := algebra.markT(X_29,0@0:oid); | | X_31 := bat.reverse(X_30); | | X_10:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","uuid",0); | | X_8:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","uuid",2); | | X_12 := algebra.kdifference(X_10,X_8); | | X_13 := algebra.kunion(X_12,X_8); | | X_4:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","uuid",1); | | X_14 := algebra.kunion(X_13,X_4); | | X_32 := algebra.leftjoin(X_31,X_14); | | X_33 := algebra.selectNotNil(X_32); | | X_34 := bat.reverse(X_33); | | X_35 := algebra.kunique(X_34); | | X_36 := bat.reverse(X_35); | | X_37 := aggr.count(X_36); | | sql.exportValue(1,"sys.f_sorted","L7":str,"wrd",64,0,6,X_37,""); | | end s7_1; | +-------------------------------------------------------------------------------------+ 30 tuples (1.278ms) sql>trace select count(distinct uuid) from f_sorted where yyyymm='201110'; +---------+ | L10 | +=========+ | 5941581 | +---------+ 1 tuple (27.3s) +----------+--------------------------------------------------------------------------------------------------------------+ | ticks | stmt | +==========+==============================================================================================================+ | 6 | X_3 := sql.mvc(); | | 12 | X_23:bat[:oid,:str] =
[102612097] := sql.bind(X_3=0,"sys","f_sorted","yyyymm",0); | | 2497718 | X_24= [23505343] := algebra.uselect(X_23=nil:bat[:oid,:str],A0="201110"); | | 21 | X_21:bat[:oid,:str] = [0] := sql.bind(X_3=0,"sys","f_sorted","yyyymm",2); | | 28 | X_25= [23505343] := algebra.kdifference(X_24=nil,X_21= :bat[:oid,:str][0]); | | 29 | X_22= [0] := algebra.uselect(X_21=nil:bat[:oid,:str],A0="201110"); | | 10 | X_26= [23505343] := algebra.kunion(X_25=nil,X_22=nil); | | 5 | X_19:bat[:oid,:str] = [0] := sql.bind(X_3=0,"sys","f_sorted","yyyymm",1); | | 11 | X_20= [0] := algebra.uselect(X_19=nil:bat[:oid,:str],A0="201110"); | | 9 | X_27= [23505343] := algebra.kunion(X_26=nil,X_20=nil); | | 6 | X_17:bat[:oid,:oid] = [0] := sql.bind_dbat(X_3=0,"sys","f_sorted",1); | | 4 | X_18= [0] := bat.reverse(X_17=nil:bat[:oid,:oid]); | | 5 | X_28= [23505343] := algebra.kdifference(X_27=nil,X_18=nil); | | 7 | X_29= [23505343] := algebra.markT(X_28=nil,0@0:oid); | | 3 | X_30= [23505343] := bat.reverse(X_29=nil); | | 12 | X_11:bat[:oid,:str] = [102612097] := sql.bind(X_3=0,"sys","f_sorted","uuid",0); | | 3 | X_9:bat[:oid,:str] = [0] := sql.bind(X_3=0,"sys","f_sorted","uuid",2); | | 9 | X_13= [102612097] := algebra.kdifference(X_11=nil:bat[:oid,:str],X_9= :bat[:oid,:str][0]); | | 9 | X_14= [102612097] := algebra.kunion(X_13=nil,X_9=nil:bat[:oid,:str]); | | 4 | X_5:bat[:oid,:str] = [0] := sql.bind(X_3=0,"sys","f_sorted","uuid",1); | | 7 | X_15= [102612097] := algebra.kunion(X_14=nil,X_5=nil:bat[:oid,:str]); | | 885318 | X_31= [23505343] := algebra.leftjoin(X_30=nil,X_15=nil); | | 12 | X_32= [23505343] := algebra.selectNotNil(X_31=nil); | | 4 | X_33= [23505343] := bat.reverse(X_32=nil); | | 23833624 | X_34= [5941581] := algebra.kunique(X_33=nil); | | 14 | X_35= [5941581] := bat.reverse(X_34=nil); | | 87995 | X_36 := aggr.count(X_35=nil); | | 25 | sql.exportValue(1,"sys.f_sorted","L10":str,"wrd",64,0,6,X_36=5941581:wrd,""); | | 27306480 | X_5:void := user.s8_1("201110"); | +----------+--------------------------------------------------------------------------------------------------------------+ 29 tuples (27.3s) Stefan Manegold wrote:
Could you possibly also share the EXPLAIN, or even TRACE of your count(distinct uuid) query with us?
Thanks!
Stefan
On Thu, Feb 09, 2012 at 05:04:51PM -0800, zack_pcd wrote:
I am seeing the following: select count(*) from f_sorted; +-----------+ | L7 | +===========+ | 102612097 | +-----------+ 1 tuple (0.988ms)
select count(*) from f_sorted where yyyymm='201110'; +----------+ | L10 | +==========+ | 23505343 | +----------+ 1 tuple (1.0s)
select count(distinct uuid) from f_sorted where yyyymm='201110'; +---------+ | L6 | +=========+ | 5941581 | +---------+ 1 tuple (23.8s)
Very fast thus far! However:
copy (select distinct(uuid) from f_sorted where yyyymm='201110') into '/home/user/filename' USING DELIMITERS ',','\n','"' NULL as 'null';
This has been running for hours and has yet to start writing to the file.
I get this when I explain the query: +----------------------------------------------------------------------------------------+ | mal | +========================================================================================+ | function user.s5_3{autoCommit=true}():void; | | X_2 := sql.mvc(); | | X_22:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","yyyymm",0); | | X_23 := algebra.uselect(X_22,"201110":str); | | X_20:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","yyyymm",2); | | X_24 := algebra.kdifference(X_23,X_20); | | X_21 := algebra.uselect(X_20,"201110":str); | | X_25 := algebra.kunion(X_24,X_21); | | X_18:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","yyyymm",1); | | X_19 := algebra.uselect(X_18,"201110":str); | | X_26 := algebra.kunion(X_25,X_19); | | X_16:bat[:oid,:oid] := sql.bind_dbat(X_2,"sys","f_sorted",1); | | X_17 := bat.reverse(X_16); | | X_27 := algebra.kdifference(X_26,X_17); | | X_28 := algebra.markT(X_27,0@0:oid); | | X_29 := bat.reverse(X_28); | | X_10:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","uuid",0); | | X_8:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","uuid",2); | | X_12 := algebra.kdifference(X_10,X_8); | | X_13 := algebra.kunion(X_12,X_8); | | X_5:bat[:oid,:str] := sql.bind(X_2,"sys","f_sorted","uuid",1); | | X_14 := algebra.kunion(X_13,X_5); | | X_30 := algebra.leftjoin(X_29,X_14); | | (ext44,grp42) := group.done(X_30); | | X_33 := bat.mirror(ext44); | | X_34 := algebra.leftjoin(X_33,X_30); | | X_35 := sql.resultSet(1,1,X_34); | | sql.rsColumn(X_35,"sys.f_sorted","uuid":str,"varchar",2048,0,X_34); | | X_43 := io.stdout(); | | sql.exportResult(X_43,X_35); | | end s5_3; | +----------------------------------------------------------------------------------------+ 31 tuples (1.350ms)
Is it doing something wrong? I don't expect the query to take so long (hours) to produce the output, especially since the distinct count is also very fast.
The copy-into *seems* to be irrelevant to the problem, but I decided to include it just in case, as that's how I am using the query. -- View this message in context: http://old.nabble.com/select-distinct-...-where-algorithm-tp33297578p3329757... Sent from the monetdb-users mailing list archive at Nabble.com.
------------------------------------------------------------------------------ Virtualization & Cloud Management Using Capacity Planning Cloud computing makes use of virtualization - but cloud computing also focuses on allowing computing to be delivered as a service. http://www.accelacomm.com/jaw/sfnl/114/51521223/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- | Stefan.Manegold @ CWI.nl | DB Architectures (INS1) | | http://CWI.nl/~manegold/ | Science Park 123 (L321) | | Tel.: +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
------------------------------------------------------------------------------ Virtualization & Cloud Management Using Capacity Planning Cloud computing makes use of virtualization - but cloud computing also focuses on allowing computing to be delivered as a service. http://www.accelacomm.com/jaw/sfnl/114/51521223/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- View this message in context: http://old.nabble.com/select-distinct-...-where-algorithm-tp33297578p3330212... Sent from the monetdb-users mailing list archive at Nabble.com.
------------------------------------------------------------------------------ Virtualization & Cloud Management Using Capacity Planning Cloud computing makes use of virtualization - but cloud computing also focuses on allowing computing to be delivered as a service. http://www.accelacomm.com/jaw/sfnl/114/51521223/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- Niels Nes, Centrum Wiskunde & Informatica (CWI) Science Park 123, 1098 XG Amsterdam, The Netherlands room L3.14, phone ++31 20 592-4098 sip:4098@sip.cwi.nl url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl

Well, the trace/explain of the copy select distinct query certainly is much
larger than the other traces:
trace copy (select distinct(uuid) from f_sorted where yyyymm='201110') into
'/home/user/file' USING DELIMITERS ',','\n','"' NULL as 'null';
result:
+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ticks | stmt
|
+=============+=====================================================================================================================================================================================+
| 5 | X_19 := nil:bat[:oid,:str];
|
| 6 | X_3 := sql.mvc();
|
| 61 | X_378:bat[:oid,:str] =