Hello, I have a large database that is commonly used for relatively simple SELECT queries. Some of these can take quite a bit of time (hours... overnight) so I've been looking into it a bit. What seems clear is that mserver5 (Oct2014-SP1) is disk-bound, but one curious thing is that most of the time it is write-bound. Despite doing only SELECT queries (no updates or inserts), the amount of data being written is nearly two times as much as data read. Why is that? Is the server restructuring the database on the fly in order to get better performance in later queries? 3X the disk I/O seems excessive so I wonder if there is some way to tune this, or investigate further to see what is happening? Thanks! Tim
Have a look at (and post here if needed) the trace of one such queries (
https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/QueryTiming), so
that you can see what the query actually does.
It sounds likely that your queries need to materialize to main memory large
intermediate results before they complete.
This, together with possibly little main memory available on the server,
can turn into disk I/O because of memory mapping.
On 30 January 2015 at 10:00, Tim Burress
Hello,
I have a large database that is commonly used for relatively simple SELECT queries. Some of these can take quite a bit of time (hours... overnight) so I've been looking into it a bit. What seems clear is that mserver5 (Oct2014-SP1) is disk-bound, but one curious thing is that most of the time it is write-bound. Despite doing only SELECT queries (no updates or inserts), the amount of data being written is nearly two times as much as data read.
Why is that? Is the server restructuring the database on the fly in order to get better performance in later queries? 3X the disk I/O seems excessive so I wonder if there is some way to tune this, or investigate further to see what is happening?
Thanks!
Tim
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Hi Which version and on what platform are you running your queries? A snippet of your query could be helpful. regards, Martin On 30/01/15 10:00, Tim Burress wrote:
Hello,
I have a large database that is commonly used for relatively simple SELECT queries. Some of these can take quite a bit of time (hours... overnight) so I've been looking into it a bit. What seems clear is that mserver5 (Oct2014-SP1) is disk-bound, but one curious thing is that most of the time it is write-bound. Despite doing only SELECT queries (no updates or inserts), the amount of data being written is nearly two times as much as data read.
Why is that? Is the server restructuring the database on the fly in order to get better performance in later queries? 3X the disk I/O seems excessive so I wonder if there is some way to tune this, or investigate further to see what is happening?
Thanks!
Tim
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
On 01/30/2015 06:15 PM, Martin Kersten wrote:
Which version and on what platform are you running your queries? A snippet of your query could be helpful.
The version is Oct2014-SP1 (I'll upgrade to the new one as soon as some of these queries finish!) on Fedora 20. The hardware is a 3.4GHz Xeon CPU with 32GB of RAM. The queries are all pretty simple, but the table involved has perhaps 4x10^9 records. A representative query looks like this: select iso2 as country,count(distinct source) as sources from sessions where timestamp>='2014-10-01' and timestamp<'2015-01-01' and ("action"='R' or "action"='T') and medium='E' group by country order by sources desc; Given the various 'where' conditions, Roberto's idea -- the need to store intermediate results -- sounds very plausible, particularly when there are other jobs claiming RAM on the system. I'm re-running it now with most other user processes stopped. The mserver5 process is using about 77% of memory and nmon shows the disk with the database is pegged at 100% utilization. All writes at the moment, though the df command doesn't show any changes in used space, so I'm not sure where the written data is going! Thanks! Tim
On Jan 30, 2015, at 10:48, Tim Burress
wrote: On 01/30/2015 06:15 PM, Martin Kersten wrote:
Which version and on what platform are you running your queries? A snippet of your query could be helpful.
The version is Oct2014-SP1 (I'll upgrade to the new one as soon as some of these queries finish!) on Fedora 20. The hardware is a 3.4GHz Xeon CPU with 32GB of RAM.
The queries are all pretty simple, but the table involved has perhaps 4x10^9 records. A representative query looks like this:
select iso2 as country,count(distinct source) as sources from sessions where timestamp>='2014-10-01' and timestamp<'2015-01-01' and
This probably triggers the creation of imprints, that also causes some I/O. Do you notice any major exec. time difference if you run your query multiple times?
("action"='R' or "action"='T') and medium='E' group by country order by sources desc;
Given the various 'where' conditions, Roberto's idea -- the need to store intermediate results -- sounds very plausible, particularly when there are other jobs claiming RAM on the system.
I'm re-running it now with most other user processes stopped. The mserver5 process is using about 77% of memory and nmon shows the disk with the database is pegged at 100% utilization. All writes at the moment, though the df command doesn't show any changes in used space, so I'm not sure where the written data is going!
Thanks!
Tim
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
I should perhaps also mention that I'm accessing the database through the R API, so the details of the API implementation may also have an effect. I'm not sure. Tim
On Jan 30, 2015, at 10:56, Tim Burress
wrote: I should perhaps also mention that I'm accessing the database through the R API, so the details of the API implementation may also have an effect. I'm not sure.
You can try to run your query in, e.g., mclient, to see if that makes any difference. Regards, Jennie
Tim
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
On 01/30/2015 07:44 PM, Ying Zhang wrote:
You can try to run your query in, e.g., mclient, to see if that makes any difference.
Thanks! I will try your suggestion, and also see if the timing changes from query to query. The first instance of the query has been running now for about two hours, still writing to the disk. That's why I began to wonder if something was wrong. Tim
Hi,
I was going through your query and looks like that COUNT(DISTINCT source)
is the killer.
I have rewritten the same query as below . Please try this and it should
give you the same results.
select c.iso2 as country,count(c.source) as sources from (select
iso2,source from sessions where timestamp>='2014-10-01' and
timestamp<'2015-01-01' and
("action"='R' or "action"='T') and medium='E' group by iso2,source) c group
by country order by sources desc;
Thanks
On Fri, Jan 30, 2015 at 11:00 AM, Tim Burress
On 01/30/2015 07:44 PM, Ying Zhang wrote:
You can try to run your query in, e.g., mclient, to see if that makes any difference.
Thanks! I will try your suggestion, and also see if the timing changes from query to query. The first instance of the query has been running now for about two hours, still writing to the disk. That's why I began to wonder if something was wrong.
Tim
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Thanks! I will try this! In fact I was wondering if there was some way to restructure the query but I don't know enough about how things work behind the scenes to do more than guess. The original query has been running now for ten hours, and is still doing mostly disk writes, so I was about to give up. I'll let you know how it goes. Tim On 01/31/2015 12:47 AM, shamsul hassan wrote:
Hi,
I was going through your query and looks like that COUNT(DISTINCT source) is the killer.
I have rewritten the same query as below . Please try this and it should give you the same results.
select c.iso2 as country,count(c.source) as sources from (select iso2,source from sessions where timestamp>='2014-10-01' and timestamp<'2015-01-01' and ("action"='R' or "action"='T') and medium='E' group by iso2,source) c group by country order by sources desc;
Thanks
On Fri, Jan 30, 2015 at 11:00 AM, Tim Burress
wrote: On 01/30/2015 07:44 PM, Ying Zhang wrote:
You can try to run your query in, e.g., mclient, to see if that makes any difference.
Thanks! I will try your suggestion, and also see if the timing changes from query to query. The first instance of the query has been running now for about two hours, still writing to the disk. That's why I began to wonder if something was wrong.
Tim
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Hi Tim, also to me, some too large intermediate result seem to be the most likely reason for the observed behavior, possibly combined with a sub-optimal plan. To analyze in more detail, could you possibly share the following information with us? - the schema of your table, or at least the type of all columns involved (the output of `\d sessions` would be perfect) - some statistics / intermediate result sizes: select count(*) from sessions where timestamp>='2014-10-01'; select count(*) from sessions where timestamp<'2015-01-01' select count(*) from sessions where "action"='R'; select count(*) from sessions where "action"='T'; select count(*) from sessions where medium='E'; select count(distinct source) from sessions; select count(distinct country) from sessions; - the execution plan of your query: EXPLAIN select iso2 as country,count(distinct source) as sources from sessions where timestamp>='2014-10-01' and timestamp<'2015-01-01' and ("action"='R' or "action"='T') and medium='E' group by country order by sources desc; - the execution plan of Hassan's alternative query: EXPLAIN select c.iso2 as country,count(c.source) as sources from (select iso2,source from sessions where timestamp>='2014-10-01' and timestamp<'2015-01-01' and ("action"='R' or "action"='T') and medium='E' group by iso2,source) c group by country order by sources desc; Thanks! Stefan ----- Original Message -----
Thanks! I will try this! In fact I was wondering if there was some way to restructure the query but I don't know enough about how things work behind the scenes to do more than guess. The original query has been running now for ten hours, and is still doing mostly disk writes, so I was about to give up.
I'll let you know how it goes.
Tim
On 01/31/2015 12:47 AM, shamsul hassan wrote:
Hi,
I was going through your query and looks like that COUNT(DISTINCT source) is the killer.
I have rewritten the same query as below . Please try this and it should give you the same results.
select c.iso2 as country,count(c.source) as sources from (select iso2,source from sessions where timestamp>='2014-10-01' and timestamp<'2015-01-01' and ("action"='R' or "action"='T') and medium='E' group by iso2,source) c group by country order by sources desc;
Thanks
On Fri, Jan 30, 2015 at 11:00 AM, Tim Burress
wrote: On 01/30/2015 07:44 PM, Ying Zhang wrote:
You can try to run your query in, e.g., mclient, to see if that makes any difference.
Thanks! I will try your suggestion, and also see if the timing changes from query to query. The first instance of the query has been running now for about two hours, still writing to the disk. That's why I began to wonder if something was wrong.
Tim
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
Hi Stefan, I agree that the problem is just too much data (or maybe the plan for dealing with it). I had to stop Hassan's query, too, after about 10 hours. I thought perhaps I had a solution in that I could partition the sources to reduce the size of the intermediate results, but I realize now that it won't help if other parts of the query are still huge. I have to go to a seminar this weekend but will definitely post the information you asked for when I get back on Monday. Thanks very much to everyone for your help! Tim On 01/31/2015 05:32 PM, Stefan Manegold wrote:
Hi Tim,
also to me, some too large intermediate result seem to be the most likely reason for the observed behavior, possibly combined with a sub-optimal plan.
To analyze in more detail, could you possibly share the following information with us?
- the schema of your table, or at least the type of all columns involved (the output of `\d sessions` would be perfect)
- some statistics / intermediate result sizes: select count(*) from sessions where timestamp>='2014-10-01'; select count(*) from sessions where timestamp<'2015-01-01' select count(*) from sessions where "action"='R'; select count(*) from sessions where "action"='T'; select count(*) from sessions where medium='E'; select count(distinct source) from sessions; select count(distinct country) from sessions;
- the execution plan of your query: EXPLAIN select iso2 as country,count(distinct source) as sources from sessions where timestamp>='2014-10-01' and timestamp<'2015-01-01' and ("action"='R' or "action"='T') and medium='E' group by country order by sources desc;
- the execution plan of Hassan's alternative query: EXPLAIN select c.iso2 as country,count(c.source) as sources from (select iso2,source from sessions where timestamp>='2014-10-01' and timestamp<'2015-01-01' and ("action"='R' or "action"='T') and medium='E' group by iso2,source) c group by country order by sources desc;
Thanks! Stefan
----- Original Message -----
Thanks! I will try this! In fact I was wondering if there was some way to restructure the query but I don't know enough about how things work behind the scenes to do more than guess. The original query has been running now for ten hours, and is still doing mostly disk writes, so I was about to give up.
I'll let you know how it goes.
Tim
On 01/31/2015 12:47 AM, shamsul hassan wrote:
Hi,
I was going through your query and looks like that COUNT(DISTINCT source) is the killer.
I have rewritten the same query as below . Please try this and it should give you the same results.
select c.iso2 as country,count(c.source) as sources from (select iso2,source from sessions where timestamp>='2014-10-01' and timestamp<'2015-01-01' and ("action"='R' or "action"='T') and medium='E' group by iso2,source) c group by country order by sources desc;
Thanks
On Fri, Jan 30, 2015 at 11:00 AM, Tim Burress
wrote: On 01/30/2015 07:44 PM, Ying Zhang wrote:
You can try to run your query in, e.g., mclient, to see if that makes any difference.
Thanks! I will try your suggestion, and also see if the timing changes from query to query. The first instance of the query has been running now for about two hours, still writing to the disk. That's why I began to wonder if something was wrong.
Tim
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Notice: For security purposes, we have sent the file(s) attached to this e-mail in encrypted ZIP format. You will receive a separate e-mail containing a password to the file(s). Hello! Here is the data you asked for, plus I added the total record count for the table. Some of the column names are a bit different from what I posted before because I changed them for readability ss <-> sessions med <-> medium src <-> source (etc) I could not get a result for one question, "select count(distinct src)" after eight hours, so this may well be a big part of the problem. I got a value for that question by parsing the raw data. FYI this is a table of Internet traffic that's been analyzed in various ways. One possibility for the root problem may be that there was a bug (https://www.monetdb.org/bugzilla/show_bug.cgi?id=3646) in Oct2014-SP1 concerning the ordering of the 'inet' datatype. Since the search to see if a given 'src' is DISTINCT probably relies on sorting at some level, I wondered if upgrading would solve the problem. Unfortunately it does not seem so. A query for count(distinct src) done after the upgrade has been running for an hour with no results and a lot of disk write operations. However running a test case I find that there is still a problem (just filed as https://www.monetdb.org/bugzilla/show_bug.cgi?id=3660) with comparison operators on the 'inet' datatype, so there may yet be a core issue with handling that datatype that makes "count(distinct...)" inefficient. If there is any other data I can provide, or tests I could run, just let me know, and thanks for your help! Tim
*********************************************************************** 先ほどお送りしたメールの添付ファイルパスワード情報をご案内いたします。 お手数ではございますが、ファイル取得の際には下記のパスワードを ご入力頂けますよう、宜しくお願い申し上げます。 パスワード形式は、以下の通りです。 パスワード:<スペース><12文字の数字、英字、記号の組み合わせのパスワード> *********************************************************************** [パスワード/Password]: FvA3SksNLh4J
Oh... sorry about the attachment password thing. For reasons that make no sense to me, but are above my pay grade, it's required for all of our outgoing mail. For convenience, here's the same information inline. Tim sql>\d raw.ss CREATE TABLE "raw"."ss" ( "log" VARCHAR(32), "record" INTEGER, "size" SMALLINT, "sensor" VARCHAR(64), "ts" TIMESTAMP WITH TIME ZONE, "vector" VARCHAR(16), "action" VARCHAR(16), "med" VARCHAR(16), "offset" VARCHAR(16), "octets" INTEGER, "l3" SMALLINT, "l4" SMALLINT, "src" INET, "snet" INET, "siso2" CHAR(2), "sport" INTEGER, "s1" BOOLEAN, "s2" BOOLEAN, "s3" BOOLEAN, "dst" INET, "dnet" INET, "diso2" CHAR(2), "dport" INTEGER, "d1" BOOLEAN, "d2" BOOLEAN, "d3" BOOLEAN, "a1" BOOLEAN, "a2" BOOLEAN, "a3" BOOLEAN, "a4" BOOLEAN, "a5" BOOLEAN, "a6" BOOLEAN, "a7" BOOLEAN, "a8" BOOLEAN, "truncated" BOOLEAN, CONSTRAINT "uniquesslogentry" UNIQUE ("log", "record") ); sql>select count(*) from raw.ss; +------------+ | L1 | +============+ | 4525458159 | +------------+ 1 tuple (29.963ms) sql>select count(*) from raw.ss where "action"='T'; +-----------+ | L1 | +===========+ | 446532712 | +-----------+ 1 tuple (55.3s) sql>select count(*) from raw.ss where "action"='R'; +------------+ | L1 | +============+ | 3988792686 | +------------+ 1 tuple (7m 13s) sql>select count(*) from raw.ss where med='E'; +------------+ | L1 | +============+ | 2823214994 | +------------+ 1 tuple (6m 2s) sql>select count(distinct siso2) from raw.ss; +------+ | L1 | +======+ | 243 | +------+ 1 tuple (7m 54s) sql>select count(*) from raw.ss where ts>='2014-10-01'; +------------+ | L1 | +============+ | 3591320144 | +------------+ 1 tuple (1m 16s) sql>select count(*) from raw.ss where ts<'2015-01-01'; +------------+ | L1 | +============+ | 3917617844 | +------------+ 1 tuple (12.4s) sql>select count(distinct src) from raw.ss; (no answer in 8 hours) Answer should be 101,706,247 sql>EXPLAIN select siso2 as country,count(distinct src) as sources from raw.ss where ts>='2014-10-01' and ts<'2015-01-01' and ("action"='T' or "action"='R') and med='E' group by country order by sources desc; +----------------------------------------------------------------------------------------------------------------------------------+ | mal | +==================================================================================================================================+ | function user.s5_1{autoCommit=true}(A0:str,A1:str,A2:str,A3:str,A4:str):void; | | X_7 := sql.mvc(); | | X_11:bat[:oid,:str] := sql.bind(X_7,"raw","ss","action",0); | | X_19:bat[:oid,:ts] := sql.bind(X_7,"raw","ss","ts",0); | | X_27:bat[:oid,:str] := sql.bind(X_7,"raw","ss","med",0); | | X_8:bat[:oid,:oid] := sql.tid(X_7,"raw","ss"); | | X_109 := algebra.subselect(X_27,X_8,A4,A4,true,true,false); | | (X_29,r1_36) := sql.bind(X_7,"raw","ss","med",2); | | X_110 := algebra.subselect(r1_36,A4,A4,true,true,false); | | X_31:bat[:oid,:str] := sql.bind(X_7,"raw","ss","med",1); | | X_111 := algebra.subselect(X_31,X_8,A4,A4,true,true,false); | | X_32 := sql.subdelta(X_109,X_8,X_29,X_110,X_111); | | X_24 := calc.ts(A0,7); | | X_26 := calc.ts(A1,7); | | X_112 := algebra.subselect(X_19,X_32,X_24,X_26,true,false,false); | | (X_21,r1_24) := sql.bind(X_7,"raw","ss","ts",2); | | X_113 := algebra.subselect(r1_24,X_24,X_26,true,false,false); | | X_23:bat[:oid,:ts] := sql.bind(X_7,"raw","ss","ts",1); | | X_114 := algebra.subselect(X_23,X_32,X_24,X_26,true,false,false); | | X_35 := sql.subdelta(X_112,X_32,X_21,X_113,X_114); | | X_115 := algebra.subselect(X_11,X_35,A2,A2,true,true,false); | | (X_14,r1_14) := sql.bind(X_7,"raw","ss","action",2); | | X_116 := algebra.subselect(r1_14,A2,A2,true,true,false); | | X_17:bat[:oid,:str] := sql.bind(X_7,"raw","ss","action",1); | | X_117 := algebra.subselect(X_17,X_35,A2,A2,true,true,false); | | X_36 := sql.subdelta(X_115,X_35,X_14,X_116,X_117); | | X_118 := algebra.subselect(X_11,X_35,A3,A3,true,true,false); | | X_119 := algebra.subselect(r1_14,A3,A3,true,true,false); | | X_120 := algebra.subselect(X_17,X_35,A3,A3,true,true,false); | | X_37 := sql.subdelta(X_118,X_35,X_14,X_119,X_120); | | X_38 := bat.mergecand(X_36,X_37); | | X_39:bat[:oid,:str] := sql.bind(X_7,"raw","ss","siso2",0); | | (X_44,r1_55) := sql.bind(X_7,"raw","ss","siso2",2); | | X_47:bat[:oid,:str] := sql.bind(X_7,"raw","ss","siso2",1); | | X_49 := sql.projectdelta(X_38,X_39,X_44,r1_55,X_47); | | X_50:bat[:oid,:inet] := sql.bind(X_7,"raw","ss","src",0); | | (X_52,r1_65) := sql.bind(X_7,"raw","ss","src",2); | | X_54:bat[:oid,:inet] := sql.bind(X_7,"raw","ss","src",1); | | X_55 := sql.projectdelta(X_38,X_50,X_52,r1_65,X_54); | | (X_56,r1_71,r2_71) := group.subgroup(X_55); | | (X_59,r1_74,r2_74) := group.subgroupdone(X_49,X_56); | | X_62 := algebra.leftfetchjoin(r1_74,X_55); | | X_63 := algebra.leftfetchjoin(r1_74,X_49); | | (X_64,r1_79,r2_79) := group.subgroupdone(X_63); | | X_67:bat[:oid,:wrd] := aggr.subcount(X_62,X_64,r1_79,true); | | (X_69,r1_84,r2_84) := algebra.subsort(X_67,true,false); | | X_73:bat[:oid,:str] := algebra.leftfetchjoinPath(r1_84,r1_79,X_63); | | X_74 := algebra.leftfetchjoin(r1_84,X_67); | | X_75 := sql.resultSet(2,1,X_73); | | sql.rsColumn(X_75,"raw.ss","country","char",2,0,X_73); | | sql.rsColumn(X_75,"raw.L1","sources","wrd",64,0,X_74); | | X_88 := io.stdout(); | | sql.exportResult(X_88,X_75); | | end s5_1; | | # querylog.define("explain select siso2 as country,count(distinct src) as sources from raw.ss where ts>=\\'2014-10-01\\' and ts< | : \\'2015-01-01\\' and (\"action\"=\\'T\\' or \"action\"=\\'R\\') and med=\\'E\\' group by country order by sources des : : c;","default_pipe") : | # optimizer.mitosis() | | # optimizer.dataflow() | +----------------------------------------------------------------------------------------------------------------------------------+ 57 tuples (33.922ms) sql> sql>EXPLAIN select c.siso2 as country,count(c.src) as sources from (select siso2,src from raw.ss where ts>='2014-10-01' and ts<'2015-01-01' and ("action"='T' or "action"='R') and med='E' group by siso2,src) c group by country order by sources desc; +----------------------------------------------------------------------------------------------------------------------------------+ | mal | +==================================================================================================================================+ | function user.s6_1{autoCommit=true}(A0:str,A1:str,A2:str,A3:str,A4:str):void; | | X_7 := sql.mvc(); | | X_11:bat[:oid,:str] := sql.bind(X_7,"raw","ss","action",0); | | X_19:bat[:oid,:ts] := sql.bind(X_7,"raw","ss","ts",0); | | X_27:bat[:oid,:str] := sql.bind(X_7,"raw","ss","med",0); | | X_8:bat[:oid,:oid] := sql.tid(X_7,"raw","ss"); | | X_109 := algebra.subselect(X_27,X_8,A4,A4,true,true,false); | | (X_29,r1_36) := sql.bind(X_7,"raw","ss","med",2); | | X_110 := algebra.subselect(r1_36,A4,A4,true,true,false); | | X_31:bat[:oid,:str] := sql.bind(X_7,"raw","ss","med",1); | | X_111 := algebra.subselect(X_31,X_8,A4,A4,true,true,false); | | X_32 := sql.subdelta(X_109,X_8,X_29,X_110,X_111); | | X_24 := calc.ts(A0,7); | | X_26 := calc.ts(A1,7); | | X_112 := algebra.subselect(X_19,X_32,X_24,X_26,true,false,false); | | (X_21,r1_24) := sql.bind(X_7,"raw","ss","ts",2); | | X_113 := algebra.subselect(r1_24,X_24,X_26,true,false,false); | | X_23:bat[:oid,:ts] := sql.bind(X_7,"raw","ss","ts",1); | | X_114 := algebra.subselect(X_23,X_32,X_24,X_26,true,false,false); | | X_35 := sql.subdelta(X_112,X_32,X_21,X_113,X_114); | | X_115 := algebra.subselect(X_11,X_35,A2,A2,true,true,false); | | (X_14,r1_14) := sql.bind(X_7,"raw","ss","action",2); | | X_116 := algebra.subselect(r1_14,A2,A2,true,true,false); | | X_17:bat[:oid,:str] := sql.bind(X_7,"raw","ss","action",1); | | X_117 := algebra.subselect(X_17,X_35,A2,A2,true,true,false); | | X_36 := sql.subdelta(X_115,X_35,X_14,X_116,X_117); | | X_118 := algebra.subselect(X_11,X_35,A3,A3,true,true,false); | | X_119 := algebra.subselect(r1_14,A3,A3,true,true,false); | | X_120 := algebra.subselect(X_17,X_35,A3,A3,true,true,false); | | X_37 := sql.subdelta(X_118,X_35,X_14,X_119,X_120); | | X_38 := bat.mergecand(X_36,X_37); | | X_39:bat[:oid,:str] := sql.bind(X_7,"raw","ss","siso2",0); | | (X_44,r1_55) := sql.bind(X_7,"raw","ss","siso2",2); | | X_47:bat[:oid,:str] := sql.bind(X_7,"raw","ss","siso2",1); | | X_49 := sql.projectdelta(X_38,X_39,X_44,r1_55,X_47); | | X_50:bat[:oid,:inet] := sql.bind(X_7,"raw","ss","src",0); | | (X_52,r1_65) := sql.bind(X_7,"raw","ss","src",2); | | X_54:bat[:oid,:inet] := sql.bind(X_7,"raw","ss","src",1); | | X_55 := sql.projectdelta(X_38,X_50,X_52,r1_65,X_54); | | (X_56,r1_71,r2_71) := group.subgroup(X_55); | | (X_59,r1_74,r2_74) := group.subgroupdone(X_49,X_56); | | X_62 := algebra.leftfetchjoin(r1_74,X_55); | | X_63 := algebra.leftfetchjoin(r1_74,X_49); | | (X_64,r1_79,r2_79) := group.subgroupdone(X_63); | | X_67:bat[:oid,:wrd] := aggr.subcount(X_62,X_64,r1_79,true); | | (X_69,r1_84,r2_84) := algebra.subsort(X_67,true,false); | | X_73:bat[:oid,:str] := algebra.leftfetchjoinPath(r1_84,r1_79,X_63); | | X_74 := algebra.leftfetchjoin(r1_84,X_67); | | X_75 := sql.resultSet(2,1,X_73); | | sql.rsColumn(X_75,"raw.c","country","char",2,0,X_73); | | sql.rsColumn(X_75,"raw.L1","sources","wrd",64,0,X_74); | | X_88 := io.stdout(); | | sql.exportResult(X_88,X_75); | | end s6_1; | | # querylog.define("explain select c.siso2 as country,count(c.src) as sources from (select siso2,src from raw.ss where ts>=\\'201 | : 4-10-01\\' and ts<\\'2015-01-01\\' and (\"action\"=\\'T\\' or \"action\"=\\'R\\') and med=\\'E\\' group by siso2,src) : : c group by country order by sources desc;","default_pipe") : | # optimizer.mitosis() | | # optimizer.dataflow() | +----------------------------------------------------------------------------------------------------------------------------------+ 57 tuples (1.031ms)
participants (6)
-
Martin Kersten
-
Roberto Cornacchia
-
shamsul hassan
-
Stefan Manegold
-
Tim Burress
-
Ying Zhang