
Hi Stefan, I agree that the problem is just too much data (or maybe the plan for dealing with it). I had to stop Hassan's query, too, after about 10 hours. I thought perhaps I had a solution in that I could partition the sources to reduce the size of the intermediate results, but I realize now that it won't help if other parts of the query are still huge. I have to go to a seminar this weekend but will definitely post the information you asked for when I get back on Monday. Thanks very much to everyone for your help! Tim On 01/31/2015 05:32 PM, Stefan Manegold wrote:
Hi Tim,
also to me, some too large intermediate result seem to be the most likely reason for the observed behavior, possibly combined with a sub-optimal plan.
To analyze in more detail, could you possibly share the following information with us?
- the schema of your table, or at least the type of all columns involved (the output of `\d sessions` would be perfect)
- some statistics / intermediate result sizes: select count(*) from sessions where timestamp>='2014-10-01'; select count(*) from sessions where timestamp<'2015-01-01' select count(*) from sessions where "action"='R'; select count(*) from sessions where "action"='T'; select count(*) from sessions where medium='E'; select count(distinct source) from sessions; select count(distinct country) from sessions;
- the execution plan of your query: EXPLAIN select iso2 as country,count(distinct source) as sources from sessions where timestamp>='2014-10-01' and timestamp<'2015-01-01' and ("action"='R' or "action"='T') and medium='E' group by country order by sources desc;
- the execution plan of Hassan's alternative query: EXPLAIN select c.iso2 as country,count(c.source) as sources from (select iso2,source from sessions where timestamp>='2014-10-01' and timestamp<'2015-01-01' and ("action"='R' or "action"='T') and medium='E' group by iso2,source) c group by country order by sources desc;
Thanks! Stefan
----- Original Message -----
Thanks! I will try this! In fact I was wondering if there was some way to restructure the query but I don't know enough about how things work behind the scenes to do more than guess. The original query has been running now for ten hours, and is still doing mostly disk writes, so I was about to give up.
I'll let you know how it goes.
Tim
On 01/31/2015 12:47 AM, shamsul hassan wrote:
Hi,
I was going through your query and looks like that COUNT(DISTINCT source) is the killer.
I have rewritten the same query as below . Please try this and it should give you the same results.
select c.iso2 as country,count(c.source) as sources from (select iso2,source from sessions where timestamp>='2014-10-01' and timestamp<'2015-01-01' and ("action"='R' or "action"='T') and medium='E' group by iso2,source) c group by country order by sources desc;
Thanks
On Fri, Jan 30, 2015 at 11:00 AM, Tim Burress
wrote: On 01/30/2015 07:44 PM, Ying Zhang wrote:
You can try to run your query in, e.g., mclient, to see if that makes any difference.
Thanks! I will try your suggestion, and also see if the timing changes from query to query. The first instance of the query has been running now for about two hours, still writing to the disk. That's why I began to wonder if something was wrong.
Tim
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list