hi, monetdb does not support SUM() OVER commands.. does anyone have a smart alternative to implement this? thanks SELECT batch_id, job_count, SUM(job_count) OVER (ORDER BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ;
I think what you are looking for is SUM(..) OVER(PARTITION BY ..)
Otherwise where would the groups for the sum come from?
Roberto
On 29 April 2016 at 09:28, Anthony Damico
hi, monetdb does not support SUM() OVER commands.. does anyone have a smart alternative to implement this? thanks
SELECT batch_id, job_count, SUM(job_count) OVER (ORDER BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ;
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Ignore that, my bad. You can also aggregate over all rows, of course.
On 29 April 2016 at 09:37, Roberto Cornacchia
I think what you are looking for is SUM(..) OVER(PARTITION BY ..)
Otherwise where would the groups for the sum come from?
Roberto
On 29 April 2016 at 09:28, Anthony Damico
wrote: hi, monetdb does not support SUM() OVER commands.. does anyone have a smart alternative to implement this? thanks
SELECT batch_id, job_count, SUM(job_count) OVER (ORDER BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ;
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
hmm, that gives me the same error.. SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS tot_wgt, hp FROM mtcars; Error in .local(conn, statement, ...) : Unable to execute statement 'SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS tot_wgt, hp FROM mtc...'. Server says 'ParseException:SQLparser:SELECT: function 'sum' not found'. sorry if i'm doing something silly.. On Fri, Apr 29, 2016 at 3:37 AM, Roberto Cornacchia < roberto.cornacchia@gmail.com> wrote:
I think what you are looking for is SUM(..) OVER(PARTITION BY ..)
Otherwise where would the groups for the sum come from?
Roberto
On 29 April 2016 at 09:28, Anthony Damico
wrote: hi, monetdb does not support SUM() OVER commands.. does anyone have a smart alternative to implement this? thanks
SELECT batch_id, job_count, SUM(job_count) OVER (ORDER BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ;
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
No, indeed, this syntax is not supported.
On 29 April 2016 at 09:41, Anthony Damico
hmm, that gives me the same error..
SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS tot_wgt, hp FROM mtcars;
Error in .local(conn, statement, ...) : Unable to execute statement 'SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS tot_wgt, hp FROM mtc...'. Server says 'ParseException:SQLparser:SELECT: function 'sum' not found'.
sorry if i'm doing something silly..
On Fri, Apr 29, 2016 at 3:37 AM, Roberto Cornacchia < roberto.cornacchia@gmail.com> wrote:
I think what you are looking for is SUM(..) OVER(PARTITION BY ..)
Otherwise where would the groups for the sum come from?
Roberto
On 29 April 2016 at 09:28, Anthony Damico
wrote: hi, monetdb does not support SUM() OVER commands.. does anyone have a smart alternative to implement this? thanks
SELECT batch_id, job_count, SUM(job_count) OVER (ORDER BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ;
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Anthony, while MonetDB indeed does not support this syntax, I'm also puzzled what kind of semantic or result you'd expect from this syntax? SELECT batch_id, job_count, SUM(job_count) OVER (PARTITION BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ; The only possible result would IMHO be a table with as many rows as the input table test_data has, columns batch_id, job_count, duration identical to the input, one additional column cumjobs with the same value, i.e., sum of all job_count within each duration, replicated for all rows with same duration, and one additional column totjobs, with the same value (global sum of all job_count values) replicated for each row. If that's what you want, this should do the job: SELECT batch_id, job_count, cumjobs, totjobs, duration FROM test_data AS td NATURAL JOIN ( SELECT duration, SUM(job_count) AS cumjobs FROM test_data GROUP BY duration ) AS cj , ( SELECT SUM(job_count) AS totjobs FROM test_data ) AS tj ; Best, Stefan ----- On Apr 29, 2016, at 9:46 AM, Roberto Cornacchia roberto.cornacchia@gmail.com wrote:
No, indeed, this syntax is not supported.
On 29 April 2016 at 09:41, Anthony Damico < ajdamico@gmail.com > wrote:
hmm, that gives me the same error..
SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS tot_wgt, hp FROM mtcars;
Error in .local(conn, statement, ...) : Unable to execute statement 'SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS tot_wgt, hp FROM mtc...'. Server says 'ParseException:SQLparser:SELECT: function 'sum' not found'.
sorry if i'm doing something silly..
On Fri, Apr 29, 2016 at 3:37 AM, Roberto Cornacchia < roberto.cornacchia@gmail.com > wrote:
I think what you are looking for is SUM(..) OVER(PARTITION BY ..)
Otherwise where would the groups for the sum come from?
Roberto
On 29 April 2016 at 09:28, Anthony Damico < ajdamico@gmail.com > wrote:
hi, monetdb does not support SUM() OVER commands.. does anyone have a smart alternative to implement this? thanks
SELECT batch_id, job_count, SUM(job_count) OVER (ORDER BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ;
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
thanks stefan (and also kai jiang)! i am trying to implement a weighted
percentile strategy which uses that code.. i actually don't want the
PARTITION BY, i would want it to be ORDER BY instead. the `cumjobs` column
needs to be cumulative for a weighted percentile to work-- is there any
way to reconfigure that sql command so that the `cumjobs` column
cumulatively counts upwards based on the size of the duration column?
http://stackoverflow.com/questions/16450779/another-approach-to-percentiles
On Fri, Apr 29, 2016 at 7:17 AM, Stefan Manegold
Anthony,
while MonetDB indeed does not support this syntax, I'm also puzzled what kind of semantic or result you'd expect from this syntax?
SELECT batch_id, job_count, SUM(job_count) OVER (PARTITION BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ;
The only possible result would IMHO be a table with as many rows as the input table test_data has, columns batch_id, job_count, duration identical to the input, one additional column cumjobs with the same value, i.e., sum of all job_count within each duration, replicated for all rows with same duration, and one additional column totjobs, with the same value (global sum of all job_count values) replicated for each row.
If that's what you want, this should do the job:
SELECT batch_id, job_count, cumjobs, totjobs, duration FROM test_data AS td NATURAL JOIN ( SELECT duration, SUM(job_count) AS cumjobs FROM test_data GROUP BY duration ) AS cj , ( SELECT SUM(job_count) AS totjobs FROM test_data ) AS tj ;
Best, Stefan
----- On Apr 29, 2016, at 9:46 AM, Roberto Cornacchia roberto.cornacchia@gmail.com wrote:
No, indeed, this syntax is not supported.
On 29 April 2016 at 09:41, Anthony Damico < ajdamico@gmail.com > wrote:
hmm, that gives me the same error..
SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS tot_wgt, hp FROM mtcars;
Error in .local(conn, statement, ...) : Unable to execute statement 'SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS tot_wgt, hp FROM mtc...'. Server says 'ParseException:SQLparser:SELECT: function 'sum' not found'.
sorry if i'm doing something silly..
On Fri, Apr 29, 2016 at 3:37 AM, Roberto Cornacchia < roberto.cornacchia@gmail.com > wrote:
I think what you are looking for is SUM(..) OVER(PARTITION BY ..)
Otherwise where would the groups for the sum come from?
Roberto
On 29 April 2016 at 09:28, Anthony Damico < ajdamico@gmail.com > wrote:
hi, monetdb does not support SUM() OVER commands.. does anyone have a smart alternative to implement this? thanks
SELECT batch_id, job_count, SUM(job_count) OVER (ORDER BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ;
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) | _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
might be able to use this.. will try..
http://stackoverflow.com/questions/2120544/how-to-get-cumulative-sum
On Fri, Apr 29, 2016 at 8:01 AM, Anthony Damico
thanks stefan (and also kai jiang)! i am trying to implement a weighted percentile strategy which uses that code.. i actually don't want the PARTITION BY, i would want it to be ORDER BY instead. the `cumjobs` column needs to be cumulative for a weighted percentile to work-- is there any way to reconfigure that sql command so that the `cumjobs` column cumulatively counts upwards based on the size of the duration column?
http://stackoverflow.com/questions/16450779/another-approach-to-percentiles
On Fri, Apr 29, 2016 at 7:17 AM, Stefan Manegold
wrote: Anthony,
while MonetDB indeed does not support this syntax, I'm also puzzled what kind of semantic or result you'd expect from this syntax?
SELECT batch_id, job_count, SUM(job_count) OVER (PARTITION BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ;
The only possible result would IMHO be a table with as many rows as the input table test_data has, columns batch_id, job_count, duration identical to the input, one additional column cumjobs with the same value, i.e., sum of all job_count within each duration, replicated for all rows with same duration, and one additional column totjobs, with the same value (global sum of all job_count values) replicated for each row.
If that's what you want, this should do the job:
SELECT batch_id, job_count, cumjobs, totjobs, duration FROM test_data AS td NATURAL JOIN ( SELECT duration, SUM(job_count) AS cumjobs FROM test_data GROUP BY duration ) AS cj , ( SELECT SUM(job_count) AS totjobs FROM test_data ) AS tj ;
Best, Stefan
----- On Apr 29, 2016, at 9:46 AM, Roberto Cornacchia roberto.cornacchia@gmail.com wrote:
No, indeed, this syntax is not supported.
On 29 April 2016 at 09:41, Anthony Damico < ajdamico@gmail.com > wrote:
hmm, that gives me the same error..
SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS tot_wgt, hp FROM mtcars;
Error in .local(conn, statement, ...) : Unable to execute statement 'SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS tot_wgt, hp FROM mtc...'. Server says 'ParseException:SQLparser:SELECT: function 'sum' not found'.
sorry if i'm doing something silly..
On Fri, Apr 29, 2016 at 3:37 AM, Roberto Cornacchia < roberto.cornacchia@gmail.com > wrote:
I think what you are looking for is SUM(..) OVER(PARTITION BY ..)
Otherwise where would the groups for the sum come from?
Roberto
On 29 April 2016 at 09:28, Anthony Damico < ajdamico@gmail.com > wrote:
hi, monetdb does not support SUM() OVER commands.. does anyone have a smart alternative to implement this? thanks
SELECT batch_id, job_count, SUM(job_count) OVER (ORDER BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ;
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) | _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Hi Anthony,
Did you manage to implement the weighted percentile? I also have a case
where i have the value and the count of each value and need to calculate
the percentile.
Now I understand why you need the cumsum for this.
Thank you.
On Fri, Apr 29, 2016 at 3:10 PM, Anthony Damico
might be able to use this.. will try.. http://stackoverflow.com/ questions/2120544/how-to-get-cumulative-sum
On Fri, Apr 29, 2016 at 8:01 AM, Anthony Damico
wrote: thanks stefan (and also kai jiang)! i am trying to implement a weighted percentile strategy which uses that code.. i actually don't want the PARTITION BY, i would want it to be ORDER BY instead. the `cumjobs` column needs to be cumulative for a weighted percentile to work-- is there any way to reconfigure that sql command so that the `cumjobs` column cumulatively counts upwards based on the size of the duration column?
http://stackoverflow.com/questions/16450779/another- approach-to-percentiles
On Fri, Apr 29, 2016 at 7:17 AM, Stefan Manegold
wrote: Anthony,
while MonetDB indeed does not support this syntax, I'm also puzzled what kind of semantic or result you'd expect from this syntax?
SELECT batch_id, job_count, SUM(job_count) OVER (PARTITION BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ;
The only possible result would IMHO be a table with as many rows as the input table test_data has, columns batch_id, job_count, duration identical to the input, one additional column cumjobs with the same value, i.e., sum of all job_count within each duration, replicated for all rows with same duration, and one additional column totjobs, with the same value (global sum of all job_count values) replicated for each row.
If that's what you want, this should do the job:
SELECT batch_id, job_count, cumjobs, totjobs, duration FROM test_data AS td NATURAL JOIN ( SELECT duration, SUM(job_count) AS cumjobs FROM test_data GROUP BY duration ) AS cj , ( SELECT SUM(job_count) AS totjobs FROM test_data ) AS tj ;
Best, Stefan
----- On Apr 29, 2016, at 9:46 AM, Roberto Cornacchia roberto.cornacchia@gmail.com wrote:
No, indeed, this syntax is not supported.
On 29 April 2016 at 09:41, Anthony Damico < ajdamico@gmail.com > wrote:
hmm, that gives me the same error..
SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS tot_wgt, hp FROM mtcars;
Error in .local(conn, statement, ...) : Unable to execute statement 'SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS tot_wgt, hp FROM mtc...'. Server says 'ParseException:SQLparser:SELECT: function 'sum' not found'.
sorry if i'm doing something silly..
On Fri, Apr 29, 2016 at 3:37 AM, Roberto Cornacchia < roberto.cornacchia@gmail.com > wrote:
I think what you are looking for is SUM(..) OVER(PARTITION BY ..)
Otherwise where would the groups for the sum come from?
Roberto
On 29 April 2016 at 09:28, Anthony Damico < ajdamico@gmail.com > wrote:
hi, monetdb does not support SUM() OVER commands.. does anyone have a smart alternative to implement this? thanks
SELECT batch_id, job_count, SUM(job_count) OVER (ORDER BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ;
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) | _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
I think you can do this if you enable Python extensions and write a UDF using the numpy library.
-----Original Message-----
From: imad hajj chahine [imad.hajj.chahine@gmail.commailto:imad.hajj.chahine@gmail.com]
Sent: Tuesday, December 13, 2016 07:53 AM Eastern Standard Time
To: Communication channel for MonetDB users
Subject: Re: SUM() OVER (ORDER BY..)
Hi Anthony,
Did you manage to implement the weighted percentile? I also have a case where i have the value and the count of each value and need to calculate the percentile.
Now I understand why you need the cumsum for this.
Thank you.
On Fri, Apr 29, 2016 at 3:10 PM, Anthony Damico
No, indeed, this syntax is not supported.
On 29 April 2016 at 09:41, Anthony Damico < ajdamico@gmail.commailto:ajdamico@gmail.com > wrote:
hmm, that gives me the same error..
SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS tot_wgt, hp FROM mtcars;
Error in .local(conn, statement, ...) : Unable to execute statement 'SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS tot_wgt, hp FROM mtc...'. Server says 'ParseException:SQLparser:SELECT: function 'sum' not found'.
sorry if i'm doing something silly..
On Fri, Apr 29, 2016 at 3:37 AM, Roberto Cornacchia < roberto.cornacchia@gmail.commailto:roberto.cornacchia@gmail.com > wrote:
I think what you are looking for is SUM(..) OVER(PARTITION BY ..)
Otherwise where would the groups for the sum come from?
Roberto
On 29 April 2016 at 09:28, Anthony Damico < ajdamico@gmail.commailto:ajdamico@gmail.com > wrote:
hi, monetdb does not support SUM() OVER commands.. does anyone have a smart alternative to implement this? thanks
SELECT batch_id, job_count, SUM(job_count) OVER (ORDER BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ;
_______________________________________________ users-list mailing list users-list@monetdb.orgmailto:users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-listhttps://urldefense.proofpoint.com/v2/url?u=https-3A__www.monetdb.org_mailman_listinfo_users-2Dlist&d=DQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=8_GYjk1edsyLJlNaxMxYxBJsviF3JXYvwDK42uy5KWU&m=SaJBkey4RBZyy8ZfixCnr_XyXpxsK2h82sLbLs6hD6A&s=wWF8fSMLcV4AE6-ZCW-66LHml2xDyF7dBdu8_H2H5DU&e=
_______________________________________________ users-list mailing list users-list@monetdb.orgmailto:users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-listhttps://urldefense.proofpoint.com/v2/url?u=https-3A__www.monetdb.org_mailman_listinfo_users-2Dlist&d=DQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=8_GYjk1edsyLJlNaxMxYxBJsviF3JXYvwDK42uy5KWU&m=SaJBkey4RBZyy8ZfixCnr_XyXpxsK2h82sLbLs6hD6A&s=wWF8fSMLcV4AE6-ZCW-66LHml2xDyF7dBdu8_H2H5DU&e=
_______________________________________________ users-list mailing list users-list@monetdb.orgmailto:users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-listhttps://urldefense.proofpoint.com/v2/url?u=https-3A__www.monetdb.org_mailman_listinfo_users-2Dlist&d=DQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=8_GYjk1edsyLJlNaxMxYxBJsviF3JXYvwDK42uy5KWU&m=SaJBkey4RBZyy8ZfixCnr_XyXpxsK2h82sLbLs6hD6A&s=wWF8fSMLcV4AE6-ZCW-66LHml2xDyF7dBdu8_H2H5DU&e=
_______________________________________________ users-list mailing list users-list@monetdb.orgmailto:users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-listhttps://urldefense.proofpoint.com/v2/url?u=https-3A__www.monetdb.org_mailman_listinfo_users-2Dlist&d=DQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=8_GYjk1edsyLJlNaxMxYxBJsviF3JXYvwDK42uy5KWU&m=SaJBkey4RBZyy8ZfixCnr_XyXpxsK2h82sLbLs6hD6A&s=wWF8fSMLcV4AE6-ZCW-66LHml2xDyF7dBdu8_H2H5DU&e=
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/https://urldefense.proofpoint.com/v2/url?u=http-3A__www.CWI.nl_-7Emanegold_&d=DQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=8_GYjk1edsyLJlNaxMxYxBJsviF3JXYvwDK42uy5KWU&m=SaJBkey4RBZyy8ZfixCnr_XyXpxsK2h82sLbLs6hD6A&s=QlLs8qlcZyNWYvrbRa9bQMBrtU1kbzSb7C3ISJMtSyk&e= | Science Park 123 (L321) | | +31 (0)20 592-4212tel:%2B31%20%280%2920%20592-4212 | 1098 XG Amsterdam (NL) | _______________________________________________ users-list mailing list users-list@monetdb.orgmailto:users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-listhttps://urldefense.proofpoint.com/v2/url?u=https-3A__www.monetdb.org_mailman_listinfo_users-2Dlist&d=DQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=8_GYjk1edsyLJlNaxMxYxBJsviF3JXYvwDK42uy5KWU&m=SaJBkey4RBZyy8ZfixCnr_XyXpxsK2h82sLbLs6hD6A&s=wWF8fSMLcV4AE6-ZCW-66LHml2xDyF7dBdu8_H2H5DU&e= _______________________________________________ users-list mailing list users-list@monetdb.orgmailto:users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-listhttps://urldefense.proofpoint.com/v2/url?u=https-3A__www.monetdb.org_mailman_listinfo_users-2Dlist&d=DQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=8_GYjk1edsyLJlNaxMxYxBJsviF3JXYvwDK42uy5KWU&m=SaJBkey4RBZyy8ZfixCnr_XyXpxsK2h82sLbLs6hD6A&s=wWF8fSMLcV4AE6-ZCW-66LHml2xDyF7dBdu8_H2H5DU&e=
hi, yes! this R code presents how the solely-within-monetdb weighted quantile calculation compares to three other R implementations of weighted quantile calculations # run this line once # install.packages(c("MonetDBLite","DBI","survey","Hmisc")) pctiles <- seq( 0 , 1 , .01 ) library(DBI) # load the DBI package (implements the R-database coding) library(MonetDBLite) # load MonetDBLite package (creates database files in R) db <- dbConnect( MonetDBLite() ) # requires a unique identifier because i could not get row_number() working # https://www.monetdb.org/bugzilla/show_bug.cgi?id=4000 mtcars$id <- seq( nrow( mtcars ) ) # store mtcars dbWriteTable( db , 'mtcars' , mtcars ) sql <- "SELECT t1.id, t1.hp , t1.wt , SUM( t2.wt ) AS cumwt FROM mtcars t1 INNER JOIN mtcars t2 ON t1.hp >= t2.hp GROUP BY t1.id , t1.hp , t1.wt" dbGetQuery( db , sql ) sql2 <- paste("( SELECT row_number() over () AS id , wt , cumwt, totwt, hp FROM mtcars AS td NATURAL JOIN ( " , sql , " ) AS cj , ( SELECT SUM(wt) AS totwt FROM mtcars ) AS tj ) ") dbGetQuery( db , sql2 ) sql3 <- paste( "select pcs.percentile , min(case when cumwt >= totwt * pcs.percentile then hp end) from (" , sql2 , ") AS v , (" , paste( "select" , pctiles , " as percentile" , collapse = " union all " ) , " ) pcs group by pcs.percentile" ) library(Hmisc) library(survey) ms <- svydesign(~1,data=mtcars,weight=~wt) cbind( dbGetQuery( db , sql3 ), wtd.quantile( mtcars$hp , mtcars$wt , pctiles ) , as.numeric(svyquantile(~ hp,ms,pctiles)), as.numeric(svyquantile(~ hp,ms,pctiles,method='constant'))) cbind( dbGetQuery( db , sql3 )[26,], wtd.quantile( mtcars$hp , mtcars$wt , .25 ) , as.numeric(svyquantile(~ hp,ms,.25)), as.numeric(svyquantile(~ hp,ms,.25,method='constant'))) cbind( dbGetQuery( db , sql3 )[51,], wtd.quantile( mtcars$hp , mtcars$wt , .5 ) , as.numeric(svyquantile(~ hp,ms,.5)) , as.numeric(svyquantile(~ hp,ms,.5,method='constant'))) cbind( dbGetQuery( db , sql3 )[76,], wtd.quantile( mtcars$hp , mtcars$wt , .75 ) , as.numeric(svyquantile(~ hp,ms,.75)), as.numeric(svyquantile(~ hp,ms,.75,method='constant'))) On Tue, Dec 13, 2016 at 7:51 AM, imad hajj chahine < imad.hajj.chahine@gmail.com> wrote:
Hi Anthony,
Did you manage to implement the weighted percentile? I also have a case where i have the value and the count of each value and need to calculate the percentile. Now I understand why you need the cumsum for this.
Thank you.
On Fri, Apr 29, 2016 at 3:10 PM, Anthony Damico
wrote: might be able to use this.. will try.. http://stackoverflow.com/quest ions/2120544/how-to-get-cumulative-sum
On Fri, Apr 29, 2016 at 8:01 AM, Anthony Damico
wrote: thanks stefan (and also kai jiang)! i am trying to implement a weighted percentile strategy which uses that code.. i actually don't want the PARTITION BY, i would want it to be ORDER BY instead. the `cumjobs` column needs to be cumulative for a weighted percentile to work-- is there any way to reconfigure that sql command so that the `cumjobs` column cumulatively counts upwards based on the size of the duration column?
http://stackoverflow.com/questions/16450779/another-approach -to-percentiles
On Fri, Apr 29, 2016 at 7:17 AM, Stefan Manegold
wrote:
Anthony,
while MonetDB indeed does not support this syntax, I'm also puzzled what kind of semantic or result you'd expect from this syntax?
SELECT batch_id, job_count, SUM(job_count) OVER (PARTITION BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ;
The only possible result would IMHO be a table with as many rows as the input table test_data has, columns batch_id, job_count, duration identical to the input, one additional column cumjobs with the same value, i.e., sum of all job_count within each duration, replicated for all rows with same duration, and one additional column totjobs, with the same value (global sum of all job_count values) replicated for each row.
If that's what you want, this should do the job:
SELECT batch_id, job_count, cumjobs, totjobs, duration FROM test_data AS td NATURAL JOIN ( SELECT duration, SUM(job_count) AS cumjobs FROM test_data GROUP BY duration ) AS cj , ( SELECT SUM(job_count) AS totjobs FROM test_data ) AS tj ;
Best, Stefan
----- On Apr 29, 2016, at 9:46 AM, Roberto Cornacchia roberto.cornacchia@gmail.com wrote:
No, indeed, this syntax is not supported.
On 29 April 2016 at 09:41, Anthony Damico < ajdamico@gmail.com > wrote:
hmm, that gives me the same error..
SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS tot_wgt, hp FROM mtcars;
Error in .local(conn, statement, ...) : Unable to execute statement 'SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS tot_wgt, hp FROM mtc...'. Server says 'ParseException:SQLparser:SELECT: function 'sum' not found'.
sorry if i'm doing something silly..
On Fri, Apr 29, 2016 at 3:37 AM, Roberto Cornacchia < roberto.cornacchia@gmail.com > wrote:
I think what you are looking for is SUM(..) OVER(PARTITION BY ..)
Otherwise where would the groups for the sum come from?
Roberto
On 29 April 2016 at 09:28, Anthony Damico < ajdamico@gmail.com > wrote:
hi, monetdb does not support SUM() OVER commands.. does anyone have a smart alternative to implement this? thanks
SELECT batch_id, job_count, SUM(job_count) OVER (ORDER BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ;
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) | _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Thank You,
Can I ask how many records your table has and the execution time, as in my
case the first query is taking too much time to execute and consume too
much disk space.
On Tue, Dec 13, 2016 at 3:21 PM, Anthony Damico
hi, yes! this R code presents how the solely-within-monetdb weighted quantile calculation compares to three other R implementations of weighted quantile calculations
# run this line once # install.packages(c("MonetDBLite","DBI","survey","Hmisc"))
pctiles <- seq( 0 , 1 , .01 )
library(DBI) # load the DBI package (implements the R-database coding) library(MonetDBLite) # load MonetDBLite package (creates database files in R)
db <- dbConnect( MonetDBLite() )
# requires a unique identifier because i could not get row_number() working # https://www.monetdb.org/bugzilla/show_bug.cgi?id=4000 mtcars$id <- seq( nrow( mtcars ) )
# store mtcars dbWriteTable( db , 'mtcars' , mtcars )
sql <- "SELECT t1.id, t1.hp , t1.wt , SUM( t2.wt ) AS cumwt FROM mtcars t1 INNER JOIN mtcars t2 ON t1.hp >= t2.hp GROUP BY t1.id , t1.hp , t1.wt"
dbGetQuery( db , sql )
sql2 <- paste("( SELECT row_number() over () AS id , wt , cumwt, totwt, hp FROM mtcars AS td NATURAL JOIN ( " , sql , " ) AS cj , ( SELECT SUM(wt) AS totwt FROM mtcars ) AS tj ) ")
dbGetQuery( db , sql2 )
sql3 <- paste( "select pcs.percentile , min(case when cumwt >= totwt * pcs.percentile then hp end) from (" , sql2 , ") AS v , (" ,
paste( "select" , pctiles , " as percentile" , collapse = " union all " ) , " ) pcs group by pcs.percentile" )
library(Hmisc) library(survey) ms <- svydesign(~1,data=mtcars,weight=~wt)
cbind( dbGetQuery( db , sql3 ), wtd.quantile( mtcars$hp , mtcars$wt , pctiles ) , as.numeric(svyquantile(~ hp,ms,pctiles)), as.numeric(svyquantile(~ hp,ms,pctiles,method='constant')))
cbind( dbGetQuery( db , sql3 )[26,], wtd.quantile( mtcars$hp , mtcars$wt , .25 ) , as.numeric(svyquantile(~ hp,ms,.25)), as.numeric(svyquantile(~ hp,ms,.25,method='constant')))
cbind( dbGetQuery( db , sql3 )[51,], wtd.quantile( mtcars$hp , mtcars$wt , .5 ) , as.numeric(svyquantile(~ hp,ms,.5)) , as.numeric(svyquantile(~ hp,ms,.5,method='constant')))
cbind( dbGetQuery( db , sql3 )[76,], wtd.quantile( mtcars$hp , mtcars$wt , .75 ) , as.numeric(svyquantile(~ hp,ms,.75)), as.numeric(svyquantile(~ hp,ms,.75,method='constant')))
On Tue, Dec 13, 2016 at 7:51 AM, imad hajj chahine < imad.hajj.chahine@gmail.com> wrote:
Hi Anthony,
Did you manage to implement the weighted percentile? I also have a case where i have the value and the count of each value and need to calculate the percentile. Now I understand why you need the cumsum for this.
Thank you.
On Fri, Apr 29, 2016 at 3:10 PM, Anthony Damico
wrote: might be able to use this.. will try.. http://stackoverflow.com/quest ions/2120544/how-to-get-cumulative-sum
On Fri, Apr 29, 2016 at 8:01 AM, Anthony Damico
wrote: thanks stefan (and also kai jiang)! i am trying to implement a weighted percentile strategy which uses that code.. i actually don't want the PARTITION BY, i would want it to be ORDER BY instead. the `cumjobs` column needs to be cumulative for a weighted percentile to work-- is there any way to reconfigure that sql command so that the `cumjobs` column cumulatively counts upwards based on the size of the duration column?
http://stackoverflow.com/questions/16450779/another-approach -to-percentiles
On Fri, Apr 29, 2016 at 7:17 AM, Stefan Manegold < Stefan.Manegold@cwi.nl> wrote:
Anthony,
while MonetDB indeed does not support this syntax, I'm also puzzled what kind of semantic or result you'd expect from this syntax?
SELECT batch_id, job_count, SUM(job_count) OVER (PARTITION BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ;
The only possible result would IMHO be a table with as many rows as the input table test_data has, columns batch_id, job_count, duration identical to the input, one additional column cumjobs with the same value, i.e., sum of all job_count within each duration, replicated for all rows with same duration, and one additional column totjobs, with the same value (global sum of all job_count values) replicated for each row.
If that's what you want, this should do the job:
SELECT batch_id, job_count, cumjobs, totjobs, duration FROM test_data AS td NATURAL JOIN ( SELECT duration, SUM(job_count) AS cumjobs FROM test_data GROUP BY duration ) AS cj , ( SELECT SUM(job_count) AS totjobs FROM test_data ) AS tj ;
Best, Stefan
----- On Apr 29, 2016, at 9:46 AM, Roberto Cornacchia roberto.cornacchia@gmail.com wrote:
No, indeed, this syntax is not supported.
On 29 April 2016 at 09:41, Anthony Damico < ajdamico@gmail.com > wrote:
hmm, that gives me the same error..
SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS tot_wgt, hp FROM mtcars;
Error in .local(conn, statement, ...) : Unable to execute statement 'SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS tot_wgt, hp FROM mtc...'. Server says 'ParseException:SQLparser:SELECT: function 'sum' not found'.
sorry if i'm doing something silly..
On Fri, Apr 29, 2016 at 3:37 AM, Roberto Cornacchia < roberto.cornacchia@gmail.com > wrote:
I think what you are looking for is SUM(..) OVER(PARTITION BY ..)
Otherwise where would the groups for the sum come from?
Roberto
On 29 April 2016 at 09:28, Anthony Damico < ajdamico@gmail.com > wrote:
hi, monetdb does not support SUM() OVER commands.. does anyone have a smart alternative to implement this? thanks
SELECT batch_id, job_count, SUM(job_count) OVER (ORDER BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ;
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) | _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
hi, the table in my example has 32 records On Tue, Dec 13, 2016 at 9:16 AM, imad hajj chahine < imad.hajj.chahine@gmail.com> wrote:
Thank You,
Can I ask how many records your table has and the execution time, as in my case the first query is taking too much time to execute and consume too much disk space.
On Tue, Dec 13, 2016 at 3:21 PM, Anthony Damico
wrote: hi, yes! this R code presents how the solely-within-monetdb weighted quantile calculation compares to three other R implementations of weighted quantile calculations
# run this line once # install.packages(c("MonetDBLite","DBI","survey","Hmisc"))
pctiles <- seq( 0 , 1 , .01 )
library(DBI) # load the DBI package (implements the R-database coding) library(MonetDBLite) # load MonetDBLite package (creates database files in R)
db <- dbConnect( MonetDBLite() )
# requires a unique identifier because i could not get row_number() working # https://www.monetdb.org/bugzilla/show_bug.cgi?id=4000 mtcars$id <- seq( nrow( mtcars ) )
# store mtcars dbWriteTable( db , 'mtcars' , mtcars )
sql <- "SELECT t1.id, t1.hp , t1.wt , SUM( t2.wt ) AS cumwt FROM mtcars t1 INNER JOIN mtcars t2 ON t1.hp >= t2.hp GROUP BY t1.id , t1.hp , t1.wt"
dbGetQuery( db , sql )
sql2 <- paste("( SELECT row_number() over () AS id , wt , cumwt, totwt, hp FROM mtcars AS td NATURAL JOIN ( " , sql , " ) AS cj , ( SELECT SUM(wt) AS totwt FROM mtcars ) AS tj ) ")
dbGetQuery( db , sql2 )
sql3 <- paste( "select pcs.percentile , min(case when cumwt >= totwt * pcs.percentile then hp end) from (" , sql2 , ") AS v , (" ,
paste( "select" , pctiles , " as percentile" , collapse = " union all " ) , " ) pcs group by pcs.percentile" )
library(Hmisc) library(survey) ms <- svydesign(~1,data=mtcars,weight=~wt)
cbind( dbGetQuery( db , sql3 ), wtd.quantile( mtcars$hp , mtcars$wt , pctiles ) , as.numeric(svyquantile(~ hp,ms,pctiles)), as.numeric(svyquantile(~ hp,ms,pctiles,method='constant')))
cbind( dbGetQuery( db , sql3 )[26,], wtd.quantile( mtcars$hp , mtcars$wt , .25 ) , as.numeric(svyquantile(~ hp,ms,.25)), as.numeric(svyquantile(~ hp,ms,.25,method='constant')))
cbind( dbGetQuery( db , sql3 )[51,], wtd.quantile( mtcars$hp , mtcars$wt , .5 ) , as.numeric(svyquantile(~ hp,ms,.5)) , as.numeric(svyquantile(~ hp,ms,.5,method='constant')))
cbind( dbGetQuery( db , sql3 )[76,], wtd.quantile( mtcars$hp , mtcars$wt , .75 ) , as.numeric(svyquantile(~ hp,ms,.75)), as.numeric(svyquantile(~ hp,ms,.75,method='constant')))
On Tue, Dec 13, 2016 at 7:51 AM, imad hajj chahine < imad.hajj.chahine@gmail.com> wrote:
Hi Anthony,
Did you manage to implement the weighted percentile? I also have a case where i have the value and the count of each value and need to calculate the percentile. Now I understand why you need the cumsum for this.
Thank you.
On Fri, Apr 29, 2016 at 3:10 PM, Anthony Damico
wrote: might be able to use this.. will try.. http://stackoverflow.com/quest ions/2120544/how-to-get-cumulative-sum
On Fri, Apr 29, 2016 at 8:01 AM, Anthony Damico
wrote: thanks stefan (and also kai jiang)! i am trying to implement a weighted percentile strategy which uses that code.. i actually don't want the PARTITION BY, i would want it to be ORDER BY instead. the `cumjobs` column needs to be cumulative for a weighted percentile to work-- is there any way to reconfigure that sql command so that the `cumjobs` column cumulatively counts upwards based on the size of the duration column?
http://stackoverflow.com/questions/16450779/another-approach -to-percentiles
On Fri, Apr 29, 2016 at 7:17 AM, Stefan Manegold < Stefan.Manegold@cwi.nl> wrote:
Anthony,
while MonetDB indeed does not support this syntax, I'm also puzzled what kind of semantic or result you'd expect from this syntax?
SELECT batch_id, job_count, SUM(job_count) OVER (PARTITION BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ;
The only possible result would IMHO be a table with as many rows as the input table test_data has, columns batch_id, job_count, duration identical to the input, one additional column cumjobs with the same value, i.e., sum of all job_count within each duration, replicated for all rows with same duration, and one additional column totjobs, with the same value (global sum of all job_count values) replicated for each row.
If that's what you want, this should do the job:
SELECT batch_id, job_count, cumjobs, totjobs, duration FROM test_data AS td NATURAL JOIN ( SELECT duration, SUM(job_count) AS cumjobs FROM test_data GROUP BY duration ) AS cj , ( SELECT SUM(job_count) AS totjobs FROM test_data ) AS tj ;
Best, Stefan
----- On Apr 29, 2016, at 9:46 AM, Roberto Cornacchia roberto.cornacchia@gmail.com wrote:
> No, indeed, this syntax is not supported. > > > On 29 April 2016 at 09:41, Anthony Damico < ajdamico@gmail.com > wrote: > > > > hmm, that gives me the same error.. > > SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS > tot_wgt, hp FROM mtcars; > > > Error in .local(conn, statement, ...) : > Unable to execute statement 'SELECT > mpg, > wt, > SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, > SUM(wt) over () AS tot_wgt, > hp > FROM mtc...'. > Server says 'ParseException:SQLparser:SELECT: function 'sum' not found'. > > > sorry if i'm doing something silly.. > > > > > On Fri, Apr 29, 2016 at 3:37 AM, Roberto Cornacchia < > roberto.cornacchia@gmail.com > wrote: > > > > I think what you are looking for is SUM(..) OVER(PARTITION BY ..) > > Otherwise where would the groups for the sum come from? > > Roberto > > On 29 April 2016 at 09:28, Anthony Damico < ajdamico@gmail.com > wrote: > > > > hi, monetdb does not support SUM() OVER commands.. does anyone have a smart > alternative to implement this? thanks > > > SELECT > batch_id, > job_count, > SUM(job_count) OVER (ORDER BY duration) as cumjobs, > SUM(job_count) over () as totjobs, > duration > FROM > test_data ; > > > _______________________________________________ > users-list mailing list > users-list@monetdb.org > https://www.monetdb.org/mailman/listinfo/users-list > > > > _______________________________________________ > users-list mailing list > users-list@monetdb.org > https://www.monetdb.org/mailman/listinfo/users-list > > > > _______________________________________________ > users-list mailing list > users-list@monetdb.org > https://www.monetdb.org/mailman/listinfo/users-list > > > > _______________________________________________ > users-list mailing list > users-list@monetdb.org > https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) | _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
I think its not the optimal way to calculate the running sum, as you will
have performance issues on large datasets.
On Tue, Dec 13, 2016 at 4:19 PM, Anthony Damico
hi, the table in my example has 32 records
On Tue, Dec 13, 2016 at 9:16 AM, imad hajj chahine < imad.hajj.chahine@gmail.com> wrote:
Thank You,
Can I ask how many records your table has and the execution time, as in my case the first query is taking too much time to execute and consume too much disk space.
On Tue, Dec 13, 2016 at 3:21 PM, Anthony Damico
wrote: hi, yes! this R code presents how the solely-within-monetdb weighted quantile calculation compares to three other R implementations of weighted quantile calculations
# run this line once # install.packages(c("MonetDBLite","DBI","survey","Hmisc"))
pctiles <- seq( 0 , 1 , .01 )
library(DBI) # load the DBI package (implements the R-database coding) library(MonetDBLite) # load MonetDBLite package (creates database files in R)
db <- dbConnect( MonetDBLite() )
# requires a unique identifier because i could not get row_number() working # https://www.monetdb.org/bugzilla/show_bug.cgi?id=4000 mtcars$id <- seq( nrow( mtcars ) )
# store mtcars dbWriteTable( db , 'mtcars' , mtcars )
sql <- "SELECT t1.id, t1.hp , t1.wt , SUM( t2.wt ) AS cumwt FROM mtcars t1 INNER JOIN mtcars t2 ON t1.hp >= t2.hp GROUP BY t1.id , t1.hp , t1.wt"
dbGetQuery( db , sql )
sql2 <- paste("( SELECT row_number() over () AS id , wt , cumwt, totwt, hp FROM mtcars AS td NATURAL JOIN ( " , sql , " ) AS cj , ( SELECT SUM(wt) AS totwt FROM mtcars ) AS tj ) ")
dbGetQuery( db , sql2 )
sql3 <- paste( "select pcs.percentile , min(case when cumwt >= totwt * pcs.percentile then hp end) from (" , sql2 , ") AS v , (" ,
paste( "select" , pctiles , " as percentile" , collapse = " union all " ) , " ) pcs group by pcs.percentile" )
library(Hmisc) library(survey) ms <- svydesign(~1,data=mtcars,weight=~wt)
cbind( dbGetQuery( db , sql3 ), wtd.quantile( mtcars$hp , mtcars$wt , pctiles ) , as.numeric(svyquantile(~ hp,ms,pctiles)), as.numeric(svyquantile(~ hp,ms,pctiles,method='constant')))
cbind( dbGetQuery( db , sql3 )[26,], wtd.quantile( mtcars$hp , mtcars$wt , .25 ) , as.numeric(svyquantile(~ hp,ms,.25)), as.numeric(svyquantile(~ hp,ms,.25,method='constant')))
cbind( dbGetQuery( db , sql3 )[51,], wtd.quantile( mtcars$hp , mtcars$wt , .5 ) , as.numeric(svyquantile(~ hp,ms,.5)) , as.numeric(svyquantile(~ hp,ms,.5,method='constant')))
cbind( dbGetQuery( db , sql3 )[76,], wtd.quantile( mtcars$hp , mtcars$wt , .75 ) , as.numeric(svyquantile(~ hp,ms,.75)), as.numeric(svyquantile(~ hp,ms,.75,method='constant')))
On Tue, Dec 13, 2016 at 7:51 AM, imad hajj chahine < imad.hajj.chahine@gmail.com> wrote:
Hi Anthony,
Did you manage to implement the weighted percentile? I also have a case where i have the value and the count of each value and need to calculate the percentile. Now I understand why you need the cumsum for this.
Thank you.
On Fri, Apr 29, 2016 at 3:10 PM, Anthony Damico
wrote: might be able to use this.. will try.. http://stackoverflow.com/quest ions/2120544/how-to-get-cumulative-sum
On Fri, Apr 29, 2016 at 8:01 AM, Anthony Damico
wrote: thanks stefan (and also kai jiang)! i am trying to implement a weighted percentile strategy which uses that code.. i actually don't want the PARTITION BY, i would want it to be ORDER BY instead. the `cumjobs` column needs to be cumulative for a weighted percentile to work-- is there any way to reconfigure that sql command so that the `cumjobs` column cumulatively counts upwards based on the size of the duration column?
http://stackoverflow.com/questions/16450779/another-approach -to-percentiles
On Fri, Apr 29, 2016 at 7:17 AM, Stefan Manegold < Stefan.Manegold@cwi.nl> wrote:
> Anthony, > > while MonetDB indeed does not support this syntax, > I'm also puzzled what kind of semantic or result you'd expect from > this syntax? > > SELECT > batch_id, > job_count, > SUM(job_count) OVER (PARTITION BY duration) as cumjobs, > SUM(job_count) over () as totjobs, > duration > FROM > test_data ; > > The only possible result would IMHO be a > table with as many rows as the input table test_data has, > columns batch_id, job_count, duration identical to the input, > one additional column cumjobs with the same value, i.e., sum > of all job_count within each duration, replicated for all rows > with same duration, > and one additional column totjobs, with the same value (global > sum of all job_count values) replicated for each row. > > If that's what you want, this should do the job: > > SELECT > batch_id, > job_count, > cumjobs, > totjobs, > duration > FROM > test_data AS td > NATURAL JOIN > ( > SELECT > duration, > SUM(job_count) AS cumjobs > FROM > test_data > GROUP BY > duration > ) AS cj > , > ( > SELECT > SUM(job_count) AS totjobs > FROM > test_data > ) AS tj > ; > > > Best, > Stefan > > ----- On Apr 29, 2016, at 9:46 AM, Roberto Cornacchia > roberto.cornacchia@gmail.com wrote: > > > No, indeed, this syntax is not supported. > > > > > > On 29 April 2016 at 09:41, Anthony Damico < ajdamico@gmail.com > > wrote: > > > > > > > > hmm, that gives me the same error.. > > > > SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) > over () AS > > tot_wgt, hp FROM mtcars; > > > > > > Error in .local(conn, statement, ...) : > > Unable to execute statement 'SELECT > > mpg, > > wt, > > SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, > > SUM(wt) over () AS tot_wgt, > > hp > > FROM mtc...'. > > Server says 'ParseException:SQLparser:SELECT: function 'sum' not > found'. > > > > > > sorry if i'm doing something silly.. > > > > > > > > > > On Fri, Apr 29, 2016 at 3:37 AM, Roberto Cornacchia < > > roberto.cornacchia@gmail.com > wrote: > > > > > > > > I think what you are looking for is SUM(..) OVER(PARTITION BY ..) > > > > Otherwise where would the groups for the sum come from? > > > > Roberto > > > > On 29 April 2016 at 09:28, Anthony Damico < ajdamico@gmail.com > > wrote: > > > > > > > > hi, monetdb does not support SUM() OVER commands.. does anyone > have a smart > > alternative to implement this? thanks > > > > > > SELECT > > batch_id, > > job_count, > > SUM(job_count) OVER (ORDER BY duration) as cumjobs, > > SUM(job_count) over () as totjobs, > > duration > > FROM > > test_data ; > > > > > > _______________________________________________ > > users-list mailing list > > users-list@monetdb.org > > https://www.monetdb.org/mailman/listinfo/users-list > > > > > > > > _______________________________________________ > > users-list mailing list > > users-list@monetdb.org > > https://www.monetdb.org/mailman/listinfo/users-list > > > > > > > > _______________________________________________ > > users-list mailing list > > users-list@monetdb.org > > https://www.monetdb.org/mailman/listinfo/users-list > > > > > > > > _______________________________________________ > > users-list mailing list > > users-list@monetdb.org > > https://www.monetdb.org/mailman/listinfo/users-list > > -- > | Stefan.Manegold@CWI.nl | DB Architectures (DA) | > | www.CWI.nl/~manegold/ | Science Park 123 (L321) | > | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) | > _______________________________________________ > users-list mailing list > users-list@monetdb.org > https://www.monetdb.org/mailman/listinfo/users-list >
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
participants (5)
-
Anderson, David B
-
Anthony Damico
-
imad hajj chahine
-
Roberto Cornacchia
-
Stefan Manegold