SUM() OVER (ORDER BY..)

Anthony Damico

29 Apr 2016 29 Apr '16

7:28 a.m.

hi, monetdb does not support SUM() OVER commands.. does anyone have a smart alternative to implement this? thanks SELECT batch_id, job_count, SUM(job_count) OVER (ORDER BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ;

Attachments:

attachment.html (text/html — 447 bytes)

Show replies by date

Roberto Cornacchia

29 Apr 29 Apr

7:37 a.m.

I think what you are looking for is SUM(..) OVER(PARTITION BY ..) Otherwise where would the groups for the sum come from? Roberto On 29 April 2016 at 09:28, Anthony Damico wrote:

...

hi, monetdb does not support SUM() OVER commands.. does anyone have a smart alternative to implement this? thanks

SELECT batch_id, job_count, SUM(job_count) OVER (ORDER BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ;

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

Roberto Cornacchia

7:41 a.m.

Ignore that, my bad. You can also aggregate over all rows, of course. On 29 April 2016 at 09:37, Roberto Cornacchia wrote:

...

I think what you are looking for is SUM(..) OVER(PARTITION BY ..)

Otherwise where would the groups for the sum come from?

Roberto

On 29 April 2016 at 09:28, Anthony Damico wrote:

...
hi, monetdb does not support SUM() OVER commands.. does anyone have a smart alternative to implement this? thanks

SELECT batch_id, job_count, SUM(job_count) OVER (ORDER BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ;

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

Anthony Damico

7:41 a.m.

hmm, that gives me the same error.. SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS tot_wgt, hp FROM mtcars; Error in .local(conn, statement, ...) : Unable to execute statement 'SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS tot_wgt, hp FROM mtc...'. Server says 'ParseException:SQLparser:SELECT: function 'sum' not found'. sorry if i'm doing something silly.. On Fri, Apr 29, 2016 at 3:37 AM, Roberto Cornacchia < roberto.cornacchia@gmail.com> wrote:

...

I think what you are looking for is SUM(..) OVER(PARTITION BY ..)

Otherwise where would the groups for the sum come from?

Roberto

On 29 April 2016 at 09:28, Anthony Damico wrote:

...
hi, monetdb does not support SUM() OVER commands.. does anyone have a smart alternative to implement this? thanks

SELECT batch_id, job_count, SUM(job_count) OVER (ORDER BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ;

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

Roberto Cornacchia

7:46 a.m.

No, indeed, this syntax is not supported. On 29 April 2016 at 09:41, Anthony Damico wrote:

...

hmm, that gives me the same error..

SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS tot_wgt, hp FROM mtcars;

Error in .local(conn, statement, ...) : Unable to execute statement 'SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS tot_wgt, hp FROM mtc...'. Server says 'ParseException:SQLparser:SELECT: function 'sum' not found'.

sorry if i'm doing something silly..

On Fri, Apr 29, 2016 at 3:37 AM, Roberto Cornacchia < roberto.cornacchia@gmail.com> wrote:

...
I think what you are looking for is SUM(..) OVER(PARTITION BY ..)

Otherwise where would the groups for the sum come from?

Roberto

On 29 April 2016 at 09:28, Anthony Damico wrote:

...
hi, monetdb does not support SUM() OVER commands.. does anyone have a smart alternative to implement this? thanks

SELECT batch_id, job_count, SUM(job_count) OVER (ORDER BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ;

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

Stefan Manegold

11:17 a.m.

Anthony, while MonetDB indeed does not support this syntax, I'm also puzzled what kind of semantic or result you'd expect from this syntax? SELECT batch_id, job_count, SUM(job_count) OVER (PARTITION BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ; The only possible result would IMHO be a table with as many rows as the input table test_data has, columns batch_id, job_count, duration identical to the input, one additional column cumjobs with the same value, i.e., sum of all job_count within each duration, replicated for all rows with same duration, and one additional column totjobs, with the same value (global sum of all job_count values) replicated for each row. If that's what you want, this should do the job: SELECT batch_id, job_count, cumjobs, totjobs, duration FROM test_data AS td NATURAL JOIN ( SELECT duration, SUM(job_count) AS cumjobs FROM test_data GROUP BY duration ) AS cj , ( SELECT SUM(job_count) AS totjobs FROM test_data ) AS tj ; Best, Stefan ----- On Apr 29, 2016, at 9:46 AM, Roberto Cornacchia roberto.cornacchia@gmail.com wrote:

...

No, indeed, this syntax is not supported.

On 29 April 2016 at 09:41, Anthony Damico < ajdamico@gmail.com > wrote:

hmm, that gives me the same error..

SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS tot_wgt, hp FROM mtcars;

Error in .local(conn, statement, ...) : Unable to execute statement 'SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS tot_wgt, hp FROM mtc...'. Server says 'ParseException:SQLparser:SELECT: function 'sum' not found'.

sorry if i'm doing something silly..

On Fri, Apr 29, 2016 at 3:37 AM, Roberto Cornacchia < roberto.cornacchia@gmail.com > wrote:

I think what you are looking for is SUM(..) OVER(PARTITION BY ..)

Otherwise where would the groups for the sum come from?

Roberto

On 29 April 2016 at 09:28, Anthony Damico < ajdamico@gmail.com > wrote:

hi, monetdb does not support SUM() OVER commands.. does anyone have a smart alternative to implement this? thanks

SELECT batch_id, job_count, SUM(job_count) OVER (ORDER BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ;

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

Anthony Damico

12:01 p.m.

thanks stefan (and also kai jiang)! i am trying to implement a weighted percentile strategy which uses that code.. i actually don't want the PARTITION BY, i would want it to be ORDER BY instead. the `cumjobs` column needs to be cumulative for a weighted percentile to work-- is there any way to reconfigure that sql command so that the `cumjobs` column cumulatively counts upwards based on the size of the duration column? http://stackoverflow.com/questions/16450779/another-approach-to-percentiles On Fri, Apr 29, 2016 at 7:17 AM, Stefan Manegold wrote:

...

Anthony,

while MonetDB indeed does not support this syntax, I'm also puzzled what kind of semantic or result you'd expect from this syntax?

SELECT batch_id, job_count, SUM(job_count) OVER (PARTITION BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ;

The only possible result would IMHO be a table with as many rows as the input table test_data has, columns batch_id, job_count, duration identical to the input, one additional column cumjobs with the same value, i.e., sum of all job_count within each duration, replicated for all rows with same duration, and one additional column totjobs, with the same value (global sum of all job_count values) replicated for each row.

If that's what you want, this should do the job:

SELECT batch_id, job_count, cumjobs, totjobs, duration FROM test_data AS td NATURAL JOIN ( SELECT duration, SUM(job_count) AS cumjobs FROM test_data GROUP BY duration ) AS cj , ( SELECT SUM(job_count) AS totjobs FROM test_data ) AS tj ;

Best, Stefan

----- On Apr 29, 2016, at 9:46 AM, Roberto Cornacchia roberto.cornacchia@gmail.com wrote:

...
No, indeed, this syntax is not supported.

On 29 April 2016 at 09:41, Anthony Damico < ajdamico@gmail.com > wrote:

hmm, that gives me the same error..

SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS tot_wgt, hp FROM mtcars;

Error in .local(conn, statement, ...) : Unable to execute statement 'SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS tot_wgt, hp FROM mtc...'. Server says 'ParseException:SQLparser:SELECT: function 'sum' not found'.

sorry if i'm doing something silly..

On Fri, Apr 29, 2016 at 3:37 AM, Roberto Cornacchia < roberto.cornacchia@gmail.com > wrote:

I think what you are looking for is SUM(..) OVER(PARTITION BY ..)

Otherwise where would the groups for the sum come from?

Roberto

On 29 April 2016 at 09:28, Anthony Damico < ajdamico@gmail.com > wrote:

hi, monetdb does not support SUM() OVER commands.. does anyone have a smart alternative to implement this? thanks

SELECT batch_id, job_count, SUM(job_count) OVER (ORDER BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ;

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) | _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

Anthony Damico

12:10 p.m.

might be able to use this.. will try.. http://stackoverflow.com/questions/2120544/how-to-get-cumulative-sum On Fri, Apr 29, 2016 at 8:01 AM, Anthony Damico wrote:

...

thanks stefan (and also kai jiang)! i am trying to implement a weighted percentile strategy which uses that code.. i actually don't want the PARTITION BY, i would want it to be ORDER BY instead. the `cumjobs` column needs to be cumulative for a weighted percentile to work-- is there any way to reconfigure that sql command so that the `cumjobs` column cumulatively counts upwards based on the size of the duration column?

http://stackoverflow.com/questions/16450779/another-approach-to-percentiles

On Fri, Apr 29, 2016 at 7:17 AM, Stefan Manegold wrote:

...
Anthony,

while MonetDB indeed does not support this syntax, I'm also puzzled what kind of semantic or result you'd expect from this syntax?

SELECT batch_id, job_count, SUM(job_count) OVER (PARTITION BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ;

The only possible result would IMHO be a table with as many rows as the input table test_data has, columns batch_id, job_count, duration identical to the input, one additional column cumjobs with the same value, i.e., sum of all job_count within each duration, replicated for all rows with same duration, and one additional column totjobs, with the same value (global sum of all job_count values) replicated for each row.

If that's what you want, this should do the job:

SELECT batch_id, job_count, cumjobs, totjobs, duration FROM test_data AS td NATURAL JOIN ( SELECT duration, SUM(job_count) AS cumjobs FROM test_data GROUP BY duration ) AS cj , ( SELECT SUM(job_count) AS totjobs FROM test_data ) AS tj ;

Best, Stefan

----- On Apr 29, 2016, at 9:46 AM, Roberto Cornacchia roberto.cornacchia@gmail.com wrote:

...
No, indeed, this syntax is not supported.

On 29 April 2016 at 09:41, Anthony Damico < ajdamico@gmail.com > wrote:

hmm, that gives me the same error..

SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS tot_wgt, hp FROM mtcars;

Error in .local(conn, statement, ...) : Unable to execute statement 'SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS tot_wgt, hp FROM mtc...'. Server says 'ParseException:SQLparser:SELECT: function 'sum' not found'.

sorry if i'm doing something silly..

On Fri, Apr 29, 2016 at 3:37 AM, Roberto Cornacchia < roberto.cornacchia@gmail.com > wrote:

I think what you are looking for is SUM(..) OVER(PARTITION BY ..)

Otherwise where would the groups for the sum come from?

Roberto

On 29 April 2016 at 09:28, Anthony Damico < ajdamico@gmail.com > wrote:

hi, monetdb does not support SUM() OVER commands.. does anyone have a smart alternative to implement this? thanks

SELECT batch_id, job_count, SUM(job_count) OVER (ORDER BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ;

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) | _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

imad hajj chahine

13 Dec 13 Dec

12:51 p.m.

Hi Anthony, Did you manage to implement the weighted percentile? I also have a case where i have the value and the count of each value and need to calculate the percentile. Now I understand why you need the cumsum for this. Thank you. On Fri, Apr 29, 2016 at 3:10 PM, Anthony Damico wrote:

...

might be able to use this.. will try.. http://stackoverflow.com/ questions/2120544/how-to-get-cumulative-sum

On Fri, Apr 29, 2016 at 8:01 AM, Anthony Damico wrote:

...
thanks stefan (and also kai jiang)! i am trying to implement a weighted percentile strategy which uses that code.. i actually don't want the PARTITION BY, i would want it to be ORDER BY instead. the `cumjobs` column needs to be cumulative for a weighted percentile to work-- is there any way to reconfigure that sql command so that the `cumjobs` column cumulatively counts upwards based on the size of the duration column?

http://stackoverflow.com/questions/16450779/another- approach-to-percentiles

On Fri, Apr 29, 2016 at 7:17 AM, Stefan Manegold wrote:

...
Anthony,

while MonetDB indeed does not support this syntax, I'm also puzzled what kind of semantic or result you'd expect from this syntax?

SELECT batch_id, job_count, SUM(job_count) OVER (PARTITION BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ;

The only possible result would IMHO be a table with as many rows as the input table test_data has, columns batch_id, job_count, duration identical to the input, one additional column cumjobs with the same value, i.e., sum of all job_count within each duration, replicated for all rows with same duration, and one additional column totjobs, with the same value (global sum of all job_count values) replicated for each row.

If that's what you want, this should do the job:

SELECT batch_id, job_count, cumjobs, totjobs, duration FROM test_data AS td NATURAL JOIN ( SELECT duration, SUM(job_count) AS cumjobs FROM test_data GROUP BY duration ) AS cj , ( SELECT SUM(job_count) AS totjobs FROM test_data ) AS tj ;

Best, Stefan

----- On Apr 29, 2016, at 9:46 AM, Roberto Cornacchia roberto.cornacchia@gmail.com wrote:

...
No, indeed, this syntax is not supported.

On 29 April 2016 at 09:41, Anthony Damico < ajdamico@gmail.com > wrote:

hmm, that gives me the same error..

SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS tot_wgt, hp FROM mtcars;

Error in .local(conn, statement, ...) : Unable to execute statement 'SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS tot_wgt, hp FROM mtc...'. Server says 'ParseException:SQLparser:SELECT: function 'sum' not found'.

sorry if i'm doing something silly..

On Fri, Apr 29, 2016 at 3:37 AM, Roberto Cornacchia < roberto.cornacchia@gmail.com > wrote:

I think what you are looking for is SUM(..) OVER(PARTITION BY ..)

Otherwise where would the groups for the sum come from?

Roberto

On 29 April 2016 at 09:28, Anthony Damico < ajdamico@gmail.com > wrote:

hi, monetdb does not support SUM() OVER commands.. does anyone have a smart alternative to implement this? thanks

SELECT batch_id, job_count, SUM(job_count) OVER (ORDER BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ;

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) | _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

Anderson, David B

1:08 p.m.

I think you can do this if you enable Python extensions and write a UDF using the numpy library. -----Original Message----- From: imad hajj chahine [imad.hajj.chahine@gmail.commailto:imad.hajj.chahine@gmail.com] Sent: Tuesday, December 13, 2016 07:53 AM Eastern Standard Time To: Communication channel for MonetDB users Subject: Re: SUM() OVER (ORDER BY..) Hi Anthony, Did you manage to implement the weighted percentile? I also have a case where i have the value and the count of each value and need to calculate the percentile. Now I understand why you need the cumsum for this. Thank you. On Fri, Apr 29, 2016 at 3:10 PM, Anthony Damico mailto:ajdamico@gmail.com> wrote: might be able to use this.. will try.. http://stackoverflow.com/questions/2120544/how-to-get-cumulative-sum https://urldefense.proofpoint.com/v2/url?u=http-3A__stackoverflow.com_questions_2120544_how-2Dto-2Dget-2Dcumulative-2Dsum&d=DQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=8_GYjk1edsyLJlNaxMxYxBJsviF3JXYvwDK42uy5KWU&m=SaJBkey4RBZyy8ZfixCnr_XyXpxsK2h82sLbLs6hD6A&s=giR-7rVohfWclrsQvsaFz_xNv7dRRBMCf8jgv6vLGTk&e= On Fri, Apr 29, 2016 at 8:01 AM, Anthony Damico mailto:ajdamico@gmail.com> wrote: thanks stefan (and also kai jiang)! i am trying to implement a weighted percentile strategy which uses that code.. i actually don't want the PARTITION BY, i would want it to be ORDER BY instead. the `cumjobs` column needs to be cumulative for a weighted percentile to work-- is there any way to reconfigure that sql command so that the `cumjobs` column cumulatively counts upwards based on the size of the duration column? http://stackoverflow.com/questions/16450779/another-approach-to-percentiles https://urldefense.proofpoint.com/v2/url?u=http-3A__stackoverflow.com_questions_16450779_another-2Dapproach-2Dto-2Dpercentiles&d=DQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=8_GYjk1edsyLJlNaxMxYxBJsviF3JXYvwDK42uy5KWU&m=SaJBkey4RBZyy8ZfixCnr_XyXpxsK2h82sLbLs6hD6A&s=Y7s994jKVtcO7_SC-pDTKOdfBmJ84yVUjCfkpHJ8F7A&e= On Fri, Apr 29, 2016 at 7:17 AM, Stefan Manegold mailto:Stefan.Manegold@cwi.nl> wrote: Anthony, while MonetDB indeed does not support this syntax, I'm also puzzled what kind of semantic or result you'd expect from this syntax? SELECT batch_id, job_count, SUM(job_count) OVER (PARTITION BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ; The only possible result would IMHO be a table with as many rows as the input table test_data has, columns batch_id, job_count, duration identical to the input, one additional column cumjobs with the same value, i.e., sum of all job_count within each duration, replicated for all rows with same duration, and one additional column totjobs, with the same value (global sum of all job_count values) replicated for each row. If that's what you want, this should do the job: SELECT batch_id, job_count, cumjobs, totjobs, duration FROM test_data AS td NATURAL JOIN ( SELECT duration, SUM(job_count) AS cumjobs FROM test_data GROUP BY duration ) AS cj , ( SELECT SUM(job_count) AS totjobs FROM test_data ) AS tj ; Best, Stefan ----- On Apr 29, 2016, at 9:46 AM, Roberto Cornacchia roberto.cornacchia@gmail.commailto:roberto.cornacchia@gmail.com wrote:

...

No, indeed, this syntax is not supported.

On 29 April 2016 at 09:41, Anthony Damico < ajdamico@gmail.commailto:ajdamico@gmail.com > wrote:

hmm, that gives me the same error..

SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS tot_wgt, hp FROM mtcars;

Error in .local(conn, statement, ...) : Unable to execute statement 'SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS tot_wgt, hp FROM mtc...'. Server says 'ParseException:SQLparser:SELECT: function 'sum' not found'.

sorry if i'm doing something silly..

On Fri, Apr 29, 2016 at 3:37 AM, Roberto Cornacchia < roberto.cornacchia@gmail.commailto:roberto.cornacchia@gmail.com > wrote:

I think what you are looking for is SUM(..) OVER(PARTITION BY ..)

Otherwise where would the groups for the sum come from?

Roberto

On 29 April 2016 at 09:28, Anthony Damico < ajdamico@gmail.commailto:ajdamico@gmail.com > wrote:

hi, monetdb does not support SUM() OVER commands.. does anyone have a smart alternative to implement this? thanks

SELECT batch_id, job_count, SUM(job_count) OVER (ORDER BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ;

_______________________________________________ users-list mailing list users-list@monetdb.orgmailto:users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list https://urldefense.proofpoint.com/v2/url?u=https-3A__www.monetdb.org_mailman_listinfo_users-2Dlist&d=DQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=8_GYjk1edsyLJlNaxMxYxBJsviF3JXYvwDK42uy5KWU&m=SaJBkey4RBZyy8ZfixCnr_XyXpxsK2h82sLbLs6hD6A&s=wWF8fSMLcV4AE6-ZCW-66LHml2xDyF7dBdu8_H2H5DU&e=

_______________________________________________ users-list mailing list users-list@monetdb.orgmailto:users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list https://urldefense.proofpoint.com/v2/url?u=https-3A__www.monetdb.org_mailman_listinfo_users-2Dlist&d=DQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=8_GYjk1edsyLJlNaxMxYxBJsviF3JXYvwDK42uy5KWU&m=SaJBkey4RBZyy8ZfixCnr_XyXpxsK2h82sLbLs6hD6A&s=wWF8fSMLcV4AE6-ZCW-66LHml2xDyF7dBdu8_H2H5DU&e=

_______________________________________________ users-list mailing list users-list@monetdb.orgmailto:users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list https://urldefense.proofpoint.com/v2/url?u=https-3A__www.monetdb.org_mailman_listinfo_users-2Dlist&d=DQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=8_GYjk1edsyLJlNaxMxYxBJsviF3JXYvwDK42uy5KWU&m=SaJBkey4RBZyy8ZfixCnr_XyXpxsK2h82sLbLs6hD6A&s=wWF8fSMLcV4AE6-ZCW-66LHml2xDyF7dBdu8_H2H5DU&e=

_______________________________________________ users-list mailing list users-list@monetdb.orgmailto:users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list https://urldefense.proofpoint.com/v2/url?u=https-3A__www.monetdb.org_mailman_listinfo_users-2Dlist&d=DQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=8_GYjk1edsyLJlNaxMxYxBJsviF3JXYvwDK42uy5KWU&m=SaJBkey4RBZyy8ZfixCnr_XyXpxsK2h82sLbLs6hD6A&s=wWF8fSMLcV4AE6-ZCW-66LHml2xDyF7dBdu8_H2H5DU&e=

-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/https://urldefense.proofpoint.com/v2/url?u=http-3A__www.CWI.nl_-7Emanegold_&d=DQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=8_GYjk1edsyLJlNaxMxYxBJsviF3JXYvwDK42uy5KWU&m=SaJBkey4RBZyy8ZfixCnr_XyXpxsK2h82sLbLs6hD6A&s=QlLs8qlcZyNWYvrbRa9bQMBrtU1kbzSb7C3ISJMtSyk&e= | Science Park 123 (L321) | | +31 (0)20 592-4212tel:%2B31%20%280%2920%20592-4212 | 1098 XG Amsterdam (NL) | _______________________________________________ users-list mailing list users-list@monetdb.orgmailto:users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list https://urldefense.proofpoint.com/v2/url?u=https-3A__www.monetdb.org_mailman_listinfo_users-2Dlist&d=DQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=8_GYjk1edsyLJlNaxMxYxBJsviF3JXYvwDK42uy5KWU&m=SaJBkey4RBZyy8ZfixCnr_XyXpxsK2h82sLbLs6hD6A&s=wWF8fSMLcV4AE6-ZCW-66LHml2xDyF7dBdu8_H2H5DU&e= _______________________________________________ users-list mailing list users-list@monetdb.orgmailto:users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list https://urldefense.proofpoint.com/v2/url?u=https-3A__www.monetdb.org_mailman_listinfo_users-2Dlist&d=DQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=8_GYjk1edsyLJlNaxMxYxBJsviF3JXYvwDK42uy5KWU&m=SaJBkey4RBZyy8ZfixCnr_XyXpxsK2h82sLbLs6hD6A&s=wWF8fSMLcV4AE6-ZCW-66LHml2xDyF7dBdu8_H2H5DU&e=

Anthony Damico

1:21 p.m.

hi, yes! this R code presents how the solely-within-monetdb weighted quantile calculation compares to three other R implementations of weighted quantile calculations # run this line once # install.packages(c("MonetDBLite","DBI","survey","Hmisc")) pctiles <- seq( 0 , 1 , .01 ) library(DBI) # load the DBI package (implements the R-database coding) library(MonetDBLite) # load MonetDBLite package (creates database files in R) db <- dbConnect( MonetDBLite() ) # requires a unique identifier because i could not get row_number() working # https://www.monetdb.org/bugzilla/show_bug.cgi?id=4000 mtcars$id <- seq( nrow( mtcars ) ) # store mtcars dbWriteTable( db , 'mtcars' , mtcars ) sql <- "SELECT t1.id, t1.hp , t1.wt , SUM( t2.wt ) AS cumwt FROM mtcars t1 INNER JOIN mtcars t2 ON t1.hp >= t2.hp GROUP BY t1.id , t1.hp , t1.wt" dbGetQuery( db , sql ) sql2 <- paste("( SELECT row_number() over () AS id , wt , cumwt, totwt, hp FROM mtcars AS td NATURAL JOIN ( " , sql , " ) AS cj , ( SELECT SUM(wt) AS totwt FROM mtcars ) AS tj ) ") dbGetQuery( db , sql2 ) sql3 <- paste( "select pcs.percentile , min(case when cumwt >= totwt * pcs.percentile then hp end) from (" , sql2 , ") AS v , (" , paste( "select" , pctiles , " as percentile" , collapse = " union all " ) , " ) pcs group by pcs.percentile" ) library(Hmisc) library(survey) ms <- svydesign(~1,data=mtcars,weight=~wt) cbind( dbGetQuery( db , sql3 ), wtd.quantile( mtcars$hp , mtcars$wt , pctiles ) , as.numeric(svyquantile(~ hp,ms,pctiles)), as.numeric(svyquantile(~ hp,ms,pctiles,method='constant'))) cbind( dbGetQuery( db , sql3 )[26,], wtd.quantile( mtcars$hp , mtcars$wt , .25 ) , as.numeric(svyquantile(~ hp,ms,.25)), as.numeric(svyquantile(~ hp,ms,.25,method='constant'))) cbind( dbGetQuery( db , sql3 )[51,], wtd.quantile( mtcars$hp , mtcars$wt , .5 ) , as.numeric(svyquantile(~ hp,ms,.5)) , as.numeric(svyquantile(~ hp,ms,.5,method='constant'))) cbind( dbGetQuery( db , sql3 )[76,], wtd.quantile( mtcars$hp , mtcars$wt , .75 ) , as.numeric(svyquantile(~ hp,ms,.75)), as.numeric(svyquantile(~ hp,ms,.75,method='constant'))) On Tue, Dec 13, 2016 at 7:51 AM, imad hajj chahine < imad.hajj.chahine@gmail.com> wrote:

...

Hi Anthony,

Did you manage to implement the weighted percentile? I also have a case where i have the value and the count of each value and need to calculate the percentile. Now I understand why you need the cumsum for this.

Thank you.

On Fri, Apr 29, 2016 at 3:10 PM, Anthony Damico wrote:

...
might be able to use this.. will try.. http://stackoverflow.com/quest ions/2120544/how-to-get-cumulative-sum

On Fri, Apr 29, 2016 at 8:01 AM, Anthony Damico wrote:

...
thanks stefan (and also kai jiang)! i am trying to implement a weighted percentile strategy which uses that code.. i actually don't want the PARTITION BY, i would want it to be ORDER BY instead. the `cumjobs` column needs to be cumulative for a weighted percentile to work-- is there any way to reconfigure that sql command so that the `cumjobs` column cumulatively counts upwards based on the size of the duration column?

http://stackoverflow.com/questions/16450779/another-approach -to-percentiles

On Fri, Apr 29, 2016 at 7:17 AM, Stefan Manegold
...
wrote:

...
Anthony,

while MonetDB indeed does not support this syntax, I'm also puzzled what kind of semantic or result you'd expect from this syntax?

SELECT batch_id, job_count, SUM(job_count) OVER (PARTITION BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ;

The only possible result would IMHO be a table with as many rows as the input table test_data has, columns batch_id, job_count, duration identical to the input, one additional column cumjobs with the same value, i.e., sum of all job_count within each duration, replicated for all rows with same duration, and one additional column totjobs, with the same value (global sum of all job_count values) replicated for each row.

If that's what you want, this should do the job:

SELECT batch_id, job_count, cumjobs, totjobs, duration FROM test_data AS td NATURAL JOIN ( SELECT duration, SUM(job_count) AS cumjobs FROM test_data GROUP BY duration ) AS cj , ( SELECT SUM(job_count) AS totjobs FROM test_data ) AS tj ;

Best, Stefan

----- On Apr 29, 2016, at 9:46 AM, Roberto Cornacchia roberto.cornacchia@gmail.com wrote:

...
No, indeed, this syntax is not supported.

On 29 April 2016 at 09:41, Anthony Damico < ajdamico@gmail.com > wrote:

hmm, that gives me the same error..

SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS tot_wgt, hp FROM mtcars;

Error in .local(conn, statement, ...) : Unable to execute statement 'SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS tot_wgt, hp FROM mtc...'. Server says 'ParseException:SQLparser:SELECT: function 'sum' not found'.

sorry if i'm doing something silly..

On Fri, Apr 29, 2016 at 3:37 AM, Roberto Cornacchia < roberto.cornacchia@gmail.com > wrote:

I think what you are looking for is SUM(..) OVER(PARTITION BY ..)

Otherwise where would the groups for the sum come from?

Roberto

On 29 April 2016 at 09:28, Anthony Damico < ajdamico@gmail.com > wrote:

hi, monetdb does not support SUM() OVER commands.. does anyone have a smart alternative to implement this? thanks

SELECT batch_id, job_count, SUM(job_count) OVER (ORDER BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ;

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) | _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

imad hajj chahine

2:16 p.m.

Thank You, Can I ask how many records your table has and the execution time, as in my case the first query is taking too much time to execute and consume too much disk space. On Tue, Dec 13, 2016 at 3:21 PM, Anthony Damico wrote:

...

hi, yes! this R code presents how the solely-within-monetdb weighted quantile calculation compares to three other R implementations of weighted quantile calculations

# run this line once # install.packages(c("MonetDBLite","DBI","survey","Hmisc"))

pctiles <- seq( 0 , 1 , .01 )

library(DBI) # load the DBI package (implements the R-database coding) library(MonetDBLite) # load MonetDBLite package (creates database files in R)

db <- dbConnect( MonetDBLite() )

# requires a unique identifier because i could not get row_number() working # https://www.monetdb.org/bugzilla/show_bug.cgi?id=4000 mtcars$id <- seq( nrow( mtcars ) )

# store mtcars dbWriteTable( db , 'mtcars' , mtcars )

sql <- "SELECT t1.id, t1.hp , t1.wt , SUM( t2.wt ) AS cumwt FROM mtcars t1 INNER JOIN mtcars t2 ON t1.hp >= t2.hp GROUP BY t1.id , t1.hp , t1.wt"

dbGetQuery( db , sql )

sql2 <- paste("( SELECT row_number() over () AS id , wt , cumwt, totwt, hp FROM mtcars AS td NATURAL JOIN ( " , sql , " ) AS cj , ( SELECT SUM(wt) AS totwt FROM mtcars ) AS tj ) ")

dbGetQuery( db , sql2 )

sql3 <- paste( "select pcs.percentile , min(case when cumwt >= totwt * pcs.percentile then hp end) from (" , sql2 , ") AS v , (" ,

paste( "select" , pctiles , " as percentile" , collapse = " union all " ) , " ) pcs group by pcs.percentile" )

library(Hmisc) library(survey) ms <- svydesign(~1,data=mtcars,weight=~wt)

cbind( dbGetQuery( db , sql3 ), wtd.quantile( mtcars$hp , mtcars$wt , pctiles ) , as.numeric(svyquantile(~ hp,ms,pctiles)), as.numeric(svyquantile(~ hp,ms,pctiles,method='constant')))

cbind( dbGetQuery( db , sql3 )[26,], wtd.quantile( mtcars$hp , mtcars$wt , .25 ) , as.numeric(svyquantile(~ hp,ms,.25)), as.numeric(svyquantile(~ hp,ms,.25,method='constant')))

cbind( dbGetQuery( db , sql3 )[51,], wtd.quantile( mtcars$hp , mtcars$wt , .5 ) , as.numeric(svyquantile(~ hp,ms,.5)) , as.numeric(svyquantile(~ hp,ms,.5,method='constant')))

cbind( dbGetQuery( db , sql3 )[76,], wtd.quantile( mtcars$hp , mtcars$wt , .75 ) , as.numeric(svyquantile(~ hp,ms,.75)), as.numeric(svyquantile(~ hp,ms,.75,method='constant')))

On Tue, Dec 13, 2016 at 7:51 AM, imad hajj chahine < imad.hajj.chahine@gmail.com> wrote:

...
Hi Anthony,

Did you manage to implement the weighted percentile? I also have a case where i have the value and the count of each value and need to calculate the percentile. Now I understand why you need the cumsum for this.

Thank you.

On Fri, Apr 29, 2016 at 3:10 PM, Anthony Damico wrote:

...
might be able to use this.. will try.. http://stackoverflow.com/quest ions/2120544/how-to-get-cumulative-sum

On Fri, Apr 29, 2016 at 8:01 AM, Anthony Damico wrote:

...
thanks stefan (and also kai jiang)! i am trying to implement a weighted percentile strategy which uses that code.. i actually don't want the PARTITION BY, i would want it to be ORDER BY instead. the `cumjobs` column needs to be cumulative for a weighted percentile to work-- is there any way to reconfigure that sql command so that the `cumjobs` column cumulatively counts upwards based on the size of the duration column?

http://stackoverflow.com/questions/16450779/another-approach -to-percentiles

On Fri, Apr 29, 2016 at 7:17 AM, Stefan Manegold < Stefan.Manegold@cwi.nl> wrote:

...
Anthony,

while MonetDB indeed does not support this syntax, I'm also puzzled what kind of semantic or result you'd expect from this syntax?

SELECT batch_id, job_count, SUM(job_count) OVER (PARTITION BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ;

The only possible result would IMHO be a table with as many rows as the input table test_data has, columns batch_id, job_count, duration identical to the input, one additional column cumjobs with the same value, i.e., sum of all job_count within each duration, replicated for all rows with same duration, and one additional column totjobs, with the same value (global sum of all job_count values) replicated for each row.

If that's what you want, this should do the job:

SELECT batch_id, job_count, cumjobs, totjobs, duration FROM test_data AS td NATURAL JOIN ( SELECT duration, SUM(job_count) AS cumjobs FROM test_data GROUP BY duration ) AS cj , ( SELECT SUM(job_count) AS totjobs FROM test_data ) AS tj ;

Best, Stefan

----- On Apr 29, 2016, at 9:46 AM, Roberto Cornacchia roberto.cornacchia@gmail.com wrote:

...
No, indeed, this syntax is not supported.

On 29 April 2016 at 09:41, Anthony Damico < ajdamico@gmail.com > wrote:

hmm, that gives me the same error..

SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS tot_wgt, hp FROM mtcars;

Error in .local(conn, statement, ...) : Unable to execute statement 'SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS tot_wgt, hp FROM mtc...'. Server says 'ParseException:SQLparser:SELECT: function 'sum' not found'.

sorry if i'm doing something silly..

On Fri, Apr 29, 2016 at 3:37 AM, Roberto Cornacchia < roberto.cornacchia@gmail.com > wrote:

I think what you are looking for is SUM(..) OVER(PARTITION BY ..)

Otherwise where would the groups for the sum come from?

Roberto

On 29 April 2016 at 09:28, Anthony Damico < ajdamico@gmail.com > wrote:

hi, monetdb does not support SUM() OVER commands.. does anyone have a smart alternative to implement this? thanks

SELECT batch_id, job_count, SUM(job_count) OVER (ORDER BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ;

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) | _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

Anthony Damico

2:19 p.m.

hi, the table in my example has 32 records On Tue, Dec 13, 2016 at 9:16 AM, imad hajj chahine < imad.hajj.chahine@gmail.com> wrote:

...

Thank You,

Can I ask how many records your table has and the execution time, as in my case the first query is taking too much time to execute and consume too much disk space.

On Tue, Dec 13, 2016 at 3:21 PM, Anthony Damico wrote:

...
hi, yes! this R code presents how the solely-within-monetdb weighted quantile calculation compares to three other R implementations of weighted quantile calculations

# run this line once # install.packages(c("MonetDBLite","DBI","survey","Hmisc"))

pctiles <- seq( 0 , 1 , .01 )

library(DBI) # load the DBI package (implements the R-database coding) library(MonetDBLite) # load MonetDBLite package (creates database files in R)

db <- dbConnect( MonetDBLite() )

# requires a unique identifier because i could not get row_number() working # https://www.monetdb.org/bugzilla/show_bug.cgi?id=4000 mtcars$id <- seq( nrow( mtcars ) )

# store mtcars dbWriteTable( db , 'mtcars' , mtcars )

sql <- "SELECT t1.id, t1.hp , t1.wt , SUM( t2.wt ) AS cumwt FROM mtcars t1 INNER JOIN mtcars t2 ON t1.hp >= t2.hp GROUP BY t1.id , t1.hp , t1.wt"

dbGetQuery( db , sql )

sql2 <- paste("( SELECT row_number() over () AS id , wt , cumwt, totwt, hp FROM mtcars AS td NATURAL JOIN ( " , sql , " ) AS cj , ( SELECT SUM(wt) AS totwt FROM mtcars ) AS tj ) ")

dbGetQuery( db , sql2 )

sql3 <- paste( "select pcs.percentile , min(case when cumwt >= totwt * pcs.percentile then hp end) from (" , sql2 , ") AS v , (" ,

paste( "select" , pctiles , " as percentile" , collapse = " union all " ) , " ) pcs group by pcs.percentile" )

library(Hmisc) library(survey) ms <- svydesign(~1,data=mtcars,weight=~wt)

cbind( dbGetQuery( db , sql3 ), wtd.quantile( mtcars$hp , mtcars$wt , pctiles ) , as.numeric(svyquantile(~ hp,ms,pctiles)), as.numeric(svyquantile(~ hp,ms,pctiles,method='constant')))

cbind( dbGetQuery( db , sql3 )[26,], wtd.quantile( mtcars$hp , mtcars$wt , .25 ) , as.numeric(svyquantile(~ hp,ms,.25)), as.numeric(svyquantile(~ hp,ms,.25,method='constant')))

cbind( dbGetQuery( db , sql3 )[51,], wtd.quantile( mtcars$hp , mtcars$wt , .5 ) , as.numeric(svyquantile(~ hp,ms,.5)) , as.numeric(svyquantile(~ hp,ms,.5,method='constant')))

cbind( dbGetQuery( db , sql3 )[76,], wtd.quantile( mtcars$hp , mtcars$wt , .75 ) , as.numeric(svyquantile(~ hp,ms,.75)), as.numeric(svyquantile(~ hp,ms,.75,method='constant')))

On Tue, Dec 13, 2016 at 7:51 AM, imad hajj chahine < imad.hajj.chahine@gmail.com> wrote:

...
Hi Anthony,

Did you manage to implement the weighted percentile? I also have a case where i have the value and the count of each value and need to calculate the percentile. Now I understand why you need the cumsum for this.

Thank you.

On Fri, Apr 29, 2016 at 3:10 PM, Anthony Damico wrote:

...
might be able to use this.. will try.. http://stackoverflow.com/quest ions/2120544/how-to-get-cumulative-sum

On Fri, Apr 29, 2016 at 8:01 AM, Anthony Damico wrote:

...
thanks stefan (and also kai jiang)! i am trying to implement a weighted percentile strategy which uses that code.. i actually don't want the PARTITION BY, i would want it to be ORDER BY instead. the `cumjobs` column needs to be cumulative for a weighted percentile to work-- is there any way to reconfigure that sql command so that the `cumjobs` column cumulatively counts upwards based on the size of the duration column?

http://stackoverflow.com/questions/16450779/another-approach -to-percentiles

On Fri, Apr 29, 2016 at 7:17 AM, Stefan Manegold < Stefan.Manegold@cwi.nl> wrote:

...
Anthony,

while MonetDB indeed does not support this syntax, I'm also puzzled what kind of semantic or result you'd expect from this syntax?

SELECT batch_id, job_count, SUM(job_count) OVER (PARTITION BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ;

The only possible result would IMHO be a table with as many rows as the input table test_data has, columns batch_id, job_count, duration identical to the input, one additional column cumjobs with the same value, i.e., sum of all job_count within each duration, replicated for all rows with same duration, and one additional column totjobs, with the same value (global sum of all job_count values) replicated for each row.

If that's what you want, this should do the job:

SELECT batch_id, job_count, cumjobs, totjobs, duration FROM test_data AS td NATURAL JOIN ( SELECT duration, SUM(job_count) AS cumjobs FROM test_data GROUP BY duration ) AS cj , ( SELECT SUM(job_count) AS totjobs FROM test_data ) AS tj ;

Best, Stefan

----- On Apr 29, 2016, at 9:46 AM, Roberto Cornacchia roberto.cornacchia@gmail.com wrote:

> No, indeed, this syntax is not supported. > > > On 29 April 2016 at 09:41, Anthony Damico < ajdamico@gmail.com > wrote: > > > > hmm, that gives me the same error.. > > SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS > tot_wgt, hp FROM mtcars; > > > Error in .local(conn, statement, ...) : > Unable to execute statement 'SELECT > mpg, > wt, > SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, > SUM(wt) over () AS tot_wgt, > hp > FROM mtc...'. > Server says 'ParseException:SQLparser:SELECT: function 'sum' not found'. > > > sorry if i'm doing something silly.. > > > > > On Fri, Apr 29, 2016 at 3:37 AM, Roberto Cornacchia < > roberto.cornacchia@gmail.com > wrote: > > > > I think what you are looking for is SUM(..) OVER(PARTITION BY ..) > > Otherwise where would the groups for the sum come from? > > Roberto > > On 29 April 2016 at 09:28, Anthony Damico < ajdamico@gmail.com > wrote: > > > > hi, monetdb does not support SUM() OVER commands.. does anyone have a smart > alternative to implement this? thanks > > > SELECT > batch_id, > job_count, > SUM(job_count) OVER (ORDER BY duration) as cumjobs, > SUM(job_count) over () as totjobs, > duration > FROM > test_data ; > > > _______________________________________________ > users-list mailing list > users-list@monetdb.org > https://www.monetdb.org/mailman/listinfo/users-list > > > > _______________________________________________ > users-list mailing list > users-list@monetdb.org > https://www.monetdb.org/mailman/listinfo/users-list > > > > _______________________________________________ > users-list mailing list > users-list@monetdb.org > https://www.monetdb.org/mailman/listinfo/users-list > > > > _______________________________________________ > users-list mailing list > users-list@monetdb.org > https://www.monetdb.org/mailman/listinfo/users-list

-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) | _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

imad hajj chahine

2:42 p.m.

I think its not the optimal way to calculate the running sum, as you will have performance issues on large datasets. On Tue, Dec 13, 2016 at 4:19 PM, Anthony Damico wrote:

...

hi, the table in my example has 32 records

On Tue, Dec 13, 2016 at 9:16 AM, imad hajj chahine < imad.hajj.chahine@gmail.com> wrote:

...
Thank You,

Can I ask how many records your table has and the execution time, as in my case the first query is taking too much time to execute and consume too much disk space.

On Tue, Dec 13, 2016 at 3:21 PM, Anthony Damico wrote:

...
hi, yes! this R code presents how the solely-within-monetdb weighted quantile calculation compares to three other R implementations of weighted quantile calculations

# run this line once # install.packages(c("MonetDBLite","DBI","survey","Hmisc"))

pctiles <- seq( 0 , 1 , .01 )

library(DBI) # load the DBI package (implements the R-database coding) library(MonetDBLite) # load MonetDBLite package (creates database files in R)

db <- dbConnect( MonetDBLite() )

# requires a unique identifier because i could not get row_number() working # https://www.monetdb.org/bugzilla/show_bug.cgi?id=4000 mtcars$id <- seq( nrow( mtcars ) )

# store mtcars dbWriteTable( db , 'mtcars' , mtcars )

sql <- "SELECT t1.id, t1.hp , t1.wt , SUM( t2.wt ) AS cumwt FROM mtcars t1 INNER JOIN mtcars t2 ON t1.hp >= t2.hp GROUP BY t1.id , t1.hp , t1.wt"

dbGetQuery( db , sql )

sql2 <- paste("( SELECT row_number() over () AS id , wt , cumwt, totwt, hp FROM mtcars AS td NATURAL JOIN ( " , sql , " ) AS cj , ( SELECT SUM(wt) AS totwt FROM mtcars ) AS tj ) ")

dbGetQuery( db , sql2 )

sql3 <- paste( "select pcs.percentile , min(case when cumwt >= totwt * pcs.percentile then hp end) from (" , sql2 , ") AS v , (" ,

paste( "select" , pctiles , " as percentile" , collapse = " union all " ) , " ) pcs group by pcs.percentile" )

library(Hmisc) library(survey) ms <- svydesign(~1,data=mtcars,weight=~wt)

cbind( dbGetQuery( db , sql3 ), wtd.quantile( mtcars$hp , mtcars$wt , pctiles ) , as.numeric(svyquantile(~ hp,ms,pctiles)), as.numeric(svyquantile(~ hp,ms,pctiles,method='constant')))

cbind( dbGetQuery( db , sql3 )[26,], wtd.quantile( mtcars$hp , mtcars$wt , .25 ) , as.numeric(svyquantile(~ hp,ms,.25)), as.numeric(svyquantile(~ hp,ms,.25,method='constant')))

cbind( dbGetQuery( db , sql3 )[51,], wtd.quantile( mtcars$hp , mtcars$wt , .5 ) , as.numeric(svyquantile(~ hp,ms,.5)) , as.numeric(svyquantile(~ hp,ms,.5,method='constant')))

cbind( dbGetQuery( db , sql3 )[76,], wtd.quantile( mtcars$hp , mtcars$wt , .75 ) , as.numeric(svyquantile(~ hp,ms,.75)), as.numeric(svyquantile(~ hp,ms,.75,method='constant')))

On Tue, Dec 13, 2016 at 7:51 AM, imad hajj chahine < imad.hajj.chahine@gmail.com> wrote:

...
Hi Anthony,

Did you manage to implement the weighted percentile? I also have a case where i have the value and the count of each value and need to calculate the percentile. Now I understand why you need the cumsum for this.

Thank you.

On Fri, Apr 29, 2016 at 3:10 PM, Anthony Damico wrote:

...
might be able to use this.. will try.. http://stackoverflow.com/quest ions/2120544/how-to-get-cumulative-sum

On Fri, Apr 29, 2016 at 8:01 AM, Anthony Damico wrote:

...
thanks stefan (and also kai jiang)! i am trying to implement a weighted percentile strategy which uses that code.. i actually don't want the PARTITION BY, i would want it to be ORDER BY instead. the `cumjobs` column needs to be cumulative for a weighted percentile to work-- is there any way to reconfigure that sql command so that the `cumjobs` column cumulatively counts upwards based on the size of the duration column?

http://stackoverflow.com/questions/16450779/another-approach -to-percentiles

On Fri, Apr 29, 2016 at 7:17 AM, Stefan Manegold < Stefan.Manegold@cwi.nl> wrote:

> Anthony, > > while MonetDB indeed does not support this syntax, > I'm also puzzled what kind of semantic or result you'd expect from > this syntax? > > SELECT > batch_id, > job_count, > SUM(job_count) OVER (PARTITION BY duration) as cumjobs, > SUM(job_count) over () as totjobs, > duration > FROM > test_data ; > > The only possible result would IMHO be a > table with as many rows as the input table test_data has, > columns batch_id, job_count, duration identical to the input, > one additional column cumjobs with the same value, i.e., sum > of all job_count within each duration, replicated for all rows > with same duration, > and one additional column totjobs, with the same value (global > sum of all job_count values) replicated for each row. > > If that's what you want, this should do the job: > > SELECT > batch_id, > job_count, > cumjobs, > totjobs, > duration > FROM > test_data AS td > NATURAL JOIN > ( > SELECT > duration, > SUM(job_count) AS cumjobs > FROM > test_data > GROUP BY > duration > ) AS cj > , > ( > SELECT > SUM(job_count) AS totjobs > FROM > test_data > ) AS tj > ; > > > Best, > Stefan > > ----- On Apr 29, 2016, at 9:46 AM, Roberto Cornacchia > roberto.cornacchia@gmail.com wrote: > > > No, indeed, this syntax is not supported. > > > > > > On 29 April 2016 at 09:41, Anthony Damico < ajdamico@gmail.com > > wrote: > > > > > > > > hmm, that gives me the same error.. > > > > SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) > over () AS > > tot_wgt, hp FROM mtcars; > > > > > > Error in .local(conn, statement, ...) : > > Unable to execute statement 'SELECT > > mpg, > > wt, > > SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, > > SUM(wt) over () AS tot_wgt, > > hp > > FROM mtc...'. > > Server says 'ParseException:SQLparser:SELECT: function 'sum' not > found'. > > > > > > sorry if i'm doing something silly.. > > > > > > > > > > On Fri, Apr 29, 2016 at 3:37 AM, Roberto Cornacchia < > > roberto.cornacchia@gmail.com > wrote: > > > > > > > > I think what you are looking for is SUM(..) OVER(PARTITION BY ..) > > > > Otherwise where would the groups for the sum come from? > > > > Roberto > > > > On 29 April 2016 at 09:28, Anthony Damico < ajdamico@gmail.com > > wrote: > > > > > > > > hi, monetdb does not support SUM() OVER commands.. does anyone > have a smart > > alternative to implement this? thanks > > > > > > SELECT > > batch_id, > > job_count, > > SUM(job_count) OVER (ORDER BY duration) as cumjobs, > > SUM(job_count) over () as totjobs, > > duration > > FROM > > test_data ; > > > > > > _______________________________________________ > > users-list mailing list > > users-list@monetdb.org > > https://www.monetdb.org/mailman/listinfo/users-list > > > > > > > > _______________________________________________ > > users-list mailing list > > users-list@monetdb.org > > https://www.monetdb.org/mailman/listinfo/users-list > > > > > > > > _______________________________________________ > > users-list mailing list > > users-list@monetdb.org > > https://www.monetdb.org/mailman/listinfo/users-list > > > > > > > > _______________________________________________ > > users-list mailing list > > users-list@monetdb.org > > https://www.monetdb.org/mailman/listinfo/users-list > > -- > | Stefan.Manegold@CWI.nl | DB Architectures (DA) | > | www.CWI.nl/~manegold/ | Science Park 123 (L321) | > | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) | > _______________________________________________ > users-list mailing list > users-list@monetdb.org > https://www.monetdb.org/mailman/listinfo/users-list >

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

3008

Age (days ago)

3236

Last active (days ago)

List overview

Download

13 comments

5 participants

participants (5)

Anderson, David B
Anthony Damico
imad hajj chahine
Roberto Cornacchia
Stefan Manegold

SUM() OVER (ORDER BY..)

tags

participants (5)