I think its not the optimal way to calculate the running sum, as you will have performance issues on large datasets.

On Tue, Dec 13, 2016 at 4:19 PM, Anthony Damico <ajdamico@gmail.com> wrote:
hi, the table in my example has 32 records

On Tue, Dec 13, 2016 at 9:16 AM, imad hajj chahine <imad.hajj.chahine@gmail.com> wrote:
Thank You,

Can I ask how many records your table has and the execution time, as in my case the first query is taking too much time to execute and consume too much disk space.

On Tue, Dec 13, 2016 at 3:21 PM, Anthony Damico <ajdamico@gmail.com> wrote:
hi, yes!  this R code presents how the solely-within-monetdb weighted quantile calculation compares to three other R implementations of weighted quantile calculations




    # run this line once
    # install.packages(c("MonetDBLite","DBI","survey","Hmisc"))


    pctiles <- seq( 0 , 1 , .01 )

    library(DBI)            # load the DBI package (implements the R-database coding)
    library(MonetDBLite)    # load MonetDBLite package (creates database files in R)

    db <- dbConnect( MonetDBLite() )

    # requires a unique identifier because i could not get row_number() working
    # https://www.monetdb.org/bugzilla/show_bug.cgi?id=4000
    mtcars$id <- seq( nrow( mtcars ) )

    # store mtcars
    dbWriteTable( db , 'mtcars' , mtcars )


    sql <-
        "SELECT
            t1.id,
            t1.hp ,
            t1.wt ,
            SUM( t2.wt ) AS cumwt
        FROM mtcars t1 INNER JOIN mtcars t2 ON t1.hp >= t2.hp
        GROUP BY t1.id , t1.hp , t1.wt"

    dbGetQuery( db , sql )

    sql2 <-
        paste("( SELECT
                row_number() over () AS id ,
                wt ,
                cumwt,
                totwt,
                hp
        FROM
                mtcars AS td
                NATURAL JOIN
                (
                        " , sql , "
                ) AS cj
                ,
                (
                        SELECT
                                SUM(wt) AS totwt
                        FROM
                                mtcars
                ) AS tj ) ")


    dbGetQuery( db , sql2 )

    sql3 <-
        paste(
            "select pcs.percentile ,
                min(case when cumwt >= totwt * pcs.percentile then hp end)
                from (" , sql2 , ") AS v ,
         (" ,
         
         paste( "select" , pctiles , " as percentile" , collapse = " union all " ) ,
         " ) pcs group by pcs.percentile" )


    library(Hmisc)
    library(survey)
    ms <- svydesign(~1,data=mtcars,weight=~wt)


    cbind( dbGetQuery( db , sql3 ),
    wtd.quantile( mtcars$hp , mtcars$wt , pctiles ) ,
    as.numeric(svyquantile(~ hp,ms,pctiles)),
    as.numeric(svyquantile(~ hp,ms,pctiles,method='constant')))





    cbind( dbGetQuery( db , sql3 )[26,],
    wtd.quantile( mtcars$hp , mtcars$wt , .25 ) ,
    as.numeric(svyquantile(~ hp,ms,.25)),
    as.numeric(svyquantile(~ hp,ms,.25,method='constant')))


    cbind( dbGetQuery( db , sql3 )[51,],
    wtd.quantile( mtcars$hp , mtcars$wt , .5 ) ,
    as.numeric(svyquantile(~ hp,ms,.5)) ,
    as.numeric(svyquantile(~ hp,ms,.5,method='constant')))


    cbind( dbGetQuery( db , sql3 )[76,],
    wtd.quantile( mtcars$hp , mtcars$wt , .75 ) ,
    as.numeric(svyquantile(~ hp,ms,.75)),
    as.numeric(svyquantile(~ hp,ms,.75,method='constant')))



On Tue, Dec 13, 2016 at 7:51 AM, imad hajj chahine <imad.hajj.chahine@gmail.com> wrote:
Hi Anthony,

Did you manage to implement the weighted percentile? I also have a case where i have the value and the count of each value and need to calculate the percentile.
Now I understand why you need the cumsum for this.

Thank you.



On Fri, Apr 29, 2016 at 3:10 PM, Anthony Damico <ajdamico@gmail.com> wrote:
might be able to use this.. will try..  http://stackoverflow.com/questions/2120544/how-to-get-cumulative-sum


On Fri, Apr 29, 2016 at 8:01 AM, Anthony Damico <ajdamico@gmail.com> wrote:
thanks stefan (and also kai jiang)!  i am trying to implement a weighted percentile strategy which uses that code.. i actually don't want the PARTITION BY, i would want it to be ORDER BY instead.  the `cumjobs` column needs to be cumulative for a weighted percentile to work--   is there any way to reconfigure that sql command so that the `cumjobs` column cumulatively counts upwards based on the size of the duration column?

http://stackoverflow.com/questions/16450779/another-approach-to-percentiles

On Fri, Apr 29, 2016 at 7:17 AM, Stefan Manegold <Stefan.Manegold@cwi.nl> wrote:
Anthony,

while MonetDB indeed does not support this syntax,
I'm also puzzled what kind of semantic or result you'd expect from this syntax?

    SELECT
        batch_id,
        job_count,
        SUM(job_count) OVER (PARTITION BY duration) as cumjobs,
        SUM(job_count) over () as totjobs,
        duration
    FROM
        test_data ;

The only possible result would IMHO be a
table with as many rows as the input table test_data has,
columns batch_id, job_count, duration identical to the input,
one additional column cumjobs with the same value, i.e., sum
of all job_count within each duration, replicated for all rows
with same duration,
and one additional column totjobs, with the same value (global
sum of all job_count values) replicated for each row.

If that's what you want, this should do the job:

SELECT
        batch_id,
        job_count,
        cumjobs,
        totjobs,
        duration
FROM
        test_data AS td
        NATURAL JOIN
        (
                SELECT
                        duration,
                        SUM(job_count) AS cumjobs
                FROM
                        test_data
                GROUP BY
                        duration
        ) AS cj
        ,
        (
                SELECT
                        SUM(job_count) AS totjobs
                FROM
                        test_data
        ) AS tj
;


Best,
Stefan

----- On Apr 29, 2016, at 9:46 AM, Roberto Cornacchia roberto.cornacchia@gmail.com wrote:

> No, indeed, this syntax is not supported.
>
>
> On 29 April 2016 at 09:41, Anthony Damico < ajdamico@gmail.com > wrote:
>
>
>
> hmm, that gives me the same error..
>
> SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS
> tot_wgt, hp FROM mtcars;
>
>
> Error in .local(conn, statement, ...) :
> Unable to execute statement 'SELECT
> mpg,
> wt,
> SUM(wt) OVER (PARTITION BY hp) AS cum_wgt,
> SUM(wt) over () AS tot_wgt,
> hp
> FROM mtc...'.
> Server says 'ParseException:SQLparser:SELECT: function 'sum' not found'.
>
>
> sorry if i'm doing something silly..
>
>
>
>
> On Fri, Apr 29, 2016 at 3:37 AM, Roberto Cornacchia <
> roberto.cornacchia@gmail.com > wrote:
>
>
>
> I think what you are looking for is SUM(..) OVER(PARTITION BY ..)
>
> Otherwise where would the groups for the sum come from?
>
> Roberto
>
> On 29 April 2016 at 09:28, Anthony Damico < ajdamico@gmail.com > wrote:
>
>
>
> hi, monetdb does not support SUM() OVER commands.. does anyone have a smart
> alternative to implement this? thanks
>
>
> SELECT
> batch_id,
> job_count,
> SUM(job_count) OVER (ORDER BY duration) as cumjobs,
> SUM(job_count) over () as totjobs,
> duration
> FROM
> test_data ;
>
>
> _______________________________________________
> users-list mailing list
> users-list@monetdb.org
> https://www.monetdb.org/mailman/listinfo/users-list
>
>
>
> _______________________________________________
> users-list mailing list
> users-list@monetdb.org
> https://www.monetdb.org/mailman/listinfo/users-list
>
>
>
> _______________________________________________
> users-list mailing list
> users-list@monetdb.org
> https://www.monetdb.org/mailman/listinfo/users-list
>
>
>
> _______________________________________________
> users-list mailing list
> users-list@monetdb.org
> https://www.monetdb.org/mailman/listinfo/users-list

--
| Stefan.Manegold@CWI.nl | DB Architectures   (DA) |
| www.CWI.nl/~manegold/  | Science Park 123 (L321) |
| +31 (0)20 592-4212     | 1098 XG Amsterdam  (NL) |
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list



_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list



_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list



_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list



_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list



_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list