
hi, yes! this R code presents how the solely-within-monetdb weighted quantile calculation compares to three other R implementations of weighted quantile calculations # run this line once # install.packages(c("MonetDBLite","DBI","survey","Hmisc")) pctiles <- seq( 0 , 1 , .01 ) library(DBI) # load the DBI package (implements the R-database coding) library(MonetDBLite) # load MonetDBLite package (creates database files in R) db <- dbConnect( MonetDBLite() ) # requires a unique identifier because i could not get row_number() working # https://www.monetdb.org/bugzilla/show_bug.cgi?id=4000 mtcars$id <- seq( nrow( mtcars ) ) # store mtcars dbWriteTable( db , 'mtcars' , mtcars ) sql <- "SELECT t1.id, t1.hp , t1.wt , SUM( t2.wt ) AS cumwt FROM mtcars t1 INNER JOIN mtcars t2 ON t1.hp >= t2.hp GROUP BY t1.id , t1.hp , t1.wt" dbGetQuery( db , sql ) sql2 <- paste("( SELECT row_number() over () AS id , wt , cumwt, totwt, hp FROM mtcars AS td NATURAL JOIN ( " , sql , " ) AS cj , ( SELECT SUM(wt) AS totwt FROM mtcars ) AS tj ) ") dbGetQuery( db , sql2 ) sql3 <- paste( "select pcs.percentile , min(case when cumwt >= totwt * pcs.percentile then hp end) from (" , sql2 , ") AS v , (" , paste( "select" , pctiles , " as percentile" , collapse = " union all " ) , " ) pcs group by pcs.percentile" ) library(Hmisc) library(survey) ms <- svydesign(~1,data=mtcars,weight=~wt) cbind( dbGetQuery( db , sql3 ), wtd.quantile( mtcars$hp , mtcars$wt , pctiles ) , as.numeric(svyquantile(~ hp,ms,pctiles)), as.numeric(svyquantile(~ hp,ms,pctiles,method='constant'))) cbind( dbGetQuery( db , sql3 )[26,], wtd.quantile( mtcars$hp , mtcars$wt , .25 ) , as.numeric(svyquantile(~ hp,ms,.25)), as.numeric(svyquantile(~ hp,ms,.25,method='constant'))) cbind( dbGetQuery( db , sql3 )[51,], wtd.quantile( mtcars$hp , mtcars$wt , .5 ) , as.numeric(svyquantile(~ hp,ms,.5)) , as.numeric(svyquantile(~ hp,ms,.5,method='constant'))) cbind( dbGetQuery( db , sql3 )[76,], wtd.quantile( mtcars$hp , mtcars$wt , .75 ) , as.numeric(svyquantile(~ hp,ms,.75)), as.numeric(svyquantile(~ hp,ms,.75,method='constant'))) On Tue, Dec 13, 2016 at 7:51 AM, imad hajj chahine < imad.hajj.chahine@gmail.com> wrote:
Hi Anthony,
Did you manage to implement the weighted percentile? I also have a case where i have the value and the count of each value and need to calculate the percentile. Now I understand why you need the cumsum for this.
Thank you.
On Fri, Apr 29, 2016 at 3:10 PM, Anthony Damico
wrote: might be able to use this.. will try.. http://stackoverflow.com/quest ions/2120544/how-to-get-cumulative-sum
On Fri, Apr 29, 2016 at 8:01 AM, Anthony Damico
wrote: thanks stefan (and also kai jiang)! i am trying to implement a weighted percentile strategy which uses that code.. i actually don't want the PARTITION BY, i would want it to be ORDER BY instead. the `cumjobs` column needs to be cumulative for a weighted percentile to work-- is there any way to reconfigure that sql command so that the `cumjobs` column cumulatively counts upwards based on the size of the duration column?
http://stackoverflow.com/questions/16450779/another-approach -to-percentiles
On Fri, Apr 29, 2016 at 7:17 AM, Stefan Manegold
wrote:
Anthony,
while MonetDB indeed does not support this syntax, I'm also puzzled what kind of semantic or result you'd expect from this syntax?
SELECT batch_id, job_count, SUM(job_count) OVER (PARTITION BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ;
The only possible result would IMHO be a table with as many rows as the input table test_data has, columns batch_id, job_count, duration identical to the input, one additional column cumjobs with the same value, i.e., sum of all job_count within each duration, replicated for all rows with same duration, and one additional column totjobs, with the same value (global sum of all job_count values) replicated for each row.
If that's what you want, this should do the job:
SELECT batch_id, job_count, cumjobs, totjobs, duration FROM test_data AS td NATURAL JOIN ( SELECT duration, SUM(job_count) AS cumjobs FROM test_data GROUP BY duration ) AS cj , ( SELECT SUM(job_count) AS totjobs FROM test_data ) AS tj ;
Best, Stefan
----- On Apr 29, 2016, at 9:46 AM, Roberto Cornacchia roberto.cornacchia@gmail.com wrote:
No, indeed, this syntax is not supported.
On 29 April 2016 at 09:41, Anthony Damico < ajdamico@gmail.com > wrote:
hmm, that gives me the same error..
SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS tot_wgt, hp FROM mtcars;
Error in .local(conn, statement, ...) : Unable to execute statement 'SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS tot_wgt, hp FROM mtc...'. Server says 'ParseException:SQLparser:SELECT: function 'sum' not found'.
sorry if i'm doing something silly..
On Fri, Apr 29, 2016 at 3:37 AM, Roberto Cornacchia < roberto.cornacchia@gmail.com > wrote:
I think what you are looking for is SUM(..) OVER(PARTITION BY ..)
Otherwise where would the groups for the sum come from?
Roberto
On 29 April 2016 at 09:28, Anthony Damico < ajdamico@gmail.com > wrote:
hi, monetdb does not support SUM() OVER commands.. does anyone have a smart alternative to implement this? thanks
SELECT batch_id, job_count, SUM(job_count) OVER (ORDER BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ;
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) | _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list