hi, yes! this R code presents how the solely-within-monetdb weighted quantile calculation compares to three other R implementations of weighted quantile calculations
# run this line once
# install.packages(c("MonetDBLite","DBI","survey"," Hmisc"))
pctiles <- seq( 0 , 1 , .01 )
library(DBI) # load the DBI package (implements the R-database coding)
library(MonetDBLite) # load MonetDBLite package (creates database files in R)
db <- dbConnect( MonetDBLite() )
# requires a unique identifier because i could not get row_number() working
# https://www.monetdb.org/bugzilla/show_bug.cgi?id=4000
mtcars$id <- seq( nrow( mtcars ) )
# store mtcars
dbWriteTable( db , 'mtcars' , mtcars )
sql <-
"SELECT
t1.id,
t1.hp ,
t1.wt ,
SUM( t2.wt ) AS cumwt
FROM mtcars t1 INNER JOIN mtcars t2 ON t1.hp >= t2.hp
GROUP BY t1.id , t1.hp , t1.wt"
dbGetQuery( db , sql )
sql2 <-
paste("( SELECT
row_number() over () AS id ,
wt ,
cumwt,
totwt,
hp
FROM
mtcars AS td
NATURAL JOIN
(
" , sql , "
) AS cj
,
(
SELECT
SUM(wt) AS totwt
FROM
mtcars
) AS tj ) ")
dbGetQuery( db , sql2 )
sql3 <-
paste(
"select pcs.percentile ,
min(case when cumwt >= totwt * pcs.percentile then hp end)
from (" , sql2 , ") AS v ,
(" ,
paste( "select" , pctiles , " as percentile" , collapse = " union all " ) ,
" ) pcs group by pcs.percentile" )
library(Hmisc)
library(survey)
ms <- svydesign(~1,data=mtcars,weight=~wt)
cbind( dbGetQuery( db , sql3 ),
wtd.quantile( mtcars$hp , mtcars$wt , pctiles ) ,
as.numeric(svyquantile(~ hp,ms,pctiles)),
as.numeric(svyquantile(~ hp,ms,pctiles,method='constant')))
cbind( dbGetQuery( db , sql3 )[26,],
wtd.quantile( mtcars$hp , mtcars$wt , .25 ) ,
as.numeric(svyquantile(~ hp,ms,.25)),
as.numeric(svyquantile(~ hp,ms,.25,method='constant')))
cbind( dbGetQuery( db , sql3 )[51,],
wtd.quantile( mtcars$hp , mtcars$wt , .5 ) ,
as.numeric(svyquantile(~ hp,ms,.5)) ,
as.numeric(svyquantile(~ hp,ms,.5,method='constant')))
cbind( dbGetQuery( db , sql3 )[76,],
wtd.quantile( mtcars$hp , mtcars$wt , .75 ) ,
as.numeric(svyquantile(~ hp,ms,.75)),
as.numeric(svyquantile(~ hp,ms,.75,method='constant')))On Tue, Dec 13, 2016 at 7:51 AM, imad hajj chahine <imad.hajj.chahine@gmail.com> wrote:Hi Anthony,Did you manage to implement the weighted percentile? I also have a case where i have the value and the count of each value and need to calculate the percentile.Now I understand why you need the cumsum for this.Thank you.On Fri, Apr 29, 2016 at 3:10 PM, Anthony Damico <ajdamico@gmail.com> wrote:might be able to use this.. will try.. http://stackoverflow.com/questions/2120544/how-to-get-cumula tive-sum On Fri, Apr 29, 2016 at 8:01 AM, Anthony Damico <ajdamico@gmail.com> wrote:thanks stefan (and also kai jiang)! i am trying to implement a weighted percentile strategy which uses that code.. i actually don't want the PARTITION BY, i would want it to be ORDER BY instead. the `cumjobs` column needs to be cumulative for a weighted percentile to work-- is there any way to reconfigure that sql command so that the `cumjobs` column cumulatively counts upwards based on the size of the duration column?
http://stackoverflow.com/questions/16450779/another-approach -to-percentiles On Fri, Apr 29, 2016 at 7:17 AM, Stefan Manegold <Stefan.Manegold@cwi.nl> wrote:Anthony,
while MonetDB indeed does not support this syntax,
I'm also puzzled what kind of semantic or result you'd expect from this syntax?
SELECT
batch_id,
job_count,
SUM(job_count) OVER (PARTITION BY duration) as cumjobs,
SUM(job_count) over () as totjobs,
duration
FROM
test_data ;
The only possible result would IMHO be a
table with as many rows as the input table test_data has,
columns batch_id, job_count, duration identical to the input,
one additional column cumjobs with the same value, i.e., sum
of all job_count within each duration, replicated for all rows
with same duration,
and one additional column totjobs, with the same value (global
sum of all job_count values) replicated for each row.
If that's what you want, this should do the job:
SELECT
batch_id,
job_count,
cumjobs,
totjobs,
duration
FROM
test_data AS td
NATURAL JOIN
(
SELECT
duration,
SUM(job_count) AS cumjobs
FROM
test_data
GROUP BY
duration
) AS cj
,
(
SELECT
SUM(job_count) AS totjobs
FROM
test_data
) AS tj
;
Best,
Stefan
--
----- On Apr 29, 2016, at 9:46 AM, Roberto Cornacchia roberto.cornacchia@gmail.com wrote:
> No, indeed, this syntax is not supported.
>
>
> On 29 April 2016 at 09:41, Anthony Damico < ajdamico@gmail.com > wrote:
>
>
>
> hmm, that gives me the same error..
>
> SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS
> tot_wgt, hp FROM mtcars;
>
>
> Error in .local(conn, statement, ...) :
> Unable to execute statement 'SELECT
> mpg,
> wt,
> SUM(wt) OVER (PARTITION BY hp) AS cum_wgt,
> SUM(wt) over () AS tot_wgt,
> hp
> FROM mtc...'.
> Server says 'ParseException:SQLparser:SELECT: function 'sum' not found'.
>
>
> sorry if i'm doing something silly..
>
>
>
>
> On Fri, Apr 29, 2016 at 3:37 AM, Roberto Cornacchia <
> roberto.cornacchia@gmail.com > wrote:
>
>
>
> I think what you are looking for is SUM(..) OVER(PARTITION BY ..)
>
> Otherwise where would the groups for the sum come from?
>
> Roberto
>
> On 29 April 2016 at 09:28, Anthony Damico < ajdamico@gmail.com > wrote:
>
>
>
> hi, monetdb does not support SUM() OVER commands.. does anyone have a smart
> alternative to implement this? thanks
>
>
> SELECT
> batch_id,
> job_count,
> SUM(job_count) OVER (ORDER BY duration) as cumjobs,
> SUM(job_count) over () as totjobs,
> duration
> FROM
> test_data ;
>
>
> _______________________________________________
> users-list mailing list
> users-list@monetdb.org
> https://www.monetdb.org/mailman/listinfo/users-list
>
>
>
> _______________________________________________
> users-list mailing list
> users-list@monetdb.org
> https://www.monetdb.org/mailman/listinfo/users-list
>
>
>
> _______________________________________________
> users-list mailing list
> users-list@monetdb.org
> https://www.monetdb.org/mailman/listinfo/users-list
>
>
>
> _______________________________________________
> users-list mailing list
> users-list@monetdb.org
> https://www.monetdb.org/mailman/listinfo/users-list
| Stefan.Manegold@CWI.nl | DB Architectures (DA) |
| www.CWI.nl/~manegold/ | Science Park 123 (L321) |
| +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list