Thank You,
Can I ask how many records your table has and the execution time, as in my
case the first query is taking too much time to execute and consume too
much disk space.
On Tue, Dec 13, 2016 at 3:21 PM, Anthony Damico
hi, yes! this R code presents how the solely-within-monetdb weighted quantile calculation compares to three other R implementations of weighted quantile calculations
# run this line once # install.packages(c("MonetDBLite","DBI","survey","Hmisc"))
pctiles <- seq( 0 , 1 , .01 )
library(DBI) # load the DBI package (implements the R-database coding) library(MonetDBLite) # load MonetDBLite package (creates database files in R)
db <- dbConnect( MonetDBLite() )
# requires a unique identifier because i could not get row_number() working # https://www.monetdb.org/bugzilla/show_bug.cgi?id=4000 mtcars$id <- seq( nrow( mtcars ) )
# store mtcars dbWriteTable( db , 'mtcars' , mtcars )
sql <- "SELECT t1.id, t1.hp , t1.wt , SUM( t2.wt ) AS cumwt FROM mtcars t1 INNER JOIN mtcars t2 ON t1.hp >= t2.hp GROUP BY t1.id , t1.hp , t1.wt"
dbGetQuery( db , sql )
sql2 <- paste("( SELECT row_number() over () AS id , wt , cumwt, totwt, hp FROM mtcars AS td NATURAL JOIN ( " , sql , " ) AS cj , ( SELECT SUM(wt) AS totwt FROM mtcars ) AS tj ) ")
dbGetQuery( db , sql2 )
sql3 <- paste( "select pcs.percentile , min(case when cumwt >= totwt * pcs.percentile then hp end) from (" , sql2 , ") AS v , (" ,
paste( "select" , pctiles , " as percentile" , collapse = " union all " ) , " ) pcs group by pcs.percentile" )
library(Hmisc) library(survey) ms <- svydesign(~1,data=mtcars,weight=~wt)
cbind( dbGetQuery( db , sql3 ), wtd.quantile( mtcars$hp , mtcars$wt , pctiles ) , as.numeric(svyquantile(~ hp,ms,pctiles)), as.numeric(svyquantile(~ hp,ms,pctiles,method='constant')))
cbind( dbGetQuery( db , sql3 )[26,], wtd.quantile( mtcars$hp , mtcars$wt , .25 ) , as.numeric(svyquantile(~ hp,ms,.25)), as.numeric(svyquantile(~ hp,ms,.25,method='constant')))
cbind( dbGetQuery( db , sql3 )[51,], wtd.quantile( mtcars$hp , mtcars$wt , .5 ) , as.numeric(svyquantile(~ hp,ms,.5)) , as.numeric(svyquantile(~ hp,ms,.5,method='constant')))
cbind( dbGetQuery( db , sql3 )[76,], wtd.quantile( mtcars$hp , mtcars$wt , .75 ) , as.numeric(svyquantile(~ hp,ms,.75)), as.numeric(svyquantile(~ hp,ms,.75,method='constant')))
On Tue, Dec 13, 2016 at 7:51 AM, imad hajj chahine < imad.hajj.chahine@gmail.com> wrote:
Hi Anthony,
Did you manage to implement the weighted percentile? I also have a case where i have the value and the count of each value and need to calculate the percentile. Now I understand why you need the cumsum for this.
Thank you.
On Fri, Apr 29, 2016 at 3:10 PM, Anthony Damico
wrote: might be able to use this.. will try.. http://stackoverflow.com/quest ions/2120544/how-to-get-cumulative-sum
On Fri, Apr 29, 2016 at 8:01 AM, Anthony Damico
wrote: thanks stefan (and also kai jiang)! i am trying to implement a weighted percentile strategy which uses that code.. i actually don't want the PARTITION BY, i would want it to be ORDER BY instead. the `cumjobs` column needs to be cumulative for a weighted percentile to work-- is there any way to reconfigure that sql command so that the `cumjobs` column cumulatively counts upwards based on the size of the duration column?
http://stackoverflow.com/questions/16450779/another-approach -to-percentiles
On Fri, Apr 29, 2016 at 7:17 AM, Stefan Manegold < Stefan.Manegold@cwi.nl> wrote:
Anthony,
while MonetDB indeed does not support this syntax, I'm also puzzled what kind of semantic or result you'd expect from this syntax?
SELECT batch_id, job_count, SUM(job_count) OVER (PARTITION BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ;
The only possible result would IMHO be a table with as many rows as the input table test_data has, columns batch_id, job_count, duration identical to the input, one additional column cumjobs with the same value, i.e., sum of all job_count within each duration, replicated for all rows with same duration, and one additional column totjobs, with the same value (global sum of all job_count values) replicated for each row.
If that's what you want, this should do the job:
SELECT batch_id, job_count, cumjobs, totjobs, duration FROM test_data AS td NATURAL JOIN ( SELECT duration, SUM(job_count) AS cumjobs FROM test_data GROUP BY duration ) AS cj , ( SELECT SUM(job_count) AS totjobs FROM test_data ) AS tj ;
Best, Stefan
----- On Apr 29, 2016, at 9:46 AM, Roberto Cornacchia roberto.cornacchia@gmail.com wrote:
No, indeed, this syntax is not supported.
On 29 April 2016 at 09:41, Anthony Damico < ajdamico@gmail.com > wrote:
hmm, that gives me the same error..
SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS tot_wgt, hp FROM mtcars;
Error in .local(conn, statement, ...) : Unable to execute statement 'SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS tot_wgt, hp FROM mtc...'. Server says 'ParseException:SQLparser:SELECT: function 'sum' not found'.
sorry if i'm doing something silly..
On Fri, Apr 29, 2016 at 3:37 AM, Roberto Cornacchia < roberto.cornacchia@gmail.com > wrote:
I think what you are looking for is SUM(..) OVER(PARTITION BY ..)
Otherwise where would the groups for the sum come from?
Roberto
On 29 April 2016 at 09:28, Anthony Damico < ajdamico@gmail.com > wrote:
hi, monetdb does not support SUM() OVER commands.. does anyone have a smart alternative to implement this? thanks
SELECT batch_id, job_count, SUM(job_count) OVER (ORDER BY duration) as cumjobs, SUM(job_count) over () as totjobs, duration FROM test_data ;
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) | _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list