hi, the table in my example has 32 recordsOn Tue, Dec 13, 2016 at 9:16 AM, imad hajj chahine <imad.hajj.chahine@gmail.com> wrote:Thank You,Can I ask how many records your table has and the execution time, as in my case the first query is taking too much time to execute and consume too much disk space.On Tue, Dec 13, 2016 at 3:21 PM, Anthony Damico <ajdamico@gmail.com> wrote:hi, yes! this R code presents how the solely-within-monetdb weighted quantile calculation compares to three other R implementations of weighted quantile calculations
# run this line once
# install.packages(c("MonetDBLite","DBI","survey","Hmisc"))
pctiles <- seq( 0 , 1 , .01 )
library(DBI) # load the DBI package (implements the R-database coding)
library(MonetDBLite) # load MonetDBLite package (creates database files in R)
db <- dbConnect( MonetDBLite() )
# requires a unique identifier because i could not get row_number() working
# https://www.monetdb.org/bugzilla/show_bug.cgi?id=4000
mtcars$id <- seq( nrow( mtcars ) )
# store mtcars
dbWriteTable( db , 'mtcars' , mtcars )
sql <-
"SELECT
t1.id,
t1.hp ,
t1.wt ,
SUM( t2.wt ) AS cumwt
FROM mtcars t1 INNER JOIN mtcars t2 ON t1.hp >= t2.hp
GROUP BY t1.id , t1.hp , t1.wt"
dbGetQuery( db , sql )
sql2 <-
paste("( SELECT
row_number() over () AS id ,
wt ,
cumwt,
totwt,
hp
FROM
mtcars AS td
NATURAL JOIN
(
" , sql , "
) AS cj
,
(
SELECT
SUM(wt) AS totwt
FROM
mtcars
) AS tj ) ")
dbGetQuery( db , sql2 )
sql3 <-
paste(
"select pcs.percentile ,
min(case when cumwt >= totwt * pcs.percentile then hp end)
from (" , sql2 , ") AS v ,
(" ,
paste( "select" , pctiles , " as percentile" , collapse = " union all " ) ,
" ) pcs group by pcs.percentile" )
library(Hmisc)
library(survey)
ms <- svydesign(~1,data=mtcars,weight=~wt)
cbind( dbGetQuery( db , sql3 ),
wtd.quantile( mtcars$hp , mtcars$wt , pctiles ) ,
as.numeric(svyquantile(~ hp,ms,pctiles)),
as.numeric(svyquantile(~ hp,ms,pctiles,method='constant')))
cbind( dbGetQuery( db , sql3 )[26,],
wtd.quantile( mtcars$hp , mtcars$wt , .25 ) ,
as.numeric(svyquantile(~ hp,ms,.25)),
as.numeric(svyquantile(~ hp,ms,.25,method='constant')))
cbind( dbGetQuery( db , sql3 )[51,],
wtd.quantile( mtcars$hp , mtcars$wt , .5 ) ,
as.numeric(svyquantile(~ hp,ms,.5)) ,
as.numeric(svyquantile(~ hp,ms,.5,method='constant')))
cbind( dbGetQuery( db , sql3 )[76,],
wtd.quantile( mtcars$hp , mtcars$wt , .75 ) ,
as.numeric(svyquantile(~ hp,ms,.75)),
as.numeric(svyquantile(~ hp,ms,.75,method='constant')))On Tue, Dec 13, 2016 at 7:51 AM, imad hajj chahine <imad.hajj.chahine@gmail.com> wrote:Hi Anthony,Did you manage to implement the weighted percentile? I also have a case where i have the value and the count of each value and need to calculate the percentile.Now I understand why you need the cumsum for this.Thank you.On Fri, Apr 29, 2016 at 3:10 PM, Anthony Damico <ajdamico@gmail.com> wrote:might be able to use this.. will try.. http://stackoverflow.com/questions/2120544/how-to-get-cumula tive-sum On Fri, Apr 29, 2016 at 8:01 AM, Anthony Damico <ajdamico@gmail.com> wrote:thanks stefan (and also kai jiang)! i am trying to implement a weighted percentile strategy which uses that code.. i actually don't want the PARTITION BY, i would want it to be ORDER BY instead. the `cumjobs` column needs to be cumulative for a weighted percentile to work-- is there any way to reconfigure that sql command so that the `cumjobs` column cumulatively counts upwards based on the size of the duration column?
http://stackoverflow.com/questions/16450779/another-approach -to-percentiles On Fri, Apr 29, 2016 at 7:17 AM, Stefan Manegold <Stefan.Manegold@cwi.nl> wrote:Anthony,
while MonetDB indeed does not support this syntax,
I'm also puzzled what kind of semantic or result you'd expect from this syntax?
SELECT
batch_id,
job_count,
SUM(job_count) OVER (PARTITION BY duration) as cumjobs,
SUM(job_count) over () as totjobs,
duration
FROM
test_data ;
The only possible result would IMHO be a
table with as many rows as the input table test_data has,
columns batch_id, job_count, duration identical to the input,
one additional column cumjobs with the same value, i.e., sum
of all job_count within each duration, replicated for all rows
with same duration,
and one additional column totjobs, with the same value (global
sum of all job_count values) replicated for each row.
If that's what you want, this should do the job:
SELECT
batch_id,
job_count,
cumjobs,
totjobs,
duration
FROM
test_data AS td
NATURAL JOIN
(
SELECT
duration,
SUM(job_count) AS cumjobs
FROM
test_data
GROUP BY
duration
) AS cj
,
(
SELECT
SUM(job_count) AS totjobs
FROM
test_data
) AS tj
;
Best,
Stefan
--
----- On Apr 29, 2016, at 9:46 AM, Roberto Cornacchia roberto.cornacchia@gmail.com wrote:
> No, indeed, this syntax is not supported.
>
>
> On 29 April 2016 at 09:41, Anthony Damico < ajdamico@gmail.com > wrote:
>
>
>
> hmm, that gives me the same error..
>
> SELECT mpg, wt, SUM(wt) OVER (PARTITION BY hp) AS cum_wgt, SUM(wt) over () AS
> tot_wgt, hp FROM mtcars;
>
>
> Error in .local(conn, statement, ...) :
> Unable to execute statement 'SELECT
> mpg,
> wt,
> SUM(wt) OVER (PARTITION BY hp) AS cum_wgt,
> SUM(wt) over () AS tot_wgt,
> hp
> FROM mtc...'.
> Server says 'ParseException:SQLparser:SELECT: function 'sum' not found'.
>
>
> sorry if i'm doing something silly..
>
>
>
>
> On Fri, Apr 29, 2016 at 3:37 AM, Roberto Cornacchia <
> roberto.cornacchia@gmail.com > wrote:
>
>
>
> I think what you are looking for is SUM(..) OVER(PARTITION BY ..)
>
> Otherwise where would the groups for the sum come from?
>
> Roberto
>
> On 29 April 2016 at 09:28, Anthony Damico < ajdamico@gmail.com > wrote:
>
>
>
> hi, monetdb does not support SUM() OVER commands.. does anyone have a smart
> alternative to implement this? thanks
>
>
> SELECT
> batch_id,
> job_count,
> SUM(job_count) OVER (ORDER BY duration) as cumjobs,
> SUM(job_count) over () as totjobs,
> duration
> FROM
> test_data ;
>
>
> _______________________________________________
> users-list mailing list
> users-list@monetdb.org
> https://www.monetdb.org/mailman/listinfo/users-list
>
>
>
> _______________________________________________
> users-list mailing list
> users-list@monetdb.org
> https://www.monetdb.org/mailman/listinfo/users-list
>
>
>
> _______________________________________________
> users-list mailing list
> users-list@monetdb.org
> https://www.monetdb.org/mailman/listinfo/users-list
>
>
>
> _______________________________________________
> users-list mailing list
> users-list@monetdb.org
> https://www.monetdb.org/mailman/listinfo/users-list
| Stefan.Manegold@CWI.nl | DB Architectures (DA) |
| www.CWI.nl/~manegold/ | Science Park 123 (L321) |
| +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list