Thanks. I will look at these.
My problem is that my data set is ~1 billion records, and I need to group by several dimensions (so I have ~2000 composite factors) computing weighted averages for some fields along with weighted percentiles (0,25,50,75,100) for a smaller subset of the fields. Currently, I’m looking at the MonetDB internals to see if this is possible.
A “perfect” query would look like:
SELECT
sum(a*w)/sum(w) as wa_a,
sum(b*w)/sum(w) as wa_b,
weightedpercentile(a,w,0.25) as wp25_a,
weightedpercentile(a,w,0.50) as wp50_a
FROM
mytable
GROUP BY c,d HAVING sum(w) > 0
From: users-list [mailto:users-list-bounces+david.b.anderson=citi.com@monetdb.org] On Behalf Of Anthony Damico
Sent: Friday, April 29, 2016 9:13 AM
To: Communication channel for MonetDB users
Subject: Re: Support for multi-column aggregate functions
two alternatives (which might require some disentangling) are
install.packages("sqlsurvey", repos="http://R-Forge.R-project.orghttps://urldefense.proofpoint.com/v2/url?u=http-3A__R-2DForge.R-2Dproject.org&d=BQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=8_GYjk1edsyLJlNaxMxYxBJsviF3JXYvwDK42uy5KWU&m=ej5p-XcFExUgP_KjtEotq-tyDpwFVxJ-xdoHS4LGoh4&s=jRfVH1kUI4OVngyLt6WSKZEFTBpSeL1GOKOpjRQQAO4&e=")
sqlsurvey:::svyquantile.sqlrepsurvey
sqlsurvey:::svyquantile.sqlsurvey
On Fri, Apr 29, 2016 at 8:59 AM, Anthony Damico mailto:ajdamico@gmail.com> wrote:
david, these queries get pretty close to wtd.quantile and svyquantile on a 32-record data set. if you have lots of records, i think it'd perform even better?
pctiles <- seq( 0 , 1 , .01 )
library(DBI) # load the DBI package (implements the R-database coding)
library(MonetDB.R) # load the MonetDB.R package (connects r to a monet database)
library(MonetDBLite) # load MonetDBLite package (creates database files in R)
db <- dbConnect( MonetDBLite() )
# requires a unique identifier because i could not get row_number() working
# https://www.monetdb.org/bugzilla/show_bug.cgi?id=4000https://urldefense.proofpoint.com/v2/url?u=https-3A__www.monetdb.org_bugzilla_show-5Fbug.cgi-3Fid-3D4000&d=BQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=8_GYjk1edsyLJlNaxMxYxBJsviF3JXYvwDK42uy5KWU&m=ej5p-XcFExUgP_KjtEotq-tyDpwFVxJ-xdoHS4LGoh4&s=OdhZqkKiM9K6szPKPt2QsjZQ423hv1Fawe-p_zS5EyM&e=
mtcars$id <- seq( nrow( mtcars ) )
# store mtcars
dbWriteTable( db , 'mtcars' , mtcars )
sql <-
"SELECT
t1.idhttps://urldefense.proofpoint.com/v2/url?u=http-3A__t1.id&d=BQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=8_GYjk1edsyLJlNaxMxYxBJsviF3JXYvwDK42uy5KWU&m=ej5p-XcFExUgP_KjtEotq-tyDpwFVxJ-xdoHS4LGoh4&s=4-JSp4ejJ7-Q1lgKW66qDGHvMTaPfTcb-Br0MuJaidM&e=,
t1.hp ,
t1.wt ,
SUM( t2.wt ) AS cumwt
FROM mtcars t1 INNER JOIN mtcars t2 ON t1.hp >= t2.hp
GROUP BY t1.idhttps://urldefense.proofpoint.com/v2/url?u=http-3A__t1.id&d=BQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=8_GYjk1edsyLJlNaxMxYxBJsviF3JXYvwDK42uy5KWU&m=ej5p-XcFExUgP_KjtEotq-tyDpwFVxJ-xdoHS4LGoh4&s=4-JSp4ejJ7-Q1lgKW66qDGHvMTaPfTcb-Br0MuJaidM&e= , t1.hp , t1.wt"
dbGetQuery( db , sql )
sql2 <-
paste("( SELECT
row_number() over () AS id ,
wt ,
cumwt,
totwt,
hp
FROM
mtcars AS td
NATURAL JOIN
(
" , sql , "
) AS cj
,
(
SELECT
SUM(wt) AS totwt
FROM
mtcars
) AS tj ) ")
dbGetQuery( db , sql2 )
sql3 <-
paste(
"select pcs.percentile ,
min(case when cumwt >= totwt * pcs.percentile then hp end)
from (" , sql2 , ") AS v ,
(" ,
paste( "select" , pctiles , " as percentile" , collapse = " union all " ) ,
" ) pcs group by pcs.percentile" )
library(Hmisc)
library(survey)
ms <- svydesign(~1,data=mtcars,weight=~wt)
cbind( dbGetQuery( db , sql3 ),
wtd.quantile( mtcars$hp , mtcars$wt , pctiles ) ,
as.numeric(svyquantile(~ hp,ms,pctiles)))
On Wed, Apr 27, 2016 at 11:07 AM, Anderson, David B mailto:david.b.anderson@citi.com> wrote:
I was hoping to use a native/low level implementation for high performance (my data set is on the order of 2. It looks like multiple column aggregates are supported (corr & covar) so in theory I could add a new function.
Poking around in the archives, it looks like I need to add a new MAL and link it to a C implementation.
https://www.monetdb.org/pipermail/developers-list/2015-December/004553.htmlhttps://urldefense.proofpoint.com/v2/url?u=https-3A__www.monetdb.org_pipermail_developers-2Dlist_2015-2DDecember_004553.html&d=BQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=8_GYjk1edsyLJlNaxMxYxBJsviF3JXYvwDK42uy5KWU&m=ej5p-XcFExUgP_KjtEotq-tyDpwFVxJ-xdoHS4LGoh4&s=cMiWT4sRgzhb9yWo1raPOchSYFxsA4Bu4rH6nZsXv3I&e=
Are there any examples on how to do this?
From: users-list [mailto:users-list-bounces+david.b.andersonmailto:users-list-bounces%2Bdavid.b.anderson=citi.com@monetdb.orgmailto:citi.com@monetdb.org] On Behalf Of Anthony Damico
Sent: Wednesday, April 27, 2016 9:47 AM
To: Communication channel for MonetDB users
Subject: Re: Support for multi-column aggregate functions
possible to translate one of these?
http://stackoverflow.com/questions/16450779/another-approach-to-percentileshttps://urldefense.proofpoint.com/v2/url?u=http-3A__stackoverflow.com_questions_16450779_another-2Dapproach-2Dto-2Dpercentiles&d=BQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=8_GYjk1edsyLJlNaxMxYxBJsviF3JXYvwDK42uy5KWU&m=VPKaqqy1Sox6hJP3RD8x9lYAWs4sDr5tN4jtb0tElwI&s=4T2ole-AWFy74ddKmo_2X7-1Okti6t0SGcrP3Pg1ttw&e=
On Wednesday, April 27, 2016, Anderson, David B mailto:david.b.anderson@citi.com> wrote:
I am really looking for an implementation of a weighted percentile within a group.
-----Original Message-----
From: Anderson, David B [ICG-MKTS]
Sent: Wednesday, April 27, 2016 9:18 AM
To: users-list@monetdb.orgmailto:users-list@monetdb.org
Subject: Support for multi-column aggregate functions
All,
Does the current version of MonetDB support multi-column aggregate functions? I need to create a function which returns the "quantiles" of one column using another column as a weighting. It looks like PostgreSQL supports multi-column aggregates as of 8.2.
I could write an embedded R function to compute this, but want to know if it is even possible to pass in multiple columns.
Thanks,
Dave
_______________________________________________
users-list mailing list
users-list@monetdb.orgmailto:users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-listhttps://urldefense.proofpoint.com/v2/url?u=https-3A__www.monetdb.org_mailman_listinfo_users-2Dlist&d=BQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=8_GYjk1edsyLJlNaxMxYxBJsviF3JXYvwDK42uy5KWU&m=VPKaqqy1Sox6hJP3RD8x9lYAWs4sDr5tN4jtb0tElwI&s=T43LwaiW3foyJ1zGqAUQo9P1My84NzY92cS5wHK1xeA&e=
_______________________________________________
users-list mailing list
users-list@monetdb.orgmailto:users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-listhttps://urldefense.proofpoint.com/v2/url?u=https-3A__www.monetdb.org_mailman_listinfo_users-2Dlist&d=BQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=8_GYjk1edsyLJlNaxMxYxBJsviF3JXYvwDK42uy5KWU&m=ej5p-XcFExUgP_KjtEotq-tyDpwFVxJ-xdoHS4LGoh4&s=FvzQxF2GuIV0fh_pivsgjq0sdmigvtu-gXDX9TmH4ek&e=