if you are able to figure out how to define that function, i would appreciate if you could share it.  thanks

On Fri, Apr 29, 2016 at 2:43 PM, Anderson, David B <david.b.anderson@citi.com> wrote:

Thanks. I will look at these.

 

My problem is that my data set is ~1 billion records, and I need to group by several dimensions (so I have ~2000 composite factors) computing weighted averages for some fields along with weighted percentiles (0,25,50,75,100) for a smaller subset of the fields. Currently, I’m looking at the MonetDB internals to see if this is possible.

 

A “perfect” query would look like:

 

SELECT

sum(a*w)/sum(w) as wa_a,

sum(b*w)/sum(w) as wa_b,

weightedpercentile(a,w,0.25) as wp25_a,

weightedpercentile(a,w,0.50) as wp50_a

FROM

mytable

GROUP BY c,d HAVING sum(w) > 0

 

 

 

From: users-list [mailto:users-list-bounces+david.b.anderson=citi.com@monetdb.org] On Behalf Of Anthony Damico
Sent: Friday, April 29, 2016 9:13 AM


To: Communication channel for MonetDB users
Subject: Re: Support for multi-column aggregate functions

 

two alternatives (which might require some disentangling) are

install.packages("sqlsurvey", repos="http://R-Forge.R-project.org")

sqlsurvey:::svyquantile.sqlrepsurvey

sqlsurvey:::svyquantile.sqlsurvey



 

On Fri, Apr 29, 2016 at 8:59 AM, Anthony Damico <ajdamico@gmail.com> wrote:

david, these queries get pretty close to wtd.quantile and svyquantile on a 32-record data set.  if you have lots of records, i think it'd perform even better?




pctiles <- seq( 0 , 1 , .01 )

library(DBI)            # load the DBI package (implements the R-database coding)
library(MonetDB.R)        # load the MonetDB.R package (connects r to a monet database)
library(MonetDBLite)    # load MonetDBLite package (creates database files in R)

db <- dbConnect( MonetDBLite() )

# requires a unique identifier because i could not get row_number() working
# https://www.monetdb.org/bugzilla/show_bug.cgi?id=4000
mtcars$id <- seq( nrow( mtcars ) )

# store mtcars
dbWriteTable( db , 'mtcars' , mtcars )


sql <-
    "SELECT
        t1.id,
        t1.hp ,
        t1.wt ,
        SUM( t2.wt ) AS cumwt
    FROM mtcars t1 INNER JOIN mtcars t2 ON t1.hp >= t2.hp
    GROUP BY t1.id , t1.hp , t1.wt"

dbGetQuery( db , sql )

sql2 <-
    paste("( SELECT
            row_number() over () AS id ,
            wt ,
            cumwt,
            totwt,
            hp
    FROM
            mtcars AS td
            NATURAL JOIN
            (
                    " , sql , "
            ) AS cj
            ,
            (
                    SELECT
                            SUM(wt) AS totwt
                    FROM
                            mtcars
            ) AS tj ) ")


dbGetQuery( db , sql2 )

sql3 <-
    paste(
        "select pcs.percentile ,
            min(case when cumwt >= totwt * pcs.percentile then hp end)
            from (" , sql2 , ") AS v ,
     (" ,
     
     paste( "select" , pctiles , " as percentile" , collapse = " union all " ) ,
     " ) pcs group by pcs.percentile" )


library(Hmisc)
library(survey)
ms <- svydesign(~1,data=mtcars,weight=~wt)

cbind( dbGetQuery( db , sql3 ),
wtd.quantile( mtcars$hp , mtcars$wt , pctiles ) ,
as.numeric(svyquantile(~ hp,ms,pctiles)))







 

On Wed, Apr 27, 2016 at 11:07 AM, Anderson, David B <david.b.anderson@citi.com> wrote:

I was hoping to use a native/low level implementation for high performance (my data set is on the order of 2.  It looks like multiple column aggregates are supported (corr & covar) so in theory I could add a new function.

 

Poking around in the archives, it looks like I need to add a new MAL and link it to a C implementation.

 

https://www.monetdb.org/pipermail/developers-list/2015-December/004553.html

 

Are there any examples on how to do this?

 

From: users-list [mailto:users-list-bounces+david.b.anderson=citi.com@monetdb.org] On Behalf Of Anthony Damico
Sent: Wednesday, April 27, 2016 9:47 AM
To: Communication channel for MonetDB users
Subject: Re: Support for multi-column aggregate functions

 

possible to translate one of these?

 



On Wednesday, April 27, 2016, Anderson, David B <david.b.anderson@citi.com> wrote:

I am really looking for an implementation of a weighted percentile within a group.

-----Original Message-----
From: Anderson, David B [ICG-MKTS]
Sent: Wednesday, April 27, 2016 9:18 AM
To: users-list@monetdb.org
Subject: Support for multi-column aggregate functions

All,

Does the current version of MonetDB support multi-column aggregate functions? I need to create a function which returns the "quantiles" of one column using another column as a weighting. It looks like PostgreSQL supports multi-column aggregates as of 8.2.

I could write an embedded R function to compute this, but want to know if it is even possible to pass in multiple columns.

Thanks,
Dave
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list


_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list

 

 


_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list