I am running attached script, which I pieced together from your emails. This does not run, it produces the following error:
Error running R expression. Error message: Error in t.test.formula(arg1 ~ arg2) :
grouping factor must have exactly 2 levels
Calls: as.data.frame -> <Anonymous> -> t.test -> t.test.formula
In general, for aggregations, the R code needs to return a vector with a single entry for each group.
Best,
Hannes
> On 19 Jun 2015, at 09:44, imad hajj chahine <imad.hajj.chahine@gmail.com> wrote:
>
> Thanks, if you need more information from my side let me know.
>
> Also can you answer me on the following when i use 2 functions one to retrieve Pvalue and one for the stat value:
> Am I doing the correct way to retrieve the stat and p-value, is the ttest get executed twice in this case? is there a more optimal way where i can retrieve both information.
>
> On Fri, Jun 19, 2015 at 10:39 AM, Hannes Mühleisen <Hannes.Muehleisen@cwi.nl> wrote:
> Looking into this at the moment.
>
> > On 16 Jun 2015, at 09:56, Dimitar Nedev <dimitar.nedev@monetdbsolutions.com> wrote:
> >
> > Hannes, do you know if the STD_DEV is the issue there or if the queries should actually yield different results?
> >
> > Dimitar
> >
> >> On 2015-Jun-15, at 13:28 , imad hajj chahine <imad.hajj.chahine@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> Shouldn't we have the same result from running the first query and the second query?
> >> after all its an aggregate function and should return the test for each group.
> >>
> >> I checked that monetdb had problem with running group by with statistics functions like STD_DEV, are these 2 issues related?
> >>
> >> Thank you.
> >>
> >>
> >>
> >>
> >> On Mon, Jun 15, 2015 at 1:52 PM, Dimitar Nedev <dimitar.nedev@monetdbsolutions.com> wrote:
> >> Hi Imad,
> >>
> >> Here is the output I got from running the queries. Note, this might not be entirely representative since I did clean-up some data before loading it.
> >>
> >> sql>select "2-Age", "tone"."ttest2"("9-Score",6.08963) from "tone"."Marketing_Loyalty_4700298d-9862-40b3-9028-b0f15dab9dea" where "2-Age" is not null group by "2-Age";
> >> +---------+--------------------------+
> >> | 2-Age | L1 |
> >> +=========+==========================+
> >> | [10-20] | 0.89311103869734754 |
> >> +---------+--------------------------+
> >> 3 tuples (3.218ms)
> >>
> >> sql>select '[10-20]' as "2-Age","tone"."ttest2"("9-Score",6.08963) from "tone"."Marketing_Loyalty_4700298d-9862-40b3-9028-b0f15dab9dea" where "2-Age" = '[10-20]'
> >> more>union all
> >> more>select '[21-30]' as "2-Age","tone"."ttest2"("9-Score",6.08963) from "tone"."Marketing_Loyalty_4700298d-9862-40b3-9028-b0f15dab9dea" where "2-Age" = '[21-30]'
> >> more>union all
> >> more>select '[> 30]' as "2-Age","tone"."ttest2"("9-Score",6.08963) from "tone"."Marketing_Loyalty_4700298d-9862-40b3-9028-b0f15dab9dea" where "2-Age" = '> 30';
> >> +---------+--------------------------+
> >> | 2-Age | L1 |
> >> +=========+==========================+
> >> | [10-20] | 0.14150742355349472 |
> >> | [21-30] | 0.32872830865003566 |
> >> | [> 30] | 0.37127075363219841 |
> >> +---------+--------------------------+
> >> 3 tuples (8.825ms)
> >>
> >>
> >> sql>select "tone"."ttest2samplesStatistic"("9-Score", "4-Gender") as "stat", "tone"."ttest2samples"("9-Score", "4-Gender") as "pvalue" from "tone"."Marketing_Loyalty_4700298d-9862-40b3-9028-b0f15dab9dea" group by "2-Age";
> >> +--------------------------+--------------------------+
> >> | stat | pvalue |
> >> +==========================+==========================+
> >> | -2.1588506639013736 | 0.032555033970368859 |
> >> +--------------------------+--------------------------+
> >> 1 tuple (13.287ms)
> >>
> >>
> >> I dropped the aggregate with:
> >> sql>DROP AGGREGATE "tone"."ttest2samplesStatistic";
> >> operation successful (2.049ms)
> >>
> >>
> >> Best regards,
> >> Dimitar
> >>
> >>> On 2015-Jun-12, at 00:46 , imad hajj chahine <imad.hajj.chahine@gmail.com> wrote:
> >>>
> >>> Thank you Dimitar,
> >>>
> >>> I was using Squirrel to connect from my host machine to the VM.
> >>>
> >>> Now when I connect using SSH-client and run the short group by query it works but the result is not similar to the union queries (PS: i also run it in the VM direclty and had the same result)
> >>>
> >>>
> >>> +---------+--------------------------+
> >>> | 2-Age | pval |
> >>> +=========+==========================+
> >>> | [21-30] | 0.99996611809271041 |
> >>> +---------+--------------------------+
> >>> 3 tuples (2.638ms)
> >>>
> >>>
> >>> What i am trying to achieve is to run a one sample ttest for all the age categories, and it seems that the group by is ignored.
> >>>
> >>> I have also created 2 functions for the two sample ttest:
> >>>
> >>> CREATE AGGREGATE "tone"."ttest2samples"(arg1 double, arg2 varchar(250)) RETURNS double LANGUAGE R { t.test(arg1 ~ arg2)$p.value };
> >>> CREATE AGGREGATE "tone"."ttest2samplesStatistic"(arg1 double, arg2 varchar(250)) RETURNS double LANGUAGE R { t.test(arg1 ~ arg2)$statistic };
> >>>
> >>> select "tone"."ttest2samplesStatistic"("9-Score", "4-Gender") as "stat", "tone"."ttest2samples"("9-Score", "4-Gender") as "pvalue" from "tone"."Marketing_Loyalty_4700298d-9862-40b3-9028-b0f15dab9dea" group by "2-Age"
> >>>
> >>> Am I doing the correct way to retrieve the stat and p-value, is the ttest get executed twice in this case? is there a more optimal way where i can retrieve both information. Please note that I also have the same grouping issue as before.
> >>>
> >>> Do you have the same behavior, or I am missing something in the creation of these functions.
> >>>
> >>> A final question, how do u drop a R function? i am trying to execute the following:
> >>> drop all function "tone"."ttest2samplesStat"
> >>>
> >>>
> >>> Thank you.
> >>>
> >>>
> >>>
> >>> On Thu, Jun 11, 2015 at 11:28 PM, Dimitar Nedev <dimitar.nedev@monetdbsolutions.com> wrote:
> >>> Hi Imad,
> >>>
> >>> Hannes asked me to look at your issue with the MonetDB-R VM.
> >>>
> >>> I loaded the data and ran the queries you mentioned, but I did not encounter any issues. This holds for both the short and the union queries. I am also not familiar with the issue you report, nor its cause. At this time I think it is local to your instance. If you can export your db, or re-load the data, I would recommend you to create a new instance with a fresh image and try again. If you import the image as a new VM instance, you can keep your old one around as well.
> >>>
> >>> Please note that during the test:
> >>> - I ran as the admin (monetdb) user, as the one created default on the VM image cannot create new schemas.
> >>> - I have to clean the data a bit up to load it, removing dozen lines or so.
> >>> - Renamed ttest1 to ttest2 to match the queries.
> >>>
> >>> Best regards,
> >>> Dimitar
> >>>
> >>>>>>>>> Begin forwarded message:
> >>>>>>>>>
> >>>>>>>>> From: imad hajj chahine <imad.hajj.chahine@gmail.com>
> >>>>>>>>> Subject: Re: Monetdb-R integration
> >>>>>>>>> Date: 10 Jun 2015 10:11:48 CEST
> >>>>>>>>> To: Hannes Mühleisen <Hannes.Muehleisen@cwi.nl>
> >>>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I am using the virtual box provided from https://www.monetdbsolutions.com/solutions/cloud/vm:
> >>>>>>>>> OS:linux version 3.10.0-123.9.3.el7.x86_64 (builder@kbuilder.dev.centos.org)
> >>>>>>>>> Monetdb: V1.7: (Oct2014-SP1)
> >>>>>>>>> R: 3.1.2 (2014-10-31)
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> create table "tone"."Marketing_Loyalty_4700298d-9862-40b3-9028-b0f15dab9dea" ("Consumer Number" INTEGER,"1-Disloyalty" varchar(250),"2-Age" varchar(250),"3-Amount transferred" INTEGER,"4-Gender" varchar(250),"5-Citiz enship" varchar(250),"6- Profession" varchar(250),"7-City" varchar(250),"8-1 NC Billboard" varchar(250),"8-2 NC Friends" varchar(250),"8-3 NC Magazine" varchar(250),"8-4 NC Others" varchar(250),"9-Score" decimal(18,3),"10- Type of operation" varchar(250),"11-1 Welcoming" INTEGER,"11-2 Waiting time" INTEGER,"11-3 Time to finish request" INTEGER,"11-4 Fees" INTEGER,"11-5 Courtesy of employees" INTEGER,"11-6 Branch layout" INTEGER,"11-7 Attention to special requests" INTEGER,"12- Total time for service" decimal(18,3),"13- Distance from POS" decimal(18,3),"14- Weight" INTEGER)
> >>>>>>>>>
> >>>>>>>>> COPY offset 2 INTO "tone"."Marketing_Loyalty_4700298d-9862-40b3-9028-b0f15dab9dea" FROM '/home/App_Data/TestingDataSimple_2003_Marketing_Loyalty_4700298d-9862-40b3-9028-b0f15dab9dea.csv' USING DELIMITERS ',','\n','"';
> >>>>>>>>>
> >>>>>>>>> Thanks
> >>>>>>>>>
> >>>>>>>>> On Wed, Jun 10, 2015 at 10:50 AM, Hannes Mühleisen <Hannes.Muehleisen@cwi.nl> wrote:
> >>>>>>>>> Hello Imad,
> >>>>>>>>>
> >>>>>>>>> which version of MonetDB and R are you using? Which operating system?
> >>>>>>>>>
> >>>>>>>>> Can you please also include the commands you used to load your CSV into MonetDB/ create the table?
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>>
> >>>>>>>>> Hannes
> >>>>>>>>>
> >>>>>>>>>> On 10 Jun 2015, at 00:25, imad hajj chahine <imad.hajj.chahine@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hi Hannes,
> >>>>>>>>>>
> >>>>>>>>>> I am testing Monetdb-R integration to do some statistical test, i followed the tutorial you posted on youtube and i have created the following function:
> >>>>>>>>>>
> >>>>>>>>>> CREATE AGGREGATE "tone"."ttest1"(arg1 double, arg2 double) RETURNS double LANGUAGE R { t.test(arg1, arg2)$p.value };
> >>>>>>>>>>
> >>>>>>>>>> this function will basically do a one sample t-test.
> >>>>>>>>>>
> >>>>>>>>>> I have loaded the sample attached file into a demo database and trying to run the following query:
> >>>>>>>>>>
> >>>>>>>>>> select "2-Age", "tone"."ttest2"("9-Score",6.08963) from "tone"."Marketing_Loyalty_4700298d-9862-40b3-9028-b0f15dab9dea" where "2-Age" is not null group by "2-Age"
> >>>>>>>>>>
> >>>>>>>>>> The following error happened
> >>>>>>>>>> Error: protocol violation: unexpected line in data block: .
> >>>>>>>>>> SQLState: M0M10
> >>>>>>>>>> ErrorCode: 0
> >>>>>>>>>>
> >>>>>>>>>> alternatively i managed to have the intended result by running this query:
> >>>>>>>>>>
> >>>>>>>>>> select '[10-20]' as "2-Age","tone"."ttest2"("9-Score",6.08963) from "tone"."Marketing_Loyalty_4700298d-9862-40b3-9028-b0f15dab9dea" where "2-Age" = '[10-20]'
> >>>>>>>>>> union all
> >>>>>>>>>> select '[21-30]' as "2-Age","tone"."ttest2"("9-Score",6.08963) from "tone"."Marketing_Loyalty_4700298d-9862-40b3-9028-b0f15dab9dea" where "2-Age" = '[21-30]'
> >>>>>>>>>> union all
> >>>>>>>>>> select '[> 30]' as "2-Age","tone"."ttest2"("9-Score",6.08963) from "tone"."Marketing_Loyalty_4700298d-9862-40b3-9028-b0f15dab9dea" where "2-Age" = '> 30'
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> any help or guidance is appreciated.
> >>>>>>>>>> Thank you.
> >>>>>>>>>>
> >>>>>>>>>> <TestingDataSimple_2003_Marketing_Loyalty_4700298d-9862-40b3-9028-b0f15dab9dea.csv>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >>
> >
>
>