(Forwarding to users list, free support is public)
Begin forwarded message:
From: Hannes Mühleisen
Subject: Re: Monetdb-R integration Date: 19 Jun 2015 10:12:48 CEST To: imad hajj chahine Cc: Dimitar Nedev I am running attached script, which I pieced together from your emails. This does not run, it produces the following error:
Error running R expression. Error message: Error in t.test.formula(arg1 ~ arg2) : grouping factor must have exactly 2 levels Calls: as.data.frame -> <Anonymous> -> t.test -> t.test.formula
In general, for aggregations, the R code needs to return a vector with a single entry for each group.
Best,
Hannes
On 19 Jun 2015, at 09:44, imad hajj chahine
wrote: Thanks, if you need more information from my side let me know.
Also can you answer me on the following when i use 2 functions one to retrieve Pvalue and one for the stat value: Am I doing the correct way to retrieve the stat and p-value, is the ttest get executed twice in this case? is there a more optimal way where i can retrieve both information.
On Fri, Jun 19, 2015 at 10:39 AM, Hannes Mühleisen
wrote: Looking into this at the moment. On 16 Jun 2015, at 09:56, Dimitar Nedev
wrote: Hannes, do you know if the STD_DEV is the issue there or if the queries should actually yield different results?
Dimitar
On 2015-Jun-15, at 13:28 , imad hajj chahine
wrote: Hi,
Shouldn't we have the same result from running the first query and the second query? after all its an aggregate function and should return the test for each group.
I checked that monetdb had problem with running group by with statistics functions like STD_DEV, are these 2 issues related?
Thank you.
On Mon, Jun 15, 2015 at 1:52 PM, Dimitar Nedev
wrote: Hi Imad, Here is the output I got from running the queries. Note, this might not be entirely representative since I did clean-up some data before loading it.
sql>select "2-Age", "tone"."ttest2"("9-Score",6.08963) from "tone"."Marketing_Loyalty_4700298d-9862-40b3-9028-b0f15dab9dea" where "2-Age" is not null group by "2-Age"; +---------+--------------------------+ | 2-Age | L1 | +=========+==========================+ | [10-20] | 0.89311103869734754 | +---------+--------------------------+ 3 tuples (3.218ms)
sql>select '[10-20]' as "2-Age","tone"."ttest2"("9-Score",6.08963) from "tone"."Marketing_Loyalty_4700298d-9862-40b3-9028-b0f15dab9dea" where "2-Age" = '[10-20]' more>union all more>select '[21-30]' as "2-Age","tone"."ttest2"("9-Score",6.08963) from "tone"."Marketing_Loyalty_4700298d-9862-40b3-9028-b0f15dab9dea" where "2-Age" = '[21-30]' more>union all more>select '[> 30]' as "2-Age","tone"."ttest2"("9-Score",6.08963) from "tone"."Marketing_Loyalty_4700298d-9862-40b3-9028-b0f15dab9dea" where "2-Age" = '> 30'; +---------+--------------------------+ | 2-Age | L1 | +=========+==========================+ | [10-20] | 0.14150742355349472 | | [21-30] | 0.32872830865003566 | | [> 30] | 0.37127075363219841 | +---------+--------------------------+ 3 tuples (8.825ms)
sql>select "tone"."ttest2samplesStatistic"("9-Score", "4-Gender") as "stat", "tone"."ttest2samples"("9-Score", "4-Gender") as "pvalue" from "tone"."Marketing_Loyalty_4700298d-9862-40b3-9028-b0f15dab9dea" group by "2-Age"; +--------------------------+--------------------------+ | stat | pvalue | +==========================+==========================+ | -2.1588506639013736 | 0.032555033970368859 | +--------------------------+--------------------------+ 1 tuple (13.287ms)
I dropped the aggregate with: sql>DROP AGGREGATE "tone"."ttest2samplesStatistic"; operation successful (2.049ms)
Best regards, Dimitar
On 2015-Jun-12, at 00:46 , imad hajj chahine
wrote: Thank you Dimitar,
I was using Squirrel to connect from my host machine to the VM.
Now when I connect using SSH-client and run the short group by query it works but the result is not similar to the union queries (PS: i also run it in the VM direclty and had the same result)
+---------+--------------------------+ | 2-Age | pval | +=========+==========================+ | [21-30] | 0.99996611809271041 | +---------+--------------------------+ 3 tuples (2.638ms)
What i am trying to achieve is to run a one sample ttest for all the age categories, and it seems that the group by is ignored.
I have also created 2 functions for the two sample ttest:
CREATE AGGREGATE "tone"."ttest2samples"(arg1 double, arg2 varchar(250)) RETURNS double LANGUAGE R { t.test(arg1 ~ arg2)$p.value }; CREATE AGGREGATE "tone"."ttest2samplesStatistic"(arg1 double, arg2 varchar(250)) RETURNS double LANGUAGE R { t.test(arg1 ~ arg2)$statistic };
select "tone"."ttest2samplesStatistic"("9-Score", "4-Gender") as "stat", "tone"."ttest2samples"("9-Score", "4-Gender") as "pvalue" from "tone"."Marketing_Loyalty_4700298d-9862-40b3-9028-b0f15dab9dea" group by "2-Age"
Am I doing the correct way to retrieve the stat and p-value, is the ttest get executed twice in this case? is there a more optimal way where i can retrieve both information. Please note that I also have the same grouping issue as before.
Do you have the same behavior, or I am missing something in the creation of these functions.
A final question, how do u drop a R function? i am trying to execute the following: drop all function "tone"."ttest2samplesStat"
Thank you.
On Thu, Jun 11, 2015 at 11:28 PM, Dimitar Nedev
wrote: Hi Imad, Hannes asked me to look at your issue with the MonetDB-R VM.
I loaded the data and ran the queries you mentioned, but I did not encounter any issues. This holds for both the short and the union queries. I am also not familiar with the issue you report, nor its cause. At this time I think it is local to your instance. If you can export your db, or re-load the data, I would recommend you to create a new instance with a fresh image and try again. If you import the image as a new VM instance, you can keep your old one around as well.
Please note that during the test: - I ran as the admin (monetdb) user, as the one created default on the VM image cannot create new schemas. - I have to clean the data a bit up to load it, removing dozen lines or so. - Renamed ttest1 to ttest2 to match the queries.
Best regards, Dimitar
>>>>> Begin forwarded message: >>>>> >>>>> From: imad hajj chahine
>>>>> Subject: Re: Monetdb-R integration >>>>> Date: 10 Jun 2015 10:11:48 CEST >>>>> To: Hannes Mühleisen >>>>> >>>>> Hi, >>>>> >>>>> I am using the virtual box provided from https://www.monetdbsolutions.com/solutions/cloud/vm: >>>>> OS:linux version 3.10.0-123.9.3.el7.x86_64 (builder@kbuilder.dev.centos.org) >>>>> Monetdb: V1.7: (Oct2014-SP1) >>>>> R: 3.1.2 (2014-10-31) >>>>> >>>>> >>>>> >>>>> create table "tone"."Marketing_Loyalty_4700298d-9862-40b3-9028-b0f15dab9dea" ("Consumer Number" INTEGER,"1-Disloyalty" varchar(250),"2-Age" varchar(250),"3-Amount transferred" INTEGER,"4-Gender" varchar(250),"5-Citiz enship" varchar(250),"6- Profession" varchar(250),"7-City" varchar(250),"8-1 NC Billboard" varchar(250),"8-2 NC Friends" varchar(250),"8-3 NC Magazine" varchar(250),"8-4 NC Others" varchar(250),"9-Score" decimal(18,3),"10- Type of operation" varchar(250),"11-1 Welcoming" INTEGER,"11-2 Waiting time" INTEGER,"11-3 Time to finish request" INTEGER,"11-4 Fees" INTEGER,"11-5 Courtesy of employees" INTEGER,"11-6 Branch layout" INTEGER,"11-7 Attention to special requests" INTEGER,"12- Total time for service" decimal(18,3),"13- Distance from POS" decimal(18,3),"14- Weight" INTEGER) >>>>> >>>>> COPY offset 2 INTO "tone"."Marketing_Loyalty_4700298d-9862-40b3-9028-b0f15dab9dea" FROM '/home/App_Data/TestingDataSimple_2003_Marketing_Loyalty_4700298d-9862-40b3-9028-b0f15dab9dea.csv' USING DELIMITERS ',','\n','"'; >>>>> >>>>> Thanks >>>>> >>>>> On Wed, Jun 10, 2015 at 10:50 AM, Hannes Mühleisen wrote: >>>>> Hello Imad, >>>>> >>>>> which version of MonetDB and R are you using? Which operating system? >>>>> >>>>> Can you please also include the commands you used to load your CSV into MonetDB/ create the table? >>>>> >>>>> Thanks, >>>>> >>>>> Hannes >>>>> >>>>>> On 10 Jun 2015, at 00:25, imad hajj chahine wrote: >>>>>> >>>>>> Hi Hannes, >>>>>> >>>>>> I am testing Monetdb-R integration to do some statistical test, i followed the tutorial you posted on youtube and i have created the following function: >>>>>> >>>>>> CREATE AGGREGATE "tone"."ttest1"(arg1 double, arg2 double) RETURNS double LANGUAGE R { t.test(arg1, arg2)$p.value }; >>>>>> >>>>>> this function will basically do a one sample t-test. >>>>>> >>>>>> I have loaded the sample attached file into a demo database and trying to run the following query: >>>>>> >>>>>> select "2-Age", "tone"."ttest2"("9-Score",6.08963) from "tone"."Marketing_Loyalty_4700298d-9862-40b3-9028-b0f15dab9dea" where "2-Age" is not null group by "2-Age" >>>>>> >>>>>> The following error happened >>>>>> Error: protocol violation: unexpected line in data block: . >>>>>> SQLState: M0M10 >>>>>> ErrorCode: 0 >>>>>> >>>>>> alternatively i managed to have the intended result by running this query: >>>>>> >>>>>> select '[10-20]' as "2-Age","tone"."ttest2"("9-Score",6.08963) from "tone"."Marketing_Loyalty_4700298d-9862-40b3-9028-b0f15dab9dea" where "2-Age" = '[10-20]' >>>>>> union all >>>>>> select '[21-30]' as "2-Age","tone"."ttest2"("9-Score",6.08963) from "tone"."Marketing_Loyalty_4700298d-9862-40b3-9028-b0f15dab9dea" where "2-Age" = '[21-30]' >>>>>> union all >>>>>> select '[> 30]' as "2-Age","tone"."ttest2"("9-Score",6.08963) from "tone"."Marketing_Loyalty_4700298d-9862-40b3-9028-b0f15dab9dea" where "2-Age" = '> 30' >>>>>> >>>>>> >>>>>> any help or guidance is appreciated. >>>>>> Thank you. >>>>>> >>>>>> >>>>> >>>>> >>>> >>> >> >
participants (1)
-
Hannes Mühleisen