On Fri, Jul 05, 2013 at 10:08:31AM +0000, Scott Mathieson wrote:
Hi,
I am trying to create a simple aggregate function entirely in MAL and am now struggling at the final hurdle. I have created the aggregate function such that it aggregates across an entire table but in the sub aggregate function I can't work out how to link the group to data BAT (g) to the data. The aggregate is designed to simply collect a sample of the values from the named column and return them as a pipe delimited string value. Can anyone tell me how to use the 'g' bat to join to the 'b' bat in the sub... aggregate function to get only the data rows for a specific group - currently I only get 2 rows of data per group even though I know that there should be 100k+ rows in each group?
Here's the code:
function bin_sample(b:bat[:oid, :any]):str; value := ""; bsamp := sample.uniform(b, 1000:wrd); barrier (h, t) := iterator.new(bsamp); tmp := "" + t; type := bat.getTailType(bsamp); test := str.startsWith(type, "str"); # Force the string type to be cast to a string or you just end up with numeric values barrier xxxx := test; tmp := str.str(t); exit xxxx;
value := value + tmp; value := value + "||"; redo (h, t) := iterator.next(bsamp); exit (h, t); return value; end bin_sample;
why not simply cast any type to string? i.e., function bin_sample(b:bat[:oid, :any]):str; value := ""; bsamp := sample.uniform(b, 1000:wrd); barrier (h, t) := iterator.new(bsamp); tmp := calc.str(t); value := value + tmp; value := value + "||"; redo (h, t) := iterator.next(bsamp); exit (h, t); return value; end bin_sample; or even better, do the cast "in bulk": function bin_sample(b:bat[:oid, :any]):str; value := ""; bsamp := sample.uniform(b, 1000:wrd); bsampstr := batcalc.str(bsamp); barrier (h, t) := iterator.new(bsampstr); value := value + t; value := value + "||"; redo (h, t) := iterator.next(bsampstr); exit (h, t); return value; end bin_sample;
function subbin_sample(b:bat[:oid,:any_1],g:bat[:oid,:oid],e:bat[:oid,:any_2],skip_nils:bit) :bat[:oid,:str]; bn:=bat.new(:oid, :str);
# check 'e' has some values - this is the list of group IDs in the head # 'g' is the group to the data BAT head ID mapping # 'b' is the data column BAT that we're aggregating over barrier (grpid, t) := iterator.new(e);
# create a single row BAT of the group id we're processing bn_gid := bat.new(:oid, :oid); bat.insert(bn_gid, grpid, grpid);
# join this to the GID->DATA map BAT to get the Group DATA oids b_g2dr := bat.reverse(g); b_gid2data := algebra.leftjoin(bn_gid, b_g2dr);
grpval := "";
# Get the BAT of the actual data to be processed # # !!! HOW TO GET THIS TO WORK???????? # b_data := algebra.join(b_gid2data, b); grpval := bin_sample(b_data);
# Store the result for this group bat.insert(bn, grpid, grpval); redo (grpid, t) := iterator.next(e); exit (grpid, t); return bn;
end subbin_sample;
what about: function subbin_sample(b:bat[:oid,:any_1],g:bat[:oid,:oid],e:bat[:oid,:any_2],skip_nils:bit) :bat[:oid,:str]; nw := aggr.count(e); nl := calc.lng(nw); bn := bat.new(:oid, :str, nl); # check 'e' has some values - this is the list of group IDs in the head # 'g' is the group to the data BAT head ID mapping # 'b' is the data column BAT that we're aggregating over barrier (grpid, t) := iterator.new(e); # select GID from TID->GID map to get matching TIDs TIDs := algebra.uselect(g,grpid); # get DATA for matching TIDs b_data := algebra.kintersect(b,TIDs); # aggregate grpval := bin_sample(b_data); # Store the result for this group bat.insert(bn, grpid, grpval); redo (grpid, t) := iterator.next(e); exit (grpid, t); return bn; end subbin_sample; Stefan
Scott Mathieson,
________________________________
_______________________________________________ users-list mailing list users-list@monetdb.org http://mail.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |