One obvious optimization was to avoid the explicit BATproject and pass the candidate input to BATappend where possible. This improved things, and now the cleanup takes 3/4 of the total time.

Still...

loop(...) {
    BAT *gn, *en, *hn;
    BAT *idxn, *pn;

    // tokenize, group, transform, append
    // this takes 1/4 of the total loop time
    if (BATutf8_tokenize(&tokens, s, delims, min_tok_len) != GDK_SUCCEED)
        goto fail;
    if (BATgroup(&gn, &en, &hn, tokens, NULL, NULL, NULL, NULL) != GDK_SUCCEED)
        goto fail;
    idxn = BATconstant(0, TYPE_int, &idx, BATcount(en), TRANSIENT);
    pn = BATconvert(hn, NULL, TYPE_dbl, TRUE);
    if (BATappend(br1, idxn, NULL, FALSE) != GDK_SUCCEED)
        goto fail;
    if (BATappend(br2, tokens, en, FALSE) != GDK_SUCCEED)
        goto fail;
    if (BATappend(br3, pn, NULL, FALSE) != GDK_SUCCEED)
        goto fail;

    // this takes 3/4 of the total loop time
    BBPreclaim(idxn);
    BBPreclaim(pn);
    BBPreclaim(hn);
    BBPreclaim(gn);
    BBPreclaim(en);
    BBPreclaim(tokens);
}

On Thu, 25 Jan 2018 at 18:26 Roberto Cornacchia <roberto.cornacchia@gmail.com> wrote:
Hi there,
Would someone like to pick up this wonderful opportunity to tell me that I'm doing something silly? Please do.
In this loop, the cleanup takes 4/5 of the total time. Is there a better way of doing this?
loop(...) {
    BAT *gn, *en, *hn;
    BAT *idxn, *tn, *pn;

    // tokenize, group, transform, append
    // this takes 1/5 of the total loop time
    if (BATutf8_tokenize(&tokens, s, delims, min_tok_len) != GDK_SUCCEED)
        goto fail;
    if (BATgroup(&gn, &en, &hn, tokens, NULL, NULL, NULL, NULL) != GDK_SUCCEED)
        goto fail;
    idxn = BATconstant(0, TYPE_int, &idx, BATcount(en), TRANSIENT);
    tn = BATproject(en, tokens);
    pn = BATconvert(hn, NULL, TYPE_dbl, TRUE);
    if (BATappend(br1, idxn, NULL, FALSE) != GDK_SUCCEED)
        goto fail;
    if (BATappend(br2, tn, NULL, FALSE) != GDK_SUCCEED)
        goto fail;
    if (BATappend(br3, pn, NULL, FALSE) != GDK_SUCCEED)
        goto fail;

    // this takes 4/5 of the total loop time
    BBPreclaim(idxn);
    BBPreclaim(tn);
    BBPreclaim(pn);
    BBPreclaim(hn);
    BBPreclaim(gn);
    BBPreclaim(en);
    BBPreclaim(tokens);
}
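
For readers outside GDK, the change described in the reply at the top of this thread (dropping the explicit BATproject and handing the candidate list to BATappend) can be illustrated with plain arrays. This is only a rough sketch, not MonetDB code: the `column` type and the functions `column_reserve`, `append_via_projection` and `append_with_candidates` are all hypothetical stand-ins. It shows that both variants append the same values, while the second one never allocates or frees a temporary.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable int column: a stand-in for a BAT, not a GDK type. */
typedef struct {
    int *vals;
    size_t count;
    size_t cap;
} column;

/* Make room for `extra` more values; returns 0 on success, -1 on failure. */
int column_reserve(column *c, size_t extra)
{
    if (c->count + extra <= c->cap)
        return 0;
    size_t ncap = c->cap ? c->cap : 16;
    while (ncap < c->count + extra)
        ncap *= 2;
    int *p = realloc(c->vals, ncap * sizeof *p);
    if (p == NULL)
        return -1;
    c->vals = p;
    c->cap = ncap;
    return 0;
}

/* Variant 1: materialize the projection, then append it.
 * A temporary is allocated and freed on every call. */
int append_via_projection(column *dst, const int *src,
                          const size_t *cand, size_t ncand)
{
    if (ncand == 0)
        return 0;
    int *tmp = malloc(ncand * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    for (size_t i = 0; i < ncand; i++)
        tmp[i] = src[cand[i]];
    if (column_reserve(dst, ncand) != 0) {
        free(tmp);
        return -1;
    }
    memcpy(dst->vals + dst->count, tmp, ncand * sizeof *tmp);
    dst->count += ncand;
    free(tmp);
    return 0;
}

/* Variant 2: append straight through the candidate list, no temporary. */
int append_with_candidates(column *dst, const int *src,
                           const size_t *cand, size_t ncand)
{
    if (ncand == 0)
        return 0;
    if (column_reserve(dst, ncand) != 0)
        return -1;
    for (size_t i = 0; i < ncand; i++)
        dst->vals[dst->count + i] = src[cand[i]];
    dst->count += ncand;
    return 0;
}
```

Calling either function on a zero-initialized `column` with the same source array and candidate positions leaves the same values in the destination. Whether the saved temporary matters in the real loop depends on what BBPreclaim has to do for each BAT, which this sketch does not model.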