Help with performance issue
Hi there,

Would someone like to pick up this wonderful opportunity to tell me that I'm doing something silly? Please do.

In this loop, the cleanup takes 4/5 of the total time. Is there a better way of doing this?

    loop(...) {
        BAT *gn, *en, *hn;
        BAT *idxn, *tn, *pn;

        // tokenize, group, transform, append
        // this takes 1/5 of the total loop time
        if (BATutf8_tokenize(&tokens, s, delims, min_tok_len) != GDK_SUCCEED) goto fail;
        if (BATgroup(&gn, &en, &hn, tokens, NULL, NULL, NULL, NULL) != GDK_SUCCEED) goto fail;
        idxn = BATconstant(0, TYPE_int, &idx, BATcount(en), TRANSIENT);
        tn = BATproject(en, tokens);
        pn = BATconvert(hn, NULL, TYPE_dbl, TRUE);
        if (BATappend(br1, idxn, NULL, FALSE) != GDK_SUCCEED) goto fail;
        if (BATappend(br2, tn, NULL, FALSE) != GDK_SUCCEED) goto fail;
        if (BATappend(br3, pn, NULL, FALSE) != GDK_SUCCEED) goto fail;

        // this takes 4/5 of the total loop time
        BBPreclaim(idxn);
        BBPreclaim(tn);
        BBPreclaim(pn);
        BBPreclaim(hn);
        BBPreclaim(gn);
        BBPreclaim(en);
        BBPreclaim(tokens);
    }
One obvious optimization was to avoid the explicit BATproject and use the candidate input in BATappend, when possible. This improved things and now the cleanup takes 3/4 of the total time.

Still...

    loop(...) {
        BAT *gn, *en, *hn;
        BAT *idxn, *pn;

        // tokenize, group, transform, append
        // this takes 1/4 of the total loop time
        if (BATutf8_tokenize(&tokens, s, delims, min_tok_len) != GDK_SUCCEED) goto fail;
        if (BATgroup(&gn, &en, &hn, tokens, NULL, NULL, NULL, NULL) != GDK_SUCCEED) goto fail;
        idxn = BATconstant(0, TYPE_int, &idx, BATcount(en), TRANSIENT);
        pn = BATconvert(hn, NULL, TYPE_dbl, TRUE);
        if (BATappend(br1, idxn, NULL, FALSE) != GDK_SUCCEED) goto fail;
        if (BATappend(br2, tokens, en, FALSE) != GDK_SUCCEED) goto fail;
        if (BATappend(br3, pn, NULL, FALSE) != GDK_SUCCEED) goto fail;

        // this takes 3/4 of the total loop time
        BBPreclaim(idxn);
        BBPreclaim(pn);
        BBPreclaim(hn);
        BBPreclaim(gn);
        BBPreclaim(en);
        BBPreclaim(tokens);
    }

On Thu, 25 Jan 2018 at 18:26 Roberto Cornacchia <roberto.cornacchia@gmail.com> wrote:
Hi there,
Would someone like to pick up this wonderful opportunity to tell me that I'm doing something silly? Please do.
In this loop, the cleanup takes 4/5 of the total time. Is there a better way of doing this?
Try to reclaim a.s.a.p.; this reduces resource competition.

martin
On 25 Jan 2018, at 19:22, Roberto Cornacchia wrote:
One obvious optimization was to avoid the explicit BATproject and use the candidate input in BATappend, when possible. This improved things and now the cleanup takes 3/4 of the total time.
Still...
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Thanks Martin,
Actually, I was doing that in my original code; I grouped all the
BBPreclaim calls at the end only to make them easier to measure. It makes
no difference in my case.
The issue seems simply to be that BBPreclaim is a more expensive operation
than I thought, so calling it many times kills performance.
I was hoping that there could be a similar but cheaper function that could
be sufficient in this case.
I'm always confused by BBPreclaim, BBPrelease, and BBPunfix.
I actually solved my problem the right way, rewriting the whole thing to
work in larger batches. This brings the total time to 15% of the original
one.
Roberto
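
To make the batching point concrete, here is a minimal sketch in plain C. It does not use the MonetDB API: `tmp_col`, `tmp_new`, `tmp_reclaim`, `process`, and the `n_reclaims` counter are hypothetical stand-ins, and the doubling step is an arbitrary placeholder for the tokenize/group/convert work. It only illustrates, under those assumptions, how working in larger batches reduces the number of create/reclaim cycles when each cycle carries a fixed cost.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a temporary column; NOT a MonetDB BAT. */
typedef struct {
    int *vals;
    size_t count;
} tmp_col;

/* Counts how many temporaries were created and reclaimed. */
static size_t n_reclaims = 0;

static tmp_col *tmp_new(size_t cap) {
    tmp_col *c = malloc(sizeof *c);
    if (c == NULL)
        return NULL;
    c->vals = malloc(cap * sizeof *c->vals);
    if (c->vals == NULL) {
        free(c);
        return NULL;
    }
    c->count = 0;
    return c;
}

static void tmp_reclaim(tmp_col *c) {
    free(c->vals);
    free(c);
    n_reclaims++;
}

/* Process n inputs using one temporary per `batch` inputs.
 * Returns the sum of the transformed values, or -1 on allocation failure. */
static long process(const int *in, size_t n, size_t batch) {
    long total = 0;
    assert(batch > 0);
    for (size_t start = 0; start < n; start += batch) {
        size_t len = (n - start < batch) ? n - start : batch;
        tmp_col *c = tmp_new(len);
        if (c == NULL)
            return -1;
        for (size_t i = 0; i < len; i++)
            c->vals[c->count++] = in[start + i] * 2;  /* placeholder "transform" */
        for (size_t i = 0; i < c->count; i++)
            total += c->vals[i];                      /* placeholder "append" */
        tmp_reclaim(c);                               /* one cleanup per batch */
    }
    return total;
}
```

Calling `process(in, 10, 1)` performs ten create/reclaim cycles for ten inputs, while `process(in, 10, 4)` performs three and returns the same total. Whether the saving is as large with real BATs depends on what the cleanup actually costs there.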
On Thu, 25 Jan 2018 at 20:53 Martin Kersten wrote:
Try to reclaim a.s.a.p.; this reduces resource competition. martin
Hi Roberto,

Good to hear you solved it. And indeed the right way ;)

Without further knowledge of the sizes and timings involved, I can only guess that the BATs are large and may have to be removed from the directories (i.e. through system calls), which would make the cleanup expensive.

The general mechanism is to use BBPunfix(), provided the reference counts are properly set during the operations (to be checked for your case).

On 25/01/2018 22:08, Roberto Cornacchia wrote:
Thanks Martin,
Actually, I was doing that in my original code; I grouped all the BBPreclaim calls at the end only to make them easier to measure. It makes no difference in my case.
The issue seems simply to be that BBPreclaim is a more expensive operation than I thought, so calling it many times kills performance. I was hoping that there could be a similar but cheaper function that could be sufficient in this case. I'm always confused by BBPreclaim, BBPrelease, and BBPunfix.
I actually solved my problem the right way, rewriting the whole thing to work in larger batches. This brings the total time to 15% of the original one.
Roberto
On 25/01/18 22:08, Roberto Cornacchia wrote:
I'm always confused by BBPreclaim, BBPrelease, and BBPunfix.
BBPreclaim is essentially the same as BBPunfix, except for two things: the argument is different (BAT * vs. bat, i.e. b vs. b->batCacheid), and BBPreclaim can only be called if you *know* that there is exactly one reference to the BAT (yours), i.e. basically for BATs that you just created and need to clean up.

Internally, BBPreclaim just calls BBPunfix.

BBPrelease is completely different. It is used to release a *logical* reference, whereas the other two release a *physical* reference.

-- Sjoerd Mullender
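
The distinction between the two kinds of reference can be pictured with a small toy model in plain C. This is not MonetDB code: `toy_bat` and the `toy_*` functions are hypothetical names, and the rules encoded here (the data stays in memory while a physical reference exists, the entry survives while any reference of either kind exists) are simplifying assumptions for illustration, not a description of the actual BBP implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; all names are hypothetical and NOT part of the GDK API. */
typedef struct {
    int phys_refs;  /* "physical" references: someone is using the data */
    int log_refs;   /* "logical" references: someone still names the entry */
    bool loaded;    /* assumed: data is kept in memory while phys_refs > 0 */
    bool alive;     /* assumed: entry exists while any reference remains */
} toy_bat;

/* A freshly created object starts with one physical reference (the creator's). */
static toy_bat toy_create(void) {
    toy_bat b = { 1, 0, true, true };
    return b;
}

/* Apply the toy rules after a counter has been decremented. */
static void toy_settle(toy_bat *b) {
    if (b->phys_refs == 0)
        b->loaded = false;
    if (b->phys_refs == 0 && b->log_refs == 0)
        b->alive = false;
}

/* Take a logical reference (counterpart of toy_release). */
static void toy_retain(toy_bat *b) {
    b->log_refs++;
}

/* Drop a logical reference (the role the thread assigns to BBPrelease). */
static void toy_release(toy_bat *b) {
    assert(b->log_refs > 0);
    b->log_refs--;
    toy_settle(b);
}

/* Drop a physical reference (the role the thread assigns to BBPunfix). */
static void toy_unfix(toy_bat *b) {
    assert(b->phys_refs > 0);
    b->phys_refs--;
    toy_settle(b);
}

/* Like toy_unfix, but only valid when the caller holds the sole reference
 * (the role the thread assigns to BBPreclaim). */
static void toy_reclaim(toy_bat *b) {
    assert(b->phys_refs == 1 && b->log_refs == 0);
    toy_unfix(b);
}
```

In this model, calling `toy_reclaim` on a freshly created object destroys it at once, whereas an object that also holds a logical reference survives `toy_unfix` (unloaded but still alive) until `toy_release` is called.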
Thanks for the details!
On Fri, 26 Jan 2018 at 10:16 Sjoerd Mullender wrote:
On 25/01/18 22:08, Roberto Cornacchia wrote:
I'm always confused by BBPreclaim, BBPrelease, and BBPunfix.
BBPreclaim is essentially the same as BBPunfix, except for two things: the argument is different (BAT * vs. bat, i.e. b vs. b->batCacheid), and BBPreclaim can only be called if you *know* that there is exactly one reference to the BAT (yours), i.e. basically for BATs that you just created and need to clean up.
Internally, BBPreclaim just calls BBPunfix.
BBPrelease is completely different. It is used to release a *logical* reference, whereas the other two release a *physical* reference.
-- Sjoerd Mullender
participants (3)
- Martin Kersten
- Roberto Cornacchia
- Sjoerd Mullender