Re: Pathological behaviour when building large hash index

5 Dec 2013

      Hi Will,

could you please also share with us which version of MonetDB you are using,
and OS you are running?

Pleass find more comments & questions inlined below.

----- Original Message -----
...
Hi
I've been investigating very slow queries on my 350M row dataset when
selecting for equality on a non-unique string id. e.g.
select count(*) from t_order where orderId = 'FOO';
Is your orderId column unique, or are there duplicate values?
...
This query takes ~4.5 hours to build a thash index. Subsequent repeats hit
the cache and are sub milli.
select count(*) from t_order where time > '1970-01-01' and orderId = 'FOO'
This second one is consistently finished in a few seconds or less, and
doesn't create any indexes.
The first query is a bit of a show-stopper. It spends a huge about of time in
BAThash (gdk_search.c), and incurs a lot of page faults while reading and
writing at random across a multi GB mmaped thash file. Additionally these
page faults are hampered by the OS trying to write dirty pages out in the
background. I've got plenty of RAM overall (130GB), but the 'active' page
proportion for the thash file seems to be stuck at about 25% - giving me a
0.75 probability of a miss.
* Is there some way to make my OS less keen to evict pages? (swappiness = 0
already) I should have plenty of room to have the whole thash file resident.
Additionally, allowing it to sit on dirty pages longer would reduce the
total write IO.
In fact, if there is sufficient memory in total, there would be no need to flush
dirty (non-persistent) pages at all, but we have not yet found a good way to hit
this to the OS --- madvice() did not have much/any useful efficet for us on Linux.

Maybe some of the issues dicsussed here play a role:
http://engineering.linkedin.com/performance/optimizing-linux-memory-manageme...
?

While we have not yet encountered the NUMA problem reported there,
we have encountered the transparent huge page problems, and usually
disable them as also suggested in the above posting.
...
* Is there some way to simply prevent large thash creation? For many
applications I'd rather have slower consistent queries, then incur a
massively slow query following a restart (not great in a prod environment!)
Not easily right now.
However, you can either add the "fake" time predicate as you did above,
which (in this case) triggers MonetDB's optimizer to push the time predicate
(either because it's in timestamp (int) rather then varchar (string), or
because the time column happens to be ordered) below the orderId predicate
and this avoid the hash build,
or change the following line(s) in gdk/gdk_select.c (in case you built
MonetDB yourself from source):

        hash = b->batPersistence == PERSISTENT &&
               (size_t) ATOMsize(b->ttype) > sizeof(BUN) / 4 &&
               BATcount(b) * (ATOMsize(b->ttype) + 2 * sizeof(BUN)) < GDK_mem_maxsize / 2;
->
        hash = FALSE;

We'll consider whether we can provide a better solution in a future release.
...
* Is there some way to generate the hash index with a more sympathetic algo
that doesn't degrade so steeply. e.g some hash + sort. (I have literally no
experience with this!)
We'll also look into this. However, building a hash index inherently incurs
random access, so chances are slim ...

Best,
Stefan
...
-Will Muldrew
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list
-- 
| Stefan.Manegold@CWI.nl | DB Architectures   (DA) |
| www.CWI.nl/~manegold/  | Science Park 123 (L321) |
| +31 (0)20 592-4212     | 1098 XG Amsterdam  (NL) |

Re: Pathological behaviour when building large hash index

Stefan Manegold