I've been investigating very slow queries on my 350M-row dataset when selecting for equality on a non-unique string id, e.g.:

select count(*) from t_order where orderId = 'FOO';

This query takes ~4.5 hours the first time, while it builds a thash index. Subsequent repeats hit the cache and are sub-millisecond.

select count(*) from t_order where time > '1970-01-01' and orderId = 'FOO';

This second query consistently finishes in a few seconds or less, and doesn't create any indexes.

The first query is a bit of a show-stopper. It spends a huge amount of time in BAThash (gdk_search.c), and incurs a lot of page faults while reading and writing at random across a multi-GB mmapped thash file. Additionally, these page faults are slowed down by the OS trying to write dirty pages out in the background. I've got plenty of RAM overall (130GB), but the 'active' page proportion for the thash file seems to be stuck at about 25%, giving me a 0.75 probability of a miss on each random access.
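As a rough sanity check on those numbers, here is a back-of-envelope calculation. It assumes (my guess, not something I've measured) that building the index touches one random page of the thash file per row, so treat the result as an order-of-magnitude estimate only.

```python
# Figures taken from the description above.
rows = 350_000_000          # rows in t_order
miss_probability = 0.75     # ~25% of the thash file appears resident
build_seconds = 4.5 * 3600  # observed first-query time

# ASSUMPTION: one random page touch per row inserted into the hash.
expected_faults = rows * miss_probability

# Fault rate and per-fault cost implied if faults dominate the build time.
faults_per_second = expected_faults / build_seconds
micros_per_fault = 1e6 / faults_per_second

print(f"expected page faults : {expected_faults:,.0f}")
print(f"implied faults/second: {faults_per_second:,.0f}")
print(f"implied cost per fault: {micros_per_fault:.1f} microseconds")
```

If that assumption is anywhere near right, the build time is explained almost entirely by page-fault latency rather than by CPU work.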

* Is there some way to make my OS less keen to evict pages (swappiness = 0 already)? I should have plenty of room to keep the whole thash file resident. Additionally, allowing it to sit on dirty pages for longer would reduce the total write IO.

* Is there some way to simply prevent large thash creation? For many applications I'd rather have slower but consistent queries than incur a massively slow query following a restart (not great in a prod environment!).

* Is there some way to generate the hash index with a more sympathetic algorithm that doesn't degrade so steeply, e.g. some hash + sort? (I have literally no experience with this!)
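To make the "hash + sort" idea concrete, here is a minimal Python sketch of what I have in mind. It is not MonetDB code and says nothing about how BAThash works internally; the function names, the use of crc32 as the hash, and the bucket layout are all made up for illustration. The point is that every pass over the data is sequential, and the only random-looking work is the sort, which could be an external merge sort on disk.

```python
from zlib import crc32


def build_bucket_index(values, n_buckets):
    """Build a bucket -> row-id index by sorting on the hash bucket first."""
    # Pass 1 (sequential read): compute the bucket for every row.
    pairs = [(crc32(v.encode()) % n_buckets, row) for row, v in enumerate(values)]

    # Sort by bucket. On a real dataset this would be an external merge
    # sort, which reads and writes in large sequential runs.
    pairs.sort()

    # Pass 2 (sequential write): rows land grouped by bucket, and we
    # count how many rows each bucket holds.
    starts = [0] * (n_buckets + 1)
    row_ids = []
    for bucket, row in pairs:
        starts[bucket + 1] += 1
        row_ids.append(row)

    # Prefix sum: bucket b's rows are row_ids[starts[b]:starts[b + 1]].
    for b in range(n_buckets):
        starts[b + 1] += starts[b]
    return starts, row_ids


def lookup(values, starts, row_ids, key):
    """Return the row ids whose value equals key."""
    n_buckets = len(starts) - 1
    b = crc32(key.encode()) % n_buckets
    # Re-check the value, since different keys can share a bucket.
    return [r for r in row_ids[starts[b]:starts[b + 1]] if values[r] == key]


order_ids = ["FOO", "BAR", "FOO", "BAZ", "QUX", "FOO"]
starts, row_ids = build_bucket_index(order_ids, n_buckets=4)
print(lookup(order_ids, starts, row_ids, "FOO"))
```

The trade-off is extra work for the sort in exchange for never writing at random into a file that is larger than the resident page set.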

-Will Muldrew