Hi,

I have a few questions, and hopefully someone can answer them and point me in the right direction.

 

- When we have performance issues, the only way we can resolve them is to dump and reload the data, sorted by datetime; after that the queries are fast. The reload restores the physical ordering on the datetime column, so queries avoid a full table scan (roughly the procedure sketched after this list). As soon as we insert new data (always in the same sorted order), the table loses that ordering and the queries become extremely slow.

- We find the behaviour very unpredictable and not very deterministic, especially when it comes to indexing, sorting, and log file consumption; we don't seem to be able to control any of them.

- The write-ahead log is growing uncontrollably and we don't really know when it commits.

- When running queries (after the table has lost its ordering), we can see the engine writing a lot of data (storage usage grows significantly while a query is still running), most probably because it is doing a full table scan.

- Even if we restart the engine, it takes a while to come back up because it replays the write-ahead log, which takes time; the restart makes no difference to the ordering, so the queries are still slow afterwards.
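
For reference, our dump-and-reload workaround is roughly the following. The table and column names here are illustrative and the syntax is generic SQL, so please treat it as a sketch of the procedure rather than our exact script:

    -- Rebuild the table with rows physically ordered by the datetime column
    -- (illustrative names: "events" table, "event_time" column).
    CREATE TABLE events_sorted AS
    SELECT * FROM events ORDER BY event_time;

    -- Swap the sorted copy in for the original.
    DROP TABLE events;
    ALTER TABLE events_sorted RENAME TO events;

    -- Recreate the index on the datetime column so range queries can use it.
    CREATE INDEX idx_events_event_time ON events (event_time);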

 

Is there a way we can optimise further so we avoid these issues? We have more than 600 million rows, and when the data is sorted on the datetime field the queries are quick. The only way we can force the sorting is to dump and reload the data, but we can't really do that all the time, and we need to insert new data at least once every night. A simplified example of the kind of query we run is below.
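
To make the question concrete, a typical query is a range scan over the datetime column, along these lines (again with the illustrative names from the sketch above; the date range is just an example). Prefixing it with EXPLAIN, on engines that support it, shows whether a full table scan is being chosen:

    EXPLAIN
    SELECT *
    FROM events
    WHERE event_time BETWEEN '2016-05-01 00:00:00' AND '2016-05-02 00:00:00';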

 

It's an Ubuntu 14.04.4 LTS instance on AWS with 2 CPUs, 16 GB of RAM, and SSD storage.

 

Thank you.

 

Regards,

 

Kevork Vartanian

Technical Solutions Consultant

kevork.vartanian@innotta.com.au | +61 406 776 789 | www.innotta.com.au

 
