Hi Kevork,

> Is there a way we can further optimise so we don’t have these issues?

As a workaround, you could try data partitioning via merge tables; see https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/DataPartitioning

The static data can be stored, sorted on datetime, in one table (t1). The new (inserted) data is stored in a second table (t2). A third table (the merge table, mt1) combines the two; you run your queries against the merge table. All three tables must have the same structure. New data is inserted only into t2, so t1 stays sorted. A minimal sketch of the setup follows below.
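For example (a sketch only; the column names ts and val and the CSV path are placeholders, so adjust them to your actual schema and dump file):

  -- all three tables must share the same structure
  CREATE TABLE t1 (ts TIMESTAMP, val INTEGER);        -- static, sorted data
  CREATE TABLE t2 (ts TIMESTAMP, val INTEGER);        -- receives new inserts
  CREATE MERGE TABLE mt1 (ts TIMESTAMP, val INTEGER); -- union of the partitions

  -- bulk-load the existing data into t1 in datetime order, e.g. from a sorted dump:
  -- COPY INTO t1 FROM '/path/to/sorted_dump.csv';

  -- register both partitions with the merge table
  ALTER TABLE mt1 ADD TABLE t1;
  ALTER TABLE mt1 ADD TABLE t2;

  -- queries target the merge table; new rows go into t2 only
  SELECT count(*) FROM mt1
   WHERE ts BETWEEN TIMESTAMP '2016-12-01 00:00:00' AND TIMESTAMP '2016-12-02 00:00:00';
  INSERT INTO t2 VALUES (TIMESTAMP '2016-12-07 05:48:00', 42);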

If after some time the queries get slower again (because the growing second table (t2) is not sorted on datetime), you can copy its rows, sorted on datetime, into a third table (t3). Next, add this third table to the merge table (via ALTER TABLE mt1 ADD TABLE t3;) and empty the second table. All new data will again be added to t2 only; a sketch of this rotation step is shown below.
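In SQL the rotation could look like this (again just a sketch, reusing the placeholder schema from above; it assumes the insert preserves the ORDER BY of the source query):

  -- third partition, same structure as the others
  CREATE TABLE t3 (ts TIMESTAMP, val INTEGER);

  -- copy the accumulated rows of t2 into t3, sorted on datetime
  INSERT INTO t3 SELECT * FROM t2 ORDER BY ts;

  -- attach the new sorted partition, then empty t2 for fresh inserts
  ALTER TABLE mt1 ADD TABLE t3;
  DELETE FROM t2;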

Let us know if this workaround works for your use case.

Regards,
Martin

On 07-12-16 05:48, Kevork Vartanian wrote:

Hi,

I have a few questions; hopefully someone will be able to answer them and point me in the right direction.

- When we have performance issues, the only way we can resolve them is to dump and reload the data, sorted by datetime; after that the queries are fast. Reloading restores the sort order on the datetime column, so queries avoid a full table scan. As soon as we insert new data (always in the same sorted order), the table loses its sort order and the queries become extremely slow.

- We find the behaviour unpredictable and not very deterministic, especially when it comes to indexing, sorting, and log file consumption; we don't seem to be able to control them.

- The write-ahead log grows uncontrollably, and we don't really know when it commits.

- When running queries (after the table has lost its sort order), we can see it writing a lot of data (storage usage grows significantly while a query is still running), most probably because it is doing a full table scan.

- Even if we restart the engine, it takes a while to come back up, because replaying the write-ahead log takes time; the restart does not restore the sort order, so the queries are still slow afterwards.

Is there a way we can further optimise so we don't have these issues? We have more than 600 million rows, and when the datetime field is sorted the queries are quick. The only way we can force the sorting is to dump and reload the data, but we can't do that all the time, and we need to insert new data at least once every night.

It's an Ubuntu 14.04.4 LTS instance on AWS with 2 CPUs, 16 GB of RAM, and SSD drives.

Thank you.

Regards,

Kevork Vartanian
Technical Solutions Consultant
kevork.vartanian@innotta.com.au | +61 406 776 789 | www.innotta.com.au