On 2012-02-29 01:06, Joseph Brower wrote:
We're using MonetDB and hitting it fairly heavily with read queries. We only use importing of text files to insert information, and that happens every 30 seconds or so. When we're only inserting information, we consume less than 1% of the cpu and things run just fine. We are also able to occasionally query the DB for various peices of information (counts, averages, etc) and it returns quite quickly.
The problem is that after we've been running for 15-20 minutes of importing and querying heavily, performance drops to where queries take multiple minutes to finish. The queries generally finish in less than half a second.
Does anyone have any ideas on how we can best track down this issue? Is there any more information that I need to provide?
Thanks,
Joseph Brower
How are you importing? Are you by any chance using INSERT INTO queries for each tuple you're inserting? And are you using this in autocommit mode (the default in MonetDB)? If so, this combination is killing your performance. By far the fastest way of importing data is by using COPY INTO. See http://www.monetdb.org/Documentation/Manuals/SQLreference/CopyInto for information on syntax. The second best way (way slower on a multi-core system than the fastest) is to use INSERT INTO queries in one big transaction. Before starting to insert, do START TRANSACTION;, and afterwards do COMMIT;. To give an indication, if you're doing INSERT INTO queries in autocommit mode, that means (at least) one disk write per query. Disks can do maybe 100 I/O operations per second (IOPS), depending on type of interface and speed of disk. This means each query will take at leat 0.1 second and you will get at most 100 inserts per second. There is nothing that you can do about that apart from cheating (or using hardware that is way faster and hence much more expensive, such as SSD). So the best way of avoiding this limit is to do fewer database transactions. To further improve speed, you need to parallelize import. We can't do that with INSERT INTO queries, but we *can* do that with COPY INTO. Therefore, COPY INTO is the fastest. -- Sjoerd Mullender