Re: [MonetDB-users] Very Slow Selects

29 Feb 2012

      On 2012-02-29 01:06, Joseph Brower wrote:
...
We're using MonetDB and hitting it fairly heavily with read queries.  We 
only use importing of text files to insert information, and that happens 
every 30 seconds or so.  When we're only inserting information, we 
consume less than 1% of the cpu and things run just fine.  We are also 
able to occasionally query the DB for various peices of information 
(counts, averages, etc) and it returns quite quickly.
The problem is that after we've been running for 15-20 minutes of 
importing and querying heavily, performance drops to where queries take 
multiple minutes to finish.  The queries generally finish in less than 
half a second.
Does anyone have any ideas on how we can best track down this issue?  Is 
there any more information that I need to provide?
Thanks,
Joseph Brower
How are you importing?
Are you by any chance using INSERT INTO queries for each tuple you're
inserting?  And are you using this in autocommit mode (the default in
MonetDB)?
If so, this combination is killing your performance.

By far the fastest way of importing data is by using COPY INTO.  See
http://www.monetdb.org/Documentation/Manuals/SQLreference/CopyInto for
information on syntax.
The second best way (way slower on a multi-core system than the fastest)
is to use INSERT INTO queries in one big transaction.  Before starting
to insert, do START TRANSACTION;, and afterwards do COMMIT;.

To give an indication, if you're doing INSERT INTO queries in autocommit
mode, that means (at least) one disk write per query.  Disks can do
maybe 100 I/O operations per second (IOPS), depending on type of
interface and speed of disk.  This means each query will take at leat
0.1 second and you will get at most 100 inserts per second.  There is
nothing that you can do about that apart from cheating (or using
hardware that is way faster and hence much more expensive, such as SSD).

So the best way of avoiding this limit is to do fewer database
transactions.  To further improve speed, you need to parallelize import.
 We can't do that with INSERT INTO queries, but we *can* do that with
COPY INTO.  Therefore, COPY INTO is the fastest.

-- 
Sjoerd Mullender