Hi,

I've been doing performance tests on MondetDB to see how it scales. I've been running tests on Amazon EC2 (7GB of memory, 64-Bit, CentOS 5.2)

The table I am using has 60 columns with most being just simple characters of length 2, here is the query:
select CollectionDEPOTID as c0, CollectionCountryID as c1, sum(ParcelCount) as m0 from parcel_stage_short as parcel_stage_short  group by c0, c1;


Here are the load time in batches of 8,000,000 using the following command:
COPY 8000000 RECORDS INTO parcel_stage_short FROM '/mnt/parcel_stage_short.txt'  USING DELIMITERS '\t','\n'

Rows Load Time Increase % Increase
8000000 240434.17
8000000 424937.54 184503.37 76.74%
8000000 599621.98 359187.81 149.39%
8000000 769276.70 528842.53 219.95%
8000000 889600.24 889600.24 209.35%
msecs msecs

As can be seen the load time increases by around 200,000 msec for each additional 8Million rows added.
Any tips on how to reduce the load speed?
Would it be faster to load the table in one go (tried loading in bigger batches but loader fails)?

Here are the performance times for each additional batch of 8,000,000 rows:

Rows 1st Query 1st Increase % 1st Increase 2nd Query 2nd Increase % Increase
08000000 1176.53
16000000 4838.52 3661.99 311.25% 2239.61 1168.30 109.05%
24000000 33947.84 32771.31 2785.43% 9323.29 8251.98 770.27%
32000000 837114.13 835937.61 71051.29% 5456.96 4385.65 409.37%
40000000 668380.26 667203.73 56709.60% 8538.02 7466.71 696.97%
msecs msecs msecs msecs

The performance of the 1st query is unsurprisingly poor just after the data has been copied in
The performance of the same query a second time yields much faster results
The performance of the monetDb with 1st/2nd query degrades as load increases.

Are these results typical? Is there anything I can do to increase performance on the 1st or 2nd queries?

Thanks in advance

Calum