Calum Miller wrote:
Hi Martin
Many thanks for the feedback, much appreciated. Comments below.
On 6 Aug 2009, at 10:20, Martin Kersten wrote:
Hi Calum,
Thank you very much for sharing this information.
Hi,
I've been doing performance tests on MondetDB to see how it scales. I've been running tests on Amazon EC2 (7GB of memory, 64-Bit, CentOS 5.2) The effect of running MonetDB on a cloud machine is new to us. The table I am using has 60 columns with most being just simple
Calum Miller wrote: the table contains (primary) keys? From the SQL below CollectionDEPOTID and CollectionCountryID would be dimension columns but not primary. There would be one primary key on each row Inforbright performance improves when integer ids are used instead of strings, does monetDb care what type the columns are? Yes. If you play with strings while you actually mean integers, your system becomes slower. For it has to convert str->int, except for equality.
Would a thinner fact table improve performance any? no. MonetDB is column oriented. Performance is more affected by taking superfluous attributes in your query result. So, don't use 'select * from t' when you are only interested in a limited set of columns.
characters of length 2, here is the query: select CollectionDEPOTID as c0, CollectionCountryID as c1, sum(ParcelCount) as m0 from parcel_stage_short as parcel_stage_short group by c0, c1;
Here are the load time in batches of 8,000,000 using the following command: COPY 8000000 RECORDS INTO parcel_stage_short FROM '/mnt/parcel_stage_short.txt' USING DELIMITERS '\t','\n';
Rows Load Time Increase % Increase 8000000 240434.17 8000000 424937.54 184503.37 76.74% 8000000 599621.98 359187.81 149.39% 8000000 769276.70 528842.53 219.95% 8000000 889600.24 889600.24 209.35% msecs msecs
As can be seen the load time increases by around 200,000 msec for each additional 8Million rows added. Any tips on how to reduce the load speed? The copy command in the latest release was not running in parallel on multicore machines due to a design error discovered in that code. Parallel loading alone would give you another factor >=2
Great news, I'll try on an earlier release It is unclear yet, if this patch will get it into the august 2009 release or becomes part of the next bugfix release.
Would it be faster to load the table in one go (tried loading in bigger batches but loader fails)? It should save. Because each time you run a batch it has to make room for the newly tuples, which implies a copying of what you already had. The counter effect is that you stress the operating system with lot of dirty memory, which it has to swap out.
If it failed, then we would like to know the error message it generated.
Here are the performance times for each additional batch of 8,000,000 rows:
Rows 1st Query 1st Increase % 1st Increase 2nd Query 2nd Increase % Increase 08000000 1176.53 16000000 4838.52 3661.99 311.25% 2239.61 1168.30 109.05% 24000000 33947.84 32771.31 2785.43% 9323.29 8251.98 770.27% 32000000 837114.13 835937.61 71051.29% 5456.96 4385.65 409.37% 40000000 668380.26 667203.73 56709.60% 8538.02 7466.71 696.97% msecs msecs msecs msecs
The performance of the 1st query is unsurprisingly poor just after the data has been copied in MonetDB decides itself when a (hash) index might speed up the query. This index will be maintained during the session only. You see here that effect of self-management. The performance of the same query a second time yields much faster results The performance of the monetDb with 1st/2nd query degrades as load increases. yes, building a hash index is linear in the size of the table.
Are these results typical? Is there anything I can do to increase performance on the 1st or 2nd queries? no, traditional systems would use a persistent index build *before* you run any query. Would allocation of more memory speed up the performance of the hash on the subsequent queries?
MonetDB takes the memory it can find. There is no DBA control over that other then plugging more ram into the machine and, with large database, extend the swap space. Could you share some performance figures of Infobright with us?
regards, Martin
Thanks in advance
Calum
------------------------------------------------------------------------
------------------------------------------------------------------------------ Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day trial. Simplify your report design, integration and deployment - and focus on what you do best, core application coding. Discover what's new with Crystal Reports now. http://p.sf.net/sfu/bobj-july
------------------------------------------------------------------------
_______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
------------------------------------------------------------------------------ Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day trial. Simplify your report design, integration and deployment - and focus on what you do best, core application coding. Discover what's new with Crystal Reports now. http://p.sf.net/sfu/bobj-july _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
------------------------------------------------------------------------------ Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day trial. Simplify your report design, integration and deployment - and focus on what you do best, core application coding. Discover what's new with Crystal Reports now. http://p.sf.net/sfu/bobj-july _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users