
Hi Hering,

MonetDB is a full-fledged DBMS, not just a plain data store. Hence, just like any other DBMS, its purpose is not merely to store (large amounts of) data and retrieve it as-is in its entirety, say with SELECT * FROM table; (that is indeed what we have filesystems and plain files for), but to process and analyse the data using (possibly complex) queries that yield results that are usually significantly smaller than the stored data volume. Basically, I may assume that you do not intend to present data at a rate of 1M records per sec to the (human) user, do you?
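
To make the difference between retrieving a table as-is and letting the DBMS reduce it concrete, here is a minimal sketch. It is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for a real DBMS connection (it is not MonetDB's client API), and the table "ticks" with its columns "symbol" and "price" is hypothetical. Both functions compute the same per-symbol average, but the first ships every stored row to the application, while the second ships only one row per symbol.

```python
import sqlite3


def build_demo_db(n_rows=10_000):
    """Create an in-memory stand-in database with a hypothetical 'ticks' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ticks (symbol TEXT, price REAL)")
    rows = ((f"SYM{i % 5}", float(i % 100)) for i in range(n_rows))
    conn.executemany("INSERT INTO ticks VALUES (?, ?)", rows)
    conn.commit()
    return conn


def avg_price_client_side(conn):
    """Retrieve every row as-is and aggregate in the application."""
    sums, counts = {}, {}
    shipped = 0
    for symbol, price in conn.execute("SELECT symbol, price FROM ticks"):
        sums[symbol] = sums.get(symbol, 0.0) + price
        counts[symbol] = counts.get(symbol, 0) + 1
        shipped += 1
    averages = {s: sums[s] / counts[s] for s in sums}
    return averages, shipped


def avg_price_in_dbms(conn):
    """Let the DBMS aggregate; only the small result crosses the boundary."""
    rows = conn.execute(
        "SELECT symbol, AVG(price) FROM ticks GROUP BY symbol"
    ).fetchall()
    return dict(rows), len(rows)


if __name__ == "__main__":
    conn = build_demo_db()
    _, shipped_client = avg_price_client_side(conn)
    _, shipped_dbms = avg_price_in_dbms(conn)
    print("rows shipped to the application (client-side):", shipped_client)
    print("rows shipped to the application (in-DBMS):", shipped_dbms)
```

The second variant is the pattern to aim for: the number of rows crossing the client/server boundary depends on the size of the answer, not on the size of the table.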

Consequently, I may assume that your application does some kind of processing of the retrieved data, right? The whole purpose of a DBMS is to provide you with the tools and knowledge to do such processing on large amounts of data very efficiently. Hence, instead of aiming at retrieving all data as-is at high throughput from the DBMS to post-process it in your application, you should rather aim at having the DBMS do all (or at least most) of that processing, by formulating the processing task in a DBMS query language like SQL (or MAL, if SQL does not provide sufficient functionality and you are not limited to using a standard language). Basically, pushing (most of) the data processing logic of your application into the DBMS to retrieve only a small ("useful" and "usable") result is the recommended way to use a DBMS efficiently and effectively.

Stefan

ps: Documentation about embedded use of MonetDB is available on our website:
http://monetdb.cwi.nl/MonetDB/Documentation/Embedded-Server.html
http://monetdb.cwi.nl/SQL/Documentation/Embedded-Server.html

On Wed, May 26, 2010 at 08:05:08PM -0700, Hering Cheng wrote:
Hi,
I was wondering if you can give me some ideas on how to retrieve data from MonetDB at very high throughput. The query will be very simple, issued against a single table. This table contains billions of records and I'd like to retrieve the data at a rate of at least one million records per second. From my naive tests with mclient running on the same 16-core machine as mserver5, I am only able to extract the data at about 20,000 records per second with the Feb2010 release.
As a baseline case, with the same data in compressed (gzip'd) ASCII form stored in a regular file on the same (SAN) file system as MonetDB, I am able to read at the desired speed of one million records per second.
I understand that there is communication overhead between mserver5 and mclient (or whatever client I use). Therefore, one possibility is to embed my application within mserver5. The embedded application basically just needs to be able to issue a SQL (or even MAL) query against the enclosing mserver5 and process the result set. If this is a viable approach, I'd like some guidance on where the hooks are.
Thanks.
Hering Cheng
--
| Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl |
| CWI, P.O.Box 94079  | http://www.cwi.nl/~manegold/  |
| 1090 GB Amsterdam   | Tel.: +31 (20) 592-4212       |
| The Netherlands     | Fax : +31 (20) 592-4199       |