Re: [MonetDB-users] Bulk Data Extraction & Embedded mserver5

30 Aug 2010

Other than taking the embedded route, can someone give me some guidance on
how I can hook into the internals of MonetDB so I can "intervene" before the
result set of a SQL query is sent to clients?

Basically, I want to insert my logic within MonetDB so that the (initial)
result set that would have been sent to clients (mclient, ODBC, JDBC) is
intercepted by my code and re-processed.  The output of my logic will then
be forwarded to the clients as usual.  Since my logic resides within
MonetDB, I should be able to receive the initial result set at millions of
records per second.

I'd like some hints on which source files I should look at to get this to
work.

Thanks a lot.
Hering

On Thu, May 27, 2010 at 7:26 AM, Hering Cheng wrote:
...
Thank you, Stefan, for the advice.
I apologize if I gave the impression that I am using MonetDB as a mere data
store.  I do realize that one should leverage analytical capabilities of the
DBMS engine as much as possible before getting the results out.  Part of my
system will indeed make extensive use of MonetDB's amazing query capability.
Unfortunately, another part of my application deals with what is known as
"complex event processing" (CEP), whose most fundamental building block is
the use of fixed or sliding time windows.  An example would be to calculate
averages of a column over a 5-minute window.  Needless to say, the data are
time series.  The trick is to meld CEP logic with MonetDB.  Ideally, I would
enhance MonetDB with this capability, but I deem it to be beyond my own
ability.  Thus my question about how to feed data from MonetDB into my logic
in bulk.
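
To make the windowing idea above concrete, here is a minimal sketch of a
5-minute sliding average computed on the client side. It assumes rows arrive
as (timestamp in seconds, value) pairs already ordered by time; the function
name and the row layout are illustrative assumptions, not part of MonetDB or
of the application described in this thread.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple

WINDOW_SECONDS = 5 * 60  # the 5-minute window from the example above


def sliding_average(
    rows: Iterable[Tuple[float, float]], window: float = WINDOW_SECONDS
) -> Iterator[Tuple[float, float]]:
    """Yield (timestamp, mean of the values in (timestamp - window, timestamp]).

    Assumes rows are ordered by timestamp, as a time series normally is.
    """
    buf = deque()  # (timestamp, value) pairs currently inside the window
    total = 0.0    # running sum of the values in buf
    for ts, value in rows:
        buf.append((ts, value))
        total += value
        # Evict rows that have fallen out of the window. The row just
        # appended is always inside it, so buf never becomes empty here.
        while buf[0][0] <= ts - window:
            _, old_value = buf.popleft()
            total -= old_value
        yield ts, total / len(buf)
```

Each row is appended once and evicted at most once, so the cost per row stays
constant on average regardless of how many rows fall inside a window.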
On Wed, May 26, 2010 at 10:33 PM, Stefan Manegold wrote:
...
Hi Hering,
MonetDB is a full-fledged DBMS, not just a plain data store. Hence, just
like any other DBMS, its purpose is not merely to store (large amounts of)
data and retrieve it as-is in its entirety, say with SELECT * FROM table;
(that is indeed what we have filesystems and plain files for), but to
process and analyse the data using (possibly complex) queries that yield
results that are usually significantly smaller than the stored data volume.
Basically, I may assume that you do not intend to present data at a rate of
...
1M records per sec to the (human) user, do you?
Consequently, I may assume that your application does do some kind of
processing of the retrieved data, right?
The whole purpose of a DBMS is to provide you with the tools and knowledge
to do such processing on large amounts of data very efficiently.
Hence, instead of aiming at retrieving all data as-is at high throughput
from the DBMS to post-process it in your application, you should rather
aim at having the DBMS do all (or at least most) of that processing, by
formulating the processing task in a DBMS query language like SQL (or MAL,
if SQL does not provide sufficient functionality and you are not limited
to using a standard language).
Basically, pushing (most of) the data processing logic of your application
into the DBMS to retrieve only a small ("useful" and "usable") result is
the recommended way to use a DBMS efficiently and effectively.
Stefan
ps: Documentation about embedded use of MonetDB is available on our
website:
   http://monetdb.cwi.nl/MonetDB/Documentation/Embedded-Server.html
   http://monetdb.cwi.nl/SQL/Documentation/Embedded-Server.html
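
As an illustration of the advice above (let the DBMS do the aggregation and
return only a small result), here is a hedged sketch of a fixed 5-minute
window average expressed as a plain GROUP BY. It runs against SQLite from the
Python standard library purely as a stand-in so that it is self-contained; it
is not MonetDB-specific, and the table name `ticks` and the columns `ts`
(integer seconds) and `price` are hypothetical.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory table standing in for the real time series."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ticks (ts INTEGER, price REAL)")
    conn.executemany(
        "INSERT INTO ticks VALUES (?, ?)",
        [(0, 10.0), (120, 20.0), (299, 30.0), (300, 40.0), (610, 50.0)],
    )
    return conn


def window_averages(conn, window_seconds=300):
    """Return [(window_start, avg_price)], one row per fixed window.

    Integer division of ts by the window length assigns each row to a
    window, so only one row per window leaves the database.
    """
    query = """
        SELECT (ts / ?) * ? AS window_start, AVG(price) AS avg_price
        FROM ticks
        GROUP BY window_start
        ORDER BY window_start
    """
    return conn.execute(query, (window_seconds, window_seconds)).fetchall()


if __name__ == "__main__":
    print(window_averages(make_demo_db()))
```

This covers only fixed (non-overlapping) windows; sliding windows need either
SQL window functions, where the DBMS supports them, or client-side logic.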
On Wed, May 26, 2010 at 08:05:08PM -0700, Hering Cheng wrote:
...
Hi,
I was wondering if you can give me some ideas on how to retrieve data from
MonetDB at very high throughput.  The query will be very simple, issued
against a single table.  This table contains billions of records and I'd
like to retrieve the data at a rate of at least one million records per
second.  From my naive tests with mclient running on the same 16-core
machine as mserver5, I am only able to extract the data at about 20,000
records per second with the Feb2010 release.
As a baseline case, with the same data in compressed (gzip'd) ASCII form
stored in a regular file on the same (SAN) file system as MonetDB, I am
able to read at the desired speed of one million records per second.
I understand that there is communication overhead between mserver5 and
mclient (or whatever client I use).  Therefore, one possibility is to
embed my application within mserver5.  The embedded application basically
just needs to be able to issue a SQL (or even MAL) query against the
enclosing mserver5 and process the result set.  If this is a viable
approach, I'd like some guidance on where the hooks are.
Thanks.
Hering Cheng
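
For reference, a baseline like the gzip'd-file read described above could be
measured with a sketch along these lines. It assumes one record per line; the
function names are illustrative, and this is not the test that was actually
run for the figures quoted in this thread.

```python
import gzip
import time


def count_records(path):
    """Read a gzip'd text file line by line; return (records, seconds)."""
    start = time.perf_counter()
    records = 0
    with gzip.open(path, "rb") as f:
        for _ in f:  # one record per line (assumption)
            records += 1
    return records, time.perf_counter() - start


def records_per_second(records, seconds):
    """Throughput in records per second; infinite if no time was measured."""
    return records / seconds if seconds > 0 else float("inf")
```

Calling `records_per_second(*count_records(path))` on the data file gives a
figure to compare against the rate observed through a client.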
--
| Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl |
| CWI,  P.O.Box 94079 | http://www.cwi.nl/~manegold/  |
| 1090 GB Amsterdam   | Tel.: +31 (20) 592-4212       |
| The Netherlands     | Fax : +31 (20) 592-4199       |