[MonetDB-users] MonetDB scalability
Hi,
I am thinking of using MonetDB as the data store for my application. The application will create a lot of data (a few GB of new data each day), while there will not be many queries (probably a few thousand per day). Queries will be complex, but they should execute really fast. Since the data will grow quickly, I must prepare for scaling, so I need to know whether there is a way to distribute MonetDB over multiple machines.
Thanks and regards
On 18-12-2009 20:01:40 +0100, Bojan Šmid wrote:
> I am thinking of using MonetDB as the data store for my application. The application will create a lot of data (a few GB of new data each day), while there will not be many queries (probably a few thousand per day). Queries will be complex, but they should execute really fast. Since the data will grow quickly, I must prepare for scaling, so I need to know whether there is a way to distribute MonetDB over multiple machines.
At the moment there is no off-the-shelf distributed or orchestrated version of MonetDB. That is, MonetDB itself does not currently manage anything like that, though it is not impossible to build distributed solutions yourself; think of replicas or sharding (fragmentation).
What /is/ available is Merovingian, which can help you implement any distributed solution. Merovingian makes access to MonetDB servers running on different machines transparent from any machine, allowing you to use patterns to locate MonetDB instances in your cluster. See also the REMOTE DATABASES section in the merovingian man-page: http://homepages.cwi.nl/~fabian/MonetDB/Man%20Pages.html/merovingian.html
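To make the "sharding (fragmentation)" idea above concrete, here is a minimal sketch of application-level sharding by day. It is only an illustration under stated assumptions: the shard names, the `shard_for` function and the `ShardedStore` class are hypothetical, the per-shard tables are plain in-memory lists standing in for separate MonetDB databases, and nothing here is provided by MonetDB or Merovingian.

```python
from collections import defaultdict
from datetime import date

# Hypothetical shard names; in a real setup each could be a separate
# MonetDB database on its own machine (an assumption, not a MonetDB feature).
SHARDS = ["shard0", "shard1", "shard2"]


def shard_for(day: date) -> str:
    """Map a calendar day to a shard by rotating over the shard list."""
    return SHARDS[day.toordinal() % len(SHARDS)]


class ShardedStore:
    """In-memory stand-in for a set of per-shard fact tables."""

    def __init__(self) -> None:
        self.tables = defaultdict(list)  # shard name -> list of (day, value)

    def insert(self, day: date, value: float) -> None:
        # Route each new row to exactly one shard based on its day.
        self.tables[shard_for(day)].append((day, value))

    def total(self, start: date, end: date) -> float:
        # Scatter: every shard computes a partial sum over its own rows.
        partials = [
            sum(value for day, value in rows if start <= day <= end)
            for rows in self.tables.values()
        ]
        # Gather: the application combines the partial results.
        return sum(partials)


store = ShardedStore()
for offset, value in enumerate([1, 2, 3, 4]):
    store.insert(date(2009, 12, 18 + offset), value)
print(store.total(date(2009, 12, 19), date(2009, 12, 21)))
```

Simple aggregates such as sums and counts combine easily this way; joins across shards are the hard part, and the application would have to handle them itself.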
Thanks Fabian,
OK, it looks like that is about DB instance discovery and proxying to it, even if a DB is remote, so that it looks like it's local to a client. This doesn't address anything about partitioning and cross-server joins, for example, right?
Is partitioning support on the roadmap for 2010 by any chance? I do see http://monetdb.cwi.nl/MonetDB/Documentation/SQL-Roadmap.html , but I'm not sure how up to date it is? (and it doesn't mention horizontal scaling)
Thanks,
Otis
-- Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
----- Original Message ----
From: Fabian Groffen
To: Bojan Šmid
Cc: monetdb-users@lists.sourceforge.net
Sent: Tue, December 22, 2009 1:32:35 PM
Subject: Re: [MonetDB-users] MonetDB scalability
_______________________________________________
MonetDB-users mailing list
MonetDB-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/monetdb-users
On 22-12-2009 10:43:35 -0800, OG wrote:
> Thanks Fabian,
> OK, it looks like that is about DB instance discovery and proxying to it, even if a DB is remote, so that it looks like it's local to a client. This doesn't address anything about partitioning and cross-server joins, for example, right?
Your observation is right.
> Is partitioning support on the roadmap for 2010 by any chance? I do see http://monetdb.cwi.nl/MonetDB/Documentation/SQL-Roadmap.html , but I'm not sure how up to date it is? (and it doesn't mention horizontal scaling)
Please ignore that roadmap. We are working on several distribution techniques, but none of them is ready for use yet, and none of them does classical fragmentation. If we know more specifically what your needs are, we might be able to advise or help you in some way.
Hi Fabian,
>> Is partitioning support on the roadmap for 2010 by any chance? I do see http://monetdb.cwi.nl/MonetDB/Documentation/SQL-Roadmap.html , but I'm not sure how up to date it is? (and it doesn't mention horizontal scaling)
> Please ignore that roadmap. We are working on several distribution techniques, but none of them is ready for use yet, and none of them does classical fragmentation.
Are they described anywhere public, even very briefly?
> If we know more specifically what your needs are, we might be able to advise or help you in some way.
I'm investigating and evaluating several DBs that seem to be in a similar space (LucidDB, MonetDB, InfiniDB...) without knowing my *exact* needs, but I know I'll need to store and query large volumes of data. If I were to use a regular ROLAP setup, I'd use a star schema (with 100M+ fact rows) where time is a factor/dimension in pretty much all queries, and maybe I'd make use of Mondrian. Ideally, I'd be able to run most ad-hoc queries in under 3 seconds.
The "fear" is, what happens when I hit the limit of the server running (Monet)DB?
Thanks,
Otis
OG wrote:
> Hi Fabian,
>>> Is partitioning support on the roadmap for 2010 by any chance? I do see http://monetdb.cwi.nl/MonetDB/Documentation/SQL-Roadmap.html , but I'm not sure how up to date it is? (and it doesn't mention horizontal scaling)
>> Please ignore that roadmap. We are working on several distribution techniques, but none of them is ready for use yet, and none of them does classical fragmentation.
> Are they described anywhere public, even very briefly?
Some of the techniques being researched are described in the publicly available source code, e.g. the Octopus optimizer. But, as said before, we are seeking completely new ways to deal with scalability beyond the TB barrier.
>> If we know more specifically what your needs are, we might be able to advise or help you in some way.
> I'm investigating and evaluating several DBs that seem to be in a similar space (LucidDB, MonetDB, InfiniDB...) without knowing my *exact* needs, but I know I'll need to store and query large volumes of data. If I were to use a regular ROLAP setup, I'd use a star schema (with 100M+ fact rows) where time is a factor/dimension in pretty much all queries, and maybe I'd make use of Mondrian. Ideally, I'd be able to run most ad-hoc queries in under 3 seconds.
All three systems have overlapping areas where they perform well or poorly. Moreover, they take different approaches to capitalizing on the hardware infrastructure (throwing hardware at the problem in an MPP setting gives you performance). A real-life test case could help to discriminate between the systems. (See www.cwi.nl/~mk/ontimeReport for more info.)
> The "fear" is, what happens when I hit the limit of the server running (Monet)DB?
MonetDB currently does not behave optimally when the individual columns (and their support structures) do not fit in memory; IO thrashing may then occur. This issue is being addressed in ongoing work with high priority.
Martin
> Thanks, Otis
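Martin's remark about columns that do not fit in memory can be made concrete with a back-of-the-envelope estimate. The sketch below is an illustration only: it assumes fixed-width values (8 bytes each), a hypothetical 16 GiB of RAM, and it ignores compression, string columns and the "support structures" mentioned above, so real footprints will differ. Only the 100M row count comes from the thread.

```python
GIB = 1024 ** 3


def column_bytes(rows: int, bytes_per_value: int) -> int:
    """Raw size of one fixed-width column, with no overhead counted."""
    return rows * bytes_per_value


def fits_in_memory(rows: int, bytes_per_value: int, ram_bytes: int) -> bool:
    """True if a single column of this size fits in the given amount of RAM."""
    return column_bytes(rows, bytes_per_value) <= ram_bytes


# 100M fact rows (from the thread), 8-byte values (an assumption).
rows = 100_000_000
size = column_bytes(rows, 8)
print(f"one column: {size / GIB:.2f} GiB")

# Hypothetical server with 16 GiB of RAM.
print(fits_in_memory(rows, 8, 16 * GIB))
```

Under these assumptions a single 8-byte column over 100M rows takes about 0.75 GiB, so the memory limit would become a concern only at much larger row counts or with many wide columns touched by the same query.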
participants (4)
- Bojan Šmid
- Fabian Groffen
- Martin Kersten
- OG