[MonetDB-users] billions of rows?
Hello,

I'm testing MonetDB for storing large collections of genotypes. In short, a genotype is a small piece of information (about 3 or 4 characters) associated with an individual and a DNA marker, so the genotypes look like the contents of the cells of a spreadsheet. Sometimes we need to access them by individual (across markers), sometimes by DNA marker (across individuals). Today's genotyping technologies produce 600,000,0000 genotypes per run.

Is MonetDB able to manage tables with several billion rows efficiently? Do you have any example of an application with many (> 1 billion) rows in a single table?

I've successfully compiled and installed MonetDB/SQL on a Dell PE2950 with two quad-core Intel Xeon 2.66 GHz processors and 4 GB of RAM. I've created a very basic table to store genotypes:

CREATE TABLE genotypes (
    ind     CHAR(10),
    mark    CHAR(10),
    alleles CHAR(3)
);

Then I populated this table with the COPY INTO statement. About 370 million rows were loaded in 7 minutes. I haven't defined any index.

From mclient I sent the query below:

SELECT * FROM genotypes WHERE alleles = 'A A';

Immediately the server froze, and after about ten minutes the Unix w command showed a load average of 16! I stopped the query. Could you explain to me what happened? Is this behaviour normal? Is it possible to restrict the CPU resources allowed to MonetDB?

Thank you in advance for your advice and your help,
Eric
Hi Eric,
Thank you for trying out MonetDB. What happens in your case is that
you run out of memory. Your query becomes I/O bound rather than CPU
bound, so the CPUs mostly sit waiting for the disk. For this amount of
data you will need more RAM; with it, MonetDB will handle your
workload faster.
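A back-of-the-envelope check supports the out-of-memory diagnosis. The sketch below assumes that each CHAR(n) value costs exactly n bytes. That ignores MonetDB's string heaps, offsets and other per-column overhead, so the real footprint is likely larger. The constant and function names are illustrative only.

```python
GIB = 1024 ** 3

ROWS = 370_000_000   # rows loaded in the test described above
# Assumption: a CHAR(n) value occupies n bytes; MonetDB's string heaps,
# offsets and other per-column overhead are ignored (a lower bound).
COLUMN_WIDTHS = {"ind": 10, "mark": 10, "alleles": 3}
RAM_BYTES = 4 * GIB  # 4 GB of RAM on the test machine

def table_bytes(rows: int, widths: dict[str, int]) -> int:
    """Rough lower bound on the raw size of a table of fixed-width columns."""
    return rows * sum(widths.values())

size = table_bytes(ROWS, COLUMN_WIDTHS)
print(f"{size / GIB:.1f} GiB of raw data vs {RAM_BYTES / GIB:.1f} GiB of RAM")
```

This prints "7.9 GiB of raw data vs 4.0 GiB of RAM". Under the same assumption, the alleles column alone is about 1 GiB. The SELECT * result must then also fetch ind and mark for every matching row.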
There are no user-defined indices in MonetDB. However, depending on
your query load, you may choose to sort your data on the columns that
most of your predicates apply to.
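The benefit of ordering data on a predicate column can be illustrated with a small Python sketch. In a column sorted on alleles, all rows with a given value are contiguous, so an equality predicate becomes two binary searches and a slice instead of a full scan. This only illustrates the principle and is not how MonetDB works internally. The class name and sample rows are hypothetical.

```python
from bisect import bisect_left, bisect_right

class SortedTable:
    """Rows kept in order of one column (the clustering column).

    An equality predicate on that column becomes two binary searches over
    the sorted keys plus a slice, instead of a scan over every row.
    """

    def __init__(self, rows, key_index):
        self.rows = sorted(rows, key=lambda row: row[key_index])
        # The sort column, materialized once; this is what gets searched.
        self.keys = [row[key_index] for row in self.rows]

    def select_equal(self, value):
        lo = bisect_left(self.keys, value)
        hi = bisect_right(self.keys, value)
        return self.rows[lo:hi]  # matching rows are contiguous

# Hypothetical sample rows: (ind, mark, alleles).
rows = [
    ("ind1", "rs1", "A A"),
    ("ind1", "rs2", "A G"),
    ("ind2", "rs1", "G G"),
    ("ind2", "rs2", "A A"),
]
by_alleles = SortedTable(rows, key_index=2)
print(by_alleles.select_equal("A A"))
```

A given copy of the data can only be sorted on one column. With two access paths (by individual and by marker), one has to choose, or keep a second copy sorted differently at the cost of extra space.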
As far as I know, CPU resources cannot be restricted from inside
MonetDB, but you can always do that through your OS. By the way, are
you using Windows or Linux?
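On Linux, one way to restrict the server from outside MonetDB is to pin it to a subset of cores and lower its scheduling priority. The util-linux tools do this from the shell: `taskset -a -p -c 0-3 PID` sets the CPU mask and `renice` adjusts priority. The Python sketch below does the same through os.sched_setaffinity and os.setpriority, which are only available on Linux/Unix. On Linux both settings are per thread, so the sketch updates every thread listed under /proc/PID/task. The function name is hypothetical.

```python
import os

def restrict_process(pid: int, cpus: set[int], niceness: int = 10) -> set[int]:
    """Pin every thread of `pid` to `cpus` and lower their priority.

    Linux-only sketch: on Linux both the CPU mask and the nice value are
    per thread, so each thread listed under /proc/<pid>/task is updated,
    much like `taskset -a -p` does for the CPU mask.
    """
    usable = cpus & os.sched_getaffinity(pid)  # drop CPUs outside the current mask
    if not usable:
        raise ValueError(f"none of CPUs {sorted(cpus)} are usable by {pid}")
    for tid in map(int, os.listdir(f"/proc/{pid}/task")):
        try:
            os.sched_setaffinity(tid, usable)
            # Only ever raise the nice value: lowering it needs privileges.
            current = os.getpriority(os.PRIO_PROCESS, tid)
            os.setpriority(os.PRIO_PROCESS, tid, max(current, niceness))
        except ProcessLookupError:
            continue  # the thread exited while we were iterating
    return os.sched_getaffinity(pid)
```

For example, `restrict_process(server_pid, {0, 1, 2, 3})` would confine the server to four cores, where `server_pid` stands for the PID of the running MonetDB server. Threads created afterwards inherit the mask of the thread that creates them.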
I hope I gave you a couple of hints.
Kind regards,
lefteris
On Sun, Sep 28, 2008 at 12:38 PM, eric Gtep
------------------------------------------------------------------------- This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK & win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100&url=/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
Hi Eric,

Could you please indicate which version of MonetDB you are using?

The largest table loaded at our site contained 13.000.000.000 elements (on a system with two quad-core CPUs, 64 GB RAM and 6 TB of disk). It requires the Current version of MonetDB from the repository, because there were some problems with both loading and counting. To load such a large table, you had better turn off key and foreign-key checking if you can, because those checks are notoriously expensive in the current code base. A solution is under development.

For such large databases we used a Linux platform with 128 GB of swap space.

The increased load is curious. If you use the Current version, queries may run internally in parallel, causing all your cores to become active. They are probably waiting for I/O.

A general remark: you are using strings to represent the genotype and rely on string comparison. Given the way a DBMS handles this, it is often significantly worse than integer-based search.

Alternatively, for the adventurous, a column-store approach in which the genotypes are split into 3-4 columns can be considered, joining them back when needed. Each column is then an array of ca. 370 MB, which fits into memory. The selection itself would create roughly 600 MB of intermediates.

Success, and keep us informed of your progress,
Martin
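Martin makes two suggestions. One is to compare integers instead of strings. The other is to split the genotype into one narrow column per character and join the columns back by row position. Both can be illustrated with a toy Python sketch. It only shows the data layout; it is not MonetDB code and says nothing about MonetDB's actual performance. The 1-byte dictionary encoding, the function names and the sample genotypes are assumptions for illustration.

```python
from array import array

def encode(values):
    """Dictionary-encode strings as 1-byte integer codes (max 256 distinct)."""
    codes = {}                 # genotype string -> integer code
    encoded = array("B")       # unsigned char: 1 byte per row
    for value in values:
        encoded.append(codes.setdefault(value, len(codes)))
    return encoded, codes

def select_code(encoded, codes, value):
    """Row ids whose genotype equals value, comparing integers only."""
    target = codes.get(value)
    if target is None:         # value never occurs: no scan needed
        return []
    return [rowid for rowid, code in enumerate(encoded) if code == target]

def split_columns(values, width=3):
    """Store character i of every genotype in its own 1-byte column."""
    columns = [bytearray() for _ in range(width)]
    for value in values:
        if len(value) > width:
            raise ValueError(f"{value!r} is longer than {width} characters")
        padded = value.ljust(width).encode("ascii")
        for i, column in enumerate(columns):
            column.append(padded[i])
    return columns

def select_split(columns, value):
    """Filter one column at a time, keeping only candidate row ids."""
    if len(value) > len(columns):
        return []
    wanted = value.ljust(len(columns)).encode("ascii")
    candidates = range(len(columns[0]))
    for column, byte in zip(columns, wanted):
        candidates = [rowid for rowid in candidates if column[rowid] == byte]
    return list(candidates)

def join_back(columns, rowids):
    """Reassemble full genotype strings for the selected row ids."""
    return [bytes(column[r] for column in columns).decode("ascii") for r in rowids]

genotypes = ["A A", "A G", "G G", "A A", "0 0"]   # hypothetical sample
encoded, codes = encode(genotypes)
print(select_code(encoded, codes, "A A"))
columns = split_columns(genotypes)
print(join_back(columns, select_split(columns, "A A")))
```

With one byte per row, 370 million rows give the ca. 370 MB per column that Martin mentions. This holds whether the byte is an integer code for the whole genotype or one character of it. Note that `array("B")` raises `OverflowError` once more than 256 distinct genotypes appear.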
Out of curiosity... what was the 13B row dataset you loaded? It
doesn't happen to be publicly available, does it? :)
-Tom
On Sun, Sep 28, 2008 at 8:23 AM, Martin Kersten
participants (4)
- eric Gtep
- Lefteris
- Martin Kersten
- Thomas Briggs