Hi All, I just tried monetdb with large tables (600,000 rows by 50,000 columns). There are two issues: 1. performance was low, it took about 4 seconds to insert a row of record when the number of columns was large (50,000). However it was pretty fast (about 100 rows per second) when the column number was small (for example, 10). 2. serious errors occurred very often. For example, after inserting random number of records (50, for example), monetdb started to report errors, like: monetdbd: internal error while starting mserver, please refer to the logs So I opened the log file, and the error was: 2012-09-14 08:26:36 ERR merovingian[13730]: client error: unknown or impossible state: 4 The Makefile (it is also contained in the attachments) listed below will start database server, create a database farm, create a database, create a big table and then insert records to the big table. Here it is: #=====start of Makefile===================== ROW_NUM=600000 COLUMN_NUM=50000 COLUMN_NUM_MINUS_ONE=$(shell echo $(COLUMN_NUM)-1 | bc) all: clean monetdbd create ./mydbfarm monetdbd start ./mydbfarm monetdb create test_big_db monetdb release test_big_db make create_table_sql -s > create_big_table.sql make insert_data_sql -s > insert_data.sql mclient -d test_big_db create_big_table.sql for i in $$(seq 1 $(ROW_NUM)); do \ mclient -d test_big_db insert_data.sql ;\ echo 'done inserting ' $$i ' rows'; \ done; mclient -d test_big_db -s "select * from test_big_table" > test_big_table.txt create_table_sql: @echo 'START TRANSACTION;' @echo 'CREATE TABLE "test_big_table" (' @for i in $$(seq 1 $(COLUMN_NUM_MINUS_ONE)); do \ echo ' "gen'$$i'" DOUBLE,'; \ done @echo ' "gen$(COLUMN_NUM)" DOUBLE' @echo ');' @echo 'COMMIT;' insert_data_sql: @echo -n 'INSERT INTO "test_big_table" VALUES (' ; @for j in $$(seq 1 $(COLUMN_NUM_MINUS_ONE)) ; do \ echo -n '0.323, '; \ done; @echo '0.324);'; insert_all_data_sql: @echo 'START TRANSACTION;' @for i in $$(seq 1 $(ROW_NUM)); do\ echo -n 'INSERT INTO "test_big_table" VALUES (' ;\ for j in $$(seq 1 $(COLUMN_NUM_MINUS_ONE)) ; do \ echo -n '0.323, '; \ done; \ echo '0.324);'; \ done @echo 'COMMIT;' clean: -killall mserver5 -killall monetdbd rm -rf ./mydbfarm rm -rf create_big_table.sql test_big_table.txt insert_data.sql #=====end of Makefile====================== Thank you very much ! Best wishes, Joshua Shuai Yuan On Thu, Sep 13, 2012 at 12:43 PM, Joshua Shuai Yuan < shuaiyuan.emory@gmail.com> wrote:
Sure, I will see what I can do.
Best wishes, Joshua Shuai Yuan
On Wed, Sep 12, 2012 at 11:40 PM, Wang, Fusheng
wrote: Hi Joshua,****
** **
The table we are managing is in the scale of 30K x 5K: 150M cells. If each table cell (double type) needs 8 bytes to represent (ideally), the space needed will be a couple of GB, or at most, in the scale of tens of GB. If that is the case, distributed setup may not be needed, as the data can be mostly cached by the database. MonetDB does support multi-cores and multi-disks, but setup across multiple machines is not supported based on my knowledge. ****
** **
Even though MonetDB claims unlimited number of columns, we should still be cautious on what performance we can achieve for the queries we want to provide. A pilot study on this could provide us some guideline. Do you think you can setup MonetDB, and create a benchmark table we can do some performance study?****
** **
Interestingly, the large column issue is also discussed in a famous database blog:****
http://www.dbms2.com/2011/03/13/so-how-many-columns-can-a-single-table-have-... ****
** **
One guy commented:****
** **
“Genomics models were the primary driver. these folks typically have short but wide datasets of the order of 300,000 columns.”****
** **
So we are not alone. And we have only a subset of the columns (30K versus 300K)? Then we are lucky!****
** **
** **
Fusheng****
** **
** **
** **
** **
*From:* Joshua Shuai Yuan [mailto:shuaiyuan.emory@gmail.com] *Sent:* Wednesday, September 12, 2012 11:14 PM *To:* Wang, Fusheng *Cc:* Qin, Zhaohui *Subject:* Re: Array database for large matrix****
** **
Hi Dr. Wang,****
That's really a good news. Does it support distributed database? Or do we need distributed one?****
** **
Best wishes, Joshua Shuai Yuan
****
On Wed, Sep 12, 2012 at 3:54 PM, Wang, Fusheng
wrote:**** Hi guys,****
****
It looks like the matrix structure can be nicely supported by array databases, like MonetDB. It supports unlimited number of columns for a table. It’s also open source. ****
http://www.monetdb.org/Home/Features****
****
I know the group quite well, and I will chat with them on the use case to see if it’s a good fit. ****
****
Thanks,****
****
Fusheng****
****
****
** ** ------------------------------
This e-mail message (including any attachments) is for the sole use of the intended recipient(s) and may contain confidential and privileged information. If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution or copying of this message (including any attachments) is strictly prohibited.
If you have received this message in error, please contact the sender by reply e-mail message and destroy all copies of the original message (including attachments).****
** **