[MonetDB-users] Optimal filesystem/platform choices for MonetDB?
We've started doing a big import of data into SR5 and I've noticed that it's creating *A LOT* of files. So far we've got about 620k files for 6.7 GB of BATs. I expect that a full import will have at least 20 times what's imported so far. It seems that loading data is relatively slow compared to PostgreSQL (we're using COPY), and I'm guessing it's due to all of these files. Right now we're using ext3 on amd64 Ubuntu Linux. Should we be looking at other platforms?
-bob
With --nightly=stable compiled on 2007-01-21 this problem still seems to occur. We don't have quite as much data imported into the system as we did last night, but it seems to be using a lot of files again. It hasn't crashed yet, but we're not past the point where SR5 crashed.
$ du -sh /data/db/monetdb
2.6G /data/db/monetdb
$ find /data/db/monetdb|wc -l
285477
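To see where those hundreds of thousands of files actually live inside the dbfarm, a per-directory tally can help narrow things down. The following is a minimal sketch using only the Python standard library; the /data/db/monetdb path is taken from the output above, and nothing about MonetDB's internal file layout is assumed.

import os
import collections

DBFARM = "/data/db/monetdb"  # dbfarm path from the du/find output above

counts = collections.defaultdict(int)
for root, dirs, files in os.walk(DBFARM):
    # attribute every file to the first path component below the dbfarm
    rel = root[len(DBFARM):].lstrip(os.sep)
    top = rel.split(os.sep)[0] if rel else "."
    counts[top] += len(files)

# print the subdirectories with the most files first
for top, n in sorted(counts.items(), key=lambda item: item[1], reverse=True):
    print("%8d  %s" % (n, top))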
On Jan 21, 2008 11:36 PM, Martin Kersten wrote:
This is the most outrageous case I have seen. Is it possible at all to get the snippet of the SQL script and the input that causes it? It hints at the creation of temporary tables that are retained until the end of the session, which would point to a garbage collection issue.
We do not explicitly use any temporary tables, actually, just COPY INTO. There is no DDL whatsoever in this script. The only two SQL statements that are executed are SELECT MAX(timestamp) FROM table; and COPY %d RECORDS INTO table FROM '%s' USING DELIMITERS '\t'; We do this for several tables, with autocommit off via conn._mapi.setAutocommit(False) (using the Python API). After every hourly batch (maybe 30-50k rows or so at this point) we do a commit.
-bob
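For concreteness, here is roughly what that hourly loading loop looks like. This is a minimal sketch, not the actual script: it assumes a DB-API-style connection object named conn (cursor(), execute(), fetchone(), and commit() are assumptions about the client library), and load_hourly_batch, run, and the table/file parameters are hypothetical names; only the conn._mapi.setAutocommit(False) call and the two SQL statements are the ones quoted above.

def load_hourly_batch(conn, table, tsv_path, record_count):
    """Append one hour's worth of tab-separated rows to `table`."""
    cur = conn.cursor()
    # Check how far the table has been loaded already; the caller can use
    # this to decide which rows to export for the next batch.
    cur.execute("SELECT MAX(timestamp) FROM %s" % table)
    last_loaded = cur.fetchone()[0]
    # Bulk load the tab-separated file; passing the record count up front
    # tells the server how many rows to expect.
    cur.execute("COPY %d RECORDS INTO %s FROM '%s' USING DELIMITERS '\\t'"
                % (record_count, table, tsv_path))
    return last_loaded

def run(conn, batches):
    # Autocommit is switched off, as in the original script, so each hourly
    # batch of roughly 30-50k rows is committed as a single transaction.
    conn._mapi.setAutocommit(False)
    for table, tsv_path, record_count in batches:
        load_hourly_batch(conn, table, tsv_path, record_count)
        conn.commit()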
participants (2)
- Bob Ippolito
- Martin Kersten