[MonetDB-users] speed comparison off inserts and deletes, and db size question
Hi, I'm not too familiar with the BAT concept. Regardless, I want to know more about the SQL equivalents. We have need for a 300 million recors DB of about 500 GB. We insert about 1000 records per seconds. or about 100 millions record per day, for 3 days. We 'roll' the data by time, deleting the tail end of the tables, but the the 'deletes' are too slow. So, we are looking at better DB or data storage patterns (one that would allow a quick DROP-table-like or file.delete() response time). So, as I look at MonetDB, I have some questions: 1-What is the insert speed relatively to other DBs? 2-What is the delete speed relatively to other DBs? 3-What is the insert speed degradation with the addition of indexes? 4-What is the delete speed degradation with the addition of indexes? Also, in the forum, I saw some discussions about size. We may remain on a 32 bits kernel, so: 5-The 2 gigs limit on BATs, is that records count is byte size? A performance constraint was also highlighted: "the performance degradation starts when MonetDB has to access BATs larger then the available memory" 6-In our case, with 300 millions records, how big would be the BATs? Last question, lucky 7: 7-Do you think MonetDB is at all a good choice for our high-thoughput highly volatile data? Thanks! ____________________________________________________ Start your day with Yahoo! - make it your home page http://www.yahoo.com/r/hs
Hi, Thanks for your interest in MonetDB. I'll try to answer your questions wherever I can. Quartz wrote:
Hi,
I'm not too familiar with the BAT concept. Regardless, I want to know more about the SQL equivalents.
We have need for a 300 million recors DB of about 500 GB. We insert about 1000 records per seconds. or about 100 millions record per day, for 3 days.
wow Can you tell us which database on what OS on which hardware this runs?
We 'roll' the data by time, deleting the tail end of the tables, but the the 'deletes' are too slow. So, we are looking at better DB or data storage patterns (one that would allow a quick DROP-table-like or file.delete() response time).
So, as I look at MonetDB, I have some questions:
1-What is the insert speed relatively to other DBs?
Using SQL, with the current version we won't compete very well. We get much more acceptable speeds in 'raw' mode, where lots of the parsing overhead is removed.
2-What is the delete speed relatively to other DBs?
If I recall correctly, on a moving area of data like you have, our BATs keep on growing. Delete speed should be fine, but you will encounter a serious other problem with the growing BATs.
3-What is the insert speed degradation with the addition of indexes?
I thought we don't have indices. That is, if necessary they are created on the fly. Someone should correct me if I'm wrong.
4-What is the delete speed degradation with the addition of indexes?
idem
Also, in the forum, I saw some discussions about size. We may remain on a 32 bits kernel, so:
5-The 2 gigs limit on BATs, is that records count is byte size?
I think so, yes. A BAT must be fully memory addressable, which draws a limitation around 2G on a 32-bits system. Again, someone correct me if I'm wrong, or not entirely right.
A performance constraint was also highlighted: "the performance degradation starts when MonetDB has to access BATs larger then the available memory"
6-In our case, with 300 millions records, how big would be the BATs?
A BAT can be thought of as a single column. If you have 300 million ints, then you need over 1.118GB. If you don't have this amount of memory, then it will be memory mapped, in which case your performance is limited by the speed of your disk. The performance degradation has to do with MonetDB having to resort to virtual memory which is ofcourse much slower than real memory.
Last question, lucky 7: 7-Do you think MonetDB is at all a good choice for our high-thoughput highly volatile data?
I think you would seriously need horizontal partitioning which will be available in the next generation MonetDB kernel, to get around some problems both memory and delete wise. Since you have a very specific use case, I think it might be worth to have some more detailed discussion on whether MonetDB can be of any use in its current form, a next generation, or not for what the planning looks like. Regards, Fabian
--- Fabian
We have need for a 300 million recors DB of about 500 GB. We insert about 1000 records per seconds. or about 100 millions record per day, for 3 days.
wow Can you tell us which database on what OS on which hardware this runs?
Hehehe. We have needs for it... It doesn't exist yet. We have a dual amd opteron, linux 32 bits for now, 120 GB partition in a striped raid, 4 gigs of ram. Mysql 4.1.10 with InnoDB. The secret is in the extended insert syntax of mysql. Magnitude faster than jdbc batches. We fill our 90 gigs innodb file easily, but given we want to delete in batch once a day, the delete takes like hours. A permanent 'nibble' delete is working, but lowers the insert speed in the peek of the day. So we batch-delete by night.
Last question, lucky 7: 7-Do you think MonetDB is at all a good choice for our high-thoughput highly volatile data?
I think you would seriously need horizontal partitioning which will be available in the next generation MonetDB kernel, to get around some problems both memory and delete wise.
If it cannot be supported transparently, I won't pursue it. We already have another prototype with multi-table partitionning. It is a bit slower on queries though. Thanks Fabian. ____________________________________________________ Start your day with Yahoo! - make it your home page http://www.yahoo.com/r/hs
Quartz wrote:
--- Fabian
wrote: We have need for a 300 million recors DB of about 500 GB. We insert about 1000 records per seconds. or about 100 millions record per day, for 3 days.
wow Can you tell us which database on what OS on which hardware this runs?
Hehehe. We have needs for it... It doesn't exist yet. We have a dual amd opteron, linux 32 bits for now, 120 GB partition in a striped raid, 4 gigs of ram. Mysql 4.1.10 with InnoDB. The secret is in the extended insert syntax of mysql. Magnitude faster than jdbc batches.
That is the 'raw' mode I was talking about we also have. It has around the same orders of magnitude of speed improvement in comparison to JDBC batching (even with PreparedStatements).
7-Do you think MonetDB is at all a good choice for our high-thoughput highly volatile data?
I think you would seriously need horizontal partitioning which will be available in the next generation MonetDB kernel, to get around some problems both memory and delete wise.
If it cannot be supported transparently, I won't pursue it. We already have another prototype with multi-table partitionning. It is a bit slower on queries though.
The idea of the partitions in the next generation of MonetDB is to support transparent access to a pseudo BAT, which consists of multiple BATs. I think there is still some work to do on this.
The idea of the partitions in the next generation of MonetDB is to support transparent access to a pseudo BAT, which consists of multiple BATs. I think there is still some work to do on this.
I am sorry to interfere in your discussion, but this topic of transparent partitioning interests me a little. I did not know the next generation would support it. Do you intend to manage distributed partitioning as well? That would be quite interesting. Cheers, Rodrigo
Rodrigo Schmidt wrote:
I am sorry to interfere in your discussion, but this topic of transparent partitioning interests me a little. I did not know the next generation would support it. Do you intend to manage distributed partitioning as well? That would be quite interesting.
A distributed version of MonetDB is in the works, which will support partitioning over multiple instances of MonetDB/Five.
When you say distributed, do you mean you will cope with replication also? For that I mean partitioning + replication, like "MySQL Cluster". When will this version of MonetDB be available? Cheers, Rodrigo Fabian wrote:
Rodrigo Schmidt wrote:
I am sorry to interfere in your discussion, but this topic of transparent partitioning interests me a little. I did not know the next generation would support it. Do you intend to manage distributed partitioning as well? That would be quite interesting.
A distributed version of MonetDB is in the works, which will support partitioning over multiple instances of MonetDB/Five.
Yes, we envision partitioning, replication, partitioning again, replication, in any desired order, dynamically. This extention to MonetDB will not be available very soon. Rodrigo Schmidt wrote:
When you say distributed, do you mean you will cope with replication also? For that I mean partitioning + replication, like "MySQL Cluster".
When will this version of MonetDB be available?
Cheers, Rodrigo
Fabian wrote:
Rodrigo Schmidt wrote:
I am sorry to interfere in your discussion, but this topic of transparent partitioning interests me a little. I did not know the next generation would support it. Do you intend to manage distributed partitioning as well? That would be quite interesting.
A distributed version of MonetDB is in the works, which will support partitioning over multiple instances of MonetDB/Five.
Do you have already some documentation about how this system is going to work? As the implementation is to come soon, I suppose all algorithms for managing replication/partitioning have been decided already. I would be particularly interested if you have some type of comparison between your architecture and the one of "MySQL Cluster". Cheers, Rodrigo Fabian wrote:
Yes, we envision partitioning, replication, partitioning again, replication, in any desired order, dynamically.
This extention to MonetDB will not be available very soon.
Rodrigo Schmidt wrote:
When you say distributed, do you mean you will cope with replication also? For that I mean partitioning + replication, like "MySQL Cluster".
When will this version of MonetDB be available?
Cheers, Rodrigo
Fabian wrote:
Rodrigo Schmidt wrote:
I am sorry to interfere in your discussion, but this topic of transparent partitioning interests me a little. I did not know the next generation would support it. Do you intend to manage distributed partitioning as well? That would be quite interesting.
A distributed version of MonetDB is in the works, which will support partitioning over multiple instances of MonetDB/Five.
We are working on this, and it probably won't be finished very soon. You better don't count on it if you need it now. Rodrigo Schmidt wrote:
Do you have already some documentation about how this system is going to work? As the implementation is to come soon, I suppose all algorithms for managing replication/partitioning have been decided already. I would be particularly interested if you have some type of comparison between your architecture and the one of "MySQL Cluster".
Cheers, Rodrigo
participants (3)
-
Fabian
-
Quartz
-
Rodrigo Schmidt