Hello everyone!

 

This is my second posting to this list and identical to the first because I am not sure if I was subscribed at the time. Please ignore if you have read this before.

 

My apologies if this has been discussed elsewhere but a google search did not give me much. Let me give you a bit of background. We are planning to use a star schema in MonetDB as the backend for Pentaho BI reporting engine. We have a total of 2.4 B rows in the fact table and a maximum of 10 M rows in the dimension table. As you can see, this will be a read only database once the initial load is done.

 

I was wondering what hardware architecture will give a reasonable performance over traditional BI queries. Let’s define reasonable here by less than 1 minute turn-around time on joins between the fact table and the dimension table. On that note, what will be the expected performance on a machine with specs identical to the Amazon High I/O instance type which has two solid state drives - http://aws.amazon.com/ec2/instance-types/. I will of course combine the two drives in a software raid-0 configuration.

 

My broader question is that columnar databases are typically optimized for hard drives to utilize the maximum possible disk transfer rate,; what advantages can a solid state drive offer over a traditional hard drive when used solely for MonetDB? I can imagine some advantages maybe in the swap-ins for VM pages but I am not really an expert. Also, some advantages in multi-threaded disk reads.

 

Sid