Hi,
My firm is investigating the use of MonetDB as a potential market data
store. I am interested in the opinion of those on this list as to how
best to structure the data for optimum performance.
Basically, our data consists of ~50,000 instruments. One use we are
looking at is for the storage of bar data, one for each minute.The
resulting data for each bar is like
, with one observation per
minute. While not all symbols will have an entry for each minute, many
observations will share the same Timestamp.
Is this an appropriate use for MonetDB? One schema I tried was to put
all symbols into the same table, and to separate the tables by month.
So for 5 years that would be 5*12=60 tables. A query along the lines
of "SELECT * FROM "200801" WHERE symbol='IBM' UNION ALL SELECT * FROM
"200802" WHERE symbol='IBM' UNION ALL ...)" The reads for a single
table were quite good, however, when reading across multiple tables I
encountered a lot of swap space and disk IO which drastically slowed
down the query.
I also tried one table per symbol but found this to be simply
untenable as INSERTS became extremely slow (on the order of minutes)
and once again, massive swap space usage.
How would you partition this data? We will be inserting once per day
in bulk, and I am not super concerned with the insert performance.
However, query performance is paramount, with some users querying data
very frequently for long time periods for single or groups of symbols.
The machine I am testing on has 16GB of RAM and is running Windows.
However, the machine MonetDB would ultimately be deployed on would
have much more RAM. Even so, the size of the results of a single query
are less than 16GB and should fit into memory without issue.
So please help me, MonetDB gurus, with any suggestions as to how you
would partition and store this data. I would appreciate any input.
Thank you!