Hi Brandon,
First, I'd like to say I really appreciate your assistance; I'm
grateful for your help in figuring this out. On paper, MonetDB seems
like it could be very good for our purposes, and it's been frustrating
that it hasn't worked out that way. I do understand that it is
auto-tuning, and I was looking forward to not having to wrangle all
sorts of different settings just to get the database working right.
I agree with what you say about the idiosyncrasies of the Windows
version. There really isn't much logging to the console window either,
to be honest (other than exceptions/errors and the output of the
startup/initialization scripts). There is a sql_logs directory in the
dbfarm folder, but the two files inside contain practically nothing.
When running a "SELECT Value1 FROM table WHERE key='key'" type of
statement, the query time does decrease compared with selecting all
the columns, but it is still on the order of 15-20 seconds, which is
not acceptable. I should also note that the data is loaded on disk in
Timestamp order; while there may be duplicate timestamps next to each
other, nothing should be out of order there. I can see how using a
CHAR() column as the predicate would slow things down, and I will
certainly try an integer key, but I don't think that accounts for most
of the poor performance. I would consider loading the data ordered by
symbol (or by the integer security id), but in that case I would
expect write performance to suffer badly, since each daily update
would be scattered across many different sections of the disk. As for
the number of rows per symbol: for minute data (what I am testing
with), it is on the order of 100-390 observations per day, for a total
of anywhere from 200k to 700k rows over ~7 years of data.
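For reference, the kind of change I have in mind is something like
this (the securities and bars tables and the column names here are
placeholders, not my actual schema):

    -- Map each symbol string to a small integer id once:
    CREATE TABLE securities (security_id INTEGER, symbol VARCHAR(16));

    -- Store security_id in the data table and filter on the integer
    -- instead of on a CHAR() column:
    SELECT "Timestamp", Value1
    FROM bars
    WHERE security_id = 1234;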
When MonetDB starts up, it has ~13GB available to it, and as more
queries are run, its physical memory footprint grows until it runs
out. It never seems to release any of this memory until the MServer5
process is terminated.
I hope that information helps and I do appreciate the suggestions as I
dig further.
On Tue, Apr 1, 2014 at 9:07 PM, Brandon Jackson wrote:
We run MonetDB on Linux in a VM where resources are dedicated to it and it runs only MonetDB. I've watched this forum for a while and was only recently made aware of some of the differences in the Windows version. Logging, for instance, seems to show up in the console window as opposed to being written to a file inside the database farm directory, which I found shocking, really. MonetDB is a knobs-free database, meaning that you do not have to index or partition, but you do have to keep in mind that certain data types are stored differently and have advantages. Integers seem to fare better than variable-length character strings.
50,000 securities is one thing, but you mention per-minute data for most of them. Roughly how many rows per security would you guess? If I remember correctly, MonetDB keeps the on-disk addresses of all the data items in RAM, not the actual data elements, which would be far bigger; on top of that, query results are cached. Perhaps your bulk-loaded rows are physically scattered all over a slow disk, even though finding their addresses in memory was very quick. By bringing all columns back from your table, you're basically doing a "select * from blah where key = ..." kind of thing, which yields little I/O savings. Try bringing back one column, or an aggregate grouped by something, and see if you get large differences in performance. It may also be insightful to know the raw on-disk size of your database and do some pencil math against the 16GB of memory in the machine you are working on. What's available to MonetDB when you start it? 11GB, maybe?
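Concretely, I mean comparisons along these lines (I'm writing the table as bars since "table" is a reserved word, and reusing your column names):

    -- Narrow projection: only one column has to come off disk.
    SELECT Value1 FROM bars WHERE key = 'key';

    -- Aggregate instead of shipping all the raw rows back.
    SELECT key, COUNT(*) AS n, SUM(Value1) AS total
    FROM bars
    GROUP BY key;

    -- MonetDB will also show you the plan or per-operator timings
    -- if you prefix the query with EXPLAIN or TRACE.
    TRACE SELECT Value1 FROM bars WHERE key = 'key';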
I suspect a high disk-size-to-RAM ratio combined with unsorted data being bulk loaded, which leads to extra work at some level for the machine. I'd have to search the forum history, but it seems like presorting can help a lot if MonetDB can detect that property of a column.
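If you want to test the presorting idea without redoing the whole load, something like this should do it (bars_sorted is just a scratch name):

    -- Rebuild a copy of the table physically ordered by the filter column.
    CREATE TABLE bars_sorted AS
        SELECT * FROM bars ORDER BY key, "Timestamp"
        WITH DATA;

    -- Then run the same query against the sorted copy and compare.
    SELECT Value1 FROM bars_sorted WHERE key = 'key';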
On Apr 1, 2014, at 7:38 PM, Dino Veritas wrote:
I have been using bulk loads, and that has been great (although performance seems to decrease as more data is in the database). Update speed is not the problem; it's simple queries on the data that are my main problem. I first thought MonetDB might need some time to build an index, but after waiting a while and running more queries, then waiting again and running more, it never seems to improve.
Example: with a table structured like (Key, Timestamp, Value1, Value2, Value3, Value4), a simple query like "SELECT Timestamp,Value1,Value2,Value3,Value4 FROM table WHERE Key='key'" takes 90 seconds! I'm on a Windows box with 16GB of RAM, and the system process is writing to disk heavily while MServer5 is reading from disk, which tells me the slowdown is likely disk I/O as data is moved into the swap file. I thought MonetDB was able to take advantage of memory mapping to avoid exactly this kind of overhead. Any thoughts on how I can achieve better performance?
On Tue, Apr 1, 2014 at 3:48 PM, Brandon Jackson wrote:
We use a single table and bulk load updates to it in scheduled batches. I never use INSERTs in column stores like this.
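To be concrete, by bulk loading I mean MonetDB's COPY INTO; a daily batch looks something like this (the path, record count, and table name are just examples):

    -- Giving the record count up front lets MonetDB preallocate the columns.
    COPY 500000 RECORDS INTO bars
    FROM '/data/bars_20140401.csv'
    USING DELIMITERS ',', '\n', '"';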
On Apr 1, 2014, at 7:25 AM, Dino Veritas wrote:
Hi,
My firm is investigating the use of MonetDB as a potential market data store. I am interested in the opinion of those on this list as to how best to structure the data for optimum performance.
Basically, our data consists of ~50,000 instruments. One use we are looking at is the storage of bar data, one bar for each minute. The resulting data for each bar looks like (Symbol, Timestamp, Value1, Value2, Value3, Value4), with one observation per minute. While not every symbol will have an entry for each minute, many observations will share the same Timestamp. Is this an appropriate use for MonetDB? One schema I tried was to put all symbols into the same table and to separate the tables by month, so for 5 years that would be 5*12 = 60 tables. A query then looks along the lines of "SELECT * FROM "200801" WHERE symbol='IBM' UNION ALL SELECT * FROM "200802" WHERE symbol='IBM' UNION ALL ...". Reads from a single table were quite good; however, when reading across multiple tables I encountered a lot of swap-space usage and disk IO, which drastically slowed down the query.
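In sketch form, that monthly layout looked like this (column types here are abbreviated and illustrative):

    -- One table per month, all symbols together:
    CREATE TABLE "200801" (
        symbol      VARCHAR(16),
        "Timestamp" TIMESTAMP,
        Value1      DOUBLE,
        Value2      DOUBLE,
        Value3      DOUBLE,
        Value4      DOUBLE
    );

    -- Reads that span months then have to be stitched together:
    SELECT * FROM "200801" WHERE symbol = 'IBM'
    UNION ALL
    SELECT * FROM "200802" WHERE symbol = 'IBM';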
I also tried one table per symbol but found it simply untenable: INSERTs became extremely slow (on the order of minutes) and, once again, swap usage was massive.
How would you partition this data? We will be inserting once per day in bulk, and I am not overly concerned with insert performance. Query performance, however, is paramount: some users query very frequently, over long time periods, for single symbols or groups of symbols.
The machine I am testing on has 16GB of RAM and runs Windows; the machine MonetDB would ultimately be deployed on would have much more RAM. Even so, the result of any single query is less than 16GB and should fit into memory without issue.
So please help me, MonetDB gurus, with any suggestions as to how you would partition and store this data. I would appreciate any input. Thank you!
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list