Hi Brandon,
First, I'd like to say I really appreciate your assistance; I'm
grateful for your help in figuring this out. On paper, MonetDB seems
like it could be very good for our purposes, and it's been frustrating
that it hasn't worked out that way. I do understand that it is
auto-tuning, and I was looking forward to not having to wrangle all
sorts of different settings just to get the database working right.
I agree with what you say about the idiosyncrasies of the Windows
version. There really isn't much logging to the console window either,
to be honest (other than exceptions/errors and the output of the
startup/initialization scripts). There is a sql_logs directory in the
dbfarm folder, but the two files inside contain practically nothing.
When running a "SELECT Value1 FROM table WHERE key='key'" type of
statement, the query time does decrease compared with selecting all
the columns, but it is still on the order of 15-20 seconds, which is
not acceptable. I should also note that the data is loaded on disk in
Timestamp order; while there may be duplicate timestamps next to each
other, nothing should be out of order there. I can see how using a
CHAR() column as the predicate would slow things down, and I will
certainly try an integer key, but I don't think that accounts for most
of the poor performance. I would consider loading the data ordered by
symbol (or by the integer security id), but in that case I would
expect write performance to suffer badly, since each daily update
would be scattered across many different sections of the disk. As for
the number of rows per symbol: for minute data (what I am testing
with), it is on the order of 100-390 observations per day, for a total
of anywhere from 200k to 700k rows over ~7 years of data.
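For reference, the kind of change I have in mind is something like
this (the securities and bars tables and the column names here are
placeholders, not my actual schema):

    -- Map each symbol string to a small integer id once:
    CREATE TABLE securities (security_id INTEGER, symbol VARCHAR(16));

    -- Store security_id in the data table and filter on the integer
    -- instead of on a CHAR() column:
    SELECT "Timestamp", Value1
    FROM bars
    WHERE security_id = 1234;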
When MonetDB starts up, it has ~13GB available to it, and as more
queries are run, its physical memory footprint grows until it runs
out. It never seems to release any of this memory until the MServer5
process is terminated.
I hope that information helps and I do appreciate the suggestions as I
dig further.
On Tue, Apr 1, 2014 at 9:07 PM, Brandon Jackson wrote:
We run MonetDB on Linux in a VM where resources are dedicated to it and it runs only MonetDB. I've watched this forum for a while and was only recently made aware of some of the differences in the Windows version. Logging, for instance, seems to show up in the console window as opposed to being written to a file inside the database farm directory, which I found shocking, really. MonetDB is a knobs-free database, meaning that you do not have to index or partition, but you do have to keep in mind that certain data types are stored differently and have advantages. Integers seem to fare better than variable-length character strings.
50,000 securities is one thing, but you mention per-minute data for most of them. Roughly how many rows per security would you guess? If I remember correctly, MonetDB keeps the on-disk addresses of all the data items in RAM, not the actual data elements, which would be far bigger; on top of that, query results are cached. Perhaps your bulk-loaded rows are physically scattered all over a slow disk, even though finding their addresses in memory was very quick. By bringing all columns back from your table, you're basically doing a "select * from blah where key = ..." kind of thing, which yields little I/O savings. Try bringing back one column, or an aggregate grouped by something, and see if you get large differences in performance. It may also be insightful to know the raw on-disk size of your database and do some pencil math against the 16GB of memory in the machine you are working on. What's available to MonetDB when you start it? 11GB, maybe?
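Concretely, I mean comparisons along these lines (I'm writing the table as bars since "table" is a reserved word, and reusing your column names):

    -- Narrow projection: only one column has to come off disk.
    SELECT Value1 FROM bars WHERE key = 'key';

    -- Aggregate instead of shipping all the raw rows back.
    SELECT key, COUNT(*) AS n, SUM(Value1) AS total
    FROM bars
    GROUP BY key;

    -- MonetDB will also show you the plan or per-operator timings
    -- if you prefix the query with EXPLAIN or TRACE.
    TRACE SELECT Value1 FROM bars WHERE key = 'key';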
I suspect a high disk-size-to-RAM ratio combined with unsorted data being bulk loaded, which leads to extra work at some level for the machine. I'd have to search the forum history, but it seems like presorting can help a lot if MonetDB can detect that property of a column.
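If you want to test the presorting idea without redoing the whole load, something like this should do it (bars_sorted is just a scratch name):

    -- Rebuild a copy of the table physically ordered by the filter column.
    CREATE TABLE bars_sorted AS
        SELECT * FROM bars ORDER BY key, "Timestamp"
        WITH DATA;

    -- Then run the same query against the sorted copy and compare.
    SELECT Value1 FROM bars_sorted WHERE key = 'key';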
On Apr 1, 2014, at 7:38 PM, Dino Veritas wrote:
I have been using bulk loads, and that has been great (although performance seems to decrease as more data is in the database). Update speed is not the problem; it's simple queries on the data that are my main problem. I first thought MonetDB might need some time to build an index, but after waiting a while and running more queries, then waiting again and running more, it never seems to improve.
Example: with a table structured like (Key, Timestamp, Value1, Value2, Value3, Value4), a simple query like "SELECT Timestamp,Value1,Value2,Value3,Value4 FROM table WHERE Key='key'" takes 90 seconds! I'm on a Windows box with 16GB of RAM, and the system process is writing to disk heavily while MServer5 is reading from disk, which tells me the slowdown is likely disk I/O as data is moved into the swap file. I thought MonetDB was able to take advantage of memory mapping to avoid exactly this kind of overhead. Any thoughts on how I can achieve better performance?
On Tue, Apr 1, 2014 at 3:48 PM, Brandon Jackson wrote:
We use a single table and bulk load updates to it in scheduled batches. I never use INSERTs in column stores like this.
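To be concrete, by bulk loading I mean MonetDB's COPY INTO; a daily batch looks something like this (the path, record count, and table name are just examples):

    -- Giving the record count up front lets MonetDB preallocate the columns.
    COPY 500000 RECORDS INTO bars
    FROM '/data/bars_20140401.csv'
    USING DELIMITERS ',', '\n', '"';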
On Apr 1, 2014, at 7:25 AM, Dino Veritas wrote:
Hi,
My firm is investigating the use of MonetDB as a potential market data store. I am interested in the opinion of those on this list as to how best to structure the data for optimum performance.
Basically, our data consists of ~50,000 instruments. One use we are looking at is the storage of bar data, one bar for each minute. The resulting data for each bar looks like (Symbol, Timestamp, Value1, Value2, Value3, Value4), with one observation per minute. While not every symbol will have an entry for each minute, many observations will share the same Timestamp. Is this an appropriate use for MonetDB? One schema I tried was to put all symbols into the same table and to separate the tables by month, so for 5 years that would be 5*12 = 60 tables. A query then looks along the lines of "SELECT * FROM "200801" WHERE symbol='IBM' UNION ALL SELECT * FROM "200802" WHERE symbol='IBM' UNION ALL ...". Reads from a single table were quite good; however, when reading across multiple tables I encountered a lot of swap-space usage and disk IO, which drastically slowed down the query.
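In sketch form, that monthly layout looked like this (column types here are abbreviated and illustrative):

    -- One table per month, all symbols together:
    CREATE TABLE "200801" (
        symbol      VARCHAR(16),
        "Timestamp" TIMESTAMP,
        Value1      DOUBLE,
        Value2      DOUBLE,
        Value3      DOUBLE,
        Value4      DOUBLE
    );

    -- Reads that span months then have to be stitched together:
    SELECT * FROM "200801" WHERE symbol = 'IBM'
    UNION ALL
    SELECT * FROM "200802" WHERE symbol = 'IBM';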
I also tried one table per symbol but found it simply untenable: INSERTs became extremely slow (on the order of minutes) and, once again, swap usage was massive.
How would you partition this data? We will be inserting once per day in bulk, and I am not overly concerned with insert performance. Query performance, however, is paramount: some users query very frequently, over long time periods, for single symbols or groups of symbols.
The machine I am testing on has 16GB of RAM and runs Windows; the machine MonetDB would ultimately be deployed on would have much more RAM. Even so, the result of any single query is less than 16GB and should fit into memory without issue.
So please help me, MonetDB gurus, with any suggestions as to how you would partition and store this data. I would appreciate any input. Thank you!
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list