On Dec 5, 2007 12:07 AM, Niels Nes wrote:
On Tue, Dec 04, 2007 at 12:56:27PM -0700, m h wrote:
A few more caveats. (Yes, this is a poor benchmark, but after reading PDFs saying that MonetDB was 7 times faster than a 64-way Oracle box, I thought I'd get quite a bit of speedup.) The old DB is running under VMware on a Core 2 1.86 GHz box. MonetDB is running on my laptop, a Pentium M 1.86 GHz machine.
CPU-wise this alone would give you almost on-par results, as a Core 2 is quite a bit faster than a Pentium M. It would also be important to know the memory size of your laptop and the memory requirements of your query, and what kinds of columns (which column types) are involved.
Like I said, it's not apples to apples. Tomorrow I'll deploy MonetDB on the same machine running the VM'd DB. (Not under VMware though, so again not exactly the same, but the hardware should now be faster than the old DB's, since there is no virtualization.) My laptop has 1.5 GB, and MonetDB was using between 200 MB and over 1 GB. (It eats memory while loading the data. I've got 67 million rows to load, and after 4 million I was swapping like crazy, but I'll save that for another email/ticket.) After a restart, memory drops back to 200 MB and I can query without trouble. The machine running the old DB (under the VM) only has 400 MB allocated to it (and also holds 67 million rows)! You know what, I'm willing to give you anonymized schemas and queries. I don't know what to do with the 67 million rows in the fact table, though...
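(Side question while I'm at it: is the recommended bulk path a COPY INTO with the record count given up front, so MonetDB can preallocate instead of growing as it goes? A sketch of what I mean, with made-up table and file names:)

    -- Made-up names; the point is the "67000000 RECORDS" hint,
    -- which should let MonetDB size the columns up front.
    COPY 67000000 RECORDS INTO fact
    FROM '/data/fact.csv'
    USING DELIMITERS ',', '\n';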
But let's try to get to the problem here.
The first thing to make sure of is that there is no overhead in the Python code, i.e., whether the time is spent on the MonetDB/SQL server side or on the client side.
Good point. Yes, I've been wondering whether the Python driver for the other DB might simply be faster than MonetDB's Python driver. One thing to check is profiling the Python side; see the timing sketch below.
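A minimal sketch of what I mean, assuming a DB-API 2.0 style driver (the connection object and the query string are placeholders):

    import time

    def time_query(conn, sql):
        # conn comes from whichever DB-API 2.0 driver is in use;
        # split the wall-clock time into execute vs. fetch.
        t0 = time.time()
        cur = conn.cursor()
        cur.execute(sql)       # protocol round trip + server-side execution
        t1 = time.time()
        rows = cur.fetchall()  # client-side transfer and Python conversion
        t2 = time.time()
        cur.close()
        print("execute: %.3fs  fetch: %.3fs  rows: %d"
              % (t1 - t0, t2 - t1, len(rows)))
        return rows

If execute dominates, the time goes to the server (or the wire); if fetchall dominates, it's the client library doing the damage.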
Could you therefore 1) give the number of rows returned (doesn't have to be exact)
The current query returns 90 rows.
2) try with mclient -t
I tried that before blowing away the data and got something like 3 seconds (a guess), but I also had 90 rows of output scrolling by... Is there a way to disable the output, so you only measure the impact of the query and not the displaying of the results? I'll have to try it again tomorrow; I re-installed the latest build to test the ORDER BY and blew away my data.
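(Maybe just redirecting stdout would do it, assuming the -t timing report goes to stderr rather than stdout; untested:)

    mclient -lsql -t < query.sql > /dev/null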
Since the query is confidential I can't really post it here, but I can describe it (there's an anonymized skeleton below). I'm pulling data out of my fact table and two dimension tables. The SELECT clause sums a few columns, and in one case sums the product of one column and the sum of a few others. The COALESCE function is used to convert NULLs to 0. The WHERE clause joins the tables using the = operator and then has two IN clauses.
You may have hit a problem with our IN clause handling. Are the sets you're doing the IN against large?
Then there is a GROUP BY (since I'm aggregating in the SELECT) and an ORDER BY. Are there any (foreign/primary/unique) keys involved?
Primary and unique yes, no foreign.
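To give you the shape without the real names, here is an anonymized skeleton of the query; every identifier is made up:

    SELECT d1.label,
           d2.label,
           SUM(COALESCE(f.qty, 0))                        AS total_qty,
           SUM(COALESCE(f.price, 0) *
               (COALESCE(f.a, 0) + COALESCE(f.b, 0)))     AS weighted_total
    FROM   fact f, dim1 d1, dim2 d2
    WHERE  f.dim1_id = d1.id
      AND  f.dim2_id = d2.id
      AND  d1.label IN ('x', 'y', 'z')
      AND  d2.code  IN (1, 2, 3)
    GROUP BY d1.label, d2.label
    ORDER BY d1.label, d2.label;

The real one has more columns and different predicates, but structurally that's it.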
We would like to see your query/data. If it's all too confidential, maybe you could anonymize it.
I'm willing to work with you guys. I'd like to come up with something, but the 67 million rows of data are around 3 GB compressed... Perhaps I can find a way to get you some data.
Any hints on how to increase performance? Do I need to tune my query or tweak other things? (Yes, I know the laptop is not apples to apples with the VMware machine.)
What you could do is run your query with 'trace' in front and see which lower-level MonetDB statements take a lot of time. Maybe we can fix the performance problem that way, without you disclosing any extra data.
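For example (with a made-up query standing in for the real one):

    TRACE SELECT d1.label, SUM(f.qty)
    FROM fact f, dim1 d1
    WHERE f.dim1_id = d1.id
    GROUP BY d1.label;

The trace lists every low-level instruction with its execution time, so the expensive step should stand out.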
Yeah, I ran with TRACE and tons of data scrolled by. I need to learn what it actually means (it overran my xterm buffer...). The intermediate language looks like Perl to me... (sorry, Perl people ;)) -matt