On Tue, Dec 04, 2007 at 12:56:27PM -0700, m h wrote:
> Hello all-
> First of all, thanks to you guys for your great help these past few weeks. I really appreciate it. I've learned a bit, and hopefully your product is improving from my bug reports.
> I've managed to pull a bit of my data from my proprietary db into MonetDB. (Have some 60 million rows in the fact table, pulled in 5 million!) I even ran a query against it and got the same numbers (though in a different order)!
> So I did a very basic poor man's benchmark. My Python code can hit the old db or MonetDB by changing one line. So I took timestamps around the query execution portion.
> The old db takes 0.21 seconds when cold. Subsequent queries take 0.027-0.072 seconds.
> MonetDB takes 0.67 seconds when cold and 0.12-0.23 otherwise.
> It appears that the old db is faster when cold than MonetDB is after being primed.
> A few more caveats. (Yes, this is a poor benchmark, but after reading PDFs saying that MonetDB was 7 times faster than a 64-way Oracle box, I thought I'd get quite a bit of speedup.) The old db is running under VMware on a Core2 1.86 GHz box. MonetDB is running on my laptop, a Pentium M 1.86 GHz machine.

CPU-wise this would already put the results almost on par, as a Core2 is quite a bit faster than a Pentium M. It would also be important to know the memory size of your laptop and the memory requirements of your query, and what kinds of columns (column types) are involved.
But let's try to get to the problem here. The first thing is to make sure there is no overhead in the Python code, i.e. whether the time is spent on the MonetDB/SQL server side or on the client side. Could you therefore 1) give the number of rows returned (doesn't have to be exact) and 2) try the query with mclient -t?
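One way to split client-side from server-side time in the Python code is to take a timestamp between the execute and the fetch, and print the row count as well. A minimal sketch, with sqlite3 standing in for the real connection (all table names made up); note that some drivers fetch rows lazily, so the split is only approximate:

```python
import sqlite3
import time

# sqlite3 keeps this sketch self-contained; the real code would connect
# to the old db or to MonetDB by changing this one line.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("CREATE TABLE t (x INT); INSERT INTO t VALUES (1), (2), (3);")

t0 = time.time()
cur.execute("SELECT x FROM t")  # roughly: server-side query execution
t1 = time.time()
rows = cur.fetchall()           # roughly: client-side transfer and conversion
t2 = time.time()

print("rows returned:", len(rows))
print("execute: %.6fs  fetch: %.6fs" % (t1 - t0, t2 - t1))
```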
> Since the query is confidential I can't really put it here, but I can describe it. I'm pulling stuff out of my fact table and two other dimension tables. The SELECT clause sums a few columns, and one sums the product of one column and the sum of a few others. The COALESCE function is used to convert NULLs to 0. The WHERE clause joins the tables using the = operator and then has two IN clauses.
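An anonymized query of that shape could look something like the following; the schema, names, and values are all hypothetical stand-ins for the confidential ones, run through sqlite3 just to keep the sketch self-contained:

```python
import sqlite3

# Hypothetical anonymized schema: one fact table, two dimension tables.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE fact (d1_id INT, d2_id INT, qty INT, price REAL, discount REAL);
CREATE TABLE dim1 (id INT, label TEXT);
CREATE TABLE dim2 (id INT, region TEXT);
INSERT INTO fact VALUES (1, 1, 10, 2.5, NULL), (1, 2, 5, 4.0, 0.5), (2, 1, 7, 3.0, NULL);
INSERT INTO dim1 VALUES (1, 'a'), (2, 'b');
INSERT INTO dim2 VALUES (1, 'north'), (2, 'south');
""")

# The shape described above: summed measures, one SUM over a product,
# COALESCE to turn NULLs into 0, equi-joins in the WHERE clause,
# two IN clauses, then GROUP BY and ORDER BY.
query = """
SELECT d1.label,
       SUM(f.qty) AS total_qty,
       SUM(f.price * (f.qty + COALESCE(f.discount, 0))) AS weighted
FROM fact f, dim1 d1, dim2 d2
WHERE f.d1_id = d1.id
  AND f.d2_id = d2.id
  AND d1.label IN ('a', 'b')
  AND d2.region IN ('north', 'south')
GROUP BY d1.label
ORDER BY total_qty DESC
"""
rows = cur.execute(query).fetchall()
print(rows)
```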
You may have hit a problem with our IN-clause handling. Are the sets you're doing the IN against large?
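If the IN sets are large, one generic workaround (not MonetDB-specific, and not necessarily needed here) is to load the values into a temporary table and join against it instead of using a literal IN list. A sketch, again with sqlite3 and made-up names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript(
    "CREATE TABLE fact (k INT, v INT);"
    "INSERT INTO fact VALUES (1, 10), (2, 20), (3, 30), (4, 40);"
)

wanted = [2, 4]  # imagine thousands of values instead of a short literal list

# Instead of: SELECT ... WHERE k IN (2, 4, ... thousands more ...)
cur.execute("CREATE TEMP TABLE keys (k INT)")
cur.executemany("INSERT INTO keys VALUES (?)", [(k,) for k in wanted])
rows = cur.execute(
    "SELECT f.k, f.v FROM fact f, keys w WHERE f.k = w.k ORDER BY f.k"
).fetchall()
print(rows)
```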
> Then there is a GROUP BY (since I'm aggregating in the SELECT) and an ORDER BY.

Are there any (foreign/primary/unique) keys involved?
We would like to see your query/data. If it's all too confidential, maybe you could anonymize it.
> Any hints on how to increase performance? Do I need to tune my query or tweak other things? (Yes, I know the laptop machine is not apples to apples with the VMware machine.)
What you could do is run your query with 'trace' in front and see which lower-level MonetDB statements take a lot of time. Maybe we can fix the performance problem without you having to disclose extra data.

Niels
> Am open to suggestions.
> thanks much,
> -matt
_______________________________________________
MonetDB-users mailing list
MonetDB-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/monetdb-users
--
Niels Nes, Centre for Mathematics and Computer Science (CWI)
Kruislaan 413, 1098 SJ Amsterdam, The Netherlands
room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312
url: http://www.cwi.nl/~niels  e-mail: Niels.Nes@cwi.nl