On Tue, Dec 04, 2007 at 12:56:27PM -0700, m h wrote:
> Hello all-
> First of all, thanks to you guys for your great help these past few weeks. I really appreciate it. I've learned a bit, and hopefully your product is improving from my bug reports.
> I've managed to pull a bit of my data from my proprietary db into MonetDB. (Have some 60 million rows in the fact table, pulled in 5 million!) I even ran a query against it and got the same numbers (though in a different order)!
> So I did a very basic poor man's benchmark. My Python code can hit the old db or MonetDB by changing one line. So I took timestamps around the query execution portion.
> The old db takes 0.21 seconds when cold. Subsequent queries take 0.027-0.072 seconds.
> MonetDB takes 0.67 seconds when cold and 0.12-0.23 otherwise.
> It appears that the old db is faster when cold than MonetDB is after being primed.
> A few more caveats. (Yes, this is a poor benchmark, but after reading PDFs saying that MonetDB was 7 times faster than a 64-way Oracle box, I thought I'd get quite a bit of speedup.) The old db is running under VMware on a Core2 1.86 GHz box. MonetDB is running on my laptop, a Pentium M 1.86 GHz machine.

CPU-wise this would already put the results almost on par, as a Core2 is quite a bit faster than a Pentium M. It would also be important to know the memory size of your laptop and the memory requirements of your query, and what kinds of columns (column types) are involved.
But let's try to get to the problem here. The first thing is to make sure there is no overhead in the Python code, i.e. whether the time is spent on the MonetDB/SQL server side or on the client side. Could you therefore 1) give the number of rows returned (doesn't have to be exact) and 2) try the query with mclient -t?
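One way to split client-side from server-side time in the Python code is to take a timestamp between the execute and the fetch, and print the row count as well. A minimal sketch, with sqlite3 standing in for the real connection (all table names made up); note that some drivers fetch rows lazily, so the split is only approximate:

```python
import sqlite3
import time

# sqlite3 keeps this sketch self-contained; the real code would connect
# to the old db or to MonetDB by changing this one line.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("CREATE TABLE t (x INT); INSERT INTO t VALUES (1), (2), (3);")

t0 = time.time()
cur.execute("SELECT x FROM t")  # roughly: server-side query execution
t1 = time.time()
rows = cur.fetchall()           # roughly: client-side transfer and conversion
t2 = time.time()

print("rows returned:", len(rows))
print("execute: %.6fs  fetch: %.6fs" % (t1 - t0, t2 - t1))
```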
> Since the query is confidential I can't really put it here, but I can describe it. I'm pulling stuff out of my fact table and two other dimension tables. The SELECT clause sums a few columns, and one sums the product of one column and the sum of a few others. The COALESCE function is used to convert NULLs to 0. The WHERE clause joins the tables using the = operator and then has two IN clauses.
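An anonymized query of that shape could look something like the following; the schema, names, and values are all hypothetical stand-ins for the confidential ones, run through sqlite3 just to keep the sketch self-contained:

```python
import sqlite3

# Hypothetical anonymized schema: one fact table, two dimension tables.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE fact (d1_id INT, d2_id INT, qty INT, price REAL, discount REAL);
CREATE TABLE dim1 (id INT, label TEXT);
CREATE TABLE dim2 (id INT, region TEXT);
INSERT INTO fact VALUES (1, 1, 10, 2.5, NULL), (1, 2, 5, 4.0, 0.5), (2, 1, 7, 3.0, NULL);
INSERT INTO dim1 VALUES (1, 'a'), (2, 'b');
INSERT INTO dim2 VALUES (1, 'north'), (2, 'south');
""")

# The shape described above: summed measures, one SUM over a product,
# COALESCE to turn NULLs into 0, equi-joins in the WHERE clause,
# two IN clauses, then GROUP BY and ORDER BY.
query = """
SELECT d1.label,
       SUM(f.qty) AS total_qty,
       SUM(f.price * (f.qty + COALESCE(f.discount, 0))) AS weighted
FROM fact f, dim1 d1, dim2 d2
WHERE f.d1_id = d1.id
  AND f.d2_id = d2.id
  AND d1.label IN ('a', 'b')
  AND d2.region IN ('north', 'south')
GROUP BY d1.label
ORDER BY total_qty DESC
"""
rows = cur.execute(query).fetchall()
print(rows)
```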
You may have hit a problem with our IN-clause handling. Are the sets you're doing the IN against large?
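If the IN sets are large, one generic workaround (not MonetDB-specific, and not necessarily needed here) is to load the values into a temporary table and join against it instead of using a literal IN list. A sketch, again with sqlite3 and made-up names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript(
    "CREATE TABLE fact (k INT, v INT);"
    "INSERT INTO fact VALUES (1, 10), (2, 20), (3, 30), (4, 40);"
)

wanted = [2, 4]  # imagine thousands of values instead of a short literal list

# Instead of: SELECT ... WHERE k IN (2, 4, ... thousands more ...)
cur.execute("CREATE TEMP TABLE keys (k INT)")
cur.executemany("INSERT INTO keys VALUES (?)", [(k,) for k in wanted])
rows = cur.execute(
    "SELECT f.k, f.v FROM fact f, keys w WHERE f.k = w.k ORDER BY f.k"
).fetchall()
print(rows)
```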
> Then there is a GROUP BY (since I'm aggregating in the SELECT) and an ORDER BY.

Are there any (foreign/primary/unique) keys involved?
We would like to see your query/data. If it's all too confidential, maybe you could anonymize it.
> Any hints on how to increase performance? Do I need to tune my query or tweak other things? (Yes, I know the laptop machine is not apples to apples with the VMware machine.)
What you could do is run your query with 'trace' in front and see which lower-level MonetDB statements take a lot of time. Maybe we can fix the performance problem without you having to disclose extra data.

Niels
> Am open to suggestions.
> thanks much,
> -matt
_______________________________________________
MonetDB-users mailing list
MonetDB-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/monetdb-users
--
Niels Nes, Centre for Mathematics and Computer Science (CWI)
Kruislaan 413, 1098 SJ Amsterdam, The Netherlands
room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312
url: http://www.cwi.nl/~niels  e-mail: Niels.Nes@cwi.nl