On Dec 5, 2007 12:07 AM, Niels Nes wrote:
On Tue, Dec 04, 2007 at 12:56:27PM -0700, m h wrote:
A few more caveats. (Yes, this is a poor benchmark, but after reading PDFs saying that MonetDB was 7 times faster than a 64-way Oracle box, I thought I'd get quite a bit of speedup.) The old DB is running under VMware on a Core 2 1.86 GHz box. MonetDB is running on my laptop, a Pentium M 1.86 GHz machine.
CPU-wise this alone would give you almost on-par results, as a Core 2 is quite a bit faster than a Pentium M. It would also be important to know the memory size of your laptop and the memory requirements of your query, and what kinds of columns (which column types) are involved.
Like I said, it's not apples to apples. Tomorrow I'll deploy MonetDB on the same machine running the VM'd DB. (Not under VMware though, so again not exactly the same, but the hardware should now be faster than the old DB's, since there is no virtualization.) My laptop has 1.5 GB, and MonetDB was using between 200 MB and over 1 GB. (It eats memory while loading the data. I've got 67 million rows to load, and after 4 million I was swapping like crazy, but I'll save that for another email/ticket.) After a restart, memory drops back to 200 MB and I can query without trouble. The machine running the old DB (under the VM) only has 400 MB allocated to it (and also holds 67 million rows)! You know what, I'm willing to give you anonymized schemas and queries. I don't know what to do with the 67 million rows in the fact table, though...
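(Side question while I'm at it: is the recommended bulk path a COPY INTO with the record count given up front, so MonetDB can preallocate instead of growing as it goes? A sketch of what I mean, with made-up table and file names:)

    -- Made-up names; the point is the "67000000 RECORDS" hint,
    -- which should let MonetDB size the columns up front.
    COPY 67000000 RECORDS INTO fact
    FROM '/data/fact.csv'
    USING DELIMITERS ',', '\n';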
But let's try to get to the problem here.
The first thing to make sure of is that there is no overhead in the Python code, i.e., whether the time is spent on the MonetDB/SQL server side or on the client side.
Good point. Yes, I've been wondering whether the Python driver for the other DB might simply be faster than MonetDB's Python driver. One thing to check is profiling the Python side; see the timing sketch below.
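A minimal sketch of what I mean, assuming a DB-API 2.0 style driver (the connection object and the query string are placeholders):

    import time

    def time_query(conn, sql):
        # conn comes from whichever DB-API 2.0 driver is in use;
        # split the wall-clock time into execute vs. fetch.
        t0 = time.time()
        cur = conn.cursor()
        cur.execute(sql)       # protocol round trip + server-side execution
        t1 = time.time()
        rows = cur.fetchall()  # client-side transfer and Python conversion
        t2 = time.time()
        cur.close()
        print("execute: %.3fs  fetch: %.3fs  rows: %d"
              % (t1 - t0, t2 - t1, len(rows)))
        return rows

If execute dominates, the time goes to the server (or the wire); if fetchall dominates, it's the client library doing the damage.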
Could you therefore 1) give the number of rows returned (doesn't have to be exact)
The current query returns 90 rows.
2) try with mclient -t
I tried that before blowing away the data and got something like 3 seconds (a guess), but I also had 90 rows of output scrolling by... Is there a way to disable the output, so you only measure the impact of the query and not the displaying of the results? I'll have to try it again tomorrow; I re-installed the latest build to test the ORDER BY and blew away my data.
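(Maybe just redirecting stdout would do it, assuming the -t timing report goes to stderr rather than stdout; untested:)

    mclient -lsql -t < query.sql > /dev/null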
Since the query is confidential I can't really post it here, but I can describe it (there's an anonymized skeleton below). I'm pulling data out of my fact table and two dimension tables. The SELECT clause sums a few columns, and in one case sums the product of one column and the sum of a few others. The COALESCE function is used to convert NULLs to 0. The WHERE clause joins the tables using the = operator and then has two IN clauses.
You may have hit a problem with our IN clause handling. Are the sets you're doing the IN against large?
Then there is a GROUP BY (since I'm aggregating in the SELECT) and an ORDER BY. Are there any (foreign/primary/unique) keys involved?
Primary and unique yes, no foreign.
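To give you the shape without the real names, here is an anonymized skeleton of the query; every identifier is made up:

    SELECT d1.label,
           d2.label,
           SUM(COALESCE(f.qty, 0))                        AS total_qty,
           SUM(COALESCE(f.price, 0) *
               (COALESCE(f.a, 0) + COALESCE(f.b, 0)))     AS weighted_total
    FROM   fact f, dim1 d1, dim2 d2
    WHERE  f.dim1_id = d1.id
      AND  f.dim2_id = d2.id
      AND  d1.label IN ('x', 'y', 'z')
      AND  d2.code  IN (1, 2, 3)
    GROUP BY d1.label, d2.label
    ORDER BY d1.label, d2.label;

The real one has more columns and different predicates, but structurally that's it.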
We would like to see your query/data. If it's all too confidential, maybe you could anonymize it.
I'm willing to work with you guys. I'd like to come up with something, but the 67 million rows of data are around 3 GB compressed... Perhaps I can find a way to get you some data.
Any hints on how to increase performance? Do I need to tune my query or tweak other things? (Yes, I know the laptop is not apples to apples with the VMware machine.)
What you could do is run your query with 'trace' in front and see which lower-level MonetDB statements take a lot of time. Maybe we can fix the performance problem that way, without you disclosing any extra data.
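For example (with a made-up query standing in for the real one):

    TRACE SELECT d1.label, SUM(f.qty)
    FROM fact f, dim1 d1
    WHERE f.dim1_id = d1.id
    GROUP BY d1.label;

The trace lists every low-level instruction with its execution time, so the expensive step should stand out.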
Yeah, I ran with TRACE and tons of data scrolled by. I need to learn what it actually means (it overran my xterm buffer...). The intermediate language looks like Perl to me... (sorry, Perl people ;)) -matt