On Dec 5, 2007 12:40 AM, Niels Nes wrote:
On Wed, Dec 05, 2007 at 12:25:36AM -0700, m h wrote:
On Tue, Dec 04, 2007 at 12:56:27PM -0700, m h wrote:
On Dec 5, 2007 12:07 AM, Niels Nes wrote:
Like I said, it's not apples to apples. Tomorrow I'll deploy MonetDB on the same machine running the VM'd db. (But not under VMware, so again not exactly the same, but the hardware should now be faster than the old db hardware since there's no virtualization.) My laptop has 1.5 GB and MonetDB was using between 200 MB and 1+ GB. (It eats memory while loading the data. I've got 67 million rows to load, and after 4 million I was swapping like crazy, but I'll save that for another email/ticket.) After a restart, memory goes down to 200 MB again and I can query without stopping.
As I don't know your schema/data types I don't know if 1+ GB is much. MonetDB likes memory ;-).
Yes, it does ;)
The machine running the old db (under VM) only has 400 MB allocated to it (and it also holds 67 million rows)!
But your old db is block oriented, and your data is apparently organised in a way that keeps that from hurting much (any sorting or clustered indices?).
There are indexes on all the columns in the WHERE clause
You know what, I'm willing to give you anonymized schemas and queries.
That would help a lot.
I don't know what to do with the 67 million rows in the fact table though....
Well, uploading shouldn't be a problem, but de-personalizing your 67 M row fact table may be a problem.
Seems like the easiest solution is to change the dimension tables....
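Something like this is roughly what I have in mind for the dimension tables: a deterministic find-and-replace on the sensitive columns so the joins still line up. The file and column names below are made up, just to show the idea:

import csv, hashlib

def surrogate(value, prefix):
    # Same input always maps to the same token, so keys and joins still match
    # after anonymization, while the original text stays hidden.
    return prefix + hashlib.sha1(value.encode("utf-8")).hexdigest()[:10]

# Hypothetical dimension extract with columns: id, name, city (no header row)
with open("dim_customer.csv") as src, open("dim_customer_anon.csv", "w", newline="") as dst:
    writer = csv.writer(dst)
    for row_id, name, city in csv.reader(src):
        writer.writerow([row_id, surrogate(name, "name_"), surrogate(city, "city_")])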
But let's try to get to the problem here.
First thing is to make sure there is no overhead in the Python code, i.e. is it MonetDB/SQL server side or client side?
Good point. Yes, I've been wondering whether the Python code for the other db might be a bit faster than MonetDB's Python code. One thing to check out is profiling the Python side.
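Something along these lines should roughly separate the two (the split is approximate, since some result transfer can already happen during execute; I'm using the modern pymonetdb module here as a stand-in for whatever client is installed, and the connection details and query are placeholders):

import time
import pymonetdb  # stand-in for the MonetDB Python client module

conn = pymonetdb.connect(database="demo", hostname="localhost",
                         username="monetdb", password="monetdb")
cur = conn.cursor()

t0 = time.perf_counter()
cur.execute("SELECT 1")           # placeholder query; mostly server-side work
t1 = time.perf_counter()
rows = cur.fetchall()             # pulling the result set over the wire: client side
t2 = time.perf_counter()
text = "\n".join(map(str, rows))  # formatting for display: pure client side
t3 = time.perf_counter()

print("execute: %.3fs  fetch: %.3fs  format: %.3fs" % (t1 - t0, t2 - t1, t3 - t2))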
Could you therefore 1) give the number of rows returned (doesn't have to be exact)
Current query is returning 90 rows
2) try with mclient -t
I tried that before blowing away the data and got like 3 seconds
3 seconds is very high; -t gives times in milliseconds.
(guess) but I also have 90 rows of output scrolling... Is there a way to disable output, so you only measure the impact of the query and not displaying the results?
Simply redirecting the output to a file always helps.
Must have done that wrong. Will try again after sleep.
I'll have to try it again tomorrow, I re-installed the latest build to test the order by and blew away my data.
Since the query is confidential I can't really put it here, but I can describe it. I'm pulling stuff out of my fact table and two other dimension tables. The select clause sums a few columns, and in one case sums the product of one column and the sum of a few others. The COALESCE function is used to convert NULLs to 0. The WHERE clause joins the tables using the = operator and then has two "IN" clauses.
You may have hit a problem with our "IN" clause handling. Are the sets you're doing the IN against large?
Then there is a group by (since I'm aggregating in the select) and an order by.
Are there any (foreign/primary/unique) keys involved?
Primary and unique yes, no foreign.
No foreign keys makes the joins expensive of course. Are these single-column joins, or joins over multiple columns?
So an optimization would be to add foreign keys? I'll need to double check whether there are FKs after sleep.... Obviously most entries in the FACT table could be FKs.
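To make the shape clearer without giving anything away, the query looks roughly like this (written as the Python string I pass to the client; every table and column name below is invented):

# Made-up skeleton only: fact table joined to two dimension tables, SUMs with
# COALESCE, equality joins in the WHERE clause, two IN lists, GROUP BY, ORDER BY.
QUERY = """
SELECT d1.label,
       d2.label,
       SUM(COALESCE(f.a, 0))                               AS sum_a,
       SUM(COALESCE(f.b, 0))                               AS sum_b,
       SUM(f.qty * (COALESCE(f.a, 0) + COALESCE(f.b, 0)))  AS weighted
FROM   fact f, dim_one d1, dim_two d2
WHERE  f.dim_one_id = d1.id
  AND  f.dim_two_id = d2.id
  AND  d1.label IN ('x', 'y', 'z')
  AND  d2.label IN ('p', 'q')
GROUP BY d1.label, d2.label
ORDER BY d1.label, d2.label
"""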
We would like to see your query/data. If it's all too confidential, maybe you could anonymize it.
Am willing to work with you guys. I'd like to come up with something, but the 67 million rows of data are around 3 GB compressed.... Perhaps I can find a way to get you some data...
We have an ftp service at CWI. Anyway, we could/should discuss this off the list.
Yes, that sounds like the best way to handle it.
Any hints on how to increase performance? Do I need to tune my query or tweak other things? (Yes I know the laptop machine is not apples to apples to the vmware machine)
What you could do is run your query with 'trace' in front and see which lower-level MonetDB statement takes a lot of time. Maybe we can fix the performance problem without you having to disclose extra data.
Yeah, I ran it with TRACE and tons of data scrolled by. I need to learn what it actually means (it ran over my xterm buffer....). The intermediate language looks like Perl to me... (sorry, Perl people ;))
So now I'm offended; Perl is not really a language I appreciate. ;-) Same as before: redirect the output to a file and view it with your favourite editor.
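If it helps, something along these lines captures the whole trace in one go (a sketch only: the database name and SQL file are placeholders, and mclient's option handling can differ a bit between versions):

import subprocess

# Run a SQL file whose query is prefixed with TRACE through mclient and save
# everything it prints to a file for later study in an editor.
with open("trace_output.txt", "w") as out:
    subprocess.run(["mclient", "-d", "demo", "traced_query.sql"], stdout=out, check=True)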
Sorry. I don't like perl either. I'll get used to the other language ;) -matt