On Wed, Dec 05, 2007 at 12:25:36AM -0700, m h wrote:
> On Dec 5, 2007 12:07 AM, Niels Nes wrote:
>> On Tue, Dec 04, 2007 at 12:56:27PM -0700, m h wrote:
>>> A few more caveats. (Yes, this is a poor benchmark, but after reading PDFs saying that MonetDB was 7 times faster than a 64-way Oracle box, I thought I'd get quite a bit of speedup.) The old db is running under VMware on a Core2 1.86GHz box. MonetDB is running on my laptop, a Pentium M 1.86GHz machine.
>> CPU-wise this would already give you almost on-par results, as a Core2 is quite a bit faster than a Pentium M. An important thing would be to know the memory size of your laptop and the memory requirements of your query. What kinds of columns (i.e. what column types) are involved?
> Like I said, it's not apples to apples. Tomorrow I'll deploy MonetDB on the same machine running the VM'd db. (But not under VMware, so again not exactly the same; the hardware should now be faster than the old db's hardware since there is no virtualization.)
> My laptop has 1.5GB and MonetDB was using between 200MB and 1+GB. (It eats memory while loading the data. I've got 67 million rows to load and after 4 million I was swapping like crazy, but I'll save that for another email/ticket.) After a restart memory goes down to 200MB again and I can query without stopping.

As I don't know your schema/data types I don't know if 1+GB is much. MonetDB likes memory ;-).
> Machine running the old db (under VM) only has 400MB allocated to it (and also has 67 million rows)!

But your old db is block oriented, and your data is somehow organised so that that doesn't hurt too much (any sorting or clustered indices?).
> You know what, I'm willing to give you anonymous schemas and queries.
That would help a lot.
> I don't know what to do with the 67 million rows in the fact table though....

Well, uploading shouldn't be a problem, but de-personalizing your 67M-row fact table may be one.
>> But let's try to get to the problem here.
>> The first thing is to make sure there is no overhead in the Python code, i.e. is the time spent on the MonetDB/SQL server side or on the client side?
> Good point. Yes, I've been thinking that said Python code for the other db might be a bit faster than MonetDB's Python code. One thing to check out is profiling the Python side.
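To separate client-side time from server-side time, something like this rough sketch could help (assuming the pymonetdb DB-API module as the Python client; the module name, connection details and query text are placeholders, not the real setup):

    import time
    import pymonetdb  # placeholder choice of MonetDB Python client

    conn = pymonetdb.connect(database="demo", hostname="localhost",
                             username="monetdb", password="monetdb")
    cur = conn.cursor()

    QUERY = "SELECT 1"  # stand-in for the real (confidential) query

    t0 = time.time()
    cur.execute(QUERY)     # mostly server-side work, plus protocol overhead
    t1 = time.time()
    rows = cur.fetchall()  # client-side transfer and conversion of the rows
    t2 = time.time()

    # Note: depending on the client, part of the result set may already be
    # transferred during execute(), so treat the split as an approximation.
    print("execute: %.3fs  fetch: %.3fs  rows: %d" % (t1 - t0, t2 - t1, len(rows)))

If execute() dominates, the cost is on the server; if fetchall() dominates, the Python client side is the place to look.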
>> Could you therefore 1) give the number of rows returned (doesn't have to be exact)
> Current query is returning 90 rows.
>> 2) try with mclient -t
> I tried that before blowing away the data and got like 3 seconds (guess),

3 seconds is very high; -t gives times in milliseconds.

> but I also have 90 rows of output scrolling... Is there a way to disable the output, so you only measure the impact of the query and not the displaying of the results?

Simply redirecting the output to a file always helps.
> I'll have to try it again tomorrow; I re-installed the latest build to test the order by and blew away my data.
>>> Since the query is confidential I can't really put it here, but I can describe it. I'm pulling stuff out of my fact table and two other dimension tables. The select clause sums a few columns, and in one case sums the product of one column and the sum of a few others. The coalesce function is used to convert NULLs to 0. The where clause joins the tables using the = operator and then has two "IN" clauses.
You may have hit a problem with our "IN" clause handling. Are the sets you're doing the IN against large?
>>> Then there is a group by (since I'm aggregating in the select) and an order by.

>> Are there any (foreign/primary/unique) keys involved?
> Primary and unique yes, no foreign.

No foreign keys makes the joins expensive of course. Are these single-column joins, or joins over multiple columns?
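For concreteness, a query of roughly the shape described above, with every table and column name invented (this is not the real schema, just an illustration of the pattern):

    # Hypothetical query shape: sums with COALESCE, one sum of a product,
    # equality joins between a fact table and two dimension tables, two IN
    # clauses, then GROUP BY and ORDER BY. All names are made up.
    QUERY = """
        SELECT d1.name, d2.region,
               SUM(COALESCE(f.qty, 0)) AS total_qty,
               SUM(COALESCE(f.price, 0) *
                   (COALESCE(f.a, 0) + COALESCE(f.b, 0))) AS weighted
        FROM   fact f, dim1 d1, dim2 d2
        WHERE  f.d1_id = d1.id
          AND  f.d2_id = d2.id
          AND  d1.code   IN ('X', 'Y', 'Z')
          AND  d2.region IN ('EU', 'US')
        GROUP BY d1.name, d2.region
        ORDER BY total_qty
    """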
>> We would like to see your query/data. If it's all too confidential, maybe you could anonymize it.
> Am willing to work with you guys. I'd like to come up with something, but the 67 million rows of data are around 3 gigs compressed.... Perhaps I can find a way to get you some data...

We have an ftp service at CWI. Anyway, we could/should discuss this off the list.
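De-personalizing doesn't have to break the query behaviour: deterministically hashing the sensitive values keeps equal values equal, so joins, IN lists and GROUP BY still work the same. A minimal sketch (the salt, the hash and the choice of columns to scrub are all assumptions):

    import hashlib

    def anon(value, salt="pick-a-secret"):
        # Deterministic: the same input always maps to the same token,
        # so join keys and group-by columns keep their structure.
        if value is None:
            return None  # keep NULLs, so COALESCE behaves as before
        return hashlib.sha1((salt + str(value)).encode()).hexdigest()[:12]

    # e.g. anon("Acme Corp") yields the same 12-character token on every call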
>>> Any hints on how to increase performance? Do I need to tune my query or tweak other things? (Yes, I know the laptop machine is not apples to apples with the vmware machine.)
>> What you could do is run your query with 'trace' in front and see which lower-level MonetDB statement takes a lot of time. Maybe we can fix the performance problem without you having to disclose extra data.
> Yeah, I ran with TRACE and tons of data scrolled by. I need to learn what it actually means (it ran over my xterm buffer....). The intermediate language looks like perl to me... (sorry perl people ;))

So now I'm offended; Perl is not really a language I appreciate ;-). Same as before: redirect the output to a file and view it with your favourite editor.
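Capturing the trace in a file instead of the terminal could look like the sketch below (assumptions: the same placeholder pymonetdb client as above, and that TRACE output comes back through the Python client as an ordinary result set; with mclient, redirecting stdout to a file does the same job):

    import pymonetdb  # placeholder client module, as in the earlier sketch

    conn = pymonetdb.connect(database="demo", hostname="localhost",
                             username="monetdb", password="monetdb")
    cur = conn.cursor()
    QUERY = "SELECT 1"  # stand-in for the real query

    # Prefix the statement with TRACE and dump whatever comes back to a
    # file, so it survives longer than an xterm scrollback buffer.
    cur.execute("TRACE " + QUERY)  # assumption: trace rows are returned
    with open("trace.out", "w") as out:
        for row in cur.fetchall():
            out.write("\t".join(str(col) for col in row) + "\n")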
Niels
> -matt
--
Niels Nes, Centre for Mathematics and Computer Science (CWI)
Kruislaan 413, 1098 SJ Amsterdam, The Netherlands
room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312
url: http://www.cwi.nl/~niels  e-mail: Niels.Nes@cwi.nl