[MonetDB-users] Parallelization
Hi, I'm playing with parallization of databases and I am a little confused about my result on MonetDB. Scenario: 15 Millions of Entries and i want to find the number of entries where the attributes are in a certain sector (i know, a simple select would do the work but i wanted a simple example) i have java code that get all 15 Mio. Data from the database via JDBC and iterates over them with a loop without multithreading. --> 13 ms then i created a stored function: CREATE FUNCTION isInRangexy(x DECIMAL(10,7), y DECIMAL(10,7)) RETURNS BOOL BEGIN ATOMIC IF x > 50 AND y < 50 THEN RETURN true; ELSE RETURN false; END IF; END Then I call it with: SELECT COUNT(*) FROM location WHERE isInRange(x,y) It works, but it is more than 3 times slower than the first java option. I had thought i would get benefit from the fact that i don't need to transfer all the data to the application and of some parallelization and other optimization from monetdb. Did I made a mistake? Can I use another example to get the anticipated results? thanks, michael
Hi When you wish the understand some of these performance issues, then an explain of the query helps. In this particular case, you will notice that the plan will contain an iterator over all elements. So your time is spent in the MAL interpreter, rather then dong useful database processing. Moreover it contains random probes and stores the result in a table ... sql> explain select * from t1 where isInRangexy(x,y); ... =barrier (X_58,X_59,X_60) := bat.newIterator(X_17); = X_62 := algebra.find(X_25,X_59); = X_64 := user.isinrangexy(X_60,X_62); = bat.insert(X_55,X_59,X_64); = redo (X_58,X_59,X_60) := bat.hasMoreElements(X_17); =exit (X_58,X_59,X_60); ... This is the fall back scenario when there are no bulk operations defined or when the system reaches the end of the optimizer(s) opportunities to make general steps. When you trace the performance e.g. using stethoscope you will get the confirmation. The following would be a more relevant plan. explain select * from t1 where x>50 and y <50; In this particular case, the rewriting to this query would be possible if the function was defined as follows CREATE FUNCTION isInRangexy(x DECIMAL(10,7), y DECIMAL(10,7)) RETURNS BOOL BEGIN ATOMIC RETURN x > 50 AND y < 50; END regards, Martin On 6/20/11 10:45 AM, Michael Kusber wrote:
Hi,
I'm playing with parallization of databases and I am a little confused about my result on MonetDB. Scenario: 15 Millions of Entries and i want to find the number of entries where the attributes are in a certain sector (i know, a simple select would do the work but i wanted a simple example)
i have java code that get all 15 Mio. Data from the database via JDBC and iterates over them with a loop without multithreading. --> 13 ms
then i created a stored function:
CREATE FUNCTION isInRangexy(x DECIMAL(10,7), y DECIMAL(10,7)) RETURNS BOOL BEGIN ATOMIC IF x > 50 AND y < 50 THEN RETURN true; ELSE RETURN false; END IF; END
Then I call it with: SELECT COUNT(*) FROM location WHERE isInRange(x,y)
It works, but it is more than 3 times slower than the first java option. I had thought i would get benefit from the fact that i don't need to transfer all the data to the application and of some parallelization and other optimization from monetdb.
Did I made a mistake? Can I use another example to get the anticipated results?
thanks, michael
------------------------------------------------------------------------------ EditLive Enterprise is the world's most technically advanced content authoring tool. Experience the power of Track Changes, Inline Image Editing and ensure content is compliant with Accessibility Checking. http://p.sf.net/sfu/ephox-dev2dev
_______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
participants (2)
-
Martin Kersten
-
Michael Kusber