[MonetDB-users] Performance improvements over equal queries limits to operators?
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512 Hello, I was quite surprised about the performance of some queries, that I expected to be heavy weighted, but actually performed pretty well. To show some things: sql>select * from kvk where postcode like '2265CA' and adres like '%7%'; 3 tuples (259.000ms) sql>select * from kvk where postcode like '2265CA' and adres like '%7%'; 3 tuples (259.000ms) sql>select * from kvk where postcode like '2265CA' and adres like '%7%'; 3 tuples (258.000ms) sql>select * from kvk where postcode like '2265CA' and adres like '%7%'; 3 tuples (257.000ms) sql>select * from kvk where postcode = '2265CA' and adres like '%7%'; 3 tuples (11.000ms) sql>select * from kvk where postcode = '2265CA' and adres like '%7%'; 3 tuples (6.000ms) sql>select * from kvk where postcode = '2265CA' and adres like '%7%'; 3 tuples (6.000ms) sql>select * from kvk where kvk = 273121520000; 1 tuple (473.000ms) sql>select * from kvk where kvk = 273121520000; 1 tuple (5.000ms) I wonder: what makes the 11ms to 6ms possible in the like vs = scenario? While a repeat of the like operator itself doesn't give a true improvement? As you can observe the like argument doesn't have any modifiers, so it should actually become a normal equal. (I noticed this later, goes beyond my original point, will file a bug for enhancement.) Also I'm quite surprised that the primary key lookup still takes a fair amount of time, could be i/o related obviously, but the performance on the second run is again huge, even for other arguments. Stefan -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.16 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEAREKAAYFAkysWw8ACgkQYH1+F2Rqwn19AgCgk9lkRBLcWdylxxz9M4B8a8f5 5JUAn1TOU1VxpvVfVWJMLwlwp9Pz4a0x =IqCY -----END PGP SIGNATURE-----
Stefan, since this changeset http://dev.monetdb.org/hg/MonetDB/rev/6e55ac43e39e The first (after server restart) point (equality) selection per persistent base column will not only scan the column, but as side effect also build a (for now still transient) hash table, that will speed-up any future point (equality) selection on that very column (until the server is restarted). That's the effect you see with your equality ("=") predicates. In fact, if you want to single out cost for loading a column from disk, building the hash table in memory (assuming the column easily fits in memory) and querying the hash table, you could restart your mserver5 and then run, e.g., the following three queries: select * from kvk where kvk between 273121519999 and 273121520001; -- range select; triggers data load from disk (or file system buffer); -- uses scan; does not build hash table. select * from kvk where kvk = 273121520000; -- point select; on (now) memory resident data; uses scan; builds hash table -- as side effect. select * from kvk where kvk = 273121520000; -- point select; uses previously built hash table. Obvioulsy, with LIKE predicate, you do not see these effects, as LIKE predicates (in general) are no point (equality) selections. Stefan On Wed, Oct 06, 2010 at 01:18:39PM +0200, Stefan de Konink wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512
Hello,
I was quite surprised about the performance of some queries, that I expected to be heavy weighted, but actually performed pretty well.
To show some things: sql>select * from kvk where postcode like '2265CA' and adres like '%7%'; 3 tuples (259.000ms) sql>select * from kvk where postcode like '2265CA' and adres like '%7%'; 3 tuples (259.000ms) sql>select * from kvk where postcode like '2265CA' and adres like '%7%'; 3 tuples (258.000ms) sql>select * from kvk where postcode like '2265CA' and adres like '%7%'; 3 tuples (257.000ms)
sql>select * from kvk where postcode = '2265CA' and adres like '%7%'; 3 tuples (11.000ms) sql>select * from kvk where postcode = '2265CA' and adres like '%7%'; 3 tuples (6.000ms) sql>select * from kvk where postcode = '2265CA' and adres like '%7%'; 3 tuples (6.000ms)
sql>select * from kvk where kvk = 273121520000; 1 tuple (473.000ms) sql>select * from kvk where kvk = 273121520000; 1 tuple (5.000ms)
I wonder: what makes the 11ms to 6ms possible in the like vs = scenario? While a repeat of the like operator itself doesn't give a true improvement? As you can observe the like argument doesn't have any modifiers, so it should actually become a normal equal. (I noticed this later, goes beyond my original point, will file a bug for enhancement.)
Also I'm quite surprised that the primary key lookup still takes a fair amount of time, could be i/o related obviously, but the performance on the second run is again huge, even for other arguments.
Stefan -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.16 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEAREKAAYFAkysWw8ACgkQYH1+F2Rqwn19AgCgk9lkRBLcWdylxxz9M4B8a8f5 5JUAn1TOU1VxpvVfVWJMLwlwp9Pz4a0x =IqCY -----END PGP SIGNATURE-----
------------------------------------------------------------------------------ Beautiful is writing same markup. Internet Explorer 9 supports standards for HTML5, CSS3, SVG 1.1, ECMAScript5, and DOM L2 & L3. Spend less time writing and rewriting code and more time creating great experiences on the web. Be a part of the beta today. http://p.sf.net/sfu/beautyoftheweb _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- | Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl | | CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ | | 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 | | The Netherlands | Fax : +31 (20) 592-4199 |
participants (2)
-
Stefan de Konink
-
Stefan Manegold