[MonetDB-users] Why does MonetDB take indexes not into account?
Hi, I basically have this schema: CREATE TABLE "act" ( "id" int NOT NULL auto_increment, "title" text NOT NULL, "is_closed" bool NOT NULL, PRIMARY KEY ("id") ); CREATE INDEX act_isClosedIndex ON act (is_closed); CREATE TABLE "entry" ( "id" int NOT NULL auto_increment, "note" text NOT NULL, "act_id" int NOT NULL, PRIMARY KEY ("id"), CONSTRAINT "entry_act_id_exists" FOREIGN KEY ("act_id") REFERENCES "act" ("id") ); CREATE INDEX entry_actIndex ON entry (act_id); Each "act" has several "entries" and each "entry" belongs to one "act". *) full-text search in the "note" column over all entries: SELECT * FROM entry WHERE entry.note LIKE ('%hui%') MySQL: 4.5 Seconds MonetDB: 0.8 Seconds So this is very good and this motivates me to switch to MonetDB. However, the majority of the acts have the "is_closed" field set. The usual case in my application is to take into account only the entries of open acts: *) full-text search in the "note" column over all entries of the open acts: SELECT * FROM entry, act WHERE entry.act_id = act.id AND act.is_closed = 0 AND entry.note LIKE ('%hui%') MySQL: 0.26 Seconds MonetDB: 0.8 Seconds Here, MySQL can take advantage of the index on the is_closed column, whereas MonetDB does not. Is this intended? Databases without indexes seem strange to me. Markus
Markus Gritsch wrote:
Hi,
I basically have this schema:
CREATE TABLE "act" ( "id" int NOT NULL auto_increment, "title" text NOT NULL, "is_closed" bool NOT NULL, PRIMARY KEY ("id") ); CREATE INDEX act_isClosedIndex ON act (is_closed);
CREATE TABLE "entry" ( "id" int NOT NULL auto_increment, "note" text NOT NULL, "act_id" int NOT NULL, PRIMARY KEY ("id"), CONSTRAINT "entry_act_id_exists" FOREIGN KEY ("act_id") REFERENCES "act" ("id") ); CREATE INDEX entry_actIndex ON entry (act_id);
Each "act" has several "entries" and each "entry" belongs to one "act".
*) full-text search in the "note" column over all entries:
SELECT * FROM entry WHERE entry.note LIKE ('%hui%')
MySQL: 4.5 Seconds MonetDB: 0.8 Seconds
So this is very good and this motivates me to switch to MonetDB. However, the majority of the acts have the "is_closed" field set. The usual case in my application is to take into account only the entries of open acts:
*) full-text search in the "note" column over all entries of the open acts:
SELECT * FROM entry, act WHERE entry.act_id = act.id AND act.is_closed = 0 AND entry.note LIKE ('%hui%')
MySQL: 0.26 Seconds MonetDB: 0.8 Seconds
Here, MySQL can take advantage of the index on the is_closed column, whereas MonetDB does not. Is this intended? Databases without indexes seem strange to me.
it has little to do with the indices, but more with the way the SQL optimizer produces a plan is my impression. Perhaps Niels sees an opportunity or missed opportunity for the plan generator. MonetDB internally uses (hash-)indices, which are created for the duration of the session. And the holy grale for database designers is to create a self-organizing system, one that learns how to index without user interaction. In this example, it would be interesting to see what a second call to the same query produces.
Markus
------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2005. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
On Mon, Nov 26, 2007 at 07:53:21AM +0100, Martin Kersten wrote:
Markus Gritsch wrote:
Hi,
I basically have this schema:
CREATE TABLE "act" ( "id" int NOT NULL auto_increment, "title" text NOT NULL, "is_closed" bool NOT NULL, PRIMARY KEY ("id") ); CREATE INDEX act_isClosedIndex ON act (is_closed);
CREATE TABLE "entry" ( "id" int NOT NULL auto_increment, "note" text NOT NULL, "act_id" int NOT NULL, PRIMARY KEY ("id"), CONSTRAINT "entry_act_id_exists" FOREIGN KEY ("act_id") REFERENCES "act" ("id") ); CREATE INDEX entry_actIndex ON entry (act_id);
Each "act" has several "entries" and each "entry" belongs to one "act".
*) full-text search in the "note" column over all entries:
SELECT * FROM entry WHERE entry.note LIKE ('%hui%')
MySQL: 4.5 Seconds MonetDB: 0.8 Seconds
So this is very good and this motivates me to switch to MonetDB. However, the majority of the acts have the "is_closed" field set. The usual case in my application is to take into account only the entries of open acts:
*) full-text search in the "note" column over all entries of the open acts:
SELECT * FROM entry, act WHERE entry.act_id = act.id AND act.is_closed = 0 AND entry.note LIKE ('%hui%')
MySQL: 0.26 Seconds MonetDB: 0.8 Seconds As the time is remarkably close to the earlier It looks like monet does the 'LIKE' before the 'is_closed = 0'. Looking at the 'explain' output indeed shows this too. We could improve the optimizer for these cases.
Niels
Here, MySQL can take advantage of the index on the is_closed column, whereas MonetDB does not. Is this intended? Databases without indexes seem strange to me.
it has little to do with the indices, but more with the way the SQL optimizer produces a plan is my impression. Perhaps Niels sees an opportunity or missed opportunity for the plan generator.
MonetDB internally uses (hash-)indices, which are created for the duration of the session. And the holy grale for database designers is to create a self-organizing system, one that learns how to index without user interaction.
In this example, it would be interesting to see what a second call to the same query produces.
Markus
------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2005. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2005. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- Niels Nes, Centre for Mathematics and Computer Science (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
On Nov 26, 2007 7:53 AM, Martin Kersten
MonetDB internally uses (hash-)indices, which are created for the duration of the session. And the holy grale for database designers is to create a self-organizing system, one that learns how to index without user interaction.
Yes, this would be the case in an ideal world :) Nevertheless the system could take some hints from the schema designer into account (via indexes), in case he exactly knows how the schema is queried and what columns can benefit greatly from having an indexed on them. Markus
Markus Gritsch wrote:
On Nov 26, 2007 7:53 AM, Martin Kersten
wrote: MonetDB internally uses (hash-)indices, which are created for the duration of the session. And the holy grale for database designers is to create a self-organizing system, one that learns how to index without user interaction.
Yes, this would be the case in an ideal world :) Nevertheless the system could take some hints from the schema designer into account (via indexes), in case he exactly knows how the schema is queried and what columns can benefit greatly from having an indexed on them.
Markus, Sure. It does and ignores it ;-) If the workload is stable then the system should be able to detect and exploit it. For the kind of research activities we undertake, have a look at the Database Cracking papers, which ensures building an index behind the scene based on the actual workload. Anyway, we greatly appreciate your response and experiences. It really helps us to improve the product base.
Markus
------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2005. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
On Mon, Nov 26, 2007 at 08:41:31AM +0100, Markus Gritsch wrote:
On Nov 26, 2007 7:53 AM, Martin Kersten
wrote: MonetDB internally uses (hash-)indices, which are created for the duration of the session. And the holy grale for database designers is to create a self-organizing system, one that learns how to index without user interaction.
Yes, this would be the case in an ideal world :) Nevertheless the system could take some hints from the schema designer into account (via indexes), in case he exactly knows how the schema is queried and what columns can benefit greatly from having an indexed on them.
Agreed. However, if the cost of maintaining the indices under updates exceeds the (potential) benefits of exploiting the indices for queries, the investment is debatable. Or, in other words, even without "old-fashioned" indices, MonetDB/SQL appears to outperform MySQL by more than a factor 3 (in your case). Hence, we still have some margin/time to finish our hard research work on (a.o.) self-organizing query-driven adaptive indices --- that work even without "hints from the schema designer", e.g., in case it is not a priory known how a schema is queried, or which columns could bebeit from an index --- to hopefully get the very promising initial results of a prototypical implementation stable and robust enough for the product version ;-) Stefan
Markus
-- | Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl | | CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ | | 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 | | The Netherlands | Fax : +31 (20) 592-4312 |
On Nov 26, 2007 9:14 AM, Stefan Manegold
On Mon, Nov 26, 2007 at 08:41:31AM +0100, Markus Gritsch wrote:
Yes, this would be the case in an ideal world :) Nevertheless the system could take some hints from the schema designer into account (via indexes), in case he exactly knows how the schema is queried and what columns can benefit greatly from having an indexed on them.
However, if the cost of maintaining the indices under updates exceeds the (potential) benefits of exploiting the indices for queries, the investment is debatable.
True. But there are cases (as in mine) where the cost of maintaining the indices does not exceed the benefit of having indexes for queries. Having fast queries without specifying an index would of course be ideal, so I would love to see MonetDB achieving this.
Or, in other words, even without "old-fashioned" indices, MonetDB/SQL appears to outperform MySQL by more than a factor 3 (in your case).
Please look at my e-mail again. MonetDB outperforms MySQL in my case only when *not* using the index i.e. querying over the "entries" of *all* "acts", which is very good. Unfortunately, as already posted in my original e-mail, in the common case (in my application) of querying only over the "entries" of the *open* "acts", MonetDB is 3x slower than MySQL. Markus
On Nov 26, 2007 9:14 AM, Stefan Manegold
wrote: On Mon, Nov 26, 2007 at 08:41:31AM +0100, Markus Gritsch wrote:
Yes, this would be the case in an ideal world :) Nevertheless the system could take some hints from the schema designer into account (via indexes), in case he exactly knows how the schema is queried and what columns can benefit greatly from having an indexed on them.
However, if the cost of maintaining the indices under updates exceeds the (potential) benefits of exploiting the indices for queries, the investment is debatable.
True. But there are cases (as in mine) where the cost of maintaining the indices does not exceed the benefit of having indexes for queries. Having fast queries without specifying an index would of course be ideal, so I would love to see MonetDB achieving this.
Or, in other words, even without "old-fashioned" indices, MonetDB/SQL appears to outperform MySQL by more than a factor 3 (in your case).
Please look at my e-mail again. MonetDB outperforms MySQL in my case only when *not* using the index i.e. querying over the "entries" of *all* "acts", which is very good. Unfortunately, as already posted in my original e-mail, in the common case (in my application) of querying only over the "entries" of the *open* "acts", MonetDB is 3x slower than MySQL. But even then its not the index (or lack of it) causing the slowdown. It the optimizer taking the in efficient route. With or without
On Mon, Nov 26, 2007 at 09:37:48AM +0100, Markus Gritsch wrote: the index it should have taken the route via your 'is_closed = 0'. I will look into this problem, but changing the optimizer is always with great causion as it may cause unwanted side effects (performance degredation) on other queries. Niels
Markus
------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2005. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- Niels Nes, Centre for Mathematics and Computer Science (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
On Nov 26, 2007 9:42 AM, Niels Nes
I will look into this problem, but changing the optimizer is always
Looking forward to it :)
with great causion as it may cause unwanted side effects (performance degredation) on other queries.
Reminds me of using Oracle where joining a completely unrelated table to a query (which motivated Oracle to use another optimizer) greatly sped up the select... :/ Good luck, Markus
On Mon, Nov 26, 2007 at 09:49:53AM +0100, Markus Gritsch wrote:
On Nov 26, 2007 9:42 AM, Niels Nes
wrote: I will look into this problem, but changing the optimizer is always
Looking forward to it :)
any change I could have access to the test data set your using? This would make it much easier to test the performance improvements. Niels
with great causion as it may cause unwanted side effects (performance degredation) on other queries.
Reminds me of using Oracle where joining a completely unrelated table to a query (which motivated Oracle to use another optimizer) greatly sped up the select... :/
Good luck, Markus
------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2005. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- Niels Nes, Centre for Mathematics and Computer Science (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
On 27/11/2007, Niels Nes
On Mon, Nov 26, 2007 at 09:49:53AM +0100, Markus Gritsch wrote:
On Nov 26, 2007 9:42 AM, Niels Nes
wrote: I will look into this problem, but changing the optimizer is always
Looking forward to it :)
any change I could have access to the test data set your using? This would make it much easier to test the performance improvements.
No, they are somehow confidential, but I will try to produce some dummy dataset, which the same behavior. Markus
On 27/11/2007, Niels Nes
wrote: On Mon, Nov 26, 2007 at 09:49:53AM +0100, Markus Gritsch wrote:
On Nov 26, 2007 9:42 AM, Niels Nes
wrote: I will look into this problem, but changing the optimizer is always
Looking forward to it :)
any change I could have access to the test data set your using? This would make it much easier to test the performance improvements.
No, they are somehow confidential, but I will try to produce some dummy dataset, which the same behavior.
On Tue, Nov 27, 2007 at 10:32:52PM +0100, Markus Gritsch wrote: thats fine, thanks Niels
Markus
------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2005. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- Niels Nes, Centre for Mathematics and Computer Science (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
On 27/11/2007, Niels Nes
On 27/11/2007, Niels Nes
wrote: On Mon, Nov 26, 2007 at 09:49:53AM +0100, Markus Gritsch wrote:
On Nov 26, 2007 9:42 AM, Niels Nes
wrote: I will look into this problem, but changing the optimizer is always
Looking forward to it :)
any change I could have access to the test data set your using? This would make it much easier to test the performance improvements.
No, they are somehow confidential, but I will try to produce some dummy dataset, which the same behavior.
On Tue, Nov 27, 2007 at 10:32:52PM +0100, Markus Gritsch wrote: thats fine, thanks
You can download a dataset from http://xile.org/le/data.zip
mclient.exe -lsql < DDL.sql mclient.exe -lsql < DATA2.sql > log.txt
SELECT * FROM entry WHERE entry.note LIKE ('%kawilokare%') SELECT * FROM entry, act WHERE entry.act_id = act.id AND act.is_closed = 0 AND entry.note LIKE ('%kawilokare%') Both queries take about 0.5 seconds on my Notebook using MonetDB. Markus
On Tue, Nov 27, 2007 at 11:53:09PM +0100, Markus Gritsch wrote:
On 27/11/2007, Niels Nes
wrote: On 27/11/2007, Niels Nes
wrote: On Mon, Nov 26, 2007 at 09:49:53AM +0100, Markus Gritsch wrote:
On Nov 26, 2007 9:42 AM, Niels Nes
wrote: I will look into this problem, but changing the optimizer is always
Looking forward to it :)
any change I could have access to the test data set your using? This would make it much easier to test the performance improvements.
No, they are somehow confidential, but I will try to produce some dummy dataset, which the same behavior.
On Tue, Nov 27, 2007 at 10:32:52PM +0100, Markus Gritsch wrote: thats fine, thanks
You can download a dataset from http://xile.org/le/data.zip
mclient.exe -lsql < DDL.sql mclient.exe -lsql < DATA2.sql > log.txt
SELECT * FROM entry WHERE entry.note LIKE ('%kawilokare%')
SELECT * FROM entry, act WHERE entry.act_id = act.id AND act.is_closed = 0 AND entry.note LIKE ('%kawilokare%')
Both queries take about 0.5 seconds on my Notebook using MonetDB.
I got the second query down too 0.125 seconds (on my desktop). It also had about 0.5 seconds before. This is currently tested only on the current. I'll have to backport it to the stable and make it less specific too your query. To shortly explain the problem. The optimizer nicely finds out that you have an foreign key. This foreign key comes with a join index. Unfortunately when having these indices we do the selects on the base columns, ie on the columns of entry and act independend of the join. And only after that we use the index to combine these results. The improved optimizer will first do the select on the primary key 'column', use the join-index, then the select on this reduced column. Niels
Markus
------------------------------------------------------------------------- SF.Net email is sponsored by: The Future of Linux Business White Paper from Novell. From the desktop to the data center, Linux is going mainstream. Let it simplify your IT future. http://altfarm.mediaplex.com/ad/ck/8857-50307-18918-4 _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- Niels Nes, Centre for Mathematics and Computer Science (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
On 28/11/2007, Niels Nes
The optimizer nicely finds out that you have an foreign key. This foreign key comes with a join index. Unfortunately when having these indices we do the selects on the base columns, ie on the columns of entry and act independend of the join. And only after that we use the index to combine these results. The improved optimizer will first do the select on the primary key 'column', use the join-index, then the select on this reduced column.
Great. Lets hope this change does not degrade preformance in other cases. Do you have any test suit, which runs various queries, measures their performance, and indicate if performance decreased? Markus
On Wed, Nov 28, 2007 at 09:23:01AM +0100, Markus Gritsch wrote:
On 28/11/2007, Niels Nes
wrote: The optimizer nicely finds out that you have an foreign key. This foreign key comes with a join index. Unfortunately when having these indices we do the selects on the base columns, ie on the columns of entry and act independend of the join. And only after that we use the index to combine these results. The improved optimizer will first do the select on the primary key 'column', use the join-index, then the select on this reduced column.
Great. Lets hope this change does not degrade preformance in other cases. Do you have any test suit, which runs various queries,
Yes: http://monetdb.cwi.nl/projects/monetdb/Development/TestWeb/index.html http://monetdb.cwi.nl/projects/monetdb/Development/TestWeb/Stable/index.html http://monetdb.cwi.nl/testing/projects/monetdb/Stable/sql5/.mTests103/index_...
measures their performance,
Kind-of: http://monetdb.cwi.nl/testing/projects/monetdb/Stable/sql5/.mTests103/times....
and indicate if performance decreased?
unfortunately not yet. For now, our (nightly automatic) testing effort are mainly focused on ensuring and monitoring functinal correctness and portability. Performance monitoring beyond measuring and archiving the execution times of the functionallity tests is on our wish-list, but so far, each day ended after just 24 hours ... ;-) Stefan
Markus
-- | Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl | | CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ | | 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 | | The Netherlands | Fax : +31 (20) 592-4312 |
On Mon, Nov 26, 2007 at 09:37:48AM +0100, Markus Gritsch wrote:
Please look at my e-mail again. MonetDB outperforms MySQL in my case only when *not* using the index i.e. querying over the "entries" of *all* "acts", which is very good. Unfortunately, as already posted in my original e-mail, in the common case (in my application) of querying only over the "entries" of the *open* "acts", MonetDB is 3x slower than MySQL.
Sorry, my fault. I read MySQL: 0.26 Seconds MonetDB: 0.8 Seconds as MySQL: 2.6 Seconds MonetDB: 0.8 Seconds (... wishful thinking ... ? ;-)) Stefan -- | Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl | | CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ | | 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 | | The Netherlands | Fax : +31 (20) 592-4312 |
participants (4)
-
Markus Gritsch
-
Martin Kersten
-
Niels Nes
-
Stefan Manegold