[MonetDB-users] Clustered Index and Merge Join
I have two tables of the form R1(x INTEGER, y INTEGER) R2(x INTEGER, y INTEGER) where x and y both are NOT NULL and may both contain duplicate values (however, the combination (a,b) is unique). My questions are the following: a.) Is there a possibility to create a /clustered/ index on the x-column (or a clustered combined x-y index?). I could also create a PRIMARY KEY on (x,y). Would this result in a clustered index? Is there any command to find out which indices are clustered? b.) Does MonetDB support MergeJoins? I.e., assume I have a clustered index on columns R1.a and R2.a and perform a join on these columns, will this join be realized by an efficient merge join? Best regards, Michael
On Mon, Mar 24, 2008 at 5:05 PM, Michael Schmidt
I have two tables of the form
R1(x INTEGER, y INTEGER) R2(x INTEGER, y INTEGER)
where x and y both are NOT NULL and may both contain duplicate values (however, the combination (a,b) is unique). My questions are the following:
a.) Is there a possibility to create a /clustered/ index on the x-column (or a clustered combined x-y index?). I could also create a PRIMARY KEY on (x,y). Would this result in a clustered index? Is there any command to find out which indices are clustered?
MonetDB does not support user defined clustered indeces. To have the same effect you will have to sort your data on x-y be explicitly stating it on your insert command, like: INSERT INTO table_name (...) ORDER BY X,Y MonetDb then will know that your data is sorted and result in optimized plans. As far as I know there is no command to see the indices since there are no user indices.
b.) Does MonetDB support MergeJoins? I.e., assume I have a clustered index on columns R1.a and R2.a and perform a join on these columns, will this join be realized by an efficient merge join?
If your data is sorted, MonetDB will choose a fast merge join. Which version of MonetDB are you using? Stable release or from the cvs? I hope I could help, lefteris
Best regards, Michael
------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
Hi,
MonetDB does not support user defined clustered indeces. To have the same effect you will have to sort your data on x-y be explicitly stating it on your insert command, like:
INSERT INTO table_name (...) ORDER BY X,Y
I guess it is sufficient to do this when inserting the last tuple? I am loading up to 25M tuples, so sorting after each step is certainly a bad idea ;)
MonetDb then will know that your data is sorted and result in optimized plans. As far as I know there is no command to see the indices since there are no user indices.
b.) Does MonetDB support MergeJoins? I.e., assume I have a clustered index on columns R1.a and R2.a and perform a join on these columns, will this join be realized by an efficient merge join?
If your data is sorted, MonetDB will choose a fast merge join.
OK, I will try the approach you proposed above, and compare the execution plans.
Which version of MonetDB are you using? Stable release or from the cvs?
Just installed the CVS version from today. And maybe it is of interest that I am running the queries with the new algebra (flag "-G"), which might affect the execution plan. Will report on my results later, thank you very much. Michael
Hi again,
MonetDB does not support user defined clustered indeces. To have the same effect you will have to sort your data on x-y be explicitly stating it on your insert command, like:
INSERT INTO table_name (...) ORDER BY X,Y
unfortunately, I get a parse error when specifying this statement: "!syntax error, unexpected ORDER, expecting SCOLON [...]". I tried to find something like similar in the manual, but did not succees. So how can I force sorting? And is there any way to sort without an INSERT statement? Regards, Michael
On Tue, Mar 25, 2008 at 02:25:52AM +0100, Michael Schmidt wrote:
Hi again,
MonetDB does not support user defined clustered indeces. To have the same effect you will have to sort your data on x-y be explicitly stating it on your insert command, like:
INSERT INTO table_name (...) ORDER BY X,Y
unfortunately, I get a parse error when specifying this statement: "!syntax error, unexpected ORDER, expecting SCOLON [...]". I tried to find something like similar in the manual, but did not succees. So how can I force sorting? And is there any way to sort without an INSERT statement?
The following statement should work. INSERT INTO tname select * from input_table order by X, Y, Z; Niels
Regards, Michael
------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- Niels Nes, Centre for Mathematics and Computer Science (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
Hi Niels,
INSERT INTO table_name (...) ORDER BY X,Y
OK, thank you for your answer. Two more questions: 1.) Is it also ok to insert the triples in sorted order one after the other, instead of copying combined with ORDER BY? Is there any possibility to see whether MonetDB /really/ recognizes that tuples are ordered? 2.) How can I detect whether MonetDB chooses merge joins when joining on ordered columns. I used EXPLAIN and investigated the QEP (though I must confess that I don't really understand what's going on), but there were only "algebra.join" operations, whatever this means... Michael
On Wed, Mar 26, 2008 at 02:28:05AM +0100, Michael Schmidt wrote:
Hi Niels,
INSERT INTO table_name (...) ORDER BY X,Y
OK, thank you for your answer. Two more questions:
1.) Is it also ok to insert the triples in sorted order one after the other, instead of copying combined with ORDER BY? Is there any possibility to see whether MonetDB /really/ recognizes that tuples are ordered? Inserting them one at the time will be slow but indeed we should recognize that the insorted values are sorted.
The sortedness of columns can only be seen on the mal/c-level. There is no easy way to show that (yet).
2.) How can I detect whether MonetDB chooses merge joins when joining on ordered columns. I used EXPLAIN and investigated the QEP (though I must confess that I don't really understand what's going on), but there were only "algebra.join" operations, whatever this means...
MonetDB uses dynamic (or runtime) optimization, ie the kind of join is chosen at runtime. So algebra.join could mean merge join as well as hash join. I you want to see I a 'fast' merge join is taken you should run with debugging flags. This will however give a lot of information (probably too much). So simply check if your runtimes improved is the easiest way. Assume you did a order by on X, the Y. Do a select count(*) where X == 'some constant' group by Y; That should be much faster in the ordered case than in the not ordered case. Niels
Michael
-- Niels Nes, Centre for Mathematics and Computer Science (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
participants (3)
-
Lefteris
-
Michael Schmidt
-
Niels Nes