[MonetDB-users] Sorted materialized views
Hello, all! I am about to prototype something, and it would be convenient to use SQL to throw it together quickly to see if it flies. I downloaded MonetDB yesterday, so I am quite fresh :-) I have a table with a set of columns, and the like to have a few materialized views or projections (or whatever you DB people call it ;-)) sorted on different columns. The views may also have selection, I am not sure yet. And given problem (a data query), I use some statistics to decide which sorting order is most convenient, look up a (preferably continuous) vertical part of the columns, scan through it, and possibly do some filtering. The queries may also be trees, in which case I might use a different sorting order for each branch, and then do merging based on some of the columns. Assume: CREATE TABLE t (a INT, b INT); INSERT INTO t VALUES (1, 3), (4, 2); I would like to do something like: CREATE VIEW v AS SELECT b, a FROM t ORDER BY b, a; This is not allowed (works if I remove ORDER BY). I would also like to have an index over v(b,a), like: CREATE UNIQUE INDEX j USING BTREE ON v (b, a) ; But this is never allowed on a view, right? Since updates are not an issue in the prototype, I could just create a new table like this: CREATE TABLE v (b int, a int, PRIMARY KEY(b, a), INDEX i USING BTREE (b ASC, a ASC)) AS SELECT b, a FROM t ORDER BY b, a; (Using both PRIMARY KEY and INDEX is always butter on bacon?) But this is not allowed because of the table definition, all I am allowed to do is: CREATE TABLE v (b, a) AS SELECT b, a FROM t ORDER BY b, a WITH DATA; If I use PRIMARY KEY(b, a), am I then guaranteed the table to be sorted on (b, a)? Is there any other way of enforcing this in SQL? 1. Any suggestions for what I should do? Is my best option to create all "views" as tables from scratch, and fill them by hand? 2. How much slower would this be than using low-level stuff like BATs and BUNs myself? (The merges between branches in my (problem level) queries are simple, but cannot be expressed as SQL.) Klem fra Nils
On Tue, Apr 15, 2008 at 01:16:46PM +0200, Nils Grimsmo wrote:
Hello, all!
I am about to prototype something, and it would be convenient to use SQL to throw it together quickly to see if it flies. I downloaded MonetDB yesterday, so I am quite fresh :-)
I have a table with a set of columns, and the like to have a few materialized views or projections (or whatever you DB people call it ;-)) sorted on different columns. The views may also have selection, I am not sure yet. The MonetDB/SQL views aren't materialized, ie a view which sorts (even if its allowed by the parser (which it isn't)), will be very slow.
And given problem (a data query), I use some statistics to decide which sorting order is most convenient, look up a (preferably continuous) vertical part of the columns, scan through it, and possibly do some filtering. The queries may also be trees, in which case I might use a different sorting order for each branch, and then do merging based on some of the columns.
Assume:
CREATE TABLE t (a INT, b INT); INSERT INTO t VALUES (1, 3), (4, 2);
I would like to do something like:
CREATE VIEW v AS SELECT b, a FROM t ORDER BY b, a; Instead use a second table which holds the data of a order on b, a (possibly with a key between v and t)?
This is not allowed (works if I remove ORDER BY). I would also like to have an index over v(b,a), like:
CREATE UNIQUE INDEX j USING BTREE ON v (b, a) ; We don't support BTREEs.
But this is never allowed on a view, right? Since updates are not an issue in the prototype, I could just create a new table like this:
CREATE TABLE v (b int, a int, PRIMARY KEY(b, a), INDEX i USING BTREE (b ASC, a ASC)) AS SELECT b, a FROM t ORDER BY b, a;
(Using both PRIMARY KEY and INDEX is always butter on bacon?) But this is not allowed because of the table definition, all I am allowed to do is:
CREATE TABLE v (b, a) AS SELECT b, a FROM t ORDER BY b, a WITH DATA;
If I use PRIMARY KEY(b, a), am I then guaranteed the table to be sorted on (b, a)? Is there any other way of enforcing this in SQL? No only if you insert in order, ie if you only do cate table v (b,a) as select b, a from t order by b, a with data; and never insert it will stay sorted.
1. Any suggestions for what I should do? Is my best option to create all "views" as tables from scratch, and fill them by hand? We are thinking of implementing 'clustered indices' for this, but currently its not ready (ie implementation still needs to be started).
2. How much slower would this be than using low-level stuff like BATs and BUNs myself? (The merges between branches in my (problem level) queries are simple, but cannot be expressed as SQL.) Performance difference shouldn't be too high. Only if you have a lot of application knowledge you could exploit in the low level case, large difference can be expected.
NIels
Klem fra Nils
------------------------------------------------------------------------- This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javao... _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- Niels Nes, Centre for Mathematics and Computer Science (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
Niels Nes wrote:
On Tue, Apr 15, 2008 at 01:16:46PM +0200, Nils Grimsmo wrote:
Hello, all!
I am about to prototype something, and it would be convenient to use SQL to throw it together quickly to see if it flies. I downloaded MonetDB yesterday, so I am quite fresh :-)
I have a table with a set of columns, and the like to have a few materialized views or projections (or whatever you DB people call it ;-)) sorted on different columns. The views may also have selection, I am not sure yet.
The MonetDB/SQL views aren't materialized, ie a view which sorts (even if its allowed by the parser (which it isn't)), will be very slow.
And given problem (a data query), I use some statistics to decide which sorting order is most convenient, look up a (preferably continuous) vertical part of the columns, scan through it, and possibly do some filtering. The queries may also be trees, in which case I might use a different sorting order for each branch, and then do merging based on some of the columns.
Assume:
CREATE TABLE t (a INT, b INT); INSERT INTO t VALUES (1, 3), (4, 2);
I would like to do something like:
CREATE VIEW v AS SELECT b, a FROM t ORDER BY b, a;
Instead use a second table which holds the data of a order on b, a (possibly with a key between v and t)?
This is not allowed (works if I remove ORDER BY). I would also like to have an index over v(b,a), like:
CREATE UNIQUE INDEX j USING BTREE ON v (b, a) ;
We don't support BTREEs.
What kind of indexes are used? (It does not say under http://monetdb.cwi.nl/projects/monetdb/SQL/Documentation/Indexes.html)
But this is never allowed on a view, right? Since updates are not an issue in the prototype, I could just create a new table like this:
CREATE TABLE v (b int, a int, PRIMARY KEY(b, a), INDEX i USING BTREE (b ASC, a ASC)) AS SELECT b, a FROM t ORDER BY b, a;
(Using both PRIMARY KEY and INDEX is always butter on bacon?) But this is not allowed because of the table definition, all I am allowed to do is:
CREATE TABLE v (b, a) AS SELECT b, a FROM t ORDER BY b, a WITH DATA;
If I use PRIMARY KEY(b, a), am I then guaranteed the table to be sorted on (b, a)? Is there any other way of enforcing this in SQL?
No only if you insert in order, ie if you only do create table v (b,a) as select b, a from t order by b, a with data; and never insert it will stay sorted.
But with CREATE TABLE AS SELECT I cannot use PRIMARY KEY. Can I still get a fast retrieval of SELECT * FROM v WHERE b=13 AND a=14? Should I CREATE INDEX?
1. Any suggestions for what I should do? Is my best option to create all "views" as tables from scratch, and fill them by hand?
We are thinking of implementing 'clustered indices' for this, but currently its not ready (ie implementation still needs to be started).
2. How much slower would this be than using low-level stuff like BATs and BUNs myself? (The merges between branches in my (problem level) queries are simple, but cannot be expressed as SQL.)
Performance difference shouldn't be too high. Only if you have a lot of application knowledge you could exploit in the low level case, large difference can be expected.
Thank you for a swift and informative answer! Klem fra Nils
On Wed, Apr 16, 2008 at 09:16:27AM +0200, Nils Grimsmo wrote:
Niels Nes wrote:
On Tue, Apr 15, 2008 at 01:16:46PM +0200, Nils Grimsmo wrote:
Hello, all!
I am about to prototype something, and it would be convenient to use SQL to throw it together quickly to see if it flies. I downloaded MonetDB yesterday, so I am quite fresh :-)
I have a table with a set of columns, and the like to have a few materialized views or projections (or whatever you DB people call it ;-)) sorted on different columns. The views may also have selection, I am not sure yet.
The MonetDB/SQL views aren't materialized, ie a view which sorts (even if its allowed by the parser (which it isn't)), will be very slow.
And given problem (a data query), I use some statistics to decide which sorting order is most convenient, look up a (preferably continuous) vertical part of the columns, scan through it, and possibly do some filtering. The queries may also be trees, in which case I might use a different sorting order for each branch, and then do merging based on some of the columns.
Assume:
CREATE TABLE t (a INT, b INT); INSERT INTO t VALUES (1, 3), (4, 2);
I would like to do something like:
CREATE VIEW v AS SELECT b, a FROM t ORDER BY b, a;
Instead use a second table which holds the data of a order on b, a (possibly with a key between v and t)?
This is not allowed (works if I remove ORDER BY). I would also like to have an index over v(b,a), like:
CREATE UNIQUE INDEX j USING BTREE ON v (b, a) ;
We don't support BTREEs.
What kind of indexes are used? (It does not say under http://monetdb.cwi.nl/projects/monetdb/SQL/Documentation/Indexes.html)
We have 2 indices, hash and join indices. Hash indices are used to support efficient checking of primary and unique keys. Join indices are there for foreign key checks. The last are exploited when you do joins on these keys.
But this is never allowed on a view, right? Since updates are not an issue in the prototype, I could just create a new table like this:
CREATE TABLE v (b int, a int, PRIMARY KEY(b, a), INDEX i USING BTREE (b ASC, a ASC)) AS SELECT b, a FROM t ORDER BY b, a;
(Using both PRIMARY KEY and INDEX is always butter on bacon?) But this is not allowed because of the table definition, all I am allowed to do is:
CREATE TABLE v (b, a) AS SELECT b, a FROM t ORDER BY b, a WITH DATA;
If I use PRIMARY KEY(b, a), am I then guaranteed the table to be sorted on (b, a)? Is there any other way of enforcing this in SQL?
No only if you insert in order, ie if you only do create table v (b,a) as select b, a from t order by b, a with data; and never insert it will stay sorted.
But with CREATE TABLE AS SELECT I cannot use PRIMARY KEY. Can I still
You could create the index (or key/constraint) later using an 'alter' statement.
get a fast retrieval of SELECT * FROM v WHERE b=13 AND a=14? Should I
If your data is order on b and a this should be fast (we had some fixes in this code resently, ie you should probaly switch to the cvs version, or nightly builds of the 'cvs head' branch) to fully exploit this. Niels
CREATE INDEX?
1. Any suggestions for what I should do? Is my best option to create all "views" as tables from scratch, and fill them by hand?
We are thinking of implementing 'clustered indices' for this, but currently its not ready (ie implementation still needs to be started).
2. How much slower would this be than using low-level stuff like BATs and BUNs myself? (The merges between branches in my (problem level) queries are simple, but cannot be expressed as SQL.)
Performance difference shouldn't be too high. Only if you have a lot of application knowledge you could exploit in the low level case, large difference can be expected.
Thank you for a swift and informative answer!
Klem fra Nils
------------------------------------------------------------------------- This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javao... _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- Niels Nes, Centre for Mathematics and Computer Science (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
Nils Grimsmo wrote:
Niels Nes wrote:
On Tue, Apr 15, 2008 at 01:16:46PM +0200, Nils Grimsmo wrote:
Hello, all!
I am about to prototype something, and it would be convenient to use SQL to throw it together quickly to see if it flies. I downloaded MonetDB yesterday, so I am quite fresh :-)
I have a table with a set of columns, and the like to have a few materialized views or projections (or whatever you DB people call it ;-)) sorted on different columns. The views may also have selection, I am not sure yet.
The MonetDB/SQL views aren't materialized, ie a view which sorts (even if its allowed by the parser (which it isn't)), will be very slow.
And given problem (a data query), I use some statistics to decide which sorting order is most convenient, look up a (preferably continuous) vertical part of the columns, scan through it, and possibly do some filtering. The queries may also be trees, in which case I might use a different sorting order for each branch, and then do merging based on some of the columns.
Assume:
CREATE TABLE t (a INT, b INT); INSERT INTO t VALUES (1, 3), (4, 2);
I would like to do something like:
CREATE VIEW v AS SELECT b, a FROM t ORDER BY b, a;
Instead use a second table which holds the data of a order on b, a (possibly with a key between v and t)?
This is not allowed (works if I remove ORDER BY). I would also like to have an index over v(b,a), like:
CREATE UNIQUE INDEX j USING BTREE ON v (b, a) ;
We don't support BTREEs.
What kind of indexes are used? (It does not say under http://monetdb.cwi.nl/projects/monetdb/SQL/Documentation/Indexes.html)
Hash indices are used when needed and maintained under update. However, the are never saved on disk. Most performance benefits come from maintaining order. Details on the implementation can be found in the source code and the extracted documentation. http://monetdb.cwi.nl/projects/monetdb/MonetDB/Documentation/The-Inner-Core....
But this is never allowed on a view, right? Since updates are not an issue in the prototype, I could just create a new table like this:
CREATE TABLE v (b int, a int, PRIMARY KEY(b, a), INDEX i USING BTREE (b ASC, a ASC)) AS SELECT b, a FROM t ORDER BY b, a;
(Using both PRIMARY KEY and INDEX is always butter on bacon?) But this is not allowed because of the table definition, all I am allowed to do is:
CREATE TABLE v (b, a) AS SELECT b, a FROM t ORDER BY b, a WITH DATA;
If I use PRIMARY KEY(b, a), am I then guaranteed the table to be sorted on (b, a)? Is there any other way of enforcing this in SQL?
No only if you insert in order, ie if you only do create table v (b,a) as select b, a from t order by b, a with data; and never insert it will stay sorted.
But with CREATE TABLE AS SELECT I cannot use PRIMARY KEY. Can I still get a fast retrieval of SELECT * FROM v WHERE b=13 AND a=14? Should I CREATE INDEX?
1. Any suggestions for what I should do? Is my best option to create all "views" as tables from scratch, and fill them by hand?
We are thinking of implementing 'clustered indices' for this, but currently its not ready (ie implementation still needs to be started).
2. How much slower would this be than using low-level stuff like BATs and BUNs myself? (The merges between branches in my (problem level) queries are simple, but cannot be expressed as SQL.)
Performance difference shouldn't be too high. Only if you have a lot of application knowledge you could exploit in the low level case, large difference can be expected.
Thank you for a swift and informative answer!
Klem fra Nils
------------------------------------------------------------------------- This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javao... _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
participants (3)
-
Martin Kersten
-
Niels Nes
-
Nils Grimsmo