On Mon, Jun 22, 2020 at 08:11:26PM +0400, Miro Mannino wrote:
Dear Niels Nes,
We would like to insert data column by column, for various reasons.
The problem is that if we create a larger table with columns T1 and T2, and insert 1000 values into column T1, the next 1000 inserts into T2 are appended, resulting in 1000 rows of (val, NULL) followed by another 1000 rows of (NULL, val). We could work this way if there were an option for INSERT INTO to start again from the first row (filling the existing NULL cells rather than appending new rows).
In general, we are trying to use MonetDB so as to take advantage of a column-store database, but we cannot insert a batch of values for one column and then another, because the usual semantics of INSERT INTO are row-oriented.
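(For illustration, a minimal sketch of the behaviour described; the table and column names are made up:)

-- a hypothetical two-column table
CREATE TABLE example (c1 INT, c2 INT);
-- first batch fills only c1; c2 defaults to NULL
INSERT INTO example (c1) VALUES (1), (2);
-- second batch fills only c2, but the rows are appended,
-- not lined up with the existing ones:
INSERT INTO example (c2) VALUES (10), (20);
-- result: (1, NULL), (2, NULL), (NULL, 10), (NULL, 20)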
Looking forward to hearing from you,
Miro

Column-by-column inserts are indeed a problem, since the SQL language is row-oriented. We do have a solution, i.e. binary COPY INTO, but that may be a bit too much for you. See https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/BinaryBulkLoad

Niels
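(For reference, a minimal sketch of the binary bulk load from that recipe; the table and the file paths are assumptions, and each file must contain the raw binary array of one column in the format described on the linked page:)

CREATE TABLE t (c1 INT, c2 INT);
-- one pre-built binary file per column, listed in table column order
COPY BINARY INTO t FROM ('/tmp/c1.bin', '/tmp/c2.bin');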
On Mon, Jun 22, 2020 at 8:02 PM Niels Nes wrote:
Min,
Why don't you just create one larger table?
Niels
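(A minimal sketch of that suggestion, using the same kind of data as the examples below; one wide table is filled row-wise in a single statement:)

CREATE TABLE t (rowNumber INT, val1 INT, val2 INT);
-- both value columns are populated together, so no later stitching is needed
INSERT INTO t
  SELECT ROW_NUMBER() OVER (), rand(), rand()
  FROM sys.generate_series(0, 10000);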
On 22 June 2020 14:59:44 CEST, Min Kyu Jung wrote:
To whom this may concern,
I am Min, and I am currently using MonetDB for a project. I was wondering whether MonetDB has any optimizations for joining two tables that have the same number of rows. For example, consider the following two tables:

CREATE TABLE t1(rowNumber INT, val INT);
WITH series AS (SELECT * FROM generate_series(0, 10000))
INSERT INTO t1 SELECT ROW_NUMBER() OVER (), rand() FROM series;

CREATE TABLE t2(rowNumber INT, val INT);
WITH series AS (SELECT * FROM generate_series(0, 10000))
INSERT INTO t2 SELECT ROW_NUMBER() OVER (), rand() FROM series;
In this situation, I currently join the tables as follows:

CREATE TABLE t3(rowNumber INT, val1 INT, val2 INT);
INSERT INTO t3
  SELECT ROW_NUMBER() OVER (), t1.val, t2.val
  FROM t1 INNER JOIN t2 ON t1.rowNumber = t2.rowNumber;
Given that MonetDB is a column store, I was wondering whether there is a simpler/faster/more optimized way of stitching the two tables together into one table.
To be clearer: right now I make sure every table has the rowNumber column. However, is there a way to join multiple tables (all with the same number of rows) without pre-generating a rowNumber column for each table? For example, something along these lines:

INSERT INTO t3
  SELECT t1.val, t2.val
  FROM t1 INNER JOIN t2 ON row_number(t1) = row_number(t2);
Sincerely, Min
--
Niels Nes, CTO, MonetDB Solutions
url: https://www.monetdbsolutions.com
e-mail: Niels.Nes@monetdbsolutions.com
Min, Miro,

-- assuming you have:
CREATE TABLE t1(rowNumber int, val1 int);
INSERT INTO t1 (rowNumber, val1)
  SELECT ROW_NUMBER() OVER (), rand() FROM sys.generate_series(0, 10000);

CREATE TABLE t2(rowNumber int, val2 int);
INSERT INTO t2 (rowNumber, val2)
  SELECT ROW_NUMBER() OVER (), rand() FROM sys.generate_series(0, 10000);

-- you should extend table t1 with the column val2 to get a t3 lookalike:
ALTER TABLE t1 ADD COLUMN val2 int;

-- next, update the values of the new column with those from table t2.val2:
UPDATE t1 SET val2 = (SELECT t2.val2 FROM t2 WHERE t2.rowNumber = t1.rowNumber);
SELECT * FROM t1;

-- if you cannot change t1 (or t2), create a view:
CREATE VIEW v3 AS
  SELECT t1.rowNumber, t1.val1, t2.val2
  FROM t1 INNER JOIN t2 ON t1.rowNumber = t2.rowNumber;

-- now you can simply use v3 in all your queries:
SELECT * FROM v3;

-- if you want to make changes to the data in the view, materialise it first into a separate table:
CREATE TABLE t3 AS SELECT * FROM v3;
SELECT * FROM t3;

Note: good database design requires that every table has a primary key, which is minimal, has a unique value (no NULLs) for each row, and preferably never changes (is immutable). It is essential for doing meaningful joins between tables. In this case it could be the generated rowNumber column:

ALTER TABLE t1 ADD CONSTRAINT t1_pk PRIMARY KEY (rowNumber);

So you shouldn't want to get rid of that column, unless you have another column which already is a primary key.

Martin

On 22-06-2020 18:59, Niels Nes wrote:
On Mon, Jun 22, 2020 at 08:11:26PM +0400, Miro Mannino wrote:
[...] We could work this way if there were an option for INSERT INTO to start again from the first row (filling the existing NULL cells rather than appending new rows). [...]

Use UPDATE, see above.