[Monetdb-developers] Suggestion to ditch SHA-1, and embrace SHA-2
As already reported in the news, it is advised to migrate off SHA-1 by 2010. I would suggest SHA-2 for protocol 9. Stefan
On 08-05-2009 19:54:38 +0200, Stefan de Konink wrote:
As already reported in the news, it is advised to migrate off SHA-1 by 2010. I would suggest SHA-2 for protocol 9.
Which one of the SHA-2 family then? And is it really that important given that the store is still protected by a vaultkey?
Fabian Groffen wrote:
On 08-05-2009 19:54:38 +0200, Stefan de Konink wrote:
As already reported in the news, it is advised to migrate off SHA-1 by 2010. I would suggest SHA-2 for protocol 9.
Which one of the SHA-2 family then? And is it really that important given that the store is still protected by a vaultkey?
SHA-256. Is it important? For obvious reasons you would choose not to use SHA-0 or MD5 for protocol 9. Within that logic, SHA-1 is actively being broken; thus I would follow the advisory and not incorporate it in new software. Stefan
On 09-05-2009 11:43:14 +0200, Stefan de Konink wrote:
Fabian Groffen wrote:
On 08-05-2009 19:54:38 +0200, Stefan de Konink wrote:
As already reported in the news, it is advised to migrate off SHA-1 by 2010. I would suggest SHA-2 for protocol 9.
Which one of the SHA-2 family then? And is it really that important given that the store is still protected by a vaultkey?
SHA-256;
Is it important? For obvious reasons you would choose not to use SHA-0 or MD5 for protocol 9. Within that logic, SHA-1 is actively being broken; thus I would follow the advisory and not incorporate it in new software.
Which is ok with me, but it just makes it a bad initial choice from my side, since this change will not be "convertible" from one to another. Not a problem for releases, but it is a problem for people depending on trunk at the moment ;) Perhaps we should make use of this and allow the DBA to configure what password backend to use, and have proto 9 automagically see what hash to use. This requires a bit more work, but saves us from an un-convertible upgrade in the future, when SHA-2 is considered weak as well.
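As an aside, to illustrate what a selectable password backend boils down to: the sketch below hashes a password with a digest chosen by name at run time. This is only a minimal sketch against OpenSSL's EVP interface (1.1+ naming), not code from the MonetDB tree; the helper name hash_password and the hex-encoding convention are assumptions made for the example.

#include <openssl/evp.h>
#include <stdio.h>
#include <string.h>

/* Hash `pw` with the digest named `algo` ("SHA256", "SHA512", "SHA1", ...)
 * and write the lowercase hex digest into `out`, which must hold at least
 * 2 * EVP_MAX_MD_SIZE + 1 bytes.  Returns 0 on success, -1 on error. */
static int hash_password(const char *algo, const char *pw, char *out)
{
	const EVP_MD *md = EVP_get_digestbyname(algo);
	unsigned char digest[EVP_MAX_MD_SIZE];
	unsigned int len, i;
	EVP_MD_CTX *ctx;

	if (md == NULL)
		return -1;	/* unknown digest name */
	if ((ctx = EVP_MD_CTX_new()) == NULL)
		return -1;
	if (EVP_DigestInit_ex(ctx, md, NULL) != 1 ||
	    EVP_DigestUpdate(ctx, pw, strlen(pw)) != 1 ||
	    EVP_DigestFinal_ex(ctx, digest, &len) != 1) {
		EVP_MD_CTX_free(ctx);
		return -1;
	}
	EVP_MD_CTX_free(ctx);
	for (i = 0; i < len; i++)
		sprintf(out + 2 * i, "%02x", digest[i]);
	return 0;
}

With something along these lines the server could announce the digest name as part of the protocol 9 challenge and the client would simply look up the matching EVP_MD, so a later move away from SHA-256 would be a configuration change rather than another incompatible format.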
Fabian Groffen wrote:
On 09-05-2009 11:43:14 +0200, Stefan de Konink wrote:
Fabian Groffen wrote:
On 08-05-2009 19:54:38 +0200, Stefan de Konink wrote:
As already reported in the news, it is advised to migrate off SHA-1 by 2010. I would suggest SHA-2 for protocol 9. Which one of the SHA-2 family then? And is it really that important given that the store is still protected by a vaultkey? SHA-256.
Is it important? For obvious reasons you would choose not to use SHA-0 or MD5 for protocol 9. Within that logic, SHA-1 is actively being broken; thus I would follow the advisory and not incorporate it in new software.
Which is ok with me, but it just makes it a bad initial choice from my side, since this change will not be "convertible" from one to another. Not a problem for releases, but it is a problem for people depending on trunk at the moment ;) Perhaps we should make use of this and allow the DBA to configure what password backend to use, and have proto 9 automagically see what hash to use.
No, the options for the DBA should be kept to a minimum. Go for the solution that will work for the next few years. Trunk users at some point should migrate through dump/restore, which can be used for password refresh requirements as well.
Be careful with relying on dump/restore.

1. The dump is correct only if you do it as user monetdb, but not if you do it as a normal user (see bug #2782202).

2. The restore is almost unusable for large databases, as it is extremely slow. I suspect the main reasons are:
- COPY INTO .. FROM STDIN (used in the dump) is much, much slower than copying from a file, regardless of whether the number of records to copy is given or not.
- Column constraints, such as primary key, not null, etc., are issued upon table creation, before the COPY INTO, rather than being postponed with an ALTER TABLE after the copy (FOREIGN KEYS are instead postponed to the end of the script).

Roberto

-- | M.Sc. Roberto Cornacchia | CWI (Centrum voor Wiskunde en Informatica) | Science Park 123, 1098XG Amsterdam, The Netherlands | tel: +31 20 592 4322 , http://www.cwi.nl/~roberto
Roberto Cornacchia wrote:
2. The restore is almost unusable for large databases, as it is extremely slow. I suspect the main reasons are: - COPY INTO .. FROM STDIN (used in the dump) is much, much slower than copying from a file, regardless of whether the number of records to copy is given or not.
It seems that a configuration option added to msqldump to 'split' the output could fix that.
- Column constraints, such as primary key, not null, etc., are issued upon table creation, before the COPY INTO, rather than being postponed with an ALTER TABLE after the copy (FOREIGN KEYS are instead postponed to the end of the script).
I am looking at this now. I agree with you that this could help a lot. Will need to move some stuff around because of the -D option that (imho) should inline everything. Stefan
Stefan de Konink wrote:
I am looking at this now. I agree with you that this could help a lot. Will need to move some stuff around because of the -D option that (imho) should inline everything.
As attached. Ok to commit? If so could anyone help with approving tests? Stefan Index: dump.c =================================================================== RCS file: /cvsroot/monetdb/clients/src/mapiclient/dump.c,v retrieving revision 1.38 diff -u -r1.38 dump.c --- dump.c 26 Apr 2009 08:56:59 -0000 1.38 +++ dump.c 10 May 2009 12:50:26 -0000 @@ -105,6 +105,279 @@ return sname; } +static void dump_constraints(Mapi mid, char *schema, char *tname, stream *toConsole) { + + MapiHdl hdl; + int cnt; + char *query; + size_t maxquerylen = 0; + + if (tname == NULL) { + query = "SELECT \"c\".\"name\"," /* 0 */ + "\"c\".\"null\"," /* 1 */ + "\"c\".\"default\"," /* 2 */ + "\"c\".\"number\"," /* 3 */ + "\"s\".\"name\"," /* 4 */ + "\"t\".\"name\" " /* 5 */ + "FROM \"sys\".\"_columns\" \"c\", " + "\"sys\".\"_tables\" \"t\", " + "\"sys\".\"schemas\" \"s\" " + "WHERE \"c\".\"table_id\" = \"t\".\"id\" " + "AND \"t\".\"schema_id\" = \"s\".\"id\" " + "ORDER BY \"s\".\"name\", \"t\".\"name\", \"number\""; + if ((hdl = mapi_query(mid, query)) == NULL || mapi_error(mid)) { + if (hdl) { + mapi_explain_query(hdl, stderr); + mapi_close_handle(hdl); + } else + mapi_explain(mid, stderr); + goto bailout; + } + + while ((mapi_fetch_row(hdl)) != 0) { + char *c_name = mapi_fetch_field(hdl, 0); + char *c_null = mapi_fetch_field(hdl, 1); + char *c_default = mapi_fetch_field(hdl, 2); + char *s_name = mapi_fetch_field(hdl, 4); + char *t_name = mapi_fetch_field(hdl, 5); + + if (mapi_error(mid)) { + mapi_explain_query(hdl, stderr); + mapi_close_handle(hdl); + goto bailout; + } + + if (strcmp(c_null, "false") == 0) { + stream_printf(toConsole, "ALTER TABLE "); + quoted_print(toConsole, s_name); + stream_printf(toConsole, "."); + quoted_print(toConsole, t_name); + stream_printf(toConsole, " ALTER COLUMN "); + quoted_print(toConsole, c_name); + stream_printf(toConsole, " SET NOT NULL;\n"); + } + + if (c_default != NULL) { + stream_printf(toConsole, "ALTER TABLE "); + quoted_print(toConsole, s_name); + stream_printf(toConsole, "."); + quoted_print(toConsole, t_name); + stream_printf(toConsole, " ALTER COLUMN "); + quoted_print(toConsole, c_name); + stream_printf(toConsole, " SET DEFAULT %s;\n", c_default); + } + } + if (mapi_error(mid)) { + mapi_explain_query(hdl, stderr); + mapi_close_handle(hdl); + goto bailout; + } + mapi_close_handle(hdl); + } + + /* presumably we don't need to order on id, since there should + only be a single primary key, but it doesn't hurt, and the + code is then close to the code for the uniqueness + constraint */ + + if (tname != NULL) { + maxquerylen = 512 + strlen(tname) + strlen(schema); + query = malloc(maxquerylen); + snprintf(query, maxquerylen, + "SELECT \"kc\".\"column\", " /* 0 */ + "\"kc\".\"nr\", " /* 1 */ + "\"k\".\"name\", " /* 2 */ + "\"k\".\"id\", " /* 3 */ + "\"s\".\"name\", " /* 4 */ + "\"t\".\"name\" " /* 5 */ + "FROM \"sys\".\"keycolumns\" \"kc\", " + "\"sys\".\"keys\" \"k\", " + "\"sys\".\"schemas\" \"s\", " + "\"sys\".\"_tables\" \"t\" " + "WHERE \"kc\".\"id\" = \"k\".\"id\" AND " + "\"k\".\"table_id\" = \"t\".\"id\" AND " + "\"k\".\"type\" = 0 AND " + "\"t\".\"schema_id\" = \"s\".\"id\" AND " + "\"s\".\"name\" = '%s' AND " + "\"t\".\"name\" = '%s' " + "ORDER BY \"id\", \"nr\"", schema, tname); + } else { + query = "SELECT \"kc\".\"column\"," /* 0 */ + "\"kc\".\"nr\", " /* 1 */ + "\"k\".\"name\", " /* 2 */ + "\"k\".\"id\"," /* 3 */ + "\"s\".\"name\"," /* 4 */ + "\"t\".\"name\" " /* 5 */ + "FROM \"sys\".\"keycolumns\" \"kc\", " + "\"sys\".\"keys\" \"k\", " + "\"sys\".\"schemas\" 
\"s\", " + "\"sys\".\"_tables\" \"t\" " + "WHERE \"kc\".\"id\" = \"k\".\"id\" AND " + "\"k\".\"table_id\" = \"t\".\"id\" AND " + "\"k\".\"type\" = 0 AND " + "\"t\".\"schema_id\" = \"s\".\"id\" " + "ORDER BY \"s\".\"name\", \"t\".\"name\", \"id\", \"nr\""; + } + if ((hdl = mapi_query(mid, query)) == NULL || mapi_error(mid)) { + if (hdl) { + mapi_explain_query(hdl, stderr); + mapi_close_handle(hdl); + } else + mapi_explain(mid, stderr); + goto bailout; + } + cnt = 0; + while ((mapi_fetch_row(hdl)) != 0) { + char *c_column = mapi_fetch_field(hdl, 0); + char *k_name = mapi_fetch_field(hdl, 2); + char *s_name = mapi_fetch_field(hdl, 4); + char *t_name = mapi_fetch_field(hdl, 5); + + if (mapi_error(mid)) { + mapi_explain_query(hdl, stderr); + mapi_close_handle(hdl); + goto bailout; + } + if (cnt == 0) { + if (tname) { + stream_printf(toConsole, ",\n\t"); + } else { + if (s_name) { + if (cnt > 0) stream_printf(toConsole, ");\n"); + stream_printf(toConsole, "ALTER TABLE "); + quoted_print(toConsole, s_name); + stream_printf(toConsole, "."); + quoted_print(toConsole, t_name); + stream_printf(toConsole, " ADD "); + } + } + + if (k_name) { + stream_printf(toConsole, "CONSTRAINT "); + quoted_print(toConsole, k_name); + stream_write(toConsole, " ", 1, 1); + } + stream_printf(toConsole, "PRIMARY KEY ("); + } else + stream_printf(toConsole, ", "); + quoted_print(toConsole, c_column); + cnt++; + } + if (cnt) { + stream_printf(toConsole, ")"); + if (!tname) + stream_printf(toConsole, ";\n"); + } + + if (mapi_error(mid)) { + mapi_explain_query(hdl, stderr); + mapi_close_handle(hdl); + goto bailout; + } + mapi_close_handle(hdl); + + if (tname != NULL) { + snprintf(query, maxquerylen, + "SELECT \"kc\".\"column\"," /* 0 */ + "\"kc\".\"nr\"," /* 1 */ + "\"k\".\"name\"," /* 2 */ + "\"k\".\"id\"," /* 3 */ + "\"s\".\"name\"," /* 4 */ + "\"t\".\"name\" " /* 5 */ + "FROM \"sys\".\"keycolumns\" \"kc\", " + "\"sys\".\"keys\" \"k\", " + "\"sys\".\"schemas\" \"s\", " + "\"sys\".\"_tables\" \"t\" " + "WHERE \"kc\".\"id\" = \"k\".\"id\" AND " + "\"k\".\"table_id\" = \"t\".\"id\" AND " + "\"k\".\"type\" = 1 AND " + "\"t\".\"schema_id\" = \"s\".\"id\" AND " + "\"s\".\"name\" = '%s' AND " + "\"t\".\"name\" = '%s' " + "ORDER BY \"id\", \"nr\"", schema, tname); + } else { + query = "SELECT \"kc\".\"column\"," /* 0 */ + "\"kc\".\"nr\"," /* 1 */ + "\"k\".\"name\"," /* 2 */ + "\"k\".\"id\"," /* 3 */ + "\"s\".\"name\"," /* 4 */ + "\"t\".\"name\" " /* 5 */ + "FROM \"sys\".\"keycolumns\" \"kc\", " + "\"sys\".\"keys\" \"k\", " + "\"sys\".\"schemas\" \"s\", " + "\"sys\".\"_tables\" \"t\" " + "WHERE \"kc\".\"id\" = \"k\".\"id\" AND " + "\"k\".\"table_id\" = \"t\".\"id\" AND " + "\"k\".\"type\" = 1 AND " + "\"t\".\"schema_id\" = \"s\".\"id\" " + "ORDER BY \"s\".\"name\", \"t\".\"name\",\"id\", \"nr\""; + } + + if ((hdl = mapi_query(mid, query)) == NULL || mapi_error(mid)) { + if (hdl) { + mapi_explain_query(hdl, stderr); + mapi_close_handle(hdl); + } else + mapi_explain(mid, stderr); + goto bailout; + } + cnt = 0; + while ((mapi_fetch_row(hdl)) != 0) { + char *c_column = mapi_fetch_field(hdl, 0); + char *kc_nr = mapi_fetch_field(hdl, 1); + char *k_name = mapi_fetch_field(hdl, 2); + char *s_name = mapi_fetch_field(hdl, 4); + char *t_name = mapi_fetch_field(hdl, 5); + + if (mapi_error(mid)) { + mapi_explain_query(hdl, stderr); + mapi_close_handle(hdl); + goto bailout; + } + if (strcmp(kc_nr, "0") == 0) { + if (cnt) + stream_write(toConsole, ")", 1, 1); + + if (t_name) { + stream_printf(toConsole, ",\n\t"); + } else { + if (cnt > 0) 
stream_printf(toConsole, ");\n"); + stream_printf(toConsole, "ALTER TABLE "); + quoted_print(toConsole, s_name); + stream_printf(toConsole, "."); + quoted_print(toConsole, t_name); + stream_printf(toConsole, " ADD "); + } + + if (k_name) { + stream_printf(toConsole, "CONSTRAINT "); + quoted_print(toConsole, k_name); + stream_write(toConsole, " ", 1, 1); + } + stream_printf(toConsole, "UNIQUE ("); + cnt = 1; + } else + stream_printf(toConsole, ", "); + quoted_print(toConsole, c_column); + } + if (cnt) { + stream_write(toConsole, ")", 1, 1); + if (!tname) + stream_printf(toConsole, ";\n"); + } + + if (mapi_error(mid)) { + mapi_explain_query(hdl, stderr); + mapi_close_handle(hdl); + goto bailout; + } + mapi_close_handle(hdl); + +bailout: + if (query != NULL && maxquerylen != 0) + free(query); +} + static void dump_foreign_keys(Mapi mid, char *schema, char *tname, stream *toConsole) { @@ -117,14 +390,14 @@ maxquerylen = 1024 + strlen(tname) + strlen(schema); query = malloc(maxquerylen); snprintf(query, maxquerylen, - "SELECT \"ps\".\"name\"," /* 0 */ - "\"pkt\".\"name\"," /* 1 */ - "\"pkkc\".\"column\"," /* 2 */ - "\"fkkc\".\"column\"," /* 3 */ - "\"fkkc\".\"nr\"," /* 4 */ - "\"fkk\".\"name\"," /* 5 */ - "\"fkk\".\"action\"," /* 6 */ - "\"fs\".\"name\"," /* 7 */ + "SELECT \"ps\".\"name\", " /* 0 */ + "\"pkt\".\"name\", " /* 1 */ + "\"pkkc\".\"column\", " /* 2 */ + "\"fkkc\".\"column\", " /* 3 */ + "\"fkkc\".\"nr\", " /* 4 */ + "\"fkk\".\"name\", " /* 5 */ + "\"fkk\".\"action\", " /* 6 */ + "\"fs\".\"name\", " /* 7 */ "\"fkt\".\"name\" " /* 8 */ "FROM \"sys\".\"_tables\" \"fkt\"," "\"sys\".\"keycolumns\" \"fkkc\"," @@ -478,71 +751,14 @@ } else { stream_printf(toConsole, "%s(%s,%s)", c_type, c_type_digits, c_type_scale); } - if (strcmp(c_null, "false") == 0) - stream_printf(toConsole, " NOT NULL"); - if (c_default != NULL) - stream_printf(toConsole, " DEFAULT %s", c_default); - cnt++; - } - if (mapi_error(mid)) { - mapi_explain_query(hdl, stderr); - mapi_close_handle(hdl); - goto bailout; - } - mapi_close_handle(hdl); - /* presumably we don't need to order on id, since there should - only be a single primary key, but it doesn't hurt, and the - code is then close to the code for the uniqueness - constraint */ - snprintf(query, maxquerylen, - "SELECT \"kc\".\"column\"," /* 0 */ - "\"kc\".\"nr\", " /* 1 */ - "\"k\".\"name\", " /* 2 */ - "\"k\".\"id\" " /* 3 */ - "FROM \"sys\".\"keycolumns\" \"kc\", " - "\"sys\".\"keys\" \"k\", " - "\"sys\".\"schemas\" \"s\", " - "\"sys\".\"_tables\" \"t\" " - "WHERE \"kc\".\"id\" = \"k\".\"id\" AND " - "\"k\".\"table_id\" = \"t\".\"id\" AND " - "\"k\".\"type\" = 0 AND " - "\"t\".\"schema_id\" = \"s\".\"id\" AND " - "\"s\".\"name\" = '%s' AND " - "\"t\".\"name\" = '%s' " - "ORDER BY \"id\", \"nr\"", schema, tname); - if ((hdl = mapi_query(mid, query)) == NULL || mapi_error(mid)) { - if (hdl) { - mapi_explain_query(hdl, stderr); - mapi_close_handle(hdl); - } else - mapi_explain(mid, stderr); - goto bailout; - } - cnt = 0; - while ((mapi_fetch_row(hdl)) != 0) { - char *c_column = mapi_fetch_field(hdl, 0); - char *k_name = mapi_fetch_field(hdl, 2); - - if (mapi_error(mid)) { - mapi_explain_query(hdl, stderr); - mapi_close_handle(hdl); - goto bailout; + if (describe) { + if (strcmp(c_null, "false") == 0) + stream_printf(toConsole, " NOT NULL"); + if (c_default != NULL) + stream_printf(toConsole, " DEFAULT %s", c_default); } - if (cnt == 0) { - stream_printf(toConsole, ",\n\t"); - if (k_name) { - stream_printf(toConsole, "CONSTRAINT "); - quoted_print(toConsole, k_name); 
- stream_write(toConsole, " ", 1, 1); - } - stream_printf(toConsole, "PRIMARY KEY ("); - } else - stream_printf(toConsole, ", "); - quoted_print(toConsole, c_column); cnt++; } - if (cnt) - stream_printf(toConsole, ")"); if (mapi_error(mid)) { mapi_explain_query(hdl, stderr); mapi_close_handle(hdl); @@ -550,64 +766,8 @@ } mapi_close_handle(hdl); - snprintf(query, maxquerylen, - "SELECT \"kc\".\"column\"," /* 0 */ - "\"kc\".\"nr\", " /* 1 */ - "\"k\".\"name\", " /* 2 */ - "\"k\".\"id\" " /* 3 */ - "FROM \"sys\".\"keycolumns\" \"kc\", " - "\"sys\".\"keys\" \"k\", " - "\"sys\".\"schemas\" \"s\", " - "\"sys\".\"_tables\" \"t\" " - "WHERE \"kc\".\"id\" = \"k\".\"id\" AND " - "\"k\".\"table_id\" = \"t\".\"id\" AND " - "\"k\".\"type\" = 1 AND " - "\"t\".\"schema_id\" = \"s\".\"id\" AND " - "\"s\".\"name\" = '%s' AND " - "\"t\".\"name\" = '%s' " - "ORDER BY \"id\", \"nr\"", schema, tname); - if ((hdl = mapi_query(mid, query)) == NULL || mapi_error(mid)) { - if (hdl) { - mapi_explain_query(hdl, stderr); - mapi_close_handle(hdl); - } else - mapi_explain(mid, stderr); - goto bailout; - } - cnt = 0; - while ((mapi_fetch_row(hdl)) != 0) { - char *c_column = mapi_fetch_field(hdl, 0); - char *kc_nr = mapi_fetch_field(hdl, 1); - char *k_name = mapi_fetch_field(hdl, 2); - - if (mapi_error(mid)) { - mapi_explain_query(hdl, stderr); - mapi_close_handle(hdl); - goto bailout; - } - if (strcmp(kc_nr, "0") == 0) { - if (cnt) - stream_write(toConsole, ")", 1, 1); - stream_printf(toConsole, ",\n\t"); - if (k_name) { - stream_printf(toConsole, "CONSTRAINT "); - quoted_print(toConsole, k_name); - stream_write(toConsole, " ", 1, 1); - } - stream_printf(toConsole, "UNIQUE ("); - cnt = 1; - } else - stream_printf(toConsole, ", "); - quoted_print(toConsole, c_column); - } - if (cnt) - stream_write(toConsole, ")", 1, 1); - if (mapi_error(mid)) { - mapi_explain_query(hdl, stderr); - mapi_close_handle(hdl); - goto bailout; - } - mapi_close_handle(hdl); + if (describe) + dump_constraints(mid, schema, tname, toConsole); if (foreign) dump_foreign_keys(mid, schema, tname, toConsole); @@ -1158,6 +1318,7 @@ mapi_close_handle(hdl); if (!describe) { + dump_constraints(mid, NULL, NULL, toConsole); dump_foreign_keys(mid, NULL, NULL, toConsole); /* dump sequences, part 2 */
On 10-05-2009 14:52:14 +0200, Stefan de Konink wrote:
Stefan de Konink wrote:
I am looking at this now. I agree with you that this could help a lot. Will need to move some stuff around because of the -D option that (imho) should inline everything.
As attached.
Ok to commit? If so could anyone help with approving tests?
Discuss on Monday during MADAM?
Fabian Groffen wrote:
On 10-05-2009 14:52:14 +0200, Stefan de Konink wrote:
Stefan de Konink wrote:
I am looking at this now. I agree with you that this could help a lot. Will need to move some stuff around because of the -D option that (imho) should inline everything. As attached.
Ok to commit? If so could anyone help with approving tests?
Discuss on Monday during MADAM?
As you wish :) Stefan
On 10-05-2009 15:09:20 +0200, Stefan de Konink wrote:
Fabian Groffen wrote:
On 10-05-2009 14:52:14 +0200, Stefan de Konink wrote:
Stefan de Konink wrote:
I am looking at this now. I agree with you that this could help a lot. Will need to move some stuff around because of the -D option that (imho) should inline everything. As attached.
Ok to commit? If so could anyone help with approving tests?
Discuss on Monday during MADAM?
As you wish :)
s/wish/suggest/ ;)
Fabian Groffen wrote:
On 10-05-2009 15:09:20 +0200, Stefan de Konink wrote:
Fabian Groffen wrote:
On 10-05-2009 14:52:14 +0200, Stefan de Konink wrote:
Stefan de Konink wrote:
I am looking at this now. I agree with you that this could help a lot. Will need to move some stuff around because of the -D option that (imho) should inline everything. As attached.
Ok to commit? If so could anyone help with approving tests? Discuss on Monday during MADAM? As you wish :)
s/wish/suggest/
;)
Your suggestion is granted ;) There are still two lines that can be purged from the patch. Bit of a 'doh' moment; I hope gcc was smart enough to not even assemble them. Stefan
On Sun, May 10, 2009 at 02:52:14PM +0200, Stefan de Konink wrote:
Stefan de Konink wrote:
I am looking at this now. I agree with you that this could help a lot. Will need to move some stuff around because of the -D option that (imho) should inline everything.
As attached.
Ok to commit?
As Fabian suggested, please explain during MADAM what you did, why, and how, and then we'll discuss.
If so could anyone help with approving tests?
First, see the Mtest & Mapprove documentation in .../testing/README and/or http://monetdb.cwi.nl/projects/monetdb/Development/TestWeb/Mtest/index.html
More during/after MADAM. Stefan
Stefan
-- | Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl | | CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ | | 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 | | The Netherlands | Fax : +31 (20) 592-4312 |
Roberto Cornacchia wrote:
2. The restore is almost unusable for large databases, as it is extremely slow. I suspect the main reasons are: - COPY INTO .. FROM STDIN (used in the dump) is much, much slower than copying from a file, regardless of whether the number of records to copy is given or not.
This is my conclusion as well.
- Column constraints, such as primary key, not null, etc., are issued upon table creation, before the COPY INTO, rather than being postponed with an ALTER TABLE after the copy (FOREIGN KEYS are instead postponed to the end of the script).
I have interesting results; it seems to be actually ~5% faster to *insert* with a primary key, as opposed to postponing it (!). This even excludes the time to enforce the primary key afterwards. I have verified in different runs that it is not related to the 'NOT NULL' constraints. So this is a pretty surprising result. The data was highly structural and had a multicolumn primary key. Stefan
On Sun, May 17, 2009 at 08:26:07PM +0200, Stefan de Konink wrote:
Roberto Cornacchia wrote:
2. The restore is almost unusable for large databases, as it is extremely slow. I suspect the main reasons are: - COPY INTO .. FROM STDIN (used in the dump) is much, much slower than copying from a file, regardless of whether the number of records to copy is given or not.
This is my conclusion as well.
In case you came to your conclusion from an experimental/empirical study, I would be very interested to know about the experiments performed and the results measured. In that case, I would be very thankful if you could share the informative details with us, i.e. detailed descriptions of:
- experimental setup (HW, SW, datasets, workload/queries)
- measured results (what was measured how, and what are the results?)
- (your) discussion/analysis/interpretation of the results
- Column constraints, such as primary key, not null, etc., are issued upon table creation, before the COPY INTO, rather than being postponed with an ALTER TABLE after the copy (FOREIGN KEYS are instead postponed to the end of the script).
I have interesting results; it seems to be actually ~5% faster to *insert* with a primary key, as opposed to postponing it (!). This even excludes the time to enforce the primary key afterwards.
I have verified in different runs that it is not related to the 'NOT NULL' constraints. So this is a pretty surprising result.
Same as above: If you consider your results surprising, I'd be curious to learn about them and would hence be thankful if you could share them with us (see above).
The data was highly structural and had a multicolumn primary key.
Could you please reveal what you refer to as "highly structural", and how many columns of which type the primary key contained? Could you please elaborate on whether you consider your setup/schema "representative" and/or whether/to which extent you think that your results for one specific case could be generalized to other cases? You might want to study (e.g.) any or all of
http://old-www.cwi.nl/htbin/ins1/publications?request=abstract&key=MaMa:EDBT:09
http://old-www.cwi.nl/htbin/ins1/publications?request=abstract&key=Ma:IS:08
http://old-www.cwi.nl/htbin/ins1/publications?request=abstract&key=SiGoKeNeMa:VLDB:08
on how to design, perform, and in particular report on repeatable experimental studies. Please don't hesitate to ask for specific advice/help in case of doubt. Looking forward to learning about your detailed experiences! Stefan
Stefan
-- | Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl | | CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ | | 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 | | The Netherlands | Fax : +31 (20) 592-4312 |
On Mon, 18 May 2009, Stefan Manegold wrote:
In case you came to your conclusion from an experimental/empirical study, I would be very interested to know about the experiments performed and the results measured. In that case, I would be very thankful if you could share the informative details with us, i.e. detailed descriptions of:
- experimental setup (HW, SW, datasets, workload/queries)
- measured results (what was measured how, and what are the results?)
- (your) discussion/analysis/interpretation of the results
I thought the MADAM meeting was for this.
Could you please elaborate on whether you consider your setup/schema "representative" and/or whether/to which extent you think that your results for one specific case could be generalized to other cases?
Technically the only thing I have to prove to get my code reverted is that there are cases where late constraints hurt performance. Stefan
On Mon, May 18, 2009 at 11:57:57AM +0200, Stefan de Konink wrote:
On Mon, 18 May 2009, Stefan Manegold wrote:
In case you came to your conclusion from an experimental/empirical study, I would be very interested to know about the experiments performed and the results measured. In that case, I would be very thankful if you could share the informative details with us, i.e. detailed descriptions of:
- experimental setup (HW, SW, datasets, workload/queries)
- measured results (what was measured how, and what are the results?)
- (your) discussion/analysis/interpretation of the results
I thought the MADAM meeting was for this.
Ok, fine. ... we should then revive the idea of keeping notes of MADAMs (e.g., on a Wiki or the like) as a reference for those who cannot be present ...
Could you please elaborate on whether you consider your setup/schema "representative" and/or whether/to which extent you think that your results for one specific case could be generalized to other cases?
Technically the only thing I have to prove to get my code reverted is that there are cases where late constraints hurt performance.
IMHO, our task should be the following:
- analyze whether there is indeed a significant performance difference between early and late constraint checking (5% does not justify any further time to be spent on this);
- and if so, analyze under which circumstances which alternative performs (significantly) better or worse than the other;
- and (only) then try to find out where the (significant) performance difference comes from;
- and (only) then
  + either try to eliminate the difference;
  + or
    * make dump/restore use the most suitable alternative (as far as possible with reasonable effort);
    * and document our experiences to give advice for users
Stefan
Stefan
-- | Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl | | CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ | | 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 | | The Netherlands | Fax : +31 (20) 592-4312 |
I come back to this issue with some more interesting results.

I did mention in the past that 'COPY INTO ... FROM stdin ..' seems to be a lot slower than 'COPY INTO ... FROM <filename> ...'. Actually, I have just found something that may shift the problem away from the stdin itself.

Look at the following two methods of restoring a previously dumped (rather small) database.

RESTORE METHOD 1 (far too slow):

$ mclient < backup.sql
[ 62 ] [ 70199 ] [ 288 ] [ 70322 ] [ 69 ] [ 423 ] [ 280596 ] [ 4302 ] [ 49038 ] [ 1 ] [ 10 ] [ 6943 ] [ 6943 ] [ 3424 ] [ 3424 ] [ 3424 ] [ 1 ] [ 6943 ] [ 3424 ]
TIME elapsed: 68 seconds

RESTORE METHOD 2:

$ echo "\< backup.sql" | mclient
[ 62 ] [ 70199 ] [ 288 ] [ 70322 ] [ 69 ] [ 423 ] [ 280596 ] [ 4302 ] [ 49038 ] [ 1 ] [ 10 ] [ 6943 ] [ 6943 ] [ 3424 ] [ 3424 ] [ 3424 ] [ 1 ] [ 6943 ] [ 3424 ]
TIME elapsed: 1 seconds

Can anyone explain this difference?

One further consideration: Method 2 works fine, but cannot be used to load gzipped dumps. Maybe this can be a feature request: '\<' should accept gzipped input.

Roberto

On Mon, 2009-05-18 at 13:03 +0200, Stefan Manegold wrote:
On Mon, May 18, 2009 at 11:57:57AM +0200, Stefan de Konink wrote:
On Mon, 18 May 2009, Stefan Manegold wrote:
In case you came to your conclusion from an experimental/empirical study, I would be very interested to know about the experiments performed and the results measured. In that case, I would be very thankful if you could share the informative details with us, i.e. detailed descriptions of:
- experimental setup (HW, SW, datasets, workload/queries)
- measured results (what was measured how, and what are the results?)
- (your) discussion/analysis/interpretation of the results
I thought the MADAM meeting was for this.
Ok, fine.
... we should then revive the idea of keeping notes of MADAMs (e.g., on a Wiki or the like) as a reference for those who cannot be present ...
Could you please elaborate on whether you consider your setup/schema "representative" and/or whether/to which extent you think that your results for one specific case could be generalized to other cases?
Technically the only thing I have to prove to get my code reverted is that there are cases where late constraints hurt performance.
IMHO, our task should be the following:
- analyze whether there is indeed a significant performance difference between early and late constraint checking (5% does not justify any further time to be spent on this);
- and if so, analyze under which circumstances which alternative performs (significantly) better or worse than the other;
- and (only) then try to find out where the (significant) performance difference comes from;
- and (only) then
  + either try to eliminate the difference;
  + or
    * make dump/restore use the most suitable alternative (as far as possible with reasonable effort);
    * and document our experiences to give advice for users
Stefan
Stefan
-- | M.Sc. Roberto Cornacchia | CWI (Centrum voor Wiskunde en Informatica) | Science Park 123, 1098XG Amsterdam, The Netherlands | tel: +31 20 592 4322 , http://www.cwi.nl/~roberto
Roberto Cornacchia wrote:
I come back to this issue with some more interesting results.
I did mention in the past that 'COPY INTO ... FROM stdin ..' seems to be a lot slower than 'COPY INTO ... FROM <filename> ...'
Actually, I have just found something that may shift the problem away from the stdin itself.
Look at the following two methods of restoring a previously dumped (rather small) database.
RESTORE METHOD 1 (far too slow):
$ mclient < backup.sql
[ 62 ] [ 70199 ] [ 288 ] [ 70322 ] [ 69 ] [ 423 ] [ 280596 ] [ 4302 ] [ 49038 ] [ 1 ] [ 10 ] [ 6943 ] [ 6943 ] [ 3424 ] [ 3424 ] [ 3424 ] [ 1 ] [ 6943 ] [ 3424 ]
TIME elapsed: 68 seconds
RESTORE METHOD 2:
$ echo "\< backup.sql" | mclient
[ 62 ] [ 70199 ] [ 288 ] [ 70322 ] [ 69 ] [ 423 ] [ 280596 ] [ 4302 ] [ 49038 ] [ 1 ] [ 10 ] [ 6943 ] [ 6943 ] [ 3424 ] [ 3424 ] [ 3424 ] [ 1 ] [ 6943 ] [ 3424 ]
TIME elapsed: 1 seconds
Can anyone explain this difference?
When mclient reads from standard input, it reads a line at a time. When it reads from file (either using the \< construct or as a command line argument (i.e. without <)) it reads in large chunks.
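For illustration only (this is not mclient code; buffer sizes and helper names are made up), the difference is essentially between the two read loops sketched below: the line-at-a-time loop pays its fixed per-call overhead, plus whatever per-line handling follows it, once per input line, while the block loop pays it once per chunk.

#include <stdio.h>

/* Line-at-a-time: one fgets() call, plus the caller's per-line handling,
 * for every line.  A dump with hundreds of thousands of COPY INTO data
 * lines pays that overhead hundreds of thousands of times. */
static size_t read_by_line(FILE *fp)
{
	char line[8192];
	size_t lines = 0;

	while (fgets(line, sizeof(line), fp) != NULL)
		lines++;
	return lines;
}

/* Block-at-a-time: read a large chunk and scan it in memory, so the
 * fixed overhead is paid once per 64 KiB instead of once per line. */
static size_t read_by_block(FILE *fp)
{
	char buf[1 << 16];	/* 64 KiB chunks */
	size_t n, i, lines = 0;

	while ((n = fread(buf, 1, sizeof(buf), fp)) > 0)
		for (i = 0; i < n; i++)
			if (buf[i] == '\n')
				lines++;
	return lines;
}

Both loops merely count lines here to keep the sketch self-contained; in the real client the per-line path additionally parses and dispatches each line, which is presumably where most of the 68-versus-1-second gap comes from.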
One further consideration: Method 2 works fine, but cannot be used to load gzipped dumps. Maybe this can be a feature request: '\<' should accept gzipped input.
Patches are gratefully accepted. ;-)
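In case someone wants to pick that up: a plausible starting point is zlib's gz* interface, which reads gzip-compressed and plain files alike (gzread() falls back to plain reads for uncompressed input). The sketch below is only an assumption about how it could be wired in; read_dump_file and the consume callback are invented for the example and are not part of mclient.

#include <stdio.h>
#include <zlib.h>

/* Read `path` (plain or gzipped) in large chunks and hand the data to a
 * consumer, e.g. whatever currently processes the \< input. */
static int read_dump_file(const char *path, void (*consume)(const char *buf, int len))
{
	char buf[1 << 16];	/* 64 KiB chunks */
	gzFile f = gzopen(path, "rb");
	int n;

	if (f == NULL) {
		perror(path);
		return -1;
	}
	while ((n = gzread(f, buf, sizeof(buf))) > 0)
		consume(buf, n);
	gzclose(f);
	return n < 0 ? -1 : 0;
}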
Roberto
On Mon, 2009-05-18 at 13:03 +0200, Stefan Manegold wrote:
On Mon, May 18, 2009 at 11:57:57AM +0200, Stefan de Konink wrote:
On Mon, 18 May 2009, Stefan Manegold wrote:
In case you came to your conclusion from an experimental/empirical study, I would be very interested to know about the experiments performed and the results measured. In that case, I would be very thankful if you could share the informative details with us, i.e. detailed descriptions of:
- experimental setup (HW, SW, datasets, workload/queries)
- measured results (what was measured how, and what are the results?)
- (your) discussion/analysis/interpretation of the results
I thought the MADAM meeting was for this. Ok, fine.
... we should then revive the idea of keeping notes of MADAMs (e.g., on a Wiki or the like) as a reference for those who cannot be present ...
Could you please elaborate on whether you consider your setup/schema "representative" and/or whether/to which extent you think that your results for one specific case could be generalized to other cases? Technically the only thing I have to prove to get my code reverted is that there are cases where late constraints hurt performance. IMHO, our task should be the following:
- analyze whether there is indeed a significant performance difference between early and late constraint checking (5% does not justify any further time to be spent on this);
- and if so, analyze under which circumstances which alternative performs (significantly) better or worse than the other;
- and (only) then try to find out where the (significant) performance difference comes from;
- and (only) then
  + either try to eliminate the difference;
  + or
    * make dump/restore use the most suitable alternative (as far as possible with reasonable effort);
    * and document our experiences to give advice for users
Stefan
Stefan
-- Sjoerd Mullender
On 08-05-2009 19:54:38 +0200, Stefan de Konink wrote:
As already reported in the news, it is advised to migrate off SHA-1 by 2010. I would suggest SHA-2 for protocol 9.
I implemented your suggestion. Trunk users that cannot (or simply will not) dump and restore their database can configure MonetDB5 with --with-password-backend=SHA1 to retain compatibility with the current format.
participants (6)
- Fabian Groffen
- Martin Kersten
- Roberto Cornacchia
- Sjoerd Mullender
- Stefan de Konink
- Stefan Manegold