users-list - monetdb.org

Merge table error with one table
by Ioannis Foufoulas 17 Sep '20

Hi,

we encounter an error when only one table is added to a merge table definition. During a query like "select * from merge table" we get: "Table: missing ')'". Probably merge tables cannot contain just one table, but the returned error looks like a syntax error. Some more info to reproduce: the table that has been added to the merge table is a remote table.

Best,
Yannis

Recommended compiler version and flags for Linux x86_64
by Daniel Glöckner 15 Sep '20

Hi,

I'm currently building and testing MonetDB on CentOS 7, x86_64, and I was wondering about the recommended compiler version and compiler flags. So far I'm using gcc 9 and have explicitly enabled SSE3 extensions (though I did not notice any performance gain). Here's my cmake command:

cmake3 -DCMAKE_C_FLAGS="-pthread -msse3 -O -Wall" -DCMAKE_INSTALL_PREFIX=../install ../MonetDB

Is there a list of recommended compiler flags?

Kind regards,
Daniel

Nested tables?
by louie＠gmx.cn 14 Sep '20

Hi,

Does MonetDB support nested tables? Oracle diagram for reference: https://docs.oracle.com/cd/B10501_01/appdev.920/a96594/adobjdes.htm#441615

Thanks,
Louie

Default username and password: documented?
by Stefan Manegold 14 Sep '20

Hi,

I just realized that, while the default (system/admin) username and password for a newly created database in MonetDB are well known within the experienced MonetDB community, it is unclear where novice users would find this information. I cannot find any explicit (or at least implicit) documentation of it, but maybe I just fail to search properly?

Any help is highly appreciated!

Thanks!

Best,
Stefan

-- 
| Stefan.Manegold(a)CWI.nl | DB Architectures (DA) |
| www.CWI.nl/~manegold/ | Science Park 123 (L321) |
| +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |

Why MonetDB uses Materialization
by Amit Pandey 11 Sep '20

Hey all,

Thanks for the great open-source project. While going through some material from a CMU database course, I saw it mentioned that MonetDB uses full materialization. This seems a bit odd to me: in a column store, where a scan may potentially cover a billion values, full materialization may take a lot of time, and the classic Volcano model might be better. Can you please explain the rationale?

These are the slides: https://15445.courses.cs.cmu.edu/fall2019/slides/12-queryexecution1.pdfMone… is mentioned in Slide 11.

Thanks and regards
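
As a toy illustration of the two execution models contrasted in the question (not MonetDB code; the columns, the query and both function names are assumptions), the sketch below evaluates `SELECT sum(b) WHERE a > threshold` once with Volcano-style generators that hand one tuple at a time to the next operator, and once column-at-a-time, fully materializing each operator's result before the next one runs.

```python
from typing import Iterator, List, Tuple


def scan(col_a: List[int], col_b: List[int]) -> Iterator[Tuple[int, int]]:
    # Volcano-style leaf operator: produces one (a, b) tuple per next() call.
    for a, b in zip(col_a, col_b):
        yield a, b


def select(child: Iterator[Tuple[int, int]], threshold: int) -> Iterator[Tuple[int, int]]:
    # Pulls tuples from its child one at a time and passes on qualifying ones.
    for a, b in child:
        if a > threshold:
            yield a, b


def volcano_plan(col_a: List[int], col_b: List[int], threshold: int) -> int:
    # SELECT sum(b) WHERE a > threshold, evaluated tuple-at-a-time.
    return sum(b for _, b in select(scan(col_a, col_b), threshold))


def materialized_plan(col_a: List[int], col_b: List[int], threshold: int) -> int:
    # Operator 1: scan all of column a and materialize qualifying positions.
    positions = [i for i, a in enumerate(col_a) if a > threshold]
    # Operator 2: fetch b for those positions into another full intermediate.
    selected_b = [col_b[i] for i in positions]
    # Operator 3: aggregate the materialized intermediate.
    return sum(selected_b)


if __name__ == "__main__":
    a = [5, 1, 9, 3, 7]
    b = [10, 20, 30, 40, 50]
    print(volcano_plan(a, b, 4), materialized_plan(a, b, 4))
```

Both plans return the same answer; the sketch only makes the trade-off visible. The Volcano version goes through every operator once per tuple, while the materialized version runs tight loops over whole columns at the cost of holding complete intermediates (`positions`, `selected_b`) in memory.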

Bad query performance after lots of updates
by Daniel Glöckner 01 Sep '20

Hi,

select count(*) from table where field = 'some value';  --> a few ms
update table set field = 'value';                       --> updates 2 million out of 1 billion rows
select count(*) from table where field = 'some value';  --> almost a minute :(

Analyze did not help, and sys.vacuum could not be used because the table has indexes (foreign key). Restarting the DB helped ;)

What's the recommended way to keep MonetDB performing well after lots of updates?

Kind regards,
Daniel

compilers
by Roberto Cornacchia 01 Sep '20

Hi,

I was wondering whether anyone has run benchmarks with MonetDB to test the effect of compilation options, for example any combination of:
* gcc vs clang
* -O1 vs -O2 vs -O3
* -march and -mtune

I am actually planning to do this myself, but knowing about any previous evidence (even anecdotal) would be helpful.

Cheers,
Roberto
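
One way to organise such a comparison is to enumerate the flag combinations up front and generate one out-of-tree build per combination. The sketch below only prints `cmake` command lines using the standard `CMAKE_C_COMPILER` and `CMAKE_C_FLAGS` variables; the particular flag sets, the `-march=native` choice, the install-directory naming and the `../MonetDB` source path are assumptions, and it does not run or time anything.

```python
import itertools
import shlex
from typing import Dict, List

# Hypothetical benchmark dimensions; adjust to the combinations of interest.
COMPILERS = ["gcc", "clang"]
OPT_LEVELS = ["-O1", "-O2", "-O3"]
ARCH_FLAGS = ["", "-march=native -mtune=native"]


def build_matrix() -> List[Dict[str, str]]:
    """Return one build configuration per compiler/optimization/arch combination."""
    configs = []
    for cc, opt, arch in itertools.product(COMPILERS, OPT_LEVELS, ARCH_FLAGS):
        cflags = " ".join(flag for flag in (opt, arch) if flag)
        name = f"{cc}{opt}" + ("-native" if arch else "")
        configs.append({"name": name, "cc": cc, "cflags": cflags})
    return configs


def cmake_command(cfg: Dict[str, str], source_dir: str = "../MonetDB") -> str:
    """Build a shell-quoted cmake invocation for one configuration."""
    args = [
        "cmake",
        f"-DCMAKE_C_COMPILER={cfg['cc']}",
        f"-DCMAKE_C_FLAGS={cfg['cflags']}",
        f"-DCMAKE_INSTALL_PREFIX=../install-{cfg['name']}",
        source_dir,
    ]
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    for cfg in build_matrix():
        print(cmake_command(cfg))
```

Each printed command would be run from its own empty build directory, so the twelve builds do not overwrite each other; running the same workload against each install is left to the benchmark itself.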

custom joins in Jun2020
by Roberto Cornacchia 27 Aug '20

This is a question about join order: in general, about how it changed from the Nov2019 to the Jun2020 release, and in particular with respect to custom joins (filter functions).

With the schema:

CREATE TABLE t1(s string);
CREATE TABLE t2(s string);

consider the following 2 queries, which differ only in the order of their conditions:

Q1: SELECT t1.s, t2.s FROM t1, t2 WHERE t1.s <> t2.s AND [t1.s] maxlevenshtein [t2.s, 1];
Q2: SELECT t1.s, t2.s FROM t1, t2 WHERE [t1.s] maxlevenshtein [t2.s, 1] AND t1.s <> t2.s;

[t1.s] maxlevenshtein [t2.s, 1] is equivalent to levenshtein(t1.s, t2.s) <= 1 (i.e. the two strings have a Levenshtein distance of at most 1). This is a relatively expensive and selective function.

In Nov2019, both Q1 and Q2 are translated to:
- "maxlevenshtein" custom join
- "!=" selection on the result

In Jun2020, the two queries happen to be evaluated in the same order as they are written, which means that Q2 is evaluated as:
- "!=" join
- "maxlevenshtein" selection

This last evaluation plan is unfortunately not viable at all. The first join is not very selective, and the "maxlevenshtein" selection is run on far too many pairs, without the optimizations that can be exploited in a join (in a join, it is possible to skip the actual Levenshtein computation for most combinations). Q2 in Jun2020 is 2 orders of magnitude slower than Q1, which quickly leads to unreasonably long query times.

Of course, this is just one specific case, and a very unfortunate one, due to the combination of a couple of factors:
- the "!=" join is not selective enough
- the custom function is an expensive one

I guess my real questions are:
- Is it by chance (or a by-product of more generic join-ordering rules) that Nov2019 executes custom joins first in both Q1 and Q2, or was it an intentional choice to execute custom joins first?
- What would a reasonable approach be? Is it reasonable to assume that whoever writes a custom join expects it to use an expensive comparison function, and that the join implementation can be much more efficient than the selection implementation (by skipping unnecessary comparisons)? If no such assumptions can be made, could there be a way to annotate custom implementations with information on selectivity and cost?

Thanks for your input,
Roberto
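
The argument that a join can skip most Levenshtein computations while a per-pair selection cannot can be illustrated with a small Python model. This is not MonetDB's `maxlevenshtein` implementation: the length bucketing, the early exit on the DP row minimum and all function names are assumptions chosen for the sketch, and the two plan functions only mimic the Nov2019 and Jun2020 plans described above for Q2.

```python
from typing import Dict, List, Tuple


def within_levenshtein(a: str, b: str, k: int) -> bool:
    """True if the Levenshtein distance between a and b is at most k."""
    if abs(len(a) - len(b)) > k:
        return False  # the length difference alone already exceeds k
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cur[j] = min(prev[j] + 1,               # deletion
                         cur[j - 1] + 1,            # insertion
                         prev[j - 1] + (ca != cb))  # substitution
        if min(cur) > k:
            return False  # row minima never decrease, so the distance is > k
        prev = cur
    return prev[-1] <= k


def maxlevenshtein_join(left: List[str], right: List[str], k: int) -> List[Tuple[str, str]]:
    """Join that never compares strings whose lengths differ by more than k."""
    by_len: Dict[int, List[str]] = {}
    for t in right:
        by_len.setdefault(len(t), []).append(t)
    out = []
    for s in left:
        for n in range(len(s) - k, len(s) + k + 1):
            for t in by_len.get(n, []):
                if within_levenshtein(s, t, k):
                    out.append((s, t))
    return out


def plan_join_first(t1: List[str], t2: List[str], k: int = 1) -> List[Tuple[str, str]]:
    # "maxlevenshtein" join, then "!=" selection on its (small) result.
    return [(s, t) for s, t in maxlevenshtein_join(t1, t2, k) if s != t]


def plan_inequality_first(t1: List[str], t2: List[str], k: int = 1) -> List[Tuple[str, str]]:
    # "!=" join (close to a full cross product), then "maxlevenshtein"
    # evaluated as a selection on every surviving pair.
    pairs = [(s, t) for s in t1 for t in t2 if s != t]
    return [(s, t) for s, t in pairs if within_levenshtein(s, t, k)]
```

Both plans return the same pairs; what differs is how often `within_levenshtein` is called. In the join-first plan, strings in length buckets more than `k` apart are never compared at all, whereas the inequality-first plan calls it on nearly every pair of the cross product.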

Loading timestamps via Python loader functions
by Daniel Glöckner 26 Aug '20

Hi,

there seems to be a limitation for Python-based loader functions with respect to timestamps. Looking at the implementation in sql/backends/monet5/UDF/pyapi3/convert_loops.h suggests that timestamps are indeed not covered. Both of the following examples fail with "Failed conversion: MALException:pyapi3.eval:PY000!Unrecognized type. Could not convert to NPY_UNICODE.":

CREATE LOADER array_loader() LANGUAGE PYTHON {
    from datetime import datetime
    _emit.emit({
        'a': [1, 2, 3],
        'b': [datetime.utcnow(), datetime.utcnow(), datetime.utcnow()],
        'c': ['1', '2', '3']})
};

CREATE LOADER array_loader() LANGUAGE PYTHON {
    import numpy as np
    from datetime import datetime
    dt = datetime.utcnow()
    _emit.emit({
        'a': [1, 2, 3],
        'b': np.array([dt, dt, dt], dtype=np.datetime64),
        'c': ['1', '2', '3']})
};

Is anyone aware of a workaround? An obvious workaround is to insert the timestamps as strings and convert them via SQL, but that's much less efficient.

Kind regards,
Daniel
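
The string-based workaround mentioned above could look roughly like this: the loader formats each timestamp as text, and the conversion happens afterwards in SQL, for instance with `CAST(b AS TIMESTAMP)` when copying from a staging table whose column `b` is a VARCHAR. The staging-table step and the helper names are assumptions; the snippet only builds the dict that would be handed to `_emit.emit`, so it also runs outside MonetDB.

```python
from datetime import datetime
from typing import Dict, List


def timestamps_as_strings(values: List[datetime]) -> List[str]:
    """Format naive datetimes as 'YYYY-MM-DD HH:MM:SS.ffffff' strings."""
    return [v.isoformat(sep=" ", timespec="microseconds") for v in values]


def build_emit_payload(now: datetime) -> Dict[str, list]:
    # Same columns as the examples above, but 'b' holds strings instead of
    # datetime objects or numpy.datetime64 values.
    return {
        "a": [1, 2, 3],
        "b": timestamps_as_strings([now, now, now]),
        "c": ["1", "2", "3"],
    }


if __name__ == "__main__":
    payload = build_emit_payload(datetime(2020, 8, 26, 12, 0, 0))
    print(payload["b"][0])  # 2020-08-26 12:00:00.000000
    # Inside a MonetDB loader body (hypothetical usage), the dict would be
    # passed on as: _emit.emit(payload)
```

As the original message notes, this costs an extra string round trip per value, so it is a stopgap rather than a fix for the missing timestamp conversion.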

result properties in pcre.c
by Roberto Cornacchia 25 Aug '20

Hi,

I was looking at pcre.c for some inspiration and found something suspicious. At the end of both pcre_likeselect() and re_likeselect():

if (bn && !msg) {
    BATsetcount(bn, BATcount(bn)); /* set some properties */
    bn->tsorted = true;
    bn->trevsorted = bn->batCount <= 1;
    bn->tkey = true;
    bn->tseqbase = bn->batCount == 0 ? 0 : bn->batCount == 1 ? *(oid *) Tloc(bn, 0) : oid_nil;
}

I read this as: if everything went well, then the result is sorted and key. But I don't see why it should be sorted and key.

Roberto
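
One plausible reading of the quoted code, offered as an assumption rather than an answer: if the select visits its input (or candidate list) in ascending oid order and appends each qualifying oid at most once, its output is strictly increasing, which is exactly "sorted and key", and it is reverse-sorted only when it has at most one element. The Python sketch below checks this on toy data; `like_select` uses Python's `re` as a stand-in for PCRE, and all names are hypothetical. If some code path emitted oids out of order or twice, the properties set in pcre.c would be wrong.

```python
import re
from typing import List, Optional


def like_select(values: List[str], candidates: List[int], pattern: str) -> List[int]:
    """Return the oids of candidates whose value matches pattern.

    Stand-in for a likeselect: candidates are assumed to be sorted
    ascending and duplicate-free, and each is visited exactly once.
    """
    rx = re.compile(pattern)
    return [oid for oid in candidates if rx.search(values[oid])]


def observed_properties(oids: List[int]) -> dict:
    """Properties computed from the data, to compare with what pcre.c sets."""
    pairs = list(zip(oids, oids[1:]))
    return {
        "sorted": all(a <= b for a, b in pairs),     # cf. tsorted = true
        "revsorted": all(a >= b for a, b in pairs),  # cf. trevsorted = count <= 1
        "key": len(set(oids)) == len(oids),          # cf. tkey = true
    }


def quoted_seqbase(oids: List[int]) -> Optional[int]:
    # Same case analysis as the tseqbase line quoted above;
    # None stands in for oid_nil.
    if len(oids) == 0:
        return 0
    if len(oids) == 1:
        return oids[0]
    return None
```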
