MonetDB performance disappointing

older
Concurrency test causes crash of...

Martin Schwitalla

25 Jun 2015 25 Jun '15

3:17 p.m.

Hello, I`m doing some experiments with a database, which originally runs on PostgreSQL. But some queries take very long on this system. Especially one query type, which takes about 14s to finish. The idea was to port it to MonetDB, because the database is not that big (~8GB) and it`s an analytical task. But suprisingly MonetDB takes even longer to finish. With the default pipeline it takes about 22-23s. I tried various pipelines and the minimal_pipe was the best. It finished the task in about 16,5s, better, but still not great. I`m asking myself what I could do to further improve the performance. I added some primary and foreign key constraints and set the database to readonly, but it didn`t do much. The trace tells me that some joins take the most time and some subselects also take much time. But nothing besides that. The query has 10 joins. Do you have any advices for me? Best regards, Martin

Show replies by date

Stefan Manegold

25 Jun 25 Jun

3:20 p.m.

Dear Martin, to be able to provide qualified and useful help, we'd need some more information about your data (type & volume), query, hardware & OS environment, MonetDB version, etc. It would be great, if you could share this information with us. Best, Stefan ----- On Jun 25, 2015, at 5:17 PM, Martin Schwitalla MSchwitalla1@gmx.de wrote:

...

Hello,

I`m doing some experiments with a database, which originally runs on PostgreSQL. But some queries take very long on this system. Especially one query type, which takes about 14s to finish. The idea was to port it to MonetDB, because the database is not that big (~8GB) and it`s an analytical task. But suprisingly MonetDB takes even longer to finish. With the default pipeline it takes about 22-23s. I tried various pipelines and the minimal_pipe was the best. It finished the task in about 16,5s, better, but still not great. I`m asking myself what I could do to further improve the performance.

I added some primary and foreign key constraints and set the database to readonly, but it didn`t do much. The trace tells me that some joins take the most time and some subselects also take much time. But nothing besides that. The query has 10 joins.

Do you have any advices for me?

Best regards, Martin _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

Martin Schwitalla

3:38 p.m.

Hi, ok, i will try :) MonetDB version: 11.19.9 (Oct2014-SP2) OS: Ubuntu 14.04.2 LTS Hardware: 16GB RAM, Intel Core something. Sorry, but I`m currently unable to get this information. SSD data: Gene information with some varchars query: SELECT anon_1.sample_id AS anon_1_sample_id, anon_1.variant_id AS anon_1_variant_id, anon_1.qual AS anon_1_qual, anon_1.is_heterozygous AS anon_1_is_heterozygous, anon_1.read_depth AS anon_1_read_depth, anon_1.ref_depth AS anon_1_ref_depth, anon_1.alt_depth AS anon_1_alt_depth, anon_1.strand_bias AS anon_1_strand_bias, anon_1.qual_by_depth AS anon_1_qual_by_depth, anon_1.mapping_qual AS anon_1_mapping_qual, anon_1.haplotype_score AS anon_1_haplotype_score, anon_1.mapping_qual_bias AS anon_1_mapping_qual_bias, anon_1.read_pos_bias AS anon_1_read_pos_bias, annotations.variant_id AS annotations_variant_id, annotations.feature_id AS annotations_feature_id, annotations.ref_codon AS annotations_ref_codon, annotations.alt_codon AS annotations_alt_codon, annotations.ref_acid AS annotations_ref_acid, annotations.alt_acid AS annotations_alt_acid, annotations.type AS annotations_type, annotations.region AS annotations_region, annotations.splice_dist AS annotations_splice_dist FROM (SELECT calls.sample_id AS sample_id, calls.variant_id AS variant_id, calls.qual AS qual, calls.is_heterozygous AS is_heterozygous, calls.read_depth AS read_depth, calls.ref_depth AS ref_depth, calls.alt_depth AS alt_depth, calls.strand_bias AS strand_bias, calls.qual_by_depth AS qual_by_depth, calls.mapping_qual AS mapping_qual, calls.haplotype_score AS haplotype_score, calls.mapping_qual_bias AS mapping_qual_bias, calls.read_pos_bias AS read_pos_bias FROM calls JOIN samples ON samples.id = calls.sample_id JOIN patients ON patients.id = samples.patient_id JOIN diseases ON diseases.id = samples.disease_id LEFT OUTER JOIN (SELECT calls.sample_id AS sample_id, calls.variant_id AS variant_id, calls.qual AS qual, calls.is_heterozygous AS is_heterozygous, calls.read_depth AS read_depth, calls.ref_depth AS ref_depth, calls.alt_depth AS alt_depth, calls.strand_bias AS strand_bias, calls.qual_by_depth AS qual_by_depth, calls.mapping_qual AS mapping_qual, calls.haplotype_score AS haplotype_score, calls.mapping_qual_bias AS mapping_qual_bias, calls.read_pos_bias AS read_pos_bias FROM calls WHERE calls.qual > 0 AND calls.sample_id IN (227, 230, 233, 234, 237, 190, 195, 198, 199, 203, 270, 273, 276, 189, 343, 366, 367, 368)) AS anon_2 ON anon_2.variant_id = calls.variant_id LEFT OUTER JOIN known ON known.variant_id = calls.variant_id AND known.source_id IN (1, 2, 20, 19, 46) WHERE samples.accession IN ('17041R5', '20195R', '21984R', '23273R', '23390R', '13264R', '18337R', '18533R', '19811R', '20039R', '21776R', '21809R', '22927R', '17294R', '17071R', '21016R', 'Greif1R', '18337T', '18533T', '19811T', '20039T', '21809T', '22927T', '20195T', '21984T', '23273T', '23390T', '17041T', '17294T', '17071T', '21016T', '13264T', '21776T', 'Greif1T1S') AND calls.qual >= 50 AND anon_2.variant_id IS NULL AND (known.variant_id IS NULL OR known.clinical AND (NOT known.clinical_significance = 2) OR known.precious OR known.locus_specific_db)) AS anon_1 JOIN variants ON variants.id = anon_1.variant_id JOIN annotations ON variants.id = annotations.variant_id JOIN features ON features.id = annotations.feature_id JOIN transcripts ON transcripts.id = features.transcript_id JOIN genes ON genes.id = transcripts.gene_id WHERE (variants.is_transition IS NULL OR variants.is_transversion IS NULL OR (variants.is_transition OR variants.is_transversion) AND (anon_1.strand_bias IS NULL OR anon_1.strand_bias <= 60.0) AND (anon_1.qual_by_depth IS NULL OR anon_1.qual_by_depth >= 2.0) AND (anon_1.mapping_qual IS NULL OR anon_1.mapping_qual >= 40.0) AND (anon_1.haplotype_score IS NULL OR anon_1.haplotype_score <= 13.0) AND (anon_1.mapping_qual_bias IS NULL OR anon_1.mapping_qual_bias >= -12.5) AND (anon_1.read_pos_bias IS NULL OR anon_1.read_pos_bias >= -8.0) OR NOT (variants.is_transition OR variants.is_transversion) AND (anon_1.strand_bias IS NULL OR anon_1.strand_bias <= 200.0) AND (anon_1.qual_by_depth IS NULL OR anon_1.qual_by_depth >= 2.0) AND (anon_1.read_pos_bias IS NULL OR anon_1.read_pos_bias >= -20.0)) AND (abs(annotations.splice_dist) <= 10 OR annotations.region = 2 AND (NOT annotations.type = 2) OR annotations.region = 3 OR annotations.region = 1); annotations has 7726144 tupels variants has 61132844 tupels features has 2280612 tupels transcripts has 213272 tupels genes has 62069 tupels known has 90066562 tupels calls has 18885425 tupels patients and diseases have only some tupels. I hope this is something to start with. annotations has Best regards, Martin

...

Am 25.06.2015 um 17:20 schrieb Stefan Manegold :

Dear Martin,

to be able to provide qualified and useful help, we'd need some more information about your data (type & volume), query, hardware & OS environment, MonetDB version, etc.

It would be great, if you could share this information with us.

Best, Stefan

----- On Jun 25, 2015, at 5:17 PM, Martin Schwitalla MSchwitalla1@gmx.de wrote:

...
Hello,

I`m doing some experiments with a database, which originally runs on PostgreSQL. But some queries take very long on this system. Especially one query type, which takes about 14s to finish. The idea was to port it to MonetDB, because the database is not that big (~8GB) and it`s an analytical task. But suprisingly MonetDB takes even longer to finish. With the default pipeline it takes about 22-23s. I tried various pipelines and the minimal_pipe was the best. It finished the task in about 16,5s, better, but still not great. I`m asking myself what I could do to further improve the performance.

I added some primary and foreign key constraints and set the database to readonly, but it didn`t do much. The trace tells me that some joins take the most time and some subselects also take much time. But nothing besides that. The query has 10 joins.

Do you have any advices for me?

Best regards, Martin _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) | _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

Martin Kersten

8:08 p.m.

Hi On 25/06/15 17:38, Martin Schwitalla wrote: ...

...

...
...
I added some primary and foreign key constraints and set the database to readonly, but it didn`t do much. The trace tells me that some joins take the most time and some subselects also take much time. But nothing besides that. The query has 10 joins.

... Indeed a complex, probably generated SQL query. The trace information of the expensive joins and thetaselects would be my first target.

An expensive join might typically result if its hash support does not fit in memory, which will cause disk accesses. The predicate (calls.sample_id IN (227, 230, 233,.... and the like probably produce to multiple thetaselect that are merged in the end. It might help (worth trying) to put a range pre-filter (call.sample between 227 and 368 and calls.sample in (227,....)) The fact that minimal-pipe is better then the default-pipe indicates a lot of competition for scarce resources during execution. regards, Martin

Martin Schwitalla

26 Jun 26 Jun

11:45 a.m.

Hi, the prefilter makes it faster, but somehow i get three times more tupels in the result than before. How is that possible? ((calls.sample_id BETWEEN 226 AND 369) AND calls.sample_id IN (227, 230, 233, 234, 237, 190, 195, 198, 199, 203, 270, 273, 276, 189, 343, 366, 367, 368))) AS anon_2 Best regards, Martin

...

Am 25.06.2015 um 22:08 schrieb Martin Kersten :

Hi

On 25/06/15 17:38, Martin Schwitalla wrote: ...

...
...
...
I added some primary and foreign key constraints and set the database to readonly, but it didn`t do much. The trace tells me that some joins take the most time and some subselects also take much time. But nothing besides that. The query has 10 joins.

... Indeed a complex, probably generated SQL query. The trace information of the expensive joins and thetaselects would be my first target.

An expensive join might typically result if its hash support does not fit in memory, which will cause disk accesses. The predicate (calls.sample_id IN (227, 230, 233,.... and the like probably produce to multiple thetaselect that are merged in the end. It might help (worth trying) to put a range pre-filter (call.sample between 227 and 368 and calls.sample in (227,....))

The fact that minimal-pipe is better then the default-pipe indicates a lot of competition for scarce resources during execution.

regards, Martin _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

Martin Schwitalla

12:14 p.m.

Well ok, the values for the between statement were wrong. It should be between 189 and 368 :) The result is now right, but the execution time stays the same or is even minimal longer than before. I have uploaded a trace to my dropbox of the query. https://www.dropbox.com/s/q2j9g0gsvaylkw6/query_trace_.txt?dl=0 https://www.dropbox.com/s/q2j9g0gsvaylkw6/query_trace_.txt?dl=0 Regards, Martin

...

Am 26.06.2015 um 13:45 schrieb Martin Schwitalla :

Hi,

the prefilter makes it faster, but somehow i get three times more tupels in the result than before. How is that possible?

((calls.sample_id BETWEEN 226 AND 369) AND calls.sample_id IN (227, 230, 233, 234, 237, 190, 195, 198, 199, 203, 270, 273, 276, 189, 343, 366, 367, 368))) AS anon_2

Best regards, Martin

...
Am 25.06.2015 um 22:08 schrieb Martin Kersten :

Hi

On 25/06/15 17:38, Martin Schwitalla wrote: ...

...
...
...
I added some primary and foreign key constraints and set the database to readonly, but it didn`t do much. The trace tells me that some joins take the most time and some subselects also take much time. But nothing besides that. The query has 10 joins.

... Indeed a complex, probably generated SQL query. The trace information of the expensive joins and thetaselects would be my first target.

An expensive join might typically result if its hash support does not fit in memory, which will cause disk accesses. The predicate (calls.sample_id IN (227, 230, 233,.... and the like probably produce to multiple thetaselect that are merged in the end. It might help (worth trying) to put a range pre-filter (call.sample between 227 and 368 and calls.sample in (227,....))

The fact that minimal-pipe is better then the default-pipe indicates a lot of competition for scarce resources during execution.

regards, Martin _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

Niels Nes

28 Jun 28 Jun

11:14 a.m.

On Fri, Jun 26, 2015 at 02:14:18PM +0200, Martin Schwitalla wrote:

...

Well ok, the values for the between statement were wrong. It should be between 189 and 368 :) The result is now right, but the execution time stays the same or is even minimal longer than before. I have uploaded a trace to my dropbox of the query.

https://www.dropbox.com/s/q2j9g0gsvaylkw6/query_trace_.txt?dl=0

Martin Could you send the output of plan, ie plan followed by your query? That should return the relational plan. It seems the join order is incorrect (not optimal) and MonetDB seems to have some problems with the many 'OR's in your query. Niels

...

Regards, Martin

Am 26.06.2015 um 13:45 schrieb Martin Schwitalla < MSchwitalla1@gmx.de>:

Hi,

the prefilter makes it faster, but somehow i get three times more tupels in the result than before. How is that possible?

((calls.sample_id BETWEEN 226 AND 369) AND calls.sample_id IN (227, 230, 233, 234, 237, 190, 195, 198, 199, 203, 270, 273, 276, 189, 343, 366, 367, 368))) AS anon_2

Best regards, Martin

Am 25.06.2015 um 22:08 schrieb Martin Kersten < martin@monetdb.org>:

Hi

On 25/06/15 17:38, Martin Schwitalla wrote: ...

I added some primary and foreign key constraints and set the database to readonly, but it didn`t do much. The trace tells me that some joins take the most time and some subselects also take much time. But nothing besides that. The query has 10 joins.

... Indeed a complex, probably generated SQL query. The trace information of the expensive joins and thetaselects would be my first target.

An expensive join might typically result if its hash support does not fit in memory, which will cause disk accesses. The predicate (calls.sample_id IN (227, 230, 233,.... and the like probably produce to multiple thetaselect that are merged in the end. It might help (worth trying) to put a range pre-filter (call.sample between 227 and 368 and calls.sample in (227,....))

The fact that minimal-pipe is better then the default-pipe indicates a lot of competition for scarce resources during execution.

regards, Martin _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

...

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

-- Niels Nes, Manager ITF, Centrum Wiskunde & Informatica (CWI) Science Park 123, 1098 XG Amsterdam, The Netherlands room L3.14, phone ++31 20 592-4098 sip:4098@sip.cwi.nl url: https://www.cwi.nl/people/niels e-mail: Niels.Nes@cwi.nl

Martin Schwitalla

30 Jun 30 Jun

11:17 a.m.

Hi, I can`t do that, because the database contains sensitive data. I`m not allowed to give someone else the data. I don`t know if the schema would help… I think of something else…Currently I`m executing the query only once and look at the time. Could it help if I run the query many 10 or 100 times in order, so that the caches and memory can be filled with the right data? I think that this could help the OS to fill it with the needed data. But I don`t see any possibility to run sql statements from a file. Am I`m missing something? Best regards, Martin

...

Am 28.06.2015 um 13:14 schrieb Niels Nes :

On Fri, Jun 26, 2015 at 02:14:18PM +0200, Martin Schwitalla wrote:

...
Well ok, the values for the between statement were wrong. It should be between 189 and 368 :) The result is now right, but the execution time stays the same or is even minimal longer than before. I have uploaded a trace to my dropbox of the query.

https://www.dropbox.com/s/q2j9g0gsvaylkw6/query_trace_.txt?dl=0 https://www.dropbox.com/s/q2j9g0gsvaylkw6/query_trace_.txt?dl=0

Martin

Could you send the output of plan, ie plan followed by your query? That should return the relational plan.

It seems the join order is incorrect (not optimal) and MonetDB seems to have some problems with the many 'OR's in your query.

Niels

...
Regards, Martin

Am 26.06.2015 um 13:45 schrieb Martin Schwitalla < MSchwitalla1@gmx.de>:

Hi,

the prefilter makes it faster, but somehow i get three times more tupels in the result than before. How is that possible?

((calls.sample_id BETWEEN 226 AND 369) AND calls.sample_id IN (227, 230, 233, 234, 237, 190, 195, 198, 199, 203, 270, 273, 276, 189, 343, 366, 367, 368))) AS anon_2

Best regards, Martin

Am 25.06.2015 um 22:08 schrieb Martin Kersten < martin@monetdb.org>:

Hi

On 25/06/15 17:38, Martin Schwitalla wrote: ...

I added some primary and foreign key constraints and set the database to readonly, but it didn`t do much. The trace tells me that some joins take the most time and some subselects also take much time. But nothing besides that. The query has 10 joins.

... Indeed a complex, probably generated SQL query. The trace information of the expensive joins and thetaselects would be my first target.

An expensive join might typically result if its hash support does not fit in memory, which will cause disk accesses. The predicate (calls.sample_id IN (227, 230, 233,.... and the like probably produce to multiple thetaselect that are merged in the end. It might help (worth trying) to put a range pre-filter (call.sample between 227 and 368 and calls.sample in (227,....))

The fact that minimal-pipe is better then the default-pipe indicates a lot of competition for scarce resources during execution.

regards, Martin _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

...
_______________________________________________ users-list mailing list users-list@monetdb.org mailto:users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list https://www.monetdb.org/mailman/listinfo/users-list

-- Niels Nes, Manager ITF, Centrum Wiskunde & Informatica (CWI) Science Park 123, 1098 XG Amsterdam, The Netherlands room L3.14, phone ++31 20 592-4098 sip:4098@sip.cwi.nl sip:4098@sip.cwi.nl url: https://www.cwi.nl/people/niels https://www.cwi.nl/people/niels e-mail: Niels.Nes@cwi.nl mailto:Niels.Nes@cwi.nl _______________________________________________ users-list mailing list users-list@monetdb.org mailto:users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list https://www.monetdb.org/mailman/listinfo/users-list

Niels Nes

11:25 a.m.

On Tue, Jun 30, 2015 at 01:17:24PM +0200, Martin Schwitalla wrote:

...

Hi,

I can`t do that, because the database contains sensitive data. I`m not allowed to give someone else the data. I don`t know if the schema would help… I need atleast a schema to debug the problem. Could be anonimized etc.

I think of something else…Currently I`m executing the query only once and look at the time. Could it help if I run the query many 10 or 100 times in order, so that the caches and memory can be filled with the right data? I think that this could help the OS to fill it with the needed data. But I don`t see any possibility to run sql statements from a file. Am I`m missing something? mclient < file

Any second run of a similar query on the server should benefit from data allready in memory etc. Niels

...

Best regards, Martin

Am 28.06.2015 um 13:14 schrieb Niels Nes :

On Fri, Jun 26, 2015 at 02:14:18PM +0200, Martin Schwitalla wrote:

Well ok, the values for the between statement were wrong. It should be between 189 and 368 :) The result is now right, but the execution time stays the same or is even minimal longer than before. I have uploaded a trace to my dropbox of the query.

https://www.dropbox.com/s/q2j9g0gsvaylkw6/query_trace_.txt?dl=0

Martin

Could you send the output of plan, ie plan followed by your query? That should return the relational plan.

It seems the join order is incorrect (not optimal) and MonetDB seems to have some problems with the many 'OR's in your query.

Niels

Regards, Martin

Am 26.06.2015 um 13:45 schrieb Martin Schwitalla < MSchwitalla1@gmx.de>:

Hi,

the prefilter makes it faster, but somehow i get three times more tupels in the result than before. How is that possible?

((calls.sample_id BETWEEN 226 AND 369) AND calls.sample_id IN (227, 230, 233, 234, 237, 190, 195, 198, 199, 203, 270, 273, 276, 189, 343, 366, 367, 368))) AS anon_2

Best regards, Martin

Am 25.06.2015 um 22:08 schrieb Martin Kersten < martin@monetdb.org>:

Hi

On 25/06/15 17:38, Martin Schwitalla wrote: ...

I added some primary and foreign key constraints and set the database to readonly, but it didn`t do much. The trace tells me that some joins take the most time and some subselects also take much time. But nothing besides that. The query has 10 joins.

... Indeed a complex, probably generated SQL query. The trace information of the expensive joins and thetaselects would be my first target.

An expensive join might typically result if its hash support does not fit in memory, which will cause disk accesses. The predicate (calls.sample_id IN (227, 230, 233,.... and the like probably produce to multiple thetaselect that are merged in the end. It might help (worth trying) to put a range pre-filter (call.sample between 227 and 368 and calls.sample in (227,....))

The fact that minimal-pipe is better then the default-pipe indicates a lot of competition for scarce resources during execution.

regards, Martin _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

-- Niels Nes, Manager ITF, Centrum Wiskunde & Informatica (CWI) Science Park 123, 1098 XG Amsterdam, The Netherlands room L3.14, phone ++31 20 592-4098 sip:4098@sip.cwi.nl url: https://www.cwi.nl/people/niels e-mail: Niels.Nes@cwi.nl _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

...

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

Martin Kersten

11:26 a.m.

Hi Ok. Understood. You always should run the query in cold and hot mode, i.e. run it multiple times within the same session. If necessary, MonetDB will create auxiliary index structures during the first run. You can definitely run SQL commands from a file. In mclient you have the \< option or you use a command line argument. regards, Martin On 30/06/15 13:17, Martin Schwitalla wrote:

...

Hi,

I can`t do that, because the database contains sensitive data. I`m not allowed to give someone else the data. I don`t know if the schema would help…

I think of something else…Currently I`m executing the query only once and look at the time. Could it help if I run the query many 10 or 100 times in order, so that the caches and memory can be filled with the right data? I think that this could help the OS to fill it with the needed data. But I don`t see any possibility to run sql statements from a file. Am I`m missing something?

Best regards, Martin

...
Am 28.06.2015 um 13:14 schrieb Niels Nes mailto:Niels.Nes@cwi.nl>:

On Fri, Jun 26, 2015 at 02:14:18PM +0200, Martin Schwitalla wrote:

...
Well ok, the values for the between statement were wrong. It should be between 189 and 368 :) The result is now right, but the execution time stays the same or is even minimal longer than before. I have uploaded a trace to my dropbox of the query.

https://www.dropbox.com/s/q2j9g0gsvaylkw6/query_trace_.txt?dl=0

Martin

Could you send the output of plan, ie plan followed by your query? That should return the relational plan.

It seems the join order is incorrect (not optimal) and MonetDB seems to have some problems with the many 'OR's in your query.

Niels

...
Regards, Martin

Am 26.06.2015 um 13:45 schrieb Martin Schwitalla < MSchwitalla1@gmx.de mailto:MSchwitalla1@gmx.de>:

Hi,

the prefilter makes it faster, but somehow i get three times more tupels in the result than before. How is that possible?

((calls.sample_id BETWEEN 226 AND 369) AND calls.sample_id IN (227, 230, 233, 234, 237, 190, 195, 198, 199, 203, 270, 273, 276, 189, 343, 366, 367, 368))) AS anon_2

Best regards, Martin

Am 25.06.2015 um 22:08 schrieb Martin Kersten < martin@monetdb.org mailto:martin@monetdb.org>:

Hi

On 25/06/15 17:38, Martin Schwitalla wrote: ...

I added some primary and foreign key constraints and set the database to readonly, but it didn`t do much. The trace tells me that some joins take the most time and some subselects also take much time. But nothing besides that. The query has 10 joins.

... Indeed a complex, probably generated SQL query. The trace information of the expensive joins and thetaselects would be my first target.

An expensive join might typically result if its hash support does not fit in memory, which will cause disk accesses. The predicate (calls.sample_id IN (227, 230, 233,.... and the like probably produce to multiple thetaselect that are merged in the end. It might help (worth trying) to put a range pre-filter (call.sample between 227 and 368 and calls.sample in (227,....))

The fact that minimal-pipe is better then the default-pipe indicates a lot of competition for scarce resources during execution.

regards, Martin _______________________________________________ users-list mailing list users-list@monetdb.org mailto:users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org mailto:users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

...
_______________________________________________ users-list mailing list users-list@monetdb.org mailto:users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

-- Niels Nes, Manager ITF, Centrum Wiskunde & Informatica (CWI) Science Park 123, 1098 XG Amsterdam, The Netherlands room L3.14, phone ++31 20 592-4098sip:4098@sip.cwi.nl url:https://www.cwi.nl/people/nielse-mail:Niels.Nes@cwi.nl mailto:Niels.Nes@cwi.nl _______________________________________________ users-list mailing list users-list@monetdb.org mailto:users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

Martin Schwitalla

11:34 a.m.

Hi, I`m currently checking if it would be ok to give out the schema of the database. Does the file have to be in a special format? Because it won’t work for me. I have a file named query_statement with the statement within and mclient does something, but it won’t end and I can’t get any output. Regards, Martin

...

Am 30.06.2015 um 13:26 schrieb Martin Kersten :

Hi

Ok. Understood. You always should run the query in cold and hot mode, i.e. run it multiple times within the same session. If necessary, MonetDB will create auxiliary index structures during the first run.

You can definitely run SQL commands from a file. In mclient you have the \< option or you use a command line argument.

regards, Martin

On 30/06/15 13:17, Martin Schwitalla wrote:

...
Hi,

I can`t do that, because the database contains sensitive data. I`m not allowed to give someone else the data. I don`t know if the schema would help…

I think of something else…Currently I`m executing the query only once and look at the time. Could it help if I run the query many 10 or 100 times in order, so that the caches and memory can be filled with the right data? I think that this could help the OS to fill it with the needed data. But I don`t see any possibility to run sql statements from a file. Am I`m missing something?

Best regards, Martin

...
Am 28.06.2015 um 13:14 schrieb Niels Nes mailto:Niels.Nes@cwi.nl>:

On Fri, Jun 26, 2015 at 02:14:18PM +0200, Martin Schwitalla wrote:

...
Well ok, the values for the between statement were wrong. It should be between 189 and 368 :) The result is now right, but the execution time stays the same or is even minimal longer than before. I have uploaded a trace to my dropbox of the query.

https://www.dropbox.com/s/q2j9g0gsvaylkw6/query_trace_.txt?dl=0

Martin

Could you send the output of plan, ie plan followed by your query? That should return the relational plan.

It seems the join order is incorrect (not optimal) and MonetDB seems to have some problems with the many 'OR's in your query.

Niels

...
Regards, Martin

Am 26.06.2015 um 13:45 schrieb Martin Schwitalla < MSchwitalla1@gmx.de mailto:MSchwitalla1@gmx.de>:

Hi,

the prefilter makes it faster, but somehow i get three times more tupels in the result than before. How is that possible?

((calls.sample_id BETWEEN 226 AND 369) AND calls.sample_id IN (227, 230, 233, 234, 237, 190, 195, 198, 199, 203, 270, 273, 276, 189, 343, 366, 367, 368))) AS anon_2

Best regards, Martin

Am 25.06.2015 um 22:08 schrieb Martin Kersten < martin@monetdb.org mailto:martin@monetdb.org>:

Hi

On 25/06/15 17:38, Martin Schwitalla wrote: ...

I added some primary and foreign key constraints and set the database to readonly, but it didn`t do much. The trace tells me that some joins take the most time and some subselects also take much time. But nothing besides that. The query has 10 joins.

... Indeed a complex, probably generated SQL query. The trace information of the expensive joins and thetaselects would be my first target.

An expensive join might typically result if its hash support does not fit in memory, which will cause disk accesses. The predicate (calls.sample_id IN (227, 230, 233,.... and the like probably produce to multiple thetaselect that are merged in the end. It might help (worth trying) to put a range pre-filter (call.sample between 227 and 368 and calls.sample in (227,....))

The fact that minimal-pipe is better then the default-pipe indicates a lot of competition for scarce resources during execution.

regards, Martin _______________________________________________ users-list mailing list users-list@monetdb.org mailto:users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org mailto:users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

...
_______________________________________________ users-list mailing list users-list@monetdb.org mailto:users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

-- Niels Nes, Manager ITF, Centrum Wiskunde & Informatica (CWI) Science Park 123, 1098 XG Amsterdam, The Netherlands room L3.14, phone ++31 20 592-4098sip:4098@sip.cwi.nl url:https://www.cwi.nl/people/nielse-mail:Niels.Nes@cwi.nl mailto:Niels.Nes@cwi.nl _______________________________________________ users-list mailing list users-list@monetdb.org mailto:users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

Martin Schwitalla

11:53 a.m.

Hi, I can give you the schema. Regards, Martin

...

Am 30.06.2015 um 13:26 schrieb Martin Kersten :

Hi

Ok. Understood. You always should run the query in cold and hot mode, i.e. run it multiple times within the same session. If necessary, MonetDB will create auxiliary index structures during the first run.

You can definitely run SQL commands from a file. In mclient you have the \< option or you use a command line argument.

regards, Martin

On 30/06/15 13:17, Martin Schwitalla wrote:

...
Hi,

I can`t do that, because the database contains sensitive data. I`m not allowed to give someone else the data. I don`t know if the schema would help…

I think of something else…Currently I`m executing the query only once and look at the time. Could it help if I run the query many 10 or 100 times in order, so that the caches and memory can be filled with the right data? I think that this could help the OS to fill it with the needed data. But I don`t see any possibility to run sql statements from a file. Am I`m missing something?

Best regards, Martin

...
Am 28.06.2015 um 13:14 schrieb Niels Nes mailto:Niels.Nes@cwi.nl>:

On Fri, Jun 26, 2015 at 02:14:18PM +0200, Martin Schwitalla wrote:

...
Well ok, the values for the between statement were wrong. It should be between 189 and 368 :) The result is now right, but the execution time stays the same or is even minimal longer than before. I have uploaded a trace to my dropbox of the query.

https://www.dropbox.com/s/q2j9g0gsvaylkw6/query_trace_.txt?dl=0

Martin

Could you send the output of plan, ie plan followed by your query? That should return the relational plan.

It seems the join order is incorrect (not optimal) and MonetDB seems to have some problems with the many 'OR's in your query.

Niels

...
Regards, Martin

Am 26.06.2015 um 13:45 schrieb Martin Schwitalla < MSchwitalla1@gmx.de mailto:MSchwitalla1@gmx.de>:

Hi,

the prefilter makes it faster, but somehow i get three times more tupels in the result than before. How is that possible?

((calls.sample_id BETWEEN 226 AND 369) AND calls.sample_id IN (227, 230, 233, 234, 237, 190, 195, 198, 199, 203, 270, 273, 276, 189, 343, 366, 367, 368))) AS anon_2

Best regards, Martin

Am 25.06.2015 um 22:08 schrieb Martin Kersten < martin@monetdb.org mailto:martin@monetdb.org>:

Hi

On 25/06/15 17:38, Martin Schwitalla wrote: ...

I added some primary and foreign key constraints and set the database to readonly, but it didn`t do much. The trace tells me that some joins take the most time and some subselects also take much time. But nothing besides that. The query has 10 joins.

... Indeed a complex, probably generated SQL query. The trace information of the expensive joins and thetaselects would be my first target.

An expensive join might typically result if its hash support does not fit in memory, which will cause disk accesses. The predicate (calls.sample_id IN (227, 230, 233,.... and the like probably produce to multiple thetaselect that are merged in the end. It might help (worth trying) to put a range pre-filter (call.sample between 227 and 368 and calls.sample in (227,....))

The fact that minimal-pipe is better then the default-pipe indicates a lot of competition for scarce resources during execution.

regards, Martin _______________________________________________ users-list mailing list users-list@monetdb.org mailto:users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org mailto:users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

...
_______________________________________________ users-list mailing list users-list@monetdb.org mailto:users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

-- Niels Nes, Manager ITF, Centrum Wiskunde & Informatica (CWI) Science Park 123, 1098 XG Amsterdam, The Netherlands room L3.14, phone ++31 20 592-4098sip:4098@sip.cwi.nl url:https://www.cwi.nl/people/nielse-mail:Niels.Nes@cwi.nl mailto:Niels.Nes@cwi.nl _______________________________________________ users-list mailing list users-list@monetdb.org mailto:users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

3542

Age (days ago)

3547

Last active (days ago)

List overview

Download

11 comments

5 participants

participants (5)

Martin Kersten
Martin Kersten
Martin Schwitalla
Niels Nes
Stefan Manegold