Re: MonetDB performance disappointing - users-list - monetdb.org

newer
RE: Concurrent queries problem...

Re: MonetDB performance disappointing

older
Concurrent queries problem

Martin Schwitalla

18 Jul 2015 18 Jul '15

10:52 a.m.

Hi, sorry that it took so long but I could test the patch and it didn`t seem to change anything. The trace looks the same and the performance didn`t improve. Regards, Martin

Reply

Sign in to reply online Use email software

Show replies by date

Martin Schwitalla

17 Aug 17 Aug

11:07 a.m.

New subject: MonetDB performance disappointing

Hi, I`m trying to rewrite my query to improve the performance. For the start: This is the initial query: SELECT anon_1.sample_id AS anon_1_sample_id, anon_1.variant_id AS anon_1_variant_id, anon_1.qual AS anon_1_qual, anon_1.is_heterozygous AS anon_1_is_heterozygous, anon_1.read_depth AS anon_1_read_depth, anon_1.ref_depth AS anon_1_ref_depth, anon_1.alt_depth AS anon_1_alt_depth, anon_1.strand_bias AS anon_1_strand_bias, anon_1.qual_by_depth AS anon_1_qual_by_depth, anon_1.mapping_qual AS anon_1_mapping_qual, anon_1.haplotype_score AS anon_1_haplotype_score, anon_1.mapping_qual_bias AS anon_1_mapping_qual_bias, anon_1.read_pos_bias AS anon_1_read_pos_bias, annotations.variant_id AS annotations_variant_id, annotations.feature_id AS annotations_feature_id, annotations.ref_codon AS annotations_ref_codon, annotations.alt_codon AS annotations_alt_codon, annotations.ref_acid AS annotations_ref_acid, annotations.alt_acid AS annotations_alt_acid, annotations.type AS annotations_type, annotations.region AS annotations_region, annotations.splice_dist AS annotations_splice_dist FROM (SELECT calls.sample_id AS sample_id, calls.variant_id AS variant_id, calls.qual AS qual, calls.is_heterozygous AS is_heterozygous, calls.read_depth AS read_depth, calls.ref_depth AS ref_depth, calls.alt_depth AS alt_depth, calls.strand_bias AS strand_bias, calls.qual_by_depth AS qual_by_depth, calls.mapping_qual AS mapping_qual, calls.haplotype_score AS haplotype_score, calls.mapping_qual_bias AS mapping_qual_bias, calls.read_pos_bias AS read_pos_bias FROM calls JOIN samples ON samples.id = calls.sample_id JOIN patients ON patients.id = samples.patient_id JOIN diseases ON diseases.id = samples.disease_id LEFT OUTER JOIN (SELECT calls.sample_id AS sample_id, calls.variant_id AS variant_id, calls.qual AS qual, calls.is_heterozygous AS is_heterozygous, calls.read_depth AS read_depth, calls.ref_depth AS ref_depth, calls.alt_depth AS alt_depth, calls.strand_bias AS strand_bias, calls.qual_by_depth AS qual_by_depth, calls.mapping_qual AS mapping_qual, calls.haplotype_score AS haplotype_score, calls.mapping_qual_bias AS mapping_qual_bias, calls.read_pos_bias AS read_pos_bias FROM calls WHERE calls.qual > 0 AND calls.sample_id IN (227, 230, 233, 234, 237, 190, 195, 198, 199, 203, 270, 273, 276, 189, 343, 366, 367, 368)) AS anon_2 ON anon_2.variant_id = calls.variant_id LEFT OUTER JOIN known ON known.variant_id = calls.variant_id AND known.source_id IN (1, 2, 20, 19, 46) WHERE samples.accession IN ('17041R5', '20195R', '21984R', '23273R', '23390R', '13264R', '18337R', '18533R', '19811R', '20039R', '21776R', '21809R', '22927R', '17294R', '17071R', '21016R', 'Greif1R', '18337T', '18533T', '19811T', '20039T', '21809T', '22927T', '20195T', '21984T', '23273T', '23390T', '17041T', '17294T', '17071T', '21016T', '13264T', '21776T', 'Greif1T1S') AND calls.qual >= 50 AND anon_2.variant_id IS NULL AND (known.variant_id IS NULL OR known.clinical AND (NOT known.clinical_significance = 2) OR known.precious OR known.locus_specific_db)) AS anon_1 JOIN variants ON variants.id = anon_1.variant_id JOIN annotations ON variants.id = annotations.variant_id JOIN features ON features.id = annotations.feature_id JOIN transcripts ON transcripts.id = features.transcript_id JOIN genes ON genes.id = transcripts.gene_id WHERE (variants.is_transition IS NULL OR variants.is_transversion IS NULL OR (variants.is_transition OR variants.is_transversion) AND (anon_1.strand_bias IS NULL OR anon_1.strand_bias <= 60.0) AND (anon_1.qual_by_depth IS NULL OR anon_1.qual_by_depth >= 2.0) AND (anon_1.mapping_qual IS NULL OR anon_1.mapping_qual >= 40.0) AND (anon_1.haplotype_score IS NULL OR anon_1.haplotype_score <= 13.0) AND (anon_1.mapping_qual_bias IS NULL OR anon_1.mapping_qual_bias >= -12.5) AND (anon_1.read_pos_bias IS NULL OR anon_1.read_pos_bias >= -8.0) OR NOT (variants.is_transition OR variants.is_transversion) AND (anon_1.strand_bias IS NULL OR anon_1.strand_bias <= 200.0) AND (anon_1.qual_by_depth IS NULL OR anon_1.qual_by_depth >= 2.0) AND (anon_1.read_pos_bias IS NULL OR anon_1.read_pos_bias >= -20.0)) AND (abs(annotations.splice_dist) <= 10 OR annotations.region = 2 AND (NOT annotations.type = 2) OR annotations.region = 3 OR annotations.region = 1); I have done some things that you suggested as primary/foreign key constraints and even patched the code to push down the selection. But nothing really helped. So I`m trying to rewrite the query and I`m having some problems. I have one set A with special data from the table calls with 18.885.425 tuples. Then I have a control group B from the same table and I want only the values from A that aren`t in B. After that I look in the set C of the table known with 90.066.562 tuples and I want only that values from (A-B) that aren`t in C or have special values in C. Basically it comes down to a simple difference between 3 sets, (A-B)-C. The query generator did this with Left Joins to build a huge table with many null values and selects only the right tupels. The other joins aren`t so important and these left joins take the most amount of time. Do you know a way how to improve the performance of this special operation? I tried it with some EXCEPT operations, but got nothing helpful. Is there any way to do this in MonetDB faster than with the Left Joins? I also tried it with HyperDB and without any improvements from my side it took only 0,7s to run the query. It`s hard to compare to completely different systems, but I think that the join operations in MonetDB aren`t as good. Because the query plans are very similar. So, do you have any ideas? Like last time I can`t give away my data, because it contains informations from patients. Kind regards, Martin

Am 18.07.2015 um 12:52 schrieb Martin Schwitalla :

Hi,

sorry that it took so long but I could test the patch and it didn`t seem to change anything. The trace looks the same and the performance didn`t improve.

Regards, Martin

Reply

Sign in to reply online Use email software

3492

Age (days ago)

3522

Last active (days ago)

Download

1 comments

1 participants

tags

participants (1)

Martin Schwitalla