- users-list - monetdb.org

Vacuum limitations
by Sébastien RAILLARD (PASSMAN) 24 Nov '15

Dear all,

I have tried to call the vacuum function using this command:

call sys.vacuum('my_schema','my_table');

I get a message telling me "vacuum not allowed on tables with indices". So, if I understand correctly, the table on which vacuum is run must not have any index, not even a primary key?

Best regards,
Sebastien
--
Parc d'activité Tolstoï
4 rue Edouard Aynard
69100 Villeurbanne
Tel. +33 (0)4 78 95 05 80
Fax +33 (0)4 78 95 00 17
www.passman.fr
www.passman-hotels.com
www.passman-camping.com
www.passman-sante.com
<https://www.facebook.com/PASSMAN-187787814053/>

err incorrect checksum for freed object
by Lynn Carol Johnson 24 Nov '15

Hello,

I had a working installation of MonetDB on my Mac (OS X 10.9.5). It was initially installed with the Jul2015 release, and I then updated to Jul2015-SP1. With the original version I was able to load a large (10 GB) file without problems.

For reasons related to debugging a MonetDB install on a Red Hat machine, I deleted the rows of the table on my Mac and tried to re-add them. My commands were:

> delete from annosites;
> select count(*) from annosites;

The count showed 0 rows, as expected.

sql>\d annosites
CREATE TABLE "testjeff"."annosites" (
	"chr" INTEGER,
	"pos" INTEGER,
	"hapmap31_total_depth" INTEGER,
	"hapmap31_num_taxa" SMALLINT,
	"hapmap31_num_alleles" SMALLINT,
	"hapmap31_minor_allele_avg_depth" REAL,
	"hapmap31_minor_allele_avg_phred" REAL,
	"hapmap31_num_hets" SMALLINT,
	"hapmap31_ed_factor" REAL,
	"hapmap31_seg_test_p_value" REAL,
	"hapmap31_ibd_one_allele" BOOLEAN,
	"hapmap31_in_local_ld" BOOLEAN,
	"hapmap31_maf" REAL,
	"hapmap31_near_indel" BOOLEAN,
	"hapmap31_first_alt_allele_is_ins_or_del" BOOLEAN,
	"snpeff40e_effect_hapmap31" CHARACTER LARGE OBJECT,
	"snpeff40e_effectimpact_hapmap31" CHARACTER LARGE OBJECT,
	"snpeff40e_functionalclass_hapmap31" CHARACTER LARGE OBJECT,
	"gerp_neutral_tree_length" REAL,
	"gerp_score" REAL,
	"gerp_conserved" BOOLEAN,
	"mnase_low_minus_high_rpm_shoots" REAL,
	"mnase_bayes_factor_shoots" REAL,
	"mnase_hotspot_shoots" BOOLEAN,
	"mnase_low_minus_high_rpm_roots" REAL,
	"mnase_bayes_factor_roots" REAL,
	"mnase_hotspot_roots" BOOLEAN,
	"within_gene" BOOLEAN,
	"within_transcript" BOOLEAN,
	"within_exon" BOOLEAN,
	"within_cds" BOOLEAN,
	"within_cds_from_gff3" BOOLEAN,
	"within_five_prime_utr" BOOLEAN,
	"within_three_prime_utr" BOOLEAN,
	"codon_position" SMALLINT,
	"go_term_accession" CHARACTER LARGE OBJECT,
	"go_term_name" CHARACTER LARGE OBJECT
);

But now, running "COPY INTO" either hangs with no error in the merovingian.log file, or fails with an error in merovingian.log (shown below). Most often it hangs.
I have tried COPY INTO both with and without the RECORDS option:

sql> COPY INTO annosites from '/Users/lcj34/notes_files/machineLearningDB/annoDB_related/siteAnnoNoHdrsCol35Fixed_20151011.txt' USING DELIMITERS '\t','\n';
sql>
sql>COPY 61000000 records INTO annosites from '/Users/lcj34/notes_files/machineLearningDB/annoDB_related/siteAnnoNoHdrsCol35Fixed_20151011.txt' USING DELIMITERS '\t','\n';

The merovingian.log file, the one time it showed an error:

2015-11-23 15:02:55 MSG testJeff[752]: # loading sql script: 99_system.sql
2015-11-23 15:15:49 ERR testJeff[752]: mserver5(752,0x112908000) malloc: *** error for object 0x7fbe21a08208: incorrect checksum for freed object - object was probably modified after being freed.
2015-11-23 15:15:49 ERR testJeff[752]: *** set a breakpoint in malloc_error_break to debug
2015-11-23 15:15:50 MSG merovingian[747]: database 'testJeff' (752) was killed by signal SIGABRT

Any idea what could be wrong? I tried stopping, destroying, and re-creating the database; this didn't help. I then created a new dbfarm and a new database, connected monetdbd to it, and tried again. While I can create this 37-column table, I am unable to load the file that was previously loaded successfully.

Is this a Mac issue? What else can I try on the MonetDB side? Is there any reason to reinstall everything? I'm still in the testing stage, so I would not be losing data.

Thanks,
Lynn
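
Before suspecting the Mac build itself, one cheap check is whether every line of the input file still has the 37 tab-separated fields the table expects, and how many records the file really contains (the number the `COPY n RECORDS` form is meant to state). Below is a minimal Java sketch of such a pre-check. It assumes plain UTF-8 text with one record per line and no quoted or escaped tabs or newlines inside fields; the class and method names are hypothetical and not part of MonetDB. It only checks the file's shape and cannot rule out a server-side bug.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-load check (not a MonetDB tool): counts the records in a
// tab-separated file and lists lines whose field count differs from the
// table's column count. Assumes one record per line, no quoted/escaped tabs.
final class TsvPrecheck {

    /** Total number of records, plus the 1-based numbers of the first offending lines. */
    static final class Report {
        final long records;
        final List<Long> badLines;

        Report(long records, List<Long> badLines) {
            this.records = records;
            this.badLines = badLines;
        }
    }

    static Report scan(Reader in, int expectedColumns, int maxBadLinesToKeep) {
        List<Long> bad = new ArrayList<>();
        long lineNo = 0;
        try {
            BufferedReader reader = new BufferedReader(in);
            String line;
            while ((line = reader.readLine()) != null) {
                lineNo++;
                // limit -1 keeps trailing empty fields, so "a<TAB>b<TAB>" counts as 3 fields
                int fields = line.split("\t", -1).length;
                if (fields != expectedColumns && bad.size() < maxBadLinesToKeep) {
                    bad.add(lineNo);
                }
            }
        } catch (IOException e) {
            // e.g. a MalformedInputException for bytes that are not valid UTF-8
            throw new UncheckedIOException(e);
        }
        return new Report(lineNo, bad);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java TsvPrecheck.java <file> <expectedColumns>");
            System.exit(2);
        }
        int expected = Integer.parseInt(args[1]);
        // Files.newBufferedReader decodes strictly, so invalid UTF-8 is reported, not skipped
        try (Reader r = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            Report rep = scan(r, expected, 20);
            System.out.println("records: " + rep.records);
            System.out.println("first lines with a wrong field count: " + rep.badLines);
        }
    }
}
```

With Java 11 or later it can be run directly from source, e.g. `java TsvPrecheck.java <file> 37`. An empty list of bad lines plus a record count matching the `COPY n RECORDS` value would at least point away from the data file.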

commit failures with dbfarm on iSCSI LUN
by Roberto Cornacchia 24 Nov '15

Hi there,

Do you have any experience with running a dbfarm over iSCSI? We have tried to use the NAS in our 1 Gbit LAN for our largish daily experiments with MonetDB. It's a very handy setup and seems better suited than NFS. Performance seems reasonable, but we regularly (though so far unpredictably) get commit failures during a rather long ETL. We do not get such commit failures when the same database and ETL are run on a local disk.

Excerpt from merovingian.log (Jul2015-SP1):

2015-11-24 11:52:37 ERR trec01[19110]: !ERROR: bm_subcommit: commit failed
2015-11-24 11:52:37 ERR trec01[19110]: !ERROR: log_tend: write failed
2015-11-24 11:52:37 ERR trec01[19110]: !FATAL: 40000!COMMIT: transation commit failed (perhaps your disk is full?) exiting (kernel error: !ERROR: GDKsave: error on: name=07/717, ext=theap, mode=1
2015-11-24 11:52:37 ERR trec01[19110]: !OS: Input/output error
2015-11-24 11:52:37 ERR trec01[19110]: )

The disk is most definitely not full: 1.5 TB is available (and the same ETL works on a local disk with less space available). It looks like iSCSI is the problem (it works perfectly apart from these random failures). Can you think of any reason why iSCSI could fail where a real local block device would not?

iSCSI client (where MonetDB runs): libiscsi 1.11.0
iSCSI storage (where the dbfarm is stored): iscsid 2.0-871

The iSCSI LUN is created as a regular file with thin provisioning (a file that grows dynamically on the NAS). We haven't yet tried a fixed-size block-level LUN (we are trying this today anyway).

Hoping someone already has an idea.
Roberto

String functions support in MonetDB
by Sébastien RAILLARD (PASSMAN) 24 Nov '15

Dear all,

Is there a way to concatenate more than 2 strings at once? The CONCAT function only allows concatenating 2 strings.

Also, the JDBC driver for MonetDB reports that the following string functions are supported. Are these functions documented somewhere?

ascii char_length character_length code concat copyfrom difference editdistance editdistance2 get_value_for ilike index insert lcase left length levenshtein like locate lower lpad ltrim next_value_for not_ilike not_like octet_length patindex qgramnormalize repeat replace restart right rpad rtrim similarity soundex space splitpart strings substring trim truncate ucase upper

Best regards,
Sebastien
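
Pending a variadic form, two approaches in plain SQL should work: nest the two-argument CONCAT, or chain the standard SQL `||` string concatenation operator, which MonetDB also accepts. The Java helper below only builds the SQL text for either form, for instance when queries are assembled in a JDBC client; the class, method, column, and table names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of MonetDB or its JDBC driver): builds SQL text
// that concatenates N string expressions, either by nesting the two-argument
// CONCAT or by chaining the standard SQL || operator.
final class ConcatSql {

    /** concat(concat(a, b), c) for [a, b, c]; a single expression is returned unchanged. */
    static String nestedConcat(List<String> exprs) {
        if (exprs.isEmpty()) {
            throw new IllegalArgumentException("need at least one expression");
        }
        String acc = exprs.get(0);
        for (int i = 1; i < exprs.size(); i++) {
            // wrap everything built so far as the left argument of the next concat
            acc = "concat(" + acc + ", " + exprs.get(i) + ")";
        }
        return acc;
    }

    /** a || b || c for [a, b, c]. */
    static String pipeConcat(List<String> exprs) {
        if (exprs.isEmpty()) {
            throw new IllegalArgumentException("need at least one expression");
        }
        return String.join(" || ", exprs);
    }

    public static void main(String[] args) {
        // hypothetical columns of a hypothetical "people" table
        List<String> parts = Arrays.asList("first_name", "' '", "last_name");
        System.out.println("SELECT " + nestedConcat(parts) + " FROM people;");
        System.out.println("SELECT " + pipeConcat(parts) + " FROM people;");
    }
}
```

Running `main` prints `SELECT concat(concat(first_name, ' '), last_name) FROM people;` and `SELECT first_name || ' ' || last_name FROM people;`. The `||` form is shorter and does not grow nested parentheses as more pieces are added.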

Making RPM packages for CentOS
by Sébastien RAILLARD (PASSMAN) 24 Nov '15

Dear all,

Using a CentOS 6.7 x64 system, I was able to compile MonetDB and build the RPM packages using the command "make rpm". The resulting packages installed correctly and are working on the CentOS 6.7 x64 system.

As I understand it, the RPM packaging process was designed for Fedora systems: are there any problems or limitations to expect when building packages with "make rpm" for CentOS 6.x?

Best regards,
Sebastien

installing on Red Hat
by Lynn Carol Johnson 23 Nov '15

Hi all,

I'm attempting to install on Red Hat release 7.1 (Maipo), following the instructions here: https://www.monetdb.org/downloads/epel/

I have successfully executed:

yum install http://dev.monetdb.org/downloads/epel/MonetDB-release-epel-1.1-1.monetdb.no…

and

rpm --import http://dev.monetdb.org/downloads/MonetDB-GPG-KEY

But installing the server and client fails with the following message:

[root@aztec Monetdb]# yum install MonetDB-SQL-server5 MonetDB-client
Loaded plugins: langpacks, product-id, rhnplugin, subscription-manager
This system is receiving updates from RHN Classic or Red Hat Satellite.
http://dev.monetdb.org/downloads/epel/7Server/x86_64/repodata/repomd.xml: [Errno 14] HTTP Error 404 - Not Found
Trying other mirror.

 One of the configured repositories failed (MonetDB 7Server - x86_64),
 and yum doesn't have enough cached data to continue. At this point the only
 safe thing yum can do is fail. There are a few ways to work "fix" this:

     1. Contact the upstream for the repository and get them to fix the problem.

     2. Reconfigure the baseurl/etc. for the repository, to point to a working
        upstream. This is most often useful if you are using a newer
        distribution release than is supported by the repository (and the
        packages for the previous distribution release still work).

     3. Disable the repository, so yum won't use it by default. Yum will then
        just ignore the repository until you permanently enable it again or use
        --enablerepo for temporary usage:

            yum-config-manager --disable monetdb

     4. Configure the failing repository to be skipped, if it is unavailable.
        Note that yum will try to contact the repo. when it runs most commands,
        so will have to try and fail each time (and thus. yum will be be much
        slower). If it is a very temporary problem though, this is often a nice
        compromise:

            yum-config-manager --save --setopt=monetdb.skip_if_unavailable=true

failure: repodata/repomd.xml from monetdb: [Errno 256] No more mirrors to try.
http://dev.monetdb.org/downloads/epel/7Server/x86_64/repodata/repomd.xml: [Errno 14] HTTP Error 404 - Not Found
[root@aztec Monetdb]#

When I try to open this URL in a browser, it is not found either. Should the repomd.xml file be fetched from elsewhere? Any suggestions?

Thanks,
Lynn

MonetDBLite issue: Compute() function not working
by Iain Wallace 23 Nov '15

Hi,

Firstly, thanks for the wonderful MonetDBLite package. It certainly speeds up my analysis!

I came across an issue that I thought you might want to be aware of: the compute() function doesn't appear to work with MonetDBLite, while collect() does.

Building on the example at https://www.monetdb.org/blog/monetdblite-r:

library(MonetDB.R)
ms <- src_monetdb(embedded=dbdir)
x <- tbl(ms, "mtcars")
y <- x %>% filter(gear == 4)
dim(y)
[1] 12 11
compute(y)
Source: MonetDB ()
From: lmsomncnjj [0 x 11]
Error: n not greater than 0L

I wasn't sure where to post this, so I have also posted the issue on dplyr's GitHub page: https://github.com/hadley/dplyr/issues/1533

Thanks,
Iain

MonetDB Remote / Merge (Distributed Query Processing) Bug
by Brian Hood 19 Nov '15

Hi all,

It seems that when I run my SQL in sequence, I can repeatedly break things with a MAL error. If this isn't a bug, please point me in the right direction, but it looks like one to me.

The master has 4 remote tables and a MERGE table, ippacket, with all the remote tables added to it. All mserver5 instances are the same version/release.

Welcome to mclient, the MonetDB/SQL interactive terminal (Jul2015-SP1)
Database: MonetDB v11.21.11 (Jul2015-SP1), 'mapi:monetdb://mdb-master-01:50000/threatmonitor'
Type \q to quit, \? for a list of available commands
auto commit mode: on
sql>\d
MERGE TABLE threatmonitor.ippacket
REMOTE TABLE threatmonitor.wifi_ippacket
REMOTE TABLE threatmonitor.wifi_ippacket2
REMOTE TABLE threatmonitor.wifi_ippacket3
REMOTE TABLE threatmonitor.wifi_ippacket4
sql>select count(*) from ippacket limit 10;
+------+
| L1   |
+======+
|  951 |
+------+
1 tuple (44.499ms)
sql>select guid, recv_date, ip_dst from ippacket limit 10
more>;
(mapi:monetdb://monetdb@172.17.0.5/threatmonitor) 'user.l4' undefined in: (rmt3424_X_3_bat_oid_str:bat[:oid,:str],rmt3425_X_6_bat_oid_str:bat[:oid,:str],rmt3427_X_7_bat_oid_str:bat[:oid,:str]) := user.l4();

The second time I run the same query, it works absolutely fine:
sql>select guid, recv_date, ip_dst from ippacket limit 10
more>;
+--------------------------------------+---------------------------+----------------+
| guid                                 | recv_date                 | ip_dst         |
+======================================+===========================+================+
| 5f4c9bcb-e4b7-0af1-2919-fdef5623d20a | 2015-07-20 19:52:45 +0000 | 10.130.19.42   |
| 2238c353-33cd-5174-31b6-ca57b4decade | 2015-07-20 19:52:45 +0000 | 216.58.209.227 |
| 8ac3dde3-92ce-ff27-f6b1-f296b0d139f5 | 2015-07-20 19:52:45 +0000 | 10.130.19.42   |
| 7a204771-8623-e498-5f45-ba1c58adb3da | 2015-07-20 19:52:45 +0000 | 85.25.200.151  |
| 496ab33b-3ae9-56b5-9108-209413dcddeb | 2015-07-20 19:52:45 +0000 | 85.25.200.151  |
| a1537072-8545-74eb-3b7e-9615a4c534ac | 2015-07-20 19:52:45 +0000 | 10.130.19.42   |
| 91124c5a-0854-7f21-e60a-f175e45d68d1 | 2015-07-20 19:52:45 +0000 | 10.130.19.42   |
| acefc01b-6ef8-6197-f5c0-27194e5749db | 2015-07-20 19:52:45 +0000 | 85.25.200.151  |
| f0694ee8-2e7f-e019-a96e-450489ae8f37 | 2015-07-20 19:52:45 +0000 | 10.130.19.42   |
| 78c246a9-eba2-79b5-fb87-59609f44a7d9 | 2015-07-20 19:52:45 +0000 | 85.25.200.151  |
+--------------------------------------+---------------------------+----------------+
10 tuples (63.443ms)
sql>

I've attached my example schema.

Regards,
Brian Hood

MonetDB writing to disk?!
by Martin Schwitalla 19 Nov '15

Hi,

I'm running some analyses on MonetDB and getting quite bad results. At first I thought it was due to the operator-at-a-time model: I could reproduce the behaviour that adding predicates on more columns makes the execution time grow linearly, which is a direct consequence of the operator-at-a-time model. But now I realize that the behaviour depends heavily on the disk used in the system. If I have the database on my primary disk, I get quite good results. But because the primary disk storage in our experimental PC is quite limited, I have been working on a NAS with 11 TB. When I launch the same query on the NAS, it doesn't take 0.4 s, but 25 s! After further testing, I could see with iotop that the mserver is constantly writing something to the disks. Why is that?

I'm running Ubuntu 14.04 64-bit, with 2 Intel Xeon E5-2690 CPUs and 62 GB RAM. I have installed the TPC-H benchmark, and running

select count(*) from lineitem where l_orderkey%2=0;

took about 400 ms on my primary disk storage and about 23 s on the external disk storage. The table contains 149996355 tuples; I created the TPC-H database with a scale factor of 25.
While running this query:

SELECT anon_1.sample_id AS anon_1_sample_id, anon_1.variant_id AS anon_1_variant_id, anon_1.qual AS anon_1_qual, anon_1.is_heterozygous AS anon_1_is_heterozygous, anon_1.read_depth AS anon_1_read_depth, anon_1.ref_depth AS anon_1_ref_depth, anon_1.alt_depth AS anon_1_alt_depth, anon_1.strand_bias AS anon_1_strand_bias, anon_1.qual_by_depth AS anon_1_qual_by_depth, anon_1.mapping_qual AS anon_1_mapping_qual, anon_1.haplotype_score AS anon_1_haplotype_score, anon_1.mapping_qual_bias AS anon_1_mapping_qual_bias, anon_1.read_pos_bias AS anon_1_read_pos_bias, annotations.variant_id AS annotations_variant_id, annotations.feature_id AS annotations_feature_id, annotations.ref_codon AS annotations_ref_codon, annotations.alt_codon AS annotations_alt_codon, annotations.ref_acid AS annotations_ref_acid, annotations.alt_acid AS annotations_alt_acid, annotations.type AS annotations_type, annotations.region AS annotations_region, annotations.splice_dist AS annotations_splice_dist
FROM (SELECT calls.sample_id AS sample_id, calls.variant_id AS variant_id, calls.qual AS qual, calls.is_heterozygous AS is_heterozygous, calls.read_depth AS read_depth, calls.ref_depth AS ref_depth, calls.alt_depth AS alt_depth, calls.strand_bias AS strand_bias, calls.qual_by_depth AS qual_by_depth, calls.mapping_qual AS mapping_qual, calls.haplotype_score AS haplotype_score, calls.mapping_qual_bias AS mapping_qual_bias, calls.read_pos_bias AS read_pos_bias
  FROM calls
  JOIN samples ON samples.id = calls.sample_id
  JOIN patients ON patients.id = samples.patient_id
  JOIN diseases ON diseases.id = samples.disease_id
  LEFT OUTER JOIN (SELECT calls.sample_id AS sample_id, calls.variant_id AS variant_id, calls.qual AS qual, calls.is_heterozygous AS is_heterozygous, calls.read_depth AS read_depth, calls.ref_depth AS ref_depth, calls.alt_depth AS alt_depth, calls.strand_bias AS strand_bias, calls.qual_by_depth AS qual_by_depth, calls.mapping_qual AS mapping_qual, calls.haplotype_score AS haplotype_score, calls.mapping_qual_bias AS mapping_qual_bias, calls.read_pos_bias AS read_pos_bias FROM calls WHERE calls.qual > 0 AND calls.sample_id IN (227, 230, 233, 234, 237, 190, 195, 198, 199, 203, 270, 273, 276, 189, 343, 366, 367, 368)) AS anon_2 ON anon_2.variant_id = calls.variant_id
  LEFT OUTER JOIN known ON known.variant_id = calls.variant_id AND known.source_id IN (1, 2, 20, 19, 46)
  WHERE samples.accession IN ('17041R5', '20195R', '21984R', '23273R', '23390R', '13264R', '18337R', '18533R', '19811R', '20039R', '21776R', '21809R', '22927R', '17294R', '17071R', '21016R', 'Greif1R', '18337T', '18533T', '19811T', '20039T', '21809T', '22927T', '20195T', '21984T', '23273T', '23390T', '17041T', '17294T', '17071T', '21016T', '13264T', '21776T', 'Greif1T1S')
    AND calls.qual >= 50
    AND anon_2.variant_id IS NULL
    AND (known.variant_id IS NULL OR known.clinical AND (NOT known.clinical_significance = 2) OR known.precious OR known.locus_specific_db)) AS anon_1
JOIN variants ON variants.id = anon_1.variant_id
JOIN annotations ON variants.id = annotations.variant_id
JOIN features ON features.id = annotations.feature_id
JOIN transcripts ON transcripts.id = features.transcript_id
JOIN genes ON genes.id = transcripts.gene_id
WHERE (variants.is_transition IS NULL OR variants.is_transversion IS NULL OR (variants.is_transition OR variants.is_transversion) AND (anon_1.strand_bias IS NULL OR anon_1.strand_bias <= 60.0) AND (anon_1.qual_by_depth IS NULL OR anon_1.qual_by_depth >= 2.0) AND (anon_1.mapping_qual IS NULL OR anon_1.mapping_qual >= 40.0) AND (anon_1.haplotype_score IS NULL OR anon_1.haplotype_score <= 13.0) AND (anon_1.mapping_qual_bias IS NULL OR anon_1.mapping_qual_bias >= -12.5) AND (anon_1.read_pos_bias IS NULL OR anon_1.read_pos_bias >= -8.0) OR NOT (variants.is_transition OR variants.is_transversion) AND (anon_1.strand_bias IS NULL OR anon_1.strand_bias <= 200.0) AND (anon_1.qual_by_depth IS NULL OR anon_1.qual_by_depth >= 2.0) AND (anon_1.read_pos_bias IS NULL OR anon_1.read_pos_bias >= -20.0))
  AND (abs(annotations.splice_dist) <= 10 OR annotations.region = 2 AND (NOT annotations.type = 2) OR annotations.region = 3 OR annotations.region = 1);

I could also see that the mserver is constantly writing to the disks. So why is MonetDB constantly writing to the disks, even when the query is as simple as the TPC-H count query above? The data should fit into memory without problems.

Any ideas or suggestions?

Kind regards,
Martin

Using multiple COPY INTO inside a transaction from Java
by Gerardo Blanco 18 Nov '15

Hi,

We are already using COPY INTO bulk inserts from Java via BufferedMCLWriter, as suggested in your recipes book (https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/LoadingBulkData), and based on the example you provided (http://dev.monetdb.org/hg/MonetDB/file/tip/java/example/SQLcopyinto.java).

I understood that each COPY INTO behaves as a single transaction. My question now is whether there is a way to group two or more COPY INTO statements so that they behave as a single transaction.

Thank you,
Gerardo
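
One possible approach, not confirmed in this thread, is to take the JDBC connection out of auto-commit mode, run each COPY INTO through an ordinary `Statement`, and commit once at the end. The sketch below assumes the data files sit on the database server and are readable by mserver5 (server-side `COPY INTO ... FROM 'path'`, which may require administrator rights depending on the MonetDB version), instead of being streamed through BufferedMCLWriter as in the cookbook example. It also assumes that MonetDB rolls back a COPY INTO together with the rest of an explicit transaction. The class and method names, and the tab/newline delimiters (borrowed from the other COPY INTO commands on this list), are illustrative.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch, not a confirmed MonetDB recipe: run several server-side COPY INTO
// statements inside one explicit JDBC transaction so they commit or roll back together.
final class CopyIntoTransaction {

    /** Builds a COPY INTO for a tab-separated server-side file; the table name is trusted input. */
    static String copyIntoSql(String table, String serverPath) {
        // double any single quote so the path stays a valid SQL string literal
        String quoted = "'" + serverPath.replace("'", "''") + "'";
        return "COPY INTO " + table + " FROM " + quoted + " USING DELIMITERS '\\t','\\n'";
    }

    /** Loads every {table, path} pair; either all COPY INTO statements commit, or none do. */
    static void loadAll(Connection conn, String[][] tableAndPath) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // start an explicit transaction
        try (Statement st = conn.createStatement()) {
            for (String[] tp : tableAndPath) {
                st.execute(copyIntoSql(tp[0], tp[1]));
            }
            conn.commit(); // single commit covering all loads
        } catch (SQLException e) {
            try {
                conn.rollback(); // undo every load of this batch
            } catch (SQLException rollbackFailure) {
                e.addSuppressed(rollbackFailure);
            }
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit); // restore the caller's mode
        }
    }
}
```

Before relying on it, a quick check is to run two loads in one call, make the second one fail (for example with a missing file), and verify that the first table is unchanged afterwards.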
