speed query with 2 columns group by

Hello, I have an query with the same source data on Jan2014 -SP3 version of monetddb and Oct2014 - SP1 : CREATE TABLE tmp __pierre5 AS ( SELECT id_unique, Device , COUNT (*) AS NBVISISTES FROM donnees_lmd_pierre GROUP BY id_unique, Device ) WITH DATA; On version Jan2014 -SP3 : Welcome to mclient , the MonetDB / SQL interactive terminal (Jan2014-SP3) Database : MonetDB v11.17.21 (Jan2014-SP3), 'mapi : monetdb :// lemondedev :50000/ lemonde' Type \q to quit , \? for a list of available commands auto commit mode: on sql > CREATE TABLE tmp __pierre5 AS ( SELECT id_unique, Device , COUNT (*) AS NBVISISTES FROM donnees _ lmd _pierre GROUP BY id_unique, Device ) WITH DATA; operation successful (6m 30s) On version Oct2014-SP1 : Welcome to mclient , the MonetDB / SQL interactive terminal (Oct2014-SP1) Database : MonetDB v11.19.7 (Oct2014-SP1), 'mapi : monetdb :// monetdb :50000/ lemonde' Type \q to quit , \? for a list of available commands auto commit mode: on sql > CREATE TABLE tmp __pierre5 AS ( SELECT id_unique, Device , COUNT (*) AS NBVISISTES FROM donnees _ lmd _pierre GROUP BY id_unique, Device ) WITH DATA; operation successful (383m 20s) explain query is different , you want to explain the results of the query ? Pierre -- 1G6 52 route de bischwiller 67300 Schiltigheim Société de Services et de Formations en Logiciels Libres http://1g6.biz Tél : 06 64 63 70 35

I tested on two other server with the same data. I have the same bug
Server 3 : Jan2014-SP2
Welcome to mclient, the MonetDB/SQL interactive terminal (Jan2014-SP2)
Database: MonetDB v11.17.17 (Jan2014-SP2), 'mapi:monetdb://rd:50000/lemonde'
Type \q to quit, \? for a list of available commands
auto commit mode: on
sql>CREATE TABLE tmp__pierre5 AS ( SELECT id_unique, Device, COUNT(*) AS NBVISISTES FROM donnees_lmd_pierre GROUP BY id_unique, Device ) WITH DATA;
operation successful (9m 4s)
Server 4 : Oct2014-SP1
Welcome to mclient, the MonetDB/SQL interactive terminal (Oct2014-SP1)
Database: MonetDB v11.19.7 (Oct2014-SP1), 'mapi:monetdb://poledev:50000/lemonde'
Type \q to quit, \? for a list of available commands
auto commit mode: on
sql>CREATE TABLE tmp__pierre5 AS ( SELECT id_unique, Device, COUNT(*) AS NBVISISTES FROM donnees_lmd_pierre GROUP BY id_unique, Device ) WITH DATA;
The query runs from 10 hours
is what you want to dump the table for download ?
Pierre
--
1G6
52 route de bischwiller
67300 Schiltigheim
Société de Services et de Formations en Logiciels Libres
http://1g6.biz
Tél : 06 64 63 70 35
----- Mail original -----
De: "Pierre-Adrien Coustillas"

I do not understand ...
I thought of bad data in my table.
I run the query several times and It Works
Welcome to mclient, the MonetDB/SQL interactive terminal (Oct2014-SP1)
Database: MonetDB v11.19.7 (Oct2014-SP1), 'mapi:monetdb://monetdb:50000/lemonde'
Type \q to quit, \? for a list of available commands
auto commit mode: on
sql>SELECT DISTINCT Device FROM donnees_lmd_pierre ;
+----------------------+
| device |
+======================+
| Application Tablette |
| Application Mobile |
| Web/Tablette |
| Web/Ordinateur |
| Web/Mobile |
| |
+----------------------+
6 tuples (10.4s)
sql>CREATE TABLE tmp__pierre5 AS SELECT id_unique,'Application Tablette' as Device,count(*) FROM donnees_lmd_pierre WHERE Device ='Application Tablette' GROUP BY id_unique WITH DATA;
operation successful (4.8s)
sql>INSERT INTO tmp__pierre5 SELECT id_unique,'Application Mobile' as Device,count(*) FROM donnees_lmd_pierre WHERE Device ='Application Mobile' GROUP BY id_unique;
13724181 affected rows (27.8s)
sql>INSERT INTO tmp__pierre5 SELECT id_unique,'Web/Tablette' as Device,count(*) FROM donnees_lmd_pierre WHERE Device ='Web/Tablette' GROUP BY id_unique;
8491961 affected rows (8.4s)
sql>INSERT INTO tmp__pierre5 SELECT id_unique,'Web/Ordinateur' as Device,count(*) FROM donnees_lmd_pierre WHERE Device ='Web/Ordinateur' GROUP BY id_unique;
43188751 affected rows (46.0s)
sql>INSERT INTO tmp__pierre5 SELECT id_unique,'Web/Mobile' as Device,count(*) FROM donnees_lmd_pierre WHERE Device ='Web/Mobile' GROUP BY id_unique;
3587951 affected rows (3.5s)
sql>INSERT INTO tmp__pierre5 SELECT id_unique,'' as Device,count(*) FROM donnees_lmd_pierre WHERE Device ='' GROUP BY id_unique;
5327969 affected rows (2.8s)
But this is not a good solution is a patch
--
1G6
52 route de bischwiller
67300 Schiltigheim
Société de Services et de Formations en Logiciels Libres
http://1g6.biz
Tél : 06 64 63 70 35
----- Mail original -----
De: "Pierre-Adrien Coustillas"

Hello, I'm running the latest version (Oct2014-SP1) installed from binary RPMs on Fedora 20, and have just gotten a segmentation violation. The server had been running for about 10 hours, doing an msqldump of a table with about 3x10^9 records. Unfortunately for me the server crashed after about 2x10^9. There is nothing useful in the merovingian log: just a single line that says: database 'p0hybs2014' (9669) was killed by signal SIGSEGV I'll try the msqldump command again overnight. If there is something I can do that would help gather more information, just let me know! Tim

I ran the msqldump operation again and the server crashed at exactly the
same place, which is to say, about 2/3 through the table, after
outputting exactly the same number of lines.
Some more information about the crash. This is a backtrace:
#0 0x0000003f27da4577 in strCmp () from /lib64/libbat.so.11
#1 0x0000003f2773a8dc in TABLEToutput_file () from /lib64/libmonetdb5.so.17
#2 0x00007f212dbe8168 in mvc_export_table.isra () from
/usr/lib64/monetdb5/lib_sql.so
#3 0x00007f212dbed587 in mvc_export_chunk () from
/usr/lib64/monetdb5/lib_sql.so
#4 0x00007f212dbc955d in SQLparser () from /usr/lib64/monetdb5/lib_sql.so
#5 0x0000003f27661649 in runScenarioBody () from /lib64/libmonetdb5.so.17
#6 0x0000003f2766226d in runScenario () from /lib64/libmonetdb5.so.17
#7 0x0000003f27662340 in MSserveClient () from /lib64/libmonetdb5.so.17
#8 0x0000003f27e0fb1f in thread_starter () from /lib64/libbat.so.11
#9 0x00000035cb207ee5 in start_thread () from /lib64/libpthread.so.0
#10 0x00000035caaf4b8d in clone () from /lib64/libc.so.6
Register contents:
rax 0x0 0
rbx 0xbb9a3067f37e0 3300334583625696
rcx 0x3f2816b380 271255516032
rdx 0x80 128
rsi 0xbb9a3067f37e0 3300334583625696
rdi 0x3f27f44880 271253260416
rbp 0xc 0xc
rsp 0x7f212cc6ca18 0x7f212cc6ca18
r8 0x3f2816aae0 271255513824
r9 0x8a0 2208
r10 0x3f27f44880 271253260416
r11 0x35cab74bf0 231034276848
r12 0x4 4
r13 0x2000 8192
r14 0x7f2120576b68 139780253248360
r15 0x7f2120578060 139780253253728
rip 0x3f27da4577 0x3f27da4577

On 14/12/14 11:41, Tim Burress wrote:
I ran the msqldump operation again and the server crashed at exactly the same place, which is to say, about 2/3 through the table, after outputting exactly the same number of lines.
Some more information about the crash. This is a backtrace:
#0 0x0000003f27da4577 in strCmp () from /lib64/libbat.so.11 #1 0x0000003f2773a8dc in TABLEToutput_file () from /lib64/libmonetdb5.so.17 #2 0x00007f212dbe8168 in mvc_export_table.isra () from /usr/lib64/monetdb5/lib_sql.so #3 0x00007f212dbed587 in mvc_export_chunk () from /usr/lib64/monetdb5/lib_sql.so #4 0x00007f212dbc955d in SQLparser () from /usr/lib64/monetdb5/lib_sql.so #5 0x0000003f27661649 in runScenarioBody () from /lib64/libmonetdb5.so.17 #6 0x0000003f2766226d in runScenario () from /lib64/libmonetdb5.so.17 #7 0x0000003f27662340 in MSserveClient () from /lib64/libmonetdb5.so.17 #8 0x0000003f27e0fb1f in thread_starter () from /lib64/libbat.so.11 #9 0x00000035cb207ee5 in start_thread () from /lib64/libpthread.so.0 #10 0x00000035caaf4b8d in clone () from /lib64/libc.so.6
Register contents:
rax 0x0 0 rbx 0xbb9a3067f37e0 3300334583625696 rcx 0x3f2816b380 271255516032 rdx 0x80 128 rsi 0xbb9a3067f37e0 3300334583625696 rdi 0x3f27f44880 271253260416 rbp 0xc 0xc rsp 0x7f212cc6ca18 0x7f212cc6ca18 r8 0x3f2816aae0 271255513824 r9 0x8a0 2208 r10 0x3f27f44880 271253260416 r11 0x35cab74bf0 231034276848 r12 0x4 4 r13 0x2000 8192 r14 0x7f2120576b68 139780253248360 r15 0x7f2120578060 139780253253728 rip 0x3f27da4577 0x3f27da4577
eflags 0x10246 [ PF ZF IF RF ] cs 0x33 51 ss 0x2b 43 ds 0x0 0 es 0x0 0 fs 0x0 0 gs 0x0 0 st0 0 (raw 0x00000000000000000000) st1 0 (raw 0x00000000000000000000) st2 0 (raw 0x00000000000000000000) st3 0 (raw 0x00000000000000000000) st4 0 (raw 0x00000000000000000000) st5 0 (raw 0x00000000000000000000) st6 0 (raw 0x00000000000000000000) st7 0 (raw 0x00000000000000000000) fctrl 0x37f 895 fstat 0x0 0 ftag 0xffff 65535 fiseg 0x0 0 fioff 0x0 0 foseg 0x0 0 fooff 0x0 0 fop 0x0 0 mxcsr 0x1fa0 [ PE IM DM ZM OM UM PM ] This is disassembly of the code from strCmp(), though of course the origin of the problem will probably be higher in the call chain:
0x0000003f27da4500 <+0>: test %rdi,%rdi 0x0000003f27da4503 <+3>: je 0x3f27da4570
0x0000003f27da4505 <+5>: movzbl (%rdi),%edx 0x0000003f27da4508 <+8>: cmp $0x80,%dl 0x0000003f27da450b <+11>: je 0x3f27da4570 0x0000003f27da450d <+13>: test %rsi,%rsi 0x0000003f27da4510 <+16>: je 0x3f27da4580 0x0000003f27da4512 <+18>: movzbl (%rsi),%ecx 0x0000003f27da4515 <+21>: mov $0x1,%eax 0x0000003f27da451a <+26>: cmp $0x80,%cl 0x0000003f27da451d <+29>: je 0x3f27da4568 0x0000003f27da451f <+31>: cmp %dl,%cl 0x0000003f27da4521 <+33>: mov $0xffffffff,%eax 0x0000003f27da4526 <+38>: ja 0x3f27da4568 0x0000003f27da4528 <+40>: mov $0x1,%eax 0x0000003f27da452d <+45>: jb 0x3f27da4568 0x0000003f27da452f <+47>: jne 0x3f27da4568 0x0000003f27da4531 <+49>: test %cl,%cl 0x0000003f27da4533 <+51>: jne 0x3f27da4544 0x0000003f27da4535 <+53>: jmp 0x3f27da4560 0x0000003f27da4537 <+55>: nopw 0x0(%rax,%rax,1) 0x0000003f27da4540 <+64>: test %al,%al 0x0000003f27da4542 <+66>: je 0x3f27da4560 0x0000003f27da4544 <+68>: add $0x1,%rdi 0x0000003f27da4548 <+72>: add $0x1,%rsi 0x0000003f27da454c <+76>: movzbl (%rdi),%eax 0x0000003f27da454f <+79>: cmp (%rsi),%al 0x0000003f27da4551 <+81>: je 0x3f27da4540 0x0000003f27da4553 <+83>: sbb %eax,%eax 0x0000003f27da4555 <+85>: or $0x1,%eax 0x0000003f27da4558 <+88>: retq 0x0000003f27da4559 <+89>: nopl 0x0(%rax) 0x0000003f27da4560 <+96>: xor %eax,%eax 0x0000003f27da4562 <+98>: nopw 0x0(%rax,%rax,1) 0x0000003f27da4568 <+104>: repz retq 0x0000003f27da456a <+106>: nopw 0x0(%rax,%rax,1) 0x0000003f27da4570 <+112>: test %rsi,%rsi 0x0000003f27da4573 <+115>: je 0x3f27da4560 0x0000003f27da4575 <+117>: xor %eax,%eax => 0x0000003f27da4577 <+119>: cmpb $0x80,(%rsi) 0x0000003f27da457a <+122>: setne %al 0x0000003f27da457d <+125>: neg %eax 0x0000003f27da457f <+127>: retq 0x0000003f27da4580 <+128>: mov $0x1,%eax 0x0000003f27da4585 <+133>: retq I'm not sure how MonetDB works its way through the table to produce the msqldump output, so it's hard to guess which records it is working on at the time, but the crash always occurs after writing 2,147,749,019 msqldump records. Just using SELECT on, say, the next 20,000 records after that displays them with no problem, so it seems like the issue may be specific to msqldump.
This is what appeared in the system log at the time of the crash:
mserver5[20195] general protection ip:3f27da4577 sp:7f212cc6ca18 error:0 in libbat.so.11.0.3[3f27c00000+369000] mserver5[20195] general protection ip:3f27da4577 sp:7f212cc6ca18 error:0 in libbat.so.11.0.3[3f27c00000+369000]
The machine is a 64-bit system with 32GB of ECC memory. There is no indication of a memory fault.
What I thought I'd try to do tonight is recompile the server with debug options and run it in valgrind, but I'm not sure what the best build options are. If you have any ideas for what I could/should do to help find this problem, just let me know.
Probably best is to configure with --enable-debug --enable-assert --disable-optimize. And maybe running it under gdb (or attach gdb to the server) to catch the crash would already be enough. That'll certainly be faster than running under valgrind.
Tim
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- Sjoerd Mullender

Well, since I wasn't sure what else to do I ran the server under valgrind and got a fault after about an hour, but it doesn't seem to be in the same routine. Maybe it will be interesting in any case, though. ==5988== Invalid read of size 8 ==5988== at 0xFAD78F4: delta_bind_bat (bat_storage.c:169) ==5988== by 0xFAD7A6D: bind_col (bat_storage.c:186) ==5988== by 0xF9921B4: mvc_bind (sql.c:1607) ==5988== by 0xF992957: mvc_bind_wrap (sql.c:1683) ==5988== by 0x4C66559: runMALsequence (mal_interpreter.c:651) ==5988== by 0x4C6AFEF: DFLOWworker (mal_dataflow.c:362) ==5988== by 0x35CB207EE4: start_thread (in /usr/lib64/libpthread-2.18.so) ==5988== by 0x35CAAF4B8C: clone (in /usr/lib64/libc-2.18.so) ==5988== Address 0x18 is not stack'd, malloc'd or (recently) free'd I looked at the code but am not sure what's going on, so rather than me guessing I hope this is of some value. By the way, when I rebuilt the server I used --enable-debug --enable-assert --with-valgrind but did not use --disable-optimize, so maybe this is suspect. I'll rebuild without optimization and have another go. Tim

So, the results are the same without optimization. My initial thought is that this is a different problem and the server simply didn't reach the point where the strCmp() problem occurred. Sorry for the bad news! Tim On 12/14/2014 09:30 PM, Tim Burress wrote:
Well, since I wasn't sure what else to do I ran the server under valgrind and got a fault after about an hour, but it doesn't seem to be in the same routine. Maybe it will be interesting in any case, though.
==5988== Invalid read of size 8 ==5988== at 0xFAD78F4: delta_bind_bat (bat_storage.c:169) ==5988== by 0xFAD7A6D: bind_col (bat_storage.c:186) ==5988== by 0xF9921B4: mvc_bind (sql.c:1607) ==5988== by 0xF992957: mvc_bind_wrap (sql.c:1683) ==5988== by 0x4C66559: runMALsequence (mal_interpreter.c:651) ==5988== by 0x4C6AFEF: DFLOWworker (mal_dataflow.c:362) ==5988== by 0x35CB207EE4: start_thread (in /usr/lib64/libpthread-2.18.so) ==5988== by 0x35CAAF4B8C: clone (in /usr/lib64/libc-2.18.so) ==5988== Address 0x18 is not stack'd, malloc'd or (recently) free'd
I looked at the code but am not sure what's going on, so rather than me guessing I hope this is of some value.
By the way, when I rebuilt the server I used --enable-debug --enable-assert --with-valgrind but did not use --disable-optimize, so maybe this is suspect. I'll rebuild without optimization and have another go.
Tim
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
participants (3)
-
Pierre-Adrien Coustillas
-
Sjoerd Mullender
-
Tim Burress