On 21-10-2007 14:51 Niels Nes wrote:
On Sun, Oct 21, 2007 at 02:33:58PM +0200, Arjen van der Meijden wrote:
Hi list,
I was just doing some basic queries on MonetDB5/SQL to see if it is
suitable for my application, I'm doing lots of aggregates on some
logfile-abstractions. Basically they all boil down to 'how many unique
visitors and total pageviews where there in period X-Y in section Z'.
Which version of M5/SQL are you using?
I downloaded/installed the source tarball 'MonetDB-Mars-SuperBall-SR3'
and installed it using the monetdb-install.sh-script with the
--enable-sql and --enable-optimise switches.
The README in that file sais I have these versions (is there another way
to get those version numbers?):
MonetDB-1.18.2
MonetDB-SQL-2.18.2
MonetDB-client-1.18.2
MonetDB5-server-5.0.0
Its run on top of a virtualized debian linux etch, on top of Xen 3 again
on debian etch.
The base of the system is a Xeon 5310 quad-core, 1.6Ghz machine with 4GB
of memory and 10x 500GB sata disk in raid5 on a sas raid controller. The
VM with this monetdb-install and the postgresql-install I compare it to,
is given two of those cpu-cores and 1.5GB of memory.
At the time of these queries I was the only user on the entire system
(including the other vm's) and as the other results show the data itself
can be read very fast since it all fits easily in memory.
While the query is running, one cpu-core is just occupied 100% with
user-time, there is no system time and no i/o-wait. The disks where idle
as well.
Could you run the same query with trace instead of explain?
That could help identifying the problem.
I did, it took 1780 seconds. I've attached the trace-output, and also
attached the trace-output for a fast (1.4 sec) query.
Slow:
select channelid, count(distinct clientip), count(*)
from pageviews
where timestamp between '2007-09-21 00:00:00' and '2007-09-22 00:00:00'
group by channelid;
Removing the count(*) its also slow, but I didn't wait for it to finish.
A rewrite to something like this was also slow, but again, I didn't wait
for the result:
select channelid, count(*), sum(views)
from
(
select channelid, clientip, count(*) as views
from pageviews
where timestamp between '2007-09-21 00:00:00' and '2007-09-22 00:00:00'
group by channelid, clientip
) as f group by channelid
(in postgresql that one is faster due to the fact that their group by
implementation is faster than their distinct)
The fast query:
select channelid, count(clientip) from pageviews
where timestamp between '2007-09-21 00:00:00' and '2007-09-22 00:00:00'
group by channelid;
Best regards,
Arjen
sql>trace select channelid, count(clientip) from pageviews where timestamp between '2007-09-21 00:00:00' and '2007-09-22 00:00:00' group by channelid;
# 2 usec# mdb.setTimer(_2=true)
# 1 usec# 0 0# mdb.setFlow(_2=true)
# 10 usec# 426M 0# _3:bat[:oid,:timestamp] := sql.bind(_4="sys", _5="pageviews", _6="timestamp", _7=0)
# 6 usec# 0 0# _12:bat[:oid,:timestamp] := sql.bind(_4="sys", _5="pageviews", _6="timestamp", _13=1)
# 6 usec# 0 0# constraints.emptySet(_12=bat[:oid,:timestamp]{0})
# 6 usec# 0 0# _12:bat[:oid,:timestamp] := nil;
# 5 usec# 0 0# _14:bat[:oid,:timestamp] := sql.bind(_4="sys", _5="pageviews", _6="timestamp", _15=2)
# 5 usec# 0 0# constraints.emptySet(_14=bat[:oid,:timestamp]{0})
# 4 usec# 0 0# _14:bat[:oid,:timestamp] := nil;
# 5 usec# 106M 0# _19:bat[:oid,:sht] := sql.bind(_4="sys", _5="pageviews", _20="channelid", _7=0)
# 5 usec# 426M 0# _26:bat[:oid,:str] := sql.bind(_4="sys", _5="pageviews", _27="clientip", _7=0)
# 12 usec# 0 0# _8 := calc.timestamp(A0="2007-09-21 00:00:00")
# 5 usec# 0 0# _9 := calc.timestamp(A1="2007-09-22 00:00:00")
#477472 usec# 14M426M# _10 := algebra.uselect(_3=bat[:oid,:timestamp]{55914459}, _8=2007-09-21 00:00:00.000, _9=2007-09-22 00:00:00.000, _11=true, _11=true)
# 14 usec# 0 0# _3:bat[:oid,:timestamp] := nil;
# 18 usec# 14M 14M# _17 := algebra.markT(_10=bat[:oid,:oid]{1916813}, _16=0@0)
# 5 usec# 0 0# _10 := nil;
# 7 usec# 14M 14M# _18 := bat.reverse(_17=bat[:oid,:oid]{1916813})
# 4 usec# 0 0# _17 := nil;
# 17425 usec# 3M121M# _21 := algebra.join(_18=<~tmp_4535>bat[:oid,:oid]{1916813}, _19=bat[:oid,:sht]{55914459})
# 73145 usec# 14M 3M# (ext40,grp38):= group.new(_21=bat[:oid,:sht]{1916813})
# 2023 usec# 0 0# _21 := nil;
# 31 usec# 80200# _24 := bat.mirror(ext40=bat[:oid,:int]{5})
# 6 usec# 0 0# ext40 := nil;
# 31 usec# 80121M# _25 := algebra.joinPath(_24=bat[:oid,:oid]{5}, _18=<~tmp_4535>bat[:oid,:oid]{1916813}, _19=bat[:oid,:sht]{55914459})
# 4 usec# 0 0# _19:bat[:oid,:sht] := nil;
#754188 usec# 14M441M# _28 := algebra.join(_18=<~tmp_4535>bat[:oid,:oid]{1916813}, _26=bat[:oid,:str]{55914459})
# 6937 usec# 0 0# _18 := nil;
# 9 usec# 0 0# _26:bat[:oid,:str] := nil;
# 38470 usec# 80 29M# _29:bat[:oid,:int] := aggr.count_no_nil(_28=bat[:oid,:str]{1916813}, grp38=bat[:oid,:oid]{1916813}, _24=bat[:oid,:oid]{5})
# 21815 usec# 0 0# _28 := nil;
# 7452 usec# 0 0# grp38 := nil;
# 11 usec# 0 0# _24 := nil;
# 11 usec# 0 80# _30 := sql.resultSet(_15=2, _13=1, _25=bat[:oid,:sht]{5})
# 11 usec# 0 80# sql.rsColumn(_30=1, _32="sys.pageviews", _20="channelid", _33="smallint", _34=16, _7=0, _25=bat[:oid,:sht]{5})
# 4 usec# 0 0# _25 := nil;
# 6 usec# 0 80# sql.rsColumn(_30=1, _32="sys.pageviews", _36="count_no_nil_clientip", _37="int", _38=15, _7=0, _29=bat[:oid,:int]{5})
# 5 usec# 0 0# _29:bat[:oid,:int] := nil;
% sys.pageviews, sys.pageviews # table_name
% channelid, count_no_nil_clientip # name
% smallint, int # type
% 1, 7 # length
[ 0, 538187 ]
[ 1, 1108478 ]
[ 4, 42867 ]
[ 3, 145565 ]
[ 2, 81716 ]
# 33 usec# 0 0# sql.exportResult(_30=1, _40="")
#1399855 usec# 0 0# user.s1_1(_5="2007-09-21 00:00:00", _6="2007-09-22 00:00:00")
# 0 0
Timer 1401.374 msec
sql>trace select channelid, count(distinct clientip), count(*)
from pageviews
where timestamp between '2007-09-21 00:00:00' and '2007-09-22 00:00:00'
group by channelid;
# 9 usec# mdb.setTimer(_2=true)
# 1 usec# 0 0# mdb.setFlow(_2=true)
# 19 usec# 426M 0# _3:bat[:oid,:timestamp] := sql.bind(_4="sys", _5="pageviews", _6="timestamp", _7=0)
# 6 usec# 0 0# _12:bat[:oid,:timestamp] := sql.bind(_4="sys", _5="pageviews", _6="timestamp", _13=1)
# 5 usec# 0 0# constraints.emptySet(_12=bat[:oid,:timestamp]{0})
# 6 usec# 0 0# _12:bat[:oid,:timestamp] := nil;
# 5 usec# 0 0# _14:bat[:oid,:timestamp] := sql.bind(_4="sys", _5="pageviews", _6="timestamp", _15=2)
# 4 usec# 0 0# constraints.emptySet(_14=bat[:oid,:timestamp]{0})
# 4 usec# 0 0# _14:bat[:oid,:timestamp] := nil;
# 6 usec# 106M 0# _19:bat[:oid,:sht] := sql.bind(_4="sys", _5="pageviews", _20="channelid", _7=0)
# 5 usec# 426M 0# _26:bat[:oid,:str] := sql.bind(_4="sys", _5="pageviews", _27="clientip", _7=0)
# 27 usec# 0 0# _8 := calc.timestamp(A0="2007-09-21 00:00:00")
# 6 usec# 0 0# _9 := calc.timestamp(A1="2007-09-22 00:00:00")
#5670933 usec# 14M426M# _10 := algebra.uselect(_3=bat[:oid,:timestamp]{55914459}, _8=2007-09-21 00:00:00.000, _9=2007-09-22 00:00:00.000, _11=true, _11=true)
# 12 usec# 0 0# _3:bat[:oid,:timestamp] := nil;
# 33 usec# 14M 14M# _17 := algebra.markT(_10=bat[:oid,:oid]{1916813}, _16=0@0)
# 5 usec# 0 0# _10 := nil;
# 14 usec# 14M 14M# _18 := bat.reverse(_17=bat[:oid,:oid]{1916813})
# 5 usec# 0 0# _17 := nil;
# 20206 usec# 3M121M# _21 := algebra.join(_18=<~tmp_4533>bat[:oid,:oid]{1916813}, _19=bat[:oid,:sht]{55914459})
# 73032 usec# 14M 3M# (ext40,grp38):= group.new(_21=bat[:oid,:sht]{1916813})
# 1954 usec# 0 0# _21 := nil;
# 47 usec# 80200# _24 := bat.mirror(ext40=bat[:oid,:int]{5})
# 6 usec# 0 0# ext40 := nil;
# 53 usec# 80121M# _25 := algebra.joinPath(_24=bat[:oid,:oid]{5}, _18=<~tmp_4533>bat[:oid,:oid]{1916813}, _19=bat[:oid,:sht]{55914459})
# 5 usec# 0 0# _19:bat[:oid,:sht] := nil;
#1116473 usec# 14M441M# _28 := algebra.join(_18=<~tmp_4533>bat[:oid,:oid]{1916813}, _26=bat[:oid,:str]{55914459})
# 6845 usec# 0 0# _18 := nil;
# 7 usec# 0 0# _26:bat[:oid,:str] := nil;
# 96203 usec# 14M 14M# (_29,grp55):= group.new(grp38=bat[:oid,:oid]{1916813})
#1774153086 usec# 25M 29M# (_31,grp57):= group.derive(_29=bat[:oid,:int]{5}, grp55=bat[:oid,:oid]{1916813}, _28=bat[:oid,:str]{1916813})
# 21 usec# 0 0# _29 := nil;
# 7605 usec# 0 0# grp55 := nil;
# 15287 usec# 4M 10M# _33 := bat.mirror(_31=bat[:oid,:int]{283056})
# 5978 usec# 0 0# _31 := nil;
#121028 usec# 4M 18M# _34 := algebra.semijoin(_28=bat[:oid,:str]{1916813}, _33=bat[:oid,:oid]{283056})
# 21771 usec# 0 0# _28 := nil;
# 2345 usec# 0 0# _33 := nil;
# 11474 usec# 80 18M# _35:bat[:oid,:int] := aggr.count_no_nil(_34=bat[:oid,:str]{283056}, grp38=bat[:oid,:oid]{1916813}, _24=bat[:oid,:oid]{5})
# 5053 usec# 0 0# _34 := nil;
# 16686 usec# 80 29M# _36:bat[:oid,:int] := aggr.count(grp38=bat[:oid,:oid]{1916813}, grp38=bat[:oid,:oid]{1916813}, _24=bat[:oid,:oid]{5})
# 7407 usec# 0 0# grp38 := nil;
# 9 usec# 0 0# _24 := nil;
# 10 usec# 0 80# _37 := sql.resultSet(_38=3, _13=1, _25=bat[:oid,:sht]{5})
# 12 usec# 0 80# sql.rsColumn(_37=0, _40="sys.pageviews", _20="channelid", _41="smallint", _42=16, _7=0, _25=bat[:oid,:sht]{5})
# 5 usec# 0 0# _25 := nil;
# 6 usec# 0 80# sql.rsColumn(_37=0, _40="sys.pageviews", _44="count_no_nil_clientip", _45="int", _46=15, _7=0, _35=bat[:oid,:int]{5})
# 5 usec# 0 0# _35:bat[:oid,:int] := nil;
# 5 usec# 0 80# sql.rsColumn(_37=0, _40="sys.pageviews", _48="count_channelid", _45="int", _49=32, _7=0, _36=bat[:oid,:int]{5})
# 4 usec# 0 0# _36:bat[:oid,:int] := nil;
% sys.pageviews, sys.pageviews, sys.pageviews # table_name
% channelid, count_no_nil_clientip, count_channelid # name
% smallint, int, int # type
% 1, 6, 7 # length
[ 0, 77524, 538187 ]
[ 1, 110681, 1108478 ]
[ 4, 20120, 42867 ]
[ 3, 44457, 145565 ]
[ 2, 30274, 81716 ]
# 65 usec# 0 0# sql.exportResult(_37=0, _51="")
#1781362757 usec# 0 0# user.s0_1(_5="2007-09-21 00:00:00", _6="2007-09-22 00:00:00")
# 0 0
Timer 1784035.472 msec