possible to improve paging performance on the windows version of monetdb?

hi monetdb team, could disk paging performance be improved on windows? a query run without versus with any disk paging is kinda like the hulk versus plain old bruce banner. ;)

==background==

i'm working on a longer tutorial for http://asdfree.com/ about how to work with census microdata from 74 different countries (yes, 74!) with R and monetdb. i'm planning to release a blog post specifically for the brazilian microdata and then expand that code to work on the other 73 countries https://international.ipums.org/international-action/countries. both brazilian census bureau employees and outside users have taken a serious interest in this work, with official links from one http://www.ibge.gov.br/home/estatistica/populacao/trabalhoerendimento/pnad20..., two http://www.ibge.gov.br/home/estatistica/indicadores/trabalhoerendimento/pme_..., and three http://www.ibge.gov.br/home/xml/pof_2008_2009.shtm of their microdata pages.

r and monetdb are obviously great choices for analysts who don't have the resources for expensive computers and/or software. a former employee at insee (the french statistical bureau) who currently consults on international demography projects outlined this problem for me pretty clearly: lower-resource statistical agencies in the developing world need a good, free option like (R + monetdb) for analyzing larger data sets on the (generally cheaper) hardware they already have.

international census bureau employees and other demographers are much more likely to have statistical skills than computer programming skills, so as i work on these instructional documents, i'll aim to make the computing side of things as easy as possible. and, most central to this e-mail, much of this new potential audience uses windows. (on march 10th, 2014 http://cran-logs.rstudio.com/2014/2014-03-10.csv.gz, 67% of MonetDB.R package downloads were by windows users.)

==problem==

certain sql commands - it's hard for me to pin down which ones - tend to hang. i will run a command, look at my process manager, and see RAM start getting gobbled up (not by mserver.exe but by some invisible process -- disk paging, i think?), and then wait. mserver.exe will use between 0% and 1% of the total CPU for a few minutes, then something will "wake up" and suddenly the CPU lights up with 70%-90% usage across multiple cores. the CPU can bounce between 1% and 75% usage a few times before the full command finally finishes.

all of the time spent at 0% or 1% CPU usage - waiting for commands to execute - seems like an opportunity for improvement in the windows build?

here's an example that took almost twelve minutes for a query run on 20,000,000 records..
system.time( z <- svymean( ~v0601 , pes.design , byvar = ~v0001 , se = TRUE ) )
   user  system elapsed
  18.47    1.13  710.86
..totally awesome! but can it be improved?

==comparison between two machines==

the attached .sql code has been run on two windows machines.

on the windows server with 64 gb of ram:

# Serving database 'test', using 4 threads
# Compiled for x86_64-pc-winnt/64bit with 64bit OIDs dynamically linked
# Found 64.000 GiB available main-memory.

on my personal windows laptop:

# Serving database 'test', using 8 threads
# Compiled for x86_64-pc-winnt/64bit with 64bit OIDs dynamically linked
# Found 7.860 GiB available main-memory.

==processing time comparison==

processing time on the 64gb server: 27 tuples (5.2s)
processing time on my 8gb laptop: 27 tuples (7m 46s)

so on a windows machine, if disk paging can be avoided, this attached sql code runs about 90x faster (466 seconds versus 5.2 seconds)?

==how to load this reproducible example==

** data file with four columns **

https://drive.google.com/file/d/0B3LuuP081KDYNERWQWx5MEpZU2M/edit?usp=sharin...

** commands to load the data **

CREATE TABLE "sys"."d10" (
    "idkey"   INTEGER,
    "v0601"   VARCHAR(255),
    "pes_wgt" DOUBLE,
    "v0001"   VARCHAR(255)
);

COPY 20635472 RECORDS INTO d10 FROM 's:/temp/example.txt';

[[see .sql attachment for post-import commands]]

(an R version of this load is sketched after the diagnostics below.)

==general diagnostics==

# MonetDB 5 server v11.17.11 "Jan2014-SP1"
# Serving database 'test', using 4 threads
# Compiled for x86_64-pc-winnt/64bit with 64bit OIDs dynamically linked
# Found 64.000 GiB available main-memory.
# Copyright (c) 1993-July 2008 CWI.
# Copyright (c) August 2008-2014 MonetDB B.V., all rights reserved
# Visit http://www.monetdb.org/ for further information
# Listening for connection requests on mapi:monetdb://127.0.0.1:50000/
# MonetDB/JAQL module loaded
# MonetDB/SQL module loaded
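since my audience works from R rather than the mclient console, here's roughly how i load and query this example with MonetDB.R (a sketch: the dbname and port come from the startup banner above, and i'm using dbSendUpdate as the MonetDB.R helper for statements that return no result set):

library(MonetDB.R)

# connect to the running server (dbname/port taken from the banner above)
con <- dbConnect( MonetDB.R() , dbname = "test" , port = 50000 )

# create the table, then bulk-load the four-column text file
dbSendUpdate( con , 'CREATE TABLE "sys"."d10" (
    "idkey"   INTEGER,
    "v0601"   VARCHAR(255),
    "pes_wgt" DOUBLE,
    "v0001"   VARCHAR(255)
)' )

dbSendUpdate( con , "COPY 20635472 RECORDS INTO d10 FROM 's:/temp/example.txt'" )

# quick sanity check on the row count
dbGetQuery( con , "SELECT COUNT(*) FROM d10" )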
==conclusion question==

the windows version of monetdb is intensely slow at rather unexpected times. can and will anything be done about this in the near future?

thanks for monetdb!

if anyone has ideas on #3-5 below, i'd appreciate it. first, just following up with a few additional snags..
1) this screenshot i've uploaded
https://drive.google.com/file/d/0B3LuuP081KDYcFdOUkZkQjRvaHM/edit?usp=sharin...
shows one more example from some demonstrations i gave to a census consultant last week.. that server has 64gb of ram and still sometimes behaves like it's paging. :/ note that R (in the background) is frozen because it's waiting for monetdb to do something (there's plenty of RAM available but 0% of the cpus are being used). this issue is simply resolved by waiting.. but it's not ideal, and i don't think it can be blamed on overloaded memory.
2) i made this commit
https://github.com/ajdamico/usgsd/commit/79eb7142e876270e86e1c40ae89e588beba...
to my code to import the brazilian census. on my 8gb laptop, the code previously crashed mserver; now everything works. as you can see from the additions, the only thing that's different is a few extra server shutdowns and restarts. this was a pretty painless workaround, but it's weird to me that something as silly as a server restart would prevent a crash. the sketch below shows roughly what the additions do.
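(the function names here are the server start/stop helpers i use elsewhere in my usgsd scripts; treat the exact arguments as approximations of what the commit does:)

# ..run one chunk of the import, then bounce the server..
dbDisconnect( con )
monetdb.server.stop( pid )

# a fresh mserver process hands all of its memory back to windows
# ( 'batfile' is the startup script created during the initial setup )
pid <- monetdb.server.start( batfile )
con <- dbConnect( MonetDB.R() , dbname = "test" , port = 50000 )

# ..then run the next chunk of the import..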
..i hit issues like these all the time, and when i do, i generally feel like the windows build could use more scrutiny from people who know what should happen. if you could incorporate more windows performance reviews on machines with 16GB of RAM or less, i'd be grateful. sometimes monetdb just gets finicky, but that's hard for me to reproduce for your team.
===========
i also think it might be wise to update this guide
http://www.asdfree.com/2013/03/column-store-r-or-how-i-learned-to-stop.html
with the answers to a few additional questions:
3) at times, the monetdb-backed data sets i have written about already
https://github.com/ajdamico/usgsd/search?q=monetdb.r&ref=cmdform
do end up with disk paging in certain circumstances. my instructions to users have generally been "leave your computer overnight and let it run", but that only works for download/import/cleaning steps, not during an interactive data analysis session. can any of you give me some straightforward guidelines about "how to avoid disk paging" that would be understandable to people who may not have much experience with SQL? the memory footprint description https://www.monetdb.org/Documentation/MemoryClaim is helpful for more advanced users, but for people who have never used a database before, are there any good rules of thumb like "never go over ten columns x one million records per gigabyte of RAM on your machine"? more concrete pointers might help new users understand why some commands run very quickly and others clog the computer. i am looking for something like "how to avoid disk paging for dummies" :) the sketch below shows the kind of arithmetic i mean.
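(a back-of-the-envelope calculation i'd want to be able to teach -- i'm assuming 8 bytes per VARCHAR value for the offsets into monetdb's string heap, so treat this as a rough guess rather than the real accounting:)

rows <- 20635472    # the d10 table from my previous e-mail

# bytes per row for the columns the example query touches:
# pes_wgt (8-byte DOUBLE) plus v0601 and v0001 (VARCHARs stored as
# offsets into a string heap -- assumed 8 bytes apiece)
bytes.per.row <- 8 + 8 + 8

( rows * bytes.per.row ) / 2^30
# ..roughly 0.46 GiB for the base columns alone, before any
# intermediate results -- so an 8gb laptop loses its headroom as
# soon as a query materializes a few intermediates of similar size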
4) in addition to sheer data table size, are there specific commands that tend to hog memory and should be avoided or re-written gracefully to conserve computing resources? as a starting example (but hopefully there are others as well), could the SQL code from my previous e-mail (the code that was very slow on my 8gb laptop) be re-written in a more efficient manner? commands like the ones attached to my previous e-mail are very, very common in the sqlsurvey package's R code
http://sqlsurvey.r-forge.r-project.org/
so if the monetdb team can give me some re-write advice, i would be able to comb through dr. lumley's code and make some recommended revisions for him to improve the whole system. the sketch below shows the kind of re-write i'm imagining.
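(this is my guess -- ignoring the standard errors -- at the grouped weighted proportion that svymean( ~v0601 , pes.design , byvar = ~v0001 ) works out to, pushed into a single grouped query so only a few dozen rows ever leave the database:)

library(MonetDB.R)
con <- dbConnect( MonetDB.R() , dbname = "test" , port = 50000 )

# weighted count per (v0001, v0601) cell; only a handful of rows come
# back to R, so finishing the arithmetic there costs nothing
cells <- dbGetQuery( con ,
    "SELECT v0001 , v0601 , SUM( pes_wgt ) AS wsum
       FROM d10
      GROUP BY v0001 , v0601" )

# weighted share of each v0601 category within each v0001 group
cells$share <- cells$wsum / ave( cells$wsum , cells$v0001 , FUN = sum )

would a single pass like this dodge the big intermediates, or does monetdb already plan my original code this way?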
5) can someone just confirm that "how to detect when your disk is paging" is pretty straightforward on microsoft windows: (a) open task manager, (b) click the "performance" tab, (c) look for zero CPU usage alongside overloaded RAM usage, and optionally (d) click on "resource monitor" and note the heavy movement under "disk activity". ..i think that's it, but maybe there are other things to look for? one non-GUI check is sketched below.
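(a numeric check that can be run from inside an R session -- typeperf ships with windows, and i believe "\Memory\Pages/sec" is the standard counter for pages moving to and from disk:)

# sample the paging counter ten times, one second apart; sustained
# readings in the hundreds or thousands while mserver sits at 0-1%
# CPU would confirm the disk-paging theory
system( 'typeperf "\\Memory\\Pages/sec" -sc 10' )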
my apologies for being a pita :/ and thanks again for your work!! i still get excited when i see a giant, complicated query finish in a few seconds.