hi monetdb team, could disk paging performance be improved on windows?  a query run without versus with any disk paging is kinda like the hulk versus plain old bruce banner.  ;)



==background==

i'm working on a longer tutorial for http://asdfree.com/ about how to work with census microdata from 74 different countries (yes, 74!) with R and monetdb.  i'm planning to release a blog post specifically for the brazilian microdata and then expand that code to work on the other 73 countries.  both brazilian census bureau employees and outside users have taken a serious interest in this work, with official links from one, two, three of their microdata pages.

r and monetdb are obviously great choices for analysts who don't have the resources for expensive computers and/or software.  a former employee at insee (the french stat. bureau) who currently consults on international demography projects outlined this problem for me pretty clearly: lower-resource statistical agencies in the developing world need a good, free option like (R + monetdb) for analysis of larger data sets on the (generally cheaper) hardware that they already have.

international census bureau employees and other demographers are much more likely to have statistical but not computer programming skills, so as i work on these instructional documents, i'll aim to make the computing side of things as easy as possible.  and, most central to this e-mail, much of this new potential audience uses windows.  (on march 10th, 2014, 67% of MonetDB.R package downloads were windows users)



==problem==

certain sql commands - it's hard for me to pin down which ones - tend to hang.  i will run a command, look at my "process manager" and see RAM start getting gobbled up (not by mserver.exe but by some invisible process--i think disk paging?), and then wait.  mserver.exe will use between 0% and 1% of the total CPU for a few minutes, then something will "wake up" and suddenly the CPU starts lighting up.. with 70%-90% usage across multiple cores.  this 1% CPU usage to 75% CPU usage can jump back and forth a few times until the full command finally finishes.

all of the time spent with 0% or 1% CPU usage - waiting for commands to execute - seems like an opportunity for improvement of the windows build?

here's an example that took 11 minutes for a query run on 20,000,000 records..

> system.time( z <- svymean( ~v0601 , pes.design , byvar = ~v0001 , se = TRUE ) )
   user  system elapsed
  18.47    1.13  710.86

..totally awesome!  but can it be improved?



==comparison between two machines==

the attached .sql code has been run on two windows machines:


on the windows server with 64 gb of ram--

# Serving database 'test', using 4 threads
# Compiled for x86_64-pc-winnt/64bit with 64bit OIDs dynamically linked
# Found 64.000 GiB available main-memory.

on my personal windows laptop--

# Serving database 'test', using 8 threads
# Compiled for x86_64-pc-winnt/64bit with 64bit OIDs dynamically linked
# Found 7.860 GiB available main-memory.



==processing time comparison==

processing time on the 64gb server: 27 tuples (5.2s)

processing time on my 8gb laptop: 27 tuples (7m 46s)


so on a windows machine, if disk paging can be avoided, the processing time is 90x faster for this attached sql code?



==how to load this reproducible example==

** data file with four columns **
https://drive.google.com/file/d/0B3LuuP081KDYNERWQWx5MEpZU2M/edit?usp=sharing


** commands to load the data **
CREATE TABLE "sys"."d10" (
        "idkey"   INTEGER,
        "v0601"   VARCHAR(255),
        "pes_wgt" DOUBLE,
        "v0001"   VARCHAR(255)
);

COPY 20635472 RECORDS INTO d10 FROM 's:/temp/example.txt';

[[see .sql attachment for post-import commands]]



==general diagnostics==

# MonetDB 5 server v11.17.11 "Jan2014-SP1"
# Serving database 'test', using 4 threads
# Compiled for x86_64-pc-winnt/64bit with 64bit OIDs dynamically linked
# Found 64.000 GiB available main-memory.
# Copyright (c) 1993-July 2008 CWI.
# Copyright (c) August 2008-2014 MonetDB B.V., all rights reserved
# Visit http://www.monetdb.org/ for further information
# Listening for connection requests on mapi:monetdb://127.0.0.1:50000/
# MonetDB/JAQL module loaded
# MonetDB/SQL module loaded
>



==conclusion question==

the windows version of monetdb is intensely slow at rather unexpected times.  can and will anything be done about this in the near future?


thanks for monetdb!