if anyone has ideas on #3-5 below, i'd appreciate it.
first just following up with a few additional snags..
1) this screenshot i've uploaded shows one more example from some demonstrations i gave to a census consultant last week.. that server has 64gb of ram and still sometimes behaves like it's paging. :/
note that R (in the background) is frozen because it's waiting for monetdb to finish something (there's plenty of RAM available but 0% of the cpus are being used). the issue resolves itself if you simply wait.. but that's not ideal, and i don't think it can be blamed on overloaded memory.
2) i made this commit to my code to import the brazilian census. on my 8gb laptop: previously, the code crashed mserver; now everything works. as you can see from the additions, the only difference is a few extra server shutdowns and restarts. this was a pretty painless workaround, but it's strange to me that something as simple as a server restart would prevent a crash.
..i hit issues like these all the time, and when i do, i generally feel like the windows build could use some more scrutiny from people who know what should happen. if you could incorporate more windows performance reviews on machines with 16GB of ram or less, i'd be grateful. sometimes monetdb just gets finicky, and that's hard for me to reproduce for your team.
===========
i also think it might be wise to update this guide with the answers to a few additional questions:
3) at times, the monetdb-backed data sets i have written about do end up disk paging in certain circumstances. my instruction to users has generally been "leave your computer overnight and let it run", but that only works during download/import/cleaning, not during an interactive data analysis session. can any of you give me some straightforward guidelines on "how to avoid disk paging" that would be understandable to people who may not have much experience with SQL?
the memory footprint description is helpful for more advanced users, but for people who have never used a database before, are there any good rules of thumb like "never go over ten columns x one million records per gigabyte of RAM on your machine" ? more concrete pointers might help new users understand why some commands run very quickly and others clog the computer. i am looking for something like "how to avoid disk paging for dummies" :)
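to illustrate the kind of rule of thumb i mean, here's a back-of-envelope sketch in python. every number in it is a guess on my part (8 bytes per stored value, a 50% safety margin, made-up table sizes) rather than monetdb's actual storage cost, so please correct the figures:

```python
# back-of-envelope check: will this table fit comfortably in RAM?
# assumption: roughly 8 bytes per stored value (a guess, not
# monetdb's real per-column cost) plus a safety factor so the OS
# and other programs keep some headroom.

def fits_in_ram(rows, cols, ram_gb, bytes_per_value=8, safety_factor=0.5):
    """return (estimated_gb, ok) for a rows x cols table on a machine
    with ram_gb of memory, keeping (1 - safety_factor) of RAM free."""
    estimated_gb = rows * cols * bytes_per_value / 1024 ** 3
    return estimated_gb, estimated_gb <= ram_gb * safety_factor

# hypothetical census extract: 25 million records x 40 columns
# on an 8gb laptop -- roughly 7.5 gb estimated, so expect paging
est, ok = fits_in_ram(rows=25_000_000, cols=40, ram_gb=8)
print(f"~{est:.1f} gb estimated; {'fine' if ok else 'expect paging'}")
```

even if the constants are wrong, something shaped like this (records x columns x bytes, compared against available RAM) is the level of concreteness new users could actually apply.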
4) in addition to sheer data table size, are there specific commands that tend to hog memory and should be avoided or re-written to conserve computing resources? as a starting example (though hopefully there are others as well), could the SQL code from my previous e-mail (the code that was very slow on my 8gb laptop) be re-written more efficiently? commands like the ones attached to my previous e-mail are very common in the sqlsurvey package's R code, so if the monetdb team can give me some re-write advice, i could comb through dr. lumley's code and recommend revisions for him to improve the whole system.
5) can someone just confirm that "how to detect when your disk is paging" is pretty straightforward on microsoft windows: (a) open task manager (b) click the "performance" tab (c) look for near-zero CPU usage alongside maxed-out RAM usage, and optionally (d) click on "resource monitor" and note the heavy activity under "disk activity" ..i think that's it, but maybe there are other things to look for?
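for what it's worth, here is the step (c) heuristic written out as a tiny python sketch, just to make my question precise. the two thresholds (5% cpu, 90% ram) are my own guesses and exactly what i'd like someone to confirm or correct:

```python
# sketch of the "step (c)" heuristic: a near-idle CPU combined with
# nearly-full RAM usually means the process is stuck waiting on disk.
# the thresholds below are guesses, not official figures.

def looks_like_paging(cpu_percent, ram_used_gb, ram_total_gb,
                      cpu_idle_threshold=5.0, ram_full_fraction=0.9):
    """return True when task-manager-style readings suggest disk
    paging: CPU almost idle while RAM is close to exhausted."""
    ram_fraction = ram_used_gb / ram_total_gb
    return (cpu_percent <= cpu_idle_threshold
            and ram_fraction >= ram_full_fraction)

# readings like my screenshot: 0% cpu with 63.5 of 64 gb in use
print(looks_like_paging(cpu_percent=0, ram_used_gb=63.5, ram_total_gb=64))
```

if there are other signals worth checking (pages/sec counters, disk queue length, etc.), i'd happily fold them into the guide as additional steps.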