We attached gdb after the database hung during the COPY ... INTO phase (note the server was set to run in single-thread mode, i.e. gdk_nr_threads = 1):

All of the threads look suspect, in the sense that they point to either a deadlock or mmap-related issues, both of which are quite difficult to debug.
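
For reference, the trace below was captured along these lines, by attaching gdb to the running mserver5 process (the PID is the one from this session):

    $ gdb -p 41456
    (gdb) info threads
    (gdb) thread apply all bt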

 gdb's "info threads" gives:

  6 process 41456 thread 0x2603  0x00007fff846c6d7e in semaphore_wait_signal_trap ()                                                                                                                                                                                                    
  5 process 41456 thread 0x1403  0x00007fff846c6d7e in semaphore_wait_signal_trap ()                                                                                                                                                                                                    
  4 process 41456 thread 0x1203  0x00007fff84745886 in msync ()                                                                                                                                                                                                                         
  3 process 41456 thread 0x1103  0x00007fff84712526 in select$DARWIN_EXTSN ()                                                                                                                                                                                                           
  2 process 41456 thread 0xf03  0x00007fff846cdd02 in __semwait_signal ()                                                                                                                                                                                                               
* 1 process 41456 thread 0x10b  0x00007fff846cdd02 in __semwait_signal ()

The backtrace for each thread is:

(gdb) thread apply all bt

Thread 6 (process 41456 thread 0x2603):
#0  0x00007fff846c6d7e in semaphore_wait_signal_trap ()
#1  0x00007fff846ce698 in pthread_mutex_lock ()
#2  0x0000000109dc00fe in store_lock ()
#3  0x0000000109d8375c in mvc_create ()
#4  0x0000000109d3e8fa in SQLinitClient (c=0x100115db0) at sql_scenario.c:315
#5  0x000000010005c2c9 in runPhase (c=0x100115db0, phase=5) at mal_scenario.c:352
#6  0x000000010005c332 in runScenarioBody (c=0x100115db0) at mal_scenario.c:375
#7  0x000000010005c5ea in runScenario (c=0x100115db0) at mal_scenario.c:420
#8  0x0000000100017d35 in MSserveClient (dummy=0x100115db0) at mal_session.c:360
#9  0x00007fff846f4dcb in _pthread_start ()
#10 0x00007fff846f4c8d in thread_start ()

Thread 5 (process 41456 thread 0x1403):
#0  0x00007fff846c6d7e in semaphore_wait_signal_trap ()
#1  0x00007fff846ce698 in pthread_mutex_lock ()
#2  0x0000000109dc00fe in store_lock ()
#3  0x0000000109d82ad1 in mvc_trans ()
#4  0x0000000109d3ebe1 in SQLcacheRefresh (m=0x101872208) at sql_scenario.c:386
#5  0x0000000109d3fb00 in SQLparser (c=0x100115b18) at sql_scenario.c:788
#6  0x000000010005c2c9 in runPhase (c=0x100115b18, phase=1) at mal_scenario.c:352
#7  0x000000010005c3c0 in runScenarioBody (c=0x100115b18) at mal_scenario.c:386
#8  0x000000010005c5ea in runScenario (c=0x100115b18) at mal_scenario.c:420
#9  0x0000000100017d35 in MSserveClient (dummy=0x100115b18) at mal_session.c:360
#10 0x00007fff846f4dcb in _pthread_start ()
#11 0x00007fff846f4c8d in thread_start ()

Thread 4 (process 41456 thread 0x1203):
#0  0x00007fff84745886 in msync ()
#1  0x0000000100c38190 in MT_msync (p=0x9ac507000, off=0, len=4391849608, mode=16384) at gdk_posix.c:800
#2  0x0000000100b0ea7c in GDKsave (nme=0x101331d78 "11/1150", ext=0x100d64104 "tail", buf=0x9ac507000, size=4391849608, mode=1) at gdk_storage.c:211
#3  0x000000010089fcc6 in HEAPsave (h=0x10a8ffdb0, nme=0x101331d78 "11/1150", ext=0x100d64104 "tail") at gdk_heap.c:296
#4  0x0000000100b10dfd in BATsave (bd=0x10a02f808) at gdk_storage.c:470
#5  0x000000010089d348 in BBPsync (cnt=250, subcommit=0x10a8fff20) at gdk_bbp.c:2905
#6  0x0000000100886ce7 in TMsubcommit (b=0xc0e057458) at gdk_tm.c:115
#7  0x0000000100c3bc28 in bm_subcommit (list=0x10130ee78, catalog=0x10130ee78, extra=0xc0e07bf38, debug=0) at gdk_logger.c:838
#8  0x0000000100c3eded in bm_commit (lg=0x109f74f08) at gdk_logger.c:1559
#9  0x0000000100c3afc3 in logger_commit (lg=0x109f74f08) at gdk_logger.c:776
#10 0x0000000100c3ca0e in logger_exit (lg=0x109f74f08) at gdk_logger.c:1053
#11 0x0000000100c3cb1d in logger_restart (lg=0x109f74f08) at gdk_logger.c:1080
#12 0x0000000109dca6fd in bl_restart ()
#13 0x0000000109dc0013 in store_manager ()
#14 0x0000000109d82a37 in mvc_logmanager ()
#15 0x00007fff846f4dcb in _pthread_start ()
#16 0x00007fff846f4c8d in thread_start ()

Thread 3 (process 41456 thread 0x1103):
#0  0x00007fff84712526 in select$DARWIN_EXTSN ()
#1  0x00000001087f1e35 in SERVERlistenThread (Sock=0x1094ba0f8) at mal_mapi.c:152
#2  0x00007fff846f4dcb in _pthread_start ()
#3  0x00007fff846f4c8d in thread_start ()

Thread 2 (process 41456 thread 0xf03):
#0  0x00007fff846cdd02 in __semwait_signal ()
#1  0x00007fff84735f27 in nanosleep ()
#2  0x0000000100c3882f in MT_sleep_ms (ms=5000) at gdk_posix.c:1774
#3  0x000000010096c45b in GDKvmtrim (limit=0x100d74868) at gdk_utils.c:1350
#4  0x00007fff846f4dcb in _pthread_start ()
#5  0x00007fff846f4c8d in thread_start ()

Thread 1 (process 41456 thread 0x10b):
#0  0x00007fff846cdd02 in __semwait_signal ()
#1  0x00007fff84735f27 in nanosleep ()
#2  0x0000000100c3882f in MT_sleep_ms (ms=5000) at gdk_posix.c:1774
#3  0x0000000100002dd7 in main (argc=7, av=0x7fff5fbfee80) at mserver5.c:514
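
Reading the traces together: thread 4 appears to hold the store lock while msync'ing a ~4.4 GB heap inside the logger commit, and threads 5 and 6 block on that same lock in store_lock(). A minimal C sketch of that pattern (names are illustrative, modeled on the trace, not MonetDB's actual code):

    #include <pthread.h>
    #include <stddef.h>
    #include <sys/mman.h>

    static pthread_mutex_t store_mtx = PTHREAD_MUTEX_INITIALIZER;

    /* Thread 4 (logger commit): flushes a large heap while holding
     * the lock; an msync of ~4.4 GB can block for minutes. */
    void commit_path(void *base, size_t len)
    {
        pthread_mutex_lock(&store_mtx);
        msync(base, len, MS_SYNC);
        pthread_mutex_unlock(&store_mtx);
    }

    /* Threads 5 and 6 (new sessions/transactions): stuck here until
     * the commit path releases the lock. */
    void session_path(void)
    {
        pthread_mutex_lock(&store_mtx);
        /* ... mvc_create / mvc_trans work ... */
        pthread_mutex_unlock(&store_mtx);
    }

If so, the "hang" may be a very slow flush rather than a true deadlock, which fits the mmap/msync suspicion above.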
 



On Thu, Mar 19, 2009 at 6:01 PM, Yue Sheng <yuesheng8@gmail.com> wrote:
I managed to COPY 161 files into one table and another 162 files into a second table. But when I run a simple select count(*) from table2; it fails (no result, it just hangs)...

Has anyone successfully run this database on the Mac, in 64-bit, with a table (not database) size larger than 500 million rows (10 columns)?


On Thu, Mar 19, 2009 at 8:59 AM, Yue Sheng <yuesheng8@gmail.com> wrote:
I have now tried the same bulk load with COPY on the latest release (v5.10.0) with the single-thread setting.

Failed again!

System:
OS X
MonetDB compiled in 64-bit
Number of files to load: 322
Total size of files: 15 GB
Max rows per file: 3 million
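
For reference, one way to apply the single-thread setting mentioned at the top of this thread is on the server command line (the dbpath here is a placeholder):

    mserver5 --dbpath=/path/to/db --set gdk_nr_threads=1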

On Wed, Mar 18, 2009 at 11:51 AM, Martin Kersten <Martin.Kersten@cwi.nl> wrote:
Yue Sheng wrote:
I'm not sure how "The (parallel) load used scratch area as well" is related to the question.
If you look at the code, you will notice that there is a two-phase
loading process involving (possibly) multiple threads.

Sorry if I'm a bit slow.

On Wed, Mar 18, 2009 at 11:25 AM, Martin Kersten <Martin.Kersten@cwi.nl> wrote:

   Yue Sheng wrote:

       Sorry if I wasn't clear on the first question:

       (1) We ramp up N for the first insert to claim sufficient space.
       Sure, I understand that one.

       But:

       The claimed space got "given back" *right after* the first insert.
       (This is the part I don't understand.)

   The (parallel) load used a scratch area as well.

       Question: how do the second, third, ... inserts get the "benefit"
       of the ramp-up that we did for the first insert?

       Does this make my question a bit clearer?

       Thanks.


       On Wed, Mar 18, 2009 at 10:26 AM, Martin Kersten
       <Martin.Kersten@cwi.nl> wrote:

          Yue Sheng wrote:

              Three questions that bother me are:
              (1) Why do we need to ramp N up to the total of all lines
              in the first insert?

          to let the kernel claim sufficient space

              The reason I ask is that right after the first insert, the
              allocation drops right down from, say, 100 GB to 35 GB, and
              stays roughly there for *all* subsequent inserts. I totally
              do not understand this.
              (2) In your opinion, based on this experience, what could
              be the potential problem here?

          little to none, as the files are memory mapped, which may only
          cause io on some systems

              (3) In your opinion, would the newer version cure the
              problem?

          a system can never correctly guess what will come,
          especially since the source of a COPY command need not be a file
          but standard input, i.e. a stream.
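
          (For example, MonetDB's streaming form of the command looks
          like:

              COPY INTO table2 FROM STDIN USING DELIMITERS ',', '\n';

          with the client then supplying rows inline; the table name is
          borrowed from elsewhere in this thread for illustration.)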


              Thanks.


              On Tue, Mar 17, 2009 at 10:51 PM, Martin Kersten
              <Martin.Kersten@cwi.nl> wrote:

                 Yue Sheng wrote:

                      Martin,

                      It almost worked...

                      This is what I did and what happened:

                      I have 322 files to insert into the database,
                      totaling 650 million rows.

                      I divided the file list into two; then, for each
                      sub-list:

                      (a) I insert the first file in the list with N set
                      to 650 million rows; (b) all subsequent files have
                      N set to the number of lines in *that* file.

                      Once the first list is done, then:

                      (c) I insert the first file in the second list with
                      N set to 650 million rows; (d) all subsequent files
                      have N set to the number of lines in *that* file.

                      Then the same problem happened: it got stuck at
                      file number 316.
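
                      (For reference, step (a) corresponds to a statement
                      of roughly this shape; the file path is a
                      placeholder:

                          COPY 650000000 RECORDS INTO table2
                          FROM '/path/to/file001.csv'
                          USING DELIMITERS ',', '\n';

                      steps (b) and (d) use the per-file line count in
                      place of 650000000.)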


                  ok. using the 650M enables MonetDB to allocate enough
                  space and not have to fall back on guessing. Guessing
                  is painful, because when a file of N records has been
                  created and it needs more, it makes a file of size
                  1.3xN. This leads to memory fragmentation.
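
                  (Illustrative arithmetic: a heap created for N = 500M
                  records that then needs more is regrown to 1.3 x 500M =
                  650M records; repeated growth by a factor of 1.3 leaves
                  behind a trail of differently sized mappings, which is
                  the fragmentation referred to here.)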

                  in your case i would have been a little more spacious
                  and used 700M as a start, because a miscalculation of 1
                  gives a lot of pain. Such advice is only needed in (a).


                      Note: This is farther than previous tries, which
                      all stopped in the region of file number 280 +/- a
                      few.

                      My observations:
                      (i) At (a), the VSIZE went up to around 46 GB;
                      then, after the first insert, it dropped to around
                      36 GB.

                 ok fits

                      (ii) At (c), the VSIZE went up to around 130 GB;
                      then, after the first insert, it dropped to around
                      45 GB.

                  you tell the system to extend the existing BATs to
                  prepare for another 650M, which means it allocates
                  2*36 GB; plus room for the old one, that gives 108 GB.
                  then during processing some temporary BATs may be
                  needed, e.g. to check integrity constraints after each
                  file. Then it runs out of swap space.

                     (iii) the "Free Memory", as reported by Activity
       Monitor,
              just
                     before it failed at file number 316, dipped to as
       low as
              7.5MB!

                  yes, you are running out of swap space on your system.
                  This should not have happened, because the system uses
                  mmapped files; it may be an issue with MacOS or relate
                  to a problem we fixed recently.



                      My questions:
                      (1) Why do we need to ramp N up to the total number
                      of lines (it takes a long time to do that), only to
                      have it drop down to 30-40 GB right after

                  this might indicate that on MacOS, just like Windows,
                  mmapped files need to be written to disk. With a disk
                  bandwidth of 50 MB/sec it still takes several minutes.
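
                  (Worked example: flushing a 36 GB working set at
                  50 MB/sec takes roughly 36000 / 50 = 720 seconds, i.e.
                  about 12 minutes.)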

                      the first insert, and stay roughly there? Does it
                      mean we're giving all the pre-allocated space back
                      to the OS? Then should we always set N to the total
                      number of lines? If so, it would take much, much
                      longer to process all the files...
                      (2) How come RSIZE never goes above 4 GB?
                      (3) Does the sql log file size have some limit that
                      we need to tweak?

                 no limit

                      (4) Has anyone successfully deployed the 64-bit
                      version of MonetDB and inserted more than 1 billion
                      rows?

                  your platform may be the first, but Amherst has worked
                  with Macs for years.

                      (5) When you say "...The VSIZE of 44G is not too
                      problematic, i am looking at queries letting it
                      tumble between 20-80 GB...", what does it mean?
                      Mine went up to as high as 135 GB...

                 explained above.

                 regards, Martin


                     Thanks, as always.