We attached gdb after the database hung during the COPY ... INTO phase (note the server was set to run in single-thread mode, i.e. gdk_nr_threads = 1):

All of the threads look suspect, in the sense that they point to either a deadlock or mmap-related issues, both of which are quite difficult to debug.
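
For reference, the trace below was captured along these lines, by attaching gdb to the running mserver5 process (the PID is the one from this session):

    $ gdb -p 41456
    (gdb) info threads
    (gdb) thread apply all bt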

 gdb's "info threads" gives:

  6 process 41456 thread 0x2603  0x00007fff846c6d7e in semaphore_wait_signal_trap ()                                                                                                                                                                                                    
  5 process 41456 thread 0x1403  0x00007fff846c6d7e in semaphore_wait_signal_trap ()                                                                                                                                                                                                    
  4 process 41456 thread 0x1203  0x00007fff84745886 in msync ()                                                                                                                                                                                                                         
  3 process 41456 thread 0x1103  0x00007fff84712526 in select$DARWIN_EXTSN ()                                                                                                                                                                                                           
  2 process 41456 thread 0xf03  0x00007fff846cdd02 in __semwait_signal ()                                                                                                                                                                                                               
* 1 process 41456 thread 0x10b  0x00007fff846cdd02 in __semwait_signal ()

The backtrace for each thread is:

(gdb) thread apply all bt

Thread 6 (process 41456 thread 0x2603):
#0  0x00007fff846c6d7e in semaphore_wait_signal_trap ()
#1  0x00007fff846ce698 in pthread_mutex_lock ()
#2  0x0000000109dc00fe in store_lock ()
#3  0x0000000109d8375c in mvc_create ()
#4  0x0000000109d3e8fa in SQLinitClient (c=0x100115db0) at sql_scenario.c:315
#5  0x000000010005c2c9 in runPhase (c=0x100115db0, phase=5) at mal_scenario.c:352
#6  0x000000010005c332 in runScenarioBody (c=0x100115db0) at mal_scenario.c:375
#7  0x000000010005c5ea in runScenario (c=0x100115db0) at mal_scenario.c:420
#8  0x0000000100017d35 in MSserveClient (dummy=0x100115db0) at mal_session.c:360
#9  0x00007fff846f4dcb in _pthread_start ()
#10 0x00007fff846f4c8d in thread_start ()

Thread 5 (process 41456 thread 0x1403):
#0  0x00007fff846c6d7e in semaphore_wait_signal_trap ()
#1  0x00007fff846ce698 in pthread_mutex_lock ()
#2  0x0000000109dc00fe in store_lock ()
#3  0x0000000109d82ad1 in mvc_trans ()
#4  0x0000000109d3ebe1 in SQLcacheRefresh (m=0x101872208) at sql_scenario.c:386
#5  0x0000000109d3fb00 in SQLparser (c=0x100115b18) at sql_scenario.c:788
#6  0x000000010005c2c9 in runPhase (c=0x100115b18, phase=1) at mal_scenario.c:352
#7  0x000000010005c3c0 in runScenarioBody (c=0x100115b18) at mal_scenario.c:386
#8  0x000000010005c5ea in runScenario (c=0x100115b18) at mal_scenario.c:420
#9  0x0000000100017d35 in MSserveClient (dummy=0x100115b18) at mal_session.c:360
#10 0x00007fff846f4dcb in _pthread_start ()
#11 0x00007fff846f4c8d in thread_start ()

Thread 4 (process 41456 thread 0x1203):
#0  0x00007fff84745886 in msync ()
#1  0x0000000100c38190 in MT_msync (p=0x9ac507000, off=0, len=4391849608, mode=16384) at gdk_posix.c:800
#2  0x0000000100b0ea7c in GDKsave (nme=0x101331d78 "11/1150", ext=0x100d64104 "tail", buf=0x9ac507000, size=4391849608, mode=1) at gdk_storage.c:211
#3  0x000000010089fcc6 in HEAPsave (h=0x10a8ffdb0, nme=0x101331d78 "11/1150", ext=0x100d64104 "tail") at gdk_heap.c:296
#4  0x0000000100b10dfd in BATsave (bd=0x10a02f808) at gdk_storage.c:470
#5  0x000000010089d348 in BBPsync (cnt=250, subcommit=0x10a8fff20) at gdk_bbp.c:2905
#6  0x0000000100886ce7 in TMsubcommit (b=0xc0e057458) at gdk_tm.c:115
#7  0x0000000100c3bc28 in bm_subcommit (list=0x10130ee78, catalog=0x10130ee78, extra=0xc0e07bf38, debug=0) at gdk_logger.c:838
#8  0x0000000100c3eded in bm_commit (lg=0x109f74f08) at gdk_logger.c:1559
#9  0x0000000100c3afc3 in logger_commit (lg=0x109f74f08) at gdk_logger.c:776
#10 0x0000000100c3ca0e in logger_exit (lg=0x109f74f08) at gdk_logger.c:1053
#11 0x0000000100c3cb1d in logger_restart (lg=0x109f74f08) at gdk_logger.c:1080
#12 0x0000000109dca6fd in bl_restart ()
#13 0x0000000109dc0013 in store_manager ()
#14 0x0000000109d82a37 in mvc_logmanager ()
#15 0x00007fff846f4dcb in _pthread_start ()
#16 0x00007fff846f4c8d in thread_start ()

Thread 3 (process 41456 thread 0x1103):
#0  0x00007fff84712526 in select$DARWIN_EXTSN ()
#1  0x00000001087f1e35 in SERVERlistenThread (Sock=0x1094ba0f8) at mal_mapi.c:152
#2  0x00007fff846f4dcb in _pthread_start ()
#3  0x00007fff846f4c8d in thread_start ()

Thread 2 (process 41456 thread 0xf03):
#0  0x00007fff846cdd02 in __semwait_signal ()
#1  0x00007fff84735f27 in nanosleep ()
#2  0x0000000100c3882f in MT_sleep_ms (ms=5000) at gdk_posix.c:1774
#3  0x000000010096c45b in GDKvmtrim (limit=0x100d74868) at gdk_utils.c:1350
#4  0x00007fff846f4dcb in _pthread_start ()
#5  0x00007fff846f4c8d in thread_start ()

Thread 1 (process 41456 thread 0x10b):
#0  0x00007fff846cdd02 in __semwait_signal ()
#1  0x00007fff84735f27 in nanosleep ()
#2  0x0000000100c3882f in MT_sleep_ms (ms=5000) at gdk_posix.c:1774
#3  0x0000000100002dd7 in main (argc=7, av=0x7fff5fbfee80) at mserver5.c:514
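
Reading the traces together: thread 4 appears to hold the store lock while msync'ing a ~4.4 GB heap inside the logger commit, and threads 5 and 6 block on that same lock in store_lock(). A minimal C sketch of that pattern (names are illustrative, modeled on the trace, not MonetDB's actual code):

    #include <pthread.h>
    #include <stddef.h>
    #include <sys/mman.h>

    static pthread_mutex_t store_mtx = PTHREAD_MUTEX_INITIALIZER;

    /* Thread 4 (logger commit): flushes a large heap while holding
     * the lock; an msync of ~4.4 GB can block for minutes. */
    void commit_path(void *base, size_t len)
    {
        pthread_mutex_lock(&store_mtx);
        msync(base, len, MS_SYNC);
        pthread_mutex_unlock(&store_mtx);
    }

    /* Threads 5 and 6 (new sessions/transactions): stuck here until
     * the commit path releases the lock. */
    void session_path(void)
    {
        pthread_mutex_lock(&store_mtx);
        /* ... mvc_create / mvc_trans work ... */
        pthread_mutex_unlock(&store_mtx);
    }

If so, the "hang" may be a very slow flush rather than a true deadlock, which fits the mmap/msync suspicion above.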
 



On Thu, Mar 19, 2009 at 6:01 PM, Yue Sheng <yuesheng8@gmail.com> wrote:
I managed to COPY 161 files into one table and another 162 files into a second table. But when I run a simple select count(*) from table2; it fails (no result, it just hangs)...

Has anyone successfully run this database on the Mac, in 64-bit, with a table (not database) size larger than 500 million rows (10 columns)?


On Thu, Mar 19, 2009 at 8:59 AM, Yue Sheng <yuesheng8@gmail.com> wrote:
I have now tried the same bulk load with COPY on the latest release (v5.10.0) with the single-thread setting.

Failed again!

System:
OS X
MonetDB compiled in 64-bit
Number of files to load: 322
Total size of files: 15 GB
Max rows per file: 3 million
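
For reference, one way to apply the single-thread setting mentioned at the top of this thread is on the server command line (the dbpath here is a placeholder):

    mserver5 --dbpath=/path/to/db --set gdk_nr_threads=1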

On Wed, Mar 18, 2009 at 11:51 AM, Martin Kersten <Martin.Kersten@cwi.nl> wrote:
Yue Sheng wrote:
I'm not sure how "The (parallel) load used scratch area as well" is related to the question.
If you look at the code, you will notice that there is a two-phase
loading process involving (possibly) multiple threads.

Sorry if I'm a bit slow.

On Wed, Mar 18, 2009 at 11:25 AM, Martin Kersten <Martin.Kersten@cwi.nl> wrote:

   Yue Sheng wrote:

       Sorry if I wasn't clear on the first question:

       (1) We ramp up N for the first insert to claim sufficient space.
       Sure, I understand that one.

       But:

       The claimed space got "given back" *right after* the first insert.
       (This is the part I don't understand.)

   The (parallel) load used a scratch area as well.

       Question: how do the second, third, ... inserts get the "benefit"
       of the ramp-up that we did for the first insert?

       Does this make my question a bit clearer?

       Thanks.


       On Wed, Mar 18, 2009 at 10:26 AM, Martin Kersten
       <Martin.Kersten@cwi.nl> wrote:

          Yue Sheng wrote:

              Three questions that bother me are:
              (1) Why do we need to ramp N up to the total of all lines
              in the first insert?

          to let the kernel claim sufficient space

              The reason I ask is that right after the first insert, the
              allocation drops right down from, say, 100 GB to 35 GB, and
              stays roughly there for *all* subsequent inserts. I totally
              do not understand this.
              (2) In your opinion, based on this experience, what could
              be the potential problem here?

          little to none, as the files are memory mapped, which may only
          cause io on some systems

              (3) In your opinion, would the newer version cure the
              problem?

          a system can never correctly guess what will come,
          especially since the source of a COPY command need not be a file
          but standard input, i.e. a stream.
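
          (For example, MonetDB's streaming form of the command looks
          like:

              COPY INTO table2 FROM STDIN USING DELIMITERS ',', '\n';

          with the client then supplying rows inline; the table name is
          borrowed from elsewhere in this thread for illustration.)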


              Thanks.


              On Tue, Mar 17, 2009 at 10:51 PM, Martin Kersten
              <Martin.Kersten@cwi.nl> wrote:

                 Yue Sheng wrote:

                      Martin,

                      It almost worked...

                      This is what I did and what happened:

                      I have 322 files to insert into the database,
                      totaling 650 million rows.

                      I divided the file list into two; then, for each
                      sub-list:

                      (a) I insert the first file in the list with N set
                      to 650 million rows; (b) all subsequent files have
                      N set to the number of lines in *that* file.

                      Once the first list is done, then:

                      (c) I insert the first file in the second list with
                      N set to 650 million rows; (d) all subsequent files
                      have N set to the number of lines in *that* file.

                      Then the same problem happened: it got stuck at
                      file number 316.
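
                      (For reference, step (a) corresponds to a statement
                      of roughly this shape; the file path is a
                      placeholder:

                          COPY 650000000 RECORDS INTO table2
                          FROM '/path/to/file001.csv'
                          USING DELIMITERS ',', '\n';

                      steps (b) and (d) use the per-file line count in
                      place of 650000000.)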


                  ok. using the 650M enables MonetDB to allocate enough
                  space and not have to fall back on guessing. Guessing
                  is painful, because when a file of N records has been
                  created and it needs more, it makes a file of size
                  1.3xN. This leads to memory fragmentation.
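
                  (Illustrative arithmetic: a heap created for N = 500M
                  records that then needs more is regrown to 1.3 x 500M =
                  650M records; repeated growth by a factor of 1.3 leaves
                  behind a trail of differently sized mappings, which is
                  the fragmentation referred to here.)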

                  in your case i would have been a little more spacious
                  and used 700M as a start, because a miscalculation of 1
                  gives a lot of pain. Such advice is only needed in (a).


                      Note: This is farther than previous tries, which
                      all stopped in the region of file number 280 +/- a
                      few.

                      My observations:
                      (i) At (a), the VSIZE went up to around 46 GB;
                      then, after the first insert, it dropped to around
                      36 GB.

                 ok fits

                      (ii) At (c), the VSIZE went up to around 130 GB;
                      then, after the first insert, it dropped to around
                      45 GB.

                  you tell the system to extend the existing BATs to
                  prepare for another 650M, which means it allocates
                  2*36 GB; plus room for the old one, that gives 108 GB.
                  then during processing some temporary BATs may be
                  needed, e.g. to check integrity constraints after each
                  file. Then it runs out of swap space.

                     (iii) the "Free Memory", as reported by Activity
       Monitor,
              just
                     before it failed at file number 316, dipped to as
       low as
              7.5MB!

                  yes, you are running out of swap space on your system.
                  This should not have happened, because the system uses
                  mmapped files; it may be an issue with MacOS or relate
                  to a problem we fixed recently.



                      My questions:
                      (1) Why do we need to ramp N up to the total number
                      of lines (it takes a long time to do that), only to
                      have it drop down to 30-40 GB right after

                  this might indicate that on MacOS, just like Windows,
                  mmapped files need to be written to disk. With a disk
                  bandwidth of 50 MB/sec it still takes several minutes.
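
                  (Worked example: flushing a 36 GB working set at
                  50 MB/sec takes roughly 36000 / 50 = 720 seconds, i.e.
                  about 12 minutes.)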

                      the first insert, and stay roughly there? Does it
                      mean we're giving all the pre-allocated space back
                      to the OS? Then should we always set N to the total
                      number of lines? If so, it would take much, much
                      longer to process all the files...
                      (2) How come RSIZE never goes above 4 GB?
                      (3) Does the sql log file size have some limit that
                      we need to tweak?

                 no limit

                      (4) Has anyone successfully deployed the 64-bit
                      version of MonetDB and inserted more than 1 billion
                      rows?

                  your platform may be the first, but Amherst has worked
                  with Macs for years.

                      (5) When you say "...The VSIZE of 44G is not too
                      problematic, i am looking at queries letting it
                      tumble between 20-80 GB...", what does it mean?
                      Mine went up to as high as 135 GB...

                 explained above.

                 regards, Martin


                     Thanks, as always.