I've been using *nixes pretty much full time since 1994, and sys-adminning my own Linux boxes since 1996, and yet yesterday I learned something completely new. I was stress-testing some message-passing parallel code using MPICH on a shared-memory system (I couldn't be bothered to go across the network and set up ssh keys for a simple test, OK?), only to discover, after some weird errors, that the ch_shmem system of MPICH relies on System V-style inter-process communication, which in turn uses semaphores, of which each user is only allocated a small number (32 arrays or something silly). Interestingly, nothing else on my system appears to be using semaphores or message queues.
If MPI crashes while running, it fails to clean up its semaphore(s), and if this happens too many times you end up unable to run anything that relies on semaphores, which causes much confusion if you don't know about this stuff. Hence you need to run ipcs -s to see your semaphores (hopefully none, if your MPI program has finished running), and ipcrm -s followed by a semaphore ID to remove any that were left behind.
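If you end up doing this cleanup a lot, it can be scripted. What follows is only a rough sketch, not anything MPICH provides: the function names `parse_semids` and `remove_semaphores` are made up for illustration, and the parsing assumes the column layout that `ipcs -s` prints on a typical Linux box (key, semid, owner, perms, nsems). Other Unixes format this differently, and long user names may be truncated in the owner column, so check the output on your own machine first.

```python
import subprocess


def parse_semids(ipcs_output, owner):
    """Return the semaphore IDs in `ipcs -s` output that belong to `owner`.

    Assumes the Linux layout: key semid owner perms nsems.
    """
    semids = []
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Skip the banner, the header row and blank lines: data rows
        # start with a hexadecimal key such as 0x00000000.
        if len(fields) >= 5 and fields[0].startswith("0x") and fields[2] == owner:
            semids.append(int(fields[1]))
    return semids


def remove_semaphores(owner, dry_run=True):
    """List the semaphore arrays owned by `owner` and remove them with ipcrm.

    With dry_run=True (the default) nothing is removed; the commands
    that would be run are only printed.
    """
    result = subprocess.run(
        ["ipcs", "-s"], capture_output=True, text=True, check=True
    )
    for semid in parse_semids(result.stdout, owner):
        if dry_run:
            print(f"would run: ipcrm -s {semid}")
        else:
            subprocess.run(["ipcrm", "-s", str(semid)], check=True)
```

Calling `remove_semaphores("yourname")` just prints what it would do; pass `dry_run=False` to actually remove things. It removes every semaphore array you own, including any still in use, so only run it when none of your MPI jobs (or anything else using semaphores) is still running.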
 
 

