Wednesday, September 10, 2008

Segmentation error occurred when running WRF with MPICH2

1. Config
[lin@nox src]$ uname -a
Linux nox 2.6.18-92.1.6.el5.centos.plus #1 SMP Thu Jun 26 12:18:07 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
[lin@nox src]$ mpich2version
MPICH2 Version: 1.0.7
MPICH2 Release date: Unknown, built on Sun Aug 24 00:17:24 CDT 2008
MPICH2 Device: ch3:sock
MPICH2 configure:
MPICH2 CC: gcc -O2
MPICH2 CXX: c++ -O2
MPICH2 F77: f95 -O2
MPICH2 F90: ifort -O2

2. Error Message
Running the command:
mpirun -np 4 wrf.exe

produced the following errors:
rank 3 in job 1 nox_48738 caused collective abort of all ranks
exit status of rank 3: killed by signal 11

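3. Reason
The stacksize limit must be set to unlimited in the shell that starts the mpd daemon, before mpd is launched. Processes started through mpd inherit its resource limits, so with a limited stack wrf.exe is killed by signal 11.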
To solve this problem, read on.
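
A quick way to see whether this applies is to look at the stack limit of the shell that will start mpd. Below is a minimal bash sketch; the helper name describe_stack_limit is made up for this memo and is not part of MPICH2 or WRF.

```shell
#!/bin/bash
# Hypothetical helper (not part of MPICH2): classify a stack limit value
# as printed by "ulimit -s" (a size in KB, or the word "unlimited").
describe_stack_limit() {
    if [ "$1" = "unlimited" ]; then
        echo "ok"
    else
        echo "limited"
    fi
}

soft=$(ulimit -S -s)   # soft limit: the one a running process actually hits
hard=$(ulimit -H -s)   # hard limit: ceiling the soft limit may be raised to
echo "soft stack limit: $soft ($(describe_stack_limit "$soft"))"
echo "hard stack limit: $hard ($(describe_stack_limit "$hard"))"
```

If the soft limit shows up as "limited", an mpd started from that shell will pass the same limit on to wrf.exe.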

4. My solution memo
--------------------------------------------------------------
1. Using root's mpd
1) Log in as root and check whether mpd is already running.
If it is not running:
(1) Check the current stacksize limit and make sure it is unlimited.
$bash (root's default shell):
ulimit -a
(By default, root uses bash as its shell and stacksize is
already set to unlimited. This is why an mpd started by
root did not fail.)
If stacksize is not unlimited, run
ulimit -s unlimited
to set it to unlimited.

$tcsh (or csh):
limit
If stacksize is not unlimited, run
limit stacksize unlimited
to set it to unlimited, or
unlimit
to set all resources to unlimited.

(2) Launch mpd:
mpd &
or
mpd --daemon

#NOTE: you may ask your sysadmin to add the routine in (1) to the
startup scripts so that mpd is started automatically whenever the
machine reboots.

2) Log out of root and set up your ~/.mpd.conf file.
Add the following line to .mpd.conf in your home directory:
MPD_USE_ROOT_MPD=1

3) Run mpirun in your own shell.
Example:
mpirun -np 4 wrf.exe
or
mpiexec -n 4 wrf.exe
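
Step 2) above can be scripted. The following is only a sketch: the function name enable_root_mpd is hypothetical, and the mode 600 on the file is an assumption based on the usual MPICH2 advice that .mpd.conf be readable by its owner only. The demo writes to a scratch file so nothing in your home directory is touched.

```shell
#!/bin/bash
# Hypothetical helper: make sure a conf file contains MPD_USE_ROOT_MPD=1
# and is readable and writable by its owner only (assumed requirement).
enable_root_mpd() {
    conf="$1"
    touch "$conf"
    chmod 600 "$conf"
    # Append the setting only if that exact line is not already present.
    if ! grep -qx 'MPD_USE_ROOT_MPD=1' "$conf"; then
        echo 'MPD_USE_ROOT_MPD=1' >> "$conf"
    fi
}

# Demo on a scratch file; pass "$HOME/.mpd.conf" for the real thing.
demo_conf=$(mktemp)
enable_root_mpd "$demo_conf"
enable_root_mpd "$demo_conf"   # second call does not add a duplicate line
cat "$demo_conf"
```

Running it twice on the same file is harmless, which makes it safe to keep in a setup script.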
--------------------------------------------------------------
2. Using your own mpd
1) Log in and check whether mpd is already running.
If it is not running:
(1) Check the current stacksize limit and make sure it is unlimited.
See step (1) in the section above.

(2) Launch mpd:
mpd &
or
mpd --daemon
#NOTE: you can automate this step by modifying your login scripts.
Keep in mind that mpd consumes resources while it runs.

2) Check ~/.mpd.conf.
Comment out the following line if it is present:
MPD_USE_ROOT_MPD=1

3) Run mpirun in your own shell.
Example:
mpirun -np 4 wrf.exe
or
mpiexec -n 4 wrf.exe
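
The login-script automation mentioned in the note for step 1) might look like the bash sketch below. The function name start_own_mpd is made up for this memo. It also assumes that mpdtrace (shipped with MPICH2's mpd) exits with a non-zero status when no mpd can be contacted; check that on your installation before relying on it.

```shell
#!/bin/bash
# Sketch of a login-script fragment: raise the stack limit first, then start
# mpd only if it is installed and not already running for this user.
start_own_mpd() {
    # Raise the soft stack limit for this shell; mpd and the ranks it
    # spawns inherit it. This fails if the hard limit is lower.
    if ! ulimit -s unlimited 2>/dev/null; then
        echo "could not set stacksize to unlimited (hard limit: $(ulimit -H -s))"
        return 1
    fi
    if ! command -v mpd >/dev/null 2>&1; then
        echo "mpd not found in PATH"
        return 1
    fi
    # Assumption: mpdtrace returns non-zero when it cannot reach an mpd.
    if mpdtrace >/dev/null 2>&1; then
        echo "mpd already running"
    else
        mpd --daemon && echo "mpd started"
    fi
}
```

Putting this function in ~/.bash_profile and calling start_own_mpd at the end means the limit is always raised before mpd comes up, which is the ordering this memo depends on. An mpd that was started earlier with a limited stack has to be stopped and restarted for the new limit to take effect.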