MPI and Condor
From Ben's Writing
Introduction
Installation
Download a copy of the MPICH2 source from the Argonne site:
$ wget http://www.mcs.anl.gov/research/projects/mpich2/downloads/tarballs/1.2.1/mpich2-1.2.1.tar.gz
Unpack the source to some temporary location:
$ tar zxvpf mpich2-1.2.1.tar.gz
Change into the unpacked source directory, then configure, build and install MPICH2:
$ cd mpich2-1.2.1
$ ./configure --prefix=/local/raid1/condor/binaries/mpich2-1.2.1-`uname`-`uname -p`
Configuring MPICH2 version 1.2.1 with '--prefix=/local/raid1/condor/binaries/mpich2-Linux-i686'
Running on system: Linux tenor 2.6.18-128.1.16.el5.centos.plus #1 SMP Wed Jul 1 13:06:47 EDT 2009 i686 i686 i386 GNU/Linux
Executing mpich2prereq in /tmp/mpich2-1.2.1/src/mpid/ch3 with
Executing mpich2prereq in /tmp/mpich2-1.2.1/src/mpid/ch3/channels/nemesis
sourcing /tmp/mpich2-1.2.1/src/pm/mpd/mpich2prereq
sourcing /tmp/mpich2-1.2.1/src/pm/hydra/mpich2prereq
sourcing /tmp/mpich2-1.2.1/src/pm/gforker/mpich2prereq
sourcing /tmp/mpich2-1.2.1/src/pm/mpd/setup_pm
checking for gcc... gcc
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
...

$ make
Beginning make
Using variables CC='gcc' CFLAGS=' -O2' AR='ar' FC='gfortran' F90='f95' FFLAGS=' \
-O2' F90FLAGS=' -O2' CXX='c++' CPPFLAGS=' -I/tmp/mpich2-1.2.1/src/openpa/src \
-I/tmp/mpich2-1.2.1/src/openpa/src -DUSE_PROCESS_LOCKS \
...

$ make install
if [ ! -d /local/raid1/condor/binaries/mpich2-Linux-i686 ] ; then mkdir -p //local/raid1/condor/binaries/mpich2-Linux-i686 ; fi
if [ ! -d /local/raid1/condor/binaries/mpich2-Linux-i686/share ] ; then mkdir -p /local/raid1/condor/binaries/mpich2-Linux-i686/share ; fi
if [ ! -d /local/raid1/condor/binaries/mpich2-Linux-i686/share/doc/ ] ; then mkdir -p /local/raid1/condor/binaries/mpich2-Linux-i686/share/doc/ ; fi
...
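If these steps are scripted rather than typed by hand, it helps to stop at the first failing step and keep a log of each one. The following is a minimal bash sketch under those assumptions: build_mpich2 is a hypothetical helper name, and the log file names c.txt, m.txt and mi.txt are an assumption, chosen to match the examples in the MPICH2 installer's guide.

```shell
#!/bin/bash
# Minimal sketch (hypothetical helper): configure, build and install MPICH2,
# stopping at the first failing step and logging each step's output.
# Requires bash for "set -o pipefail".

build_mpich2() {
    # $1: unpacked source directory, e.g. /tmp/mpich2-1.2.1
    # $2: installation prefix
    (
        # Subshell: the cd and the pipefail setting do not leak into the caller.
        set -o pipefail   # a pipeline fails if any stage fails, not only tee
        cd "$1" || exit 1
        ./configure --prefix="$2" 2>&1 | tee c.txt || exit 1
        make 2>&1 | tee m.txt || exit 1
        make install 2>&1 | tee mi.txt || exit 1
    )
}
```

For example: build_mpich2 /tmp/mpich2-1.2.1 /local/raid1/condor/binaries/mpich2-1.2.1-`uname`-`uname -p`. The explicit || exit checks are used instead of set -e because bash ignores set -e while a function runs as part of an if or || condition.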
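Finally, create a version-independent symbolic link to the new installation, so that the helper script set up below can refer to it as mpich2-`uname`-`uname -p`:
$ cd /local/raid1/condor/binaries
$ ln -sf mpich2-1.2.1-`uname`-`uname -p` mpich2-`uname`-`uname -p`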
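To confirm that the link resolves to a usable installation on a given machine, a check along these lines can be run. It is a minimal sketch: check_mpich2_install is a hypothetical helper, and it only tests that the link exists and that the mpicc and mpiexec programs installed by make install are executable through it.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper): check that the version-independent
# symlink exists and leads to an MPICH2 installation with its main tools.

check_mpich2_install() {
    # $1: directory holding the installs (defaults to the one used above)
    base=${1:-/local/raid1/condor/binaries}
    link="$base/mpich2-$(uname)-$(uname -p)"

    if [ ! -L "$link" ]; then
        echo "missing symlink: $link" >&2
        return 1
    fi

    # -x follows the symlink, so this checks the real installed files.
    for tool in mpicc mpiexec; do
        if [ ! -x "$link/bin/$tool" ]; then
            echo "not executable: $link/bin/$tool" >&2
            return 1
        fi
    done

    echo "ok: $link -> $(readlink "$link")"
}
```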
Configuration
We are installing MPICH2 in order to use it with Condor. This requires some changes to Condor's configuration, as well as an edit to a helper script.
Condor Configuration
We use the following configuration on the dedicated pool machines:
DAEMON_LIST = MASTER, STARTD
WANT_SUSPEND = False
WANT_VACATE = False
START = True
SUSPEND = False
CONTINUE = True
PREEMPT = False
KILL = False
RANK = 0
DedicatedScheduler = "DedicatedScheduler@scheduler1.cs.uleth.ca"
SUSPEND = Scheduler =!= $(DedicatedScheduler) && ($(SUSPEND))
PREEMPT = Scheduler =!= $(DedicatedScheduler) && ($(PREEMPT))
RANK_FACTOR = 1000000
RANK = (Scheduler =?= $(DedicatedScheduler) * \
       $(RANK_FACTOR)) + $(RANK)
START = (Scheduler =?= $(DedicatedScheduler)) || ($(START))
MPI_CONDOR_RSH_PATH = $(LIBEXEC)
CONDOR_SSHD = /usr/sbin/sshd
CONDOR_SSH_KEYGEN = /usr/bin/ssh-keygen
STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler
This policy always runs jobs from the dedicated scheduler, but allows non-dedicated jobs to run only on an opportunistic basis.
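One way to confirm on an execute machine that its configuration files contain these settings is condor_config_val, which prints the value of a configuration macro. The sketch below assumes the scheduler name used above; check_dedicated_config is a hypothetical helper, and it strips the quotes around the configured value before comparing.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper): check that this machine's Condor
# configuration names the expected dedicated scheduler and publishes it
# through STARTD_EXPRS.

check_dedicated_config() {
    # $1: expected scheduler name without quotes,
    #     e.g. DedicatedScheduler@scheduler1.cs.uleth.ca
    # condor_config_val prints the macro's value; drop the surrounding quotes.
    actual=$(condor_config_val DedicatedScheduler | tr -d '"')
    if [ "$actual" != "$1" ]; then
        echo "DedicatedScheduler is '$actual', expected '$1'" >&2
        return 1
    fi
    # The attribute must be listed in STARTD_EXPRS to appear in the machine ad.
    case $(condor_config_val STARTD_EXPRS) in
        *DedicatedScheduler*) echo "ok" ;;
        *) echo "DedicatedScheduler is not listed in STARTD_EXPRS" >&2
           return 1 ;;
    esac
}
```

Once the daemons have picked up the change (for example after condor_reconfig), condor_status -constraint 'DedicatedScheduler =!= UNDEFINED' -format "%s\n" Name should list the slots whose machine ads carry the attribute.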
Helper Scripts
Once Condor has been configured to run parallel jobs, we must modify a helper script so that Condor can use it to interact with the new MPICH2 installation.
Copy the template script into the bin directory of the condor user's home directory (~condor/bin), and update it to point to the local MPICH2 installation:
$ cd ~condor/bin
$ cp /local/raid1/condor/binaries/`uname`-`uname -p`/etc/examples/mp1script mp2script
$ perl -pi -e "s/\/u\/g\/t\/gthain\/mpich-1.2.6/\/local\/raid1\/condor\/binaries\/mpich2-\`uname\`-\`uname -p\`/g" mp2script
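Because perl -pi edits the file in place and prints nothing when the pattern does not match, it is worth checking that the substitution actually took effect. This minimal sketch (check_mp2script is a hypothetical helper) uses fixed-string grep to confirm that the old path is gone and the new one is present.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper): verify the path substitution in mp2script.

check_mp2script() {
    # $1: path to the edited script, e.g. ~condor/bin/mp2script
    script=$1
    old='/u/g/t/gthain/mpich-1.2.6'
    # Literal backticks: the perl command above writes them into mp2script unexpanded.
    new='/local/raid1/condor/binaries/mpich2-`uname`-`uname -p`'
    if grep -F -q "$old" "$script"; then
        echo "$script still refers to $old:" >&2
        grep -F -n "$old" "$script" >&2
        return 1
    fi
    if ! grep -F -q "$new" "$script"; then
        echo "$script does not refer to $new" >&2
        return 1
    fi
    echo "ok"
}
```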
Make sure ~condor/bin, which now contains mp2script, is in your PATH before following the instructions for submitting an MPI job to Condor.
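One way to do this is to prepend the directory in a shell startup file such as ~/.bashrc, taking care not to add it twice. This is a minimal POSIX sh sketch; add_to_path is a hypothetical helper name.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper): prepend a directory to PATH unless
# it is already one of PATH's entries.

add_to_path() {
    # Surround PATH with colons so the first and last entries match too.
    case ":$PATH:" in
        *":$1:"*) ;;                        # already present: leave PATH unchanged
        *) PATH="$1:$PATH"; export PATH ;;
    esac
}
```

For example, add_to_path ~condor/bin; afterwards command -v mp2script should find the script, provided the copied file is executable.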