MPI and Condor

From Ben's Writing

Introduction

Installation

Download a copy of the MPICH2 source from the Argonne site:

$ wget http://www.mcs.anl.gov/research/projects/mpich2/downloads/tarballs/1.2.1/mpich2-1.2.1.tar.gz

Unpack the source to some temporary location:

$ tar zxvpf mpich2-1.2.1.tar.gz

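Now change into the unpacked source directory, then configure, build, and install MPICH2. The final two commands create a version-independent symlink to the installation. Command output is abbreviated.

$ cd mpich2-1.2.1
$ ./configure --prefix=/local/raid1/condor/binaries/mpich2-1.2.1-`uname`-`uname -p`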
Configuring MPICH2 version 1.2.1 with  '--prefix=/local/raid1/condor/binaries/mpich2-Linux-i686'
Running on system: Linux tenor 2.6.18-128.1.16.el5.centos.plus #1 SMP Wed Jul 1 13:06:47 EDT 2009 i686 i686 i386 GNU/Linux
Executing mpich2prereq in /tmp/mpich2-1.2.1/src/mpid/ch3 with 
Executing mpich2prereq in /tmp/mpich2-1.2.1/src/mpid/ch3/channels/nemesis
sourcing /tmp/mpich2-1.2.1/src/pm/mpd/mpich2prereq
sourcing /tmp/mpich2-1.2.1/src/pm/hydra/mpich2prereq
sourcing /tmp/mpich2-1.2.1/src/pm/gforker/mpich2prereq
sourcing /tmp/mpich2-1.2.1/src/pm/mpd/setup_pm
checking for gcc... gcc
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
...
$ make
Beginning make
Using variables CC='gcc' CFLAGS=' -O2' AR='ar' FC='gfortran' F90='f95' FFLAGS=' \
 -O2' F90FLAGS=' -O2' CXX='c++' CPPFLAGS=' -I/tmp/mpich2-1.2.1/src/openpa/src \
 -I/tmp/mpich2-1.2.1/src/openpa/src -DUSE_PROCESS_LOCKS  \
...
$ make install
if [ ! -d /local/raid1/condor/binaries/mpich2-Linux-i686 ] ; then mkdir -p //local/raid1/condor/binaries/mpich2-Linux-i686 ; fi
if [ ! -d /local/raid1/condor/binaries/mpich2-Linux-i686/share ] ; then mkdir -p /local/raid1/condor/binaries/mpich2-Linux-i686/share ; fi
if [ ! -d /local/raid1/condor/binaries/mpich2-Linux-i686/share/doc/ ] ; then mkdir -p /local/raid1/condor/binaries/mpich2-Linux-i686/share/doc/ ; fi
...
$ cd /local/raid1/condor/binaries
$ ln -sf mpich2-1.2.1-`uname`-`uname -p` mpich2-`uname`-`uname -p`
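
The last two commands give the versioned install directory a version-independent name. A minimal sketch of that layout, reproduced in a scratch directory so it can be tried without touching /local/raid1 (the scratch directory and the variable names `base`, `platform`, and `resolved` are illustrative, not part of the original setup):

```shell
# Build the platform suffix the same way the commands above do, e.g. Linux-i686.
platform="$(uname)-$(uname -p)"

# Scratch directory standing in for /local/raid1/condor/binaries (assumption).
base=$(mktemp -d)
mkdir -p "$base/mpich2-1.2.1-$platform/bin"

# Relative symlink: mpich2-<platform> -> mpich2-1.2.1-<platform>.
# -n stops ln from descending into an existing link of the same name.
cd "$base"
ln -sfn "mpich2-1.2.1-$platform" "mpich2-$platform"

# Show where the unversioned name points.
resolved=$(readlink "mpich2-$platform")
echo "mpich2-$platform -> $resolved"
```

With this layout, a later MPICH2 upgrade should only need a new versioned directory and a re-pointed link; scripts that use the unversioned name stay unchanged.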

Configuration

The aim of installing MPICH2 is to use it in conjunction with Condor. To this end, we need to make some changes to Condor's configuration, as well as edit some helper scripts.

Condor Configuration

We use the following configuration on the dedicated pool machines:

DAEMON_LIST = MASTER, STARTD

WANT_SUSPEND	= False
WANT_VACATE	= False
START		= True
SUSPEND		= False
CONTINUE	= True
PREEMPT		= False
KILL		= False
RANK		= 0

DedicatedScheduler = "DedicatedScheduler@scheduler1.cs.uleth.ca"

SUSPEND		= Scheduler =!= $(DedicatedScheduler) && ($(SUSPEND))
PREEMPT		= Scheduler =!= $(DedicatedScheduler) && ($(PREEMPT))
RANK_FACTOR	= 1000000
RANK		= (Scheduler =?= $(DedicatedScheduler) * \
		  $(RANK_FACTOR)) + $(RANK)
START		= (Scheduler =?= $(DedicatedScheduler)) || ($(START))

MPI_CONDOR_RSH_PATH = $(LIBEXEC)
CONDOR_SSHD = /usr/sbin/sshd
CONDOR_SSH_KEYGEN = /usr/bin/ssh-keygen
STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler

With this configuration, the machines always run jobs from the dedicated scheduler and allow non-dedicated jobs to run only on an opportunistic basis.
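
To see why dedicated jobs take priority under this policy, it can help to work the RANK line by hand. The sketch below is plain shell arithmetic that imitates the intended reading of that expression for a single machine; it is not how Condor evaluates ClassAds, and the function name `rank_for` is made up for illustration. It assumes the base RANK of 0 set earlier in the configuration.

```shell
DEDICATED="DedicatedScheduler@scheduler1.cs.uleth.ca"
RANK_FACTOR=1000000
BASE_RANK=0   # the RANK = 0 line from the first block of settings

# Intended reading of the RANK line: a job from the dedicated scheduler
# scores RANK_FACTOR plus the base rank; any other job scores the base rank.
# $1 is the job's Scheduler attribute; an empty string means it is absent.
rank_for() {
    if [ "$1" = "$DEDICATED" ]; then
        echo $((1 * RANK_FACTOR + BASE_RANK))
    else
        echo $((0 * RANK_FACTOR + BASE_RANK))
    fi
}

dedicated_rank=$(rank_for "$DEDICATED")
other_rank=$(rank_for "")
echo "dedicated job rank: $dedicated_rank"
echo "ordinary job rank:  $other_rank"
```

Since a machine prefers the job with the higher rank, a dedicated job should be able to displace an opportunistic one, while the rewritten SUSPEND and PREEMPT lines evaluate to False for jobs from the dedicated scheduler.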

Helper Scripts

Once Condor has been configured to run parallel jobs, we must modify a helper script so that Condor can use it to interact with the new MPICH2 installation.

Copy the template script to the bin directory under the Condor home directory and update it to point to the local MPICH2 installation:

$ cd ~condor/bin
$ cp /local/raid1/condor/binaries/`uname`-`uname -p`/etc/examples/mp1script mp2script
$ perl -pi -e "s/\/u\/g\/t\/gthain\/mpich-1.2.6/\/local\/raid1\/condor\/binaries\/mpich2-\`uname\`-\`uname -p\`/g" mp2script

Make sure the bin directory under the Condor home directory (~condor/bin) is in your PATH before following the instructions for submitting an MPI job to Condor.
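
One way to check the PATH requirement, sketched against a scratch directory rather than the real ~condor/bin (the directory and the placeholder script are stand-ins for illustration):

```shell
# Scratch directory standing in for ~condor/bin (assumption).
bindir=$(mktemp -d)
printf '#!/bin/sh\necho "placeholder for mp2script"\n' > "$bindir/mp2script"
chmod +x "$bindir/mp2script"

# Prepend the directory to PATH for the current shell session.
PATH="$bindir:$PATH"
export PATH

# command -v prints the full path of the script if the shell can find it.
found=$(command -v mp2script)
echo "mp2script resolves to: $found"
```

On the real system the same check would use ~condor/bin in place of the scratch directory, with the PATH change typically placed in the shell's startup file.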
