MPI

The cluster has openmpi-4.0.3rc4 built with InfiniBand support. To make it the default MPI, copy /sw/condor/openmpi/mpi-selector to your home directory:

cp /sw/condor/openmpi/mpi-selector ~/.mpi-selector

Log out and log in again, then verify the setup by running mpirun or mpicc; both should now be in your search PATH.
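For example, a quick check after logging back in might look like this (a minimal sketch; exact output depends on the installation):

# confirm the OpenMPI wrappers are on the PATH
which mpirun
which mpicc

# print the version; it should report the openmpi-4.0.3rc4 release
mpirun --version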

There are two ways to run MPI jobs: in the Vanilla Universe, like a regular Condor job, or in the Parallel Universe.

MPI in Parallel Universe

After setting the default MPI, you should be able to build MPI applications on the bl-0 node.
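A minimal build might look like the following (my_mpi_app.c and my_mpi_app are placeholder names for your own source file and binary):

# compile the MPI program with the OpenMPI compiler wrapper
mpicc -O2 -o my_mpi_app my_mpi_app.c

The next step is to submit the job to the cluster: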

#############################################
##  INPAC condor parallel universe job
#############################################
universe = parallel
executable = /sw/condor/openmpi/openmpiscript
arguments = the_mpi_application arg1 arg2

# think of machine_count as the number of processes requested
machine_count = 8

# each process gets one CPU by default. Use request_cpus to request more
# than one CPU per process. request_cpus must be <= 18. Jobs with a large
# request_cpus may take a very long time to find a matching machine.
request_cpus = 1

# INPAC has a shared filesystem, do NOT transfer files
should_transfer_files = NO


Log    = mpi.log
Output = mpi.out.$(NODE)
Error  = mpi.err.$(NODE)

queue
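Assuming the submit description above is saved as mpi_parallel.sub (the filename is arbitrary), the job is submitted and monitored with the standard HTCondor commands:

# submit the parallel universe job
condor_submit mpi_parallel.sub

# check the job in the queue
condor_q

# after completion, each node writes its own output file, e.g.
cat mpi.out.0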

MPI in Vanilla Universe

If an MPI job needs only a few CPUs and can fit on one machine, it is possible to run it in the vanilla universe.

First, prepare the job script example_mpi.sh:

#!/bin/bash
# an example MPI script

# set up the MPI environment
source /var/mpi-selector/data/openmpi-4.0.3rc4.sh

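# launch 2 MPI processes; keep -np consistent with request_cpus in the submit file below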
mpirun -np 2 path_to_the_mpi_application

The job file:

#############################################
##  INPAC MPI job in Vanilla universe
#############################################
universe = vanilla
executable = example_mpi.sh
arguments =

# number of CPUs needed by the job; must be <= 18.
# This should match the -np value used by mpirun in the job script.
request_cpus = 2

should_transfer_files = NO

# only some nodes can run MPI jobs
requirements = (IsMPI =?= True)

Log    = mpi.log
Output = mpi.out
Error  = mpi.err

queue
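As before, submit the file with condor_submit (example_mpi.sub below is a placeholder filename). To see which execute nodes advertise the IsMPI attribute used in the requirements expression, query the pool with condor_status:

# submit the vanilla universe MPI job
condor_submit example_mpi.sub

# list the machines that can accept MPI jobs
condor_status -constraint 'IsMPI =?= True'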