Prototype Run File for Espresso

espresso automatically creates a run file (with associated .machines and .threads files) for launching correlator jobs, based on the job parameters and the cluster definition file. A default run file looks like this:

mpirun -np 19 --mca rmaps seq -machinefile aust14_1.machines $DIFXROOT/bin/mpifxcorr aust14_1.input

The default run file works for many installations, but it is not compatible with batch schedulers such as SLURM, SGE or PBS, which are in use at some sites. espresso therefore accepts a customised prototype run file: if the environment variable $DIFX_RUNFILE is set, espresso uses the file it points to as the prototype. The prototype may contain keywords, which espresso replaces with appropriate values before using the file to launch a correlator job. Keywords currently recognised by espresso include:

{JOBNAME}           # the job name
{NTASKS}            # total number of MPI processes (determined by espresso based on contents of $DIFX_MACHINES)
{NTASKS-PER-NODE}   # number of MPI processes per node (determined by espresso heuristically)
{NTHREADS}          # number of threads per process (determined by espresso from $DIFX_MACHINES)
{TIME}              # estimated run time of the job (based on an assumed speedup factor, which should be chosen conservatively)
{DIFX_MESSAGE_PORT} # the DIFX_MESSAGE_PORT for multicast/unicast messages (set by espresso)
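
The keyword substitution described above can be sketched in a few lines of Python. This is a minimal illustration, not espresso's actual code: the helper names prototype_path and fill_prototype are hypothetical, and plain string replacement is an assumed mechanism. Only the keyword names and the $DIFX_RUNFILE variable come from this page.

```python
import os

# Keywords listed on this page; how espresso substitutes them is an assumption.
KEYWORDS = (
    "JOBNAME",
    "NTASKS",
    "NTASKS-PER-NODE",
    "NTHREADS",
    "TIME",
    "DIFX_MESSAGE_PORT",
)


def prototype_path(environ=os.environ):
    """Return the prototype file named by $DIFX_RUNFILE, or None if unset/empty."""
    return environ.get("DIFX_RUNFILE") or None


def fill_prototype(template, values):
    """Replace each {KEYWORD} in template with str(values[KEYWORD]).

    Keywords missing from values are left in place so they are easy to spot.
    """
    text = template
    for key in KEYWORDS:
        if key in values:
            text = text.replace("{" + key + "}", str(values[key]))
    return text


# A shortened, illustrative prototype (hypothetical values for the job).
template = (
    "#SBATCH --job-name={JOBNAME}\n"
    "#SBATCH --ntasks={NTASKS}\n"
    "#SBATCH --tasks-per-node={NTASKS-PER-NODE}\n"
    "srun $DIFXROOT/bin/mpifxcorr {JOBNAME}.input\n"
)
values = {"JOBNAME": "v558a_1", "NTASKS": 32, "NTASKS-PER-NODE": 2}
filled = fill_prototype(template, values)
print(filled)
```

Replacing only the exact {KEYWORD} tokens leaves any other braces in the script, such as shell ${VAR} expansions, untouched.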

An example prototype run file might look like this:

#!/bin/bash -l
#SBATCH --job-name={JOBNAME}
#SBATCH --ntasks={NTASKS}
#SBATCH --tasks-per-node={NTASKS-PER-NODE}
#SBATCH --cpus-per-task={NTHREADS}
#SBATCH --time={TIME}
#SBATCH --output={JOBNAME}.slurmlog
#SBATCH --error={JOBNAME}.slurmlog

export OMP_NUM_THREADS={NTHREADS}
export DIFX_MESSAGE_PORT={DIFX_MESSAGE_PORT}

srun $DIFXROOT/bin/mpifxcorr {JOBNAME}.input

For an example job named v558a_1, espresso fills in the keywords to produce the following script before launching it:

#!/bin/bash -l
#SBATCH --job-name=v558a_1
#SBATCH --ntasks=32
#SBATCH --tasks-per-node=2
#SBATCH --cpus-per-task=10
#SBATCH --time=00:12:59
#SBATCH --output=v558a_1.slurmlog
#SBATCH --error=v558a_1.slurmlog

export OMP_NUM_THREADS=10
export DIFX_MESSAGE_PORT=50201

srun $DIFXROOT/bin/mpifxcorr v558a_1.input

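
The {TIME} value in the example uses the HH:MM:SS form that SLURM accepts for --time. As a rough sketch of how such a value could be derived, the Python below assumes the estimate is the length of the data to be correlated divided by the speedup factor. That formula, the function name estimated_walltime, and the input numbers are all assumptions made for illustration, not taken from espresso.

```python
import math


def estimated_walltime(job_seconds, speedup):
    """Format job_seconds / speedup as HH:MM:SS, rounded up to a whole second.

    Assumption: run time scales as (length of data) / (speedup factor).
    A smaller, more conservative speedup yields a longer time request.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    total = math.ceil(job_seconds / speedup)
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Hypothetical inputs chosen to reproduce the time shown in the example.
walltime = estimated_walltime(1558, 2.0)
print(walltime)
```

A batch scheduler typically kills a job that exceeds its requested time, which is why a conservative speedup factor is the safer choice.
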
difx/run_prototype.txt · Last modified: 2018/06/14 16:46 by cormac