espresso automatically creates a run file (with associated .machines and .threads files) for launching correlator jobs, based on the job parameters and the cluster definition file. An example default run file looks like this:
mpirun -np 19 --mca rmaps seq -machinefile aust14_1.machines $DIFXROOT/bin/mpifxcorr aust14_1.input
The default run file will work for many installations, but it is not compatible with the SLURM, SGE or PBS batch schedulers, which may be in use at some sites. espresso can therefore take a customised prototype run file instead. If the environment variable $DIFX_RUNFILE is set, espresso will use the file it points to as a prototype run file. The prototype file may contain a number of keywords, which espresso will substitute with appropriate values before using the file to launch a correlator job. Keywords currently recognised by espresso include:
{JOBNAME}            # the job name
{NTASKS}             # total number of MPI processes (determined by espresso based on contents of $DIFX_MACHINES)
{NTASKS-PER-NODE}    # number of MPI processes per node (determined by espresso heuristically)
{NTHREADS}           # number of threads per process (determined by espresso from $DIFX_MACHINES)
{TIME}               # estimated run time of job (based on assumed speedup factor, which should be chosen conservatively)
{DIFX_MESSAGE_PORT}  # the DIFX_MESSAGE_PORT for multicast/unicast messages (set by espresso)
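Because only the keywords listed above are substituted, it can be useful to check a prototype for misspelt or unsupported placeholders before pointing $DIFX_RUNFILE at it. The following is a minimal sketch of such a check. The function name check_prototype is hypothetical and is not part of espresso; it assumes that keywords are written in upper case inside braces, as in the list above, and that shell expansions of the form ${VAR} should be ignored.

```shell
#!/bin/bash
# Hypothetical helper (not part of espresso): report {KEYWORD} placeholders in a
# prototype run file that are not among the keywords documented above.

known="JOBNAME NTASKS NTASKS-PER-NODE NTHREADS TIME DIFX_MESSAGE_PORT"

check_prototype() {
    local proto="$1" kw status=0
    # Remove shell expansions such as ${HOME} first so they are not mistaken
    # for keywords, then collect every remaining {UPPER_CASE} placeholder once.
    for kw in $(sed 's/\${[^}]*}//g' "$proto" \
                | grep -o '{[A-Z_-][A-Z_-]*}' | tr -d '{}' | sort -u); do
        case " $known " in
            *" $kw "*) ;;                                   # documented keyword
            *) echo "unrecognised keyword: {$kw}"; status=1 ;;
        esac
    done
    return $status
}
```

Calling check_prototype "$DIFX_RUNFILE" prints one line per unknown placeholder and returns a non-zero status if any were found.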
An example prototype run file might look like this:
#!/bin/bash -l
#SBATCH --job-name={JOBNAME}
#SBATCH --ntasks={NTASKS}
#SBATCH --tasks-per-node={NTASKS-PER-NODE}
#SBATCH --cpus-per-task={NTHREADS}
#SBATCH --time={TIME}
#SBATCH --output={JOBNAME}.slurmlog
#SBATCH --error={JOBNAME}.slurmlog
export OMP_NUM_THREADS={NTHREADS}
export DIFX_MESSAGE_PORT={DIFX_MESSAGE_PORT}
srun $DIFXROOT/bin/mpifxcorr {JOBNAME}.input
which espresso will modify, taking the job v558a_1 as an example, to produce the following before launching the script:
#!/bin/bash -l
#SBATCH --job-name=v558a_1
#SBATCH --ntasks=32
#SBATCH --tasks-per-node=2
#SBATCH --cpus-per-task=10
#SBATCH --time=00:12:59
#SBATCH --output=v558a_1.slurmlog
#SBATCH --error=v558a_1.slurmlog
export OMP_NUM_THREADS=10
export DIFX_MESSAGE_PORT=50201
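To preview what a prototype will look like after substitution, the keyword replacement can be approximated with sed. This is only an illustrative sketch and not espresso's own implementation: the function name render_runfile and its argument order are hypothetical, and the values must be supplied by hand, whereas espresso determines them itself from the job and $DIFX_MACHINES.

```shell
#!/bin/bash
# Illustrative sketch only (not espresso's code): replace the documented
# keywords in a prototype run file and write the result to standard output.
#
# Usage: render_runfile PROTOTYPE JOBNAME NTASKS NTASKS_PER_NODE NTHREADS TIME PORT
# The values must not contain "/" or "&", which are special to sed here.
render_runfile() {
    sed -e "s/{JOBNAME}/$2/g" \
        -e "s/{NTASKS}/$3/g" \
        -e "s/{NTASKS-PER-NODE}/$4/g" \
        -e "s/{NTHREADS}/$5/g" \
        -e "s/{TIME}/$6/g" \
        -e "s/{DIFX_MESSAGE_PORT}/$7/g" \
        "$1"
}

# Example, using the values from the v558a_1 job above:
#   render_runfile "$DIFX_RUNFILE" v558a_1 32 2 10 00:12:59 50201
```

Each keyword is matched together with its closing brace, so {NTASKS} and {NTASKS-PER-NODE} are replaced independently of one another.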