difx:notes_on_optimizing_the_machine_file

OpenMPI

The simplest form of an OpenMPI machine file consists of one line per machine:

node01
node02
node03

This tells MPI to start processes on the machines node01, node02, and node03.
Additionally, one can tell MPI how many cores to use on each machine by supplying the slots parameter:

node01 slots=2
node02 slots=4
node03 slots=6

Issuing mpirun -np 12 --byslot spawns 2 processes on node01, 4 processes on node02, and 6 processes on node03, which is probably what you want (--byslot is the OpenMPI default, so you don't need to specify it explicitly).
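The by-slot placement described above can be sketched in a few lines of Python. This is only a simulation of the behaviour as stated in this note, not OpenMPI's actual mapper; the function name map_byslot and the (name, slots) tuple layout are made up for illustration, and the sketch simply refuses requests larger than the total slot count instead of modelling oversubscription.

```python
def map_byslot(hosts, num_procs):
    """Fill each host up to its slot count before moving on to the next host.

    hosts: list of (hostname, slots) tuples in machine-file order (hypothetical layout).
    Returns a dict mapping hostname -> number of processes placed there.
    """
    total_slots = sum(slots for _, slots in hosts)
    if num_procs > total_slots:
        # Simplification of this sketch: oversubscription is not modelled.
        raise ValueError("more processes requested than slots available")

    placement = {}
    remaining = num_procs
    for name, slots in hosts:
        take = min(slots, remaining)  # never exceed this host's slots
        placement[name] = take
        remaining -= take
    return placement


hosts = [("node01", 2), ("node02", 4), ("node03", 6)]
print(map_byslot(hosts, 12))  # {'node01': 2, 'node02': 4, 'node03': 6}
```

With fewer processes than slots, the hosts listed first fill up first, so the order of lines in the machine file matters.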
However, with the --bynode option things look different:
MPI starts 1 process on node01, 1 on node02, 1 on node03, and then wraps around, starting 1 process on node01, 1 on node02, and so forth. Eventually it starts 4 processes on node01 (overbooking it), 4 on node02 (which is OK), and 4 on node03 (underbooking it).
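To make the wrap-around concrete, here is a small Python simulation of the round-robin placement as described in this note. It is a sketch under that description only (it ignores the slots values while placing, which is the behaviour claimed above), and the names map_bynode and load_report are hypothetical helpers, not part of OpenMPI.

```python
def map_bynode(hosts, num_procs):
    """Place one process per host in turn, wrapping around, ignoring slot counts.

    hosts: list of (hostname, slots) tuples in machine-file order (hypothetical layout).
    """
    names = [name for name, _ in hosts]
    placement = {name: 0 for name in names}
    for rank in range(num_procs):
        placement[names[rank % len(names)]] += 1
    return placement


def load_report(hosts, placement):
    """Return processes minus slots per host: positive = overbooked, negative = underbooked."""
    return {name: placement[name] - slots for name, slots in hosts}


hosts = [("node01", 2), ("node02", 4), ("node03", 6)]
placement = map_bynode(hosts, 12)
print(placement)                      # {'node01': 4, 'node02': 4, 'node03': 4}
print(load_report(hosts, placement))  # {'node01': 2, 'node02': 0, 'node03': -2}
```

The report shows node01 carrying two processes more than it has slots and node03 two fewer, matching the figures in the text.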
To prevent overbooking/underbooking of particular machines when using the --bynode option, one can supply the max_slots parameter:

node01 slots=2 max_slots=2
node02 slots=4 max_slots=4
node03 slots=6 max_slots=6

Now mpirun -np 12 --bynode spawns a total of 2 processes on node01, 4 processes on node02, and 6 processes on node03. If the number of processes requested with the -np option exceeds the sum of all max_slots values, mpirun aborts with an error.
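The effect of max_slots on round-robin placement can be sketched the same way. The Python below is an illustrative model of the behaviour described in this note, assuming that a host which has reached its max_slots is simply skipped; the function name map_bynode_capped is hypothetical, and the RuntimeError stands in for mpirun's abort.

```python
def map_bynode_capped(hosts, num_procs):
    """Round-robin placement that skips hosts which have reached max_slots.

    hosts: list of (hostname, max_slots) tuples in machine-file order (hypothetical layout).
    Raises RuntimeError if num_procs exceeds the sum of all max_slots.
    """
    capacity = sum(max_slots for _, max_slots in hosts)
    if num_procs > capacity:
        # Stand-in for mpirun aborting with an error.
        raise RuntimeError(
            f"{num_procs} processes requested, but only {capacity} slots allowed"
        )

    placement = {name: 0 for name, _ in hosts}
    placed = 0
    # Each pass over the hosts places at least one process while capacity remains.
    while placed < num_procs:
        for name, max_slots in hosts:
            if placed == num_procs:
                break
            if placement[name] < max_slots:
                placement[name] += 1
                placed += 1
    return placement


hosts = [("node01", 2), ("node02", 4), ("node03", 6)]
print(map_bynode_capped(hosts, 12))  # {'node01': 2, 'node02': 4, 'node03': 6}
```

Requesting 13 processes against the same hosts raises the error, since the max_slots values sum to 12.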

MPICH

to be done

difx/notes_on_optimizing_the_machine_file.txt · Last modified: 2015/10/21 10:08 (external edit)