The simplest form of an OpenMPI machine file consists of one line per machine:
    node01
    node02
    node03
This tells MPI to start processes on the machines node01, node02 and node03.
Additionally, one can tell MPI the number of cores it should use on each of the machines
by supplying the slots parameter:
    node01 slots=2
    node02 slots=4
    node03 slots=6
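To make the file format concrete, here is a minimal Python sketch that reads such a machine file into a list of hosts with their parameters. It is only an illustration of the layout described above, not OpenMPI's own parser; the name parse_hostfile is hypothetical, and treating a host without a slots entry as having 1 slot is an assumption of this sketch.

```python
def parse_hostfile(text):
    """Return a list of (hostname, params) tuples, one per non-empty line."""
    hosts = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        name, *fields = line.split()
        params = {}
        for field in fields:
            key, _, value = field.partition("=")  # e.g. "slots=2" -> ("slots", "=", "2")
            params[key] = int(value)
        # Assumption of this sketch: a host listed without "slots" counts as 1 slot.
        params.setdefault("slots", 1)
        hosts.append((name, params))
    return hosts


HOSTFILE = """\
node01 slots=2
node02 slots=4
node03 slots=6
"""

hosts = parse_hostfile(HOSTFILE)
total_slots = sum(params["slots"] for _, params in hosts)
print(hosts)
print("total slots:", total_slots)
```

The total of 12 slots is why the examples in this text use -np 12: it is exactly the number of processes the three machines can take without overbooking.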
When issuing mpirun -np 12 --byslot, MPI spawns 2 processes on node01, 4 processes on node02 and 6 processes on node03, which
is probably what you want (the --byslot option is the default in OpenMPI, so you don't need to specify it explicitly).
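The by-slot placement described above can be sketched in a few lines of Python. This is a simplified model of the behaviour, not OpenMPI's actual scheduler; the function name place_by_slot and the HOSTS list are hypothetical, and the sketch does not model what happens once all slots are used up.

```python
def place_by_slot(hosts, nprocs):
    """Fill every slot of a host before moving on to the next host.

    hosts is a list of (hostname, slots) pairs in machine-file order.
    Returns (counts, leftover), where leftover is the number of processes
    that did not fit into the declared slots (not handled by this sketch).
    """
    counts = {}
    remaining = nprocs
    for name, slots in hosts:
        take = min(slots, remaining)  # never exceed this host's slot count
        counts[name] = take
        remaining -= take
    return counts, remaining


HOSTS = [("node01", 2), ("node02", 4), ("node03", 6)]

counts, leftover = place_by_slot(HOSTS, 12)
print(counts)    # {'node01': 2, 'node02': 4, 'node03': 6}
print(leftover)  # 0
```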
However, when running with the --bynode option, things look different:
MPI starts 1 process on node01, 1 on node02 and 1 on node03, then wraps around and starts 1 process on node01, 1 on node02 and so forth. With -np 12
we would eventually start 4 processes on node01 (overbooking it), 4 on node02 (which is OK) and 4 on node03 (underbooking it).
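The round-robin behaviour described here is easy to reproduce in a small Python sketch. It models only what the text states (one process per machine in turn, ignoring the slots values) and is not OpenMPI's implementation; place_by_node and HOSTS are hypothetical names.

```python
from itertools import cycle


def place_by_node(hosts, nprocs):
    """Hand out processes one per host in turn, wrapping around as needed.

    hosts is a list of (hostname, slots) pairs; the slots values are
    deliberately ignored here, as in the scenario described in the text.
    """
    counts = {name: 0 for name, _ in hosts}
    names = cycle([name for name, _ in hosts])
    for _ in range(nprocs):
        counts[next(names)] += 1
    return counts


HOSTS = [("node01", 2), ("node02", 4), ("node03", 6)]

counts = place_by_node(HOSTS, 12)
for name, slots in HOSTS:
    # A positive difference means overbooked, a negative one underbooked.
    print(name, "processes:", counts[name], "slots:", slots, "difference:", counts[name] - slots)
```

For the three machines above this gives 4 processes each, so node01 ends up 2 over its slot count and node03 2 under it.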
To prevent overbooking/underbooking of particular machines when using the --bynode option, one can supply the max_slots parameter:
    node01 slots=2 max_slots=2
    node02 slots=4 max_slots=4
    node03 slots=6 max_slots=6
With this file, mpirun -np 12 --bynode spawns a total of 2 processes on node01, 4 processes on node02 and 6 processes on node03. If the
number of processes requested with the -np
option exceeds the sum of all max_slots values, mpirun aborts with an error.
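The effect of max_slots on round-robin placement can be modelled with the following Python sketch. It is a simplified illustration under the assumptions stated in the text (full machines are skipped, and a request larger than the sum of all max_slots is rejected); it is not OpenMPI's scheduler, and place_by_node_capped, HOSTS and the error message are hypothetical.

```python
def place_by_node_capped(hosts, nprocs):
    """Round-robin placement that never exceeds a host's max_slots.

    hosts is a list of (hostname, max_slots) pairs in machine-file order.
    Raises ValueError if nprocs exceeds the sum of all max_slots, standing
    in for mpirun aborting with an error.
    """
    capacity = sum(max_slots for _, max_slots in hosts)
    if nprocs > capacity:
        raise ValueError(
            f"requested {nprocs} processes, but only {capacity} max_slots are available"
        )
    counts = {name: 0 for name, _ in hosts}
    placed = 0
    while placed < nprocs:
        for name, max_slots in hosts:
            if placed == nprocs:
                break
            if counts[name] < max_slots:  # skip hosts that are already full
                counts[name] += 1
                placed += 1
    return counts


HOSTS = [("node01", 2), ("node02", 4), ("node03", 6)]

print(place_by_node_capped(HOSTS, 12))  # {'node01': 2, 'node02': 4, 'node03': 6}

try:
    place_by_node_capped(HOSTS, 13)
except ValueError as err:
    print("error:", err)
```

Once node01 is full after the second round, the remaining processes are distributed over node02 and node03 only, which is how the final counts match the slot counts.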
to be done