FxManager

This page describes the execution of the code from the FxManager's point of view, and the memory layout of the main Visibility buffer used by the FxManager.

Buffer information

The FxManager has one buffer of consequence: the Visibility buffer. The length of this buffer can be set in vex2difx with “visBufferLength = XX” in the global settings; it appears in the .input file in the line “VIS BUFFER LENGTH” in the COMMON table at the top of the file.
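To check which value a job actually ended up with, the .input file can be scanned for that line. The following is a minimal sketch, not part of DiFX: the function name is hypothetical, and the "KEY: value" layout of the line is an assumption about the .input format that you should confirm against one of your own files.

```python
import re

def read_vis_buffer_length(input_text):
    """Return the integer on the 'VIS BUFFER LENGTH' line, or None if absent.

    Assumes (not confirmed by this page) that the line looks like
    'VIS BUFFER LENGTH:  32', with an optional colon before the number.
    """
    pattern = re.compile(r"\s*VIS BUFFER LENGTH\s*:?\s*(\d+)\s*$")
    for line in input_text.splitlines():
        match = pattern.match(line)
        if match:
            return int(match.group(1))
    return None

# Made-up single-line sample, for illustration only.
sample = "VIS BUFFER LENGTH:  32\n"
print(read_vis_buffer_length(sample))
```

Returning None rather than raising makes it easy to flag .input files in which the line is missing.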

The depth of the Visibility buffer becomes important in jobs with very large numbers of Core processes. As can be seen in the visbuffer illustration above, the FxManager can only lock and access a finite window of time in order to accumulate subintegrations into the correct Visibility slot. This window is constrained to be at most (visBufferLength - 3) Visibilities, to prevent deadlocking with the thread which writes the Visibilities out. When the FxManager receives a new subintegration, it first tries to accumulate it into the correct Visibility, if the main thread already “owns” this Visibility. If the subintegration comes from a future time (relative to the most recent Visibility locked by the FxManager main thread), then the main thread keeps locking the following Visibility until it either 1) finds the correct Visibility to accumulate into, or 2) already owns (visBufferLength - 3) Visibilities.

In the latter situation, the main thread has a dilemma. It must either throw away this latest subintegration, or abandon the oldest Visibility (which is obviously not yet completely populated with all of its subintegrations). It chooses the latter, releasing (at least) its oldest Visibility for the write thread to write out. (It is possible that the second oldest, third oldest and subsequent Visibilities were already complete, in which case releasing the oldest Visibility actually frees up 2, 3 or even more slots.) When this happens, the following message will appear in the log:

Error - data was received which is too recent (scan X, YY sec + ZZZZZZZ ns)! Will force existing data to be dropped until we have caught up

Later, when the tardy subintegration that belonged to the released Visibility finally arrives, there will be no place to put it, since that Visibility has already been released and written out. Thus, the FxManager will simply discard it, and the following message will be seen:

Error - stale data was received from core XX regarding scan Y, time Z.ZZ seconds - it will be ignored!!!
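The window behaviour described above can be sketched as a toy model. This is an illustration only, not the FxManager implementation: the class and method names are hypothetical, Visibilities are reduced to integer indices, and the locking, the write thread and the release of already complete Visibilities are not modelled.

```python
from collections import deque

class VisWindow:
    """Toy model of the Visibilities the main thread currently owns."""

    def __init__(self, vis_buffer_length):
        if vis_buffer_length < 4:
            raise ValueError("need at least one lockable Visibility")
        self.max_owned = vis_buffer_length - 3  # limit stated in the text
        self.owned = deque([0])                 # indices of owned Visibilities
        self.counts = {0: 0}                    # subintegrations accumulated per index
        self.forced = []                        # indices released before completion

    def receive(self, vis_index):
        """Handle one subintegration destined for Visibility vis_index."""
        if vis_index < self.owned[0]:
            return "stale"                      # its Visibility was already released
        while self.owned[-1] < vis_index:
            next_index = self.owned[-1] + 1
            if len(self.owned) == self.max_owned:
                # "Too recent" case: abandon the oldest Visibility to make room.
                self.forced.append(self.owned.popleft())
            self.owned.append(next_index)
            self.counts[next_index] = 0
        self.counts[vis_index] += 1
        return "accumulated"

window = VisWindow(vis_buffer_length=5)   # at most 2 Visibilities owned
print(window.receive(0))   # accumulated
print(window.receive(2))   # accumulated, after Visibility 0 is forced out
print(window.receive(0))   # stale
```

In this model the second call corresponds to the "too recent" message and the third call to the "stale data" message.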

Under what circumstances does this problem occur? It occurs when the depth of the Visibility buffer at the FxManager is too small in relation to the number of Core nodes. Subintegrations are distributed in a round-robin fashion amongst Core nodes, but the time at which they arrive back at the FxManager is subject to small random variations. An example is probably useful:

If a correlation has an integration time of 1 second and a subintegration time of 100 ms, then there will be 10 subintegrations per integration. Say the visBufferLength is 32. This means that the main thread of the FxManager can lock at most 32 - 3 = 29 Visibilities, corresponding to 290 subintegrations. If a job is run with 100 Core nodes (each of which has a buffer of 4 subintegrations; this is a fixed value, see the core documentation) then there are 400 subintegrations sloshing around the processing nodes at any given time. If one node is a little slow and has not yet sent a subintegration from time X seconds, while a different node is already up to time X + 30 seconds, the manager will run out of buffer space in the way described above.
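The arithmetic in this example can be reproduced with a few lines of code. This is a minimal sketch with hypothetical function and parameter names; it only compares the two totals quoted above and does not predict when drops will actually occur.

```python
def buffer_headroom(vis_buffer_length, integration_s, subint_s,
                    n_cores, core_buffer_subints=4):
    """Return (subintegrations the manager can hold, subintegrations in flight)."""
    subints_per_integration = round(integration_s / subint_s)
    lockable_visibilities = vis_buffer_length - 3          # limit stated in the text
    lockable_subints = lockable_visibilities * subints_per_integration
    in_flight_subints = n_cores * core_buffer_subints      # 4 per Core, per the text
    return lockable_subints, in_flight_subints

lockable, in_flight = buffer_headroom(32, 1.0, 0.1, 100)
print(lockable, in_flight)  # 290 400
```

When the second number exceeds the first, as it does here, a slow Core node can leave the manager without room for subintegrations from faster nodes.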

Thus: if you have a job with many Core nodes, and/or few subintegrations per integration, consider increasing visBufferLength. Apart from memory requirements, there is no real downside to increasing this value.

difx/layout/fxmanager.txt · Last modified: 2011/12/09 02:47 by adamdeller