difx:difxdft

Previously mpifxcorr utilised FFT(Fast Fourier Transform) functions from Intel's IPP(Intel Performance Primitives) libraries however this restricted the user to setting the channels to a power of two. However the dft functions (which allow arbitrary-length FFTs) are highly optimised (see "Short benchmark of Multi-threaded FFTW vs Intel IPP" for a comparison with IPP FFTs and fftw). For the specific case of DiFX there is little difference in the time taken to perform a DFT compared with an FFT, particularly for powers of 10 and presumably for powers of 3, 5 etc (see below). The trunk version of mpifxcorr now supports the calling of the dft functions when nFFTChan is set to a number which is not a power of two. The behaviour of DiFX should be entirely unchanged from previous versions if nFFTChan is set to a power of 2 (the fft routines are used as before).

The default values of stride length (strideLength) and xmac stride length (xmacLength) are based on the assumption that the number of channels (nFFTChan) is a power of 2. At the moment there is no automated feature to set these values based on the number of channels. Thus they have to be set manually in the SETUP default section of the v2d file. However they must all satisfy the constraint that they are factors of the nFFTChan. Setting all four of the variables below is advised for now.

eg

SETUP default { nFFTChan=503 nChan=503 strideLength=503 xmacLength=503 } ...

[NB DiFX-2.0 terminology used in this section]. For performance a number of tests were performed using vlbi_fake to simulate 6 data streams. These tests used nChan's of 16, 500, 503, 512, 1000, 1024. Since vlbi_fake sends data limited to the speed mpifxcorr can process it, thus can be used to benchmark mpifxcorr's performance for different nChans and specAvg. The data can be organised into three sets, first where there are no real comparable features, next there was no averaging (specAvg = 1), and finally where specAvg was set such that the number of output channels was comparable.

**Set 1**

nChan | specAvg | nChanOut | Mean data transfer rate |
---|---|---|---|

16 | 16 | 1 | 442.39 Mb/s |

500 | 10 | 50 | 278.52 Mb/s |

512 | 16 | 32 | 277.71 Mb/s |

1000 | 20 | 20 | 279.19 Mb/s |

1024 | 16 | 64 | 237.46 Mb/s |

**Set 2**

nChan | specAvg | nChanOut | Mean data transfer rate |
---|---|---|---|

16 | 1 | 16 | 438.25 Mb/s |

500 | 1 | 500 | 277.84 Mb/s |

503 | 1 | 503 | 214.82 Mb/s |

512 | 1 | 512 | 276.31 Mb/s |

1000 | 1 | 1000 | 234.61 Mb/s |

1024 | 1 | 1024 | 235.05 Mb/s |

**Set 3**

nChan | specAvg | nChanOut | Mean data transfer rate |
---|---|---|---|

16 | 1 | 16 | 438.25 Mb/s |

500 | 25 | 20 | 281.63 Mb/s |

512 | 32 | 16 | 279.21 Mb/s |

1000 | 50 | 20 | 239.19 Mb/s |

1024 | 64 | 16 | 238.87 Mb/s |

It was not surprising that the prime number was processed the slowest though it was expected that there would be a much sharper drop in performance than was observed. The results also indicate (at least for this many datastreams) that the processing is dominated by the Fourier transforms. It also show that there is no discernible loss in performance between using a DFT or an FFT.

A basic test of accuracy was carried out by comparing the output visibilities from datasets generated different FFT sizes after averaging each down to 64 channels within mpifxcorr (only the central 32 were used for the analysis). After averaging down to the same number of channels, the plots of comparing the phase for 512 vs. 1024 channels are comparable with those for 512 vs. 640 channels. Scatter seems to increase somewhat. Amplitude is not shown here but they agree to ~1%.

nFFTChan | nChan | Channels averaged together |
---|---|---|

512 | 64 | 8 |

1024 | 64 | 16 |

640 | 64 | 10 |

difx/difxdft.txt · Last modified: 2015/10/21 10:08 (external edit)