## T2psrparms: Fitting a timing model in the presence of non-white noise

**Lead authors:** G. Hobbs, W. Coles

**Contributing authors:**

**Versions:** v1: 13 Aug 2010
## Introduction

The primary use of tempo2 is to fit a precision timing model, including pulsar, astrometric and dispersion parameters, to observations of pulse arrival times. This is done with a weighted or an unweighted least-squares fit of the timing model to the pre-fit timing residuals. Neither weighted nor unweighted fits provide accurate parameter estimates or reliable estimates of the parameter uncertainties when the assumption that the timing residuals are uncorrelated is violated. Unfortunately it is quite common for the pre-fit timing residuals to contain correlated (non-white) "timing noise". This is a well-known problem and pulsar observers have dealt with it by "pre-whitening" the pre-fit residuals, i.e. subtracting a high-order polynomial or harmonically related sinusoids prior to carrying out the least-squares fit. Coles et al. (in preparation) have shown that such techniques are not optimal and often lead to severe underestimation of the parameter errors. They also showed how to apply the optimal solution developed by Cholesky to this problem. Cholesky's solution uses a linear transformation derived from the covariance matrix of the residuals to pre-whiten both the residuals and the timing model. The application of the Cholesky method is a straightforward generalisation of the weighted least-squares technique once the covariance matrix is known (see the theory section below). In pulsar timing analysis, however, we seldom know the covariance matrix in advance: for pulsar observers, the main problem will be estimating the covariance function of the timing noise from the observations.
However there will be cases, such as Monte Carlo simulations, in which the tempo2 user does know the covariance function of the data. It is important to note that the observer need not worry about whether the estimate of the covariance matrix is adequate, because it is easy to check: one simply pre-whitens the data and confirms, by a covariance test and/or a spectral test, that it is sufficiently white.

## Data dominated by red noise

In such cases one cannot estimate the covariance function C(τ) = <(X(t) - <X(t)>)(X(t+τ) - <X(t)>)> because the mean <X(t)> cannot be reliably estimated. If the spectrum is approximately a power law, P(f) ∝ f^α with a steep negative exponent α, conventional spectral estimators are dominated by leakage. To deal with steep power-law spectra we proceed iteratively. First we separate the red noise from the white noise as best we can, using a smoothing procedure to estimate the red noise and subtracting it from the original data to obtain the white noise. We then interpolate the red noise onto a regularly sampled grid and estimate its power spectrum in three ways: (1) with no pre-whitening; (2) with first-difference pre-whitening; (3) with second-difference pre-whitening. Interpolation tends to transfer power from high frequencies to low frequencies and to overestimate the low frequencies unless the low-frequency red noise is dominant. That is why this method only applies to pulsars with strong red noise. Tempo2 also computes least-squares estimates of the spectrum of the original data and of the high-frequency data. A comparison of these least-squares spectral estimates will alert the user to a problem: if the former is much larger than the latter then red noise is dominant; if it is not, then one should instead use the method described below for data in which the red noise is comparable with the white noise. The user must select the amount of pre-whitening to use in this "pilot spectral estimation".
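As a concrete illustration, the separation-and-pilot-spectrum step described above might look as follows in Python. This is a minimal sketch with numpy: the Gaussian-kernel smoother, the function names and the normalisation are assumptions chosen for illustration, not tempo2's actual implementation.

```python
import numpy as np

def smooth_red(t, x, tsmooth=20.0):
    """Estimate the red (low-frequency) component of irregularly sampled
    residuals x(t) with a simple Gaussian-kernel smoother of width tsmooth
    (days). Illustrative only; tempo2's smoother may differ."""
    w = np.exp(-0.5 * ((t[:, None] - t[None, :]) / tsmooth) ** 2)
    return (w @ x) / w.sum(axis=1)

def periodogram(y, dt):
    """Periodogram Pest(fk) = |DFT|^2 * Tobs / n^2 on a regular grid,
    dropping the f = 0 point."""
    n = len(y)
    f = np.fft.rfftfreq(n, dt)
    p = np.abs(np.fft.rfft(y)) ** 2 * (n * dt) / n ** 2
    return f[1:], p[1:]

def prewhitened_spectrum(y, dt, order):
    """Spectrum of the (regularly gridded) red-noise estimate with
    zeroth/first/second-difference pre-whitening, post-darkened by
    dividing by (2 sin(pi f dt))^(2*order)."""
    z = np.diff(y, n=order) if order else y
    f, p = periodogram(z, dt)
    return f, p / (2.0 * np.sin(np.pi * f * dt)) ** (2 * order)
```

For white noise of unit variance the periodogram values average to the sample interval, which gives a quick sanity check on the normalisation.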
Spectral leakage will tend to flatten the un-pre-whitened spectrum to a spectral exponent of -2; it will flatten a spectrum pre-whitened with a first difference to an exponent of -4; and one with second-difference pre-whitening to an exponent of -6. One should choose the minimum pre-whitening necessary, i.e. if methods 2 and 3 both show similar spectra and both are steeper than method 1, then one should choose method 2 (first-difference pre-whitening). The user must then assist tempo2 in fitting a spectral model to the selected spectral estimate. The model we find most generally useful has the form P(f) = A / (1 + (f/fc)^2)^(alpha/2), where fc is a corner frequency and alpha a spectral exponent. The user can test the solution by examining the covariance and the spectrum of the pre-whitened data. The covariance should consist of a unit delta function at the origin plus white noise. The spectrum should be (on average) flat. It is common to have a small low-frequency excess in the power spectrum due to incomplete modelling of the fitting of period and period derivative; if it does not exceed a factor of 10 it is probably acceptable. If the red noise has been overestimated, the low frequencies in the pre-whitened spectrum will be suppressed. This should be corrected by adjusting the spectral model.

## Step-by-step tutorial

This tutorial assumes that the arrival-time data are time sorted and that the pulse times-of-arrival (TOAs) have realistic uncertainties (see the fixData plugin for help in checking the reliability of the TOA uncertainties). Different methods are used when the timing residuals exhibit a large amount of non-white noise (this part of the tutorial) or only a small amount of non-white noise (second part of the tutorial). The data presented here for PSR J1539-5626 were obtained using the Parkes radio telescope and were processed by Meng Yu.

- Check the data quality:
`tempo2 -gr plk -f J1539-5626.par J1539-5626.tim`
A small-amplitude annual sinusoid can be seen, which implies an incorrect determination of the pulsar's position and/or proper motion. If we wish to obtain improved estimates of the pulsar's position and proper motion we cannot simply select RA, DEC, PMRA and PMDEC and click on re-fit. If we incorrectly do this, the resulting fit implies that the proper motion in RA is -30.1(4) mas/yr and the proper motion in DEC is -506.8(6) mas/yr.

- Run the spectralModel plugin:
`tempo2 -gr spectralModel -f J1539-5626.par J1539-5626.tim`
The first screen that appears when the plugin is run is shown above. In the top panel the original post-fit timing residuals are shown in white and an unweighted cubic fit to the residuals is shown in red. The cubic fit is used to obtain a smooth model of the low-frequency component of the timing residuals (see algorithm details below). The smooth model (using a default smoothing time of 20 days) is shown as the green dashed line. The middle panel shows, without uncertainties, the differences between the smooth model and the actual timing residuals (known as the "high-frequency residuals"; these are the high-pass-filtered residuals). In this case it is clear that the smooth model does a good job of describing the low-frequency structures, but there are still some remaining higher-frequency features. The covariance function of these high-frequency residuals is shown in the bottom panel. The user then has the option of changing the smoothing timescale or continuing. For this example the smoothing time was not changed from the default 20 days.

The next plot shows power spectral density estimates for the original residuals (white), the high-frequency residuals (light blue), the smooth low-frequency model interpolated onto a regularly sampled 14-day grid (red), and the same interpolated data after first-difference (green) and second-difference (dark blue) pre-whitening and subsequent post-darkening. The white, yellow and orange dotted lines represent slopes of -2, -4 and -6 respectively. The user is asked to select the amount of pre-whitening required. In this case the spectra without pre-whitening and with first- and second-order pre-whitening all align, and therefore the user will choose not to use any pre-whitening. The spectra are re-drawn showing only the spectra of the high-frequency residuals and the low-frequency residuals with the selected pre-whitening/post-darkening. The user is then asked to input a simple model of the low-frequency spectrum.
Currently the form of the model is P(f) = A/(1+(f/fc)^2)^(alpha/2), where the user inputs a corner frequency (fc) and spectral exponent (alpha). A fit is carried out to the first N points in the spectrum (where N is input by the user) to determine the amplitude A. For these data, values of fc = 0.2 yr^-1, alpha = 5 and N = 12 gave a reasonable fit. The plot is redrawn with the model overlaid and the user is allowed to choose different model parameters or to continue with the current model.

As described in the Coles et al. paper and in the details of the algorithms below, this model is used to construct the covariance function of the timing residuals. From this covariance function a pre-whitening matrix can be determined which, when applied to the timing residuals, results in normalised, white residuals. In the figure above the original timing residuals are shown in the top panel. The central panel contains the pre-whitened, normalised timing residuals obtained using the data covariance function. The bottom panel shows the covariance function of these whitened residuals. If the covariance of the whitened residuals is not reasonably flat (apart from the spike at zero lag) then the spectral model should be adjusted. The spectrum of the whitened residuals will also be displayed, but this has not been implemented yet. Finally the plugin recalculates the spectrum of the original data using only the whitening matrix. This is shown as the yellow line overlaid on the previous spectra. The model is refit to the new spectrum and the output covariance function is written out to disk (the default name is covarFunc.dat). This data covariance function can then be included in the fit for the pulsar parameters.
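The amplitude fit described above (the user fixes fc and alpha; A is then determined by least squares from the first N spectral points) can be sketched as follows. The function names are assumptions for illustration, and whether tempo2 fits in linear or log space is not specified here; this sketch fits in linear space.

```python
import numpy as np

def model_psd(f, A, fc, alpha):
    """The spectralModel form P(f) = A / (1 + (f/fc)^2)^(alpha/2)."""
    return A / (1.0 + (f / fc) ** 2) ** (alpha / 2.0)

def fit_amplitude(f, p, fc, alpha, N):
    """Linear least-squares estimate of the amplitude A from the first N
    points of the spectral estimate p(f), holding fc and alpha fixed."""
    s = model_psd(f[:N], 1.0, fc, alpha)       # unit-amplitude shape
    return float(np.dot(p[:N], s) / np.dot(s, s))
```

For a spectrum that exactly follows the model, the estimator recovers the true amplitude; on real, noisy spectral estimates it returns the best-fit scaling of the chosen shape.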
## Data for which the red noise is comparable with the white noise

In such cases it may be possible to estimate the covariance function directly. It is much easier to estimate a covariance function of irregularly spaced data than a power spectrum of such data, because each pair of samples provides an estimate of the covariance at one time lag. One can simply bin these estimates at a convenient interval, and these intervals need not be equal. It is often the case for millisecond pulsars that the covariance estimate obtained in this way consists of a delta function at the origin plus a smaller component showing an exponential decay towards zero, i.e. C(tau) = A delta(tau) + B exp(-|tau|/tscale). In this case we simply need to find A, B and tscale. Tempo2 will do this and ask the user to confirm that the solution appears reasonable. Finally tempo2 will compute the covariance of the pre-whitened data and the user should confirm that the exponential component has disappeared.

## Step-by-step tutorial

The data presented here are of PSR J1713+0747 from the Parkes radio telescope and were processed by Joris Verbiest. The "plk" plot shown below shows that the data may not be perfectly white, but there is not a large amount of red noise. Running

`tempo2 -gr spectralModel -f 1713.par 1713.tim`
produces the figure shown below, which has the same panels as in the example above (the original residuals, a cubic fit and the smoothed curve in the top panel, the HF residuals in the middle panel and the normalised covariance of the HF residuals in the lower panel). The smoothing time can be modified, but it is not actually used when the covariance function is estimated from the original residuals. Pressing '0' to continue produces the plot below, which again is the same as in the first example. In this figure the power spectrum of the original residuals (white) is a good estimate of the true power spectrum. Therefore the user inputs '-1' to obtain the covariance function directly from the data.

An exponential fit for C(tau) is carried out; the first local minimum in the chisq is determined and indicated by a red star symbol. Here this is clearly a poor estimate of the true minimum, so the user inputs a value of 120 by hand. The plot is then updated. Using this model for the covariance function, the data are normalised and whitened and plotted (middle panel). The covariance function of the whitened data is also shown (lower panel). Finally the data covariance function is written to disk and can be included in a subsequent fit for the pulsar parameters.
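The pair-binning covariance estimate used in this mode (each pair of residuals contributes one estimate at one lag, averaged within lag bins) can be sketched as follows. This is an illustrative numpy sketch; the function names and binning details are assumptions, not tempo2's exact implementation.

```python
import numpy as np

def binned_covariance(t, x, lag_edges):
    """Estimate C(tau) from irregular samples: every pair (i, j) gives one
    estimate x_i * x_j at lag |t_i - t_j|, averaged within the bins
    defined by lag_edges. Assumes x has (close to) zero mean, as for
    post-fit residuals."""
    i, j = np.triu_indices(len(t), k=1)
    lags = np.abs(t[i] - t[j])
    prods = x[i] * x[j]
    which = np.digitize(lags, lag_edges) - 1
    nbins = len(lag_edges) - 1
    cov = np.array([prods[which == b].mean() if np.any(which == b) else np.nan
                    for b in range(nbins)])
    centers = 0.5 * (lag_edges[:-1] + lag_edges[1:])
    return centers, cov

def exp_model(tau, A, B, tscale):
    """C(tau) = A delta(tau) + B exp(-|tau|/tscale); the delta term only
    contributes at zero lag."""
    return np.where(tau == 0.0, A + B, B * np.exp(-np.abs(tau) / tscale))
```

For white data the binned estimates at nonzero lag scatter around zero, which is the check one makes after pre-whitening to confirm the exponential component has disappeared.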
## Frequently asked questions

**Why does first-order prewhitening/postdarkening produce spectra with slopes steeper than -4?**

**How do I know if my data are white enough?**
## Theory of Linear Least Squares Fitting in Matrix Notation

The linear least-squares fit can be developed as follows. The data (column) vector D is modeled as D = M P + E, where E is the (column) error vector, P is the (column) parameter vector, and M is the model matrix. The error is a random process and we need a statistical description of it. Let E = (E_1, ..., E_n)^T.

If <E_i> = 0 and the errors are white with equal variance, i.e. <E_i E_j> = sigma^2 delta_ij, the ordinary least-squares solution P_est = (M^T M)^-1 M^T D is optimal.

If the error process is white but Var(E_i) = sigma_i^2 varies from sample to sample, one should use a weighted fit, i.e. divide each row of D and M by sigma_i before solving.

If the error process can be characterized by a covariance matrix C = <E E^T>, the formal solution becomes P_est = (M^T C^-1 M)^-1 M^T C^-1 D. Equivalently, one can first transform the data and the model so that the transformed errors are white, and then use an ordinary least-squares solver.

The transformation can be done using the Cholesky decomposition of the covariance matrix, C = U U^T, where U is lower triangular. Defining the pre-whitened quantities Dpw = U^-1 D and Mpw = U^-1 M, the transformed errors Epw = U^-1 E are white with unit variance, and the ordinary least-squares solution of Dpw = Mpw P is P_est = (Mpw^T Mpw)^-1 Mpw^T Dpw.

This is exactly the same as the previous formal solution, but can be implemented with any least-squares solver, such as the SVD. The covariance matrix of the estimated parameters is also exactly the same as that of the formal solution, Cov(P_est) = (M^T C^-1 M)^-1; the individual uncertainties sigma_i enter through the diagonal of C.

One of the attractive features of the Cholesky algorithm is that one can test that the assumed covariance matrix C is adequate by checking that the pre-whitened data Dpw are actually white. It is much easier to confirm that a data vector is white than it is to estimate its power spectrum. If the spectrum is really white, then the least-squares spectral estimate is unbiased, i.e. one need not worry about spectral leakage.

## Spectrum Analysis of Irregularly Sampled Steep Red Time Series with Variable Noise

If a time series is more or less white and regularly sampled, its power spectral density can be estimated using the periodogram. Assuming that there are n points distributed over Tobs, the spectral density estimator is Pest(fk) = |DFT(x(ti))|^2 Tobs/n^2, where fk = k/Tobs for k = 0, ..., n/2. This estimate is related to the true spectral density P(f) by a convolution with a window function W(f), i.e. Pest(fk) = P(f) * W(f). Normally one needs maximum spectral resolution and the time series is not weighted, so W(f) = sinc^2(f Tobs). The zeros of W(f) are spaced by 1/Tobs, so the estimates of P(fk) are independent.
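The Cholesky pre-whitened fit described in the theory section above can be sketched in numpy as follows. This is a minimal illustration of the linear algebra, not tempo2's implementation; the function name is an assumption.

```python
import numpy as np

def cholesky_fit(D, M, C):
    """Generalized least squares via Cholesky pre-whitening:
    C = U U^T, Dpw = U^-1 D, Mpw = U^-1 M, followed by an ordinary
    least-squares solve. Returns the parameter estimates, their
    covariance matrix, and the pre-whitened data (for whiteness tests)."""
    U = np.linalg.cholesky(C)                 # lower triangular
    Dpw = np.linalg.solve(U, D)               # pre-whitened data
    Mpw = np.linalg.solve(U, M)               # pre-whitened model
    P, *_ = np.linalg.lstsq(Mpw, Dpw, rcond=None)
    Pcov = np.linalg.inv(Mpw.T @ Mpw)         # = (M^T C^-1 M)^-1
    return P, Pcov, Dpw
```

The returned estimates match the formal solution P_est = (M^T C^-1 M)^-1 M^T C^-1 D exactly, while only requiring an ordinary least-squares solver; Dpw can be checked for whiteness as the text recommends.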
If the time series is Gaussian then the spectral estimates will each be chi-squared with two degrees of freedom, i.e. have an exponential distribution. In most cases this will be true anyway because the central limit theorem will drive the spectral estimates in this direction. If the time series is not regularly sampled, one is led to consider the problem as a matter of least-squares fitting of sine and cosine components to the data. This leads to the widely used Lomb-Scargle periodogram, which provides the same statistics for nearly white time series by choosing the phase of the sine and cosine correctly. In fact the Lomb-Scargle periodogram is not necessary: one can simply fit a constant, a sine and a cosine component at each frequency, with each frequency estimated separately. The sum of the squared sine and cosine amplitudes will have an exponential distribution regardless of the phase chosen. (One cannot fit all the sine and cosine components simultaneously because they are not statistically independent when the sampling is irregular.) This is also more computationally convenient, since a standard least-squares fitter can be used.

This approach leads one to consider a weighted fit when the errors on the data are not constant. In fact this is the optimal approach under such conditions, and the distribution of the spectral estimates remains exponential with a weighted fit. Indeed a weighted fit should be used when the errors are variable even if the sampling is regular: it will provide a better signal-to-noise ratio in the spectrum, although it will make the spectral estimates partially dependent. In practice one should replace the Lomb-Scargle analysis with a weighted least-squares analysis.

However, if the time series contains power at frequencies besides fk, the spectral window will cause this power to leak into adjacent frequencies. This is shown in Figure 1. Here there are two 256-point regularly sampled time series in the two left panels.
The blue one is exactly a harmonic of Tobs and the red one is not. The blue spectra exactly recover the original power spectral density, but the red ones show the window function. At low frequencies the effect is particularly obvious on a log-log plot, where the sidelobes of W(f) fall as f^-2. One will also see that the red spectra are smoothed. In fact the spectral estimates should be chi-squared random variables with two degrees of freedom, but the red ones are dominated by leakage from the peak point and they do not show any random variation until they fall to the white-noise level. If the spectral density is a power law with an exponent flatter than -2, the low-frequency leakage will not be a problem, but if the exponent is steeper than -2, the spectral estimate will be dominated by leakage.

This is shown in Figure 2. Here the upper left plot displays a random process with a spectral exponent of -1 and the lower left shows one with an exponent of -3. The blue spectra on the right side are the corresponding periodograms. The red spectra are pre-whitened and post-darkened spectra. On the upper right both are the same (this series does not need pre-whitening), but on the lower right only the red pre-whitened spectrum is correct. Here the pre-whitening filter was a first difference, i.e. y(t(k)) = x(t(k)) - x(t(k-1)). The corresponding post-darkening filter is H(f) = 1/(2 sin(pi f Δt)), where Δt is the sample spacing (so H(fk) = 1/(2 sin(pi k/n))), and the squared magnitude of the DFT of y(t) must be multiplied by |H(f)|^2. This simple process works well and is widely used. If the spectral exponent is steeper than -4 then one can use the difference filter twice and correct with |H(f)|^4.

The problem in pulsar timing analysis is that the data are seldom regularly sampled. Furthermore the data are not accompanied by stationary white noise, but by sampling noise with variable standard deviation. So it is not clear how to create an optimal pre-whitening filter like the first-difference filter used for regularly sampled data.
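The per-frequency weighted least-squares estimator described above (a constant, a sine and a cosine fitted at each trial frequency, with each frequency handled separately) can be sketched as follows. The normalisation is chosen so that, for regular sampling, the result matches Pest(fk) = |DFT|^2 Tobs/n^2; this is an illustrative sketch, and the function name is an assumption.

```python
import numpy as np

def lsq_spectrum(t, x, sigma, freqs, Tobs):
    """Weighted least-squares spectral estimate for (possibly irregular)
    sampling: at each trial frequency fit a constant, a cosine and a sine,
    weighting each sample by 1/sigma. Each frequency is fit separately."""
    P = np.empty(len(freqs))
    for k, f in enumerate(freqs):
        B = np.column_stack([np.ones_like(t),
                             np.cos(2 * np.pi * f * t),
                             np.sin(2 * np.pi * f * t)])
        # weighted fit: scale the rows of the design matrix and the data
        coef, *_ = np.linalg.lstsq(B / sigma[:, None], x / sigma, rcond=None)
        _, c, s = coef
        # sum of squared amplitudes, normalised to the periodogram convention
        P[k] = (c ** 2 + s ** 2) * Tobs / 4.0
    return P
```

For a unit-amplitude sinusoid at an exact harmonic of a regular grid, the fitted cosine amplitude is 1 and the estimate equals Tobs/4, matching the periodogram.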
Fortunately one can solve this problem using the Cholesky pre-whitening transformation. This is not a filter, but it can be applied to both the data and the sine and cosine components, so one can obtain a pre-whitened spectral estimate. The spectrum analysis of pulsar timing data is further complicated by the fact that a best-fit quadratic is always removed from the timing observations, because the period and its rate of change are never known a priori. This significantly distorts the first few frequency components in the power spectrum. As a result there is a small uncertainty in the lowest frequency components of the covariance from which the pre-whitening transformation is derived.

The Cholesky spectral analysis has been tested by simulating a power-law spectrum with additive white noise, subtracting a quadratic, and averaging 100 estimates of the power spectrum for comparison with the known spectral densities of the inputs. The first case tested has regular sampling and equal errors. An example of the input after quadratic removal, and the same input after pre-whitening, is shown in Figure 3. The mean spectral estimate is shown in Figure 4. Here the least-squares estimate is in red and the Cholesky estimate in blue. The theoretical spectrum added to the actual spectrum of the white noise is shown in green. It is clear that the Cholesky method is much better, but it does overestimate the lowest frequency point by a factor of two. It shows very little leakage into higher frequencies. The overestimate of the lowest frequencies is related to the imperfect compensation for quadratic removal: if, instead of removing a quadratic, we remove a cubic, the blue spectral estimate exactly matches the green theoretical spectrum. The covariance matrix of the first 50 spectral estimates is shown in Figure 5. One can see that the first two are slightly correlated, due to the quadratic removal, but the rest are uncorrelated.
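A minimal sketch of the ingredients of such a test harness is given below: frequency-domain shaping of white noise to generate the power-law input, and removal of a best-fit quadratic to mimic the period/period-derivative fit. The parameters and function names here are assumptions for illustration, not the exact simulation used for the figures.

```python
import numpy as np

def powerlaw_red(n, alpha, A, rng):
    """Generate a regularly sampled series whose power spectrum follows
    P(f) = A * f^alpha (alpha negative) by shaping complex white noise
    in the frequency domain and transforming back."""
    f = np.fft.rfftfreq(n, d=1.0)
    amp = np.zeros_like(f)
    amp[1:] = np.sqrt(A * f[1:] ** alpha)     # leave the f = 0 bin at zero
    spec = amp * (rng.standard_normal(len(f)) +
                  1j * rng.standard_normal(len(f)))
    return np.fft.irfft(spec, n)

def subtract_quadratic(t, x):
    """Remove a best-fit quadratic, mimicking the fit for pulse period
    and period derivative."""
    return x - np.polyval(np.polyfit(t, x, 2), t)
```

A full test would add white noise of chosen variance, apply the pre-whitening transformation, and average the resulting spectral estimates over many realisations, as described in the text.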
Exactly the same simulation was repeated with irregular sampling. The sampling was chosen to be exactly that of the PPTA pulsar J1713+0747. The results are shown in Figures 6, 7 and 8 in the same format as Figures 3, 4 and 5. Perhaps the most interesting result is that the red least-squares spectrum in Figure 7 has much higher far sidelobes. These sidelobes do not average out over the simulations because they are due to the sampling, which stays the same. Evidently the Cholesky pre-whitening flattens the spectrum quite well, but the entire spectrum is 20 or 30% higher than expected.

The same simulation was then repeated with regular sampling but variable errors. The errors were chosen to be exactly those of the PPTA pulsar J1713+0747. The results, which are shown in Figures 9, 10 and 11, are quite similar to the effects of irregular sampling. The irregular sampling and the variable errors create pseudo-random window sidelobes (which interferometrists would call a "dirty beam").

Finally the simulation was repeated with both irregular sampling and variable errors taken from J1713+0747. The results are shown in Figures 12, 13 and 14. The spectral bias appears to be the sum of the two effects seen separately. The net effect is that the spectral estimate is about twice as high as it should be throughout. The spectral estimates also show a little more cross-correlation than in the previous cases. Clearly both irregular sampling and variable errors stress the spectral estimation algorithm. However, the red spectrum we have tested is very steep and the power spectra have 6 decades of dynamic range. We tested the same case, but with the red noise 100 times weaker. This did not change the "dirty beam" at all; it simply lowered the red component, but the spectral estimate remained about a factor of two above the expected value. Evidently this bias is due more to the effective spectral window than to the amplitude of the red noise.
Finally we have estimated the average power spectrum of the pre-whitened residuals for the final case with irregular sampling and variable errors. The result is shown in Figure 15. The spectrum is quite flat, but it is depressed by about 25% in the range where the red noise is dominant. This suggests that we might be overcorrecting slightly. More experimentation is called for here.

## Known issues/bugs

## Details of algorithms

## Publications using/discussing this technique

Page last modified on August 20, 2010, at 04:41 PM