Search:

# Fitting a timing model in the presence of non-white noise

• Lead authors: G. Hobbs, W. Coles
• Contributing authors:
• Versions: v1: 13 Aug 2010

Daniel's queries are in red

## Introduction

The primary use of tempo2 is to fit a precision timing model, including pulsar, astrometric, and dispersion parameters, to observations of pulse arrival time. This is done with a weighted or an unweighted least-squares-fit of the timing model to the pre-fit timing residuals. Neither weighted nor unweighted fits provide accurate parameter estimates or reliable estimates of the parameter uncertainties when the assumption that the timing errors [this can be confused with TOA errors - use the word "residuals" instead?] are uncorrelated is violated. Unfortunately it is quite common for the pre-fit timing residuals to contain correlated (non-white) "timing noise." This is a well-known problem and pulsar observers have dealt with it by "pre-whitening" the pre-fit residuals by subtracting a high order polynomial or harmonically related sinusoids prior to carrying out the least squares fit. Coles et al. (in preparation) have shown that such techniques are not optimal and often lead to severe underestimation of the parameter errors. They also showed how to apply the optimal solution developed by Cholesky to this problem. Cholesky's solution uses a linear transformation derived from the covariance matrix of the residuals to pre-whiten both the residuals and the timing model.

The application of the Cholesky method is a straightforward generalization of the weighted least squares technique once the covariance matrix is known (see theory section). In pulsar timing analysis we seldom have a priori information on the covariance matrix of the residuals so it must usually be estimated from the observations. We have used a simple statistical model in which the residuals are assumed to be the sum of two components: (1) timing noise which is wide-sense stationary (i.e. has a covariance function and a power spectrum); and (2) sampling noise which is white (i.e. uncorrelated) but not time stationary (i.e. the error bars are variable). We assume that the sampling noise is known although the observer may wish to apply the fixData plugin to improve the estimated sampling noise. Tempo2 will accept an estimate of the covariance function of the timing noise (using the new -dcf option); combine it with the sample errors to create a combined covariance matrix; determine the pre-whitening transformation and complete the optimal least squares fit.

For pulsar observers, the main problem will be estimating the covariance function of the timing noise from the observations. However there will be cases, such as Monte Carlo simulations, in which the tempo2 user does know the covariance function of the data a priori and can simply input it with the -dcf option. Two techniques for finding the covariance function from the observations have been implemented in tempo2 and are described below. Note that these techniques provide optimal estimates of the power spectrum of the timing residuals (see tutorial on spectral estimation) and are thus of more general interest. The first case, which presently requires some manual intervention, must be used when the residuals are dominated by red noise. Some examples of such residuals are shown here Attach:red_dominated.png. In this case a power spectral estimate is obtained first and fit with a simple analytical model. The covariance is obtained by a Fourier transformation of the analytical model. The second case, which is simpler and almost automatic, can be used when the variance of the red noise is comparable with the variance of the sampling noise (or smaller). Some examples of such residuals are shown here Attach:red-white.png.In this case the covariance can be estimated directly and one only needs to make a simple analytical model of the covariance.

It is important to note that the observer need not worry about whether the estimate of the covariance matrix is adequate, it is easy to check. One simply pre-whitens the data and confirms by a covariance test and/or a spectral test, that it is sufficiently white!

## Data dominated by red noise

In such cases one cannot estimate the covariance function C(τ) = < (X(t) - <X(t)>)(X(t+τ) - <X(t)>) > because the mean <X(t)> cannot be reliably estimated. If the spectrum is approximately power law P(f) = f where the spectral exponent is steeper than -2, the power spectrum cannot be estimated directly either. Unfortunately this is often the case with young pulsars and occasionally also occurs in ms pulsars (e.g. J1939+2134). For steep power law spectra, one must pre-whiten the data with a first or second difference operation, i.e. Xpw(t) = X(t) - X(t-δt). This is easily done with equally-spaced data with negligible white noise, but pulsar observations are usually irregular and have significant and variable white noise.

To deal with steep power-law spectra we proceed iteratively. First we separate the red noise from the white noise as best we can, using a smoothing procedure to estimate the red noise and subtracting it from the original data to obtain the white noise. We then interpolate the red noise onto a regularly sampled grid and estimate its power spectrum in three ways: (1) with no pre-whitening; (2) with first difference pre-whitening; (3) with second difference pre-whitening. Interpolation tends to transfer power from high frequencies to low frequencies and overestimate the low frequencies unless the low-frequency red noise is dominant. That is why this method only applies to pulsars with strong red noise.

Tempo2 also computes least squares estimates of the spectrum of the original data and that of the high-frequency data. A comparison of these least squares spectral estimates will alert the user to a problem. If the former is much larger than the latter then red noise is dominant. If the former is not much larger than the latter then one should use the second method to estimate the covariance.

The user must select the amount of pre-whitening to use in this "pilot spectral estimation". Spectral leakage will tend to flatten the unpre-whitened spectrum to a spectral exponent of -2; it will flatten a spectrum pre-whitened with a first difference to an exponent of -4; and one with a second difference pre-whitening to an exponent of -6. One should choose the minimum pre-whitening necessary, i.e. if the methods 2 and 3 both show similar spectra and both are steeper than method 1, then one should choose method 2 (first difference pre-whitening).

The user must then assist tempo2 in fitting a spectral model to the selected spectral estimate. The model we find most generally useful has the form P(f) = A / ( 1 + (f / fc)2 )α/2. The user must enter fc, α, and the number of points to be fit, then tempo2 will find the amplitude A. When a suitable match is obtained the user must tell tempo2 to proceed. The code will then calculate the covariance matrix and the Cholesky pre-whitening matrix then use them to estimate the power spectrum and finally overplot this spectral estimate on the pilot estimate. This should be a better estimate. It should match the red spectrum at low frequencies and the high-frequency spectrum at high frequencies. It can then be used to recalculate the covariance matrix and a self-consistent solution is obtained.

The user can test the solution by examining the covariance and the spectrum of the pre-whitened data. The covariance should consist of a unit delta function at the origin plus white noise. The spectrum should be (on the average) flat. It is common to have a small low frequency excess in the power spectrum due to incomplete modeling of the fitting of period and period derivative. If it does not exceed a factor of 10 it is probably OK. If the red noise has been overestimated the low frequencies in the pre-whitened spectrum will be suppressed. This should be corrected by adjusting the spectral model.

### Step-by-step tutorial

This tutorial assumes that the arrival time data are time sorted and that the pulse times-of-arrival (TOAs) have realistic uncertainties (see the fixData plugin for help in checking the reliability of the TOA uncertainties). Different methods are used when the timing residuals exhibit a large amount of non-white noise (this part of the tutorial) or only a small amount of non-white noise (second part of the tutorial).

The data presented here for PSR J1539-5626 were obtained using the Parkes radio telescope and were processed by Meng Yu.

• Check the data quality: `tempo2 -gr plk -f J1539-5626.par J1539-5626.tim`

A small amplitude annual sinusoid can be seen which implies an incorrect determination of the pulsar's position and/or proper motion. If we wish to obtain improved estimates of the pulsar's position and proper motion we cannot simply select RA, DEC, PMRA and PMDEC and click on re-fit. If we incorrectly do this then the resulting fit implies that the proper motion in RA is -30.1(4) mas/yr and the proper motion in DEC is -506.8(6) mas/yr.

• Run the spectralModel plugin: `tempo2 -gr spectralModel -f J1539-5626.par J1539-5626.tim`

The first screen that appears when the plugin is run is shown above. In the top panel the original post-fit timing residuals are shown in white and an unweighted cubic fit to the residuals shown in red. The cubic fit is used to obtain a smooth model of the low-frequency component of the timing residuals (see algorithm details below). The smooth model (using a default smoothing time of 20 days) is shown as the green dashed line. The middle panel shows, without uncertainties, the differences between the smooth model and the actual timing residuals (known as the "high-frequency residuals"; these are the high-pass filtered residuals). In this case it is clear that the smooth model does a good job of describing the low frequency structures, but there are still some remaining higher-frequency features. The covariance function of these high-frequency residuals are shown in the bottom panel. The user then has the option of changing the smoothing timescale or continuing. For this example the smoothing time was not changed from the default 20 days.

The next plot shows power spectral density plots for the original residuals (white), high frequency residuals (light blue), the smooth low-frequency model interpolated onto a 14d regularly sampled grid (red) and the same interpolated data after first (green) and second (dark blue) pre-whitening and subsequent post-darkening. The white, yellow and orange dotted lines represent slopes of -2, -4 and -6 respectively. The user is asked to select the amount of pre-whitening required. In this case the spectra without pre-whitening and with 1st and 2nd order pre-whitening all align and therefore the user will choose not to use any prewhitening.

The spectra are re-drawn showing only the spectra of the high-frequency residuals and the low-frequency residuals with the selected pre-whitening/post-darkening. The user is then asked to input a simple model of the low-frequency spectra. Currently the form of the model is P(f) = A/(1+(f/fc)^2)^(alpha/2) where the user inputs a corner frequency (fc) and spectral exponent (alpha). A fit is carried out to the first N points in the spectrum (where N is input by the user) to determine the amplitude A. For these data values of fc = 0.2 yr^-1, alpha = 5 and N = 12 gave a reasonable fit.

The plot is redrawn with the model overlaid and the user is allowed to choose different model parameters or to continue with the current model.

As described in the Coles et al. paper and in the details of the algorithms below this model is used to construct the covariance function of the timing residuals. From this covariance function a pre-whitening matrix can be determined which, when applied to the timing residuals, results in normalised, white residuals. In the figure above the orginal timing residuals are shown in the top panel. The central panel contains the pre-whitened, normalised timing residuals obtained using the data covariance function. The bottom panel shows the covariance function of these whitened residuals. If the covariance of the whitened residuals is not pretty flat then the spectral model should be adjusted.

The spectrum of the whitened residuals will also be displayed, but this has not been implemented yet.

Finally the plugin recalculates the spectrum of the original data using only the whitening matrix. This is shown as the yellow line overlaid on the previous spectra. The model is refit to the new spectrum and the output covariance function is written out to disk (default name is covarFunc.dat).

The data covariance function can be included in the fit for the pulsar parameters as follows: `tempo2 -gr plk -f J1539-5626.par J1539-5626.tim -dcf covarFunc.dat`. This produces the plot shown above after fitting for position and proper motion. The output proper motions are PMRA = 17(9)mas/yr and PMDEC = -14(14)mas/yr.

## Data for which the red noise is comparable with the white noise

In such cases it may be possible to estimate the covariance function directly. It is much easier to estimate a covariance function of irregularly spaced data than a power spectrum of such data because each pair of samples provides an estimate of the covariance at one time lag. One can simply bin these estimates at a convenient interval and these intervals need not be equal.

It is often the case for ms pulsars that the covariance estimate obtained in this way consists of a delta function at the origin, plus a smaller component showing an exponential decay towards zero, i.e. C(tau) = A delta(tau) + B exp (-abs(tau/tscale)). In this case we simply need to find A, B, and tscale. Tempo2 will do this and ask the user to confirm that the solution appears reasonable. Finally tempo2 will compute the covariance of the prewhitened data and the user should confirm that the exponential component has disappeared.

### Step-by-step tutorial

The data presented here are of PSR J1713+0474 from the Parkes radio telescope and processed by Joris Verbiest. The "plk" plot shown below shows that the data may not be perfectly white, but there is not a large amount of red noise.

• `tempo2 -gr spectralModel -f 1713.par 1713.tim`

produces the figure shown below which has the same panels as in the example above (the original residuals, a cubic fit and the smoothed curve are shown in the top panel, the HF residuals in the middle panel and the normalised covariance of the HF residuals in the lower panel):

The smoothing time can be modified, but this is not actually used when the covariance function is estimated from the original residuals. Pressing '0' to continue produces the plot below which again is the same as in the first example.

In this figure the power spectra of the original residuals (white) is a good estimate of the power spectrum. Therefore the user inputs '-1' to obtain the covariance function from the data. An exponential fit for C(tau) is carried out and the first local minimum in the chisq is determined and indicated by a red star symbol. This clearly is a poor estimate of the minimum in the covariance function and so the user inputs a value of 120 by hand. The plot is then updated.

Using this model for the covariance function the data are normalised and whitened and plotted (middle panel). The covariance function of the white data is also shown (lower panel).

Finally the data covariance function is written to disk and can be included in a subsequent fit for the pulsar parameters:

` tempo2 -gr plk -f 1713.par 1713.tim -dcf covarFunc.dat `

## Frequently asked questions

• Why does first order prewhitening/postdarkening produce spectra with slopes steeper than -4?
• How do I know if my data are white enough?

## Theory of Linear Least Squares Fitting in Matrix Notation

The linear least squares fit can be developed as follows. The data (column) vector D is modeled as D = M*P + E, where E is the (column) error vector, P is the (column) parameter vector, and M is the model matrix. The error is a random process and we need a statistical description of it. Let E = {Ei}, then the expected value of Ei = 0 for all i.

If <EiEj> = 0 for all i not equal to j the error samples are statistically independent and we refer to the error process as white. If the variance(Ei) is the same for all samples, then the solution is simple. You minimize <EtE> which gives Pest = (Mt M)-1 Mt D. If the probability density function of E is gaussian this is a maximum likelihood estimator as well as least squares solution. One can estimate the covariance matrix of Pest using the variance of the residuals. This must be corrected for the number of parameters estimated. If the data length is nd and the parameter length is np, then σ2 = (nd/(nd-np)) var(D - M Pest). Then the covariance of the parameters is Cov(Pest) = σ2 (Mt M)-1.

If the error process is white but Var(Ei) depends on i, then one can normalize the errors by dividing the data and the model by σi. This can be expressed in terms of the diagonal matrix V which has Var(Ei) on the diagonal. We minimize <Et V-1 E> obtaining Pest = (Mt V-1 M)-1 Mt V-1 D. Here σ2 = ((D-MPest)t V-1 (D-MPest))/(nd-np) and Cov(Pest) = σ2 (Mt V-1 M)-1.

If the error process can be characterized by a covariance matrix C = <E Et> then a formal solution exists. Essentially one creates a linear transformation of the error vector Epw = T E, such that the covariance matrix of Epw is diagonal. This can always be done. The transformation is not necessarily unique. One then fits the transformed model to the transformed data. The result is equivalent to minimizing <Et C-1 E> and yields Pest = (Mt C-1 M)-1 Mt C-1 D. Here σ^2 = ((D-M'Pest)t C-1 (D-MPest))/(nd-np) and Cov(Pest) = σ2 (Mt C-1 M)-1.

The transformation can be done using the Cholesky decomposition of the covariance matrix U Ut = C. We use this transformation and refer to the transformation as prewhitening. However the transformation is somewhat more general than a simple filter. It creates a transformed error vector which is not only white, but unit variance for all samples. This can be written Dpw = U-1 D; Mpw = U-1 M; and Pest = (Mpwt Mpw)-1 Mpwt Dpw.

This is exactly the same as the previous formal solution, but can be implemented with any least squares solver, such as the SVD. The covariance matrix of the estimated parameters is also exactly the same as that of the formal solution.

Here σ2 = ((Dpw-Mpw Pest)t (Dpw-Mpw Pest))/(nd-np) and Cov(Pest) = σ2 (Mpwt Mpw)-1.

One of the attractive features of the Cholesky algorithm is that one can test that the assumed covariance matrix C is adequate by checking that the prewhitened data Dpw are actually white. It is much easier to confirm that a data vector is white, than it is to estimate its power spectrum. If the spectrum is really white, then the least squares spectral estimate is unbiassed, i.e. one need not worry about spectral leakage.

## Spectrum Analysis of Irregularly Sampled Steep Red Time Series with Variable Noise

If a time series is more or less white and regularly sampled its power spectral density can be estimated using the periodogram. Assuming that there are n points distributed over Tobs then the spectral density estimator is

Pest(fk) = |DFT(x(ti)|^2 Tobs/n^2

where fk = k/Tobs for k = 0, ... , n

This estimate is related to the true spectral density P(f) by a convolution with a window function W(f), i.e. P(fk) = P(f) * W(f). Normally one needs maximum spectral resolution and the time series is not weighted, so W(f) = sinc^2(fTobs). The zeros of W(f) are spaced by 1/Tobs, so the estimates of P(fk) are independent. If the time series is gaussian then the spectral estimates will each be chi-squared with two degrees of freedom, i.e. an exponential distribution. In most cases this will be true anyway because the central limit theorem will drive the spectral estimates in this direction.

If the time series is not regularly sampled one is led to consider the problem as a matter of least squares fitting of sine and cosine components to the data. This leads to the widely-used Lomb Scargle periodogram which provides the same statistics for nearly white time series by choosing the phase of the sine and cosine correctly. In fact the Lomb Scargle periodogram is not necessary, one can simply fit a constant, a sine and a cosine component of each frequency estimated separately. The sum of the sine^2 and cosine^2 will have an exponential distribution regardless of the phase chosen. However one cannot fit all the sine and cosine components simultaneously because they are not statistically independent when the sampling is irregular. This is more computationally convenient since a standard LSQ fitter can be used. This approach leads one to consider a weighted fit when the errors on the data are not constant. In fact this is the optimal approach under such conditions, and it is true that the distribution of the spectral estimates remains exponential with a weighted fit. Indeed a weighted fit should be used when the errors are variable even if the sampling is regular. It will provide a better signal to noise ratio in the spectrum, but it will make the spectral estimates partially dependent. In practice one should replace the Lomb Scargle analysis with a weighted least squares analysis.

However, if the time series contains power at frequencies besides fk the spectral window will cause these to leak into adjacent frequencies. This is shown in Figure 1. Here there are two 256-point regularly sampled time series in the two left panels. The blue one is exactly a harmonic of Tobs and the red one is not. The blue spectra exactly recover the original power spectral density, but the red ones show the window function. At low frequencies the effect is particularly obvious on a log-log plot where the sidelobes of W(f) fall as f^-2. One will also see that the red spectra are smoothed. In fact the spectral estimates should be chi-squared random variables with two degrees of freedom, but the red ones are dominated by leakage from the peak point and they don't show any random variation until they fall to the white noise level.

If the spectral density is a power-law with an exponent flatter than -2, the low frequency leakage will not be a problem, but if the exponent is steeper than -2, the spectral estimate will be dominated by leakage. This is shown in Figure 2. Here the upper left plot displays a random process with a spectral exponent of -1 and the lower left shows one of -3. The blue spectra on the right side are the corresponding periodograms. The red spectra are prewhitened and postdarkened spectra. On the upper right both are the same, this series does not need prewhitening, but on the lower right only the red prewhitened spectrum is correct.

Here the prewhitening filter was a first difference, i.e. y(t(k)) = x(t(k)) - x(t(k-1)). The corresponding postdarkening filter is H(f) = 1/2 sin(pi f/n) and the DFT of y(t) must be multiplied by |H(f)|^2. This simple process works well and is widely used. If the spectral exponent is steeper than -4 then one can use the difference filter twice and correct with |H(f)|^4. In pulsar timing analysis is that the data are seldom regularly sampled. Furthermore the data are not accompanied by stationary white noise, but by sampling noise with variable standard deviation. So it is not clear how to create an optimal prewhitening filter like the first difference filter used for regularly sampled data. Fortunately one can solve this problem using the Cholesky prewhitening transformation. This is not a filter, but it can be applied to both the data and the sine and cosine components, so one can obtain a prewhitened spectral estimate.

The spectrum analysis of pulsar timing data is further complicated by the fact that a best-fit quadratic is always removed from the timing observations because the period and its rate of change are always unknown a priori. This significantly distorts the first few frequency components in the power spectrum. As a result there is a small uncertainty in the lowest frequency components of the covariance from which the prewhitening transformation is derived. The Cholesky spectral analysis has been tested by simulating a power-law spectrum with additive white noise, subtracting a quadratic, and averaging 100 estimates of the power spectrum for comparison with the known spectral densities of the inputs. The first case tested has regular sampling and equal errors. An example of the input after quadratic removal, and the same input after prewhitening is shown in Figure 3. The mean spectral estimate is shown in Figure 4. Here the least squares estimate is in red and the Cholesky estimate in blue. The theoretical spectrum added to the actual spectrum of the white noise is shown in green. It is clear that the Cholesky method is much better but it does overestimate the lowest frequency point by a factor of two. It shows very little leakage into higher frequencies. The overestimate of the lowest frequencies is related to the imperfect compensation for quadratic removal. If, instead of removing a quadratic, we remove a cubic, the blue spectral estimate exactly matches the green theoretical spectrum. The covariance matrix of the first 50 spectral estimates is shown in Figure 5. One can see that the first two are slightly correlated, due to the quadratic removal, but the rest are uncorrelated.

Exactly the same simulation was repeated with irregular sampling. The sampling was chosen to be exactly the same as that of the PPTA pulsar J1713+0747. The results are shown in Figures 6, 7 and 8 in the same format as Figures 3, 4, and 5. Perhaps the most interesting result is that the red least squares spectrum in Figure 7 has much higher far-sidelobes. These sidelobes do not average out with the simulations because they are due to the sampling which stays the same. Evidently the Cholesky prewhitening flattens the spectrum quite well, but the entire spectrum is 20 or 30% higher than expected.

The same simulation was then repeated with regular sampling but variable errors. The errors were chosen to be exactly those of the PPTA pulsar J1713+0747. The results, which are shown in Figures 9, 10, and 11, are quite similar to the effects of irregular sampling. The irregular sampling and the variable errors create pseudo-random window sidelobes (which interferometrists would call a "dirty beam").

Finally the simulation was repeated with both irregular sampling and variable errors taken from J1713+0747. The results are shown in Figure 12, 13, and 14. The spectral bias seems to be the sum of the two effects seen separately. The net effect is that the spectral estimate is about twice as high as it should be throughout. The spectral estimates also show a little more cross correlation than in the previous cases. Clearly both irregular sampling and variable errors do stress the spectral estimation algorithm. However the red spectrum we have tested is very steep and the power spectra have 6 decades of dynamic range. We tested the same case, but with the red noise 100 times weaker. This did not change the "dirty beam" at all, it simply lowered the red component but the spectral estimate remained about a factor of two above the expected value. Evidently this bias is more due to the effective spectral window than the amplitude of the red noise.

Finally we have estimated the average power spectrum of the prewhitened residuals for the final case with irregular sampling and variable errors. The result is shown in Figure 15. The spectrum is quite flat, but it is depressed about 25% in the range where the red noise is dominant. This suggests that we might be overcorrecting slightly. More experimentation is called for here.

## Publications using/discussing this technique

Page last modified on August 20, 2010, at 04:41 PM