A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics, Spring 2013

Lecture 26: Localization/Matched Filtering (continued); Prewhitening

Lectures next week: bases, principal component analysis, wavelets, Bayesian blocks.
Reading: article on Bayesian blocks (Scargle); article on wavelets.

ROC Curves: A plot of $P_d$ vs. $P_{fa}$ is called a receiver operating characteristic curve, named after radar detection schemes. Figure 5: ROC plot for a pulse with SNR of 5 (peak to rms noise) and width of 10.3 samples; matched filtering is used.

Localization Using Matched Filtering

This handout describes localization of an object in a parameter space. For simplicity we consider localization of a pulse in time. The same formalism applies to localization of a spectral feature in frequency or of a feature in a 2D image. The results can be extrapolated to a space of arbitrary dimensionality.

I. First consider finding the amplitude of a pulse when the shape and location are known. Let the data be
$$I(t) = a A(t) + n(t),$$
where $a$ is the unknown amplitude and $n(t)$ is zero-mean noise. The known pulse shape is $A(t)$. Let the model be $\hat I(t) = \hat a A(t)$. Define the cost function to be the integrated squared error,
$$Q = \int dt\, \left[ I(t) - \hat I(t) \right]^2.$$

Taking a derivative, we can solve for the estimate of the amplitude, $\hat a$:
$$\frac{\partial Q}{\partial \hat a} = -2 \int dt\, \left[ I(t) - \hat I(t) \right] \frac{\partial \hat I(t)}{\partial \hat a} = 0$$
$$\Rightarrow \int dt\, \left[ I(t) - \hat I(t) \right] A(t) = 0$$
$$\Rightarrow \hat a \int dt\, A^2(t) = \int dt\, I(t) A(t)$$
$$\Rightarrow \hat a = \frac{\int dt\, I(t) A(t)}{\int dt\, A^2(t)}.$$
Note that:
a. The model is linear in the sole parameter, $\hat a$.
b. The numerator is the zero lag of the cross-correlation function (CCF) of $I(t)$ and $A(t)$.
c. The denominator is the zero lag of the autocorrelation function (ACF) of $A(t)$.
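As a quick numerical illustration (not in the original notes), here is a minimal sketch of this amplitude estimator; the Gaussian pulse shape, noise level, and variable names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

# Time grid and a known pulse template A(t): an assumed Gaussian shape
t = np.linspace(-5.0, 5.0, 1001)
A = np.exp(-t**2 / (2 * 0.5**2))

# Simulated data I(t) = a*A(t) + n(t) with zero-mean white noise
a_true = 3.0
I = a_true * A + rng.normal(0.0, 0.2, t.size)

# Least-squares amplitude estimate a_hat = int(I*A) / int(A^2);
# the sample interval dt cancels between numerator and denominator
a_hat = np.sum(I * A) / np.sum(A**2)
print(f"true a = {a_true}, estimated a = {a_hat:.3f}")
```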

II. Now consider the case where we don't know the location of the pulse in time (the time of arrival, TOA), and it is the TOA we wish to estimate. We still know the pulse shape a priori. Let the data, model, and cost function be
$$I(t) = a A(t - t_0) + n(t),$$
$$\hat I(t) = \hat a A(t - \hat t_0),$$
$$Q = \int dt\, \left[ I(t) - \hat I(t) \right]^2.$$
Note that the model is linear in $\hat a$ but nonlinear in $\hat t_0$.

Minimizing $Q$ with respect to $\hat a$, we have
$$\frac{\partial Q}{\partial \hat a} = -2 \int dt\, \left[ I(t) - \hat I(t) \right] \frac{\partial \hat I(t)}{\partial \hat a} = 0$$
$$\Rightarrow \int dt\, \left[ I(t) - \hat I(t) \right] A(t - \hat t_0) = 0$$
$$\Rightarrow \hat a \int dt\, A^2(t - \hat t_0) = \int dt\, I(t) A(t - \hat t_0) \qquad (4)$$
$$\Rightarrow \hat a = \frac{\int dt\, I(t) A(t - \hat t_0)}{\int dt\, A^2(t - \hat t_0)}.$$
This last equation has the same form as in I except that the estimate for the arrival time, $\hat t_0$, is involved.

Now, minimizing $Q$ with respect to $\hat t_0$, we have
$$\frac{\partial Q}{\partial \hat t_0} = -2 \int dt\, \left[ I(t) - \hat I(t) \right] \frac{\partial \hat I(t)}{\partial \hat t_0} = 0$$
$$\Rightarrow \int dt\, \left[ I(t) - \hat a A(t - \hat t_0) \right] A'(t - \hat t_0) = 0$$
$$\Rightarrow \hat a \int dt\, A(t - \hat t_0) A'(t - \hat t_0) = \int dt\, I(t) A'(t - \hat t_0). \qquad (5)$$

Grid Search: One approach to finding the arrival time is to search over a 2D grid of $(\hat a, \hat t_0)$ for the values that satisfy equations 4 and 5. This approach is inefficient. Instead, one can search over a 1D space for the single nonlinear parameter, $\hat t_0$, and then solve for $\hat a$ using either equation 4 or 5.

Linearization + Iteration: Another method for finding solutions for $\hat a$ and $\hat t_0$ is to linearize the equations in $\hat t_0 - t_0$ by using Taylor-series expansions for $A(t - \hat t_0)$ and $A'(t - \hat t_0)$. (This isn't a particularly good way to proceed, so we will skip it; pp. 21-24.) Let $\hat t_0 = t_0 + \delta \hat t_0$. Then, to first order in $\delta \hat t_0$:
$$A(t - \hat t_0) \approx A(t - t_0) - A'(t - t_0)\, \delta \hat t_0$$
$$A'(t - \hat t_0) \approx A'(t - t_0) - A''(t - t_0)\, \delta \hat t_0$$
$$A^2(t - \hat t_0) \approx A^2(t - t_0) - 2 A'(t - t_0) A(t - t_0)\, \delta \hat t_0.$$

Now equations (4) and (5) become
$$\hat a \int dt\, \left[ A^2(t - t_0) - 2 \delta \hat t_0\, A(t - t_0) A'(t - t_0) \right] = \int dt\, I(t) \left[ A(t - t_0) - \delta \hat t_0\, A'(t - t_0) \right]$$
$$\hat a \int dt\, \left[ A(t - t_0) A'(t - t_0) - \delta \hat t_0\, A(t - t_0) A''(t - t_0) - \delta \hat t_0\, A'^2(t - t_0) \right] = \int dt\, I(t) \left[ A'(t - t_0) - \delta \hat t_0\, A''(t - t_0) \right].$$
Consider the integral $\int dt\, A(t - t_0) A'(t - t_0)$. The integrand may be written as
$$A(t - t_0) A'(t - t_0) = \frac{1}{2} \frac{d}{dt} A^2(t - t_0),$$
and so the integral equals
$$\frac{1}{2} A^2(t - t_0) \Big|_{t_1}^{t_2} \to 0$$
in the limit of (e.g.) $t_{1,2} = \mp T/2$ with $T \gg$ pulse width.

We then have
$$\hat a \int dt\, A^2(t - t_0) = \int dt\, I(t) \left[ A(t - t_0) - \delta \hat t_0\, A'(t - t_0) \right]$$
$$-\hat a\, \delta \hat t_0 \int dt\, \left[ A(t - t_0) A''(t - t_0) + A'^2(t - t_0) \right] = \int dt\, I(t) \left[ A'(t - t_0) - \delta \hat t_0\, A''(t - t_0) \right].$$
Solving for $\hat a$ in both cases we have
$$\hat a = \frac{\int dt\, \left[ I(t) A(t - t_0) - \delta \hat t_0\, I(t) A'(t - t_0) \right]}{\int dt\, A^2(t - t_0)} \qquad (6)$$
$$\hat a = -\frac{\int dt\, I(t) \left[ A'(t - t_0) - \delta \hat t_0\, A''(t - t_0) \right]}{\delta \hat t_0 \int dt\, \left[ A(t - t_0) A''(t - t_0) + A'^2(t - t_0) \right]}. \qquad (7)$$

Using the notation
$$i_0 \equiv \int dt\, I(t) A(t - t_0), \qquad i_1 \equiv \int dt\, I(t) A'(t - t_0), \qquad i_2 \equiv \int dt\, A^2(t - t_0),$$
$$i_3 \equiv \int dt\, I(t) A''(t - t_0), \qquad i_4 \equiv \int dt\, \left[ A(t - t_0) A''(t - t_0) + A'^2(t - t_0) \right], \qquad (8)$$
we have
$$\hat a = \frac{i_0 - \delta \hat t_0\, i_1}{i_2}, \qquad \hat a = -\frac{i_1 - \delta \hat t_0\, i_3}{\delta \hat t_0\, i_4}.$$
Solving for $\delta \hat t_0$ (to first order) we have
$$\delta \hat t_0 = \frac{i_1 i_2}{i_2 i_3 - i_0 i_4}.$$

Iterative Solution for $\hat t_0$

This equation can be solved iteratively for $\delta \hat t_0$:
0. Choose a starting value for $\hat t_0$.
1. Calculate $\delta \hat t_0$ using the linearized equations.
2. Is $\delta \hat t_0 = 0$?
3a. If yes, stop.
3b. If no, update $\hat t_0 \to \hat t_0 + \delta \hat t_0$ and go back to step 1.
At the best-fit value of $\hat t_0$ the change is zero, $\delta \hat t_0 = 0$ (top of the hill), and $\hat a$ can be calculated using equation 6 or 7.

Correlation Function Approach

The iterative solution for $\hat t_0$ is similar to the following procedure, which uses a cross-correlation approach more directly:
1. Cross-correlate the template $A(t)$ with $I(t)$ to get a CCF.
2. Find the lag of peak correlation as an estimate of the arrival time, $\hat t_0 = \tau_{\max}$.
3. Calculate $\hat a$ if needed.
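A minimal sketch of this cross-correlation procedure (illustrative only; the Gaussian template, noise level, and use of numpy.correlate are assumptions, not part of the original notes):

```python
import numpy as np

rng = np.random.default_rng(1)

# Sampled data I(t) = a*A(t - t0) + n(t) with an assumed Gaussian pulse shape
t = np.arange(0.0, 20.0, 0.01)
dt = t[1] - t[0]

def pulse(x):
    return np.exp(-x**2 / (2 * 0.3**2))

t0_true, a_true = 7.34, 2.0
A = pulse(t - t.mean())                      # template centered in the window
I = a_true * pulse(t - t0_true) + rng.normal(0.0, 0.1, t.size)

# Step 1: cross-correlate the template with the data (CCF over all lags)
ccf = np.correlate(I, A, mode="full")
lags = (np.arange(ccf.size) - (A.size - 1)) * dt

# Step 2: lag of peak correlation -> arrival-time estimate (nearest sample)
tau_max = lags[np.argmax(ccf)]
t0_hat = t.mean() + tau_max
print(f"true t0 = {t0_true}, CCF estimate = {t0_hat:.3f}")

# Step 3: amplitude at the estimated lag, a_hat = sum(I*A_shift)/sum(A_shift^2)
A_shift = pulse(t - t0_hat)
a_hat = np.sum(I * A_shift) / np.sum(A_shift**2)
print(f"true a = {a_true}, estimated a = {a_hat:.3f}")
```

Note that the peak lag found this way is quantized at the sample interval, which motivates the interpolation subtleties discussed next.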

Subtleties of the Cross Correlation Method

The CCF is calculated using sampled data and is therefore itself a discrete quantity. Often one wants greater precision on the arrival time than is given by the sample interval; i.e., we want a floating-point number for $\hat t_0$, not an integer index. Therefore we want to calculate the peak of the CCF by interpolating near its peak. The interpolation should be done properly, using the appropriate interpolation formula for sampled data (the sinc function). Parabolic interpolation yields excessive errors in the arrival time. In practice, the proper interpolation is effectively done in the frequency domain by calculating the phase shift of the Fourier transform of the CCF, which is the product of the Fourier transform of the template and the Fourier transform of the data.

Arrival Time Errors

Here we wish to localize the occurrence of a function $A(t)$. We will consider this to be a pulse whose arrival time $t_0$ we want to estimate, along with its expected error. Let the signal be
$$I(t) = a_0 A(t - t_0) + n(t),$$
where $a_0$ is the amplitude and $n(t)$ is zero-mean noise. We will find the time of arrival (TOA) by cross-correlating the presumed-known pulse shape $A(t)$ with the signal:
$$C_{AI}(\tau) = \int dt\, I(t) A(t - \tau).$$
First, assume that the signal has been coarsely shifted so that the template is already aligned with the signal and that the template is centered on $t = 0$. This way we can assume that the arrival time estimate is a small correction to the coarse estimate.

We can expand the signal as
$$I(t) \approx a_0 A(t) - a_0 t_0 A'(t) + n(t).$$
We also expand the template, but to second order in $\tau$ because we will be taking a derivative to find the lag of maximum correlation:
$$A(t - \tau) \approx A(t) - \tau A'(t) + \frac{\tau^2}{2} A''(t), \qquad A'(t - \tau) \approx A'(t) - \tau A''(t).$$
Then we can write
$$C'_{AI}(\tau) = \frac{d}{d\tau} C_{AI}(\tau) = -\int dt\, I(t) A'(t - \tau) \approx -\int dt\, I(t) \left[ A'(t) - \tau A''(t) \right] = 0.$$
An estimator for $\tau$ is then
$$\tau = \frac{\int dt\, I(t) A'(t)}{\int dt\, I(t) A''(t)} = \frac{a_0 C_{AA'}(0) - a_0 t_0 C_{A'A'}(0) + C_{nA'}(0)}{a_0 C_{AA''}(0) - a_0 t_0 C_{A'A''}(0) + C_{nA''}(0)}.$$
Using the previous approximations, we encounter the terms $C_{AA'}(0)$ and $C_{A'A''}(0)$, which vanish for pulses that are zero at $\pm\infty$. Also, the $C_{nA''}$ term in the denominator yields a second-order term that can be ignored. Then the TOA estimator becomes
$$\tau = \frac{-a_0 t_0 C_{A'A'}(0) + C_{nA'}(0)}{a_0 C_{AA''}(0)}.$$

Noiseless case: When there is no noise we have
$$\tau = \frac{-t_0\, C_{A'A'}(0)}{C_{AA''}(0)}.$$
Using a trial function, such as a Gaussian shape, it can be shown that $\tau = t_0$, as we would expect! Even better, the denominator can be integrated by parts to show that $C_{AA''}(0) = -C_{A'A'}(0)$ for pulses that vanish at $\pm\infty$, so the equality is general.

With noise: We can now write
$$\tau = t_0 + \frac{C_{nA'}(0)}{a_0 C_{AA''}(0)}.$$

Then the mean-square TOA error, using $\delta\tau = \tau - t_0$, is
$$\left\langle (\delta\tau)^2 \right\rangle = \frac{\left\langle C_{nA'}^2(0) \right\rangle}{a_0^2\, C_{AA''}^2(0)} = \frac{\left\langle C_{nA'}^2(0) \right\rangle}{a_0^2\, C_{A'A'}^2(0)} = \frac{\int\!\!\int dt\, dt'\, \left\langle n(t)\, n(t') \right\rangle A'(t) A'(t')}{a_0^2\, C_{A'A'}^2(0)}.$$

Now assume white noise (for specificity), so that $\langle n(t)\, n(t') \rangle = \sigma_n^2 w_n\, \delta(t - t')$, where $w_n$ is a short characteristic time scale (such as an inverse bandwidth) to keep the units correct. Then
$$\left\langle (\delta\tau)^2 \right\rangle = \frac{\sigma_n^2 w_n \int dt\, [A'(t)]^2}{a_0^2\, C_{A'A'}^2(0)} = \frac{\sigma_n^2 w_n}{a_0^2\, C_{A'A'}(0)}.$$
We can then write this out as a TOA error,
$$\sigma_\tau = \frac{\sigma_n}{a_0} \left[ \frac{w_n}{C_{A'A'}(0)} \right]^{1/2} = \frac{\sigma_n}{a_0} \left[ \frac{w_n}{\int dt\, [A'(t)]^2} \right]^{1/2} = \frac{1}{\mathrm{SNR}} \left[ \frac{w_n}{\int dt\, [A'(t)]^2} \right]^{1/2}.$$
We see that the error scales as the inverse of the signal-to-noise ratio (SNR). The denominator also involves the integral of the squared derivative of the pulse shape, indicating that sharper pulses with larger derivatives produce smaller arrival-time errors.
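A quick Monte Carlo check of this error formula (a sketch, not from the notes; the Gaussian pulse, search grid, and the identification $w_n = \Delta t$ for per-sample white noise are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Gaussian pulse template on a fine grid (assumed shape and parameters)
t = np.arange(-5.0, 5.0, 0.01)
dt = t[1] - t[0]
w, a0, sigma_n = 0.5, 1.0, 0.05
A = np.exp(-t**2 / (2 * w**2))

# Bank of shifted templates for a fine grid search around the true TOA (t0 = 0)
shifts = np.arange(-0.05, 0.05, 0.0005)
bank = np.exp(-(t[None, :] - shifts[:, None])**2 / (2 * w**2))

# Monte Carlo: TOA estimate = shift maximizing the cross-correlation
est = []
for _ in range(300):
    data = a0 * A + rng.normal(0.0, sigma_n, t.size)
    est.append(shifts[np.argmax(bank @ data)])
print("Monte Carlo sigma_tau:", np.std(est))

# Analytic prediction: sigma_tau = (sigma_n/a0) * sqrt(w_n / int [A'(t)]^2 dt),
# taking w_n = dt for noise generated independently at each sample
Ap = np.gradient(A, dt)
integral = np.sum(Ap**2) * dt
print("predicted sigma_tau:  ", (sigma_n / a0) * np.sqrt(dt / integral))
```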

Localization: Time vs. Frequency Domains

For a Nyquist-sampled, bandlimited process with bandwidth $B$, the sampling theorem implies that the continuous-time signal can be reconstructed from the sampled data as
$$x(t) = \sum_n x_n\, \mathrm{sinc}\!\left( \frac{t - n\,\Delta t}{\Delta t} \right),$$
where $\Delta t = 1/B$ and, as usual, $\mathrm{sinc}\, x \equiv \sin(\pi x)/\pi x$.

Consider a model appropriate for matched filtering,
$$x(t) = a A(t - t_0) + n(t),$$
where $a$ is the amplitude, $A(t)$ is the known template, and $n$ is noise. Suppose we have sampled versions of the data and template, $x_n$ and $A_n$. One approach is to calculate the discrete CCF,
$$C_{xA}(\ell) = \frac{1}{N} \sum_n x_n A_{n - \ell},$$
and find its maximum to determine an estimate $\hat t_0 = \ell_{\max}\, \Delta t$. However, we really want the CCF of the continuous-time quantities $x(t)$ and $A(t)$,
$$C_{xA}(\tau) = \int dt\, x(t) A(t - \tau).$$

We cannot get the true lag of maximum correlation, $\tau_{\max}$, by interpolating the sampled correlation function unless we interpolate according to the sampling theorem. If we interpolate differently, we will get biased results.

Another approach: the frequency domain

Take the FT of the model equation to get
$$\tilde X(f) = a \tilde A(f)\, e^{-2\pi i f t_0} + \tilde n(f).$$
No noise: Write the FT in terms of its real and imaginary parts and find the phase (see example):
$$\tilde X(f) = \tilde X_r(f) + i \tilde X_i(f) = |\tilde X(f)|\, e^{i\phi(f)}$$
$$\phi(f) = \tan^{-1} \frac{\tilde X_i(f)}{\tilde X_r(f)} = \tan^{-1} \frac{-\sin 2\pi f t_0}{\cos 2\pi f t_0} = -2\pi f t_0.$$
With noise: The phase will have a noise-like error that is nonlinearly related to $\tilde n(f)$. In the limit of large SNR, the rms phase error will scale as $1/\mathrm{SNR}$. Working directly with the phase to determine $t_0$, however, is numerically problematic because the phase will wrap for large offsets and for low SNR.
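A small numerical illustration of the phase relation $\phi(f) = -2\pi f t_0$ (a sketch; the Gaussian pulse and the low-frequency fitting band are illustrative assumptions):

```python
import numpy as np

# Sampled Gaussian pulse delayed by t0 relative to a centered template (no noise)
N, dt, t0 = 4096, 0.01, 0.37
t = np.arange(N) * dt
A = np.exp(-(t - t.mean())**2 / (2 * 0.2**2))
x = np.exp(-(t - t.mean() - t0)**2 / (2 * 0.2**2))

# The phase of X(f)/A~(f) isolates exp(-2*pi*i*f*t0)
f = np.fft.rfftfreq(N, dt)
mask = f < 0.5                      # band where the pulse has most of its power
ratio = np.fft.rfft(x)[mask] / np.fft.rfft(A)[mask]
phase = np.unwrap(np.angle(ratio))

# Fit the phase slope: phi(f) = -2*pi*f*t0
slope = np.polyfit(f[mask], phase, 1)[0]
print(f"t0 from phase slope: {-slope / (2 * np.pi):.4f} (true {t0})")
```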

Best approach: Fit to the complex FFT rather than to the phase. Use as a model
$$\tilde M(f) = \tilde A(f)\, e^{-2\pi i f \hat t_0}.$$
Then define the product
$$J(f) = \tilde X(f)\, \tilde M^*(f) = a |\tilde A(f)|^2 e^{-2\pi i f (t_0 - \hat t_0)} + \tilde n(f)\, \tilde A^*(f)\, e^{2\pi i f \hat t_0}$$
and integrate over frequency:
$$S \equiv \int df\, J(f) = a \int df\, |\tilde A(f)|^2 e^{-2\pi i f (t_0 - \hat t_0)} + \int df\, \tilde n(f)\, \tilde A^*(f)\, e^{2\pi i f \hat t_0}.$$
This quantity can be maximized vs. $\hat t_0$. For no noise, we expect $\hat t_0 = t_0$. The maximum can be found using standard search methods over the nonlinear parameter $\hat t_0$ (grid search, linearization, etc.). Note that the integrand is naturally weighted by the actual signal. An equivalent approach is to use a different test statistic,
$$S' = \int df\, \left| \tilde X(f) - \tilde M(f) \right|^2,$$
which we would minimize to find $\hat t_0$.

Note that we have used continuous notation here. With sampled data we can reconstruct the continuous FT as
$$\tilde X(f) = \Pi(f/B) \sum_n x_n\, e^{-2\pi i f n \Delta t},$$
which can be implemented using the DFT.
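A minimal sketch of this frequency-domain fit (illustrative; the Gaussian template, noise level, and the use of the real part of $S$ as the statistic to maximize are assumptions, not part of the original notes):

```python
import numpy as np

rng = np.random.default_rng(2)

# Sampled data x(t) = a0*A(t - t0) + n(t); A is an assumed Gaussian template
N, dt = 2048, 0.01
t = np.arange(N) * dt
tc = t.mean()
t0_true, a0 = 0.1234, 1.0                # pulse offset from the template center
A = np.exp(-(t - tc)**2 / (2 * 0.2**2))
x = a0 * np.exp(-(t - tc - t0_true)**2 / (2 * 0.2**2)) + rng.normal(0, 0.05, N)

# S(t0_hat) = sum_f X(f) M*(f) with M(f) = A~(f) exp(-2*pi*i*f*t0_hat)
f = np.fft.rfftfreq(N, dt)
X, Atil = np.fft.rfft(x), np.fft.rfft(A)

def S(t0_hat):
    M = Atil * np.exp(-2j * np.pi * f * t0_hat)
    return np.sum(X * np.conj(M)).real   # maximize the real part

# Fine grid search over the single nonlinear parameter t0_hat
grid = np.linspace(-0.5, 0.5, 20001)
t0_hat = grid[np.argmax([S(g) for g in grid])]
print(f"true t0 = {t0_true}, estimated t0 = {t0_hat:.4f}")
```

Because $S$ can be evaluated at any trial $\hat t_0$, not just at multiples of the sample interval, the arrival time comes out directly as a floating-point number.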

Prewhitening

What is Prewhitening? Prewhitening is an operation that processes a time series (or some other data sequence) to make it behave statistically like white noise. The "pre" means that whitening precedes some other analysis that likely works better if the additive noise is white. These operations can be viewed in either the time domain or the frequency domain:
1. Make the ACF of the time series appear more like a delta function.
2. Make the spectrum appear flat.
Example data sets that may require prewhitening:
1. A well-behaved noise process with an additive low-frequency (or polynomial) trend.
2. A deterministic signal with an additive red-noise process.
Viewed in the frequency domain, prewhitening means that the dynamic range of the measured data's spectrum is reduced.

Why bother? Recall from our discussions of spectral analysis the issues of leakage and bias, which arise from sidelobes inherent to spectral estimation. We can minimize leakage in two ways: (1) make the sidelobes smaller and (2) minimize the power that is prone to leaking into sidelobes. Spectral windows address the former; prewhitening mitigates the latter.

Leakage into sidelobes also constitutes bias in spectral estimates, but bias appears in other data analysis procedures as well. Consider least-squares fitting of a sinusoid to a signal of the form
$$x(t) = A \cos(\omega t + \phi) + r(t) + n(t),$$
where $n(t)$ is WSS white noise and $r(t)$ is red noise with a steep power spectrum. Red noise can strongly bias the fitting of a model $\hat x(t) = \hat A \cos(\hat\omega t + \hat\phi)$ because its power can leak across the underlying spectrum, causing a least-squares fit to give highly discrepant values of $\hat A$, $\hat\omega$, and $\hat\phi$. Prewhitening of the time series would ideally yield a transformed time series of the form
$$x'(t) = A \cos(\omega t + \phi) + n'(t),$$
to which fitting a sinusoidal model will be less biased.

Procedures: We have already seen one analysis that is related to prewhitening: the matched filter (MF). The MF doesn't whiten the spectrum of the output, but it does weight the frequency components of the measured quantity to maximize the S/N of the signal. The signal model in this case is $x(t) = a_0 A(t) + n(t)$. Recall that for an arbitrary spectrum $S_n(f)$ of the additive noise, the frequency-domain MF for a signal $A(t)$ is
$$\tilde h(f) \propto \frac{\tilde A(f)}{S_n(f)}.$$
Taking equality for simplicity, when the filter is applied to the measurements $x(t)$ we have
$$\tilde y(f) = \tilde x(f)\, \tilde h^*(f) = \frac{a_0 |\tilde A(f)|^2}{S_n(f)} + \frac{\tilde n(f)\, \tilde A^*(f)}{S_n(f)}.$$
This means that the ensemble-average spectrum of the filter output is
$$\left\langle |\tilde y(f)|^2 \right\rangle = \frac{a_0^2 |\tilde A(f)|^4}{S_n^2(f)} + \frac{\left\langle |\tilde n(f)|^2 \right\rangle |\tilde A(f)|^2}{S_n^2(f)} = \frac{a_0^2 |\tilde A(f)|^4}{S_n^2(f)} + \frac{|\tilde A(f)|^2}{S_n(f)} = \frac{|\tilde A(f)|^2}{S_n(f)} \left[ \frac{a_0^2 |\tilde A(f)|^2}{S_n(f)} + 1 \right].$$
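A sketch of this frequency-domain matched filter (illustrative; the Gaussian template, the assumed red-noise spectrum $S_n(f) \propto [1 + (f/f_c)^2]^{-1}$, and the normalizations are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

# Data x(t) = a0*A(t - t0) + n(t) with red noise of known spectrum S_n(f)
N, dt = 4096, 0.01
t = np.arange(N) * dt
tc, t0, a0 = t.mean(), 0.5, 0.5
A = np.exp(-(t - tc)**2 / (2 * 0.1**2))          # template centered at tc
f = np.fft.rfftfreq(N, dt)
Sn = 1.0 / (1.0 + (f / 0.5)**2)                  # assumed red-noise spectrum

# Red noise generated by shaping white Gaussian noise with sqrt(Sn)
n = np.fft.irfft(np.fft.rfft(rng.normal(0.0, 1.0, N)) * np.sqrt(Sn), N)
x = a0 * np.exp(-(t - tc - t0)**2 / (2 * 0.1**2)) + n

# Frequency-domain matched filter h~(f) = A~(f)/S_n(f); output y~ = x~ h~*
h = np.fft.rfft(A) / Sn
y = np.fft.irfft(np.fft.rfft(x) * np.conj(h), N)
print("MF output peaks at lag", t[np.argmax(y)], "(true pulse offset:", t0, ")")
```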

Signals with trends: A common situation is where a quantity of the form $a_0 A(t) + n(t)$ is superposed with a strong trend, such as a baseline variation. Similar issues arise in measurements of spectra. Consequences of trends include:
1. Bias in estimating parameters of $A(t - t_0)$ or its spectral analog $A(\nu - \nu_0)$.
2. Erroneous estimates of cross-correlations between two time series such as $x(t) = s_1(t) + n_1(t)$ and $y(t) = s_2(t) + n_2(t)$, where $s_{1,2}$ are signals of interest and $n_{1,2}$ are measurement errors. That is, we may be interested in the correlation
$$C = \frac{1}{N_t} \sum_t s_1(t)\, s_2(t) \qquad \text{or} \qquad C = \frac{1}{N_t} \sum_t \left[ s_1(t) - \bar s_1 \right] \left[ s_2(t) - \bar s_2 \right],$$
where $\bar s_{1,2} = (1/N_t) \sum_t s_{1,2}(t)$ are the sample means. If there are trends $p_{1,2}(t)$ added to $x(t)$ and $y(t)$, the correlation $\hat C$ of $x$ and $y$ used to estimate $C$ may be dominated completely by the trends and not by the signal parts of the measurements.

A fix: Trends can often be modeled as a polynomial of some order that can be fitted to the measurements, as in the sketch below. The order of the polynomial needs to be chosen wisely. For a pulse or spectral line confined to some range of $t$ or $\nu$ this is straightforward. But for a detection problem where the signal location is not known, the situation is very tricky.
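A minimal sketch of polynomial detrending (the cubic trend, the pulse, and the chosen fit order are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)

# Pulse + white noise superposed on a strong polynomial baseline trend
t = np.linspace(0.0, 10.0, 1000)
pulse = np.exp(-(t - 6.0)**2 / (2 * 0.2**2))
trend = 0.5 + 0.8 * t - 0.15 * t**2 + 0.01 * t**3
x = pulse + trend + rng.normal(0.0, 0.1, t.size)

# Fit and remove a cubic trend; the order must be chosen wisely, since
# too high an order starts absorbing the pulse itself
coeffs = np.polyfit(t, x, deg=3)
x_detrended = x - np.polyval(coeffs, t)
print("residual rms in a pulse-free region:", np.std(x_detrended[:400]))
```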

Prewhitening filter: Consider again $x(t) = a_0 A(t) + n(t)$, and let's trivially construct a frequency-domain filter that whitens the measurements. We want a filter $h(t)$ that flattens the noise $n(t)$ in the frequency domain. Let $y(t) = x(t) * h(t)$, where $*$ denotes convolution. All we need is
$$\tilde h(f) = S_n^{-1/2}(f).$$
Then the ensemble spectrum of the output $\tilde y(f)$ is
$$\left\langle |\tilde y(f)|^2 \right\rangle = \left\langle |\tilde x(f)|^2 \right\rangle |\tilde h(f)|^2 = \frac{\left\langle |\tilde x(f)|^2 \right\rangle}{S_n(f)} = \frac{a_0^2 |\tilde A(f)|^2}{S_n(f)} + 1.$$
Note how this differs from the result for a matched filter. But the result is that, in the mean, the spectrum of the additive noise has been flattened. Prewhitening is important in both detection and estimation applications.
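A sketch of this whitening filter acting on the same kind of red noise as in the matched-filter example (the noise spectrum and normalizations are again assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)

# Red noise with known spectrum S_n(f), generated by shaping white noise
N, dt = 8192, 0.01
f = np.fft.rfftfreq(N, dt)
Sn = 1.0 / (1.0 + (f / 0.5)**2)
n = np.fft.irfft(np.fft.rfft(rng.normal(0.0, 1.0, N)) * np.sqrt(Sn), N)

# Whitening filter h~(f) = S_n(f)**(-1/2) applied in the frequency domain
y = np.fft.irfft(np.fft.rfft(n) / np.sqrt(Sn), N)

# The output spectrum is flat: compare low- vs. high-frequency mean power
P = np.abs(np.fft.rfft(y))**2
half = P.size // 2
print("low-f mean power :", P[1:half].mean())
print("high-f mean power:", P[half:].mean())
```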

[Figure: Leakage and Bias]

Prewhitening in the least-squares estimation context: Consider our standard linear model
$$y = X\theta + n,$$
which has the least-squares solution for the parameter vector
$$\hat\theta = \left( X^\dagger C_n^{-1} X \right)^{-1} X^\dagger C_n^{-1} y,$$
where the covariance matrix of the noise vector $n$ is $C_n = \langle n n^\dagger \rangle$. This is also the maximum-likelihood solution in the right circumstances (which are?). As with any covariance matrix, $C_n$ is Hermitian and positive semi-definite, which means that the quadratic form for an arbitrary vector $z$ satisfies $z^\dagger C_n z \ge 0$. Such matrices can always be factored according to the Cholesky decomposition,
$$C_n = L L^\dagger,$$
where $L$ is a lower-triangular matrix; e.g.,
$$L = \begin{pmatrix} a & 0 & 0 & 0 \\ b & c & 0 & 0 \\ d & e & f & 0 \\ g & h & i & j \end{pmatrix}.$$

Utility: we can transform the model as follows using $L$:
$$y = L y_w, \qquad X = L X_w.$$
Substituting into the solution vector for $\hat\theta$ and using
$$y^\dagger = (L y_w)^\dagger = y_w^\dagger L^\dagger, \qquad X^\dagger = (L X_w)^\dagger = X_w^\dagger L^\dagger, \qquad C_n^{-1} = (L L^\dagger)^{-1} = (L^\dagger)^{-1} L^{-1},$$
yields
$$\hat\theta = \left( X_w^\dagger L^\dagger C_n^{-1} L X_w \right)^{-1} X_w^\dagger L^\dagger C_n^{-1} L\, y_w = \left( X_w^\dagger X_w \right)^{-1} X_w^\dagger y_w,$$
since $L^\dagger C_n^{-1} L = L^\dagger (L^\dagger)^{-1} L^{-1} L = I$.

So what? The solution is identical to the least-squares case where the noise covariance matrix is diagonal; i.e., the noise vector $n_w = L^{-1} n$ has been transformed to white noise. We have whitened the data.

When is this useful? An example is the fitting of a sinusoidal function amid red noise, where leakage effects are important just as they are for spectral analysis. A specific example is the fitting of astrometric parameters or periodicities in radial-velocity data.

What's the catch? You need to know the covariance matrix of the noise $n$ to do the Cholesky decomposition. This can be easier said than done!
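A sketch of Cholesky-based whitening for least squares (illustrative; the exponentially correlated noise covariance, the two-parameter sinusoid model, and the sizes are assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)

# Linear model y = X theta + n with correlated (non-white) noise
Nt = 200
t = np.linspace(0.0, 10.0, Nt)
X = np.column_stack([np.cos(2 * np.pi * 0.3 * t), np.sin(2 * np.pi * 0.3 * t)])
theta_true = np.array([1.0, -0.5])

# Assumed noise covariance: exponentially correlated (AR(1)-like)
Cn = 0.2**2 * 0.9 ** np.abs(np.subtract.outer(np.arange(Nt), np.arange(Nt)))
L = np.linalg.cholesky(Cn)                # Cn = L L^T, L lower triangular
n = L @ rng.normal(0.0, 1.0, Nt)          # noise realization with covariance Cn
y = X @ theta_true + n

# Whiten: y_w = L^{-1} y, X_w = L^{-1} X, then ordinary least squares
yw = np.linalg.solve(L, y)
Xw = np.linalg.solve(L, X)
theta_hat, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
print("true:", theta_true, "whitened LS:", theta_hat)
```

The whitened fit reproduces the generalized least-squares solution without ever forming $C_n^{-1}$ explicitly.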