A6523 Modeling, Inference, and Mining Jim Cordes, Cornell University


Lecture 18: Optimal localization; Modeling

Topics plan:
- Modeling (linear/nonlinear least squares)
- Bayesian inference
- Bayesian approaches to spectral estimation; also prewhitening methods
- Optimization methods (needed for posterior PDFs, Bayes factors)

Reading: Ch. 10, 11 of Gregory

Assignment 3: now on web page

Arrival Time Estimation with Matched Filtering: Preliminary approaches

Localization Using Matched Filtering

This handout describes localization of an object in a parameter space. For simplicity we consider localization of a pulse in time. The same formalism applies to localization of a spectral feature in frequency or of an image feature in a 2D image. The results can be extrapolated to a space of arbitrary dimensionality.

I. First consider finding the amplitude of a pulse when the shape and location are known. Let the data be

I(t) = a A(t) + n(t),

where a is the unknown amplitude, A(t) is the known pulse shape, and n(t) is zero-mean noise. Let the model be Î(t) = â A(t). Define the cost function to be the integrated squared error,

Q = ∫ dt [I(t) − Î(t)]².

Setting the derivative with respect to â to zero, we can solve for the estimate of the amplitude:

∂Q/∂â = −2 ∫ dt [I(t) − Î(t)] ∂Î(t)/∂â = 0
⟹ ∫ dt [I(t) − Î(t)] A(t) = 0
⟹ â ∫ dt A²(t) = ∫ dt I(t) A(t)

â = ∫ dt I(t) A(t) / ∫ dt A²(t).

Note that:
a. The model is linear in the sole parameter â.
b. The numerator is the zero lag of the cross-correlation function (CCF) of I(t) and A(t).
c. The denominator is the zero lag of the autocorrelation function (ACF) of A(t).
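A minimal numeric sketch of the amplitude estimator above, using a hypothetical Gaussian template; the names and numbers are illustrative, not from the slides:

```python
import numpy as np

# Known pulse shape A(t) on a time grid, plus simulated data I(t) = a A(t) + n(t)
t = np.linspace(-5.0, 5.0, 1001)
A = np.exp(-0.5 * t**2)
rng = np.random.default_rng(0)
a_true = 3.0
I = a_true * A + 0.05 * rng.standard_normal(t.size)

# Discrete analog of a_hat = (integral of I*A) / (integral of A^2):
# numerator = zero lag of the CCF of I and A; denominator = zero lag of the ACF of A
a_hat = np.sum(I * A) / np.sum(A * A)
```

With the noise level used here, `a_hat` lands very close to the true amplitude of 3.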

II. Now consider the case where we do not know the location of the pulse in time (the time of arrival, TOA), and it is the TOA we wish to estimate. We still know the pulse shape a priori. Let the data, model, and cost function be

I(t) = a A(t − t0) + n(t)
Î(t) = â A(t − t̂0)
Q = ∫ dt [I(t) − Î(t)]².

Note that the model is linear in â but nonlinear in t̂0. A nonlinear model generally requires an iterative solution.

Minimizing Q with respect to â, we have

∂Q/∂â = −2 ∫ dt [I(t) − Î(t)] ∂Î(t)/∂â = 0
⟹ ∫ dt [I(t) − Î(t)] A(t − t̂0) = 0
⟹ â ∫ dt A²(t − t̂0) = ∫ dt I(t) A(t − t̂0)

â = ∫ dt I(t) A(t − t̂0) / ∫ dt A²(t − t̂0).    (4)

This last equation has the same form as in I except that the estimate for the arrival time t̂0 is involved.

Now, minimizing Q with respect to t̂0, we have

∂Q/∂t̂0 = −2 ∫ dt [I(t) − Î(t)] ∂Î(t)/∂t̂0 = 0.

Since Î(t) = â A(t − t̂0), we have ∂Î/∂t̂0 = −â A′(t − t̂0), and the condition becomes

â ∫ dt A(t − t̂0) A′(t − t̂0) = ∫ dt I(t) A′(t − t̂0).    (5)

Grid Search: One approach to finding the arrival time is to search over a 2D grid of (â, t̂0) for the values that satisfy equations (4) and (5). This approach is inefficient. Instead, one can search over a 1D space for the single nonlinear parameter t̂0 and then solve for â using either equation (4) or (5).

Linearization + Iteration: Another method for finding â and t̂0 is to linearize the equations in t̂0 − t0 by using Taylor-series expansions for A(t − t̂0) and A′(t − t̂0). Let t̂0 = t0 + δt̂0. Then, to first order in δt̂0:

A(t − t̂0) ≈ A(t − t0) − δt̂0 A′(t − t0)
A′(t − t̂0) ≈ A′(t − t0) − δt̂0 A″(t − t0)
A²(t − t̂0) ≈ A²(t − t0) − 2δt̂0 A(t − t0) A′(t − t0).

Now equations (4) and (5) become

â ∫ dt [A²(t − t0) − 2δt̂0 A(t − t0)A′(t − t0)] = ∫ dt I(t) [A(t − t0) − δt̂0 A′(t − t0)]

â ∫ dt [A(t − t0)A′(t − t0) − δt̂0 A′²(t − t0) − δt̂0 A(t − t0)A″(t − t0)] = ∫ dt I(t) [A′(t − t0) − δt̂0 A″(t − t0)].

Consider the integral ∫ dt A(t − t0)A′(t − t0). The integrand may be written as

A(t − t0)A′(t − t0) = ½ (d/dt) A²(t − t0),

and so the integral equals ½ A²(t − t0) evaluated between the limits t1 and t2, which vanishes in the limit |t1,2| ≫ T, with T the pulse width.

We then have

â ∫ dt A²(t − t0) = ∫ dt I(t) [A(t − t0) − δt̂0 A′(t − t0)]

−â δt̂0 ∫ dt [A′²(t − t0) + A(t − t0)A″(t − t0)] = ∫ dt I(t) [A′(t − t0) − δt̂0 A″(t − t0)].

Solving for â in both cases, we have

â = ∫ dt [I(t)A(t − t0) − δt̂0 I(t)A′(t − t0)] / ∫ dt A²(t − t0)    (6)

â = ∫ dt I(t)[A′(t − t0) − δt̂0 A″(t − t0)] / { −δt̂0 ∫ dt [A′²(t − t0) + A(t − t0)A″(t − t0)] }.    (7)

Using the notation

i0 ≡ ∫ dt I(t) A(t − t0)
i1 ≡ ∫ dt I(t) A′(t − t0)
i2 ≡ ∫ dt A²(t − t0)
i3 ≡ ∫ dt I(t) A″(t − t0)
i4 ≡ −∫ dt [A′²(t − t0) + A(t − t0) A″(t − t0)],

we have

â = (i0 − δt̂0 i1) / i2
â = (i1 − δt̂0 i3) / (δt̂0 i4).

Solving for δt̂0 (to first order), we have

δt̂0 = i1 i2 / (i0 i4 + i2 i3).    (8)

Iterative Solution for t̂0

Equation (8) can be solved iteratively for t̂0:
0. Choose a starting value for t̂0.
1. Calculate δt̂0 using the linearized equations.
2. Is δt̂0 = 0 (to within tolerance)?
3a. If yes, stop.
3b. If no, update t̂0 → t̂0 + δt̂0 and go back to step 1.

At the best-fit value of t̂0 the change is zero, δt̂0 = 0 (top of the hill), and â can be calculated using equation (6) or (7).

Correlation Function Approach

The iterative solution for t̂0 is similar to the following procedure, which uses a cross-correlation approach more directly:
1. Cross-correlate the template A(t) with I(t) to get a CCF.
2. Find the lag of peak correlation as an estimate for the arrival time, t̂0 = τ_max.
3. Calculate â if needed.

Subtleties of the Cross-Correlation Method

The CCF is calculated using sampled data and therefore is itself a discrete quantity. Often one wants greater precision on the arrival time than is given by the sample interval; i.e., we want a floating-point number for t̂0, not an integer index. We therefore want to calculate the peak of the CCF by interpolating near its peak. The interpolation should be done properly, using the appropriate interpolation formula for sampled data (the sinc function). Parabolic interpolation yields excessive errors for the arrival time. In practice, the proper interpolation is effectively done in the frequency domain by calculating the phase shift of the Fourier transform of the CCF, which is the product of the Fourier transform of the template and the Fourier transform of the data.

Arrival Time Errors
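The correlation-function approach (steps 1 and 2 above, at integer-lag precision) can be sketched as follows, using an illustrative circularly shifted template; a real analysis would then refine the peak by proper sinc/Fourier-phase interpolation as just described:

```python
import numpy as np

# Illustrative template and noise-free "data" shifted by a known integer lag
N = 256
n = np.arange(N)
template = np.exp(-0.5 * ((n - N // 2) / 5.0) ** 2)
shift_true = 17
profile = np.roll(template, shift_true)

# Circular CCF via the FFT: CCF(tau) = IFFT[ FFT(profile) * conj(FFT(template)) ]
ccf = np.fft.ifft(np.fft.fft(profile) * np.conj(np.fft.fft(template))).real
t0_hat = int(np.argmax(ccf))   # lag of peak correlation = TOA estimate
```

For this noise-free case the peak lag recovers the true shift exactly.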


Arrival Time Estimation with Matched Filtering: Least-squares approach

Arrival Time Estimation from Matched Filtering: Least-squares solution and localization error

Consider discrete sampling of a template and measured profile from which we want to determine the amplitude b and location τ through matched filtering (MF). The time- and frequency-domain quantities are:

s_t = template ↔ s̃_k
m_t = model = a + b s_{t−τ} ↔ m̃_k
p_t = measured profile = m_t + n_t ↔ p̃_k
n_t = noise ↔ ñ_k,

where the Fourier transform is defined as s̃_k = Σ_{t=0}^{N−1} s_t e^{−2πitk/N}.

Frequency-domain approach: Using weights w_k, minimize

χ² = Σ_{k=0}^{N−1} w_k |p̃_k − m̃_k|²
   = Σ_{k=0}^{N−1} w_k |p̃_k − aNδ_{k0} − b s̃_k e^{−2πikτ/N}|².

We ignore the k = 0 term and consider only the first half of the array because the time-domain quantities are real. So the cost function (rewritten as Q) is

Q = Σ_k w_k [ |p̃_k|² + b² |s̃_k|² − 2b Re{ p̃_k s̃*_k e^{+2πikτ/N} } ].    (1)

Taking derivatives, we can solve for b and find an implicit equation for τ:

∂Q/∂b = 2b Σ_k w_k |s̃_k|² − 2 Σ_k w_k Re{ p̃_k s̃*_k e^{+2πikτ/N} } = 0

b = Σ_k w_k Re{ p̃_k s̃*_k e^{+2πikτ/N} } / Σ_k w_k |s̃_k|².

τ is the solution of

∂Q/∂τ = (4πb/N) Σ_k k w_k Im{ p̃_k s̃*_k e^{+2πikτ/N} } = 0
⟹ Σ_k k w_k Im{ p̃_k s̃*_k e^{+2πikτ/N} } = 0.    (2)

Another approach: The same implicit equation for τ (equation 2) can be obtained by multiplying the weighted DFT of the cross-correlation, w_k s̃*_k p̃_k, by the phasor e^{+2πikτ̂/N} and finding the best value of τ̂:

Q(τ̂) = Σ_k w_k s̃*_k p̃_k e^{+2πikτ̂/N}.

When τ̂ = τ the sum is maximized and real:

Q_max = Q(τ) = b Σ_k w_k |s̃_k|².    (3)

Another solution for the scale factor is therefore

b = Q_max / Σ_k w_k |s̃_k|².

τ̂ is the solution of ∂Re{Q(τ̂)}/∂τ̂ = 0 or the solution of Im{Q(τ̂)} = 0. Either one gives the same equation as equation (2).

Parameter errors: Expand Q from equation (1) to second order about the minimum:

Q(b, τ) ≈ Q_min + ½ ∂²Q/∂b² (δb)² + ½ ∂²Q/∂τ² (δτ)² + ∂²Q/∂b∂τ δb δτ.

Scale factor: Defining σ_b as the error along the b axis in the b–τ plane and using

∂²Q/∂b² = 2 Σ_k w_k |s̃_k|²,

we have

Q = Q_min + ½ ∂²Q/∂b² (σ_b)²,

yielding

σ_b² = (Q − Q_min) / (½ ∂²Q/∂b²) = 1 / Σ_k w_k |s̃_k|²,

where the rightmost equality results from defining the 1σ error as the contour of Q that is one unit above the minimum (Q − Q_min = 1). It is reasonable to define the weights in terms of the additive noise, so w_k = 1/σ_k². If the σ_k are all the same (as for white noise), σ_k = σ, and σ² = Nσ_t² relates the frequency-domain noise to the rms noise level σ_t in the time series. Noting that the sum over the full array equals twice the sum over the half-array for real signals,

σ_b² = σ² / (2 Σ_k |s̃_k|²) = N σ_t² / (2 Σ_k |s̃_k|²).

Arrival time: Using the same approach we have

σ_τ² = (Q − Q_min) / (½ ∂²Q/∂τ²) = 1 / (½ ∂²Q/∂τ²),

and from the earlier expressions,

½ ∂²Q/∂τ² = (2π/N)² b Σ_k k² w_k Re{ p̃_k s̃*_k e^{+2πikτ/N} }.

Note that the units of τ are sample numbers. To get time units, we need to multiply by the sample interval Δt.

Template Fitting of Millisecond Pulsars

- Pulse widths range from ~25 to 1000 µs.
- In the best cases, timing precision is ~50 ns (a factor of ~1/500 of the pulse width).
- Templates (average pulse shapes) appear to be stable over decades, so pulsars can be used as precise clocks.
- Departures from the templates occur because of fluctuations at the single-pulse level.

Templates for MSPs



[Table: characterization of all the MSPs used by NANOGrav for precision timing in the 1400 MHz frequency band, listing pulsar name (B/J designations), period P (ms), and minimum TOA uncertainty σ_min (µs); the numerical columns were not recovered.] Most MSPs are limited by template-fitting errors (finite S/N), but a few are limited by departures of the pulse shapes from the template due to pulsar-intrinsic effects.

Implementation

The method is easily implemented in Python. Specifically, a nonlinear solver is used to minimize the cost function, written here as

Q = Σ_k |p̃_k − m̃_k|² = Σ_k |p̃_k − b s̃_k e^{−2πikτ/N}|²,

where we ignore the k = 0 term so that any mean offset of the template is irrelevant. We also use only the first half of the array because the data and template are real. The two-parameter fit yields the scale factor b and time offset τ. Errors are calculated as above.

import numpy as np
import scipy.optimize as spo

tfft = np.fft.fft(template)
pfft = np.fft.fft(profile)
bhat0 = bccf                 # initial guess for b (from the CCF; defined elsewhere)
tauhat0 = tauccf + ishift    # initial guess for tau (from the CCF peak; defined elsewhere)
paramvec0 = np.array((bhat0, tauhat0))
paramvec = spo.leastsq(tfresids, paramvec0, args=(tfft, pfft))
bhat = paramvec[0][0]
tauhat = paramvec[0][1]

The function leastsq in scipy.optimize starts with an initial guess of the parameters, paramvec0. As such, the returned result can be at a local minimum rather than the global minimum. How to check: try different starting values for paramvec0.

import numpy as np

def tfresids(params, tfft, pfft):
    """
    Calculates residuals between the data and the scaled, rotated template.
    """
    b = params[0]
    tau = params[1]
    Nfft = np.size(pfft)
    Nsum = Nfft // 2
    arg = (2. * np.pi * tau / float(Nfft)) * np.arange(0., Nfft, 1.)
    phasevec = np.cos(arg) - 1j * np.sin(arg)
    resids = np.abs(pfft[1:Nsum] - b * tfft[1:Nsum] * phasevec[1:Nsum])
    return resids

def toa_errors_additive(tfft, b, sigma_t):
    """
    Calculates the errors in b = scale factor and tau = TOA due to additive noise.
    input:
        tfft    = FFT of template
        b       = fit value for scale factor
        sigma_t = rms additive noise in the time domain
    output:
        sigma_tau, sigma_b
    """
    Nfft = np.size(tfft)
    Nsum = Nfft // 2
    kvec = np.arange(1, Nsum)
    sigma_b = sigma_t * np.sqrt(float(Nfft) / (2. * np.sum(np.abs(tfft[1:Nsum])**2)))
    sigma_tau = (sigma_t * Nfft / (2. * np.pi * np.abs(b))) \
        * np.sqrt(float(Nfft) / (2. * np.sum(kvec**2 * np.abs(tfft[1:Nsum])**2)))
    return sigma_tau, sigma_b

About 700 lines of code in all.


Analysis of template-fitting errors for a simulated Gaussian profile

[Figure: template-fitting error σ_TOA (bins) vs. S/N for a Gaussian pulse with a width of 10 bins, over many noise realizations; curves show Fourier RMS, Fourier total, CCF RMS, CCF total, and the MF prediction.]

Parabolic interpolation of the CCF gives the same standard deviation as the Fourier approach and the theoretical MF error, but there is a systematic deviation from the true TOA that causes the large total error for parabolic interpolation.

RMS = standard deviation of the best-fit TOAs.
Total = total mean-square difference from the true TOA.
MF prediction = predicted RMS error for matched filtering.

Pulse timing: Taylor (1992)

Pulsar timing and relativistic gravity
BY J. H. TAYLOR
Joseph Henry Laboratories and Physics Department, Princeton University, Princeton, New Jersey 08544, U.S.A.

In addition to being fascinating objects to study in their own right, pulsars are exquisite tools for probing a variety of issues in basic physics. Recycled pulsars, thought to have been spun up in previous episodes of mass accretion from orbiting companion stars, are especially well suited for such applications. They are extraordinarily stable clocks, approaching and perhaps exceeding the long-term stabilities of the best terrestrial time standards. Most of them are found in binary systems, with orbital velocities as large as 10⁻³c. They provide unique opportunities for measuring neutron star masses, thereby yielding fundamental astrophysical data difficult to acquire by any other means. And they open the way for high-precision tests of the nature of gravity under conditions much more 'relativistic' than found anywhere within the Solar System. Among other results, pulsar timing observations have convincingly established the existence of quadrupolar gravitational waves propagating at the speed of light. They have also placed interesting limits on possible departures of the strong-field nature of gravity from general relativity, on the rate of change of Newton's constant, G, and on the energy density of low-frequency gravitational waves in the universe. [...] extending over several decades could lead to important new results in cosmology.

Phil. Trans. R. Soc. Lond. A (1992) 341. © 1992 The Royal Society

Appendix A. Determining pulse times of arrival

The applications of pulsar timing discussed in this paper depend crucially on the precision of measuring pulse times of arrival, so it is important to make the best possible use of all information in the data. Let us assume that observed and standard profiles, p(t) and s(t), have been obtained as described in §2a: the profiles are sampled and recorded at equally spaced intervals of time, t_j = jΔt, j = 0, 1, ..., N−1, with P = NΔt. Before sampling, the detected signals will have been low-pass filtered at a cutoff frequency f_c ≤ (2Δt)⁻¹. To avoid filtering out useful data, Δt is chosen small enough that f_c can exceed the highest frequencies significantly present in the data. (Obviously what is 'significant' will depend on the available signal-to-noise ratio.) If the foregoing criteria are met, the sampling theorem (Bracewell 1965) ensures that all potentially available information is fully and unambiguously contained in the discretely sampled values p_j = p(t_j).

If p(t) is equal to a shifted and scaled replica of s(t) plus random noise, as defined in (1), their Fourier transforms are also related in a simple way. Discrete Fourier transforms of the two profiles can be written as

P_k exp(iθ_k) = Σ_{j=0}^{N−1} p_j e^{i2πjk/N},    (A 1)
S_k exp(iφ_k) = Σ_{j=0}^{N−1} s_j e^{i2πjk/N},    (A 2)

where the frequency index k runs from 0 to N−1. Thus the real quantities P_k and S_k are the amplitudes of the complex Fourier coefficients, and θ_k and φ_k are the phases. Linearity of the transform relationship implies that

P_k exp(iθ_k) = aNδ_{k0} + b S_k exp[i(φ_k + kτ)] + G_k,  k = 0, ..., N−1,    (A 3)

where G_k represents random noise equal to the Fourier transform of the sampled noise in the time-domain profile, g(t_j). Note that the bias a and scale factor b assume similar roles in the time and frequency domains (cf. equation (1)).
As a consequence of the 'shift theorem' (Bracewell 1965), the time offset τ appears in the frequency domain as the slope of a linear ramp, kτ, added to the phases of the standard profile's Fourier coefficients. After the transforms have been computed, the value of a can be obtained immediately from the relation

a = (P_0 − bS_0)/N.    (A 4)

The desired pulse time of arrival τ, as well as the gain factor b, can be obtained by minimising the goodness-of-fit statistic

χ²(b, τ) = Σ_{k=1} |P_k e^{iθ_k} − b S_k e^{i(φ_k + kτ)}|² / σ_k².    (A 5)

In this equation σ_k is the root-mean-square amplitude of the noise at frequency k, and presumably the anti-aliasing low-pass filter will make the σ_k fall off somewhat at larger values of k. In practice, however, this subtlety is usually unimportant, since the amplitudes P_k and S_k decrease even faster than the σ_k. Owing to inherent symmetries in the transforms, the limits of summation in (A 5) can be taken as 1 to ½N, rather than 0 to N−1. For convenience of notation, in the remaining equations the summation limits have been omitted and the σ_k treated as constant.

By replacing the complex exponential in (A 5) with trigonometric equivalents and expanding the indicated squared modulus, one obtains a more convenient expression for χ², namely

χ² = Σ σ_k⁻² (P_k² + b²S_k²) − 2b Σ σ_k⁻² P_k S_k cos(φ_k − θ_k + kτ).    (A 6)

At the global minimum of the two-dimensional function χ²(τ, b), its derivatives with respect to τ and b must vanish. This requirement yields two equations in the two unknowns, namely

∂χ²/∂τ = 2b Σ σ_k⁻² k P_k S_k sin(φ_k − θ_k + kτ) = 0,    (A 7)
∂χ²/∂b = 2 Σ σ_k⁻² [bS_k² − P_k S_k cos(φ_k − θ_k + kτ)] = 0.    (A 8)

Equation (A 7) can be solved for τ by a straightforward iterative procedure such as Brent's method (see, for example, Press et al. 1986). Equation (A 8) then yields

b = Σ P_k S_k cos(φ_k − θ_k + kτ) / Σ S_k².
(A 9)

Uncertainties in the estimated values of τ and b may be found by approximating χ² near its minimum by the leading terms of a Taylor series and determining the excursions of b or τ required to increase the value of χ² by 1. This procedure leads to

σ_τ² = (½ ∂²χ²/∂τ²)⁻¹ = [2b Σ σ_k⁻² k² P_k S_k cos(φ_k − θ_k + kτ)]⁻¹,    (A 10)
σ_b² = (½ ∂²χ²/∂b²)⁻¹ = [Σ σ_k⁻² S_k²]⁻¹.    (A 11)

Notice that the data sampling interval, Δt = P/N, appears nowhere in (A 1)–(A 10). Our formalism for fitting TOAs in the Fourier transform domain places no limits on accuracy expressed as a fraction of Δt. In contrast, experience has shown that time-domain methods widely in use for determining TOAs do not readily produce arrival-time accuracies smaller than about 0.1Δt (see, for example, Rawley 1986).

Arrival Time Errors

Here we wish to localize the occurrence of a function A(t). We will consider this to be a pulse whose arrival time t0 we want to estimate, along with its expected error. Let the signal be

I(t) = a0 A(t − t0) + n(t),

where a0 is the amplitude and n(t) is zero-mean noise.

We will find the time of arrival (TOA) by cross-correlating the presumed known pulse shape A(t) with the signal:

C_AI(τ) = ∫ dt I(t) A(t − τ).

First, assume that the signal has been coarsely shifted so that the template is already aligned with the signal and that the template is centered on t = 0. This way we can assume that the arrival time estimate is a small correction to the coarse estimate.

We can expand the signal as

I(t) ≈ a0 A(t) − a0 t0 A′(t) + n(t).

We also expand the template, but to second order in τ, because we will be taking a derivative to find the lag of maximum correlation:

A(t − τ) ≈ A(t) − τ A′(t) + (τ²/2) A″(t)
A′(t − τ) ≈ A′(t) − τ A″(t).

Then we can write

C_IA(τ) ≈ C_IA(0) + τ C′_IA(0) + (τ²/2) C″_IA(0),

and setting dC_IA/dτ = 0 gives the estimator

τ̂ = −C′_IA(0) / C″_IA(0)
   = [a0 C_AA′(0) − a0 t0 C_A′A′(0) + C_nA′(0)] / [a0 C_AA″(0) − a0 t0 C_A′A″(0) + C_nA″(0)].

Using the previous approximations, we encounter the terms C_AA′(0) and C_A′A″(0), which vanish for pulses that are zero at ±∞. Also, the C_nA″ term in the denominator yields a second-order contribution that can be ignored. The TOA estimator then becomes

τ̂ = [a0 t0 C_A′A′(0) − C_nA′(0)] / [−a0 C_AA″(0)].

Noiseless case: When there is no noise we have

τ̂ = −t0 C_A′A′(0) / C_AA″(0).

Using a trial function, such as a Gaussian shape, it can be shown that τ̂ = t0, as we would expect! Even better, the denominator can be integrated by parts to show that C_AA″(0) = −C_A′A′(0) for pulses that vanish at ±∞, so the equality is general.

With noise: We can now write

τ̂ = t0 − C_nA′(0) / [a0 C_A′A′(0)].

Then the mean-square TOA error is, using ⟨τ̂⟩ = t0,

⟨(τ̂ − t0)²⟩ = ⟨C²_nA′(0)⟩ / [a0² C²_A′A′(0)]
            = ∫∫ dt dt′ ⟨n(t)n(t′)⟩ A′(t)A′(t′) / [a0² C²_A′A′(0)].

Now assume white noise (for specificity), so that ⟨n(t)n(t′)⟩ = σ_n² w_n δ(t − t′), where w_n is a short characteristic time scale (such as an inverse bandwidth) that keeps the units correct. Then

⟨(τ̂ − t0)²⟩ = σ_n² w_n ∫ dt [A′(t)]² / [a0² C²_A′A′(0)] = σ_n² w_n / [a0² C_A′A′(0)],

since C_A′A′(0) = ∫ dt [A′(t)]². We can then write this out as a TOA error,

σ_τ = (σ_n / a0) { w_n / ∫ dt [A′(t)]² }^{1/2} = (1/SNR) { w_n / ∫ dt [A′(t)]² }^{1/2}.

We see that the error scales as the inverse of the signal-to-noise ratio (SNR). The denominator also involves the integral of the squared derivative of the pulse shape, indicating that sharper pulses, with larger derivatives, produce smaller arrival time errors.

Localization using CCFs: time and frequency domains
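As a worked special case (not from the slides): for a Gaussian pulse A(t) = exp(−t²/2W²), the integral ∫[A′(t)]² dt evaluates analytically to √π/(2W), so σ_τ = (1/SNR)·√(2W w_n/√π), i.e. narrower pulses give smaller TOA errors. A quick numerical check:

```python
import numpy as np

# Gaussian pulse of width W, differentiated numerically
W = 2.0
t = np.linspace(-40.0, 40.0, 200001)
A = np.exp(-t**2 / (2.0 * W**2))
Aprime = np.gradient(A, t)

# Trapezoid-rule integral of [A'(t)]^2, compared with the analytic value
f = Aprime**2
integral = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t))
analytic = np.sqrt(np.pi) / (2.0 * W)
```

The numerical and analytic values agree to well below 0.1% on this grid.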

Localization: Time vs. Frequency Domains

For a Nyquist-sampled, bandlimited process with bandwidth B, the sampling theorem implies that the continuous-time signal can be reconstructed from the sampled data as

x(t) = Σ_n x_n sinc[π(t − nΔt)/Δt],

where Δt = 1/B and, as usual, sinc x ≡ (sin x)/x. Consider a model appropriate for matched filtering,

x(t) = a A(t − t0) + n(t),

where a is the amplitude, A(t) is the known template, and n is noise. Suppose we have sampled versions of the data and template, x_n, A_n. One approach is to calculate the discrete CCF,

C_xA(τ) = (1/N) Σ_n x_n A_{n−τ},

and find the maximum to determine an estimate t̂0 = τ_max. However, we really want the CCF of the continuous-time quantities x(t) and A(t),

C_xA(τ) = ∫ dt x(t) A(t − τ).

We cannot get the true lag of maximum correlation, τ_max, by interpolating the sampled correlation function unless we interpolate according to the sampling theorem. If we interpolate differently, we will get biased results.

Another approach: the frequency domain. Take the FT of the model equation to get

X̃(f) = a Ã(f) e^{−2πift0} + ñ(f).

No noise: Write the FT in terms of its real and imaginary parts and find the phase (for a template with real Ã(f)):

X̃(f) = X̃_r(f) + i X̃_i(f) = |X̃(f)| e^{iφ(f)}
φ(f) = tan⁻¹[X̃_i(f)/X̃_r(f)] = tan⁻¹[−sin(2πft0)/cos(2πft0)] = −2πft0.

See example.

With noise: The phase will have a noise-like error that is nonlinearly related to ñ(f). In the limit of large SNR, the rms phase error will scale as 1/SNR. Working directly with the phase to determine t0, however, is numerically problematic because the phase will wrap for large offsets and for low SNR.
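The noise-free phase-ramp idea above can be sketched as follows, for a hypothetical sampled Gaussian template with a shift small enough that the phase does not wrap; all names and numbers are illustrative:

```python
import numpy as np

# Sampled template and its FFT
N = 128
n = np.arange(N)
A = np.exp(-0.5 * ((n - N // 2) / 4.0) ** 2)
t0 = 3.0                                  # shift in samples (may be fractional)
f = np.fft.fftfreq(N)                     # frequencies in cycles per sample
Afft = np.fft.fft(A)

# Apply the shift via the shift theorem: X(f) = A(f) * exp(-2*pi*i*f*t0)
X = Afft * np.exp(-2j * np.pi * f * t0)

# Recover t0 from the phase at the lowest nonzero frequency bin,
# where wrapping is not an issue
dphi = np.angle(X[1] / Afft[1])
t0_hat = -dphi / (2.0 * np.pi * f[1])
```

Noise-free, the recovered shift matches `t0` to machine precision, including fractional-sample shifts.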


Best approach: Fit to the complex FFT rather than to the phase. Use as a model

M(f) = Ã(f) e^{−2πift̂0}.

Then define the product

J(f) = X̃(f) M*(f) = a |Ã(f)|² e^{−2πif(t0 − t̂0)} + ñ(f) Ã*(f) e^{+2πift̂0}

and integrate over frequency:

S = ∫ df J(f) = a ∫ df |Ã(f)|² e^{−2πif(t0 − t̂0)} + ∫ df ñ(f) Ã*(f) e^{+2πift̂0}.

This quantity can be maximized vs. t̂0. For no noise, we expect t̂0 = t0. The maximum can be found using standard search methods over the nonlinear parameter t̂0 (grid search, linearization, etc.). Note that the integrand is naturally weighted by the actual signal.

An equivalent approach is to use a different test statistic,

S2 = ∫ df |X̃(f) − M(f)|²,

that we would minimize to find t̂0.

Note that we have used continuous notation here. With sampled data we can reconstruct the continuous FT as

X̃(f) = Π(f/B) Σ_n x_n e^{−2πifnΔt},

which can be implemented using the DFT.
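A minimal sketch of the complex-FFT fit above, minimizing S2(t̂0) over the single nonlinear parameter with a 1D bounded search; the scale factor is held at its known value (1) for brevity, and the names and numbers are illustrative:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Template and noise-free data shifted by a fractional number of samples
N = 128
n = np.arange(N)
A = np.exp(-0.5 * ((n - N // 2) / 4.0) ** 2)
t0_true = 2.6
f = np.fft.fftfreq(N)
Afft = np.fft.fft(A)
X = Afft * np.exp(-2j * np.pi * f * t0_true)

def S2(t0_hat):
    # S2(t0_hat) = sum_k |X_k - M_k|^2 with model M_k = Afft_k * exp(-2*pi*i*f_k*t0_hat)
    M = Afft * np.exp(-2j * np.pi * f * t0_hat)
    return np.sum(np.abs(X - M) ** 2)

t0_hat = minimize_scalar(S2, bounds=(0.0, 5.0), method="bounded").x
```

The 1D minimization recovers the fractional-sample shift without any phase unwrapping.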

Modeling

Criteria for Modeling (see slides)

Criteria for Modeling

Suppose we have the following:

y = data vector
x = independent variable
ŷ(θ) = model for the data, with parameters θ
f_y(y; θ) = multivariate PDF for the data
θ̂ = vector of parameters that yields the best fit of the model to the data according to some criterion, or
θ̂ = parameters of a probability density function for the data.

Suppose we have a model for the data that consists of parameters θ. These might be parameters of a time series or parameters of a PDF for the data.

(1) Least squares: minimize with respect to θ

Q(θ) ≡ Σ_i [y_i − ŷ_i(θ)]².

(2) Maximum likelihood: For the data {y_i, i = 1, N}, suppose the joint PDF is modeled as, or known to be, f_y(y; θ). After obtaining the N data points, we view the PDF as the likelihood of getting those actual data points given a choice of the parameters θ, i.e.

L(θ) = f_y(y; θ),

and those values of θ that maximize L(θ) are maximum likelihood estimators for θ.
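A small illustration of criterion (2), not from the slides: for Gaussian data with known σ, numerically maximizing the likelihood over μ recovers the sample mean, which is the analytic ML estimator. The numbers are illustrative:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Simulated Gaussian data with known sigma
rng = np.random.default_rng(1)
y = rng.normal(loc=5.0, scale=2.0, size=500)
sigma = 2.0

def neg_log_like(mu):
    # -ln L(mu), up to an additive constant independent of mu
    return 0.5 * np.sum((y - mu) ** 2) / sigma**2

# Maximize L by minimizing -ln L over mu
mu_ml = minimize_scalar(neg_log_like, bounds=(0.0, 10.0), method="bounded").x
```

The numerical maximizer agrees with `np.mean(y)` to the solver's tolerance.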

Measures of Optimality

For a given parameter and data set, there may be several or many possible estimators. We need quantitative guidelines on how to choose the best one. The criteria often used include:

Consistency (convergence): Let n = sample size; then

θ̂ → θ as n → ∞ (in probability).

Unbiased: ⟨θ̂⟩ = θ, or bias B = ⟨θ̂⟩ − θ = 0. Consistency does not imply unbiasedness, because for finite n the estimator can be biased even though the bias → 0 as n → ∞ (e.g., maximum likelihood estimators).

Minimum variance: Var(θ̂) ≡ ⟨θ̂²⟩ − ⟨θ̂⟩². If Var(θ̂) is minimized, the resultant estimator yields the least variation of estimates (note, however, that the MV estimator may be biased).

Mean-square error:

MSE = ⟨(θ̂ − θ)²⟩ = ⟨[θ̂ − ⟨θ̂⟩ + ⟨θ̂⟩ − θ]²⟩ = Var(θ̂) + B².

Efficiency: A well-designed experiment yields data that are all used in an estimate; an inefficient experiment yields superfluous data. Thus, experiments should be designed vis-à-vis an estimator. As an example, an estimator is said to be mean-square efficient if no other estimator has a smaller cost function:

⟨(θ̂ − θ)²⟩ ≤ ⟨(θ̃ − θ)²⟩,

where θ̃ is any other estimator of θ.

Sufficiency: If θ̂ is a sufficient statistic, it contains all the information obtainable from a sample about θ. Formally, consider an estimator θ̂ and another θ̃, and the conditional distribution

F(θ̃ | θ̂) ≡ P{θ̃ < some number | θ̂}.

If θ̃ is not a function of θ̂ and F(θ̃ | θ̂) is independent of θ (the actual value), then θ̂ is a sufficient statistic.

More transparent is an example! Suppose you have N data points, {x_n, n = 1, N}, that you know are distributed as N(μ, σ). You calculate the sample mean and standard deviation,

μ̂ = (1/N) Σ_{n=1}^{N} x_n
σ̂² = (1/N) Σ_{n=1}^{N} (x_n − μ̂)².

In calculating the likelihood function L(μ, σ) for μ and σ, it can be shown that L depends only on μ̂ and σ̂². Thus μ̂ and σ̂² contain all the information needed for likelihood inference and are therefore sufficient statistics.
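The claim that L depends on the data only through μ̂ and σ̂² follows from the identity Σ(x_n − μ)² = N[σ̂² + (μ̂ − μ)²], which can be checked directly (all numbers illustrative):

```python
import numpy as np

# Simulated Gaussian sample and its sufficient statistics
rng = np.random.default_rng(2)
x = rng.normal(3.0, 1.5, size=200)
N = x.size
mu_hat = np.mean(x)
var_hat = np.mean((x - mu_hat) ** 2)

def loglike_full(mu, sigma):
    # log-likelihood computed from the full sample (up to an additive constant)
    return -0.5 * np.sum((x - mu) ** 2) / sigma**2 - N * np.log(sigma)

def loglike_suff(mu, sigma):
    # the same quantity computed from (mu_hat, var_hat) alone,
    # using sum (x - mu)^2 = N * [var_hat + (mu_hat - mu)^2]
    return -0.5 * N * (var_hat + (mu_hat - mu) ** 2) / sigma**2 - N * np.log(sigma)
```

The two functions agree at any trial (μ, σ), so the pair (μ̂, σ̂²) carries everything the likelihood needs.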

Least Squares Fitting

Linear:
- The minimum chi-squared (or ML) solution is easy and unique.

Nonlinear:
- Solving the nonlinear problem (a plethora of approaches, including MCMC) can be challenging with large dimensionality.

General:
- Want solutions for parameter vectors.
- Want errors and covariances of parameters → a covariance matrix for the parameters, to be derived from (e.g.) the covariance matrix of the data.
- Confidence intervals on parameter values.
- Comparison of models.
- Hypothesis testing.
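A minimal sketch of a nonlinear least-squares fit that returns both the parameter vector and its covariance matrix, using a hypothetical Gaussian pulse model (names and numbers are illustrative):

```python
import numpy as np
from scipy.optimize import curve_fit

def pulse(t, a, t0, w):
    # Gaussian pulse model with amplitude a, center t0, width w
    return a * np.exp(-0.5 * ((t - t0) / w) ** 2)

# Simulated data with known parameters plus white noise
rng = np.random.default_rng(3)
t = np.linspace(-10.0, 10.0, 400)
y = pulse(t, 2.0, 1.5, 1.2) + 0.02 * rng.standard_normal(t.size)

# curve_fit returns the best-fit parameters and their covariance matrix,
# from which 1-sigma errors and confidence intervals follow
popt, pcov = curve_fit(pulse, t, y, p0=[1.0, 0.0, 1.0])
perr = np.sqrt(np.diag(pcov))
```

Because the model is nonlinear in t0 and w, the result depends on the starting guess `p0`; as with the TOA fits earlier, it is worth checking different starting values to guard against local minima.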


More information

SGN Advanced Signal Processing: Lecture 8 Parameter estimation for AR and MA models. Model order selection

SGN Advanced Signal Processing: Lecture 8 Parameter estimation for AR and MA models. Model order selection SG 21006 Advanced Signal Processing: Lecture 8 Parameter estimation for AR and MA models. Model order selection Ioan Tabus Department of Signal Processing Tampere University of Technology Finland 1 / 28

More information

Physics 6720 Introduction to Statistics April 4, 2017

Physics 6720 Introduction to Statistics April 4, 2017 Physics 6720 Introduction to Statistics April 4, 2017 1 Statistics of Counting Often an experiment yields a result that can be classified according to a set of discrete events, giving rise to an integer

More information

ELEG 3143 Probability & Stochastic Process Ch. 6 Stochastic Process

ELEG 3143 Probability & Stochastic Process Ch. 6 Stochastic Process Department of Electrical Engineering University of Arkansas ELEG 3143 Probability & Stochastic Process Ch. 6 Stochastic Process Dr. Jingxian Wu wuj@uark.edu OUTLINE 2 Definition of stochastic process (random

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

Advanced Statistical Methods. Lecture 6

Advanced Statistical Methods. Lecture 6 Advanced Statistical Methods Lecture 6 Convergence distribution of M.-H. MCMC We denote the PDF estimated by the MCMC as. It has the property Convergence distribution After some time, the distribution

More information

Lecture 23. Lidar Error and Sensitivity Analysis (2)

Lecture 23. Lidar Error and Sensitivity Analysis (2) Lecture 3. Lidar Error and Sensitivity Analysis ) q Derivation of Errors q Background vs. Noise q Sensitivity Analysis q Summary 1 Accuracy vs. Precision in Lidar Measurements q The precision errors caused

More information

Why is the field of statistics still an active one?

Why is the field of statistics still an active one? Why is the field of statistics still an active one? It s obvious that one needs statistics: to describe experimental data in a compact way, to compare datasets, to ask whether data are consistent with

More information

Lecture 5. G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 1

Lecture 5. G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 1 Lecture 5 1 Probability (90 min.) Definition, Bayes theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests (90 min.) general concepts, test statistics,

More information

A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 2011

A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 2011 A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 2011 Reading Chapter 5 of Gregory (Frequentist Statistical Inference) Lecture 7 Examples of FT applications Simulating

More information

that efficiently utilizes the total available channel bandwidth W.

that efficiently utilizes the total available channel bandwidth W. Signal Design for Band-Limited Channels Wireless Information Transmission System Lab. Institute of Communications Engineering g National Sun Yat-sen University Introduction We consider the problem of signal

More information

EE 574 Detection and Estimation Theory Lecture Presentation 8

EE 574 Detection and Estimation Theory Lecture Presentation 8 Lecture Presentation 8 Aykut HOCANIN Dept. of Electrical and Electronic Engineering 1/14 Chapter 3: Representation of Random Processes 3.2 Deterministic Functions:Orthogonal Representations For a finite-energy

More information

Estimation of Parameters

Estimation of Parameters CHAPTER Probability, Statistics, and Reliability for Engineers and Scientists FUNDAMENTALS OF STATISTICAL ANALYSIS Second Edition A. J. Clark School of Engineering Department of Civil and Environmental

More information

A523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 2011

A523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 2011 A523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 2011 Lecture 1 Organization:» Syllabus (text, requirements, topics)» Course approach (goals, themes) Book: Gregory, Bayesian

More information

Signal Design for Band-Limited Channels

Signal Design for Band-Limited Channels Wireless Information Transmission System Lab. Signal Design for Band-Limited Channels Institute of Communications Engineering National Sun Yat-sen University Introduction We consider the problem of signal

More information

Lecture 3. Linear Regression II Bastian Leibe RWTH Aachen

Lecture 3. Linear Regression II Bastian Leibe RWTH Aachen Advanced Machine Learning Lecture 3 Linear Regression II 02.11.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de This Lecture: Advanced Machine Learning Regression

More information

Gravitational-Wave Data Analysis: Lecture 2

Gravitational-Wave Data Analysis: Lecture 2 Gravitational-Wave Data Analysis: Lecture 2 Peter S. Shawhan Gravitational Wave Astronomy Summer School May 29, 2012 Outline for Today Matched filtering in the time domain Matched filtering in the frequency

More information

Digital Baseband Systems. Reference: Digital Communications John G. Proakis

Digital Baseband Systems. Reference: Digital Communications John G. Proakis Digital Baseband Systems Reference: Digital Communications John G. Proais Baseband Pulse Transmission Baseband digital signals - signals whose spectrum extend down to or near zero frequency. Model of the

More information

Pulsars: Observation & Timing. Stephen Eikenberry 28 Jan 2014

Pulsars: Observation & Timing. Stephen Eikenberry 28 Jan 2014 Pulsars: Observation & Timing Stephen Eikenberry 28 Jan 2014 Timing Pulsars Figure: D. Nice Pulsars send out pulsed signals We receive them Goal is to measure a Time of Arrival (TOA); this provides great

More information

Problem Sheet 1 Examples of Random Processes

Problem Sheet 1 Examples of Random Processes RANDOM'PROCESSES'AND'TIME'SERIES'ANALYSIS.'PART'II:'RANDOM'PROCESSES' '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''Problem'Sheets' Problem Sheet 1 Examples of Random Processes 1. Give

More information

Probing Relativistic Gravity with the Double Pulsar

Probing Relativistic Gravity with the Double Pulsar Probing Relativistic Gravity with the Double Pulsar Marta Burgay INAF Osservatorio Astronomico di Cagliari The spin period of the original millisecond pulsar PSR B1937+21: P = 0.0015578064924327 ± 0.0000000000000004

More information

The Expectation-Maximization Algorithm

The Expectation-Maximization Algorithm 1/29 EM & Latent Variable Models Gaussian Mixture Models EM Theory The Expectation-Maximization Algorithm Mihaela van der Schaar Department of Engineering Science University of Oxford MLE for Latent Variable

More information

Stochastic Processes. M. Sami Fadali Professor of Electrical Engineering University of Nevada, Reno

Stochastic Processes. M. Sami Fadali Professor of Electrical Engineering University of Nevada, Reno Stochastic Processes M. Sami Fadali Professor of Electrical Engineering University of Nevada, Reno 1 Outline Stochastic (random) processes. Autocorrelation. Crosscorrelation. Spectral density function.

More information

Stochastic Processes. A stochastic process is a function of two variables:

Stochastic Processes. A stochastic process is a function of two variables: Stochastic Processes Stochastic: from Greek stochastikos, proceeding by guesswork, literally, skillful in aiming. A stochastic process is simply a collection of random variables labelled by some parameter:

More information

Machine Learning 4771

Machine Learning 4771 Machine Learning 4771 Instructor: Tony Jebara Topic 11 Maximum Likelihood as Bayesian Inference Maximum A Posteriori Bayesian Gaussian Estimation Why Maximum Likelihood? So far, assumed max (log) likelihood

More information

Statistics. Lecture 2 August 7, 2000 Frank Porter Caltech. The Fundamentals; Point Estimation. Maximum Likelihood, Least Squares and All That

Statistics. Lecture 2 August 7, 2000 Frank Porter Caltech. The Fundamentals; Point Estimation. Maximum Likelihood, Least Squares and All That Statistics Lecture 2 August 7, 2000 Frank Porter Caltech The plan for these lectures: The Fundamentals; Point Estimation Maximum Likelihood, Least Squares and All That What is a Confidence Interval? Interval

More information

Lecture 7. Fourier Analysis

Lecture 7. Fourier Analysis Lecture 7 Fourier Analysis Summary Lecture 6 Minima and maxima 1 dimension : Bracket minima : 3 values of f(x) : f(2) < f(1) and f(2)

More information

Signal Processing - Lecture 7

Signal Processing - Lecture 7 1 Introduction Signal Processing - Lecture 7 Fitting a function to a set of data gathered in time sequence can be viewed as signal processing or learning, and is an important topic in information theory.

More information

Design Criteria for the Quadratically Interpolated FFT Method (I): Bias due to Interpolation

Design Criteria for the Quadratically Interpolated FFT Method (I): Bias due to Interpolation CENTER FOR COMPUTER RESEARCH IN MUSIC AND ACOUSTICS DEPARTMENT OF MUSIC, STANFORD UNIVERSITY REPORT NO. STAN-M-4 Design Criteria for the Quadratically Interpolated FFT Method (I): Bias due to Interpolation

More information

Correlator I. Basics. Chapter Introduction. 8.2 Digitization Sampling. D. Anish Roshi

Correlator I. Basics. Chapter Introduction. 8.2 Digitization Sampling. D. Anish Roshi Chapter 8 Correlator I. Basics D. Anish Roshi 8.1 Introduction A radio interferometer measures the mutual coherence function of the electric field due to a given source brightness distribution in the sky.

More information

Part 4: Multi-parameter and normal models

Part 4: Multi-parameter and normal models Part 4: Multi-parameter and normal models 1 The normal model Perhaps the most useful (or utilized) probability model for data analysis is the normal distribution There are several reasons for this, e.g.,

More information

Lecture 4 - Spectral Estimation

Lecture 4 - Spectral Estimation Lecture 4 - Spectral Estimation The Discrete Fourier Transform The Discrete Fourier Transform (DFT) is the equivalent of the continuous Fourier Transform for signals known only at N instants separated

More information

Frequentist-Bayesian Model Comparisons: A Simple Example

Frequentist-Bayesian Model Comparisons: A Simple Example Frequentist-Bayesian Model Comparisons: A Simple Example Consider data that consist of a signal y with additive noise: Data vector (N elements): D = y + n The additive noise n has zero mean and diagonal

More information

Bayesian Regression Linear and Logistic Regression

Bayesian Regression Linear and Logistic Regression When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we

More information

Statistics and Data Analysis

Statistics and Data Analysis Statistics and Data Analysis The Crash Course Physics 226, Fall 2013 "There are three kinds of lies: lies, damned lies, and statistics. Mark Twain, allegedly after Benjamin Disraeli Statistics and Data

More information

Neural Network Training

Neural Network Training Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification

More information

A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 2013 Lecture 4 For next week be sure to have read Chapter 5 of

A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 2013 Lecture 4 For next week be sure to have read Chapter 5 of A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 2013 Lecture 4 For next week be sure to have read Chapter 5 of Gregory (Frequentist Statistical Inference) Today: DFT

More information

An Introduction to Expectation-Maximization

An Introduction to Expectation-Maximization An Introduction to Expectation-Maximization Dahua Lin Abstract This notes reviews the basics about the Expectation-Maximization EM) algorithm, a popular approach to perform model estimation of the generative

More information

Delphine Perrodin INAF-Osservatorio Astronomico di Cagliari (Italy) IPTA Student Week 2017, Sèvres, France

Delphine Perrodin INAF-Osservatorio Astronomico di Cagliari (Italy) IPTA Student Week 2017, Sèvres, France Delphine Perrodin INAF-Osservatorio Astronomico di Cagliari (Italy) IPTA Student Week 2017, Sèvres, France Outline! Introduction to Gravitational Wave detectors! Introduction to Pulsar Timing Arrays! Stochastic

More information

POLYNOMIAL-BASED INTERPOLATION FILTERS PART I: FILTER SYNTHESIS*

POLYNOMIAL-BASED INTERPOLATION FILTERS PART I: FILTER SYNTHESIS* CIRCUITS SYSTEMS SIGNAL PROCESSING c Birkhäuser Boston (27) VOL. 26, NO. 2, 27, PP. 115 146 DOI: 1.17/s34-5-74-8 POLYNOMIAL-BASED INTERPOLATION FILTERS PART I: FILTER SYNTHESIS* Jussi Vesma 1 and Tapio

More information

Chapter 3: Maximum-Likelihood & Bayesian Parameter Estimation (part 1)

Chapter 3: Maximum-Likelihood & Bayesian Parameter Estimation (part 1) HW 1 due today Parameter Estimation Biometrics CSE 190 Lecture 7 Today s lecture was on the blackboard. These slides are an alternative presentation of the material. CSE190, Winter10 CSE190, Winter10 Chapter

More information

An Introduction to Bayesian Linear Regression

An Introduction to Bayesian Linear Regression An Introduction to Bayesian Linear Regression APPM 5720: Bayesian Computation Fall 2018 A SIMPLE LINEAR MODEL Suppose that we observe explanatory variables x 1, x 2,..., x n and dependent variables y 1,

More information

Module 2. Random Processes. Version 2, ECE IIT, Kharagpur

Module 2. Random Processes. Version 2, ECE IIT, Kharagpur Module Random Processes Version, ECE IIT, Kharagpur Lesson 9 Introduction to Statistical Signal Processing Version, ECE IIT, Kharagpur After reading this lesson, you will learn about Hypotheses testing

More information

A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring

A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring A653 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 15 http://www.astro.cornell.edu/~cordes/a653 Lecture 3 Power spectrum issues Frequentist approach Bayesian approach (some

More information

Estimators as Random Variables

Estimators as Random Variables Estimation Theory Overview Properties Bias, Variance, and Mean Square Error Cramér-Rao lower bound Maimum likelihood Consistency Confidence intervals Properties of the mean estimator Introduction Up until

More information

Data Converter Fundamentals

Data Converter Fundamentals Data Converter Fundamentals David Johns and Ken Martin (johns@eecg.toronto.edu) (martin@eecg.toronto.edu) slide 1 of 33 Introduction Two main types of converters Nyquist-Rate Converters Generate output

More information

Practice Problems Section Problems

Practice Problems Section Problems Practice Problems Section 4-4-3 4-4 4-5 4-6 4-7 4-8 4-10 Supplemental Problems 4-1 to 4-9 4-13, 14, 15, 17, 19, 0 4-3, 34, 36, 38 4-47, 49, 5, 54, 55 4-59, 60, 63 4-66, 68, 69, 70, 74 4-79, 81, 84 4-85,

More information

Variable and Periodic Signals in Astronomy

Variable and Periodic Signals in Astronomy Lecture 14: Variability and Periodicity Outline 1 Variable and Periodic Signals in Astronomy 2 Lomb-Scarle diagrams 3 Phase dispersion minimisation 4 Kolmogorov-Smirnov tests 5 Fourier Analysis Christoph

More information

Tracking of Spread Spectrum Signals

Tracking of Spread Spectrum Signals Chapter 7 Tracking of Spread Spectrum Signals 7. Introduction As discussed in the last chapter, there are two parts to the synchronization process. The first stage is often termed acquisition and typically

More information

Additional Keplerian Signals in the HARPS data for Gliese 667C from a Bayesian re-analysis

Additional Keplerian Signals in the HARPS data for Gliese 667C from a Bayesian re-analysis Additional Keplerian Signals in the HARPS data for Gliese 667C from a Bayesian re-analysis Phil Gregory, Samantha Lawler, Brett Gladman Physics and Astronomy Univ. of British Columbia Abstract A re-analysis

More information

p(z)

p(z) Chapter Statistics. Introduction This lecture is a quick review of basic statistical concepts; probabilities, mean, variance, covariance, correlation, linear regression, probability density functions and

More information

DETECTION theory deals primarily with techniques for

DETECTION theory deals primarily with techniques for ADVANCED SIGNAL PROCESSING SE Optimum Detection of Deterministic and Random Signals Stefan Tertinek Graz University of Technology turtle@sbox.tugraz.at Abstract This paper introduces various methods for

More information

On 1.9, you will need to use the facts that, for any x and y, sin(x+y) = sin(x) cos(y) + cos(x) sin(y). cos(x+y) = cos(x) cos(y) - sin(x) sin(y).

On 1.9, you will need to use the facts that, for any x and y, sin(x+y) = sin(x) cos(y) + cos(x) sin(y). cos(x+y) = cos(x) cos(y) - sin(x) sin(y). On 1.9, you will need to use the facts that, for any x and y, sin(x+y) = sin(x) cos(y) + cos(x) sin(y). cos(x+y) = cos(x) cos(y) - sin(x) sin(y). (sin(x)) 2 + (cos(x)) 2 = 1. 28 1 Characteristics of Time

More information

Sensor Tasking and Control

Sensor Tasking and Control Sensor Tasking and Control Sensing Networking Leonidas Guibas Stanford University Computation CS428 Sensor systems are about sensing, after all... System State Continuous and Discrete Variables The quantities

More information

A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring

A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 2015 http://www.astro.cornell.edu/~cordes/a6523 Lecture 23:! Nonlinear least squares!! Notes Modeling2015.pdf on course

More information

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Bayesian Learning. Tobias Scheffer, Niels Landwehr

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Bayesian Learning. Tobias Scheffer, Niels Landwehr Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning Tobias Scheffer, Niels Landwehr Remember: Normal Distribution Distribution over x. Density function with parameters

More information

Covariance function estimation in Gaussian process regression

Covariance function estimation in Gaussian process regression Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian

More information

2. SPECTRAL ANALYSIS APPLIED TO STOCHASTIC PROCESSES

2. SPECTRAL ANALYSIS APPLIED TO STOCHASTIC PROCESSES 2. SPECTRAL ANALYSIS APPLIED TO STOCHASTIC PROCESSES 2.0 THEOREM OF WIENER- KHINTCHINE An important technique in the study of deterministic signals consists in using harmonic functions to gain the spectral

More information

Discrete-Time Signals and Systems. Efficient Computation of the DFT: FFT Algorithms. Analog-to-Digital Conversion. Sampling Process.

Discrete-Time Signals and Systems. Efficient Computation of the DFT: FFT Algorithms. Analog-to-Digital Conversion. Sampling Process. iscrete-time Signals and Systems Efficient Computation of the FT: FFT Algorithms r. eepa Kundur University of Toronto Reference: Sections 6.1, 6., 6.4, 6.5 of John G. Proakis and imitris G. Manolakis,

More information

Metropolis Algorithm

Metropolis Algorithm //7 A Modeling, Inference, and Mining Jim Cordes, Cornell University Lecture MCMC example Reading: Ch, and in Gregory (from before) Chapter 9 of Mackay (Monte Carlo Methods) hip://www.inference.phy.cam.ac.uk/itprnn/

More information

Likelihood-Based Methods

Likelihood-Based Methods Likelihood-Based Methods Handbook of Spatial Statistics, Chapter 4 Susheela Singh September 22, 2016 OVERVIEW INTRODUCTION MAXIMUM LIKELIHOOD ESTIMATION (ML) RESTRICTED MAXIMUM LIKELIHOOD ESTIMATION (REML)

More information

sine wave fit algorithm

sine wave fit algorithm TECHNICAL REPORT IR-S3-SB-9 1 Properties of the IEEE-STD-57 four parameter sine wave fit algorithm Peter Händel, Senior Member, IEEE Abstract The IEEE Standard 57 (IEEE-STD-57) provides algorithms for

More information

If we want to analyze experimental or simulated data we might encounter the following tasks:

If we want to analyze experimental or simulated data we might encounter the following tasks: Chapter 1 Introduction If we want to analyze experimental or simulated data we might encounter the following tasks: Characterization of the source of the signal and diagnosis Studying dependencies Prediction

More information

Parameter estimation! and! forecasting! Cristiano Porciani! AIfA, Uni-Bonn!

Parameter estimation! and! forecasting! Cristiano Porciani! AIfA, Uni-Bonn! Parameter estimation! and! forecasting! Cristiano Porciani! AIfA, Uni-Bonn! Questions?! C. Porciani! Estimation & forecasting! 2! Cosmological parameters! A branch of modern cosmological research focuses

More information

A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring

A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring Lecture 9 A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 2015 http://www.astro.cornell.edu/~cordes/a6523 Applications: Comparison of Frequentist and Bayesian inference

More information

SEISMIC WAVE PROPAGATION. Lecture 2: Fourier Analysis

SEISMIC WAVE PROPAGATION. Lecture 2: Fourier Analysis SEISMIC WAVE PROPAGATION Lecture 2: Fourier Analysis Fourier Series & Fourier Transforms Fourier Series Review of trigonometric identities Analysing the square wave Fourier Transform Transforms of some

More information

(Gaussian) Random Fields

(Gaussian) Random Fields 23/01/2017 (Gaussian) Random Fields Echo of the Big Bang: Cosmic Microwave Background Planck (2013) Earliest view of the Universe: 379000 yrs. after Big Bang, 13.8 Gyr ago. 1 CMB Temperature Perturbations

More information

Data Detection for Controlled ISI. h(nt) = 1 for n=0,1 and zero otherwise.

Data Detection for Controlled ISI. h(nt) = 1 for n=0,1 and zero otherwise. Data Detection for Controlled ISI *Symbol by symbol suboptimum detection For the duobinary signal pulse h(nt) = 1 for n=0,1 and zero otherwise. The samples at the output of the receiving filter(demodulator)

More information

Advanced Introduction to Machine Learning CMU-10715

Advanced Introduction to Machine Learning CMU-10715 Advanced Introduction to Machine Learning CMU-10715 Gaussian Processes Barnabás Póczos http://www.gaussianprocess.org/ 2 Some of these slides in the intro are taken from D. Lizotte, R. Parr, C. Guesterin

More information

STATISTICS/ECONOMETRICS PREP COURSE PROF. MASSIMO GUIDOLIN

STATISTICS/ECONOMETRICS PREP COURSE PROF. MASSIMO GUIDOLIN Massimo Guidolin Massimo.Guidolin@unibocconi.it Dept. of Finance STATISTICS/ECONOMETRICS PREP COURSE PROF. MASSIMO GUIDOLIN SECOND PART, LECTURE 2: MODES OF CONVERGENCE AND POINT ESTIMATION Lecture 2:

More information

Multivariate Distribution Models

Multivariate Distribution Models Multivariate Distribution Models Model Description While the probability distribution for an individual random variable is called marginal, the probability distribution for multiple random variables is

More information

Parametric Techniques Lecture 3

Parametric Techniques Lecture 3 Parametric Techniques Lecture 3 Jason Corso SUNY at Buffalo 22 January 2009 J. Corso (SUNY at Buffalo) Parametric Techniques Lecture 3 22 January 2009 1 / 39 Introduction In Lecture 2, we learned how to

More information

Kalman filtering and friends: Inference in time series models. Herke van Hoof slides mostly by Michael Rubinstein

Kalman filtering and friends: Inference in time series models. Herke van Hoof slides mostly by Michael Rubinstein Kalman filtering and friends: Inference in time series models Herke van Hoof slides mostly by Michael Rubinstein Problem overview Goal Estimate most probable state at time k using measurement up to time

More information

Physics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester

Physics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester Physics 403 Parameter Estimation, Correlations, and Error Bars Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Best Estimates and Reliability

More information

6. Advanced Numerical Methods. Monte Carlo Methods

6. Advanced Numerical Methods. Monte Carlo Methods 6. Advanced Numerical Methods Part 1: Part : Monte Carlo Methods Fourier Methods Part 1: Monte Carlo Methods 1. Uniform random numbers Generating uniform random numbers, drawn from the pdf U[0,1], is fairly

More information

Sound & Vibration Magazine March, Fundamentals of the Discrete Fourier Transform

Sound & Vibration Magazine March, Fundamentals of the Discrete Fourier Transform Fundamentals of the Discrete Fourier Transform Mark H. Richardson Hewlett Packard Corporation Santa Clara, California The Fourier transform is a mathematical procedure that was discovered by a French mathematician

More information

Detecting Gravitational Waves with a pulsar timing array

Detecting Gravitational Waves with a pulsar timing array Detecting Gravitational Waves with a pulsar timing array Key Concept in General Relativity: Mass curves space-time Picture of curved space time First observed during solar eclipse Light from stars behind

More information

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf 1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Lior Wolf 2014-15 We know that X ~ B(n,p), but we do not know p. We get a random sample from X, a

More information

Using training sets and SVD to separate global 21-cm signal from foreground and instrument systematics

Using training sets and SVD to separate global 21-cm signal from foreground and instrument systematics Using training sets and SVD to separate global 21-cm signal from foreground and instrument systematics KEITH TAUSCHER*, DAVID RAPETTI, JACK O. BURNS, ERIC SWITZER Aspen, CO Cosmological Signals from Cosmic

More information