A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics, Spring 2015


1 A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics, Spring 2015

Lecture 12. Applications:
- Model comparison
- Some least-squares lessons

Reading: Assignment 3
- some least-squares problems
- analyze the F test for M1, M2
- analyze the Bayesian odds ratio for M1, M2
- a maximum entropy problem

2 Next few lectures

Revisit spectral analysis. Issues with Fourier-based estimators:
- 100% errors unless the number of degrees of freedom is increased
- Nonuniform sampling: Lomb-Scargle method
  - still suffers from large sidelobes
  - can apply the CLEAN algorithm
- Poor performance with red processes (leakage = bias)

Alternative approaches:
- Bayesian method
- Cholesky decomposition
- Maximum entropy method
- Designer methods for specific criteria (e.g. no bias)
- Etc.

3 Data and fit

[Figure: simulated data with S/N = 1 and N = 100, shown with a line fit, a constant fit, and the residuals of each. For the line fit, χ²_l = 90.2 and reduced χ²_{r,l} = 0.9; the constant-fit values χ²_c and χ²_{r,c} are not recoverable from the extraction.]

4 [Figure: panels of simulated data and the resulting distributions of F₁₂ = (constant fit):(line fit) over many realizations, for (N = 100, SNR = 0.0), (N = 100, SNR = 1.0), (N = 10, SNR = 10.0), and a fourth N = 100 case; each panel is annotated with its mean ⟨F₁₂⟩. The numerical values are not recoverable from the extraction.]

5 Frequentist-Bayesian Model Comparisons: A Simple Example

Consider data that consist of a signal y with additive noise. Data vector (N elements):

D = y + n

The additive noise n has zero mean and a diagonal covariance matrix:

\langle n \rangle = 0, \qquad C = {\rm diag}(\sigma_j^2).

Linear model:

y = X\theta, \qquad X = design matrix, \quad \theta = parameter vector with M elements.

For any given choice of θ, we can estimate the noise vector as n̂ = D − y.

Cost function for least squares [sometimes written as C(θ) or Q(θ)]:

\chi^2(\theta) = \hat n^T C^{-1} \hat n = (D - y)^T C^{-1} (D - y)

Minimizing the cost function, \nabla_\theta \chi^2(\theta) = 0, gives a system of equations with solution

\hat\theta = (X^T C^{-1} X)^{-1} X^T C^{-1} D.

The existence of a solution depends on whether the matrices are invertible. A bad choice of basis functions in X may lead to an ill-conditioned matrix; singular value decomposition or a better choice of basis functions may be needed.

For a noise PDF that is Gaussian, the likelihood function is

L(\theta) \propto e^{-\chi^2(\theta)/2}.

The covariance matrix of the parameters is

P_\theta = \langle \delta\theta\, \delta\theta^T \rangle = (X^T C^{-1} X)^{-1}.
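A minimal numpy sketch of these last two results (the helper name wlsq and its interface are illustrative, not from the lecture; for an ill-conditioned system one would use an SVD-based solve rather than a direct inverse):

    import numpy as np

    def wlsq(X, D, sigma):
        """Least squares with diagonal C = diag(sigma_j^2):
        returns theta_hat = (X^T C^-1 X)^-1 X^T C^-1 D and P_theta."""
        Cinv = np.diag(1.0 / sigma**2)            # C^{-1} for diagonal C
        P_theta = np.linalg.inv(X.T @ Cinv @ X)   # parameter covariance matrix
        theta_hat = P_theta @ (X.T @ Cinv @ D)    # LSQ / max-likelihood estimate
        return theta_hat, P_theta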

6 Comments

For a linear model, the solution is unique if there is no noise. The least-squares solution with non-zero noise is also unique: there is only one minimum in the cost function, which is quadratic in δθ = θ − θ_true. However, different realizations of the data will yield different solutions, whose range is quantified by P_θ.

7 Consider data D where each element is

d_i = y_i + n_i = a + b x_i + n_i,

i.e. a straight line plus noise. The design matrix is then a 2-column, N-row array and θ is a 2-element vector:

X = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_N \end{pmatrix}, \qquad \theta = \begin{pmatrix} \theta_1 \\ \theta_2 \end{pmatrix} = \begin{pmatrix} a \\ b \end{pmatrix}.

Now suppose we do not know the form of the data but we wish to find the best model. As the universe of models, consider just the pair

M1: y_i = θ_1 (constant)
M2: y_i = θ_1 + θ_2 x_i (line)

For a given realization of a data set, we can estimate the parameters of each model. We then want to test how good each model is and test the models against each other. These are the questions of statistical inference:

1. How do we decide whether each model is a good fit or not?
2. Given that M1 is a subset of M2 (the case θ_2 = 0), how do we gauge whether the extra parameter in M2 is warranted (demanded) by the data?
3. What are acceptable values for the estimates of the parameters of the better model?

The answers to these questions are different for frequentist vs. Bayesian approaches.
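A hedged sketch of the two fits, using the wlsq helper above (the intercept, slope, and noise level are made-up numbers for illustration):

    rng = np.random.default_rng(0)
    N = 100
    x = np.arange(N, dtype=float)
    sigma = np.ones(N)                            # unit measurement errors
    D = 1.0 + 0.02 * x + rng.normal(0.0, 1.0, N)  # d_i = a + b x_i + n_i

    X1 = np.ones((N, 1))                          # M1: constant
    X2 = np.column_stack([np.ones(N), x])         # M2: straight line
    for X in (X1, X2):
        theta, P = wlsq(X, D, sigma)
        chi2 = np.sum(((D - X @ theta) / sigma)**2)
        print(theta, chi2 / (N - X.shape[1]))     # estimates, reduced chi^2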

8 Gaussian Noise Model

Let the noise be N(0, C):

f_n(n) = \frac{1}{(2\pi)^{N/2} (\det C)^{1/2}}\, e^{-\frac{1}{2} n^T C^{-1} n}

Note that the argument of the exponential is a quadratic form.

The likelihood function for the parameters is obtained by using the estimate n̂ = D − y(θ), which depends on a particular choice of the parameter vector θ:

L(\theta) = \frac{1}{(2\pi)^{N/2} (\det C)^{1/2}}\, e^{-\frac{1}{2} \hat n^T C^{-1} \hat n}

For this case, and this case only, minimizing χ² yields results identical to maximizing L:

least-squares estimate = maximum-likelihood estimate.

Different situations are encountered:
1. The noise covariance matrix C is known (in shape and in element-by-element values).
2. The form of C is known but the values are not.
3. Nothing is known about C a priori.

In the example here we assume case 1, that C is known.

9 Frequentist Approach

Testing a model: calculate the minimum of the cost function, χ²_min = χ²(θ̂). For shorthand, we will call the minimum just χ².

If the model is a good (if not perfect) match to the true underlying model, we expect that over an ensemble of realizations

\langle \chi^2 \rangle = N - M = number of degrees of freedom,

\sigma_{\chi^2} = \sqrt{2(N-M)} \quad \Longrightarrow \quad \frac{\sigma_{\chi^2}}{\langle \chi^2 \rangle} = \left( \frac{2}{N-M} \right)^{1/2}.

The number of degrees of freedom matters (N and M could both be large). Note that χ²(θ) varies quadratically in δθ = θ − θ̂. Why?

Reduced chi-square: χ²_r = χ²/(N − M). A good model has χ²_r = 1 and σ_{χ²_r} = [2/(N − M)]^{1/2}.

A model can be assessed by calculating the probability that the estimated χ²_r is statistically consistent with 1, given N − M. See §7.2 of Gregory.
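A hedged scipy sketch of that consistency check, using the line-fit χ² quoted on the earlier example slide (N = 100 points, M = 2 parameters):

    from scipy import stats

    nu = 100 - 2                       # N - M degrees of freedom
    chi2_min = 90.2                    # line-fit value from the example slide
    p = stats.chi2.sf(chi2_min, nu)    # P(chi^2 >= chi2_min | model correct)
    print(p)                           # a very small p would reject the model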

10 Frequentist Approach (continued)

Parameter estimation errors: use P_θ to obtain the variances of each parameter (diagonal elements) and the correlations between parameters (off-diagonal elements). The interpretation of the parameter values is this: these are the variations in estimates of the parameters expected from different realizations of the data chosen from an ensemble.

Model comparison with the F test: while models can be individually assessed as above, they can also be compared by using the quantity

F_{12} = \frac{\chi^2_{r1}}{\chi^2_{r2}} = \frac{\chi^2_1/(N - M_1)}{\chi^2_2/(N - M_2)}.

The PDF of F_{12} has an analytical form (see §6.5, Equation 6.36 of Gregory). One can calculate whether one model is better than another by calculating the probability that the value of F_{12} would be obtained by chance if the reduced χ² were the same in both models. Generally one would say that model M2 is preferred over M1 if the probability of obtaining F_{12} is smaller than some selected amount, like 5% or 1%.

11 h"p://en.wikipedia.org/wiki/f- distribu7on

12 F test in Python:

    import numpy as np
    from scipy import stats

    dof1, dof2 = 99, 98   # example degrees of freedom (see question below)

    Fmean = dof2 / (dof2 - 2.)                                # mean of F
    Fsig = (dof2 / (dof2 - 2.)) * np.sqrt((2.*dof2 + 2.*dof1 - 4.)
                                          / (dof1 * (dof2 - 4.)))  # std dev
    Fmode = dof2 * (dof1 - 2.) / (dof1 * (dof2 + 2.))         # mode of F

    Fmin = 0.
    Fmax = Fmean + 4.*Fsig                 # cover the bulk of the PDF
    Fvec = np.arange(Fmin, Fmax, 0.01)
    Fpdf = stats.f.pdf(Fvec, dof1, dof2)   # probability density
    Fcdf = stats.f.cdf(Fvec, dof1, dof2)   # cumulative distribution

The F distribution gives the probability that a given ratio of two χ² random variables can occur by chance. This can be used to determine, at some probability level, whether two values of χ² are actually different.

For 99 and 98 degrees of freedom, how large does F need to be? (Homework!)
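One way to attack that question is the inverse CDF (percent-point function); a sketch, reusing dof1, dof2, and the stats import above:

    F95 = stats.f.ppf(0.95, dof1, dof2)   # F exceeded by chance 5% of the time
    F99 = stats.f.ppf(0.99, dof1, dof2)   # F exceeded by chance 1% of the time
    print(F95, F99)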

13 [Figure: the data-and-fits example of slide 3, repeated: S/N = 1, N = 100 data with line and constant fits, residuals, and χ²_l = 90.2, χ²_{r,l} = 0.9 for the line fit.]

14 [Figure: the F₁₂ = (constant fit):(line fit) histograms of slide 4, repeated, for the (N, SNR) cases shown there with their means ⟨F₁₂⟩.]

15 Two examples of F distributions and the locations of the 95% and 99% probability areas.

[Figure: PDFs and CDFs of the F distribution for (N_dof1 = 9, N_dof2 = 8) and (N_dof1 = 99, N_dof2 = 98), with the critical values F95 and F99 marked in each case; the numerical values are not recoverable from the extraction.]

16 [Figure: PDF and CDF of the F distribution for N_dof1 = 99, N_dof2 = 98, with F95 and F99 marked.]

17 [Figure: PDF and CDF of the F distribution for N_dof1 = 9, N_dof2 = 8, with F95 and F99 marked.]

18 Bayesian Approach

For the Bayesian approach we calculate the likelihood function L(θ) and multiply by a prior for the parameters to get the posterior PDF:

P(\theta\,|\,D, I) = \frac{P(\theta\,|\,I)\, L(\theta)}{\int d\theta\, P(\theta\,|\,I)\, L(\theta)}

From the posterior PDF we obtain the M-dimensional PDF for the parameters. This has a different interpretation than the frequentist approach: the posterior PDF expresses our uncertainty about the parameters for a specific data set, given background and prior information.

We compare models using the odds ratio (§3.5 of Gregory and the notes on the website, "Bayesian Model Comparison"). Here we assume that the two models have equal priors (we have no a priori preference) and thus concentrate on the Bayes factor

B_{21} = \frac{L(M_2)}{L(M_1)} = ratio of global likelihoods of M_2 and M_1.

The global likelihoods are just the denominators in the posterior PDFs, so

B_{21} = \frac{L(M_2)}{L(M_1)} = \frac{\int d\theta\, P(\theta\,|\,M_2)\, L(\theta\,|\,D, M_2)}{\int d\theta\, P(\theta\,|\,M_1)\, L(\theta\,|\,D, M_1)}

Use flat priors with widths Δa_1 for M1 and Δa_2, Δb_2 for M2. Also assume the likelihood functions are narrower than the priors, with widths δa_1 for M1 and δa_2, δb_2 for M2. Then

B_{21} \approx \frac{L(\hat\theta_2\,|\,D, M_2)}{L(\hat\theta_1\,|\,D, M_1)} \cdot \frac{\delta a_2\, \delta b_2}{\Delta a_2\, \Delta b_2} \cdot \frac{\Delta a_1}{\delta a_1}

For M2 to be superior to M1 (higher odds ratio), the likelihood function has to be sufficiently large to offset the penalty of the extra parameter contained in the larger volume in parameter space.

19 Bayesian Approach (continued)

For the particular M1 and M2 models (constant vs. line fits), the ratio of likelihoods is just

\frac{L(\hat\theta_2\,|\,D, M_2)}{L(\hat\theta_1\,|\,D, M_1)} = \frac{e^{-\chi^2_2/2}}{e^{-\chi^2_1/2}} = e^{(\chi^2_1 - \chi^2_2)/2} = e^{\Delta\chi^2/2}

and the Bayes factor becomes

B_{21} \approx e^{\Delta\chi^2/2} \cdot \frac{\delta a_2\, \delta b_2}{\Delta a_2\, \Delta b_2} \cdot \frac{\Delta a_1}{\delta a_1}
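A small numerical sketch of this final expression (the Δχ² value and all prior and likelihood widths below are illustrative numbers, not from the lecture):

    import numpy as np

    dchi2 = 9.0                    # chi_1^2 - chi_2^2 from fitting both models
    da1, Da1 = 0.1, 10.0           # likelihood width and prior range of a, M1
    da2, db2 = 0.1, 0.02           # likelihood widths of a, b under M2
    Da2, Db2 = 10.0, 1.0           # prior ranges of a, b under M2

    occam = (da2 * db2) / (Da2 * Db2) * (Da1 / da1)   # Occam penalty factor
    B21 = np.exp(dchi2 / 2.0) * occam
    print(B21)                     # B21 > 1 favors the line model M2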

20 THE ASTROPHYSICAL JOURNAL, 505:315-338, 1998 September 20
© 1998 The American Astronomical Society. All rights reserved. Printed in U.S.A.

NEUTRON STAR POPULATION DYNAMICS. II. THREE-DIMENSIONAL SPACE VELOCITIES OF YOUNG PULSARS

J. M. CORDES
Department of Astronomy and NAIC, Space Sciences Building, Cornell University, Ithaca, NY; cordes@spacenet.tn.cornell.edu

AND

DAVID F. CHERNOFF
Department of Astronomy, Space Sciences Building, Cornell University, Ithaca, NY; chernoff@spacenet.tn.cornell.edu

Received 1997 July 21; accepted 1998 April 24

ABSTRACT

We use astrometric, distance, and spindown data on pulsars to (1) estimate three-dimensional velocity components, birth distances from the Galactic plane, and ages of individual objects; (2) determine the distribution of space velocities and the scale height of pulsar progenitors; (3) test spindown laws for pulsars; (4) test for correlations between space velocities and other pulsar parameters; and (5) place empirical requirements on mechanisms that can produce high-velocity neutron stars. Our approach incorporates measurement errors, uncertainties in distances, deceleration in the Galactic potential, and differential Galactic rotation. We focus on a sample of proper motion measurements of young (<10 Myr) pulsars whose trajectories may be accurately and simply modeled. This sample of 49 pulsars excludes millisecond pulsars and other objects that may have undergone accretion-driven spinup. We estimate velocity components and birth z distance on a case-by-case basis assuming that the actual age equals the conventional spindown age for a braking index n = 3, no torque decay, and birth periods much shorter than present-day periods. Every sample member could have originated within 0.3 kpc of the Galactic plane while still having reasonable present-day peculiar radial velocities. For the 49 object sample, the scale height of the progenitors is ~0.13 kpc, and the three-dimensional velocities are distributed in two components with characteristic speeds of 175 (+19/-24) km/s and 700 (+300/-132) km/s, representing ~86% and ~14% of the population, respectively. The sample velocities are inconsistent with a single-component Gaussian model and are well described by a two-component Gaussian model but do not require models of additional complexity. From the best-fit distribution, we estimate that about 20% of the known pulsars will escape the Galaxy, assuming an escape speed of 500 km/s. The best-fit, dual-component model, if augmented by an additional, low-velocity (<50 km/s) component, tolerates, at most, only a small extra contribution in number, less than 5%. The best three-component models do not show a preference for filling in the probability distribution at speeds intermediate to 175 and 700 km/s but are nearly degenerate with the best two-component models. We estimate that the high-velocity tail (>1000 km/s) may be underrepresented (in the observed sample) by a factor ~2.3 owing to selection effects in pulsar surveys. The estimates of scale height and velocity parameters are insensitive to the explicit relation of chronological and spindown ages. A further analysis starting from our inferred velocity distribution allows us to test spindown laws and age estimates. There exist comparably good descriptions of the data involving different combinations of braking index and torque decay timescale. We find that a braking index of 2.5 is favored if torque decay occurs on a timescale of ~3 Myr, while braking indices ~4.5 ± 0.5 are preferred if there is no torque decay. For the sample as a whole, the most probable chronological ages are typically smaller than conventional spindown ages by factors as large as 2. We have also searched for correlations between three-dimensional speeds of individual pulsars and combinations of spin period and period derivative. None appears to be significant. We argue that correlations identified previously between velocity and (apparent) magnetic moment reflect the different evolutionary paths taken by young, isolated (nonbinary), high-field pulsars and older, low-field pulsars that have undergone accretion-driven spinup. We conclude that any such correlation measures differences in spin and velocity selection in the evolution of the two populations and is not a measure of processes taking place in the core collapse that produces neutron stars in the first place. We assess mechanisms for producing high-velocity neutron stars, including disruption of binary systems by symmetric supernovae and neutrino, baryonic, or electromagnetic rocket effects during or shortly after the supernova. The largest velocities seen (~1600 km/s), along with the paucity of low-velocity pulsars, suggest that disruption of binaries by symmetric explosions is insufficient. Rocket effects appear to be a necessary and general phenomenon. The required kick amplitudes and the absence of a magnetic field-velocity correlation do not yet rule out any of the rocket models. However, the required amplitudes suggest that the core collapse process in a supernova is highly dynamic and aspherical and that the impulse delivered to the neutron star is larger than existing simulations of core collapse have achieved.

Subject headings: binaries: close - pulsars: general - stars: distances - stars: evolution - stars: kinematics - stars: neutron

- Neutron stars = runaway population, <V> ~ 500 km/s
- Escape velocity from the Milky Way ~ 500 km/s at the solar circle
- Bayesian analysis of a pulsar sample from the MW population of NS
- Measurements = proper motions and distance proxies
- Model takes into account deceleration in the Galactic potential
- No a priori model for the likelihood function

21 4.3. Parameterization and Likelihood Function

The likelihood function for the parameters is the product over N_psr pulsars,

L = \prod_{k=1}^{N_{\rm psr}} L_k,    (24)

where the likelihood factor L_k for each pulsar is simply equation (23) evaluated using the measured proper motions (and errors) along with the distance constraints D_L, D_U, the direction n̂, and the age t. To apply equation (23), we adopt a parametric approach, where we assume particular forms for the PDFs in z_0 and V_0^{(P)}. Specifically, we assume that z_0 and V_0^{(P)} have a multicomponent Gaussian PDF of the form

f(z_0, V_0^{(P)}) = \sum_{j=1}^{n_g} w_j\, g_{1d}(z_0, h_{zj})\, g_{3d}(V_0^{(P)}, \sigma_{Vj}).    (25)

In equation (25), g_{1d}(p, σ) is a standard one-dimensional Gaussian PDF with zero mean and standard deviation σ,

g_{1d}(p, \sigma) = (2\pi\sigma^2)^{-1/2} \exp(-p^2/2\sigma^2),

while g_{3d} is a three-dimensional Gaussian function,

g_{3d}(q, \sigma) = (2\pi\sigma^2)^{-3/2} \exp(-q^2/2\sigma^2).

The weights w_j sum to unity. The PDF is defined so that f\, dz_0\, d^3V_0^{(P)} is the infinitesimal probability.

We have chosen a multicomponent Gaussian model because its analytical properties allow it to fit a wide range of shapes for the actual distributions in z_0 and V^{(P)}. There is not necessarily an implied physical basis for this choice of form: the different Gaussian velocity components need not correspond to different population components.

22 4.4. Results and Comparison of Models Using Odds Ratios

The parameters to be determined for a pulsar sample are (1) n_g standard deviations for velocities, σ_Vj; (2) n_h ≤ n_g scale heights for the birth altitude, h_zj; and (3) n_g − 1 weights, w_j, for a total of 2n_g + n_h − 1 parameters.

We considered a set of models with increasingly complex velocity and birth-height distributions. We label the models using n_g.n_h as the model number, where the first number represents the value of n_g and the second represents the value of n_h, and briefly describe them below:

Model 1.1 - a single-component model (n_g = n_h = 1);
Model 2.1 - a two-component velocity model with a single scale height (n_g = 2, n_h = 1);
Model 2.2 - a model with two scale-height and two velocity components (n_g = 2, n_h = 2); and
Model 3.1 - a three-component velocity model with a single scale height (n_g = 3, n_h = 1).

We assume flat priors for the parameters in selected ranges that are listed in Table 2. We generated the posterior probability distribution of the parameters, which is just the ...

23 TABLE 2
PARAMETER SEARCH RANGES

    Parameter       Range
    w_1             Δw_1 = 1
    w_2             Δw_2 = 1
    h_z1 (kpc)      Δh_z1 = 0.5
    h_z2 (kpc)      Δh_z2 = 0.5
    σ_V1 (km/s)     Δσ_V1 = 2000
    σ_V2 (km/s)     Δσ_V2 = 1800
    σ_V3 (km/s)     Δσ_V3 = 300

[The minimum and maximum columns of the table are not recoverable from the extraction; only the ranges survive.]

... evaluation of the likelihood in the selected ranges. The modes of the posterior, i.e., the maximum likelihood results, and log L appear in Table 3. These are the "best" values of the parameters for each model. The analysis was repeated for several age cuts for the sample: τ_{S,max} = 1, 5, and 10 Myr (for n = 3 and no torque decay). The mode of the distribution is typically well defined. For example, in Figure 4 we display contours of log L (log likelihood) for model 1.1 with τ_{S,max} = 10 Myr. The mode is identified by the cross. In Figure 5 we display contours for model 2.1 in two-dimensional surfaces that pass through the mode.

The results in Table 3 clearly suggest the presence of a significant high-velocity component when the models are sufficiently complex. At the same time, examination of the likelihood contours in the simplest model does not provide any obvious indications that its single ~300 km/s component is inadequate. To make such a determination, we must compare the models in a systematic manner.

To compare models, we use the odds ratio (cf. Gregory & Loredo 1992) to quantify goodness of fit while penalizing models with more parameters. We assume that all models are equally probable, a priori.

FIG. 4. - Contour plot of the log likelihood function for the single-Gaussian component model (n_g = 1, n_h = 1) as a function of the rms velocity and scale height. Contours are spaced by log 2.

24 FIG. 5. - Contour plots of the log likelihood function for the double-Gaussian component model (n_g = 2, n_h = 1) as a function of values for pairs of parameters and for two-dimensional slices through the best-fit model. Contours are spaced by log 2.

We take the single-Gaussian component model with two parameters as our reference model, M_1.1. The odds ratio for the Mth model relative to M_1.1 becomes

O_{M, M_{1.1}} \equiv \frac{f(D\,|\,M)}{f(D\,|\,M_{1.1})} = \frac{\Lambda_M\, V_{M_{1.1}}}{\Lambda_{M_{1.1}}\, V_M},    (28)

which we evaluate through numerical integration of the likelihood function for each model over a uniform grid in parameter space with bounds given in Table 2. The associated volumes for the four models are

V_{1.1} = \Delta h_{z1}\, \Delta\sigma_{V1},
V_{2.1} = V_{1.1}\, \Delta w_1\, \Delta\sigma_{V2}\, A_{2.1},
V_{2.2} = V_{1.1}\, \Delta w_1\, \Delta\sigma_{V2}\, \Delta h_{z2}\, A_{2.2},
V_{3.1} = V_{1.1}\, \Delta w_1\, \Delta w_2\, \Delta\sigma_{V2}\, \Delta\sigma_{V3}\, A_{3.1},    (29)

where A_{2.1}, A_{2.2}, and A_{3.1} ≤ 1 are factors that account for the overlap in our search of parameter space. [The rest of this slide's text (the resulting odds for the individual models and the discussion of the ~175 km/s component and scale height) is too garbled in the extraction to recover.]

25 The odds ratio reduces to the "Bayes factor," which is the ratio of global likelihoods of two models (Gregory & Loredo's eq. [2.12]). The global likelihood for a model M given data D is

f(D\,|\,M) = \int d\theta\, f_h(\theta\,|\,M)\, L(\theta),    (26)

where f_h(θ | M) is the prior PDF for the model parameters θ and L(θ) is the likelihood function as we have used it in this paper. We have already assumed that the prior PDF is flat, and we have seen that L(θ) is strongly peaked. Thus, letting θ̂ be the parameters that maximize L(θ),

f(D\,|\,M) \approx f_h(\hat\theta\,|\,M) \int d\theta\, L(\theta) \equiv \Lambda_M / V_M,    (27)

where the last equation defines the integrated likelihood, \Lambda_M \equiv \int d\theta\, L(\theta), and the volume in parameter space that is searched, V_M = [f_h(\hat\theta\,|\,M)]^{-1}.

TABLE 3
THREE-DIMENSIONAL VELOCITY PDF MODELS

Columns: Model (n_g.n_h), w_1, w_2, h_z1 (kpc), h_z2 (kpc), σ_V1 (km/s), σ_V2 (km/s), σ_V3 (km/s), N_parms, log L, Odds, with one block each for the τ_S < 10 Myr, τ_S < 5 Myr, and τ_S < 1 Myr age cuts (each with its own N_psr). [The numerical entries of the table are not recoverable from the extraction.]
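Equation (26) is easy to approximate on a grid; a minimal one-parameter sketch with a flat prior (the Gaussian-shaped toy likelihood and its numbers are illustrative only):

    import numpy as np

    lo, hi = 0.0, 10.0                            # flat prior support
    theta = np.linspace(lo, hi, 2001)
    loglike = -0.5 * ((theta - 3.0) / 0.2)**2     # toy peaked log likelihood
    like = np.exp(loglike - loglike.max())        # rescale to avoid underflow
    Lambda = np.sum(like) * (theta[1] - theta[0]) # integrated likelihood
    global_like = Lambda / (hi - lo)              # f(D|M) for the flat prior
    # the rescaling constant exp(max) cancels in an odds ratio of two models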

26 TABLE 4
BEST-FIT PARAMETERS FOR MODEL 2.1

    Parameter       Value
    w_1             ≈0.86 (−0.12; the upper error is lost in the extraction)
    h_z1 (kpc)      ≈0.13 (+0.04/−0.02)
    σ_V1 (km/s)     175 (+19/−24)
    σ_V2 (km/s)     700 (+295/−132)

[The central values for w_1 and h_z1 are restored from the abstract, which quotes ~86% of the population and a ~0.13 kpc scale height.]

[Figure: marginalized PDFs for the individual parameters of the best-fit model, n_g = 2, n_h = 1.]

27 Least Squares Examples

I. A polynomial model for a time series y_i with errors ε_i:

y_i = \sum_{j=1}^{k} X_{ij}\, \theta_j + \epsilon_i = \sum_{j=1}^{k} t_i^{\,j-1}\, \theta_j + \epsilon_i.

The model is linear in the parameters even though it is nonlinear in the independent variable t_i. If the polynomial order is p, then k = p + 1. The design matrix and parameter vector are

X = \begin{pmatrix} 1 & t_1 & t_1^2 & \cdots & t_1^p \\ 1 & t_2 & t_2^2 & \cdots & t_2^p \\ \vdots & & & & \vdots \\ 1 & t_n & t_n^2 & \cdots & t_n^p \end{pmatrix}, \qquad \theta = \begin{pmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_{p+1} \end{pmatrix}.

Define

T_k = \sum_{i=1}^{n} t_i^k \qquad \text{and} \qquad \overline{t^k y} = \frac{1}{n} \sum_{i=1}^{n} t_i^k y_i

(note the 1/n in front of the last sum), with \bar y \equiv \overline{t^0 y}.

28 Then the product of the design matrix with itself is the k × k = (p+1) × (p+1) matrix

X^T X = \begin{pmatrix} T_0 & T_1 & T_2 & \cdots & T_p \\ T_1 & T_2 & T_3 & \cdots & T_{p+1} \\ T_2 & T_3 & T_4 & \cdots & T_{p+2} \\ \vdots & & & & \vdots \\ T_p & T_{p+1} & T_{p+2} & \cdots & T_{2p} \end{pmatrix}

and

X^T y = \begin{pmatrix} \sum_i y_i \\ \sum_i t_i y_i \\ \vdots \\ \sum_i t_i^p y_i \end{pmatrix} = n \begin{pmatrix} \bar y \\ \overline{ty} \\ \vdots \\ \overline{t^p y} \end{pmatrix}.

The least-squares solution \hat\theta = (X^T X)^{-1} X^T y requires the inverse of X^T X, which will exist if the determinant is nonzero.
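A numpy sketch of these normal equations for a quadratic, p = 2 (the synthetic coefficients and noise level are illustrative):

    import numpy as np

    rng = np.random.default_rng(1)
    n, p = 50, 2
    t = np.arange(n, dtype=float)
    y = 0.5 + 0.1*t - 0.002*t**2 + rng.normal(0.0, 1.0, n)

    X = np.vander(t, p + 1, increasing=True)    # columns 1, t, ..., t^p
    XtX = X.T @ X                               # entries are the sums T_{j+k}
    theta_hat = np.linalg.solve(XtX, X.T @ y)   # (X^T X)^{-1} X^T y
    # for large p this matrix is ill-conditioned; prefer a QR/SVD solve then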

29 First-order polynomial: something we can easily solve.

y_i = \theta_1 + \theta_2 t_i

X^T X = \begin{pmatrix} T_0 & T_1 \\ T_1 & T_2 \end{pmatrix}

(X^T X)^{-1} = \frac{1}{\det(X^T X)} (\text{matrix of cofactors}) = \frac{1}{T_0 T_2 - T_1^2} \begin{pmatrix} T_2 & -T_1 \\ -T_1 & T_0 \end{pmatrix}

30 Then, using T_0 = n,

\hat\theta = (X^T X)^{-1} X^T y = \frac{n}{T_0 T_2 - T_1^2} \begin{pmatrix} T_2 & -T_1 \\ -T_1 & T_0 \end{pmatrix} \begin{pmatrix} \bar y \\ \overline{ty} \end{pmatrix} = \frac{n}{n T_2 - T_1^2} \begin{pmatrix} \bar y\, T_2 - \overline{ty}\, T_1 \\ n\, \overline{ty} - \bar y\, T_1 \end{pmatrix}.

So the individual parameters are

\hat\theta_1 = \frac{n(\bar y\, T_2 - \overline{ty}\, T_1)}{n T_2 - T_1^2} \qquad \text{and} \qquad \hat\theta_2 = \frac{n(n\, \overline{ty} - \bar y\, T_1)}{n T_2 - T_1^2}.

31 Assuming the errors ε_i are stationary and statistically independent with variance ⟨ε_i²⟩ = σ², the covariance matrix of the parameters is

P = \sigma^2 (X^T X)^{-1} = \begin{pmatrix} \sigma_{\theta_1}^2 & \sigma_{\theta_1}\sigma_{\theta_2}\rho_{\theta_1\theta_2} \\ \sigma_{\theta_1}\sigma_{\theta_2}\rho_{\theta_1\theta_2} & \sigma_{\theta_2}^2 \end{pmatrix} = \frac{\sigma^2}{n T_2 - T_1^2} \begin{pmatrix} T_2 & -T_1 \\ -T_1 & n \end{pmatrix},

where the correlation coefficient is

\rho_{\theta_1\theta_2} = \frac{\langle \delta\theta_1\, \delta\theta_2 \rangle}{\sigma_{\theta_1}\sigma_{\theta_2}} = \frac{-T_1}{\sqrt{n T_2}} \qquad \text{(negatively correlated)}.

32 For n ≫ 1 and uniform sampling t_i = i, i = 1, …, n, we have T_1 ≈ n²/2 and T_2 ≈ n³/3, so

\sigma_{\theta_1} \approx \frac{2\sigma}{\sqrt{n}}, \qquad \sigma_{\theta_2} \approx \frac{\sqrt{12}\,\sigma}{n^{3/2}}, \qquad \rho_{\theta_1\theta_2} \approx \frac{-n^2/2}{\sqrt{n \cdot n^3/3}} = -\frac{\sqrt{3}}{2} \approx -0.87 \quad \text{(highly anticorrelated)}.

The anticorrelation means that any error in one parameter is compensated by the error in the other.
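These asymptotic results are easy to verify numerically; a sketch with unit noise, so that P = (XᵀX)⁻¹:

    import numpy as np

    n = 1000
    t = np.arange(1.0, n + 1.0)              # uniform sampling t_i = i
    X = np.column_stack([np.ones(n), t])
    P = np.linalg.inv(X.T @ X)               # covariance for sigma = 1
    s1, s2 = np.sqrt(P[0, 0]), np.sqrt(P[1, 1])
    print(s1 / (2.0 / np.sqrt(n)))           # -> ~1: sigma_theta1 ~ 2 sigma/sqrt(n)
    print(s2 / (np.sqrt(12.0) / n**1.5))     # -> ~1: sigma_theta2 ~ sqrt(12) sigma/n^{3/2}
    print(P[0, 1] / (s1 * s2))               # -> ~ -sqrt(3)/2 = -0.866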

33 Better parameterization for the first-order polynomial: orthogonal polynomials. E.g.

y_i = \theta_1 + \theta_2 (t_i - \bar t), \qquad \bar t = \frac{1}{n} \sum_{i=1}^{n} t_i.

Now the design matrix and the various products are

X = \begin{pmatrix} 1 & t_1 - \bar t \\ 1 & t_2 - \bar t \\ \vdots & \vdots \\ 1 & t_n - \bar t \end{pmatrix}, \qquad X^T X = \begin{pmatrix} T_0 & T_1 \\ T_1 & T_2 \end{pmatrix} = \begin{pmatrix} T_0 & 0 \\ 0 & T_2 \end{pmatrix}, \qquad (X^T X)^{-1} = \frac{1}{T_0 T_2} \begin{pmatrix} T_2 & 0 \\ 0 & T_0 \end{pmatrix},

where the T_k are now computed with t_i − t̄ (so T_1 = 0). The solution is now

\hat\theta_1 = \bar y, \qquad \hat\theta_2 = \frac{n\, \overline{(t - \bar t)\, y}}{T_2}.

The error on θ_2 is the same as before, but the parameters are now uncorrelated: ρ_{θ1θ2} = 0.
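Continuing the numerical sketch above, centering t makes the off-diagonal term vanish:

    Xc = np.column_stack([np.ones(n), t - t.mean()])   # orthogonal basis
    Pc = np.linalg.inv(Xc.T @ Xc)
    print(Pc[0, 1])     # ~ 0: theta_1 and theta_2 now uncorrelated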

34 II. Sinusoids

Consider the linear model y = Xθ + ε, where X comprises complex exponentials and the parameter vector θ comprises Fourier amplitudes:

X_{nm} = e^{2\pi i n m / N} \equiv W_N^{nm}, \qquad n = 0, \ldots, N-1, \quad m = 0, \ldots, k-1,

where W_N \equiv e^{2\pi i / N} is the Nth root of 1 on the unit circle. For k = N we have

X = \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & W_N & W_N^2 & \cdots & W_N^{N-1} \\ 1 & W_N^2 & W_N^4 & \cdots & W_N^{2(N-1)} \\ \vdots & & & & \vdots \\ 1 & W_N^{N-1} & W_N^{2(N-1)} & \cdots & W_N^{(N-1)^2} \end{pmatrix}.
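A quick sketch that builds this k = N design matrix and checks it against numpy's FFT, which uses the opposite sign convention, so that θ = fft(y)/N:

    import numpy as np

    N = 8
    idx = np.arange(N)
    X = np.exp(2j * np.pi * np.outer(idx, idx) / N)   # X_nm = W_N^{nm}

    y = np.random.default_rng(2).normal(size=N)
    theta = np.linalg.solve(X, y.astype(complex))     # Fourier amplitudes
    print(np.allclose(theta, np.fft.fft(y) / N))      # True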
