F & B Approaches to a simple model

Marianna Owens
5 years ago

1 A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics
Spring
Lecture 11

Applications: Model comparison
Challenges in large-scale surveys

Reading: Gregory chapters 6, 7; Hancock article

Assignment 3: on web site tomorrow
- analyze F test for M1, M2
- analyze Bayesian odds ratio for M1, M2

F & B Approaches to a simple model

M1 = constant: $y = a$
M2 = line: $y = a + b x$
M1 is a subset of M2 ($b = 0$)

Questions:
- How do we assess that M2 is a better model than M1? That is, when is the added complexity of M2 needed?

Evaluate $\chi^2$ for each model:
- Frequentist: F test with $F_{12} = \chi_1^2 / \chi_2^2$, then test its significance
- Bayesian: odds ratio with Bayes factor
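The frequentist half of this comparison can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: it assumes equal, known errors `sigma` on every point, takes $F_{12}$ as the ratio of reduced $\chi^2$ values (as in the synopsis), and uses made-up simulation settings (`n_points`, `a_true`, `b_true`, `sigma`). The significance test on $F_{12}$ is not included.

```python
import random

def fit_constant(y, sigma):
    """M1: y = a. Returns (a, chi2) for equal, known errors sigma."""
    a = sum(y) / len(y)
    chi2 = sum(((yi - a) / sigma) ** 2 for yi in y)
    return a, chi2

def fit_line(x, y, sigma):
    """M2: y = a + b x. Returns (a, b, chi2) for equal, known errors sigma."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    chi2 = sum(((yi - a - b * xi) / sigma) ** 2 for xi, yi in zip(x, y))
    return a, b, chi2

# Hypothetical simulation settings (assumptions, not values from the slides).
rng = random.Random(0)
n_points, a_true, b_true, sigma = 100, 0.5, 0.01, 0.3
x = [float(i) for i in range(n_points)]
y = [a_true + b_true * xi + rng.gauss(0.0, sigma) for xi in x]

a_c, chi2_c = fit_constant(y, sigma)
a_l, b_l, chi2_l = fit_line(x, y, sigma)

# Reduced chi2: divide by degrees of freedom (N minus number of fitted parameters).
red_c = chi2_c / (n_points - 1)
red_l = chi2_l / (n_points - 2)
f12 = red_c / red_l

print(f"constant fit: a = {a_c:.3f}, chi2 = {chi2_c:.1f}, reduced = {red_c:.2f}")
print(f"line fit:     a = {a_l:.3f}, b = {b_l:.4f}, chi2 = {chi2_l:.1f}, reduced = {red_l:.2f}")
print(f"F12 = {f12:.2f}")
```

Because M1 is nested in M2, the line fit can never have a larger $\chi^2$ than the constant fit on the same data; the question the F test addresses is whether the reduction is larger than noise alone would produce.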

2 Synopsis

Linear model and additive noise n
- Weighted least squares
- Parameter vector estimate (point-wise)
- Likelihood function $\Leftrightarrow$ $\chi^2$ surface
- Covariance matrix of noise, $C_n$: calculable from the autocorrelation function (same information)
- Covariance matrix of parameters
- Assumptions: $C_n$ and the PDF for n = noise are known
- Parameter vector and $\chi^2$ are random variables
- $F_{12}$ = ratio of reduced $\chi^2$ for comparing models

Example data sets
For different numbers of data points (N) and signal-to-noise ratios (SNR):
- Time series, fit, and residuals
- $\chi^2$ and likelihood surfaces for M2
- Histograms of residuals for M1 and M2
- Histograms of parameter values for M1 and M2
- Histogram of $F_{12}$

Note how these quantities vary with N and SNR.
- Why do the histograms have the shapes they have?
- Difference between best-fit and true parameter values
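The statement that the fitted parameters are random variables with a calculable covariance matrix can be checked numerically. The sketch below is an illustration under simplifying assumptions: the noise covariance is taken to be diagonal with equal variance, $C_n = \sigma^2 I$, so the parameter covariance for the line fit reduces to $\sigma^2 (X^T X)^{-1}$, and all numerical settings (`n_points`, `a_true`, `b_true`, `sigma`, `trials`) are hypothetical. It compares the analytic variance of the slope with the scatter of fitted slopes over many simulated data sets.

```python
import random

def line_param_covariance(x, sigma):
    """Covariance of (a, b) for y = a + b x with white noise of rms sigma:
    C_theta = sigma^2 (X^T X)^{-1}, where the rows of X are [1, x_i]."""
    n, sx = len(x), sum(x)
    sxx = sum(xi * xi for xi in x)
    det = n * sxx - sx * sx
    s2 = sigma ** 2
    return [[s2 * sxx / det, -s2 * sx / det],
            [-s2 * sx / det, s2 * n / det]]

def fit_line(x, y):
    """Solve the normal equations of the equal-error least-squares line fit."""
    n, sx, sy = len(x), sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx
    a = (sxx * sy - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b

# Hypothetical settings (assumptions, not values from the slides).
rng = random.Random(1)
n_points, a_true, b_true, sigma, trials = 50, 0.5, 0.01, 0.2, 2000
x = [float(i) for i in range(n_points)]

a_fits, b_fits = [], []
for _ in range(trials):
    y = [a_true + b_true * xi + rng.gauss(0.0, sigma) for xi in x]
    a_hat, b_hat = fit_line(x, y)
    a_fits.append(a_hat)
    b_fits.append(b_hat)

mean_a = sum(a_fits) / trials
mean_b = sum(b_fits) / trials
var_b_mc = sum((b - mean_b) ** 2 for b in b_fits) / (trials - 1)
cov = line_param_covariance(x, sigma)

print(f"<a> = {mean_a:.4f}, <b> = {mean_b:.5f}")
print(f"Var(b): Monte Carlo = {var_b_mc:.3e}, analytic = {cov[1][1]:.3e}")
```

A histogram of `a_fits` or `b_fits` is the kind of parameter histogram listed above; its width is what the diagonal of the covariance matrix predicts.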

3 [Figure: Data and fit versus x (data, line fit, constant fit) with residuals, annotated with N, S/N, $\chi^2_l$, $\chi^2_c$ and the reduced values $\chi^2_{rl}$, $\chi^2_{rc}$; histogram of residuals (Counts versus Residual).]

4 [Figure: Histograms (Counts) of the fitted parameters $a_c$ (constant fit), $a_l$ (line fit) and $b_l$ (line fit), annotated with N, SNR and the mean values $\langle a_c \rangle$, $\langle a_l \rangle$, $\langle b_l \rangle$; $\chi^2(\theta)$ surface in the (a, b) plane.]

5 [Figure: $\log \mathcal{L}(\theta)$ surface in the (a, b) plane; histogram of $F_{12}$, (constant fit):(line fit), annotated with N, SNR and $\langle F_{12} \rangle$.]

6 [Figure: Data and fit versus x (data, line fit, constant fit) with residuals, annotated with N, S/N, $\chi^2_l$, $\chi^2_c$ and the reduced values $\chi^2_{rl}$, $\chi^2_{rc}$; histogram of residuals (Counts versus Residual).]

7 [Figure: Histograms (Counts) of the fitted parameters $a_c$ (constant fit), $a_l$ (line fit) and $b_l$ (line fit); histograms of $\chi^2_c$ (constant fit) and $\chi^2_l$ (line fit) and of the reduced values $\chi^2_{rc}$, $\chi^2_{rl}$, each annotated with N, SNR and the mean value.]

8 [Figure: $\log \mathcal{L}(\theta)$ surface in the (a, b) plane; data and fit versus x (data, line fit, constant fit) with residuals, annotated with N, S/N, $\chi^2_l$, $\chi^2_c$ and the reduced values $\chi^2_{rl}$, $\chi^2_{rc}$.]

9 [Figure: Histogram of residuals (Counts versus Residual); histograms (Counts) of the fitted parameters $a_c$ (constant fit), $a_l$ (line fit) and $b_l$ (line fit), annotated with N, SNR and the mean values.]

10 [Figure: Histograms of $\chi^2_c$ (constant fit) and $\chi^2_l$ (line fit) and of the reduced values $\chi^2_{rc}$, $\chi^2_{rl}$, each annotated with N, SNR and the mean value; $\log \mathcal{L}(\theta)$ surface in the (a, b) plane.]

11 [Figure: Data and fit versus x (data, line fit, constant fit) with residuals, annotated with N, S/N, $\chi^2_l$, $\chi^2_c$ and the reduced values $\chi^2_{rl}$, $\chi^2_{rc}$; histogram of residuals (Counts versus Residual).]

12 [Figure: Histograms (Counts) of the fitted parameters $a_c$ (constant fit), $a_l$ (line fit) and $b_l$ (line fit), annotated with N, SNR and the mean values; $\chi^2(\theta)$ surface in the (a, b) plane.]

13 [Figure: $\log \mathcal{L}(\theta)$ surface in the (a, b) plane; histograms of $F_{12}$, (constant fit):(line fit), for several combinations of N and SNR, each annotated with $\langle F_{12} \rangle$.]

14 Modeling

Suppose we have the following:
- $y$ = data vector
- $x$ = independent variable
- $\hat{y}(\theta)$ = model for the data with parameters $\theta$
- $f_y(y; \theta)$ = multivariate PDF for the data
- $\hat\theta$ = vector of parameters that yield the best fit of the model to the data according to some criterion, or
- $\hat\theta$ = parameters of a probability density function for the data

Suppose we have a model for the data that consists of parameters $\theta$. These might be parameters of a time series or parameters of a PDF for the data.

Criteria for Modeling

15 (1) Least squares: minimize with respect to $\theta$:

$$Q(\theta) = \sum_i [y_i - \hat{y}_i(\theta)]^2$$

(2) Maximum likelihood: For the data $\{y_i, i = 1, N\}$, suppose the joint PDF is modeled as, or known to be, $f_y(y; \theta)$. After obtaining the N data points, we view the PDF as the likelihood of getting those actual data points given a choice of the parameters $\theta$, i.e.

$$\mathcal{L}(\theta) = f_y(y; \theta),$$

and those values of $\theta$ that maximize $\mathcal{L}(\theta)$ are the maximum likelihood estimators for $\theta$.

Measures of Optimality

For a given parameter and data set, there may be several or many possible estimators. We need quantitative guidelines on how to choose the best one. The criteria often used include:

Consistency (convergence): Let n = sample size. Then

$$\lim_{n \to \infty} (\hat\theta - \theta) = 0$$

Unbiased: $\langle \hat\theta \rangle = \theta$, or, in terms of the bias $B = \langle \hat\theta \rangle - \theta$, $B = 0$. Consistency does not imply unbiasedness, because for finite n the estimator can be biased even though the bias $\to 0$ as $n \to \infty$ (e.g. maximum likelihood estimators).

Minimum variance: $\mathrm{Var}(\hat\theta) = \langle \hat\theta^2 \rangle - \langle \hat\theta \rangle^2$. If $\mathrm{Var}(\hat\theta)$ is minimized, the resultant estimator yields the least variation of estimates (note, however, that the MV estimator may be biased).
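The remark that a consistent estimator can still be biased at finite n is easy to see with the maximum likelihood variance estimator for Gaussian data, whose expectation is $(n-1)\sigma^2/n$. The following is a minimal simulation sketch; the sample sizes, number of trials and seed are arbitrary choices made for illustration.

```python
import random

def ml_variance(sample):
    """Maximum likelihood variance estimate for Gaussian data (divides by n)."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((s - mean) ** 2 for s in sample) / n

def mean_estimate(n, trials, rng, sigma=1.0):
    """Average of the ML variance estimate over many simulated samples of size n."""
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        total += ml_variance(sample)
    return total / trials

rng = random.Random(42)
true_var = 1.0

# Estimated bias B = <estimate> - true value, for a small and a larger sample size.
bias_small = mean_estimate(5, 5000, rng) - true_var    # theory: -sigma^2/5
bias_large = mean_estimate(50, 5000, rng) - true_var   # theory: -sigma^2/50

print(f"n = 5:  estimated bias = {bias_small:+.3f}")
print(f"n = 50: estimated bias = {bias_large:+.3f}")
```

The bias is negative at both sample sizes and shrinks roughly as 1/n, which is the sense in which the estimator is biased for finite n yet consistent.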

16 Mean-square error:

$$\mathrm{MSE} = \langle (\hat\theta - \theta)^2 \rangle = \langle [\hat\theta - \langle \hat\theta \rangle + \langle \hat\theta \rangle - \theta]^2 \rangle = \mathrm{Var}(\hat\theta) + B^2$$

Efficiency: A well-designed experiment yields data that are all used in an estimate. An inefficient experiment yields superfluous data. Thus, experiments should be designed vis-à-vis an estimator. As an example, an estimator $\hat\theta$ is said to be mean-square efficient if no other estimator has a smaller cost function:

$$\langle (\hat\theta - \theta)^2 \rangle < \langle (\tilde\theta - \theta)^2 \rangle$$

where $\tilde\theta$ is any other estimator of $\theta$.

Sufficiency: If $\hat\theta$ is a sufficient statistic, it contains all the information obtainable from a sample on $\theta$. Formally, consider an estimator $\hat\theta$ and another, $\tilde\theta$, and the conditional distribution

$$F(\tilde\theta \mid \hat\theta) \equiv P\{\tilde\theta < \text{some number} \mid \hat\theta\}$$

If $\tilde\theta$ is not a function of $\hat\theta$ and if $F(\tilde\theta \mid \hat\theta)$ is independent of $\theta$ (the actual value), then $\hat\theta$ is a sufficient statistic.

An example is more transparent. Suppose you have N data points, $\{x_n, n = 1, N\}$, that you know are distributed as $N(\mu, \sigma)$. You calculate the sample mean and variance,

$$\hat\mu = N^{-1} \sum_{n=1}^{N} x_n, \qquad \hat\sigma^2 = \frac{1}{N} \sum_{n=1}^{N} (x_n - \hat\mu)^2.$$

17 In calculating the likelihood function $\mathcal{L}(\mu, \sigma)$ for $\mu$ and $\sigma$, it can be shown that $\mathcal{L}$ depends only on $\hat\mu$ and $\hat\sigma^2$. Thus $\hat\mu$ and $\hat\sigma^2$ contain all the information needed for likelihood inference and are therefore sufficient statistics.
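The step "it can be shown" can be written out briefly. The derivation below assumes the N samples are independent (an assumption not stated explicitly on the slide) and uses the definitions $\hat\mu = N^{-1}\sum_n x_n$ and $\hat\sigma^2 = N^{-1}\sum_n (x_n - \hat\mu)^2$ from the example.

```latex
\begin{align}
\mathcal{L}(\mu, \sigma)
  &= \prod_{n=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}}
     \exp\!\left[-\frac{(x_n - \mu)^2}{2\sigma^2}\right]
   = (2\pi\sigma^2)^{-N/2}
     \exp\!\left[-\frac{1}{2\sigma^2}\sum_{n=1}^{N}(x_n - \mu)^2\right], \\
\sum_{n=1}^{N}(x_n - \mu)^2
  &= \sum_{n=1}^{N}(x_n - \hat\mu)^2
     + 2(\hat\mu - \mu)\sum_{n=1}^{N}(x_n - \hat\mu)
     + N(\hat\mu - \mu)^2
   = N\left[\hat\sigma^2 + (\hat\mu - \mu)^2\right], \\
\mathcal{L}(\mu, \sigma)
  &= (2\pi\sigma^2)^{-N/2}
     \exp\!\left\{-\frac{N}{2\sigma^2}\left[\hat\sigma^2 + (\hat\mu - \mu)^2\right]\right\}.
\end{align}
```

The cross term vanishes because $\sum_n (x_n - \hat\mu) = 0$ by the definition of $\hat\mu$, so the data enter the likelihood only through $\hat\mu$, $\hat\sigma^2$ and N.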

Frequentist-Bayesian Model Comparisons: A Simple Example

Consider data that consist of a signal y with additive noise:

Data vector (N elements): $D = y + n$

The additive noise n has zero mean and diagonal