Parameter Estimation and Fitting to Data
Parameter estimation | Maximum likelihood | Least squares | Goodness-of-fit | Examples
Elton S. Smith, Jefferson Lab
Parameter estimation
Properties of estimators
An estimator for the mean
An estimator for the variance
The likelihood function via example
We have a data set given by N data pairs (x_i, y_i ± σ_i), graphically represented below. The goal is to determine the fixed, but unknown, µ = f(x). σ_i is known or estimated from the data.
Gaussian probabilities (least squares)
We assume that at a fixed value of x_i we have made a measurement y_i, and that the measurement was drawn from a Gaussian probability distribution with mean y(x_i) = a + b x_i and variance σ_i²:

f(y_i; a, b) = \frac{1}{\sqrt{2\pi}\,\sigma_i}\, e^{-(y_i - y(x_i))^2 / 2\sigma_i^2}

L = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma_i}\, e^{-(y_i - a - b x_i)^2 / 2\sigma_i^2}

\chi^2 = -2\ln L + k = \sum_{i=1}^{N} \frac{(y_i - a - b x_i)^2}{\sigma_i^2}
χ² minimization
Setting the derivatives of χ² with respect to a and b to zero,

\frac{\partial\chi^2}{\partial a} = -2\sum_{i=1}^{N} \frac{y_i - a - b x_i}{\sigma_i^2} = 0
\qquad
\frac{\partial\chi^2}{\partial b} = -2\sum_{i=1}^{N} \frac{x_i\,(y_i - a - b x_i)}{\sigma_i^2} = 0

gives the two normal equations

a \sum \frac{1}{\sigma_i^2} + b \sum \frac{x_i}{\sigma_i^2} = \sum \frac{y_i}{\sigma_i^2}
\qquad
a \sum \frac{x_i}{\sigma_i^2} + b \sum \frac{x_i^2}{\sigma_i^2} = \sum \frac{x_i y_i}{\sigma_i^2}
Solution for linear fit
For simplicity, assume constant σ = σ_i. Then solve the two simultaneous equations for the unknowns:

a = \frac{\sum y_i \sum x_i^2 - \sum x_i \sum x_i y_i}{N\sum x_i^2 - \left(\sum x_i\right)^2}
\qquad
b = \frac{N\sum x_i y_i - \sum x_i \sum y_i}{N\sum x_i^2 - \left(\sum x_i\right)^2}

Parameter uncertainties can be estimated from the curvature of the χ² function,

V^{-1}_{jk} = \frac{1}{2}\left.\frac{\partial^2\chi^2}{\partial\theta_j\,\partial\theta_k}\right|_{\theta=\hat\theta},

which gives

V[a] = \frac{\sigma^2 \sum x_i^2}{N\sum x_i^2 - \left(\sum x_i\right)^2}
\qquad
V[b] = \frac{N\sigma^2}{N\sum x_i^2 - \left(\sum x_i\right)^2}
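As a concrete check of these closed-form expressions, a minimal sketch in Python (the data points and the constant σ below are invented for illustration):

```python
import numpy as np

# Hypothetical measurements with a constant error sigma on each point
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([5.2, 5.9, 7.1, 8.0, 8.9])
sigma = 0.5
N = len(x)

# Closed-form solution of the two normal equations (constant sigma)
D = N * np.sum(x**2) - np.sum(x) ** 2
a = (np.sum(y) * np.sum(x**2) - np.sum(x) * np.sum(x * y)) / D
b = (N * np.sum(x * y) - np.sum(x) * np.sum(y)) / D

# Parameter variances from the curvature of chi^2
V_a = sigma**2 * np.sum(x**2) / D
V_b = N * sigma**2 / D
```

The result agrees with a library fit such as `np.polyfit(x, y, 1)`, which solves the same normal equations.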
Parameter uncertainties
In a graphical method the uncertainty in the parameter estimator θ₀ is obtained by changing χ² by one unit:

\chi^2(\theta_0 \pm \sigma_\theta) = \chi^2(\theta_0) + 1

In general, using maximum likelihood,

\ln L(\theta_0 \pm \sigma_\theta) = \ln L(\theta_0) - 1/2
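The Δχ² = 1 rule can be verified in a one-parameter case. A sketch with invented data, fitting the constant model y(x) = a, for which χ²(a) is exactly quadratic:

```python
import numpy as np

# Hypothetical data with a known constant error sigma; fit y(x) = a
y = np.array([4.8, 5.3, 5.1, 4.9, 5.4])
sigma = 0.5

def chi2(a):
    return np.sum((y - a) ** 2 / sigma**2)

a_hat = y.mean()             # the chi^2 minimum for a constant fit
# chi2(a) is quadratic, so chi2(a_hat +/- d) = chi2(a_hat) + 1 is solved by
d = sigma / np.sqrt(y.size)  # the familiar error on the mean
```

Evaluating `chi2(a_hat + d)` indeed gives `chi2(a_hat) + 1`, so the graphical Δχ² = 1 scan recovers σ/√N for this model.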
The likelihood function via example
What does the fit look like? ROOT fit to pol1. Additional information about the fit: χ² and probability. Were the assumptions for the fit valid? This question is addressed by exploring the significance and the goodness of the fit.
Testing significance / goodness of fit
Quantify the level of agreement between the data and a hypothesis without explicit reference to alternative hypotheses. This is done by defining a goodness-of-fit statistic, and the goodness-of-fit is quantified using the p-value. For the case when χ² is the goodness-of-fit statistic, the p-value is given by

p = \int_{\chi^2_{\rm obs}}^{\infty} f(\chi^2; n_d)\, d\chi^2

The p-value is a function of the observed value χ²_obs and is therefore itself a random variable. If the hypothesis used to compute the p-value is true, then p will be uniformly distributed between zero and one.
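This integral is the survival function of the χ² distribution, available directly in SciPy. A sketch with a hypothetical fit result (the χ² value and degrees of freedom are invented):

```python
from scipy.stats import chi2

# p-value for an observed chi^2 with n_d degrees of freedom:
# the integral of f(chi^2; n_d) from chi2_obs to infinity
chi2_obs, n_d = 12.3, 8        # hypothetical fit result
p = chi2.sf(chi2_obs, n_d)     # survival function = 1 - CDF
```

A larger observed χ² always gives a smaller p-value, and χ²_obs = 0 gives p = 1.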
χ² distribution
[Figure: the χ² distribution; Gaussian-like for large n_d.]
p-value for the χ² distribution
[Figure]
Using the goodness-of-fit
Data generated using Y(x) = 6.0 + 0. x, σ = 0.5. Compare three different polynomial fits to the same data:

y(x) = p_0 + p_1 x
y(x) = p_0
y(x) = p_0 + p_1 x + p_2 x^2
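The comparison can be reproduced with a toy data set. A sketch in which the parent line, slope, and seed are stand-ins chosen for illustration (not the slide's generated data):

```python
import numpy as np
from scipy.stats import chi2 as chi2_dist

rng = np.random.default_rng(1)
# Hypothetical straight-line data with constant sigma = 0.5; the slope
# here is an arbitrary stand-in, not the value used on the slide
x = np.linspace(0.0, 9.0, 10)
sigma = 0.5
y = 6.0 + 0.5 * x + rng.normal(0.0, sigma, x.size)

results = {}
for deg in (0, 1, 2):          # pol0, pol1, pol2
    coeffs = np.polyfit(x, y, deg)
    resid = y - np.polyval(coeffs, x)
    chi2_val = float(np.sum(resid**2 / sigma**2))
    ndf = x.size - (deg + 1)   # points minus fitted parameters
    results[deg] = (chi2_val, ndf, chi2_dist.sf(chi2_val, ndf))
    print(f"pol{deg}: chi2/ndf = {chi2_val:.1f}/{ndf}, p = {results[deg][2]:.3f}")
```

Because each higher-order polynomial nests the lower one, χ² can only decrease with degree; the p-value shows whether the decrease is worth the lost degree of freedom.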
χ²/DOF vs degrees of freedom
[Figure]
More about the likelihood method
Recall the likelihood for least squares:

L = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma_i}\, e^{-(y_i - a - b x_i)^2 / 2\sigma_i^2}

But the probability density depends on the application:

L = \prod_{i=1}^{N} f(y_i; \text{parameters})

Proceed as before, maximizing ln L (χ² has a minus sign). The values of the estimated parameters might not be very different, but the uncertainties can be greatly affected.
Applications

L = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma_i}\, e^{-(y_i - a - b x_i)^2 / 2\sigma_i^2}

a) σ_i = constant    b) σ_i² = y_i    c) σ_i² = Y(x_i)

Poisson distribution (see PDG Eq. 3.1):

L = \prod_{i=1}^{N} \frac{(a + b x_i)^{n_i}\, e^{-(a + b x_i)}}{n_i!}

Using Stirling's approximation, ln(n!) ≈ n ln n − n,

-2\ln L = 2\sum_{i=1}^{N} \left[(a + b x_i) - n_i + n_i \ln\frac{n_i}{a + b x_i}\right]
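Minimizing this Poisson −2 ln L numerically can be sketched as follows; the counting data, parameter values, and seed are invented for illustration:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
# Hypothetical counting experiment: n_i drawn from a Poisson distribution
# with mean a_true + b_true * x_i (values chosen for illustration)
x = np.arange(10, dtype=float)
a_true, b_true = 20.0, 3.0
n = rng.poisson(a_true + b_true * x)

def nll(params):
    """-2 ln L for Poisson counts, dropping terms independent of (a, b)."""
    a, b = params
    mu = np.clip(a + b * x, 1e-9, None)   # guard against nonphysical mu <= 0
    # n_i ln(n_i/mu_i) is taken as 0 when n_i = 0
    term = np.where(n > 0, n * np.log(np.where(n > 0, n, 1) / mu), 0.0)
    return 2.0 * np.sum(mu - n + term)

res = minimize(nll, x0=[10.0, 1.0], method="Nelder-Mead")
a_fit, b_fit = res.x
```

Unlike the Gaussian case there is no closed-form solution, but the fitted slope and intercept land close to the generating values, with uncertainties obtainable from the Δ(−2 ln L) = 1 rule.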
Exercise 3 - Linear Fits
Assume a parent distribution of the form y(x) = a + bx, with a = 5, b = 1. Assume one experiment collects a data set of ten points of the form (x_i, y_i ± σ), i = 0, 1, ..., 9, with the measurements y_i following a Gaussian distribution with a fixed width σ = 0.5.
Invent the data points y_i for one experiment.
Fit the data y_i to the form y = a + bx.
Determine y and the uncertainty of y as a function of x from the fit.
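One way to carry out the exercise (the slide uses ROOT; this is an equivalent sketch in Python, with an arbitrary seed):

```python
import numpy as np

rng = np.random.default_rng(0)
# One simulated experiment: parent y(x) = a + b x with a = 5, b = 1,
# and Gaussian measurement errors of fixed width sigma = 0.5
a_true, b_true, sigma = 5.0, 1.0, 0.5
x = np.arange(10, dtype=float)
y = a_true + b_true * x + rng.normal(0.0, sigma, x.size)

# Weighted least-squares fit; cov="unscaled" keeps the covariance in the
# units implied by the weights w_i = 1/sigma
coeffs, cov = np.polyfit(x, y, 1, w=np.full(x.size, 1.0 / sigma), cov="unscaled")
b_fit, a_fit = coeffs          # polyfit returns the highest power first
```

The returned 2x2 covariance matrix supplies σ_a², σ_b², and σ_ab, which are what the later slides use to propagate the uncertainty on y(x).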
Linear Fit - one experiment
Fit for one experiment showing the fitted parameters. The uncertainty σ_y on y can be computed using

\sigma_y^2 = \sigma_a^2 + x^2\sigma_b^2 + 2x\,\sigma_{ab}
Linear Fit
[Figure]
Linear Fit Covariance Matrix
Fitted Results for 1000 experiments
For each fit, plot the fitted value of the intercept and slope. Fit the distributions to Gaussian functions:

Intercept: Mean = 5.004 ± 0.010, Sigma = 0.303
Slope: Mean = 0.9997 ± 0.0018, Sigma = 0.0566

What is the relation between these two?
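The ensemble of experiments can be reproduced with a short toy Monte Carlo; the seed is arbitrary, and the expected spreads quoted in the comment follow from the analytic V[a], V[b] formulas for this x range:

```python
import numpy as np

rng = np.random.default_rng(3)
a_true, b_true, sigma = 5.0, 1.0, 0.5
x = np.arange(10, dtype=float)

# Repeat the toy experiment many times and collect the fitted parameters
a_fits, b_fits = [], []
for _ in range(1000):
    y = a_true + b_true * x + rng.normal(0.0, sigma, x.size)
    b_fit, a_fit = np.polyfit(x, y, 1)
    a_fits.append(a_fit)
    b_fits.append(b_fit)

# The spread of the fitted values over many experiments should match the
# per-fit uncertainty from the covariance matrix (about 0.29 and 0.055 here)
print(np.std(a_fits), np.std(b_fits))
```

This is the relation the slide asks about: the Gaussian width of the ensemble of fitted values reproduces the single-fit parameter uncertainty.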
Plot difference between fitted and true values
Fit a Gaussian to slices of y(x) − y_fit. The uncertainty σ_y on y can be computed using

\sigma_y^2 = \sigma_a^2 + x^2\sigma_b^2 + 2x\,\sigma_{ab}

Comparing with σ_ab = 0 shows that the correlation term is important.
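The effect of the correlation term can be seen numerically. A sketch in which the variances and covariance are hypothetical, but of the size a 10-point fit with σ = 0.5 would give (note that σ_ab is negative for x > 0):

```python
import numpy as np

# Propagate fit-parameter uncertainties to the fitted line y(x) = a + b x;
# hypothetical sigma_a^2, sigma_b^2, and sigma_ab for illustration
var_a, var_b, cov_ab = 0.086, 0.003, -0.0136
x = np.linspace(0.0, 9.0, 10)

var_y_full = var_a + x**2 * var_b + 2.0 * x * cov_ab   # with correlation term
var_y_nocov = var_a + x**2 * var_b                     # sigma_ab set to 0
```

Dropping σ_ab overstates the band everywhere except x = 0, because the negative covariance partly cancels the intercept and slope contributions.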
Statistical tests
In addition to estimating parameters, one often wants to assess the validity of statements concerning the underlying distribution.
Hypothesis tests provide a rule for accepting or rejecting hypotheses depending on the outcome of an experiment. [Comparison of H_0 vs H_1]
In goodness-of-fit tests one gives the probability to obtain a level of incompatibility with a certain hypothesis that is greater than or equal to the level observed in the actual data. [How good is my assumed H_0?]
Formulation of the relevant question is critical to deciding how to answer it.
Summary of second lecture
Parameter estimation: illustrated the method of maximum likelihood using the least-squares assumption.
Use of goodness-of-fit statistics to determine the validity of underlying assumptions used to determine parent parameters.