Statistics, Data Analysis, and Simulation SS 2015


1 Statistics, Data Analysis, and Simulation SS 2015

08.128.730 Statistik, Datenanalyse und Simulation
Dr. Michael O. Distler <distler@uni-mainz.de>
Mainz, June 2, 2015

2 Least squares method

History: Introduced by Legendre, Gauß, and Laplace at the beginning of the 19th century. Therefore the least squares method is older than the more general maximum likelihood method.

From now on the measured values which have the property of random variables (data) will be labeled $y_i$. $n$ measurements of $x$ will give $y_1, y_2, \ldots, y_n$:

$$y_i = x + \epsilon_i$$

The $\epsilon_i$ are the deviations $y_i - x$ (statistical error).
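As a minimal illustration (a sketch, not part of the lecture), the least-squares estimate of $x$ in this simple model is just the sample mean, since $S(x) = \sum_i (y_i - x)^2$ is minimized by $\hat{x} = \bar{y}$. A short numpy example with hypothetical simulated data:

```python
import numpy as np

rng = np.random.default_rng(42)
x_true = 5.0                                   # hypothetical true value
y = x_true + rng.normal(0.0, 0.3, size=100)    # n measurements y_i = x + eps_i

# S(x) = sum_i (y_i - x)^2 is minimized by the sample mean
x_hat = y.mean()
print(x_hat, np.sum((y - x_hat) ** 2))
```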

3 Least squares method

The measured values deviate from the true values. The size of this difference is parameterized by the standard deviation $\sigma$. The $y_i$ are a statistical sample which can be described by a probability distribution function.

We also need a functional description (model) for the true values. This model may depend on additional variables $a_j$, called parameters. These parameters cannot be measured directly. The model is given as one or more equations:

$$f(a_1, a_2, \ldots, a_p, y_1, y_2, \ldots, y_n) = 0$$

4 Least squares method

The model can be used to find corrections $\Delta y_i$ to the measured values $y_i$. The method of least squares requires the sum of squares of the residuals $\Delta y_i$ to be minimal. For the simplest case of uncorrelated data with equal standard deviation this means:

$$S = \sum_{i=1}^{n} \Delta y_i^2 = \text{minimum}$$

⇒ indirect measurement of the parameters.

5 Least squares method

The least squares method has a number of optimal statistical properties and often leads to easy solutions. Other rules are possible, but generally lead to complicated solutions, e.g.

$$\sum_{i=1}^{n} |\Delta y_i| = \text{minimum} \qquad \text{or} \qquad \max_i |\Delta y_i| = \text{minimum}$$

6 Least squares method

General case: The data is written as an $n$-vector $y$. Different standard deviations and correlations are taken care of by use of a covariance matrix $V$. Least squares method using matrices:

$$S = \Delta y^T V^{-1} \Delta y$$

Here $\Delta y$ is the vector of residuals.

7 Least squares method

Example: In wine-growing the amount of wine harvested in autumn is measured in tons per 100 m² (t/ar). It is known that the annual yield can be predicted fairly well in July, by determining the average number of berries which have been formed per bunch.

[Table: year, yield $y_i$ / (t/ar), cluster count $x_i$ — the numeric values were lost in transcription.]
[Figure: scatter plot of yield $y$ / (t/ar) versus cluster count $x$.]

8 Least squares method

Straight line fit $f(x) = a + b\,x$ using gnuplot. The numeric fit results were lost in transcription; the structure of the gnuplot output was:

degrees of freedom (FIT_NDF)                          : 10
rms of residuals (FIT_STDFIT) = sqrt(WSSR/ndf)        : ...
variance of residuals (reduced chisquare) = WSSR/ndf  : ...

Final set of parameters            Asymptotic Standard Error
=======================            ==========================
a               = ... +/- ...      (76.23%)
b               = ... +/- ...      (14.11%)

correlation matrix of the fit parameters:
                a      b
a               ...
b               ...    ...
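A hedged Python equivalent of this gnuplot fit. Since the slide's actual data values were lost in transcription, the sketch below uses synthetic stand-in data; np.polyfit with cov=True plays the role of gnuplot's fit command:

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic stand-in data: 12 points reproduce FIT_NDF = 12 - 2 = 10.
x = rng.uniform(50.0, 500.0, size=12)                  # cluster count
y = 0.5 + 0.005 * x + rng.normal(0.0, 0.2, size=12)    # yield / (t/ar)

# Straight-line fit f(x) = a + b*x; polyfit returns coefficients
# from the highest power down, i.e. (b, a) for degree 1.
(b, a), V = np.polyfit(x, y, deg=1, cov=True)
print(f"a = {a:.4f} +/- {np.sqrt(V[1, 1]):.4f}")
print(f"b = {b:.5f} +/- {np.sqrt(V[0, 0]):.5f}")

ndf = len(x) - 2                                       # degrees of freedom
r = y - (a + b * x)
print("rms of residuals:", np.sqrt(np.sum(r ** 2) / ndf))
```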

9 Parameter estimation

Estimation of the parameters $a$ from measured data using a linear model. The parameter vector $a$ consists of $p$ elements $a_1, a_2, \ldots, a_p$. The measured values form the vector $y$ ($n$ random variables $y_1, y_2, \ldots, y_n$). The estimated value of $y$ is a function of the variable $x$:

$$y(x) = f(x, a) = a_1 f_1(x) + a_2 f_2(x) + \cdots + a_p f_p(x)$$

The expectation value of each single measurement $y_i$ is

$$E[y_i] = f(x_i, \bar{a}) = \bar{y}_i$$

Here the elements of $\bar{a}$ are the true values of the parameters $a$.

10 Parameter estimation

The residuals have nice properties if $a = \bar{a}$:

$$r_i = y_i - f(x_i, a) \qquad E[r_i] = 0 \qquad E[r_i^2] = V[r_i] = \sigma_i^2$$

The only requirements are that the p.d.f. of the residuals is unbiased and has finite variance. So it is not required that the residuals are Gaussian distributed.

11 Least squares: Normal equations

For now, all data has the same variance and is uncorrelated. Following the principle of least squares we minimize the sum of squares of the residuals, varying the parameters $a_1, a_2, \ldots, a_p$:

$$S = \sum_{i=1}^{n} r_i^2 = \sum_{i=1}^{n} \left( y_i - a_1 f_1(x_i) - a_2 f_2(x_i) - \cdots - a_p f_p(x_i) \right)^2$$

Conditions for the minimum:

$$\frac{\partial S}{\partial a_1} = 2 \sum_{i=1}^{n} f_1(x_i) \left( a_1 f_1(x_i) + a_2 f_2(x_i) + \cdots + a_p f_p(x_i) - y_i \right) = 0$$
$$\vdots$$
$$\frac{\partial S}{\partial a_p} = 2 \sum_{i=1}^{n} f_p(x_i) \left( a_1 f_1(x_i) + a_2 f_2(x_i) + \cdots + a_p f_p(x_i) - y_i \right) = 0$$

12 Least squares: Normal equations

We then write down the conditions as normal equations:

$$\begin{aligned}
a_1 \textstyle\sum f_1(x_i)^2 + \cdots + a_p \textstyle\sum f_1(x_i) f_p(x_i) &= \textstyle\sum y_i f_1(x_i) \\
a_1 \textstyle\sum f_2(x_i) f_1(x_i) + \cdots + a_p \textstyle\sum f_2(x_i) f_p(x_i) &= \textstyle\sum y_i f_2(x_i) \\
&\;\;\vdots \\
a_1 \textstyle\sum f_p(x_i) f_1(x_i) + \cdots + a_p \textstyle\sum f_p(x_i)^2 &= \textstyle\sum y_i f_p(x_i)
\end{aligned}$$

The solution of these normal equations is the least squares estimate of the parameters $a_1, a_2, \ldots, a_p$.

13 Least squares: Matrix formalism

Matrix formalism and matrix algebra simplify the computation. The $n \cdot p$ values $f_j(x_i)$ form an $n \times p$ matrix. The $p$ parameters $a_j$ and the $n$ measured values $y_i$ form column vectors:

$$A = \begin{pmatrix}
f_1(x_1) & f_2(x_1) & \cdots & f_p(x_1) \\
f_1(x_2) & f_2(x_2) & \cdots & f_p(x_2) \\
\vdots   &          &        & \vdots   \\
f_1(x_n) & f_2(x_n) & \cdots & f_p(x_n)
\end{pmatrix}
\qquad
a = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{pmatrix}
\qquad
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}$$

14 Least squares: Matrix formalism

The $n$-vector of the residuals is $r = y - Aa$. Considering the sum $S$ we get

$$S = r^T r = (y - Aa)^T (y - Aa) = y^T y - 2 a^T A^T y + a^T A^T A\, a$$

Condition for the minimum:

$$-2 A^T y + 2 A^T A\, \hat{a} = 0$$

or, written as normal equations,

$$(A^T A)\, \hat{a} = A^T y$$

The solution only requires standard methods of matrix algebra:

$$\hat{a} = (A^T A)^{-1} A^T y$$
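A sketch of this solution in numpy; the basis functions $f_1(x) = 1$, $f_2(x) = x$ and the data are hypothetical choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, size=20)   # hypothetical data

# Design matrix A with columns f_1(x) = 1 and f_2(x) = x
A = np.column_stack([np.ones_like(x), x])

# Solve the normal equations (A^T A) a-hat = A^T y
a_hat = np.linalg.solve(A.T @ A, A.T @ y)

# Numerically preferable in practice: a least-squares solver,
# which avoids explicitly forming A^T A
a_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)
print(a_hat, a_lstsq)                               # identical up to round-off
```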

15 Least squares: Matrix formalism

The covariance matrix is the quadratic $n \times n$ matrix

$$V[y] = \begin{pmatrix}
\operatorname{var}(y_1)      & \operatorname{cov}(y_1, y_2) & \cdots & \operatorname{cov}(y_1, y_n) \\
\operatorname{cov}(y_2, y_1) & \operatorname{var}(y_2)      & \cdots & \operatorname{cov}(y_2, y_n) \\
\vdots                       &                              & \ddots & \vdots                       \\
\operatorname{cov}(y_n, y_1) & \operatorname{cov}(y_n, y_2) & \cdots & \operatorname{var}(y_n)
\end{pmatrix}$$

Here the covariance matrix is just a diagonal matrix:

$$V[y] = \begin{pmatrix}
\sigma^2 &          &        &          \\
         & \sigma^2 &        &          \\
         &          & \ddots &          \\
         &          &        & \sigma^2
\end{pmatrix} = \sigma^2 I$$

16 Least squares: Matrix formalism

Since the parameters are a linear function of the data, $\hat{a} = B y$, we can use standard error propagation:

$$V[\hat{a}] = B\, V[y]\, B^T$$

With $B = (A^T A)^{-1} A^T$ we get

$$V[\hat{a}] = (A^T A)^{-1} A^T\, V[y]\, A\, (A^T A)^{-1}$$

Here we have equal errors for all data points, $V[y] = \sigma^2 I$, so

$$V[\hat{a}] = \sigma^2 (A^T A)^{-1}$$

17 Least squares: Matrix formalism

The sum $\hat{S}$ of squares of the residuals in the minimum is

$$\hat{S} = y^T y - 2 \hat{a}^T A^T y + \hat{a}^T A^T A (A^T A)^{-1} A^T y = y^T y - \hat{a}^T A^T y$$

Its expectation value is

$$E[\hat{S}] = \sigma^2 (n - p)$$

If the variance of the data is not known, we can use $\hat{S}$ to get an estimate

$$\hat{\sigma}^2 = \hat{S} / (n - p)$$

This is a good estimate for large values of $(n - p)$.
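Continuing the same hypothetical setup as above, a sketch of estimating $\sigma^2$ from $\hat{S}$ and propagating it into $V[\hat{a}]$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 2
x = np.linspace(0.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, size=n)    # hypothetical data
A = np.column_stack([np.ones_like(x), x])
a_hat = np.linalg.solve(A.T @ A, A.T @ y)

# S-hat = y^T y - a-hat^T A^T y, the residual sum of squares at the minimum
S_hat = y @ y - a_hat @ (A.T @ y)

# E[S-hat] = sigma^2 (n - p)  =>  estimate sigma^2 by S-hat / (n - p)
sigma2_hat = S_hat / (n - p)

# V[a-hat] = sigma^2 (A^T A)^{-1}, evaluated with the estimated variance
V_a = sigma2_hat * np.linalg.inv(A.T @ A)
print(sigma2_hat, np.sqrt(np.diag(V_a)))
```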

18 Least squares: Matrix formalism

After estimating the parameters using the linear least squares method we can calculate $f(x)$ for arbitrary $x$:

$$\hat{y}(x) = f(x, \hat{a}) = \sum_{j=1}^{p} \hat{a}_j f_j(x)$$

For the values $x_i$ which belong to the measured values $y_i$ we will get the predicted values using $\hat{y} = A \hat{a}$. Using error propagation one gets the associated covariance matrix

$$V[\hat{y}] = A\, V[\hat{a}]\, A^T = \sigma^2 A (A^T A)^{-1} A^T$$
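And a short continuation computing the predicted values and their covariance matrix (same hypothetical data as in the previous sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 2
x = np.linspace(0.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, size=n)    # hypothetical data
A = np.column_stack([np.ones_like(x), x])

a_hat = np.linalg.solve(A.T @ A, A.T @ y)
sigma2_hat = (y @ y - a_hat @ (A.T @ y)) / (n - p)

# Predicted values y-hat = A a-hat and V[y-hat] = sigma^2 A (A^T A)^{-1} A^T
y_hat = A @ a_hat
V_yhat = sigma2_hat * (A @ np.linalg.inv(A.T @ A) @ A.T)
print(y_hat[:3], np.sqrt(np.diag(V_yhat))[:3])      # first few points and errors
```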

19 Least squares: Matrix formalism

If the data points are independent, the covariance matrix is

$$V[y] = \begin{pmatrix}
\sigma_1^2 &            &        &            \\
           & \sigma_2^2 &        &            \\
           &            & \ddots &            \\
           &            &        & \sigma_n^2
\end{pmatrix}$$

The sum of squares of the residuals is now

$$S = \sum_i \frac{r_i^2}{\sigma_i^2} = \text{minimum}$$

We define a weight matrix $W(y)$ which is the inverse of the covariance matrix:

$$W(y) = V[y]^{-1} = \begin{pmatrix}
1/\sigma_1^2 &              &        &              \\
             & 1/\sigma_2^2 &        &              \\
             &              & \ddots &              \\
             &              &        & 1/\sigma_n^2
\end{pmatrix}$$

20 Least squares: Matrix formalism

The sum of squares of the weighted residuals,

$$S = r^T W(y)\, r = (y - Aa)^T W(y) (y - Aa),$$

has to be minimized. One gets

$$\hat{a} = (A^T W A)^{-1} A^T W y \qquad V[\hat{a}] = (A^T W A)^{-1}$$

The sum of squares of the residuals for $a = \hat{a}$ is

$$\hat{S} = y^T W y - \hat{a}^T A^T W y$$

with expectation value $E[\hat{S}] = n - p$. The covariance matrix for the predicted values is

$$V[\hat{y}] = A (A^T W A)^{-1} A^T$$
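A sketch of this weighted fit in numpy; the per-point errors $\sigma_i$ and the data are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 15
x = np.linspace(0.0, 1.0, n)
sigma = rng.uniform(0.05, 0.2, size=n)        # hypothetical per-point errors
y = 1.0 + 2.0 * x + rng.normal(0.0, sigma)

A = np.column_stack([np.ones_like(x), x])
W = np.diag(1.0 / sigma ** 2)                 # weight matrix W = V[y]^{-1}

# a-hat = (A^T W A)^{-1} A^T W y  and  V[a-hat] = (A^T W A)^{-1}
AtWA = A.T @ W @ A
a_hat = np.linalg.solve(AtWA, A.T @ W @ y)
V_a = np.linalg.inv(AtWA)

# S-hat = y^T W y - a-hat^T A^T W y, with expectation value n - p
S_hat = y @ W @ y - a_hat @ (A.T @ W @ y)
print(a_hat, np.sqrt(np.diag(V_a)), S_hat / (n - 2))
```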

21 Least squares: Linear regression

For the linear regression we fit the function $y = f(x, a) = a_1 + a_2 x$. The data $y_i$ has been taken at certain values of $x_i$:

$$A = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ 1 & x_3 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}
\qquad
a = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}
\qquad
y = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_n \end{pmatrix}$$

$$V = \begin{pmatrix}
\sigma_1^2 &            &        &            \\
           & \sigma_2^2 &        &            \\
           &            & \ddots &            \\
           &            &        & \sigma_n^2
\end{pmatrix}
\qquad
W = V^{-1}, \quad w_{ii} = \frac{1}{\sigma_i^2}$$

22 Least squares: Linear regression

Solution: with the abbreviations $S_1 = \sum w_i$, $S_x = \sum w_i x_i$, $S_{xx} = \sum w_i x_i^2$, $S_y = \sum w_i y_i$, $S_{xy} = \sum w_i x_i y_i$,

$$A^T W A = \begin{pmatrix} \sum w_i & \sum w_i x_i \\ \sum w_i x_i & \sum w_i x_i^2 \end{pmatrix} = \begin{pmatrix} S_1 & S_x \\ S_x & S_{xx} \end{pmatrix}
\qquad
A^T W y = \begin{pmatrix} \sum w_i y_i \\ \sum w_i x_i y_i \end{pmatrix} = \begin{pmatrix} S_y \\ S_{xy} \end{pmatrix}$$

The normal equations read

$$\begin{pmatrix} S_1 & S_x \\ S_x & S_{xx} \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} S_y \\ S_{xy} \end{pmatrix}$$

with $\hat{a} = (A^T W A)^{-1} A^T W y$ and $V[\hat{a}] = (A^T W A)^{-1}$, where

$$\begin{pmatrix} S_1 & S_x \\ S_x & S_{xx} \end{pmatrix}^{-1} = \frac{1}{D} \begin{pmatrix} S_{xx} & -S_x \\ -S_x & S_1 \end{pmatrix}
\qquad \text{with } D = S_1 S_{xx} - S_x^2$$

23 Least squares: Linear regression

The estimates for the parameters are

$$\hat{a}_1 = (S_{xx} S_y - S_x S_{xy})/D \qquad \hat{a}_2 = (S_1 S_{xy} - S_x S_y)/D$$

and the covariance matrix is

$$V[\hat{a}] = \frac{1}{D} \begin{pmatrix} S_{xx} & -S_x \\ -S_x & S_1 \end{pmatrix}$$

For the sum of squares of the residuals one gets (with $S_{yy} = \sum w_i y_i^2$)

$$\hat{S} = S_{yy} - \hat{a}_1 S_y - \hat{a}_2 S_{xy}$$

For the predicted value $\hat{y} = \hat{a}_1 + \hat{a}_2 x$ we get the variance by calculating:

$$V[\hat{y}] = V[\hat{a}_1] + x^2 V[\hat{a}_2] + 2x\, V[\hat{a}_1, \hat{a}_2] = (S_{xx} - 2x S_x + x^2 S_1)/D$$
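The closed-form solution translated into a short Python sketch; the data and errors are hypothetical stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(100.0, 500.0, 12)             # hypothetical data
sigma = np.full_like(x, 0.2)
y = 0.5 + 0.005 * x + rng.normal(0.0, sigma)

w = 1.0 / sigma ** 2
S1, Sx, Sxx = w.sum(), (w * x).sum(), (w * x ** 2).sum()
Sy, Sxy, Syy = (w * y).sum(), (w * x * y).sum(), (w * y ** 2).sum()
D = S1 * Sxx - Sx ** 2

a1 = (Sxx * Sy - Sx * Sxy) / D                # intercept a-hat_1
a2 = (S1 * Sxy - Sx * Sy) / D                 # slope a-hat_2
S_hat = Syy - a1 * Sy - a2 * Sxy              # weighted residual sum of squares

def var_yhat(x0):
    """Variance of the prediction y-hat(x0) = a1 + a2*x0."""
    return (Sxx - 2.0 * x0 * Sx + x0 ** 2 * S1) / D

print(a1, a2, S_hat, np.sqrt(var_yhat(300.0)))
```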

24 Least squares method

Example (revisited): In wine-growing the amount of wine harvested in autumn is measured in tons per 100 m² (t/ar). It is known that the annual yield can be predicted fairly well in July, by determining the average number of berries which have been formed per bunch.

[Table: year, yield $y_i$ / (t/ar), cluster count $x_i$ — the numeric values were lost in transcription.]
[Figure: scatter plot of yield $y$ / (t/ar) versus cluster count $x$.]

25 Least squares method: Wine-growing example

[Figure: straight-line fit with error band, yield $y$ / (t/ar) versus cluster count $x$. The fitted values $\hat{a}_1 = \ldots \pm \ldots$ and $\hat{a}_2 = \ldots \pm \ldots$ were lost in transcription.]

Error band:

$$\text{err}(x) = \hat{a}_1 + \hat{a}_2\, x \pm \sqrt{V[\hat{y}(x)]} = \hat{a}_1 + \hat{a}_2\, x \pm \sqrt{(S_{xx} - 2x S_x + x^2 S_1)/D}$$

(the numeric coefficients of the error band were likewise lost in transcription).
