Statistics, Data Analysis, and Simulation SS 2015


1 Statistics, Data Analysis, and Simulation SS 2015

08.128.730 Statistik, Datenanalyse und Simulation
Dr. Michael O. Distler <distler@uni-mainz.de>
Mainz, June 2, 2015

2 Least squares method

History: Introduced by Legendre, Gauß, and Laplace at the beginning of the 19th century. Therefore the least squares method is older than the more general maximum likelihood method.

From now on the measured values which have the property of random variables (data) will be labeled $y_i$. $n$ measurements of $x$ will give $y_1, y_2, \ldots, y_n$:

$$y_i = x + \epsilon_i$$

The $\epsilon_i$ are the deviations $y_i - x$ (statistical error).
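As a minimal illustration (a sketch, not part of the lecture), the least-squares estimate of $x$ in this simple model is just the sample mean, since $S(x) = \sum_i (y_i - x)^2$ is minimized by $\hat{x} = \bar{y}$. A short numpy example with hypothetical simulated data:

```python
import numpy as np

rng = np.random.default_rng(42)
x_true = 5.0                                   # hypothetical true value
y = x_true + rng.normal(0.0, 0.3, size=100)    # n measurements y_i = x + eps_i

# S(x) = sum_i (y_i - x)^2 is minimized by the sample mean
x_hat = y.mean()
print(x_hat, np.sum((y - x_hat) ** 2))
```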

3 Least squares method

The measured values deviate from the true values. The size of this difference is parameterized by the standard deviation $\sigma$. The $y_i$ are a statistical sample which can be described by a probability distribution function.

We also need a functional description (model) for the true values. This model may depend on additional variables $a_j$, called parameters. These parameters cannot be measured directly. The model is given as one or more equations:

$$f(a_1, a_2, \ldots, a_p, y_1, y_2, \ldots, y_n) = 0$$

4 Least squares method

The model can be used to find corrections $\Delta y_i$ to the measured values $y_i$. The method of least squares requires the sum of squares of the residuals $\Delta y_i$ to be minimal. For the simplest case of uncorrelated data with equal standard deviation this means:

$$S = \sum_{i=1}^{n} \Delta y_i^2 = \text{minimum}$$

⇒ indirect measurement of the parameters.

5 Least squares method

The least squares method has a number of optimal statistical properties and often leads to easy solutions. Other rules are possible, but generally lead to complicated solutions, e.g.

$$\sum_{i=1}^{n} |\Delta y_i| = \text{minimum} \qquad \text{or} \qquad \max_i |\Delta y_i| = \text{minimum}$$

6 Least squares method

General case: The data is written as an $n$-vector $y$. Different standard deviations and correlations are taken care of by use of a covariance matrix $V$. Least squares method using matrices:

$$S = \Delta y^T V^{-1} \Delta y$$

Here $\Delta y$ is the vector of residuals.

7 Least squares method

Example: In wine-growing the amount of wine harvested in autumn is measured in tons per 100 m² (t/ar). It is known that the annual yield can be predicted fairly well in July, by determining the average number of berries which have been formed per bunch.

[Table: year, yield $y_i$ / (t/ar), cluster count $x_i$ — the numeric values were lost in transcription.]
[Figure: scatter plot of yield $y$ / (t/ar) versus cluster count $x$.]

8 Least squares method

Straight line fit $f(x) = a + b\,x$ using gnuplot. The numeric fit results were lost in transcription; the structure of the gnuplot output was:

degrees of freedom (FIT_NDF)                          : 10
rms of residuals (FIT_STDFIT) = sqrt(WSSR/ndf)        : ...
variance of residuals (reduced chisquare) = WSSR/ndf  : ...

Final set of parameters            Asymptotic Standard Error
=======================            ==========================
a               = ... +/- ...      (76.23%)
b               = ... +/- ...      (14.11%)

correlation matrix of the fit parameters:
                a      b
a               ...
b               ...    ...
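A hedged Python equivalent of this gnuplot fit. Since the slide's actual data values were lost in transcription, the sketch below uses synthetic stand-in data; np.polyfit with cov=True plays the role of gnuplot's fit command:

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic stand-in data: 12 points reproduce FIT_NDF = 12 - 2 = 10.
x = rng.uniform(50.0, 500.0, size=12)                  # cluster count
y = 0.5 + 0.005 * x + rng.normal(0.0, 0.2, size=12)    # yield / (t/ar)

# Straight-line fit f(x) = a + b*x; polyfit returns coefficients
# from the highest power down, i.e. (b, a) for degree 1.
(b, a), V = np.polyfit(x, y, deg=1, cov=True)
print(f"a = {a:.4f} +/- {np.sqrt(V[1, 1]):.4f}")
print(f"b = {b:.5f} +/- {np.sqrt(V[0, 0]):.5f}")

ndf = len(x) - 2                                       # degrees of freedom
r = y - (a + b * x)
print("rms of residuals:", np.sqrt(np.sum(r ** 2) / ndf))
```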

9 Parameter estimation

Estimation of the parameters $a$ from measured data using a linear model. The parameter vector $a$ consists of $p$ elements $a_1, a_2, \ldots, a_p$. The measured values form the vector $y$ ($n$ random variables $y_1, y_2, \ldots, y_n$). The estimated value of $y$ is a function of the variable $x$:

$$y(x) = f(x, a) = a_1 f_1(x) + a_2 f_2(x) + \cdots + a_p f_p(x)$$

The expectation value of each single measurement $y_i$ is

$$E[y_i] = f(x_i, \bar{a}) = \bar{y}_i$$

Here the elements of $\bar{a}$ are the true values of the parameters $a$.

10 Parameter estimation

The residuals have nice properties if $a = \bar{a}$:

$$r_i = y_i - f(x_i, a) \qquad E[r_i] = 0 \qquad E[r_i^2] = V[r_i] = \sigma_i^2$$

The only requirements are that the p.d.f. of the residuals is unbiased and has finite variance. So it is not required that the residuals are Gaussian distributed.

11 Least squares: Normal equations

For now, all data has the same variance and is uncorrelated. Following the principle of least squares we minimize the sum of squares of the residuals, varying the parameters $a_1, a_2, \ldots, a_p$:

$$S = \sum_{i=1}^{n} r_i^2 = \sum_{i=1}^{n} \left( y_i - a_1 f_1(x_i) - a_2 f_2(x_i) - \cdots - a_p f_p(x_i) \right)^2$$

Conditions for the minimum:

$$\frac{\partial S}{\partial a_1} = 2 \sum_{i=1}^{n} f_1(x_i) \left( a_1 f_1(x_i) + a_2 f_2(x_i) + \cdots + a_p f_p(x_i) - y_i \right) = 0$$
$$\vdots$$
$$\frac{\partial S}{\partial a_p} = 2 \sum_{i=1}^{n} f_p(x_i) \left( a_1 f_1(x_i) + a_2 f_2(x_i) + \cdots + a_p f_p(x_i) - y_i \right) = 0$$

12 Least squares: Normal equations

We then write down the conditions as normal equations:

$$\begin{aligned}
a_1 \textstyle\sum f_1(x_i)^2 + \cdots + a_p \textstyle\sum f_1(x_i) f_p(x_i) &= \textstyle\sum y_i f_1(x_i) \\
a_1 \textstyle\sum f_2(x_i) f_1(x_i) + \cdots + a_p \textstyle\sum f_2(x_i) f_p(x_i) &= \textstyle\sum y_i f_2(x_i) \\
&\;\;\vdots \\
a_1 \textstyle\sum f_p(x_i) f_1(x_i) + \cdots + a_p \textstyle\sum f_p(x_i)^2 &= \textstyle\sum y_i f_p(x_i)
\end{aligned}$$

The solution of these normal equations is the least squares estimate of the parameters $a_1, a_2, \ldots, a_p$.

13 Least squares: Matrix formalism

Matrix formalism and matrix algebra simplify the computation. The $n \cdot p$ values $f_j(x_i)$ form an $n \times p$ matrix. The $p$ parameters $a_j$ and the $n$ measured values $y_i$ form column vectors:

$$A = \begin{pmatrix}
f_1(x_1) & f_2(x_1) & \cdots & f_p(x_1) \\
f_1(x_2) & f_2(x_2) & \cdots & f_p(x_2) \\
\vdots   &          &        & \vdots   \\
f_1(x_n) & f_2(x_n) & \cdots & f_p(x_n)
\end{pmatrix}
\qquad
a = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{pmatrix}
\qquad
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}$$

14 Least squares: Matrix formalism

The $n$-vector of the residuals is $r = y - Aa$. Considering the sum $S$ we get

$$S = r^T r = (y - Aa)^T (y - Aa) = y^T y - 2 a^T A^T y + a^T A^T A\, a$$

Condition for the minimum:

$$-2 A^T y + 2 A^T A\, \hat{a} = 0$$

or, written as normal equations,

$$(A^T A)\, \hat{a} = A^T y$$

The solution only requires standard methods of matrix algebra:

$$\hat{a} = (A^T A)^{-1} A^T y$$
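A sketch of this solution in numpy; the basis functions $f_1(x) = 1$, $f_2(x) = x$ and the data are hypothetical choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, size=20)   # hypothetical data

# Design matrix A with columns f_1(x) = 1 and f_2(x) = x
A = np.column_stack([np.ones_like(x), x])

# Solve the normal equations (A^T A) a-hat = A^T y
a_hat = np.linalg.solve(A.T @ A, A.T @ y)

# Numerically preferable in practice: a least-squares solver,
# which avoids explicitly forming A^T A
a_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)
print(a_hat, a_lstsq)                               # identical up to round-off
```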

15 Least squares: Matrix formalism

The covariance matrix is the quadratic $n \times n$ matrix

$$V[y] = \begin{pmatrix}
\operatorname{var}(y_1)      & \operatorname{cov}(y_1, y_2) & \cdots & \operatorname{cov}(y_1, y_n) \\
\operatorname{cov}(y_2, y_1) & \operatorname{var}(y_2)      & \cdots & \operatorname{cov}(y_2, y_n) \\
\vdots                       &                              & \ddots & \vdots                       \\
\operatorname{cov}(y_n, y_1) & \operatorname{cov}(y_n, y_2) & \cdots & \operatorname{var}(y_n)
\end{pmatrix}$$

Here the covariance matrix is just a diagonal matrix:

$$V[y] = \begin{pmatrix}
\sigma^2 &          &        &          \\
         & \sigma^2 &        &          \\
         &          & \ddots &          \\
         &          &        & \sigma^2
\end{pmatrix} = \sigma^2 I$$

16 Least squares: Matrix formalism

Since the parameters are a linear function of the data, $\hat{a} = B y$, we can use standard error propagation:

$$V[\hat{a}] = B\, V[y]\, B^T$$

With $B = (A^T A)^{-1} A^T$ we get

$$V[\hat{a}] = (A^T A)^{-1} A^T\, V[y]\, A\, (A^T A)^{-1}$$

Here we have equal errors for all data points, $V[y] = \sigma^2 I$, so

$$V[\hat{a}] = \sigma^2 (A^T A)^{-1}$$

17 Least squares: Matrix formalism

The sum $\hat{S}$ of squares of the residuals in the minimum is

$$\hat{S} = y^T y - 2 \hat{a}^T A^T y + \hat{a}^T A^T A (A^T A)^{-1} A^T y = y^T y - \hat{a}^T A^T y$$

Its expectation value is

$$E[\hat{S}] = \sigma^2 (n - p)$$

If the variance of the data is not known, we can use $\hat{S}$ to get an estimate

$$\hat{\sigma}^2 = \hat{S} / (n - p)$$

This is a good estimate for large values of $(n - p)$.
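Continuing the same hypothetical setup as above, a sketch of estimating $\sigma^2$ from $\hat{S}$ and propagating it into $V[\hat{a}]$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 2
x = np.linspace(0.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, size=n)    # hypothetical data
A = np.column_stack([np.ones_like(x), x])
a_hat = np.linalg.solve(A.T @ A, A.T @ y)

# S-hat = y^T y - a-hat^T A^T y, the residual sum of squares at the minimum
S_hat = y @ y - a_hat @ (A.T @ y)

# E[S-hat] = sigma^2 (n - p)  =>  estimate sigma^2 by S-hat / (n - p)
sigma2_hat = S_hat / (n - p)

# V[a-hat] = sigma^2 (A^T A)^{-1}, evaluated with the estimated variance
V_a = sigma2_hat * np.linalg.inv(A.T @ A)
print(sigma2_hat, np.sqrt(np.diag(V_a)))
```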

18 Least squares: Matrix formalism

After estimating the parameters using the linear least squares method we can calculate $f(x)$ for arbitrary $x$:

$$\hat{y}(x) = f(x, \hat{a}) = \sum_{j=1}^{p} \hat{a}_j f_j(x)$$

For the values $x_i$ which belong to the measured values $y_i$ we will get the predicted values using $\hat{y} = A \hat{a}$. Using error propagation one gets the associated covariance matrix

$$V[\hat{y}] = A\, V[\hat{a}]\, A^T = \sigma^2 A (A^T A)^{-1} A^T$$
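And a short continuation computing the predicted values and their covariance matrix (same hypothetical data as in the previous sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 2
x = np.linspace(0.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, size=n)    # hypothetical data
A = np.column_stack([np.ones_like(x), x])

a_hat = np.linalg.solve(A.T @ A, A.T @ y)
sigma2_hat = (y @ y - a_hat @ (A.T @ y)) / (n - p)

# Predicted values y-hat = A a-hat and V[y-hat] = sigma^2 A (A^T A)^{-1} A^T
y_hat = A @ a_hat
V_yhat = sigma2_hat * (A @ np.linalg.inv(A.T @ A) @ A.T)
print(y_hat[:3], np.sqrt(np.diag(V_yhat))[:3])      # first few points and errors
```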

19 Least squares: Matrix formalism

If the data points are independent, the covariance matrix is

$$V[y] = \begin{pmatrix}
\sigma_1^2 &            &        &            \\
           & \sigma_2^2 &        &            \\
           &            & \ddots &            \\
           &            &        & \sigma_n^2
\end{pmatrix}$$

The sum of squares of the residuals is now

$$S = \sum_i \frac{r_i^2}{\sigma_i^2} = \text{minimum}$$

We define a weight matrix $W(y)$ which is the inverse of the covariance matrix:

$$W(y) = V[y]^{-1} = \begin{pmatrix}
1/\sigma_1^2 &              &        &              \\
             & 1/\sigma_2^2 &        &              \\
             &              & \ddots &              \\
             &              &        & 1/\sigma_n^2
\end{pmatrix}$$

20 Least squares: Matrix formalism

The sum of squares of the weighted residuals,

$$S = r^T W(y)\, r = (y - Aa)^T W(y) (y - Aa),$$

has to be minimized. One gets

$$\hat{a} = (A^T W A)^{-1} A^T W y \qquad V[\hat{a}] = (A^T W A)^{-1}$$

The sum of squares of the residuals for $a = \hat{a}$ is

$$\hat{S} = y^T W y - \hat{a}^T A^T W y$$

with expectation value $E[\hat{S}] = n - p$. The covariance matrix for the predicted values is

$$V[\hat{y}] = A (A^T W A)^{-1} A^T$$
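A sketch of this weighted fit in numpy; the per-point errors $\sigma_i$ and the data are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 15
x = np.linspace(0.0, 1.0, n)
sigma = rng.uniform(0.05, 0.2, size=n)        # hypothetical per-point errors
y = 1.0 + 2.0 * x + rng.normal(0.0, sigma)

A = np.column_stack([np.ones_like(x), x])
W = np.diag(1.0 / sigma ** 2)                 # weight matrix W = V[y]^{-1}

# a-hat = (A^T W A)^{-1} A^T W y  and  V[a-hat] = (A^T W A)^{-1}
AtWA = A.T @ W @ A
a_hat = np.linalg.solve(AtWA, A.T @ W @ y)
V_a = np.linalg.inv(AtWA)

# S-hat = y^T W y - a-hat^T A^T W y, with expectation value n - p
S_hat = y @ W @ y - a_hat @ (A.T @ W @ y)
print(a_hat, np.sqrt(np.diag(V_a)), S_hat / (n - 2))
```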

21 Least squares: Linear regression

For the linear regression we fit the function $y = f(x, a) = a_1 + a_2 x$. The data $y_i$ has been taken at certain values of $x_i$:

$$A = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ 1 & x_3 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}
\qquad
a = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}
\qquad
y = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_n \end{pmatrix}$$

$$V = \begin{pmatrix}
\sigma_1^2 &            &        &            \\
           & \sigma_2^2 &        &            \\
           &            & \ddots &            \\
           &            &        & \sigma_n^2
\end{pmatrix}
\qquad
W = V^{-1}, \quad w_{ii} = \frac{1}{\sigma_i^2}$$

22 Least squares: Linear regression

Solution: with the abbreviations $S_1 = \sum w_i$, $S_x = \sum w_i x_i$, $S_{xx} = \sum w_i x_i^2$, $S_y = \sum w_i y_i$, $S_{xy} = \sum w_i x_i y_i$,

$$A^T W A = \begin{pmatrix} \sum w_i & \sum w_i x_i \\ \sum w_i x_i & \sum w_i x_i^2 \end{pmatrix} = \begin{pmatrix} S_1 & S_x \\ S_x & S_{xx} \end{pmatrix}
\qquad
A^T W y = \begin{pmatrix} \sum w_i y_i \\ \sum w_i x_i y_i \end{pmatrix} = \begin{pmatrix} S_y \\ S_{xy} \end{pmatrix}$$

The normal equations read

$$\begin{pmatrix} S_1 & S_x \\ S_x & S_{xx} \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} S_y \\ S_{xy} \end{pmatrix}$$

with $\hat{a} = (A^T W A)^{-1} A^T W y$ and $V[\hat{a}] = (A^T W A)^{-1}$, where

$$\begin{pmatrix} S_1 & S_x \\ S_x & S_{xx} \end{pmatrix}^{-1} = \frac{1}{D} \begin{pmatrix} S_{xx} & -S_x \\ -S_x & S_1 \end{pmatrix}
\qquad \text{with } D = S_1 S_{xx} - S_x^2$$

23 Least squares: Linear regression

The estimates for the parameters are

$$\hat{a}_1 = (S_{xx} S_y - S_x S_{xy})/D \qquad \hat{a}_2 = (S_1 S_{xy} - S_x S_y)/D$$

and the covariance matrix is

$$V[\hat{a}] = \frac{1}{D} \begin{pmatrix} S_{xx} & -S_x \\ -S_x & S_1 \end{pmatrix}$$

For the sum of squares of the residuals one gets (with $S_{yy} = \sum w_i y_i^2$)

$$\hat{S} = S_{yy} - \hat{a}_1 S_y - \hat{a}_2 S_{xy}$$

For the predicted value $\hat{y} = \hat{a}_1 + \hat{a}_2 x$ we get the variance by calculating:

$$V[\hat{y}] = V[\hat{a}_1] + x^2 V[\hat{a}_2] + 2x\, V[\hat{a}_1, \hat{a}_2] = (S_{xx} - 2x S_x + x^2 S_1)/D$$
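The closed-form solution translated into a short Python sketch; the data and errors are hypothetical stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(100.0, 500.0, 12)             # hypothetical data
sigma = np.full_like(x, 0.2)
y = 0.5 + 0.005 * x + rng.normal(0.0, sigma)

w = 1.0 / sigma ** 2
S1, Sx, Sxx = w.sum(), (w * x).sum(), (w * x ** 2).sum()
Sy, Sxy, Syy = (w * y).sum(), (w * x * y).sum(), (w * y ** 2).sum()
D = S1 * Sxx - Sx ** 2

a1 = (Sxx * Sy - Sx * Sxy) / D                # intercept a-hat_1
a2 = (S1 * Sxy - Sx * Sy) / D                 # slope a-hat_2
S_hat = Syy - a1 * Sy - a2 * Sxy              # weighted residual sum of squares

def var_yhat(x0):
    """Variance of the prediction y-hat(x0) = a1 + a2*x0."""
    return (Sxx - 2.0 * x0 * Sx + x0 ** 2 * S1) / D

print(a1, a2, S_hat, np.sqrt(var_yhat(300.0)))
```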

24 Least squares method

Example (revisited): In wine-growing the amount of wine harvested in autumn is measured in tons per 100 m² (t/ar). It is known that the annual yield can be predicted fairly well in July, by determining the average number of berries which have been formed per bunch.

[Table: year, yield $y_i$ / (t/ar), cluster count $x_i$ — the numeric values were lost in transcription.]
[Figure: scatter plot of yield $y$ / (t/ar) versus cluster count $x$.]

25 Least squares method: Wine-growing example

[Figure: straight-line fit with error band, yield $y$ / (t/ar) versus cluster count $x$. The fitted values $\hat{a}_1 = \ldots \pm \ldots$ and $\hat{a}_2 = \ldots \pm \ldots$ were lost in transcription.]

Error band:

$$\text{err}(x) = \hat{a}_1 + \hat{a}_2\, x \pm \sqrt{V[\hat{y}(x)]} = \hat{a}_1 + \hat{a}_2\, x \pm \sqrt{(S_{xx} - 2x S_x + x^2 S_1)/D}$$

(the numeric coefficients of the error band were likewise lost in transcription).
