4 Multiple Linear Regression

4.1 The Model

Definition 4.1. A random variable Y fits a Multiple Linear Regression Model iff there exist β₀, β₁, …, β_k ∈ ℝ so that for all (x₁, x₂, …, x_k) ∈ ℝᵏ

    Y = β₀ + β₁x₁ + β₂x₂ + ⋯ + β_k x_k + ε,    (4.1)

where ε ~ N(0, σ²).

Corollary: If Y fits a Multiple Linear Regression Model, then for a fixed value of x = (x₁, x₂, …, x_k) ∈ ℝᵏ the conditional expectation of Y given x equals

    E(Y | x) = E(β₀ + β₁x₁ + β₂x₂ + ⋯ + β_k x_k + ε) = β₀ + β₁x₁ + β₂x₂ + ⋯ + β_k x_k    (4.2)

and the conditional variance of Y given x is

    Var(Y | x) = Var(β₀ + β₁x₁ + β₂x₂ + ⋯ + β_k x_k + ε) = Var(ε) = σ².    (4.3)

β₀, …, β_k are called the regression coefficients; they and σ² are the parameters of the model. This implies that the mean of Y depends on x in a linear fashion and that the variance of Y does not depend on the value of x, a property called homoscedasticity.

4.2 Visualization

4.2.1 Without Interaction

Let us limit ourselves for the moment to two predictor variables x₁ and x₂; the third variable, Y, is the response variable. In a model without error, Y = β₀ + β₁x₁ + β₂x₂. Let Y = 5 + x₁ + 2x₂. The following diagram shows this function in a 3d-plot from R.
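The plot itself did not survive this transcription. A minimal R sketch that produces such a surface plot (the grid range and viewing angles are arbitrary choices, not taken from the notes):

    # Plot the plane Y = 5 + x1 + 2*x2 (no interaction term)
    x1 <- seq(0, 10, length.out = 30)
    x2 <- seq(0, 10, length.out = 30)
    y  <- outer(x1, x2, function(a, b) 5 + a + 2 * b)   # Y on a 30 x 30 grid
    persp(x1, x2, y, theta = 30, phi = 20,
          xlab = "x1", ylab = "x2", zlab = "Y",
          main = "Y = 5 + x1 + 2*x2")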

The equation does not include an interaction term, which means that the value of x₂ has no effect on the slope of the relationship between x₁ and Y; therefore the graph is a plane.

4.2.2 With Interaction

When interaction is present, the slope of the line relating x₁ and Y changes for different values of x₂, and the graph is no longer a plane. In the model this is introduced by including the product term of the interacting variables, x₁x₂. Let Y = 5 + x₁ + 2x₂ − 3x₁x₂. Here you see that the result is no longer a plane but a curved surface.
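The curved surface is again not reproduced here; the analogous R sketch with the interaction term added is:

    # Plot the curved surface Y = 5 + x1 + 2*x2 - 3*x1*x2 (with interaction)
    x1 <- seq(0, 10, length.out = 30)
    x2 <- seq(0, 10, length.out = 30)
    y_int <- outer(x1, x2, function(a, b) 5 + a + 2 * b - 3 * a * b)
    persp(x1, x2, y_int, theta = 30, phi = 20,
          xlab = "x1", ylab = "x2", zlab = "Y",
          main = "Y = 5 + x1 + 2*x2 - 3*x1*x2")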

4.3 Estimation

As for the simple linear regression model, the first goal is to estimate the parameters of the model, using point estimators and confidence intervals. The data are given as an array like

    y₁   x₁₁   x₁₂   …   x₁ₖ
    y₂   x₂₁   x₂₂   …   x₂ₖ
    ⋮     ⋮     ⋮          ⋮
    yₙ   xₙ₁   xₙ₂   …   xₙₖ

Using the data we will see whether a multiple linear model is an appropriate description of the population and, if it is, what estimates for the model parameters can be found. If the MLR model is not a good fit, can we adjust the model to obtain a better fit? We will find out whether outliers are causing problems, or whether the model should be discarded entirely because Y is independent of x₁, …, x_k. Considering the above array, it seems reasonable to use matrices to represent the data.

4.3.1 Matrix notation

If the MLR model fits the population, then each observation in the sample can be modeled by a random variable which fits the model equation, i.e. for 1 ≤ i ≤ n

    Yᵢ = β₀ + β₁xᵢ₁ + β₂xᵢ₂ + ⋯ + β_k xᵢₖ + εᵢ.

For the analysis it is assumed that the n random variables are independent. Then with Y = (Y₁, Y₂, …, Yₙ)ᵀ, ε = (ε₁, …, εₙ)ᵀ, β = (β₀, β₁, …, β_k)ᵀ, and

    X = [ 1  x₁₁  x₁₂  …  x₁ₖ
          1  x₂₁  x₂₂  …  x₂ₖ
          ⋮
          1  xₙ₁  xₙ₂  …  xₙₖ ]

the model for the n independent observations can be given in the following form:

    Y = Xβ + ε,   ε ~ N(0, σ²Iₙ).

4.4 Point Estimators

The least squares estimator is chosen such that the total of the squared distances to the regression function is as small as possible.

Theorem 4.1. In the case that XᵀX is invertible, the least squares estimator β̂ for the coefficient vector β is given by

    β̂ = (XᵀX)⁻¹XᵀY.
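A small R sketch of this estimator on simulated data (the sample size, design, and coefficients below are illustrative choices, not from the notes):

    # Simulate data from a known MLR model and compute the LS estimator
    set.seed(1)
    n <- 50; k <- 2
    X <- cbind(1, matrix(rnorm(n * k), n, k))   # design matrix with intercept column
    beta <- c(5, 1, 2)                          # true coefficients (illustrative)
    y <- X %*% beta + rnorm(n, sd = 0.5)        # responses with N(0, 0.25) errors

    # beta_hat = (X'X)^{-1} X'y; solve(A, b) is numerically
    # preferable to forming the inverse explicitly
    beta_hat <- solve(crossprod(X), crossprod(X, y))
    beta_hat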

Proof: For β ∈ ℝᵏ⁺¹ the total of the squared distances of y₁, y₂, …, yₙ to the model function is

    εᵀε = (Y − Xβ)ᵀ(Y − Xβ)
        = YᵀY − YᵀXβ − (Xβ)ᵀY + (Xβ)ᵀ(Xβ)
        = YᵀY − 2βᵀXᵀY + βᵀXᵀXβ.

To find the least squares estimator β̂, which minimizes the expression above, take the partial derivatives of the expression, set them to zero, and solve for β. The linear system giving β̂ is

    −2XᵀY + 2XᵀXβ̂ = 0

(this is Math 24 material; we will not dwell on it here), which is equivalent to the normal equation

    XᵀXβ̂ = XᵀY,

giving, in the case that XᵀX is invertible,

    β̂ = (XᵀX)⁻¹XᵀY,

which concludes the proof.

XᵀX is invertible iff the columns of X are linearly independent. The i-th column of X gives the values of the i-th predictor variable for the n individuals in the sample. It follows that in the case of linearly dependent columns the least squares estimator is not unique. When the columns are linearly dependent, one usually removes columns until the design matrix has full rank, and the least squares estimator becomes unique. This is also the reason why highly correlated predictor variables are disadvantageous: they result in a small determinant of XᵀX, which makes it numerically hard to invert the matrix and to calculate the least squares estimator with high precision.

Least squares estimator as projection

Geometrically, regression is the orthogonal projection of the vector y ∈ ℝⁿ onto the p-dimensional space spanned by the columns of X. According to Linear Algebra, the projection μ̂ is

    μ̂ = X(XᵀX)⁻¹Xᵀy = Xβ̂.

In regression it is traditional to use ŷ instead of μ̂. The hat matrix is defined as

    H = X(XᵀX)⁻¹Xᵀ,

because when applied to Y, it "puts a hat on it". The vector ŷ gives the fitted values for the observed values y from the model estimates. The model for the n observations is Y = Xβ + ε, where ε has expected value 0. The fitted value of y is then

    ŷ = Xβ̂.
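Continuing the simulated-data sketch above (reusing X, y, and n), the hat matrix, fitted values, and the projection properties can be checked numerically:

    # Hat matrix H = X (X'X)^{-1} X' and fitted values y_hat = H y
    H <- X %*% solve(crossprod(X)) %*% t(X)
    y_hat <- H %*% y

    # H is an orthogonal projection: symmetric and idempotent
    all.equal(H, t(H))        # TRUE up to numerical error
    all.equal(H %*% H, H)     # idempotent: H H = H

    # Residuals lie in the orthogonal complement: e = (I - H) y, so X'e = 0
    e <- (diag(n) - H) %*% y
    crossprod(X, e)           # numerically ~ 0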

Lemma. The observed residual vector for a data vector y is

    e = y − ŷ = y − Xβ̂ = y − X(XᵀX)⁻¹Xᵀy = (Iₙ − X(XᵀX)⁻¹Xᵀ)y = (Iₙ − H)y.

H is called the hat matrix and is central in regression analysis.

Properties of β̂

Theorem 4.2.
1. β̂ is an unbiased estimator of β.
2. The covariance matrix of β̂ is Cov(β̂) = σ²(XᵀX)⁻¹.
3. β̂ has a multivariate normal distribution.

Proof:
1. E(β̂) = E((XᵀX)⁻¹XᵀY) = (XᵀX)⁻¹XᵀE(Y) = (XᵀX)⁻¹XᵀXβ = Iₚβ = β.
2. Cov(β̂) = Cov((XᵀX)⁻¹XᵀY) = (XᵀX)⁻¹Xᵀ Cov(Y) X((XᵀX)⁻¹)ᵀ = (XᵀX)⁻¹Xᵀ σ²Iₙ X((XᵀX)⁻¹)ᵀ = σ²(XᵀX)⁻¹XᵀX(XᵀX)⁻¹ = σ²(XᵀX)⁻¹.
3. By definition, β̂ is a linear combination of normally distributed random variables and is therefore multivariate normal.

β̂ also has minimal variance. This property has to be defined carefully, because β̂ is a random vector and therefore has a p × p covariance matrix (p = k + 1), and how matrices are to be ordered needs to be clarified.

Theorem 4.3 (Gauss-Markov Theorem). Let ψ = cᵀβ, for c ∈ ℝᵖ, be a linear combination of the parameter vector β in the MLR model, and assume the design matrix X has full rank. Then ψ̂ = cᵀβ̂ is an unbiased estimator for ψ and has the smallest variance in the class of linear unbiased estimators of ψ.
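A quick Monte Carlo sketch of Theorem 4.2 in R: for a fixed (illustrative) design, the empirical mean and covariance of β̂ over repeated samples should approximate β and σ²(XᵀX)⁻¹:

    # Monte Carlo check: E(beta_hat) = beta, Cov(beta_hat) = sigma^2 (X'X)^{-1}
    set.seed(2)
    n <- 50
    X <- cbind(1, rnorm(n), rnorm(n))           # fixed design (illustrative)
    beta <- c(5, 1, 2); sigma <- 0.5
    XtX_inv <- solve(crossprod(X))

    B <- replicate(5000, {
      y <- X %*% beta + rnorm(n, sd = sigma)    # new sample, same design
      drop(XtX_inv %*% crossprod(X, y))         # beta_hat for this sample
    })
    rowMeans(B)          # ~ beta (unbiasedness)
    cov(t(B))            # ~ sigma^2 * (X'X)^{-1}
    sigma^2 * XtX_inv    # theoretical covariance for comparison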

Proof (compare Faraway (2002)): Because of

    E(ψ̂) = E(cᵀβ̂) = cᵀE(β̂) = cᵀβ = ψ,

ψ̂ is an unbiased estimator for ψ. Suppose aᵀY is some unbiased estimate of cᵀβ, so that

    E(aᵀY) = cᵀβ   for all β,
    aᵀXβ = cᵀβ    for all β,

which means that aᵀX = cᵀ (1). Now let b = (XᵀX)⁻¹c; then cᵀ = bᵀXᵀX (2), and

    ψ̂ = cᵀβ̂ = bᵀXᵀXβ̂ = bᵀXᵀX(XᵀX)⁻¹XᵀY = bᵀXᵀY.   (3)

Now find the variance of aᵀY (to show that Var(aᵀY) ≥ Var(ψ̂)):

    Var(aᵀY) = Var(aᵀY − cᵀβ̂ + cᵀβ̂)
             = Var(aᵀY − cᵀβ̂) + Var(cᵀβ̂) + 2 Cov(aᵀY − cᵀβ̂, cᵀβ̂).

But using (3),

    Cov(aᵀY − cᵀβ̂, cᵀβ̂) = Cov(aᵀY − bᵀXᵀY, bᵀXᵀY)
                          = Cov((aᵀ − bᵀXᵀ)Y, bᵀXᵀY)
                          = (aᵀ − bᵀXᵀ) Cov(Y) Xb

according to Intermission 4 (Cov(lᵀY, hᵀY) = lᵀ Cov(Y) h). Then

    (aᵀ − bᵀXᵀ) Cov(Y) Xb = (aᵀ − bᵀXᵀ) σ²Iₙ Xb = σ²(aᵀX − bᵀXᵀX)b = σ²(cᵀ − cᵀ)b = 0

because of (1) and (2). Therefore

    Var(aᵀY) = Var(aᵀY − cᵀβ̂) + Var(cᵀβ̂).

Since variances are always at least 0, this means that for all linear unbiased estimators aᵀY

    Var(aᵀY) ≥ Var(cᵀβ̂),

which concludes the proof.

Example 4.1. A small example relating age and weight to blood pressure. The data:

    weight   age   blood pressure
      …       …         …

The scatterplots do not display strong linear relationships, but we will nevertheless propose the following model for blood pressure in the population:

    bp = β₀ + β₁(weight) + β₂(age) + ε,   ε ~ N(0, σ²).

First find the vector of parameter estimates for β = (β₀, β₁, β₂)ᵀ, based on the sample data above:

    X = … ,   with XᵀX = …   and (XᵀX)⁻¹ = …

(the last matrix found with R). With

    y = …   and   Xᵀy = … ,

we get

    β̂ = (XᵀX)⁻¹Xᵀy = …   and   ŷ = Xβ̂ = … .

Comparing the vector of fitted values, ŷ, with the measurement vector, y, we can observe a close fit.
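In R the whole computation is handled by lm(). Since the original measurements did not survive this transcription, the data frame below is a hypothetical stand-in with n = 6 observations (all values invented for illustration):

    # Hypothetical data standing in for the lost table (n = 6, as in the notes)
    bpdata <- data.frame(
      weight = c(70, 82, 95, 61, 88, 76),    # kg (illustrative values)
      age    = c(45, 52, 60, 38, 55, 48),    # years (illustrative values)
      bp     = c(128, 140, 155, 118, 148, 134)
    )

    fit <- lm(bp ~ weight + age, data = bpdata)
    coef(fit)        # beta_hat = (beta0, beta1, beta2)
    fitted(fit)      # y_hat = X beta_hat
    residuals(fit)   # e = y - y_hat

    # The same estimate via the normal equations
    X <- model.matrix(fit)
    solve(crossprod(X), crossprod(X, bpdata$bp))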

For closer examination, find the residual vector:

    e = y − ŷ = … .

The residuals, giving the difference between the blood pressure measured and the blood pressure predicted by the model, are quite small and seem to support the model.

Interpretation of the parameter estimates:
- The mean blood pressure for a person of age 0 and weight 0 is estimated to be β̂₀. This does not make sense! We would be extrapolating using this interpretation.
- The blood pressure increases on average by 2.38 for every extra year when the weight is unchanged.
- The blood pressure increases on average by .44 for every extra kilogram when the age is unchanged.

To be continued.

The estimator for the variance of the error in the MLR model, σ², is again based on the residuals.

Definition 4.2. Let p := k + 1 (the number of regression parameters).
1. The residual sum of squares is SS_Res = eᵀe.
2. The mean square of error is MS_Res = SS_Res / (n − p).

Theorem 4.4.
1. SS_Res = YᵀY − β̂ᵀXᵀY = Yᵀ[Iₙ − X(XᵀX)⁻¹Xᵀ]Y.
2. E(SS_Res) = (n − p)σ².
3. SS_Res/σ² is χ²-distributed with df = n − p and non-centrality parameter λ = 0.

Proof:
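Continuing the hypothetical fit above (reusing fit and bpdata), SS_Res and the variance estimate of Definition 4.2 can be computed and compared against lm()'s built-in residual standard error:

    # SS_Res = e'e and the unbiased variance estimate MS_Res = SS_Res / (n - p)
    e <- residuals(fit)
    n <- nrow(bpdata); p <- length(coef(fit))   # p = k + 1 = 3
    SS_res <- sum(e^2)
    MS_res <- SS_res / (n - p)
    sqrt(MS_res)          # sigma_hat
    summary(fit)$sigma    # lm's residual standard error; agrees with sigma_hat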

1. 

    SS_Res = eᵀe = (Y − Ŷ)ᵀ(Y − Ŷ)
           = YᵀY − YᵀXβ̂ − β̂ᵀXᵀY + β̂ᵀXᵀXβ̂
           = YᵀY − YᵀXβ̂ − β̂ᵀXᵀY + YᵀX(XᵀX)⁻¹XᵀXβ̂
           = YᵀY − YᵀXβ̂ − β̂ᵀXᵀY + YᵀXβ̂
           = YᵀY − β̂ᵀXᵀY,

which proves the first part. Continuing from here,

    SS_Res = YᵀY − β̂ᵀXᵀY = YᵀY − YᵀX(XᵀX)⁻¹XᵀY = Yᵀ[I − X(XᵀX)⁻¹Xᵀ]Y,

which concludes the second part of the equation.

2. To find the mean, use E(YᵀAY) = tr(A Cov(Y)) + E(Y)ᵀA E(Y) from Intermission 4. Therefore we need tr(I − X(XᵀX)⁻¹Xᵀ):

    tr([I − X(XᵀX)⁻¹Xᵀ]) = tr(I) − tr(X(XᵀX)⁻¹Xᵀ) = n − tr((XᵀX)⁻¹XᵀX) = n − tr(Iₚ) = n − p.

Then

    E(SS_Res) = E(Yᵀ[I − X(XᵀX)⁻¹Xᵀ]Y)
              = tr([I − X(XᵀX)⁻¹Xᵀ] σ²I) + E(Y)ᵀ[I − X(XᵀX)⁻¹Xᵀ]E(Y)
              = σ²(n − p) + βᵀXᵀ[I − X(XᵀX)⁻¹Xᵀ]Xβ
              = σ²(n − p) + βᵀXᵀXβ − βᵀXᵀX(XᵀX)⁻¹XᵀXβ
              = σ²(n − p) + βᵀXᵀXβ − βᵀXᵀXβ
              = σ²(n − p).

3. Because of Intermission 5 it follows that SS_Res/σ² is χ²-distributed with df = n − p. The rest is left without proof.

Conclusion: σ̂² = MS_Res is an unbiased estimator for σ².

Continuing Example 4.1:

    SS_Res = eᵀe = ∑ᵢ eᵢ² = 5.78.

The estimated variance is σ̂² = 5.78/(6 − 3) = 1.93, and therefore σ̂ = √1.93 = 1.39. We estimate that the measurements vary with a standard deviation of 1.39 around the regression function.
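The closing arithmetic, checked in R:

    # Variance estimate for Example 4.1: n = 6 observations, p = 3 parameters
    SS_res <- 5.78; n <- 6; p <- 3
    sigma2_hat <- SS_res / (n - p)   # 1.9267, rounded to 1.93 in the notes
    sqrt(sigma2_hat)                 # 1.388,  rounded to 1.39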
