Ma 3/103: Lecture 25 Linear Regression II: Hypothesis Testing and ANOVA


Ma 3/103: Lecture 25. Linear Regression II: Hypothesis Testing and ANOVA. March 6, 2017. KC Border. (44 slides)

Outline:
1. OLS estimator
2. Restricted regression
3. Errors in variables
4. ANOVA
5. The F test in an ANOVA framework
6. Contrasts

OLS estimator: Standard Linear Model

    y = Xβ + ε,  where E ε = 0 and Var ε = E(εε′) = σ²I.

OLS estimator: OLS estimation

With N observations on X₁, …, X_K, Y, let X be the N × K matrix of regressors and y the N × 1 vector of observations on the response Y. Then if X has rank K, the OLS estimator β̂_OLS of the parameter vector β is given by

    β̂_OLS = (X′X)⁻¹X′y.   (1)

It is obtained by orthogonally projecting y onto the column space of X.
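As a quick illustration, the estimator in (1) can be computed by solving the normal equations directly. This is a sketch on simulated data, not from the lecture; numpy is assumed, and N, K, and the coefficient values are made up:

```python
import numpy as np

# Illustrative sketch: OLS via the normal equations (X'X) beta = X'y.
# All data are simulated; beta_true is a made-up example value.
rng = np.random.default_rng(0)
N, K = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(size=N)

# beta_hat = (X'X)^{-1} X'y, computed without forming the inverse explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# The fitted values X beta_hat are the orthogonal projection of y onto the
# column space of X, so the residuals are orthogonal to every regressor.
resid = y - X @ beta_hat
print(beta_hat)
print(np.abs(X.T @ resid).max())  # essentially zero up to floating point
```

Solving the linear system is preferred to computing `(X'X)⁻¹` explicitly for numerical stability.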

OLS estimator: Regression and correlation

For the simple regression y_t = β₀ + β₁x_t,

    X′X = [ N        Σ_t x_t  ]         X′y = [ Σ_t y_t     ]
          [ Σ_t x_t  Σ_t x_t² ],              [ Σ_t y_t x_t ],

and

    (X′X)⁻¹ = (1 / (N Σ_t (x_t − x̄)²)) [ Σ_t x_t²   −Σ_t x_t ]
                                         [ −Σ_t x_t   N        ].

From (β̂₀, β̂₁)′ = (X′X)⁻¹X′y,

    β̂₀ = ((Σ_t x_t²)(Σ_t y_t) − (Σ_t x_t)(Σ_t y_t x_t)) / (N Σ_t x_t² − (Σ_t x_t)²),

    β̂₁ = (N Σ_t y_t x_t − (Σ_t x_t)(Σ_t y_t)) / (N Σ_t x_t² − (Σ_t x_t)²)
        = (Σ_t y_t x_t − (Σ_t x_t)(Σ_t y_t)/N) / (Σ_t x_t² − (Σ_t x_t)²/N).

OLS estimator: Correlation

    Corr(X, Y) = Cov(X, Y) / ((SD X)(SD Y))

(for mean-zero variables, Cov(X, Y) = E(XY)). Given pairs (x_t, y_t), t = 1, …, N, of observations, define the sample correlation coefficient r by

    r = Σ_{t=1}^N (x_t − x̄)(y_t − ȳ) / √( Σ_{t=1}^N (x_t − x̄)² · Σ_{t=1}^N (y_t − ȳ)² ),

which is the sample analog of the correlation. It is also known as the Pearson product-moment correlation coefficient. Consider the centered variables x̃_t = x_t − x̄, ỹ_t = y_t − ȳ. Then

    β̂₁ = (N Σ_t ỹ_t x̃_t − (Σ_t x̃_t)(Σ_t ỹ_t)) / (N Σ_t x̃_t² − (Σ_t x̃_t)²).

But by construction, Σ_t x̃_t = Σ_t ỹ_t = 0, so β̂₁ = Σ_t ỹ_t x̃_t / Σ_t x̃_t².

OLS estimator: Now look at the formula for the correlation coefficient. It can be rewritten as

    r = Σ_{t=1}^N x̃_t ỹ_t / (s_x s_y) = (s_x / s_y) β̂₁,

where s_x = √(Σ_t (x_t − x̄)²) = √(Σ_t x̃_t²) and s_y = √(Σ_t ỹ_t²). Among other things this implies that r = 0 if and only if the slope β̂₁ of the regression line is zero. (If s_x = 0, then all the x_t are the same, and the slope is not identifiable.)
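The identity r = (s_x/s_y) β̂₁ is easy to check numerically. A small sketch on made-up data (numpy assumed):

```python
import numpy as np

# Sketch: verify r = (s_x / s_y) * beta1_hat on simulated data.
rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 2.0 * x + rng.normal(size=50)

xc, yc = x - x.mean(), y - y.mean()           # centered variables
beta1 = (xc @ yc) / (xc @ xc)                 # OLS slope for centered data
s_x, s_y = np.sqrt(xc @ xc), np.sqrt(yc @ yc)
r = (xc @ yc) / (s_x * s_y)                   # sample correlation coefficient

assert np.isclose(r, (s_x / s_y) * beta1)     # the slide's identity
assert np.isclose(r, np.corrcoef(x, y)[0, 1]) # agrees with numpy's r
```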

OLS estimator: Testing for serial correlation

Regress e_t on e_{t−1}:

    e_t = β₀ + β₁ e_{t−1},

and test H₀: β₁ = 0.

OLS estimator: Testing linear restrictions on β

To test q simultaneous restrictions, let H₀: a = Aβ, where A is a q × K matrix with rank q.

Theorem. Under the null hypothesis, the test statistic

    F = (1/(q s²)) (a − Aβ̂_OLS)′ [A(X′X)⁻¹A′]⁻¹ (a − Aβ̂_OLS)

has an F-distribution with (q, N − K) degrees of freedom.
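A sketch of the theorem's test statistic on simulated data; the design matrix, A, and a below are illustrative choices, not from the lecture:

```python
import numpy as np

# Sketch: F statistic for H0: A beta = a, here beta_2 = beta_3 = 0 (q = 2).
# Simulated data are generated with the null true.
rng = np.random.default_rng(2)
N, K, q = 60, 3, 2
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
y = X @ np.array([1.0, 0.0, 0.0]) + rng.normal(size=N)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat
s2 = (e @ e) / (N - K)                       # s^2 = e'e / (N - K)

A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])              # restriction matrix, rank q
a = np.zeros(q)
d = a - A @ beta_hat
M = A @ np.linalg.inv(X.T @ X) @ A.T
F = (d @ np.linalg.solve(M, d)) / (q * s2)   # ~ F(q, N - K) under H0
print(F)
```

A p-value could be obtained from an F(q, N − K) distribution, e.g. `scipy.stats.f.sf(F, q, N - K)` if scipy is available.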

OLS estimator: The F-test of the regression

Many software packages, including R, compute for you something called the F-statistic for the regression. The F-statistic for the regression tests the null hypothesis that all the coefficients on the non-constant terms are zero,

    H₀: β₂ = β₃ = ⋯ = β_K = 0.

(If you have a constant term, it is usually X₁ in our terminology.)

OLS estimator: Coefficient of multiple correlation

    y′y = β̂′_OLS X′X β̂_OLS + e′e + 2 β̂′_OLS X′e,

where the last term is zero since X′e = 0. The coefficient of multiple correlation R is a measure of the fraction of y′y explained by the regressors. Specifically,

    1 − R² = e′e / y′y,  or  R² = ŷ′ŷ / y′y = β̂′_OLS X′X β̂_OLS / y′y.

The Pythagorean Theorem implies y′y = e′e + ŷ′ŷ, so 0 ≤ R² ≤ 1.

OLS estimator: Geometry of R²

R = √R² is the cosine of the angle φ between y and ŷ = Xβ̂_OLS. [Figure: y decomposed into its projection ŷ = β̂₁x₁ + β̂₂x₂ on the plane spanned by x₁ and x₂ and the residual e, with φ the angle between y and ŷ.]

OLS estimator: Adjusted R²

Increasing the number of right-hand-side variates can only decrease the sum of squared residuals, so it is desirable to penalize the measure of fit. The adjusted R̄² is defined by

    1 − R̄² = (e′e/(N − K)) / (y′y/(N − 1)) = ((N − 1)/(N − K)) (1 − R²),

or

    R̄² = (1 − K)/(N − K) + ((N − 1)/(N − K)) R².

It is possible for the adjusted R̄² to be negative.
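Both versions of R² and the adjusted R̄² can be computed in a few lines. A sketch on made-up data, using the slide's uncentered definition R² = ŷ′ŷ / y′y:

```python
import numpy as np

# Sketch: R^2 (uncentered, as on the slide) and adjusted R^2 on simulated data.
rng = np.random.default_rng(3)
N, K = 40, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
y = X @ np.array([1.0, 1.0, 0.5]) + rng.normal(size=N)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat
e = y - y_hat

R2 = (y_hat @ y_hat) / (y @ y)          # = 1 - e'e / y'y by Pythagoras
R2_alt = 1 - (e @ e) / (y @ y)
R2_adj = 1 - (N - 1) / (N - K) * (1 - R2)

assert np.isclose(R2, R2_alt)           # the two definitions agree
assert R2_adj <= R2 <= 1                # the penalty can only lower the fit
print(R2, R2_adj)
```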

OLS estimator: What is a good value for R²?

OLS estimator: Prediction intervals

Let y* = x*′β + ε* and ŷ* = x*′β̂_OLS. But what is the confidence interval for y*?

    ŷ* − y* = x*′β̂_OLS − x*′β − ε* = x*′(β̂_OLS − β) − ε*.

Therefore

    Var(ŷ* − y*) = Var(x*′(β̂_OLS − β) − ε*)
                 = Var(x*′(β̂_OLS − β)) + Var(ε*)
                 = σ²(x*′(X′X)⁻¹x* + 1).

OLS estimator: Confidence intervals

Under the normality hypothesis, dividing the standardized error (x*′β̂_OLS − y*)/√(σ²(x*′(X′X)⁻¹x* + 1)) by √(((N − K)s²/σ²)/(N − K)) cancels σ and gives

    (x*′β̂_OLS − y*) / (s √(x*′(X′X)⁻¹x* + 1)) ~ t(N − K).

Thus a (1 − α) confidence interval for y* is

    [ ŷ* − t_{α/2, N−K} s √(x*′(X′X)⁻¹x* + 1),  ŷ* + t_{α/2, N−K} s √(x*′(X′X)⁻¹x* + 1) ].
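A sketch of this interval on simulated data. The t critical value for N − K = 34 degrees of freedom is hard-coded (approximately 2.03) to keep the example dependency-free; x*, the sample size, and the coefficients are all made up:

```python
import numpy as np

# Sketch: 95% interval for a new observation y* at a point x_star.
rng = np.random.default_rng(4)
N, K = 36, 2
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=N)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ (X.T @ y)
e = y - X @ beta_hat
s = np.sqrt(e @ e / (N - K))            # s^2 = e'e / (N - K)

x_star = np.array([1.0, 0.5])
y_star_hat = x_star @ beta_hat
t_crit = 2.032                           # approx t_{0.025, 34}; hard-coded
half = t_crit * s * np.sqrt(x_star @ XtX_inv @ x_star + 1)
lo, hi = y_star_hat - half, y_star_hat + half
print(lo, hi)
```

Note the "+ 1" inside the square root: it accounts for the variance of ε* itself, which is what distinguishes this interval from a confidence interval for the mean response x*′β.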

Restricted regression: The Lagrange Multiplier Theorem

If x* minimizes f(x) subject to g_i(x) = 0 (i = 1, …, m), and if the gradients ∇g₁(x*), …, ∇g_m(x*) are linearly independent, then there exist Lagrange multipliers λ_i, i = 1, …, m, such that

    ∇f(x*) + λ₁∇g₁(x*) + ⋯ + λ_m∇g_m(x*) = 0.

Restricted regression: Restricted OLS

To minimize (y − Xb)′(y − Xb) subject to the constraint Ab = a (where A is q × K), the LMT tells us to form the Lagrangean

    (y − Xb)′(y − Xb) + λ′(Ab − a)

and solve the first-order condition

    −2X′y + 2X′Xb + A′λ = 0.   (2)

Restricted regression: Solving the FOC

Premultiply (2) by A(X′X)⁻¹:

    −2A(X′X)⁻¹X′y + 2 A(X′X)⁻¹(X′X)b + A(X′X)⁻¹A′λ = 0,

where A(X′X)⁻¹(X′X)b = Ab = a. So, solving for λ,

    λ = −2 [A(X′X)⁻¹A′]⁻¹ [a − A(X′X)⁻¹X′y].

Substitute this into (2) to get

    −X′y + X′Xb − A′[A(X′X)⁻¹A′]⁻¹[a − A(X′X)⁻¹X′y] = 0,

which after premultiplying by (X′X)⁻¹, with some work, simplifies to

    b = β̂_OLS + (X′X)⁻¹A′[A(X′X)⁻¹A′]⁻¹(a − Aβ̂_OLS).
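The closed form for b can be checked numerically: the restricted estimator satisfies Ab = a exactly. A sketch with the illustrative restriction β₂ = β₃ (the data and restriction are made up, not from the lecture):

```python
import numpy as np

# Sketch: restricted OLS b = beta_hat + (X'X)^{-1} A' [A (X'X)^{-1} A']^{-1} (a - A beta_hat).
rng = np.random.default_rng(5)
N = 50
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
y = X @ np.array([0.5, 1.0, 1.0]) + rng.normal(size=N)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ (X.T @ y)          # unrestricted OLS

A = np.array([[0.0, 1.0, -1.0]])        # illustrative restriction: beta_2 = beta_3
a = np.array([0.0])
M = A @ XtX_inv @ A.T
b = beta_hat + XtX_inv @ A.T @ np.linalg.solve(M, a - A @ beta_hat)

assert np.allclose(A @ b, a)            # b satisfies the restriction exactly
print(beta_hat, b)
```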

Restricted regression: Restricted residuals

Let e_r = y − Xb be the vector of residuals from the restricted regression. It can be shown that

    e_r′e_r = e_u′e_u + (a − Aβ̂_OLS)′[A(X′X)⁻¹A′]⁻¹(a − Aβ̂_OLS),

where e_u′e_u is the sum of squares from the unrestricted OLS regression. Thus

    e_r′e_r − e_u′e_u = (a − Aβ̂_OLS)′[A(X′X)⁻¹A′]⁻¹(a − Aβ̂_OLS)

is a quadratic form in the q variables a − Aβ̂_OLS.

Restricted regression: Testing a linear restriction

H₀: Aβ = a. Let e_u and e_r be the vectors of residuals from the unrestricted and restricted regressions. Then under the null hypothesis,

    F = ((e_r′e_r − e_u′e_u)/q) / (e_u′e_u/(N − K))

has an F-distribution with (q, N − K) degrees of freedom. The null hypothesis should be rejected if F ≥ F_{1−α, q, N−K}.
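Since e_r′e_r − e_u′e_u equals the quadratic form on the restricted-residuals slide, this residual-based F statistic must agree exactly with the quadratic-form statistic of the earlier theorem. A sketch verifying that on simulated data (restriction and data made up):

```python
import numpy as np

# Sketch: residual-comparison F statistic vs. the quadratic-form version.
rng = np.random.default_rng(6)
N, K, q = 45, 3, 1
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
y = X @ np.array([0.5, 1.0, 1.0]) + rng.normal(size=N)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ (X.T @ y)
e_u = y - X @ beta_hat                              # unrestricted residuals

A, a = np.array([[0.0, 1.0, -1.0]]), np.array([0.0])  # H0: beta_2 = beta_3
M = A @ XtX_inv @ A.T
b = beta_hat + XtX_inv @ A.T @ np.linalg.solve(M, a - A @ beta_hat)
e_r = y - X @ b                                     # restricted residuals

F_resid = ((e_r @ e_r - e_u @ e_u) / q) / (e_u @ e_u / (N - K))
d = a - A @ beta_hat
F_wald = (d @ np.linalg.solve(M, d)) / (q * (e_u @ e_u) / (N - K))
assert np.isclose(F_resid, F_wald)                  # the two forms coincide
print(F_resid)
```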

Restricted regression: [Figure: y projected onto the column space of X to give y_u, and onto the affine set {Xb : Ab = a} to give y_r, with residual vectors e_u and e_r.] Restricted regression with restriction β₁ = 1. The points y_r, y_u, y form a right triangle with hypotenuse y_r y.

Restricted regression: F-tests and t-tests may seem to conflict!

Errors in variables: Measurement error

True model: y = Xβ + ε, but we observe X̃ = X + V. So the estimated model is

    y = X̃β + η,   (3)

where η = ε − Vβ. The OLS estimate derived from (3) is

    β̂ = β + (X̃′X̃)⁻¹X̃′(ε − Vβ).

Its expectation is

    E β̂ = β + E[(X̃′X̃)⁻¹X̃′(ε − Vβ)],

which is not, in general, equal to β: the estimator is neither unbiased nor consistent.

ANOVA: Some jargon

According to Larsen and Marx, the word factor is used to denote any treatment or therapy applied to the subjects being measured, or any relevant feature (age, sex, ethnicity, etc.) characteristic of those subjects. Different versions, extents, or aspects of a factor are referred to as levels. Sometimes subjects or environments share certain characteristics that affect the way levels of a factor respond, yet those characteristics are of no intrinsic interest to the experimenter. Any such set of conditions or subjects is called a block.

ANOVA: ANOVA is an acronym for ANalysis Of VAriance.

ANOVA: Model equations

One factor with k levels. Y_ij is the i-th measurement of the response at factor level j, with n_j observations at level j:

    Y_ij = μ_j + ε_ij,   (i = 1, …, n_j; j = 1, …, k).

n = n₁ + ⋯ + n_k is the total number of observations. The ε_ij are assumed to be independent, with common mean zero and common variance σ². μ_j is just the expected value of the response at level j.

ANOVA: ANOVA is a special case of the Standard Linear Model

Let X_j be a dummy variable (indicator) for the j-th level. Stacking the observations level by level,

    y = Xμ + ε,

where

    y = (y₁₁, …, y_{n₁1}, y₁₂, …, y_{n₂2}, …, y₁k, …, y_{n_k k})′,

ε is stacked the same way, μ = (μ₁, …, μ_k)′, and X is the n × k matrix whose j-th column is the indicator of level j.

ANOVA: OLS and ANOVA

    X′X = diag(n₁, n₂, …, n_k),

the k × k diagonal matrix with the group sizes on the diagonal.

    X′y = ( Σ_{i=1}^{n₁} y_{i1},  Σ_{i=1}^{n₂} y_{i2},  …,  Σ_{i=1}^{n_k} y_{ik} )′.

    (X′X)⁻¹X′y = ( Σ_{i=1}^{n₁} y_{i1}/n₁,  Σ_{i=1}^{n₂} y_{i2}/n₂,  …,  Σ_{i=1}^{n_k} y_{ik}/n_k )′,

so the OLS estimate of μ_j is the sample mean of the responses at level j.
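The displays above say that regressing y on the level dummies reproduces the group sample means, with X′X = diag(n₁, …, n_k). A sketch with made-up group sizes (numpy assumed):

```python
import numpy as np

# Sketch: OLS on level indicators recovers the group sample means.
rng = np.random.default_rng(7)
levels = np.repeat([0, 1, 2], [5, 7, 4])             # k = 3; n_j = 5, 7, 4
y = np.array([1.0, 2.0, 3.0])[levels] + rng.normal(size=levels.size)

X = (levels[:, None] == np.arange(3)).astype(float)  # n x k dummy matrix
mu_hat = np.linalg.solve(X.T @ X, X.T @ y)           # (X'X)^{-1} X'y

group_means = np.array([y[levels == j].mean() for j in range(3)])
assert np.allclose(mu_hat, group_means)              # mu_hat_j = Ybar_j
assert np.allclose(X.T @ X, np.diag([5, 7, 4]))      # X'X = diag(n_1,...,n_k)
print(mu_hat)
```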

The F test in an ANOVA framework: Hypothesis testing in ANOVA

The most common hypothesis is

    H₀: μ₁ = ⋯ = μ_k

against the alternative H₁: not all the μ_j's are equal.

The F test in an ANOVA framework: A little more jargon

y_ij is the response of the i-th observation at level j.

    T_j = Σ_{i=1}^{n_j} y_ij is the response total at level j.
    Ȳ_j = T_j / n_j is the sample mean at level j.
    T = Σ_{j=1}^k Σ_{i=1}^{n_j} y_ij = Σ_{j=1}^k T_j is the sample overall total response.
    Ȳ = (1/n) Σ_{j=1}^k Σ_{i=1}^{n_j} y_ij = (1/n) Σ_{j=1}^k T_j is the sample overall average response.

The F test in an ANOVA framework: The fundamental identity

For any list x₁, …, x_n,

    Σ_{i=1}^n x_i² = Σ_{i=1}^n (x_i − x̄)² + n x̄²,

since

    Σ_{i=1}^n (x_i − x̄)² = Σ_{i=1}^n (x_i² − 2x_i x̄ + x̄²)
                          = Σ_{i=1}^n x_i² − 2x̄ Σ_{i=1}^n x_i + n x̄²
                          = Σ_{i=1}^n x_i² − 2n x̄² + n x̄²
                          = Σ_{i=1}^n x_i² − n x̄².

The F test in an ANOVA framework: The treatment sum of squares

SSTR is defined to be

    SSTR = Σ_{j=1}^k n_j (Ȳ_j − Ȳ)².

It is not hard to show, using the fundamental identity and other tricks (see L&M Theorem 12.2.1), that

    E(SSTR) = (k − 1)σ² + Σ_{j=1}^k n_j (μ_j − μ̄)²,   (4)

where μ̄ = Σ_{j=1}^k (n_j/n) μ_j is the overall average of the (unobserved) μ_j's. That is, a large value of SSTR relative to (k − 1)σ² indicates that the null hypothesis H₀: μ₁ = ⋯ = μ_k = μ should be rejected.

The F test in an ANOVA framework: Estimating σ²

Start by defining

    s_j² = Σ_{i=1}^{n_j} (Y_ij − Ȳ_j)² / (n_j − 1)

and aggregating

    SSE = Σ_{j=1}^k (n_j − 1) s_j² = Σ_{j=1}^k Σ_{i=1}^{n_j} (y_ij − ȳ_j)²,   (5)

which is called the error sum of squares. The important facts about these are:

    SSE/σ² ~ χ²(n − k),

and SSE and SSTR are stochastically independent (L&M, Theorem 12.2.3, p. 600).

The F test in an ANOVA framework: Under the null H₀: μ₁ = ⋯ = μ_k = μ,

    SSTR/σ² ~ χ²(k − 1).

Therefore, under the null,

    F = (SSTR/(k − 1)) / (SSE/(n − k)) ~ F_{k−1, n−k}.

The F test in an ANOVA framework: The F-test

At the α level of significance, reject H₀: μ₁ = ⋯ = μ_k if

    (SSTR/(k − 1)) / (SSE/(n − k)) ≥ F_{1−α, k−1, n−k}.

The F test in an ANOVA framework: ANOVA tables

The traditional way to present ANOVA data is in the form of a table like this:

    Source      df      SS      MS              F                             P
    Treatment   k − 1   SSTR    SSTR/(k − 1)    (SSTR/(k−1)) / (SSE/(n−k))
    Error       n − k   SSE     SSE/(n − k)
    Total       n − 1   SSTOT

Two more terms: the mean square for treatments is MSTR = SSTR/(k − 1); the mean square for errors is MSE = SSE/(n − k). The F column reports F = MSTR/MSE ~ F_{k−1, n−k}, and P is its p-value.
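The table's ingredients are straightforward to compute. A sketch on simulated groups, also checking the sum-of-squares decomposition SSTOT = SSTR + SSE (group means and sizes are made up):

```python
import numpy as np

# Sketch: one-way ANOVA sums of squares and F statistic on simulated groups.
rng = np.random.default_rng(8)
groups = [rng.normal(loc=m, size=n) for m, n in [(0.0, 8), (0.5, 10), (1.5, 9)]]
k = len(groups)
n = sum(g.size for g in groups)
grand_mean = np.concatenate(groups).mean()

SSTR = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
SSE = sum(((g - g.mean()) ** 2).sum() for g in groups)
SSTOT = ((np.concatenate(groups) - grand_mean) ** 2).sum()

MSTR, MSE = SSTR / (k - 1), SSE / (n - k)
F = MSTR / MSE                           # ~ F(k-1, n-k) under H0
assert np.isclose(SSTOT, SSTR + SSE)     # sums of squares decompose
print(F)
```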

The F test in an ANOVA framework: Contrasts

A linear combination of the form C = w′μ, where 1′w = 0 (the weights sum to zero), is called a contrast. A typical contrast uses a vector of the form

    w = (0, …, 0, 1, 0, …, 0, −1, 0, …, 0),

with 1 in position j and −1 in position j′, so C = w′μ = μ_j − μ_{j′}. Then the hypothesis H₀: C = 0 amounts to H₀: μ_j = μ_{j′}. This is probably why it is called a contrast.

The F test in an ANOVA framework: To test a hypothesis that C = 0, we weight the sample means:

    Ĉ = Σ_{j=1}^k w_j Ȳ_j.

Then

    E Ĉ = C,   Var Ĉ = σ² Σ_{j=1}^k w_j²/n_j.

Define

    SS_C = Ĉ² / Σ_{j=1}^k (w_j²/n_j).

The F test in an ANOVA framework: F test of a contrast

The test statistic

    F = SS_C / (SSE/(n − k))

has an F-distribution with (1, n − k) degrees of freedom. The null hypothesis H₀: w′μ = 0 should be rejected if F ≥ F_{1−α, 1, n−k}.
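A sketch of this contrast test for the illustrative contrast w = (1, −1, 0), i.e. H₀: μ₁ = μ₂, on simulated data (group means and sizes made up):

```python
import numpy as np

# Sketch: F test of the contrast C = mu_1 - mu_2 in a 3-level one-way layout.
rng = np.random.default_rng(9)
groups = [rng.normal(loc=m, size=n) for m, n in [(0.0, 8), (0.0, 10), (2.0, 9)]]
k = len(groups)
n = sum(g.size for g in groups)

w = np.array([1.0, -1.0, 0.0])                  # contrast weights: sum to zero
C_hat = sum(wj * g.mean() for wj, g in zip(w, groups))
SS_C = C_hat ** 2 / sum(wj ** 2 / g.size for wj, g in zip(w, groups))

SSE = sum(((g - g.mean()) ** 2).sum() for g in groups)
F = SS_C / (SSE / (n - k))                      # ~ F(1, n-k) under H0
assert np.isclose(w.sum(), 0.0)                 # w really is a contrast
print(F)
```

Here the first two groups share the same population mean, so F should typically be small; a p-value would come from an F(1, n − k) distribution.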


More information

Reliability of inference (1 of 2 lectures)

Reliability of inference (1 of 2 lectures) Reliability of inference (1 of 2 lectures) Ragnar Nymoen University of Oslo 5 March 2013 1 / 19 This lecture (#13 and 14): I The optimality of the OLS estimators and tests depend on the assumptions of

More information

Basic Probability Reference Sheet

Basic Probability Reference Sheet February 27, 2001 Basic Probability Reference Sheet 17.846, 2001 This is intended to be used in addition to, not as a substitute for, a textbook. X is a random variable. This means that X is a variable

More information

Econometrics. 4) Statistical inference

Econometrics. 4) Statistical inference 30C00200 Econometrics 4) Statistical inference Timo Kuosmanen Professor, Ph.D. http://nomepre.net/index.php/timokuosmanen Today s topics Confidence intervals of parameter estimates Student s t-distribution

More information

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 Lecture 2: Linear Models Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector

More information

1 Overview. 2 Multiple Regression framework. Effect Coding. Hervé Abdi

1 Overview. 2 Multiple Regression framework. Effect Coding. Hervé Abdi In Neil Salkind (Ed.), Encyclopedia of Research Design. Thousand Oaks, CA: Sage. 2010 Effect Coding Hervé Abdi 1 Overview Effect coding is a coding scheme used when an analysis of variance (anova) is performed

More information

Econ 3790: Business and Economic Statistics. Instructor: Yogesh Uppal

Econ 3790: Business and Economic Statistics. Instructor: Yogesh Uppal Econ 3790: Business and Economic Statistics Instructor: Yogesh Uppal Email: yuppal@ysu.edu Chapter 13, Part A: Analysis of Variance and Experimental Design Introduction to Analysis of Variance Analysis

More information

STAT 705 Chapter 16: One-way ANOVA

STAT 705 Chapter 16: One-way ANOVA STAT 705 Chapter 16: One-way ANOVA Timothy Hanson Department of Statistics, University of South Carolina Stat 705: Data Analysis II 1 / 21 What is ANOVA? Analysis of variance (ANOVA) models are regression

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 3 Jakub Mućk Econometrics of Panel Data Meeting # 3 1 / 21 Outline 1 Fixed or Random Hausman Test 2 Between Estimator 3 Coefficient of determination (R 2

More information

Categorical Predictor Variables

Categorical Predictor Variables Categorical Predictor Variables We often wish to use categorical (or qualitative) variables as covariates in a regression model. For binary variables (taking on only 2 values, e.g. sex), it is relatively

More information

Motivation for multiple regression

Motivation for multiple regression Motivation for multiple regression 1. Simple regression puts all factors other than X in u, and treats them as unobserved. Effectively the simple regression does not account for other factors. 2. The slope

More information

Analysis of Variance

Analysis of Variance Analysis of Variance Math 36b May 7, 2009 Contents 2 ANOVA: Analysis of Variance 16 2.1 Basic ANOVA........................... 16 2.1.1 the model......................... 17 2.1.2 treatment sum of squares.................

More information

CS 5014: Research Methods in Computer Science

CS 5014: Research Methods in Computer Science Computer Science Clifford A. Shaffer Department of Computer Science Virginia Tech Blacksburg, Virginia Fall 2010 Copyright c 2010 by Clifford A. Shaffer Computer Science Fall 2010 1 / 207 Correlation and

More information

4 Multiple Linear Regression

4 Multiple Linear Regression 4 Multiple Linear Regression 4. The Model Definition 4.. random variable Y fits a Multiple Linear Regression Model, iff there exist β, β,..., β k R so that for all (x, x 2,..., x k ) R k where ε N (, σ

More information

Empirical Economic Research, Part II

Empirical Economic Research, Part II Based on the text book by Ramanathan: Introductory Econometrics Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna December 7, 2011 Outline Introduction

More information

Advanced Quantitative Methods: ordinary least squares

Advanced Quantitative Methods: ordinary least squares Advanced Quantitative Methods: Ordinary Least Squares University College Dublin 31 January 2012 1 2 3 4 5 Terminology y is the dependent variable referred to also (by Greene) as a regressand X are the

More information

Answers to Problem Set #4

Answers to Problem Set #4 Answers to Problem Set #4 Problems. Suppose that, from a sample of 63 observations, the least squares estimates and the corresponding estimated variance covariance matrix are given by: bβ bβ 2 bβ 3 = 2

More information

Advanced Econometrics I

Advanced Econometrics I Lecture Notes Autumn 2010 Dr. Getinet Haile, University of Mannheim 1. Introduction Introduction & CLRM, Autumn Term 2010 1 What is econometrics? Econometrics = economic statistics economic theory mathematics

More information

[y i α βx i ] 2 (2) Q = i=1

[y i α βx i ] 2 (2) Q = i=1 Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation

More information

AMS 315/576 Lecture Notes. Chapter 11. Simple Linear Regression

AMS 315/576 Lecture Notes. Chapter 11. Simple Linear Regression AMS 315/576 Lecture Notes Chapter 11. Simple Linear Regression 11.1 Motivation A restaurant opening on a reservations-only basis would like to use the number of advance reservations x to predict the number

More information

Chapter 4. Regression Models. Learning Objectives

Chapter 4. Regression Models. Learning Objectives Chapter 4 Regression Models To accompany Quantitative Analysis for Management, Eleventh Edition, by Render, Stair, and Hanna Power Point slides created by Brian Peterson Learning Objectives After completing

More information

STAT420 Midterm Exam. University of Illinois Urbana-Champaign October 19 (Friday), :00 4:15p. SOLUTIONS (Yellow)

STAT420 Midterm Exam. University of Illinois Urbana-Champaign October 19 (Friday), :00 4:15p. SOLUTIONS (Yellow) STAT40 Midterm Exam University of Illinois Urbana-Champaign October 19 (Friday), 018 3:00 4:15p SOLUTIONS (Yellow) Question 1 (15 points) (10 points) 3 (50 points) extra ( points) Total (77 points) Points

More information

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION. ST4233 Linear Models: Solutions. (Semester I: ) November/December, 2007 Time Allowed : 2 Hours

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION. ST4233 Linear Models: Solutions. (Semester I: ) November/December, 2007 Time Allowed : 2 Hours NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION Linear Models: Solutions (Semester I: 2007 2008) November/December, 2007 Time Allowed : 2 Hours Matriculation No: Grade Table Problem 1 2 3 4 Total Full marks

More information

Formal Statement of Simple Linear Regression Model

Formal Statement of Simple Linear Regression Model Formal Statement of Simple Linear Regression Model Y i = β 0 + β 1 X i + ɛ i Y i value of the response variable in the i th trial β 0 and β 1 are parameters X i is a known constant, the value of the predictor

More information

Inference for Regression Simple Linear Regression

Inference for Regression Simple Linear Regression Inference for Regression Simple Linear Regression IPS Chapter 10.1 2009 W.H. Freeman and Company Objectives (IPS Chapter 10.1) Simple linear regression p Statistical model for linear regression p Estimating

More information

y ˆ i = ˆ " T u i ( i th fitted value or i th fit)

y ˆ i = ˆ  T u i ( i th fitted value or i th fit) 1 2 INFERENCE FOR MULTIPLE LINEAR REGRESSION Recall Terminology: p predictors x 1, x 2,, x p Some might be indicator variables for categorical variables) k-1 non-constant terms u 1, u 2,, u k-1 Each u

More information

STAT5044: Regression and Anova. Inyoung Kim

STAT5044: Regression and Anova. Inyoung Kim STAT5044: Regression and Anova Inyoung Kim 2 / 47 Outline 1 Regression 2 Simple Linear regression 3 Basic concepts in regression 4 How to estimate unknown parameters 5 Properties of Least Squares Estimators:

More information

Econometrics Multiple Regression Analysis: Heteroskedasticity

Econometrics Multiple Regression Analysis: Heteroskedasticity Econometrics Multiple Regression Analysis: João Valle e Azevedo Faculdade de Economia Universidade Nova de Lisboa Spring Semester João Valle e Azevedo (FEUNL) Econometrics Lisbon, April 2011 1 / 19 Properties

More information

Inferences for Regression

Inferences for Regression Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In

More information

Simple linear regression

Simple linear regression Simple linear regression Biometry 755 Spring 2008 Simple linear regression p. 1/40 Overview of regression analysis Evaluate relationship between one or more independent variables (X 1,...,X k ) and a single

More information

ANOVA (Analysis of Variance) output RLS 11/20/2016

ANOVA (Analysis of Variance) output RLS 11/20/2016 ANOVA (Analysis of Variance) output RLS 11/20/2016 1. Analysis of Variance (ANOVA) The goal of ANOVA is to see if the variation in the data can explain enough to see if there are differences in the means.

More information

WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, Academic Year Exam Version: A

WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, Academic Year Exam Version: A WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, 2016-17 Academic Year Exam Version: A INSTRUCTIONS TO STUDENTS 1 The time allowed for this examination paper is 2 hours. 2 This

More information

The regression model with one fixed regressor cont d

The regression model with one fixed regressor cont d The regression model with one fixed regressor cont d 3150/4150 Lecture 4 Ragnar Nymoen 27 January 2012 The model with transformed variables Regression with transformed variables I References HGL Ch 2.8

More information

LECTURE 5 HYPOTHESIS TESTING

LECTURE 5 HYPOTHESIS TESTING October 25, 2016 LECTURE 5 HYPOTHESIS TESTING Basic concepts In this lecture we continue to discuss the normal classical linear regression defined by Assumptions A1-A5. Let θ Θ R d be a parameter of interest.

More information

Greene, Econometric Analysis (7th ed, 2012)

Greene, Econometric Analysis (7th ed, 2012) EC771: Econometrics, Spring 2012 Greene, Econometric Analysis (7th ed, 2012) Chapters 2 3: Classical Linear Regression The classical linear regression model is the single most useful tool in econometrics.

More information

Chapter 14 Student Lecture Notes Department of Quantitative Methods & Information Systems. Business Statistics. Chapter 14 Multiple Regression

Chapter 14 Student Lecture Notes Department of Quantitative Methods & Information Systems. Business Statistics. Chapter 14 Multiple Regression Chapter 14 Student Lecture Notes 14-1 Department of Quantitative Methods & Information Systems Business Statistics Chapter 14 Multiple Regression QMIS 0 Dr. Mohammad Zainal Chapter Goals After completing

More information

SSR = The sum of squared errors measures how much Y varies around the regression line n. It happily turns out that SSR + SSE = SSTO.

SSR = The sum of squared errors measures how much Y varies around the regression line n. It happily turns out that SSR + SSE = SSTO. Analysis of variance approach to regression If x is useless, i.e. β 1 = 0, then E(Y i ) = β 0. In this case β 0 is estimated by Ȳ. The ith deviation about this grand mean can be written: deviation about

More information