
11.4.1 Estimation of Multiple Regression Coefficients

In multiple linear regression, we essentially solve n equations for the p unknown parameters. Thus n must be equal to or greater than p, and in practice n should be at least 3 or 4 times as large as p. The difference between the observed and the predicted value of y (using regression), i.e. the error, is $e_i = y_i - \hat{y}_i$. The regression coefficients are obtained by minimizing the sum of squares of the errors. In matrix form, the n equations can be written as

$Y = X\beta + e$   (11.36)

where Y = (n x 1) column vector of the dependent variable, X = (n x p) matrix of independent variables, β = (p x 1) column vector of the regression coefficients, and e = (n x 1) column vector of residuals. The residuals are conditioned by:

$E[e] = 0$   (11.37)

$\mathrm{Cov}(e) = \sigma_e^2 I$   (11.38)

where I = (n x n) diagonal identity matrix with diagonal elements = 1 and off-diagonal elements = 0, and $\sigma_e^2$ = variance of (Y - Xβ). According to the least squares principle, the estimates b of the regression parameters are those which minimize the residual sum of squares e'e. Hence

$e'e = (Y - Xb)'(Y - Xb)$   (11.39)

is differentiated with respect to b, and the resulting expression is set equal to zero. This gives:

$X'X b = X'Y$   (11.40)

which are called the normal equations. Multiplying both sides by $(X'X)^{-1}$ leads to an explicit expression for b:

$b = (X'X)^{-1} X'Y$   (11.41)

whose dimensions check as (p x 1) = [(p x n)(n x p)]$^{-1}$ (p x n)(n x 1).
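To make the estimation procedure concrete, here is a minimal NumPy sketch of the normal equations (11.40) and the estimator (11.41). The data set (two independent variables plus an intercept) is hypothetical and purely illustrative.

```python
import numpy as np

# Hypothetical data: n = 6 observations, p = 3 parameters
# (an intercept plus two independent variables).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
Y  = np.array([3.1, 4.9, 9.2, 10.8, 15.1, 16.9])

# Design matrix X (n x p): a column of ones for the intercept,
# then one column per independent variable.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Normal equations (11.40): X'X b = X'Y, solved for b as in (11.41).
# Solving the linear system is preferred over forming (X'X)^-1 explicitly.
b = np.linalg.solve(X.T @ X, X.T @ Y)

e = Y - X @ b                      # residual vector
print("b =", b)
print("residual sum of squares e'e =", e @ e)
```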

Note that the independent variables should be chosen such that none of them is a linear combination of the other independent variables. The covariance of the estimator b is:

$\mathrm{Cov}(b) = \sigma_e^2 (X'X)^{-1}$   (11.42)

Using (11.36) and (11.41), the total sum of squares Y'Y can be partitioned into an explained part due to regression and an unexplained part about regression, as follows:

$Y'Y = b'X'Y + e'e$   (11.43)

where b'X'Y = sum of squares due to regression and e'e = sum of squares about regression. Subtracting $n\bar{y}^2$ from both sides, this equation states: total sum of squares about the mean = regression sum of squares + residual sum of squares. The mean square values of the right-hand-side terms in (11.43) are obtained by dividing the sums of squares by their corresponding degrees of freedom. If b is a (p x 1) column vector, i.e. there are p - 1 independent variables plus the intercept, then the regression sum of squares about the mean has p - 1 degrees of freedom. Since the total sum of squares about the mean has n - 1 degrees of freedom (note: 1 degree of freedom is lost due to the estimation of ȳ), it follows by subtraction that the residual sum of squares has (n - 1) - (p - 1) = n - p degrees of freedom. It can be shown that the residual mean square $S_e^2$:

$S_e^2 = \frac{e'e}{n - p}$   (11.44)

is an unbiased estimate of $\sigma_e^2$. The estimate $s_e$ of $\sigma_e$ is the standard error of estimate. The analysis of variance (ANOVA) table (see Table 11.2) summarizes the sum of squares quantities.

Table 11.2: Analysis of variance table (ANOVA)

Source     | Sum of squares        | Degrees of freedom
Total      | $S_Y = Y'Y$           | n
Mean       | $n\bar{y}^2$          | 1
Regression | $b'X'Y - n\bar{y}^2$  | p - 1
Residual   | $Y'Y - b'X'Y$         | n - p

As for simple linear regression, a measure of the quality of the regression equation is the coefficient of determination, defined as the ratio of the explained (regression) sum of squares to the total sum of squares:

$R_m^2 = \frac{b'X'Y}{Y'Y} = 1 - \frac{e'e}{Y'Y}$   (11.45)

11.4.2 Confidence Intervals on the Regression Line

To place confidence limits on $\hat{Y}_0$, where $\hat{Y}_0 = X_0 b$, it is necessary to have an estimate of the variance of $\hat{Y}_0$. Considering Cov(b) as given in (11.42), the variance $\mathrm{Var}(\hat{Y}_0)$ is:

$\mathrm{Var}(\hat{Y}_0) = S_e^2\, X_0 (X'X)^{-1} X_0'$   (11.46)

The confidence limits for the mean regression equation are given by

$CL = X_0 b \pm t_{1-\alpha/2,\,n-p} \sqrt{\mathrm{Var}(\hat{Y}_0)}$   (11.47)

Coefficient of Determination (R^2)

Let

$Z_{i,j} = (X_{i,j} - \bar{x}_j)/S_j$   (11.48)

where $\bar{x}_j$ and $S_j$ are the mean and standard deviation of the j-th independent variable. The correlation matrix is:

$R = Z'Z/(n-1) = [R_{i,j}]$   (11.49)

where $R_{i,j}$ is the correlation between the i-th and j-th independent variables. R is a symmetric matrix since $R_{i,j} = R_{j,i}$. The coefficient of determination is defined as

R^2 = (sum of squares due to regression) / (sum of squares about the mean)

or

$R^2 = \frac{b'X'Y - n\bar{y}^2}{Y'Y - n\bar{y}^2}$   (11.50)

Here b' is the transpose of the vector b, of size (1 x p), and Y' is the transpose of the vector Y, of size (1 x n); the residual error is e = Y - Xb. R^2 is the part of the total sum of squares corrected for the mean that is explained by the regression equation. It ranges between 0 and 1, and the closer it is to 1, the better the regression.
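The sum-of-squares partition of Table 11.2 and the coefficient of determination (11.50) are straightforward to compute. The following sketch continues the hypothetical data set introduced above; all names and numbers are illustrative.

```python
import numpy as np

# The same hypothetical data set as in the previous sketch.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
Y  = np.array([3.1, 4.9, 9.2, 10.8, 15.1, 16.9])
X  = np.column_stack([np.ones_like(x1), x1, x2])
n, p = X.shape

b = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ b

# Sum-of-squares rows of Table 11.2:
ss_total_raw  = Y @ Y                    # Y'Y, n d.f.
ss_mean       = n * Y.mean() ** 2        # n*ybar^2, 1 d.f.
ss_regression = b @ X.T @ Y - ss_mean    # b'X'Y - n*ybar^2, p-1 d.f.
ss_residual   = e @ e                    # Y'Y - b'X'Y, n-p d.f.

# Coefficient of determination, (11.50):
R2 = ss_regression / (ss_total_raw - ss_mean)
print("R^2 =", R2)
```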

11.4.3 Inferences on Regression Coefficients

(i) Confidence intervals on $\beta_i$

Assuming that the model is correct, the quantity $(\hat{\beta}_i - \beta_i)/S_{\hat{\beta}_i}$, where $S_{\hat{\beta}_i}$ is the standard error of $\hat{\beta}_i$ obtained from (11.42), follows a t-distribution with (n - p) degrees of freedom. The confidence interval on $\beta_i$ is therefore:

$(\hat{L}, \hat{U}) = \left(\hat{\beta}_i - t_{1-\alpha/2,\,n-p}\, S_{\hat{\beta}_i},\ \hat{\beta}_i + t_{1-\alpha/2,\,n-p}\, S_{\hat{\beta}_i}\right)$   (11.51)

(ii) Test of hypothesis concerning $\beta_i$

The hypothesis that the i-th variable does not contribute significantly to explaining the variation in the dependent variable is equivalent to testing $H_0: \beta_i = \beta_{i0}$ versus $H_a: \beta_i \neq \beta_{i0}$, with $\beta_{i0} = 0$. The test is conducted by computing:

$t = (\hat{\beta}_i - \beta_{i0}) / S_{\hat{\beta}_i}$   (11.52)

The null hypothesis $H_0$ is rejected if $|t| > t_{1-\alpha/2,\,n-p}$. If the hypothesis is accepted, it is advisable to delete the concerned variable from the regression model.

Significance of the overall regression

The null hypothesis $H_0: \beta_1 = \beta_2 = \dots = \beta_{p-1} = 0$ (the intercept is not restricted) versus $H_a$: at least one of these β's is not zero is used to test whether the regression equation is able to explain a significant amount of the variation of Y. The ratio of the mean square due to regression to the residual mean square has an F-distribution with (p - 1) and (n - p) degrees of freedom. Hence, the hypothesis is tested by computing the test statistic:

$F = \frac{(b'X'Y - n\bar{y}^2)/(p - 1)}{(Y'Y - b'X'Y)/(n - p)}$   (11.53)

$H_0$ is rejected if F exceeds the critical value $F_{1-\alpha,\,p-1,\,n-p}$.
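As an illustration of how (11.51)-(11.53) are applied, the sketch below continues the hypothetical data set from the previous examples, using scipy.stats for the t and F critical values.

```python
import numpy as np
from scipy import stats

# The same hypothetical data set as in the previous sketches.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
Y  = np.array([3.1, 4.9, 9.2, 10.8, 15.1, 16.9])
X  = np.column_stack([np.ones_like(x1), x1, x2])
n, p = X.shape
alpha = 0.05

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y
e = Y - X @ b
s2e = (e @ e) / (n - p)                  # residual mean square (11.44)

# t-tests of H0: beta_i = 0, (11.52); S_bhat from Cov(b) = s2e (X'X)^-1.
se_b = np.sqrt(s2e * np.diag(XtX_inv))
t = b / se_b
t_crit = stats.t.ppf(1 - alpha / 2, n - p)
print("reject H0: beta_i = 0 ?", np.abs(t) > t_crit)

# Overall F test, (11.53).
ss_reg = b @ X.T @ Y - n * Y.mean() ** 2
F = (ss_reg / (p - 1)) / s2e
F_crit = stats.f.ppf(1 - alpha, p - 1, n - p)
print("F =", F, "; reject overall H0 ?", F > F_crit)
```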

Confidence Intervals on the Regression Line

To put confidence limits on $\hat{Y} = Xb$, it is necessary to estimate the variance of $\hat{Y}$. This is given by

$S_{\hat{Y}}^2 = S_e^2\, X (X'X)^{-1} X'$   (11.54)

with limits

$(L, U) = \left(\hat{Y} - t_{1-\alpha/2,\,n-p}\, S_{\hat{Y}},\ \hat{Y} + t_{1-\alpha/2,\,n-p}\, S_{\hat{Y}}\right)$

Confidence Intervals on an Individual Predicted Value of Y

For a new observation with row vector $X_K$, the predicted value is

$\hat{Y}_K = X_K \hat{\beta}$   (11.55)

with variance

$S_{\hat{Y}_K}^2 = S_e^2 \left[1 + X_K (X'X)^{-1} X_K'\right]$   (11.56)

and limits

$(L', U') = \left(\hat{Y}_K - t_{1-\alpha/2,\,n-p}\, S_{\hat{Y}_K},\ \hat{Y}_K + t_{1-\alpha/2,\,n-p}\, S_{\hat{Y}_K}\right)$
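A minimal sketch of (11.54)-(11.56) for a single new point, again on the hypothetical data set above; the point x0 is arbitrary. The only difference between the two intervals is the "1 +" term in the variance, which accounts for the scatter of an individual observation about the regression line.

```python
import numpy as np
from scipy import stats

# The same hypothetical data set as in the previous sketches.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
Y  = np.array([3.1, 4.9, 9.2, 10.8, 15.1, 16.9])
X  = np.column_stack([np.ones_like(x1), x1, x2])
n, p = X.shape
alpha = 0.05

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y
e = Y - X @ b
s2e = (e @ e) / (n - p)
t_crit = stats.t.ppf(1 - alpha / 2, n - p)

x0 = np.array([1.0, 3.5, 3.5])   # hypothetical new point (leading 1 = intercept)
y0 = x0 @ b

var_mean = s2e * (x0 @ XtX_inv @ x0)          # variance of the mean response (11.54)
var_pred = s2e * (1.0 + x0 @ XtX_inv @ x0)    # variance of an individual value (11.56)

print("CI on regression line:", (y0 - t_crit * np.sqrt(var_mean),
                                 y0 + t_crit * np.sqrt(var_mean)))
print("prediction interval:  ", (y0 - t_crit * np.sqrt(var_pred),
                                 y0 + t_crit * np.sqrt(var_pred)))
```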

Example 11.5: Table 11.3 contains the rainfall for the months of July and August and the discharge for the month of August for a catchment. Estimate the parameters of linear regression (LR) and multiple linear regression (MLR) and find out if there is an advantage in using multiple linear regression in this case.

Table 11.3: Data and computations for the multiple linear regression example

YEAR | RF-JUL (MCM) | RF-AUG (MCM) | Obs. Q Aug (MCM) | Comp. Q, Lin. Reg. (Q_L) | (Q_o - Q_L)^2 | Comp. Q, Mult. Lin. Reg. (Q_M) | (Q_o - Q_M)^2
1982 | 500.04 | 15664.05 | 5996.939 | 6830.0 | 694015.6 | 6873.0 | 76753
1983 | 7980.13 | 6546.4 | 557.916 | 363.6 | 497987.7 | 357.3 | 108983
1984 | 300.36 | 13086.63 | 4395.515 | 581.9 | 034467.0 | 1164 |
1985 | 857.75 | 753.13 | 575.0 | 3649. | 4308915.7 | 4314.0 | 1990914
1986 | 54.03 | 5799.34 | 53.373 | 971.4 | 19787. | 045. | 3739
1987 | 6311.05 | 95.80 | 774.517 | 447.9 | 733589. | 4353.5 | 49339
1988 | 6040.00 | 785.46 | 4163.013 | 355.7 | 3747.7 | 313.1 | 108147
1989 | 1597.33 | 69.49 | 046.694 | 3410.8 | 186070.7 | 1068.8 | 95635
1990 | 8561.71 | 6889.43 | 4190.084 | 3397.8 | 6765.7 | 3988.7 | 40541
1991 | 7153.31 | 1566.8 | 6107.45 | 5618.5 | 39036.6 | 67.3 | 14365
1992 | 563.67 | 1063.08 | 5145.44 | 4717.4 | 183188.8 | 4433.0 | 507510
1993 | 433.30 | 7108.91 | 300.774 | 3483.7 | 139981.8 | 73. | 759
1994 | 13076.88 | 1047.3 | 8994.085 | 4799. | 17596705.8 | 7679.9 | 177088
1995 | 6843.64 | 8068.47 | 3695.11 | 3859.0 | 6865.5 | 385.6 | 4788
1996 | 7819.49 | 9330.16 | 4870.4 | 435.5 | 68196.6 | 4893.4 | 531
1997 | 9403.8 | 744.9 | 3943.455 | 3607.3 | 113005.8 | 4610.9 | 445541
1998 | 7040.85 | 8306.55 | 3801.77 | 395.1 | 64.6 | 4054.5 | 63883
1999 | 7380.56 | 9987.30 | 5895.899 | 4609.6 | 1654653.4 | 5036. | 739043
2000 | 860.8 | 483.79 | 1501.445 | 378.6 | 769480.6 | 713.5 | 1469074
2001 | 9113.46 | 5071.5 | 670.739 | 686.8 | 56.8 | 3314.4 | 414338
2002 | 196.93 | 11168.68 | 319.95 | 5071.7 | 359547.4 | 3060.5 | 17531
2003 | 7493.84 | 7784.6 | 3708.33 | 3748.0 | 157.8 | 3985.1 | 76593
Sum | 14747.41 | 191085.61 | 9009.88 | 9009.88 | 3.91E+07 | 9.0E+04 | 1.4E+07
Average | 6701.5 | 8685.71 | 4100.45 | 4100.449 | | |

Solution: Using the data given in the table, a linear regression equation of the following form was established between the rainfall and the observed discharge for the month of August:

Q_A = a + b R_A

where Q_A = discharge for August and R_A = rainfall for August. The parameters a and b were estimated to yield the following equation:

Q_A = 703.05 + 0.391 R_A

Coefficient of determination: $R^2 = 1 - 3.91 \times 10^7 / 6.33 \times 10^7 = 0.38$.

Next, the discharge for the month of August was computed using the above equation, and the sum of squares of the residuals turned out to be $3.91 \times 10^7$. In the case of multiple linear regression, the independent variables were the rainfall for the months of July and August, and the dependent variable was the discharge for August. A regression equation of the following form was envisaged:

Q_A = a + b_1 R_J + b_2 R_A

where R_J = rainfall for the month of July. After computations, the following regression equation was obtained:

Q_A = -3058.4 + 0.4 R_J + 0.50 R_A

Coefficient of determination: $R^2 = 1 - 1.4 \times 10^7 / 6.33 \times 10^7 = 0.78$.

The discharge for August was computed by the LR and MLR equations, and the sums of squares of errors were compared: $3.91 \times 10^7$ for LR against $1.4 \times 10^7$ for MLR. When these values are considered along with $R^2$ for the two cases, it can be concluded that MLR gives much improved estimates of the discharge compared to LR.
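The LR-versus-MLR comparison of Example 11.5 can be reproduced with a short script. In the sketch below the rainfall and discharge arrays are hypothetical stand-ins, not the values of Table 11.3, so the fitted coefficients and statistics will differ from those above; the workflow, however, is the same: fit both models and compare residual sums of squares and R^2.

```python
import numpy as np

def fit_ols(X, Y):
    """Least-squares fit; returns coefficients, residual SS and R^2."""
    b = np.linalg.solve(X.T @ X, X.T @ Y)
    e = Y - X @ b
    ss_res = e @ e
    ss_tot = Y @ Y - len(Y) * Y.mean() ** 2   # total SS about the mean
    return b, ss_res, 1.0 - ss_res / ss_tot

# Hypothetical rainfall/discharge series standing in for Table 11.3 (MCM).
rf_jul = np.array([5000., 7900., 3200., 8500., 5400., 6300., 6000., 1600.])
rf_aug = np.array([15600., 6500., 13000., 7500., 5800., 9200., 7800., 6900.])
q_aug  = np.array([6000., 2600., 4400., 5700., 2500., 2800., 4200., 2000.])

ones = np.ones_like(q_aug)
# LR:  Q_A = a + b * R_A
b_lr, sse_lr, r2_lr = fit_ols(np.column_stack([ones, rf_aug]), q_aug)
# MLR: Q_A = a + b1 * R_J + b2 * R_A
b_mlr, sse_mlr, r2_mlr = fit_ols(np.column_stack([ones, rf_jul, rf_aug]), q_aug)

print(f"LR : SSE = {sse_lr:.3e}, R^2 = {r2_lr:.2f}")
print(f"MLR: SSE = {sse_mlr:.3e}, R^2 = {r2_mlr:.2f}")
```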