Generating OLS Results Manually via R


Sujan Bandyopadhyay

Statistical software packages have made it extremely easy to run regression analyses. Functions like lm() in R or the reg command in Stata give quick and well-compiled results. With this ease, however, people often do not know, or forget, how to actually conduct these analyses manually. In this article, we manually recreate the regression results produced by R's lm() function.

Using the mtcars data set, which comes pre-loaded in R, we first produce regression results with lm(). We arbitrarily pick mpg as the dependent variable, and disp, hp, and wt as the independent variables:

    mpg = β₀ + β₁·disp + β₂·hp + β₃·wt + ε    (1)

Find the R output below:

# Loading the data
data(mtcars)

# lm() for OLS
lm_package <- lm(mtcars$mpg ~ mtcars$disp + mtcars$hp + mtcars$wt)

# Show results
summary(lm_package)

Call:
lm(formula = mtcars$mpg ~ mtcars$disp + mtcars$hp + mtcars$wt)

Residuals:
   Min     1Q Median     3Q    Max
-3.891 -1.640 -0.172  1.061  5.861

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) 37.105505   2.110815  17.579  < 2e-16 ***
mtcars$disp -0.000937   0.010350  -0.091  0.92851
mtcars$hp   -0.031157   0.011436  -2.724  0.01097 *
mtcars$wt   -3.800891   1.066191  -3.565  0.00133 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.639 on 28 degrees of freedom
Multiple R-squared:  0.8268,    Adjusted R-squared:  0.8083
F-statistic: 44.57 on 3 and 28 DF,  p-value: 8.65e-11
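Before reproducing these numbers by hand, it helps to know where each of them lives inside the fitted object, so that every manual result below can be checked against its packaged counterpart. A minimal sketch using base R's standard accessors:

# Benchmark values to reproduce, read off the fitted lm object
coef(lm_package)                   # coefficient estimates
summary(lm_package)$coefficients   # estimates, std. errors, t values, p-values
summary(lm_package)$r.squared      # multiple R-squared
summary(lm_package)$adj.r.squared  # adjusted R-squared
summary(lm_package)$fstatistic     # F statistic with its degrees of freedom
summary(lm_package)$sigma          # residual standard error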

We start with getting the correct β coefficients for the independent variables. Since there are 32 observations and 3 independent variables (plus a constant), the model in matrix form is

    Y_{32×1} = X_{32×4} β_{4×1} + ε_{32×1}    (2)

    [ y₁  ]   [ 1  disp₁   hp₁   wt₁  ] [ β₀ ]   [ ε₁  ]
    [ y₂  ] = [ 1  disp₂   hp₂   wt₂  ] [ β₁ ] + [ ε₂  ]
    [ ⋮   ]   [ ⋮    ⋮      ⋮     ⋮   ] [ β₂ ]   [ ⋮   ]
    [ y₃₂ ]   [ 1  disp₃₂  hp₃₂  wt₃₂ ] [ β₃ ]   [ ε₃₂ ]

We know that the formula to calculate the vector of β coefficients is

    β̂ = (X′X)⁻¹(X′Y)    (3)

For the matrix X, we must remember to add the column of ones, as there is a constant in our model. Using the specification in equation (2), we can define the matrices Y and X and use equation (3) to generate the vector β̂. Find the R output below:

# Dependent variable (Y)
Y <- matrix(mtcars$mpg)
colnames(Y) <- "mpg"

# Independent variables (X), starting with the column of ones
v <- rep(1, length(mtcars$disp))
X <- matrix(v)
X <- cbind(X, mtcars$disp, mtcars$hp, mtcars$wt)
colnames(X) <- c("constant", "disp", "hp", "wt")

# Matrix operations
# Transpose of X
X_t <- t(X)

# Generating X'Y
first <- X_t %*% Y

# Generating X'X
second <- X_t %*% X

# Inverse: (X'X)^(-1)
second_2 <- solve(second)

# Beta vector = (X'X)^(-1) (X'Y)
manual <- second_2 %*% first
manual

                   mpg
constant 37.1055052690
disp     -0.0009370091
hp       -0.0311565508
wt       -3.8008905826

So here we see that we have exactly reproduced the vector of β̂.
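As a sanity check, the hand-built X should match the design matrix that lm() constructed internally, and the normal equations can also be solved without explicitly inverting X′X, which is numerically safer on ill-conditioned data. A minimal sketch:

# model.matrix() returns the design matrix lm() used, intercept column included
X_lm <- model.matrix(lm_package)
max(abs(X_lm - X))   # 0 if the two matrices agree entry by entry

# Solving the normal equations directly, rather than forming the inverse:
# crossprod(X) is X'X and crossprod(X, Y) is X'Y
beta_check <- solve(crossprod(X), crossprod(X, Y))
cbind(manual, beta_check)   # the two columns should be identical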

The next step is to try and recreate some of the summary measures, starting with the R-squared. Throughout, TSS denotes the total sum of squares, RSS the regression sum of squares, and ESS the error (residual) sum of squares:

    TSS = RSS + ESS    (4)

    TSS = Σᵢ (yᵢ − ȳ)²    (5)

    ESS = Σᵢ (yᵢ − ŷᵢ)²    (6)

    R² = 1 − ESS/TSS    (7)

Find the R output below:

# Calculating R-squared

# Total sum of squares (TSS)
# Mean of Y (Y_bar)
Y_bar <- mean(Y)
TSS_i <- (Y - Y_bar)^2
TSS <- sum(TSS_i)

# Error sum of squares (ESS)
# Calculating the predicted value of Y (Y_hat)
Beta_1 <- manual[1]
Beta_2 <- manual[2]
Beta_3 <- manual[3]
Beta_4 <- manual[4]
Y_hat <- Beta_1 + (Beta_2 * mtcars$disp) + (Beta_3 * mtcars$hp) + (Beta_4 * mtcars$wt)
ESS_i <- (Y - Y_hat)^2
ESS <- sum(ESS_i)

# R-squared
R_squared <- 1 - ESS/TSS
R_squared

[1] 0.8268361

# Summary stats for residuals
summary(Y - Y_hat)

      mpg
 Min.   :-3.891
 1st Qu.:-1.640
 Median :-0.172
 Mean   : 0.000
 3rd Qu.: 1.061
 Max.   : 5.861
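The summary output also reports an adjusted R-squared of 0.8083, which we have not yet reproduced. It follows from R² by penalizing the number of estimated coefficients; a minimal sketch, with n = 32 observations and p = 4 coefficients:

# Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p)
n <- 32
p <- 4
adj_R_squared <- 1 - (1 - R_squared) * (n - 1) / (n - p)
adj_R_squared   # matches the 0.8083 reported by summary()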

We move on to calculating the F statistic, with n = 32 observations and p = 4 estimated coefficients:

    MSM = RSS / DFM    (8)

    DFM = p − 1    (9)

    MSE = ESS / DFE    (10)

    DFE = n − p    (11)

    F = MSM / MSE    (12)

Find the R output below:

# Calculating the F statistic

# Regression sum of squares
RSS <- TSS - ESS

# Regression degrees of freedom
DFM <- 4 - 1

# Mean squares of model
MSM <- RSS / DFM

# Error degrees of freedom
DFE <- 32 - 4

# Mean square error
MSE <- ESS / DFE

# F statistic
f <- MSM / MSE
f

[1] 44.56552
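The summary output pairs this F statistic with a p-value (8.65e-11). That p-value is the upper-tail probability of an F distribution with DFM and DFE degrees of freedom, which base R's pf() returns directly; a minimal sketch:

# p-value of the F statistic: P(F(DFM, DFE) > f)
p_value <- pf(f, DFM, DFE, lower.tail = FALSE)
p_value   # matches the 8.65e-11 reported by summary()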

Finally, we calculate the standard errors of the coefficients from the variance-covariance matrix of β̂:

    V[β̂] = σ²(X′X)⁻¹    (13)

    σ̂² = ε′ε / (n − p)    (14)

Find the R output below:

# Standard errors

# Residuals
resid <- Y - X %*% manual

# Estimating sigma squared (sigma_hat_square)
sigma_hat_square <- (t(resid) %*% resid) / (32 - 4)

# Variance-covariance matrix of beta_hat
vcov_beta <- c(sigma_hat_square) * solve(t(X) %*% X)

# Standard errors
se <- sqrt(diag(vcov_beta))
se

  constant       disp         hp         wt
2.11081525 0.01034974 0.01143579 1.06619064
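With the estimates and standard errors in hand, the remaining columns of the coefficient table, the t values and their two-sided p-values, follow directly, as does the residual standard error; a minimal sketch using the t distribution with DFE = n − p degrees of freedom:

# t values: estimate divided by standard error
t_values <- c(manual) / se

# Two-sided p-values from the t distribution with DFE degrees of freedom
p_values <- 2 * pt(-abs(t_values), df = DFE)

# Residual standard error: square root of the estimated error variance
rse <- sqrt(c(sigma_hat_square))

# The assembled table should match the Coefficients block of summary(lm_package)
cbind(estimate = c(manual), std_error = se, t_value = t_values, p_value = p_values)
rse   # 2.639, the "Residual standard error" in the summary output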

Thus, we have manually reproduced all of the key statistics produced by the lm() function, namely the coefficients, the R-squared statistic, the F statistic, and the standard errors of the coefficients. Computing these statistics manually is obviously time-consuming and inefficient, and the available software packages are far more efficient. Having said that, it is useful to conduct this exercise from time to time to refresh one's theoretical knowledge of regression analysis. And for anyone who uses statistical software mechanically, without understanding the theory underlying regression analysis, working through such an exercise is highly recommended.