Chapter 14: Simple Linear Regression


Chapter 14: Simple Linear Regression

1. Introduction to regression analysis (14-2)
2. The Regression Equation: Linear Functions (14-4)
3. Estimation and interpretation of model parameters (14-6)
4. Inference on the model parameters (14-11)
5. Sums of Squares in Regression and the ANOVA table (14-15)
6. An example (14-20)
7. Estimation and Prediction (14-23)
8. Standardized regression coefficients (14-28)
9. Additional concerns and observations (14-30)

14-1 A. Karpinski

1. Introduction to regression analysis

Overview of regression analysis: Regression analysis is generally used when both the independent and the dependent variables are continuous. (But modifications exist to handle categorical independent variables and dichotomous dependent variables.)

   Type of Analysis               Independent Variable        Dependent Variable
   ANOVA                          Categorical                 Continuous
   Regression                     Continuous or Categorical   Continuous
   Categorical Analysis
   (Contingency Table Analysis)   Categorical                 Categorical

Goals of regression analysis:
   o To describe the relationship between two variables
   o To model responses on a dependent variable
   o To predict a dependent variable using one or more independent variables
   o To statistically control the effects of variables while examining the relationship between the independent and dependent variable

Regression analysis is usually performed on observational data. In these cases, we can describe, model, predict, and control, but we cannot make any causal claims regarding these relationships.

Terminology in regression analysis

   o As in ANOVA, we will develop a model to explain the data:

       DATA = MODEL + ERROR

   o The model assumes greater importance in regression. Unlike ANOVA, we are usually interested in the model parameters.

   o The goal of most regression models is to use the information contained in a set of variables to predict a response. As a result, we use slightly different terminology in regression, compared to ANOVA:

       ANOVA                    REGRESSION
       Dependent variable       Dependent variable, or Response variable, or Outcome variable
       Independent variables    Independent variables, or Predictor variables

2. The Regression Equation: Linear Functions

The goal of simple linear regression is to describe an outcome variable (Y) as a linear function of a predictor variable (X). The end result will be a model that defines the equation of a straight line:

       Y = b + aX

       Where b = the y-intercept
             a = the slope

   o Let's consider a simple example:  Y = -1 + (1/3)X
     The y-intercept is -1: the line crosses the y-axis at y = -1.
     The slope of the line is 1/3.
       The slope is a measure of the steepness of the line.
       The slope is the change in y associated with a one-unit change in x.

   [Figure: two panels. Left: "Two data points." Right: "A straight line through two points." Both panels have axes running from about -1 to 6.]
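The interpretation of the intercept and the slope can be checked numerically; a minimal Python sketch of the example line (the function name is ours):

```python
# Sketch: evaluate the example line Y = -1 + (1/3)X and confirm the
# meaning of the intercept and the slope.
def line(x, intercept=-1.0, slope=1/3):
    """y-value of the straight line at x."""
    return intercept + slope * x

print(line(0))            # the line crosses the y-axis at y = -1.0
print(line(4) - line(3))  # a one-unit change in x changes y by the slope
```

The second print shows the slope directly: any one-unit step in x changes y by 1/3, regardless of where the step is taken.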

   o Let's review the method covered in high school algebra for determining the line that falls through two points: (-1, -1) & (5, 1).

     First, we compute the slope of the line:

       slope = (y2 - y1) / (x2 - x1)
       slope = (1 - (-1)) / (5 - (-1)) = 2/6 = .333

     We interpret the slope as the change in y associated with a one-unit change in x. In this example, for every unit increase in x, y will increase by .333:

       x:  -1     0      1      2     3     4     5
       y:  -1   -.667  -.333    0   .333  .667    1

     We compute the y-intercept by finding the value of y when x = 0. We can use the equation for the slope of a line and the coordinates of either known point (x, y) to solve for the intercept. Let's use (5, 1):

       y = b + .333x
       1 = b + .333(5)
       1 = b + 1.667
       b = -.667

     Finally, we use the slope and the intercept to write the equation of the line through the points:

       Y = b + aX
       Y = -.667 + .333(X)
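The two-point recipe above translates directly into code; a minimal sketch (the function name is ours):

```python
def line_through(p1, p2):
    """Slope and intercept of the line through two points:
    slope = (y2 - y1) / (x2 - x1); intercept solved from y1 = b + slope * x1."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return intercept, slope

# The example points from the text:
b, a = line_through((-1, -1), (5, 1))
print(round(b, 3), round(a, 3))   # -0.667 0.333
```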

3. Estimation and interpretation of model parameters

With real data, the points rarely fall directly on a straight line. Regression is a technique to estimate the slope and the y-intercept from noisy data. Because not every point will fall on the regression line, there will be error in our model:

       DATA = MODEL + ERROR

   o The DATA, or the outcome we want to predict, is the Y variable.
   o The MODEL is the equation of the regression line, b0 + b1*X

       b0 = the population value of the intercept
       b1 = the population value of the slope
       X  = the predictor variable

   o The ERROR is the deviation of the observed data from our regression line. We refer to the individual error terms as residuals.
   o The full simple linear regression model is given by the following equation:

       DATA = MODEL + ERROR
       Yi = b0 + b1*Xi + εi

Some key characteristics of this model:
   o We can only model linear relationships between the outcome variable and the predictor variable.
   o The model can be expanded to include the linear relationships between multiple predictor variables and a single outcome:

       Y = b0 + b1*X1 + b2*X2 + ... + bk*Xk + ε

Predicted values and residuals

   o With real data, we need to estimate the value of the slope and the intercept. (Details on the estimation process will follow shortly.)

       Yi = b̂0 + b̂1*Xi + εi

       b̂0 = the estimated value of the intercept
       b̂1 = the estimated value of the slope

   o Based on the model, we have a best guess as to the participant's response on the outcome variable:

       Ŷi = b̂0 + b̂1*Xi

     In other words, we use the equation of the line we have developed to estimate how each participant responded on the outcome variable. Ŷi is called the predicted value or fitted value for the ith participant.

   o If the actual response of the participant deviates from our predicted value, then we have some ERROR in the model. We define the residual to be the deviation of the observed value from the predicted value:

       DATA = MODEL + ERROR
       Yi = (b̂0 + b̂1*Xi) + ei
       Yi = Ŷi + ei
       ei = Yi - Ŷi

   o If we want to know whether our model is a good model, we can examine the residuals.
     If we have many large residuals, then there are many observations that are not predicted well by the model. We say that the model has a poor fit.
     If most of the residuals are small, then our model is very good at explaining responses on the Y variable. This model would have a good fit.

   o Let's consider a simple example to illustrate these points:

       X:  1  2  3  4  5
       Y:  2  2  3  3  5

     [Figure: scatterplot of the five data points.]

     We notice that a straight line can be drawn that goes directly through three of the 5 observed data points. Let's use this line as our best-guess line:

       Ỹ = X

     Now we can calculate predicted values and residuals:

       Y    Ỹ    e
       2    1    1
       2    2    0
       3    3    0
       3    4   -1
       5    5    0

In the previous example, we eyeballed a regression line. We would like to have a better method of estimating the regression line. Let's consider two desirable properties of a good regression line:

   (1) The sum of the residuals should be zero:

       Σ(yi - ŷi) = 0

     If we have this property, then the average residual would be zero. In other words, the average deviation from the predicted line would be zero.

   (2) Overall, we would like the residuals to be as small as possible. We already require the residuals to sum to zero, by property (1). So, let's require the sum of the squared residuals to be as small as possible. This approach has the added benefit of penalizing large residuals more than small residuals:

       Σ(yi - ŷi)² = minimum

   o Estimating a regression line using these two properties is called the ordinary least squares (OLS) estimation procedure.
   o Estimates of the intercept and slope are called the ordinary least squares (OLS) estimates.
   o To solve for these estimates, we can use the following procedure. We want to minimize:

       SSE = Σ(Yi - Ŷi)² = Σ(Yi - b0 - b1*Xi)²

     This is a minimization problem; the solution is a calculus exercise. We take the derivatives of SSE with respect to b0 and b1, set each equal to zero, and solve for b0 and b1:

       ∂SSE/∂b0 = 0   and   ∂SSE/∂b1 = 0

     We'll skip the details and jump to the final estimates:

       b̂1 = SSxy / SSxx        b̂0 = Ȳ - b̂1*X̄

     Where

       SSxx = Σ(Xi - X̄)²
       SSxy = Σ(Xi - X̄)(Yi - Ȳ)
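The final OLS formulas can be computed directly from the deviation sums of squares; a minimal Python sketch, applied here to the five-point example dataset used above (the function name is ours):

```python
def ols(xs, ys):
    """OLS estimates for simple linear regression:
    b1 = SSxy / SSxx,  b0 = mean(y) - b1 * mean(x)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    ss_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_xx = sum((x - mx) ** 2 for x in xs)
    b1 = ss_xy / ss_xx
    b0 = my - b1 * mx
    return b0, b1

b0, b1 = ols([1, 2, 3, 4, 5], [2, 2, 3, 3, 5])
print(b0, b1)   # intercept and slope of the least squares line
```

By construction, the residuals from this fit sum to zero (property 1) and have the smallest possible sum of squares (property 2).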

Now, let's return to our example and examine the least squares regression line.

   [Figure: scatterplot of the five data points with the eyeball line and the least squares (LS) line.]

Let's compare the least squares regression line to our eyeball regression line:

       Eyeball:        Ỹ = X
       Least Squares:  Ŷ = 0.9 + 0.7X

            Data       Eyeball              Least Squares
       X    Y     Ỹ     ẽ     ẽ²        Ŷ      ê      ê²
       1    2     1     1     1        1.6    .4     .16
       2    2     2     0     0        2.3   -.3     .09
       3    3     3     0     0        3.0    .0     .00
       4    3     4    -1     1        3.7   -.7     .49
       5    5     5     0     0        4.4    .6     .36

       Σẽ = 0,  Σẽ² = 2.00              Σê = 0,  Σê² = 1.10

   o For both models, we satisfy the condition that the residuals sum to zero.
   o But the least squares regression line produces the model with the smallest sum of squared residuals.

Note that other regression lines are possible:
   o We could minimize the absolute value of the residuals.
   o We could minimize the shortest (perpendicular) distance to the regression line.

4. Inference on the model parameters

We have learned how to estimate the model parameters, but we also want to perform statistical tests on those parameters:

       b̂1 = SSxy / SSxx        b̂0 = Ȳ - b̂1*X̄

First, let's estimate the amount of error in the model, σ².

   o Intuitively, the greater the amount of error in a sample, the more difficult it will be to estimate the model parameters.
   o The error in the model is captured in the residuals, ei. We need to calculate the variance of the residuals.

     Recall that a variance is the average squared deviation from the mean. When applied to residuals, we obtain:

       Var(ε) = Σ(εi - ε̄)² / N

     But we know ε̄ = 0, so:

       Var̂(ε) = σ̂² = Σ(ε̂i)² / (N - 2) = Σ(Yi - Ŷi)² / (N - 2) = SSResidual / (N - 2)

     Why use N - 2? A general heuristic is to use N - (number of parameters fitted). In this case, we have estimated two parameters: the slope and the intercept.
       Recall that for Var(X), we divided by N - 1. We only estimated one parameter (the grand mean).
       This heuristic also applied for ANOVA.

And so we are left with:

       Var̂(ε) = Σ(ε̂i)² / (N - # of parameters) = SSresid / (N - 2) = MSresid

And we are justified in using MSresid as the error term for tests involving the regression model.

   o Interpreting MSresid:
     Residuals measure deviation from the regression line (the predicted values).
     The variance of the residuals captures the average squared deviation from the regression line.
     So we can interpret MSresid as a measure of average (squared) deviation from the regression line.
     SPSS reports √MSresid as the "standard error of the estimate."

Now that we have an estimate of the error variance, we can proceed with statistical tests of the model parameters. We can perform a t-test using our familiar t-test formula:

       t ~ estimate / (standard error of the estimate)

   o We know how to calculate the estimates of the slope and the intercept. All we need are standard errors of the estimates.
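MSresid can be sketched in a few lines, continuing the five-point example (function and variable names are ours):

```python
import math

def ms_resid(xs, ys, b0, b1):
    """SSresid / (N - 2): the error variance estimate, with N - 2
    degrees of freedom because two parameters were estimated."""
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return sse / (len(xs) - 2)

# Five-point example with its least squares line Y-hat = 0.9 + 0.7X:
mse = ms_resid([1, 2, 3, 4, 5], [2, 2, 3, 3, 5], 0.9, 0.7)
print(mse)             # SSresid / (N - 2) = 1.10 / 3
print(math.sqrt(mse))  # what SPSS calls the "standard error of the estimate"
```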

Inferences about the slope, b̂1

   o Deriving the sampling distribution of b̂1 is tedious. We'll skip the details (see an advanced regression textbook, if interested); the end result is:

       std. error(b̂1) = sqrt( MSresid / Σ(Xi - X̄)² )

   o Thus, we can conduct the following statistical test:

       H0: b1 = 0
       H1: b1 ≠ 0

       t(N - 2) ~ b̂1 / standard error(b̂1)

   o We can also easily compute confidence intervals around b̂1:

       estimate ± t(α/2, df) * standard error of the estimate
       b̂1 ± t(α/2, df) * sqrt( MSresid / Σ(Xi - X̄)² )

   o Conclusions:
     If the test is significant, then we conclude that there is a significant linear relationship between X and Y. For every one-unit change in X, there is a b̂1-unit change in Y.
     If the test is not significant, then there is no significant linear relationship between X and Y. Utilizing the linear relationship between X and Y does not significantly improve our ability to predict Y, compared to using the grand mean. There may still exist a significant non-linear relationship between X and Y.
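The slope test assembles the pieces computed so far; a minimal sketch on the five-point example (names are ours):

```python
import math

def slope_t_test(xs, ys):
    """t = b1_hat / se(b1_hat), with se = sqrt(MSresid / SSxx), df = N - 2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    ss_xx = sum((x - mx) ** 2 for x in xs)
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / ss_xx
    b0 = my - b1 * mx
    mse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    se = math.sqrt(mse / ss_xx)
    return b1, se, b1 / se, n - 2

b1, se, t, df = slope_t_test([1, 2, 3, 4, 5], [2, 2, 3, 3, 5])
print(b1, se, t, df)
```

The observed t would then be compared against the t distribution with N - 2 degrees of freedom.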

Inferences about the intercept, b̂0

   o b0 tells us the predicted value of Y when X = 0.
   o The test of b0 is automatically computed and displayed, but be careful not to misinterpret its significance!
   o Only rarely are we interested in the value of the intercept.
   o Again, we'll skip the details concerning the derivation of the sampling distribution of b̂0 (see an advanced regression textbook, if interested); the end result is:

       std. error(b̂0) = sqrt( MSresid * [ 1/N + X̄² / Σ(Xi - X̄)² ] )

   o Thus, we can conduct the following statistical test:

       H0: b0 = 0
       H1: b0 ≠ 0

       t(N - 2) ~ b̂0 / standard error(b̂0)

   o We can also easily compute confidence intervals around b̂0:

       estimate ± t(α/2, df) * standard error of the estimate
       b̂0 ± t(α/2, df) * sqrt( MSresid * [ 1/N + X̄² / Σ(Xi - X̄)² ] )

5. Sums of Squares in Regression and the ANOVA table

Total Sums of Squares (SST)

   o In ANOVA, the total sum of squares was the sum of the squared deviations from the grand mean.
   o We will use this same definition in regression. SST is the sum of the squared deviations from the grand mean of Y:

       SST = Σ(Yi - Ȳ)²,  summing over i = 1 to n

     [Figure: scatterplot with a horizontal line at Mean(Y); the deviations of the points from this line make up SST.]

Sums of Squares Regression

   o In ANOVA, we had a sum of squares for the model. This SS captured the improvement in our prediction of Y based on all the terms in the model.
   o In regression, we can also examine how much we improve our prediction (compared to the grand mean) by using the regression line to predict new observations.
     If we had not conducted a regression, then our best guess for a new value of Y would be the mean of Y, Ȳ.
     But we can use the regression line to make better predictions of new observations:

       Ŷi = b̂0 + b̂1*Xi

The deviation of the regression best guess (the predicted value) from the grand mean is the SS Regression:

       SSReg = Σ(Ŷi - Ȳ)²,  summing over i = 1 to n

     [Figure: scatterplot with the LS line and a horizontal line at Mean(Y); the deviations of the LS line from the mean make up SSReg.]

Sums of Squares Error / Residual

   o The residuals are the deviations of the observed values from the predicted values:

       ei = Yi - Ŷi

   o The SS Residual is the amount of the total SS that we cannot predict from the regression model:

       SSResid = Σ(Yi - Ŷi)²,  summing over i = 1 to n

     [Figure: scatterplot with the LS line and a horizontal line at Mean(Y); the deviations of the points from the LS line make up SSResid.]

Sums of Squares partitioning

   o We have three SS components, and we can partition them in the following manner:

       SST = SSreg + SSresid
       Σ(Yi - Ȳ)² = Σ(Ŷi - Ȳ)² + Σ(Yi - Ŷi)²

   o In ANOVA, we had a similar partition:

       SST = SSmodel + SSerror

     It turns out that ANOVA is a special case of regression. If we set up a regression with categorical predictors, then we will find:

       SSreg = SSmodel
       SSresid = SSerror

     Every analysis we conducted in ANOVA can be conducted in regression. But regression provides a much more general statistical framework (and thus is frequently called the "general linear model").

Where there are sums of squares, there is an ANOVA table.

   o Based on the SS decomposition, we can construct an ANOVA table:

     Source       SS                      df                      MS              F
     Regression   SSReg = Σ(Ŷi - Ȳ)²      (# of parameters) - 1   SSreg / df      MSreg / MSresid
     Residual     SSResid = Σ(Yi - Ŷi)²   N - (# of parameters)   SSresid / df
     Total        SST = Σ(Yi - Ȳ)²        N - 1
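The partition can be verified numerically; a sketch on the five-point example (names are ours):

```python
def anova_ss(xs, ys):
    """Return (SST, SSreg, SSresid) for the least squares fit."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    ss_xx = sum((x - mx) ** 2 for x in xs)
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / ss_xx
    b0 = my - b1 * mx
    preds = [b0 + b1 * x for x in xs]
    sst = sum((y - my) ** 2 for y in ys)
    ss_reg = sum((p - my) ** 2 for p in preds)
    ss_resid = sum((y - p) ** 2 for y, p in zip(ys, preds))
    return sst, ss_reg, ss_resid

sst, ss_reg, ss_resid = anova_ss([1, 2, 3, 4, 5], [2, 2, 3, 3, 5])
print(sst, ss_reg + ss_resid)   # the two should agree: SST = SSreg + SSresid
```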

   o The Regression test examines all of the slope parameters in the model simultaneously. Do these parameters significantly improve our ability to predict Y, compared to using the grand mean to predict Y?

       H0: b1 = b2 = ... = bk = 0
       H1: Not all bj's = 0

   o For simple linear regression, we only have one slope parameter. This test becomes a test of the slope b1:

       H0: b1 = 0
       H1: b1 ≠ 0

     In other words, for simple linear regression, the Regression F-test will be identical to the t-test of the b1 parameter. This relationship will not hold for multiple regression, when more than one predictor is entered into the model.

Calculating a measure of variance in Y accounted for by X

   o SS Total is a measure of the total variability in Y:

       SST = Σ(Yi - Ȳ)²
       Var(Y) = SST / (N - 1)

   o The SS Regression is the part of the total variability that we can explain using our regression line.
   o As a result, we can consider the following ratio, R², to be a measure of the proportion of the sample variance in Y that is explained by X:

       R² = SSReg / SSTotal

     R² is analogous to η² in ANOVA.

   o But in ANOVA, we preferred a measure of variance accounted for in the population (ω²) rather than in the sample (η²).
   o The regression equivalent of ω² is called the Adjusted R².
     Any variable (even a completely random variable) is unlikely to have SSReg exactly equal to zero. Thus, any variable we use will explain some of the variance in the sample.
     Adjusted R² corrects for this overestimation by penalizing R² for the number of variables in the regression equation.

What happens if we take the square root of R²?

       R = sqrt( SSReg / SSTotal )

   o R is interpreted as the overall correlation between all the predictor variables and the outcome variable.
   o When only one predictor is in the model, R is the correlation between X and Y, r_XY.
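R² and Adjusted R² follow directly from the sums of squares; a sketch (k is the number of predictors; the common adjustment formula 1 - (1 - R²)(N - 1)/(N - k - 1) is assumed here):

```python
def r_squared(ss_reg, ss_total, n, k=1):
    """R^2 = SSreg / SStotal; Adjusted R^2 penalizes R^2
    for the k predictors in the equation."""
    r2 = ss_reg / ss_total
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return r2, adj_r2

# Five-point example: SSreg = 4.9, SStotal = 6.0, n = 5 observations
r2, adj_r2 = r_squared(4.9, 6.0, 5)
print(r2, adj_r2)   # adjusted value is smaller, as expected
```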

6. An example

Predicting the amount of damage caused by a fire from the distance of the fire from the nearest fire station.

   Fire Damage Data

   Distance from Station   Fire Damage
   (Miles)                 (Thousands of Dollars)
   3.4                     26.2
   2.6                     19.6
   1.8                     17.8
   4.3                     31.3
   4.6                     31.3
   2.1                     24.0
   2.3                     23.1
   1.1                     17.3
   3.1                     27.5
   6.1                     43.2
   5.5                     36.0
   4.8                     36.4
   0.7                     14.1
   3.8                     26.1
   3.0                     22.3

Always plot the data first!!!

   [Figure: scatterplot of dollars (fire damage) against miles (distance from station).]

In SPSS, we use the Regression command to obtain a regression analysis:

   REGRESSION
     /DEPENDENT dollars
     /METHOD=ENTER miles.

   Variables Entered/Removed(b)
   Model   Variables Entered   Variables Removed   Method
   1       MILES(a)            .                   Enter
   a. All requested variables entered.
   b. Dependent Variable: DOLLARS

   → This box tells us that MILES was entered as the only predictor.

   Model Summary
   Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
   1       .961(a)  .923       .918                2.31635
   a. Predictors: (Constant), MILES

   → This box gives us measures of the variance accounted for by the model. The "Std. Error of the Estimate" is √MSE (2.31635² ≈ 5.365).

   ANOVA(b)
   Model        Sum of Squares   df   Mean Square   F         Sig.
   Regression   841.766          1    841.766       156.886   .000(a)
   Residual     69.751           13   5.365
   Total        911.517          14
   a. Predictors: (Constant), MILES
   b. Dependent Variable: DOLLARS

   → Here is our old friend, the ANOVA table.

   Coefficients(a)
                Unstandardized Coefficients   Standardized Coefficients
   Model        B         Std. Error          Beta     t        Sig.
   (Constant)   10.278    1.420                        7.237    .000
   MILES        4.919     .393                .961     12.525   .000
   a. Dependent Variable: DOLLARS

   → These are the tests of the intercept and the slope.

   o From this table, we read that b̂0 = 10.278 and b̂1 = 4.919. Using this information, we can write the regression equation:

       Ŷ = 10.278 + 4.919*X

   o To test the slope:

       H0: b1 = 0
       H1: b1 ≠ 0

     We find a significant linear relationship between the distance from the fire station and the amount of damage caused by the fire, t(13) = 12.53, p < .001. For every mile from the fire station, the fire caused an additional $4,919 in damage.

   o Note that the t-test for b̂1 is identical to the Regression test on the ANOVA table because we only have one predictor in this case.
   o In this case, the test of the intercept is not meaningful.

You can also easily obtain 95% confidence intervals around the parameter estimates:

   REGRESSION
     /STATISTICS coeff r anova ci
     /DEPENDENT dollars
     /METHOD=ENTER miles.

   o COEFF, R, and ANOVA are defaults:
     COEFF prints the estimates of b0 and b1.
     R prints R² and Adjusted R².
     ANOVA prints the regression ANOVA table.
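The SPSS run can be reproduced outside SPSS with the OLS formulas from section 3; a sketch in Python using the fire damage data:

```python
# Fire damage data: distance from station (miles), damage (thousands of dollars)
miles = [3.4, 2.6, 1.8, 4.3, 4.6, 2.1, 2.3, 1.1, 3.1, 6.1, 5.5, 4.8, 0.7, 3.8, 3.0]
dollars = [26.2, 19.6, 17.8, 31.3, 31.3, 24.0, 23.1, 17.3, 27.5, 43.2, 36.0, 36.4, 14.1, 26.1, 22.3]

n = len(miles)
mx, my = sum(miles) / n, sum(dollars) / n
ss_xy = sum((x - mx) * (y - my) for x, y in zip(miles, dollars))
ss_xx = sum((x - mx) ** 2 for x in miles)
b1 = ss_xy / ss_xx              # slope: about 4.919
b0 = my - b1 * mx               # intercept: about 10.278
ss_total = sum((y - my) ** 2 for y in dollars)
ss_resid = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(miles, dollars))
r2 = 1 - ss_resid / ss_total    # about .923, matching the SPSS Model Summary
print(b0, b1, r2)
```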

   o Adding CI to the STATISTICS command will print the confidence intervals for all model parameters:

   Coefficients(a)
                Unstandardized        Standardized
                Coefficients          Coefficients                 95% Confidence Interval for B
   Model        B        Std. Error   Beta    t        Sig.        Lower Bound   Upper Bound
   (Constant)   10.278   1.420                7.237    .000        7.211         13.346
   MILES        4.919    .393         .961    12.525   .000        4.071         5.768
   a. Dependent Variable: DOLLARS

       b̂1 = 4.919, t(13) = 12.53, p < .001

       95% CI for b̂1:  b̂1 ± t(α/2, df) * std. error(b̂1)
                        4.919 ± 2.160(.393)
                        (4.071, 5.768)

7. Estimation and Prediction

One of the goals of regression analysis is to allow us to estimate or predict new values of Y based on observed values of X. There are two kinds of Y values we may want to predict:

   o Case I: We may want to estimate the mean value of Y, Ŷ, for a specific value of X.
     In this case, we are attempting to estimate the mean result of many events at a single value of X.
     For example, what is the average damage caused by (all) fires that are 5.8 miles from a fire station?

   o Case II: We may also want to predict a particular value of Y, Ŷ, for a specific value of X.
     In this case, we are attempting to predict the outcome of a single event at a single value of X.
     For example, what would be the predicted damage caused by a (single) fire that is 5.8 miles from a fire station?

In either case, we can use our regression equation to obtain an estimated mean value or particular value of Y:

       Ŷ = 10.278 + 4.919*X

   o For a fire 5.8 miles from a station, we substitute X = 5.8 into the regression equation:

       Ŷ = 10.278 + 4.919*5.8
       Ŷ = 38.81

The difference in these two uses of the regression model lies in the accuracy (variance) of our estimate of the prediction.

Case I: Variance of the estimate of the mean value of Y, Ŷ, at Xp

   o When we attempt to estimate a mean value, there is one source of variability: the variability due to the regression line.
     We know the equation of the regression line:

       Ŷ = b̂0 + b̂1*X
       Var(Ŷ) = Var(b̂0 + b̂1*X)

     Skipping a few details, we arrive at the following equation:

       σ̂²(Ŷ) = Var(Ŷ) = MSE * [ 1/N + (Xp - X̄)² / SSx ]

     where SSx = Σ(Xi - X̄)².

   o And thus, the equation for the confidence interval of the estimate of the mean value of Y, Ŷ, is:

       Ŷ ± t(α/2, N-2) * σ̂(Ŷ)
       Ŷ ± t(α/2, N-2) * sqrt( MSE * [ 1/N + (Xp - X̄)² / SSx ] )

Case II: Variance of the prediction of a particular value of Y, Ŷ, at Xp

   o When we attempt to predict a single value, there are now two sources of variability: the variability due to the regression line, and the variability of Y around its mean.

     [Figure: diagram of the prediction limits if the mean value of Y is at the lower bound, and if the mean value of Y is at the upper bound. The prediction interval for a single value of Y is wider than the confidence interval for the mean value of Y.]

   o The variance for the prediction interval of a single value must include these two forms of variability:

       σ̂²(Y) = σ̂²(Ŷ) + σ̂²(ε)
       σ̂²(Y) = MSE * [ 1 + 1/N + (Xp - X̄)² / SSx ]

   o And thus, the equation for the prediction interval of the estimate of a particular value of Y, Ŷ, is:

       Ŷ ± t(α/2, N-2) * σ̂(Y)
       Ŷ ± t(α/2, N-2) * sqrt( MSE * [ 1 + 1/N + (Xp - X̄)² / SSx ] )

Luckily, we can get SPSS to perform most of the intermediate calculations for us, but we need to be sneaky:

   o Add a new line to the data file with a missing value for Y and X = Xp.
   o Ask SPSS to save the predicted value and the standard error of the predicted value when you run the regression:

   REGRESSION
     /MISSING LISTWISE
     /DEPENDENT dollars
     /METHOD=ENTER miles
     /SAVE PRED (pred) SEPRED (sepred).
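The interval formulas can also be computed directly, without the SPSS trick; a sketch on the fire damage data (the t value 2.160 is the tabled t(.025, 13)):

```python
import math

miles = [3.4, 2.6, 1.8, 4.3, 4.6, 2.1, 2.3, 1.1, 3.1, 6.1, 5.5, 4.8, 0.7, 3.8, 3.0]
dollars = [26.2, 19.6, 17.8, 31.3, 31.3, 24.0, 23.1, 17.3, 27.5, 43.2, 36.0, 36.4, 14.1, 26.1, 22.3]

n = len(miles)
mx, my = sum(miles) / n, sum(dollars) / n
ss_xx = sum((x - mx) ** 2 for x in miles)
b1 = sum((x - mx) * (y - my) for x, y in zip(miles, dollars)) / ss_xx
b0 = my - b1 * mx
mse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(miles, dollars)) / (n - 2)

xp = 5.8
y_hat = b0 + b1 * xp                                             # about 38.81
se_mean = math.sqrt(mse * (1 / n + (xp - mx) ** 2 / ss_xx))      # Case I: mean value
se_pred = math.sqrt(mse * (1 + 1 / n + (xp - mx) ** 2 / ss_xx))  # Case II: single value
t = 2.160  # t(.025, 13), from a t table
ci = (y_hat - t * se_mean, y_hat + t * se_mean)  # confidence interval for the mean
pi = (y_hat - t * se_pred, y_hat + t * se_pred)  # prediction interval for one fire
print(ci, pi)
```

The prediction interval is always wider than the confidence interval, because it adds the variability of a single Y around its mean.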

We will have two new variables in the data file:

   PRED   = Ŷi for each value of X
   SEPRED = σ̂(Ŷi) for each value of X

   DOLLARS   MILES   PRED       SEPRED
   26.2      3.4     27.00365   .59993
   17.8      1.8     19.13272   .83401
   31.3      4.6     32.90685   .79149
   23.1      2.3     21.59239   .71122
   27.5      3.1     25.52785   .60224
   36.0      5.5     37.33425   1.05731
   14.1      0.7     13.72146   1.17662
   22.3      3.0     25.03592   .60810
   19.6      2.6     23.06819   .65500
   31.3      4.3     31.43105   .71985
   24.0      2.1     20.60852   .75662
   17.3      1.1     15.68919   1.04439
   43.2      6.1     40.28585   1.25871
   36.4      4.8     33.89072   .84503
   26.1      3.8     28.97139   .63199
   .         5.8     38.81005   1.15638

   For Xp = 5.8:  Ŷ = 38.81,  σ̂(Ŷ) = 1.156

   o Use the formulas to compute the confidence and prediction intervals.

     To calculate a 95% confidence interval around the mean value, Ŷ:

       Ŷ ± t(α/2, N-2) * σ̂(Ŷ)
       38.81 ± t(.025, 13) * (1.156)
       38.81 ± (2.160)(1.156)
       (36.31, 41.31)

     To calculate a 95% prediction interval around the single value, Ŷ:

       Ŷ ± t(α/2, N-2) * sqrt( σ̂²(ε) + σ̂²(Ŷ) )
       38.81 ± t(.025, 13) * sqrt( 5.365 + (1.156)² )
       38.81 ± (2.160)(2.589)
       (33.22, 44.40)

The regression line can be used for prediction and estimation, but not for extrapolation. In other words, the regression line is only valid for X's within the range of the observed X's.

SPSS can be used to graph confidence intervals and prediction intervals:

   [Figure: two panels, "Confidence Bands" and "Prediction Bands," each showing dollars plotted against miles with the fitted line and the corresponding 95% bands.]

8. Standardized regression coefficients

To interpret the slope parameter, we must return to the original scale of the data: b1 = 56 suggests that for every one-unit change in the X variable, Y changes by 56 units.

This dependence on units can make for difficulty in comparing the effects of X on Y across different studies:
   o If one researcher measures self-esteem using a 7-point scale and another uses a 4-point scale, they will obtain different estimates of b1.
   o If one researcher measures length in centimeters and another uses inches, they will obtain different estimates of b1.

One solution to this problem is to use standardized regression coefficients:

       β1 = b1 * (σX / σY)

To understand how to interpret standardized regression coefficients, it is helpful to see how they can be obtained directly:

   o Transform both Y and X into z-scores, zY and zX:

     COMPUTE zmiles = (miles - 3.28)/1.5762.
     COMPUTE zdollar = (dollars - 26.41)/8.06898.

   o Regress zY on zX:

     REGRESSION
       /DEPENDENT zdollar
       /METHOD=ENTER zmiles.

     Coefficients(a)
                  Unstandardized Coefficients   Standardized Coefficients
     Model        B          Std. Error         Beta     t        Sig.
     (Constant)   4.3E-04    .074                        .006     .996
     ZMILES       .961       .077               .961     12.525   .000
     a. Dependent Variable: ZDOLLAR

       b1 = β1 = .961

   o Compare this result to the regression on the raw data:

     REGRESSION
       /DEPENDENT dollars
       /METHOD=ENTER miles.

     Coefficients(a)
                  Unstandardized Coefficients   Standardized Coefficients
     Model        B          Std. Error         Beta     t        Sig.
     (Constant)   10.278     1.420                       7.237    .000
     MILES        4.919      .393               .961     12.525   .000
     a. Dependent Variable: DOLLARS

       β1 = .961
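The same demonstration can be replicated directly: z-score both variables (sample SDs, with n - 1 in the denominator) and refit; the slope of the standardized regression is the standardized coefficient. A sketch (function names are ours):

```python
import math

miles = [3.4, 2.6, 1.8, 4.3, 4.6, 2.1, 2.3, 1.1, 3.1, 6.1, 5.5, 4.8, 0.7, 3.8, 3.0]
dollars = [26.2, 19.6, 17.8, 31.3, 31.3, 24.0, 23.1, 17.3, 27.5, 43.2, 36.0, 36.4, 14.1, 26.1, 22.3]

def zscores(v):
    """Standardize a variable using the sample SD (n - 1 denominator)."""
    n = len(v)
    m = sum(v) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in v) / (n - 1))
    return [(x - m) / sd for x in v]

def ols_slope(xs, ys):
    """OLS slope: SSxy / SSxx."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    ss_xx = sum((x - mx) ** 2 for x in xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / ss_xx

beta1 = ols_slope(zscores(miles), zscores(dollars))
print(round(beta1, 3))   # the standardized slope; with one predictor it equals r_XY
```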

   o To interpret standardized beta coefficients, we need to think in terms of z-scores:
     A one standard deviation change in X (miles) is associated with a .961 standard deviation change in Y (dollars).
     For simple linear regression (with only 1 predictor), β1 = r_XY.
     With more than 1 predictor, standardized coefficients should not be interpreted as correlations. It is possible to have standardized coefficients greater than 1.

9. Additional concerns and observations

Standard assumptions of regression analysis:

       εi ~ NID(0, σ²)

   o All observations are independent and randomly selected from the population (or, equivalently, the residual terms, εi's, are independent).
   o The residuals are normally distributed at each level of X.
   o The variance of the residuals is constant across all levels of X.

Additionally, we assume that the regression model is a suitable proxy for the correct (but unknown) model:
   o The relationship between X and Y must be linear.
   o No important variables have been omitted from the model.
   o There are no outliers or influential observations.

These assumptions can be examined by looking at the residuals.

14-30 A. Karpinski