Introduction to Regression



Introduction to Regression. Dr. Tom Ilvento, Department of Food and Resource Economics.

Overview

The last part of the course will focus on Regression Analysis. This is one of the more powerful statistical techniques. It provides estimates from a model, allows for inference and testing hypotheses, extends our abilities from ANOVA, and enables us to test theories. We will start with simple, bivariate models: Y is a function of a single X variable. Then we move toward the complex, multivariate models: Y is a function of a set of X variables.

Regression

We are looking at the relationship between two or more variables. One is called the Dependent Variable (Y), which is to be modeled or predicted. The others are called Independent Variables (X, or a set of Xs), which are used to explain, estimate, or predict Y. In a bivariate (two-variable) case, one way to express the relationship is in terms of covariance and correlation: linear, symmetric measures of association. Regression is an extension of correlation/covariance: still linear, but no longer symmetric. Covariance is the basic building block of regression.

Regression Models

Regression models represent an assortment of models with assumptions about the dependent variable (continuous or discrete) and about the form of the relationship with the independent variables (linear or nonlinear). We will focus on bivariate and multivariate regression models, each of which may be linear, nonlinear, or discrete.

Fahrenheit versus Celsius: Fitting a Line to the Data

When I go to Europe I have to deal with temperatures in Celsius. How do I convert from C to F? A friend once told me a quick rule of thumb was to double C and add 30. I used my calculator to make a small data set of values, and I used it in a regression.

   F       C
  10  -12.22
  15   -9.44
  20   -6.67
  25   -3.89
  30   -1.11
  32    0.00
  35    1.67
  45    7.22
  50   10.00
  55   12.78
  60   15.56
  65   18.33
  70   21.11
  75   23.89
  80   26.67
  85   29.44
  90   32.22
  95   35.00

The relationship between F and C is perfect, r = 1. It is a deterministic function. I will run a regression of F on C and see what equation I get. Regression will generate a best-fitting line to the data. In Excel I will use Tools, Data Analysis, Regression. [Figure: scatter plot of Fahrenheit versus Celsius temperature with the fitted line y = 1.8x + 32.]

Regression of F on C

This is the regression result from Excel. The estimated equation is F = 32 + 1.8 C.

SUMMARY OUTPUT

Regression Statistics
Multiple R           1
R Square             1
Adjusted R Square    1
Standard Error       7.0966E-05
Observations         18

ANOVA
             df   SS        MS        F                 Sig F
Regression    1   12372.94  12372.94  2456783461497.83  0.00
Residual     16   0.00      0.00
Total        17   12372.94

           Coefficients  Std Error  t Stat     P-value  Lower 95%  Upper 95%
Intercept  32.0          0.0        1519489.6  0.0      32.0       32.0
C          1.8           0.0        1567413.0  0.0      1.8        1.8

Requirements of Regression

We specify one variable as Dependent, usually represented as Y. It must be measured as a continuous variable, not a dichotomy or ordinal. The dependent variable is thought to be a function of one or more Independent variables, usually represented as X; these can be continuous, dichotomies, or ordinal. Regression is limited to relationships that are linear in the parameters, in the form

Y = b0 + b1X1 + b2X2 + ... + bkXk

where we have k independent variables.
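The fit that Excel produces here can be reproduced by hand from the least-squares formulas. A minimal Python sketch (Python is not used in the lecture; this is only for illustration), using the temperature data from the slide:

```python
# Least-squares fit of F on C for the temperature data on the slide.
# A sketch using only built-ins; numpy.polyfit(C, F, 1) would do the same.
F = [10, 15, 20, 25, 30, 32, 35, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
C = [-12.22, -9.44, -6.67, -3.89, -1.11, 0.00, 1.67, 7.22, 10.00,
     12.78, 15.56, 18.33, 21.11, 23.89, 26.67, 29.44, 32.22, 35.00]

n = len(F)
mean_c = sum(C) / n
mean_f = sum(F) / n
ss_cf = sum((c - mean_c) * (f - mean_f) for c, f in zip(C, F))  # covariance part
ss_c = sum((c - mean_c) ** 2 for c in C)                        # variability in C

b1 = ss_cf / ss_c          # slope
b0 = mean_f - b1 * mean_c  # intercept
print(f"F = {b0:.2f} + {b1:.2f} C")
```

Because C here is just a rounded deterministic function of F, the slope and intercept come out at 1.8 and 32 to within rounding, matching the Excel output.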

Nonlinear Relationships That Can Be Represented by Regression

It is possible to represent a nonlinear relationship with a linear approach, such as a polynomial or a log function.

Log function (take the log of both sides):

Y = aX^b
ln(Y) = ln(a) + b ln(X)

Polynomial of the kth order:

Y = b0 + b1X + b2X^2 + b3X^3 + ... + bkX^k

It is not terribly restrictive to be limited to linear relationships.

The Equation of a Line

I suspect you have seen the equation of a line written as Y = mX + b, where m is the slope and b is the intercept. We specify a dependent variable Y and an independent variable X. We will use the form Y = b0 + b1X1. Note: in multiple regression there may be more than one X, as in Y = b0 + b1X1 + b2X2. When referring to the population I will use Greek terms:

Y = β0 + β1X1

Equation of a Line

In reality, we often have a random component. Consider Y = 5 + .5X:

X = 0  then Y = 5 (the intercept)
X = 10 then Y = 10
X = 20 then Y = 15
X = 30 then Y = 20

The slope shows how much Y changes for a unit change in X: Y changes .5 for each 1-unit change in X. This is a deterministic model; there is an exact relationship between the two variables. A Probabilistic Model has a deterministic component and a random error component, denoted as e or ε:

Y = β0 + β1X1 + ε

Our Expectation of Y is the deterministic component:

E(Y) = β0 + β1X1
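As a sketch of the log-function trick, the invented data below come from Y = 2·X^1.5 (both constants are made up for illustration); regressing ln(Y) on ln(X) recovers them:

```python
import math

# Fitting the power model Y = a * X**b by log-linearization:
# ln(Y) = ln(a) + b*ln(X), so an ordinary least-squares fit on the
# logged data recovers b as the slope and ln(a) as the intercept.
X = [1, 2, 3, 4, 5, 6, 7, 8]
Y = [2 * x ** 1.5 for x in X]   # made-up data with a = 2, b = 1.5

lx = [math.log(x) for x in X]
ly = [math.log(y) for y in Y]
n = len(X)
mx, my = sum(lx) / n, sum(ly) / n

b = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
     / sum((u - mx) ** 2 for u in lx))   # slope = b
a = math.exp(my - b * mx)                # intercept = ln(a)
print(f"Y = {a:.2f} * X**{b:.2f}")       # recovers a = 2, b = 1.5
```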

The Error Term in Our Model

Have we seen the error term before? The error component is very important. Y is observed in the population/sample; Ŷ is predicted from the model. The error is the difference between what we predict and what we observe:

Y = β0 + β1X1 + ε
Ŷ = b0 + b1X1
e = Y − Ŷ

Consider the following model using the mean, a simple model based on the mean:

Y = µ + ε
ε = Y − µ                 (deviations about the mean)
Σε² = Σ(Y − µ)²           (sum of squared deviations)
Σε²/n = Σ(Y − µ)²/n       (mean squared deviation)
Σε²/n = σ²                (the population variance)

The Error Term in Regression Is Important!

The error term in regression is a measure of the variance of the model and the standard deviation of the model, and it ultimately contributes to the estimate of the standard error for our coefficients. We will assume equal variances for Y (the dependent variable) across each level of X (the independent variable). In essence we will pool the measure of the variance in regression. This is called homoscedasticity.

How Do We Fit a Line to Our Data?

We will use the property of Least Squares: we will find estimates for β0 and β1 that minimize the squared deviations about the fitted line. First an example, and then the details. A catalog sales company which sells electronic equipment wants to improve its marketing campaign. They collect data on a random sample of 1,000 customers. The main variable of interest is the amount of sales (in dollars) in the previous year.
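The "model using the mean" above can be checked in a few lines of Python (the numbers are invented; only the algebra comes from the slide):

```python
# The mean as the simplest model: its errors are the deviations about
# the mean, and their mean square is the (population) variance.
Y = [4.0, 7.0, 6.0, 3.0, 5.0]   # made-up observations
n = len(Y)
mu = sum(Y) / n                  # the "model": predict the mean for everyone
errors = [y - mu for y in Y]     # deviations about the mean
ss = sum(e ** 2 for e in errors) # sum of squared deviations
var = ss / n                     # mean squared deviation = population variance
print(mu, var)                   # 5.0 2.0
```

A side effect worth noticing: the deviations about the mean always sum to zero, just as regression residuals will.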

Catalog Sales Data

There is an Excel file (Catalogs.xls) and a JMP file (Catalogs.jmp). Y is the Dependent Variable: SALES. X is the Independent Variable: SALARY. The correlation between SALES and SALARY is .700. Look at a scatter plot; Excel will add a trendline, an equation, and an R-square, which is based on regression.

Estimated Regression of Sales on Salary

SALES = -15.332 + .022(SALARY)

If SALARY = 0, then SALES = -15.332 + .022(0) = -15.332. A unit change in SALARY ($1) results in a .022 change in SALES. This is better expressed as: a $1,000 change in SALARY results in sales of $22.00. Our prediction of SALES for a household with a SALARY of $50,000 is SALES = -15.332 + .022($50,000) = $1,084.67. I will refer to this as solving the equation for a person with a salary of $50,000.

How to Do This in Excel

Organize the data in columns: one column contains Y (dependent), and the remaining columns contain contiguous Xs (independent). Choose Tools, Data Analysis, Regression. Specify the Y variable. Specify the X variables; they need to be contiguous columns (for more Xs in the model, the columns must be next to each other). Remember to specify whether the first row has labels. Specify the output. I modify the output: how many decimal places are shown (3 to 4), change headings to make them fit, and bold the headers.

Excel Output: SUMMARY OUTPUT of SALES Regressed on SALARY

The output shows the correlation and R-square, the ANOVA table, and the estimated coefficients.

Regression Statistics
Multiple R           0.700
R Square             0.489
Adjusted R Square    0.489
Standard Error       687.068
Observations         1000

ANOVA
             df   SS            MS            F       Sig F
Regression     1  451624335.68  451624335.68  956.71  0.000
Residual     998  471117860.07  472061.98
Total        999  922742195.74

           Coef.     Std Error  t Stat  P-value  Lower 95%  Upper 95%
Intercept  -15.332   45.374     -0.338  0.736    -104.373   73.708
SALARY     0.021961  0.000710   30.931  0.000    0.021      0.023
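"Solving the equation" for a given salary is just plugging into the fitted line. A small hypothetical helper (the name predict_sales is mine; the coefficients are the rounded values from the slide, not a refit):

```python
# Coefficients as rounded on the slide: SALES = -15.332 + 0.022 * SALARY.
B0, B1 = -15.332, 0.022

def predict_sales(salary: float) -> float:
    """Solve the estimated equation for a given SALARY in dollars."""
    return B0 + B1 * salary

print(round(predict_sales(50_000), 2))  # 1084.67, as on the slide
print(round(predict_sales(0), 2))       # -15.33: the (extrapolated) intercept
```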

Output from JMP: Response SALES, Whole Model

Summary of Fit
RSquare                      0.489434
RSquare Adj                  0.488923
Root Mean Square Error       687.0649
Mean of Response             1216.77
Observations (or Sum Wgts)   1000

Analysis of Variance
Source      DF   Sum of Squares  Mean Square  F Ratio   Prob > F
Model         1  451615197       451615197    956.6940  <.0001*
Error       998  471114029       472058.14
C. Total    999  922729225

Lack Of Fit
Source        DF   Sum of Squares  Mean Square  F Ratio  Prob > F  Max RSq
Lack Of Fit   634  345072568       544278       1.5718   <.0001*   0.8634
Pure Error    364  126041460       346268
Total Error   998  471114029

Parameter Estimates
Term       Estimate   Std Error  t Ratio  Prob>|t|
Intercept  -15.31783  45.37416   -0.34    0.7357
SALARY     0.0219608  0.00071    30.93    <.0001*

A Few Points About Our Model

It is possible to predict outside the range of the data. When SALARY = 0: SALES = -15.332 + .022($0) = -$15.33. When SALARY = $1,000,000: SALES = -15.332 + .022($1,000,000) = $21,985. The model parameters should be interpreted only within the sampled range of the independent variables. The prediction part of our model is deterministic, but we know we will have some error; our prediction won't match the data exactly. We are fitting a model to the data: "All models are wrong, some models are useful" (George Box). We will have the ability to test coefficients and construct confidence intervals, because there is a known sampling distribution for regression coefficients.

How to Generate a Best Fitting Line

We will use the property of Least Squares: we will find estimates for β0 (intercept) and β1 (slope) that minimize the squared deviations about the fitted line. "Best fit" means the differences between the actual Y values and the predicted Y values are a minimum. Least Squares generates a set of coefficients that minimizes the Sum of the Squared Errors (SSE):

SSE = Σ(i=1 to n) (Yi − Ŷi)² = Σ(i=1 to n) ei² = minimum

Bivariate Regression: Formulas for Estimates of β0 and β1

I will tend to use b0 and b1 for the estimated values. The slope coefficient is based on the covariance of Y and X, adjusted for the variability in X. The intercept is based on the estimate of b1 and the means of the other variables.

Ŷ = b0 + b1X1

b1 = SS_XY / SS_X

where

SS_XY = Σ(Xi − X̄)(Yi − Ȳ) = ΣXiYi − (ΣXi)(ΣYi)/n
SS_X  = Σ(Xi − X̄)²        = ΣXi² − (ΣXi)²/n
b0 = Ȳ − b1X̄
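The formulas above can be checked numerically: compute b0 and b1 from SS_XY and SS_X, then verify that nudging either coefficient only increases the SSE, which is the least-squares property. The data below are made up for illustration:

```python
# Closed-form least-squares estimates, plus a numerical check that they
# minimize the SSE (perturbing either coefficient never lowers it).
X = [1.0, 2.0, 3.0, 4.0, 5.0]     # made-up data
Y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n

ss_xy = sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y) / n  # SS_XY
ss_x = sum(x * x for x in X) - sum(X) ** 2 / n                  # SS_X
b1 = ss_xy / ss_x        # slope
b0 = ybar - b1 * xbar    # intercept

def sse(b0_, b1_):
    """Sum of squared errors for the line Y-hat = b0_ + b1_*X."""
    return sum((y - (b0_ + b1_ * x)) ** 2 for x, y in zip(X, Y))

best = sse(b0, b1)
assert all(sse(b0 + d0, b1 + d1) >= best
           for d0 in (-0.5, 0.0, 0.5) for d1 in (-0.1, 0.0, 0.1))
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, SSE = {best:.4f}")
```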

Summary

Regression is a strategy to model the relationship of a set of independent variables (Xs) to a dependent variable (Y). We say we regress Y on X, or on a set of Xs. Regression estimates a best-fitting line to the data by minimizing the squared deviations about that line. It is a natural extension of much of what we have covered before, especially ANOVA. We will cover the regression output, the ANOVA table, understanding the regression coefficients, inference in regression, and multiple regression.