Chapter 10. What is Regression Analysis? Simple Linear Regression Analysis. Examples


Chapter 10: Simple Linear Regression Analysis

What is Regression Analysis?
A statistical technique that describes the relationship between a dependent variable and one or more independent variables.

Examples
- Consider the relationship between construction permits (x) and carpet sales (y) for a company.
- Or consider the relationship between advertising expenditures and sales.
There probably is a relationship:
- As the number of permits increases, sales should increase.
- Set the advertising expenditure and we can predict sales.
But how would we measure and quantify this relationship?

Simple Linear Regression Model (SLR)
Assume the relationship to be linear:
\[ Y = a + bx + \varepsilon \]
where
- Y = dependent variable
- x = independent variable
- a = y-intercept
- b = slope
- \(\varepsilon\) = random error

Random Error Component (\(\varepsilon\))
- Makes this a probabilistic model.
- Represents uncertainty: random variation not explained by x.
- Deterministic model = exact relationship. Examples:
  - Temperature: °F = (9/5) °C + 32
  - Assets = Liabilities + Equity
- Probabilistic model = deterministic model + error.

Graphically, the SLR line is displayed as:
[Figure: Y plotted against X, with the fitted "line of means".]

Model Parameters
- a and b are estimated from the data.
- Data are collected as pairs (x, y).

Process of Developing an SLR Model
1. Hypothesize the model: \(E(Y) = a + bx\)
2. Estimate the coefficients: \(\hat{y} = \hat{a} + \hat{b}x\)
3. Specify the distribution of the error term.
4. Assess how adequate the model is.
5. When the model is appropriate, use it for estimation and prediction.

Fitting the Straight-Line Model: Ordinary Least Squares (OLS)
- Once it is assumed that the model is \(Y = a + bx + \varepsilon\), next we must collect the data.
- Before estimating parameters, we must ensure that the data follow a linear trend.
- Use a scatterplot (also called a scattergram or scatter diagram).

A Scatter Plot of the Data
[Figure: Carpet City Problem. Monthly Carpet Sales plotted against Monthly Construction Permits.]

Assessing Fit (Deviations)
[Figure: Carpet City Problem. The same scatter plot with the deviations of the points from the fitted line.]
- Deviations are also known as errors or residuals (\(r_i\), \(e_i\)).
- A residual is the difference between the observed value of y and the predicted value of y: \(e_i = r_i = y_i - \hat{y}_i\)
- We want the \(r_i\) to be small.

Assessing Fit (Cont.)
NOTE: The sum of the residuals is 0:
\[ \sum e_i = \sum (y_i - \hat{y}_i) = 0 \]

Least Squares Line
- Find the line that minimizes \(\sum (y_i - \hat{y}_i)^2\) with respect to the parameters.
- We can fit many different lines; which one is best?
- The line that best fits the data is the one that minimizes the sum of squares of the errors (SSE). This is the least squares line.
- Recall that \(\hat{y}_i = \hat{a} + \hat{b}x_i\), so we minimize
\[ SSE = \sum (y_i - \hat{a} - \hat{b}x_i)^2 \]

Least Squares Line (Cont.)
- The estimated parameters yield the smallest SSE.
- The estimated coefficients are given by:
\[ \hat{b} = \frac{SS_{xy}}{SS_{xx}}, \qquad \hat{a} = \bar{y} - \hat{b}\bar{x} \]
\[ SS_{xy} = \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n}, \qquad SS_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n} \]

Example 1
The Central Company manufactures a certain specialty item once a month in a batch production run. The number of items produced in each run varies from month to month as demand fluctuates. The company is interested in the relationship between the size of the production run (x) and the number of man-hours of labor (y) required for the run. The company has collected the following data for the 10 most recent runs:

Example 1 (Cont.)

| Run | Number of items | Labor (man-hours) |
|-----|-----------------|-------------------|
| 1   | 40              | 83                |
| 2   | 30              | 60                |
| 3   | 70              | 138               |
| 4   | 90              | 180               |
| 5   | 50              | 97                |
| 6   | 60              | 118               |
| 7   | 70              | 140               |
| 8   | 40              | 75                |
| 9   | 80              | 159               |
| 10  | 70              | 144               |

Estimated Regression Equation
\[ \hat{y} = -1.835 + 2.021x \]

Interpretation of Regression Equation
What does this mean?
- x = number of items produced
- y = number of man-hours of labor
- State conclusions in terms of the problem.
- Intercept: when no items are produced, the estimated number of hours is -1.835. Does this make sense? No!

Interpretation (Cont.)
- When using regression to predict a response, the value of the independent variable must fall in the range of the original data.
- Making predictions outside the range of the data is called EXTRAPOLATION, and such predictions may have little or no validity.
- In our example, the independent variable ranges from 30 to 90, and predictions should be made in this range.
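The least squares formulas can be checked against the Example 1 data with a few lines of code. The following is a minimal sketch in Python, which the slides do not use (they rely on Excel); the function name `fit_line` is hypothetical and chosen only for illustration.

```python
# Central Company data from Example 1: items per run (x) and man-hours (y).
items = [40, 30, 70, 90, 50, 60, 70, 40, 80, 70]
hours = [83, 60, 138, 180, 97, 118, 140, 75, 159, 144]


def fit_line(xs, ys):
    """Return (a_hat, b_hat) for the least squares line y = a + b*x.

    Hypothetical helper: it applies b = SS_xy / SS_xx and
    a = y_bar - b * x_bar, as given on the Least Squares Line slide.
    """
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    ss_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    b_hat = ss_xy / ss_xx
    a_hat = sum_y / n - b_hat * sum_x / n
    return a_hat, b_hat


a_hat, b_hat = fit_line(items, hours)
print(f"y-hat = {a_hat:.3f} + {b_hat:.3f} x")
```

The printed intercept and slope should agree, up to rounding, with the estimated regression equation for the Central Company.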

Interpretation (Cont.)
- Slope: for every unit change in x, the average value of y will change by the slope.
- In the example, 2.021 implies that for every additional item produced, the average number of man-hours is expected to increase by 2.021.

Example 2
The Tri-City Office Equipment Corporation sells an imported desk calculator on a franchise basis and performs preventive maintenance and repair service on this calculator. Data have been collected from 18 recent calls on users to perform routine preventive maintenance service; for each call, x is the number of machines serviced and y is the total number of minutes spent by the service person.

Example 2 (Cont.)
Obtain the Estimated Regression Equation
\[ \hat{b} = \frac{SS_{xy}}{SS_{xx}} = \frac{1{,}098}{74.5} = 14.7383 \]
\[ \hat{a} = \bar{y} - \hat{b}\bar{x} = 64 - 14.7383(4.5) = -2.32 \]
\[ \hat{y} = -2.32 + 14.7383x \]

Example 2 (Cont.)
Interpretations
- Intercept: when no machines are serviced, the repairman spends an average of -2.32 minutes. Note that x = 0 is probably not in the range of the data, so the intercept has no meaningful interpretation.
- Slope: for each additional machine serviced, we would expect approximately 14.74 more minutes of service time.

Model Assumptions
- \(E(\varepsilon) = 0\)
- \(\mathrm{Var}(\varepsilon) = \sigma^2\)
- \(\varepsilon\) is normally distributed
- The \(\varepsilon_i\) are independent
Before performing regression analysis, these assumptions should be validated.

The Nature of a Statistical Relationship
[Figure: regression curve with probability distributions for Y at different levels of X.]

Assumptions for Regression
[Figure: the unknown relationship \(Y = \beta_0 + \beta_1 X\).]

Descriptive Measures of Association
- Coefficient of Determination (\(R^2\))

Error Decomposition
[Figure: for one point, the actual value \(Y\), the estimated value \(\hat{Y}\) on the line \(\hat{y} = \hat{a} + \hat{b}x\), and the mean \(\bar{Y}\), showing the deviations \(Y - \bar{Y}\), \(Y - \hat{Y}\), and \(\hat{Y} - \bar{Y}\).]

Coefficient of Determination (Cont.)
\[ R^2 = 1 - \frac{SSE}{SS_{yy}}, \qquad 0 \le R^2 \le 1 \]
The larger \(R^2\) is, the more variability is explained by the regression model.

Coefficient of Determination (Cont.)
[Figures: two scatter plots of Y against X.]
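To make the coefficient of determination concrete, the sketch below computes SSE, SS_yy, and R-squared for the Example 1 (Central Company) data. It is written in Python purely for illustration; the variable names are assumptions and not part of the slides.

```python
# Central Company data from Example 1.
items = [40, 30, 70, 90, 50, 60, 70, 40, 80, 70]
hours = [83, 60, 138, 180, 97, 118, 140, 75, 159, 144]

n = len(items)
x_bar = sum(items) / n
y_bar = sum(hours) / n

# Sums of squares in their "deviation from the mean" form.
ss_xx = sum((x - x_bar) ** 2 for x in items)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(items, hours))
ss_yy = sum((y - y_bar) ** 2 for y in hours)

# Least squares estimates.
b_hat = ss_xy / ss_xx
a_hat = y_bar - b_hat * x_bar

# SSE is the variation left unexplained by the fitted line.
sse = sum((y - (a_hat + b_hat * x)) ** 2 for x, y in zip(items, hours))
r_squared = 1 - sse / ss_yy

print(f"SSE = {sse:.4f}, SS_yy = {ss_yy:.1f}, R^2 = {r_squared:.4f}")
```

An R-squared close to 1 here says that almost all of the variation in man-hours is explained by the size of the production run.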

Correlation Coefficient (r)
- The square root of \(R^2\), carrying the sign of the slope \(\hat{b}\)
- Also known as the Pearson product-moment correlation coefficient
- Unitless
- \(-1 \le r \le 1\)
- Describes the strength of the linear relationship between x and y

Correlation Coefficient (r): Computational Formula
\[ r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}} \]
- r near -1 implies a strong negative relationship
- r near 0 implies no linear relationship
- r near +1 implies a strong positive relationship

Measures of Association (Cont.)
High correlation does not imply causation. What does this mean?

Estimation and Prediction
Once we are satisfied with the model, we can perform:
- Estimation of the mean value of y for a given value of x
- Prediction of a new observation for a given value of x
Where do we expect to have the most success?
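As a worked instance of the computational formula for r, take the Example 1 (Central Company) data. The three sums of squares below are computed here from that data table rather than quoted from the slides, so treat them as a check, not as part of the original material.

```latex
\[
SS_{xy} = 6870, \qquad SS_{xx} = 3400, \qquad SS_{yy} = 13944.4
\]
\[
r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}}
  = \frac{6870}{\sqrt{3400 \times 13944.4}}
  \approx 0.9977,
\qquad r^2 \approx 0.9955
\]
```

The slope in Example 1 is positive, so r is positive, and the value agrees with the Multiple R figure in the Excel regression output for the same data.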

Estimation & Prediction (Cont.)
- The fitted SLR model is \(\hat{y} = \hat{a} + \hat{b}x\).
- Estimating the mean of y at a given value of x, say \(x_p\), yields the same value as predicting y at \(x_p\).
- The difference is in the precision of the estimate, that is, in the sampling errors.

[Figure: Scatter Plot of Correct Model. Fitted line Y = 3.0 + 0.5X, \(R^2\) = 0.67.]

[Figure: Scatter Plot of Curvilinear Model. Fitted line Y = 3.0 + 0.5X, \(R^2\) = 0.67.]

[Figure: Scatter Plot of Outlier Model. Fitted line Y = 3.0 + 0.5X, \(R^2\) = 0.67.]
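The difference in precision between estimating a mean and predicting a new observation can be illustrated numerically. The slides do not state the standard error formulas, so the sketch below assumes the usual textbook forms for simple linear regression: s * sqrt(1/n + (x_p - x_bar)^2 / SS_xx) for the mean and s * sqrt(1 + 1/n + (x_p - x_bar)^2 / SS_xx) for a new observation. The function name `standard_errors` is hypothetical, and the data are those of Example 1.

```python
import math

# Central Company data from Example 1.
items = [40, 30, 70, 90, 50, 60, 70, 40, 80, 70]
hours = [83, 60, 138, 180, 97, 118, 140, 75, 159, 144]

n = len(items)
x_bar = sum(items) / n
y_bar = sum(hours) / n
ss_xx = sum((x - x_bar) ** 2 for x in items)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(items, hours))
b_hat = ss_xy / ss_xx
a_hat = y_bar - b_hat * x_bar

# s estimates the standard deviation of the error term (n - 2 degrees of freedom).
sse = sum((y - (a_hat + b_hat * x)) ** 2 for x, y in zip(items, hours))
s = math.sqrt(sse / (n - 2))


def standard_errors(x_p):
    """Return (se_mean, se_pred) at x_p under the assumed textbook formulas."""
    leverage = 1 / n + (x_p - x_bar) ** 2 / ss_xx
    se_mean = s * math.sqrt(leverage)       # estimating the mean of y at x_p
    se_pred = s * math.sqrt(1 + leverage)   # predicting one new y at x_p
    return se_mean, se_pred


se_mean, se_pred = standard_errors(60)
print(f"at x = 60: se(mean) = {se_mean:.4f}, se(prediction) = {se_pred:.4f}")
```

Both standard errors are smallest at x_p equal to the mean of x and grow as x_p moves away from it, and the prediction error is always the larger of the two.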

[Figure: Scatter Plot of Influential Model. Fitted line Y = 3.0 + 0.5X, \(R^2\) = 0.67.]

Verifying Assumptions
- Examining Residual Plots

Regression and Excel
Excel also has a built-in tool for performing regression that:
- is easier to use
- provides a lot more information about the problem
To install the Regression tool: Tools > Add-Ins > Analysis ToolPak
Then, to perform the analysis: Data > Data Analysis > Regression
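A residual plot needs the residuals themselves. The sketch below, in Python rather than the Excel tool the slides recommend, lists the residuals for the Example 1 data against x so that they could be plotted or inspected; the variable names are illustrative assumptions.

```python
# Central Company data from Example 1.
items = [40, 30, 70, 90, 50, 60, 70, 40, 80, 70]
hours = [83, 60, 138, 180, 97, 118, 140, 75, 159, 144]

n = len(items)
x_bar = sum(items) / n
y_bar = sum(hours) / n
b_hat = (sum((x - x_bar) * (y - y_bar) for x, y in zip(items, hours))
         / sum((x - x_bar) ** 2 for x in items))
a_hat = y_bar - b_hat * x_bar

# Residual = observed y minus predicted y.
residuals = [y - (a_hat + b_hat * x) for x, y in zip(items, hours)]

# Print (x, residual) pairs sorted by x, the order used on a residual plot.
for x, e in sorted(zip(items, residuals)):
    print(f"x = {x:3d}   residual = {e:7.3f}")

# For a least squares fit with an intercept the residuals sum to zero,
# apart from floating-point rounding.
print(f"sum of residuals = {sum(residuals):.10f}")
```

When the model assumptions hold, the residuals should scatter around zero with no visible pattern against x.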

The TREND( ) Function
TREND(Y-range, X-range, X-value for prediction)
where:
- Y-range is the spreadsheet range containing the dependent Y variable,
- X-range is the spreadsheet range containing the independent X variable(s),
- X-value for prediction is a cell (or cells) containing the values for the independent X variable(s) for which we want an estimated value of Y.

Entering the Central Company Data (see Example 1)
[Figure: spreadsheet with the Central Company data.]
Note: The TREND( ) function is dynamically updated whenever any inputs to the function change. However, it does not provide the statistical information provided by the regression tool. It is best to use these two different approaches to regression in conjunction with one another.

Important Software Note
When using more than one independent variable, all variables for the X-range must be in one contiguous block of cells (that is, in adjacent columns).

Regression Output
SUMMARY OUTPUT (values rounded)

| Regression Statistics |        |
|-----------------------|--------|
| Multiple R            | 0.9977 |
| R Square              | 0.9955 |
| Adjusted R Square     | 0.9949 |
| Standard Error        | 2.8053 |
| Observations          | 10     |

ANOVA

|            | df | SS         | MS         | F         | Significance F |
|------------|----|------------|------------|-----------|----------------|
| Regression | 1  | 13881.4412 | 13881.4412 | 1763.8755 | 1.13834E-10    |
| Residual   | 8  | 62.9588    | 7.8699     |           |                |
| Total      | 9  | 13944.4    |            |           |                |

|                 | Coefficients | Standard Error | t Stat  | P-value     | Lower 95% | Upper 95% |
|-----------------|--------------|----------------|---------|-------------|-----------|-----------|
| Intercept       | -1.8353      | 3.0199         | -0.6077 | 0.56        | -8.7992   | 5.1286    |
| Number of Items | 2.0206       | 0.0481         | 41.9985 | 1.13834E-10 | 1.9096    | 2.1315    |
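For a single X variable, the behaviour described for TREND( ) can be mimicked in a few lines. This is a rough Python analogue, not Excel's implementation: the function name `trend` and its argument names are hypothetical, and unlike Excel's function it handles only one independent variable.

```python
def trend(known_ys, known_xs, new_x):
    """Fit y = a + b*x by least squares and return the estimate at new_x.

    Hypothetical single-variable analogue of the spreadsheet TREND( ) call
    TREND(Y-range, X-range, X-value for prediction).
    """
    n = len(known_xs)
    x_bar = sum(known_xs) / n
    y_bar = sum(known_ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(known_xs, known_ys))
    ss_xx = sum((x - x_bar) ** 2 for x in known_xs)
    b_hat = ss_xy / ss_xx
    a_hat = y_bar - b_hat * x_bar
    return a_hat + b_hat * new_x


# Central Company data from Example 1.
items = [40, 30, 70, 90, 50, 60, 70, 40, 80, 70]
hours = [83, 60, 138, 180, 97, 118, 140, 75, 159, 144]

# Estimated man-hours for a run of 50 items (inside the 30 to 90 data range).
estimate_50 = trend(hours, items, 50)
print(f"estimated man-hours for 50 items: {estimate_50:.2f}")
```

As with TREND( ), this returns only the point estimate; the standard errors, t statistics, and ANOVA table come from the regression tool.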

Regression Plot
[Figure: Central Company. Man-hours of Labor plotted against Number of Items, with the fitted line y = 2.0206x - 1.8353 and \(R^2\) = 0.9955.]

Multiple Regression & Model Building
Most regression problems involve more than one independent variable. If each independent variable varies in a linear manner with y, the estimated regression function in this case is:
\[ \hat{y} = \hat{a} + \hat{b}_1 X_1 + \hat{b}_2 X_2 + \cdots + \hat{b}_k X_k \]
The optimal values for the \(\hat{b}_i\) can again be found by minimizing the ESS (the error sum of squares, called SSE in the simple linear regression slides). The resulting function fits a hyperplane to our sample data.

Example Regression Surface for Two Independent Variables
[Figure: data points and a fitted plane, Y plotted against \(X_1\) and \(X_2\).]

Example: Admissions Data
In SLR, we had
- \(x_1\) = entrance test score
- y = end-of-year GPA
Suppose other factors are involved:
- \(x_2\) = HS GPA
- \(x_3\) = SAT score
The model becomes
\[ y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \varepsilon \]

Independent Variables
- May represent higher-order terms: \(x_1\) = age, \(x_2\) = age\(^2\)
- May be dummy/indicator variables: \(x_3\) = 0 if female, 1 if male
- May be functions of independent variables: \(x_4\) = price, \(x_5\) = industry average price, \(x_6\) = price difference = \(x_5 - x_4\)

Steps to Developing a Multiple Regression Model
1. Hypothesize the model: \(y = a + b_1 x_1 + \cdots + b_k x_k + \varepsilon\)
2. Estimate the coefficients.
3. Specify the distribution of \(\varepsilon\) and estimate \(\sigma\).
4. Validate the model assumptions.
5. Evaluate model adequacy.
6. Use the model for estimation and prediction.

Fitting the Model
- \(b_i\) represents the change in y for each unit change in \(x_i\) when ALL other x's are held constant.
- The method of fitting is the same as in SLR: estimate the b's to minimize SSE.
- Computationally intensive: use MS Excel.
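The slides leave the computation to Excel. For readers curious about what "estimate the b's to minimize SSE" involves, here is a minimal sketch that solves the normal equations (X'X) b = X'y by Gaussian elimination. The functions `solve` and `fit_multiple` are hypothetical helpers, and the data are made up so that the correct answer is known in advance; none of this comes from the slides.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_multiple(rows, ys):
    """Return [a, b1, ..., bk] minimizing SSE, via the normal equations."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 for the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    return solve(xtx, xty)


# Made-up data that follow y = 1 + 2*x1 + 3*x2 exactly (no error term),
# so the fitted coefficients should come back as 1, 2 and 3.
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8), (6, 4)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

coefs = fit_multiple(rows, ys)
print([round(c, 6) for c in coefs])
```

Solving the normal equations directly is fine for small, well-behaved problems like this one; statistical software generally uses more numerically stable methods.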

Multiple Regression: Salsberry Realty
Salsberry Realty sells homes along the east coast of the United States. One of the questions frequently asked by prospective buyers is: If we purchase this home, how much can we expect to pay to heat it during the winter? The research department at Salsberry has been asked to develop some guidelines regarding heating costs for single-family homes. Three variables are thought to relate to the heating costs: (1) the mean daily outside temperature, (2) the number of inches of insulation in the attic, and (3) the age of the furnace. To investigate, Salsberry's research department selected a random sample of 20 recently sold homes. They determined the cost to heat the home last January, as well as the mean outside temperature during January in the region, the number of inches of insulation in the attic, and the age of the furnace. The sample information is given below.

Multiple Regression: Salsberry Realty

| Home | Heating Cost ($) | Mean Outside Temperature (°F) | Attic Insulation (inches) | Age of Furnace (years) |
|------|------------------|-------------------------------|---------------------------|------------------------|
| 1    | 250              | 35                            | 3                         | 6                      |
| 2    | 360              | 29                            | 4                         | 10                     |
| 3    | 165              | 36                            | 7                         | 3                      |
| 4    | 43               | 60                            | 6                         | 9                      |
| 5    | 92               | 65                            | 5                         | 6                      |
| 6    | 200              | 30                            | 5                         | 5                      |
| 7    | 355              | 10                            | 6                         | 7                      |
| 8    | 290              | 7                             | 10                        | 10                     |
| 9    | 230              | 21                            | 9                         | 11                     |
| 10   | 120              | 55                            | 2                         | 5                      |
| 11   | 73               | 54                            | 12                        | 4                      |
| 12   | 205              | 48                            | 5                         | 1                      |
| 13   | 400              | 20                            | 5                         | 15                     |
| 14   | 320              | 39                            | 4                         | 7                      |
| 15   | 72               | 60                            | 8                         | 6                      |
| 16   | 272              | 20                            | 5                         | 8                      |
| 17   | 94               | 58                            | 7                         | 3                      |
| 18   | 190              | 40                            | 8                         | 11                     |
| 19   | 235              | 27                            | 9                         | 8                      |
| 20   | 139              | 30                            | 7                         | 5                      |

Multiple Regression: Salsberry Realty
- Determine the multiple regression equation.
- Which variables are the independent variables? Which variable is the dependent variable?
- Use MS Excel to develop a regression equation.
- Discuss the regression coefficients. Why are some positive and some negative?
- What is the intercept value?
- What is the estimated heating cost for a home where the mean outside temperature is 30 degrees, there are 5 inches of insulation in the attic, and the furnace is 10 years old?

Multiple Regression: Salsberry Realty
The hypothesized model is given by
\[ y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \varepsilon \]
where
- y = heating cost
- \(x_1\) = mean outside temperature
- \(x_2\) = attic insulation
- \(x_3\) = age of furnace
- \(\varepsilon\) = random error

[Figure: Scatterplot 1, Mean Temp. vs. Heating Cost. Heating Cost plotted against Mean Outside Temperature.]

[Figure: Scatterplot 2, Attic Insulation vs. Heating Cost. Heating Cost plotted against Attic Insulation.]

[Figure: Scatterplot 3, Age of Furnace vs. Heating Cost. Heating Cost plotted against Age of Furnace.]

Multiple Regression: Salsberry Realty
SUMMARY OUTPUT (values rounded)

| Regression Statistics |         |
|-----------------------|---------|
| Multiple R            | 0.8968  |
| R Square              | 0.8042  |
| Adjusted R Square     | 0.7675  |
| Standard Error        | 51.0486 |
| Observations          | 20      |

ANOVA

|            | df | SS          | MS         | F       | Significance F |
|------------|----|-------------|------------|---------|----------------|
| Regression | 3  | 171220.4728 | 57073.4909 | 21.9012 | 6.56178E-06    |
| Residual   | 16 | 41695.2772  | 2605.9548  |         |                |
| Total      | 19 | 212915.75   |            |         |                |

|                               | Coefficients | Standard Error | t Stat  | P-value     | Lower 95% | Upper 95% |
|-------------------------------|--------------|----------------|---------|-------------|-----------|-----------|
| Intercept                     | 427.1938     | 59.6015        | 7.1675  | ≈ 2.3E-06   | 300.8444  | 553.5432  |
| Mean Outside Temperature (°F) | -4.5827      | 0.7723         | -5.9336 | 2.10035E-05 | -6.2199   | -2.9454   |
| Attic Insulation (inches)     | -14.8309     | 4.7544         | -3.1194 | 0.0066      | -24.9098  | -4.7520   |
| Age of Furnace (years)        | 6.1010       | 4.0121         | 1.5207  | 0.1479      | -2.4043   | 14.6063   |
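The prediction question posed for Salsberry Realty (a mean outside temperature of 30 °F, 5 inches of attic insulation, and a 10-year-old furnace) can be worked through with the fitted coefficients rounded to two decimals: intercept 427.19, temperature -4.58, insulation -14.83, and furnace age 6.10. The sketch below is a Python illustration; the function name `heating_cost` is hypothetical.

```python
def heating_cost(temp_f, insulation_in, furnace_age_yr):
    """Estimated January heating cost in dollars.

    Hypothetical helper using the fitted Salsberry Realty coefficients,
    rounded to two decimals.
    """
    return (427.19
            - 4.58 * temp_f
            - 14.83 * insulation_in
            + 6.10 * furnace_age_yr)


# 30 degrees F, 5 inches of insulation, 10-year-old furnace.
estimate = heating_cost(30, 5, 10)
print(f"estimated heating cost: ${estimate:.2f}")
```

All three inputs fall inside the ranges observed in the sample of 20 homes, so this estimate is not an extrapolation. Because the coefficients are rounded, the result can differ by a few cents from one computed with the full-precision values.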

Multiple Regression: Salsberry Realty
Estimated regression equation:
\[ \hat{y} = 427.19 - 4.58x_1 - 14.83x_2 + 6.10x_3 \]
Discussion
- Give meaningful interpretations of the coefficients.
- Check the range of each independent variable.
- Estimate the heating cost when the mean outside temperature is 30 °F, there are 5 inches of insulation, and the furnace is 10 years old.

Estimation and Prediction
- Model assumptions: same as SLR, \(\varepsilon \sim N(0, \sigma^2)\)
- Estimation of the variance \(\sigma^2\):
\[ s^2 = MSE = \frac{SSE}{n - k - 1} \]

Using Dummy/Indicator Variables
- Qualitative variables can also be used in the regression model.
- Dummy/indicator or binary (0, 1) variables denote the presence or absence of the variable of interest.

Using Dummy/Indicator Variables
- A qualitative variable with c classes will be represented by (c - 1) dummy/indicator variables in the model, each taking on the values 0 and 1.
- Example: Suppose we have an independent variable that represents type of diet: Weight Watchers, Atkins, Body for Life, and Protein.
- Note that we have 4 classes (c = 4).
- We will need (c - 1) = 3 variables in the model.

Using Dummy/Indicator Variables
Types of diet could be modeled as:
- \(x_1\) = 1 if Weight Watchers, 0 otherwise
- \(x_2\) = 1 if Atkins, 0 otherwise
- \(x_3\) = 1 if Body for Life, 0 otherwise

Motion Picture Industry Example
A motion picture industry analyst wants to estimate the gross earnings generated by a movie. The estimate will be based on different variables involved in the film's production. The independent variables considered are \(X_1\) = production cost of the movie and \(X_2\) = total cost of all promotional activities. A third variable (\(X_3\)) that the analyst wants to consider is whether or not the movie is based on a book published before the release of the movie. The analyst obtains information on a random sample of 20 Hollywood movies made within the last five years. The data are given in the following table. The model could resemble:
\[ y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \varepsilon \]

Motion Picture Industry Example

| Movie | Gross Earnings (millions $) | Production Cost (millions $) | Promotion Cost (millions $) | Book |
|-------|-----------------------------|------------------------------|-----------------------------|------|
| 1     | 28                          | 4.2                          | 1                           | No   |
| 2     | 35                          | 6.0                          | 3                           | Yes  |
| 3     | 50                          | 5.5                          | 6                           | Yes  |
| 4     | 20                          | 3.3                          | 1                           | No   |
| 5     | 75                          | 12.5                         | 11                          | Yes  |
| 6     | 60                          | 9.6                          | 8                           | Yes  |
| 7     | 15                          | 2.5                          | 0.5                         | No   |
| 8     | 45                          | 10.8                         | 5                           | No   |
| 9     | 50                          | 8.4                          | 3                           | Yes  |
| 10    | 34                          | 6.6                          | 2                           | No   |
| 11    | 48                          | 10.7                         | 1                           | Yes  |
| 12    | 82                          | 11.0                         | 15                          | Yes  |
| 13    | 24                          | 3.5                          | 4                           | No   |
| 14    | 50                          | 6.9                          | 10                          | No   |
| 15    | 58                          | 7.8                          | 9                           | Yes  |
| 16    | 63                          | 10.1                         | 10                          | No   |
| 17    | 30                          | 5.0                          | 1                           | Yes  |
| 18    | 37                          | 7.5                          | 5                           | No   |
| 19    | 45                          | 6.4                          | 8                           | Yes  |
| 20    | 72                          | 10.0                         | 12                          | Yes  |

Motion Picture Industry Example
- Prepare a scatter plot of gross earnings versus production cost and promotion cost. Does there appear to be a linear relationship between gross earnings and either production cost or promotion cost?
- If the analyst were to use a simple linear regression model to predict gross earnings, which variable should be used? Explain.
- Determine the parameter estimates for the model given by \(\hat{Y} = \hat{a} + \hat{b}_1 X_1 + \hat{b}_2 X_2\). Analyze the results.
- Determine the parameter estimates for the model given by \(\hat{Y} = \hat{a} + \hat{b}_1 X_1 + \hat{b}_2 X_2 + \hat{b}_3 X_3\). Does \(X_3\) help explain the gross earnings when \(X_1\) and \(X_2\) are also in the model? Explain.
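Both the diet example and the Book column of the motion picture data rely on the same (c - 1) indicator coding. A minimal sketch of that coding follows; the function name `dummy_code` is hypothetical, and treating the last listed class as the baseline (all zeros) is an assumption that matches the diet example, where Protein is the class with no indicator of its own.

```python
def dummy_code(value, classes):
    """Return the (c - 1) indicator variables for `value`.

    Hypothetical helper: the last entry of `classes` is the baseline
    class and is represented by all zeros.
    """
    if value not in classes:
        raise ValueError(f"unknown class: {value!r}")
    return [1 if value == cls else 0 for cls in classes[:-1]]


# Diet example: c = 4 classes, so 3 indicator variables (x1, x2, x3).
diets = ["Weight Watchers", "Atkins", "Body for Life", "Protein"]
print(dummy_code("Atkins", diets))    # x1, x2, x3 for an Atkins dieter
print(dummy_code("Protein", diets))   # baseline class

# Motion picture example: Book has c = 2 classes, so one indicator x3.
book_classes = ["Yes", "No"]
print(dummy_code("Yes", book_classes))
print(dummy_code("No", book_classes))
```

With this coding, each coefficient on an indicator measures the difference in mean response between that class and the baseline class, holding the other variables constant.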

[Figure: Gross Earnings v. Production Cost. Gross Earnings plotted against Production Cost.]

[Figure: Gross Earnings v. Promotion Cost. Gross Earnings plotted against Promotion Cost.]

Motion Picture Industry Example
With simplicity in mind, suppose we fit three simple linear regression functions:
\[ \hat{y} = \hat{a} + \hat{b}_1 x_1, \qquad \hat{y} = \hat{a} + \hat{b}_2 x_2, \qquad \hat{y} = \hat{a} + \hat{b}_3 x_3 \]
Key regression results are:

| Variables in the Model | R²    | Adjusted R² | S_e    | Parameter Estimates       |
|------------------------|-------|-------------|--------|---------------------------|
| x1                     | 0.751 | 0.738       | 9.506  | a = 5.071, b1 = 5.527     |
| x2                     | 0.779 | 0.766       | 8.970  | a = 24.332, b2 = 3.761    |
| x3                     | 0.299 | 0.260       | 15.960 | a = 35.111, b3 = 19.889   |

The model using \(x_2\) accounts for 77.9% of the variation in y, leaving approximately 22% unaccounted for.

Motion Picture Industry Example
SUMMARY OUTPUT (values rounded)

| Regression Statistics |        |
|-----------------------|--------|
| Multiple R            | 0.9666 |
| R Square              | 0.9344 |
| Adjusted R Square     | 0.9267 |
| Standard Error        | 5.0253 |
| Observations          | 20     |

ANOVA

|            | df | SS        | MS        | F        | Significance F |
|------------|----|-----------|-----------|----------|----------------|
| Regression | 2  | 6113.6418 | 3056.8209 | 121.0458 | 8.79959E-11    |
| Residual   | 17 | 429.3082  | 25.2534   |          |                |
| Total      | 19 | 6542.95   |           |          |                |

|                 | Coefficients | Standard Error | t Stat | P-value     | Lower 95% | Upper 95% |
|-----------------|--------------|----------------|--------|-------------|-----------|-----------|
| Intercept       | 8.1517       | 3.1763         | 2.5664 | 0.0200      | 1.4503    | 14.8532   |
| Production Cost | 3.2674       | 0.5143         | 6.353  | ≈ 7E-06     | 2.182     | 4.353     |
| Promotion Cost  | 2.3674       | 0.3438         | 6.8856 | 2.64016E-06 | 1.6420    | 3.0928    |

Motion Picture Industry Example
Modeling earnings using production and promotion costs yields:
\[ \hat{y} = 8.15 + 3.27x_1 + 2.37x_2 \]
- \(R^2\) = 0.9344, which implies that 93.44% of the variation in earnings can be explained by production and promotion costs.
- s = 5.025, which is well below the standard error of any of the SLR models.

Motion Picture Industry Example
SUMMARY OUTPUT (values rounded)

| Regression Statistics |        |
|-----------------------|--------|
| Multiple R            | 0.9832 |
| R Square              | 0.9667 |
| Adjusted R Square     | 0.9605 |
| Standard Error        | 3.6895 |
| Observations          | 20     |

ANOVA

|            | df | SS        | MS        | F        | Significance F |
|------------|----|-----------|-----------|----------|----------------|
| Regression | 3  | 6325.1513 | 2108.3838 | 154.8868 | 4.95768E-12    |
| Residual   | 16 | 217.7987  | 13.6124   |          |                |
| Total      | 19 | 6542.95   |           |          |                |

|                 | Coefficients | Standard Error | t Stat | P-value     | Lower 95% | Upper 95% |
|-----------------|--------------|----------------|--------|-------------|-----------|-----------|
| Intercept       | 7.8362       | 2.3333         | 3.358  | 0.0040      | 2.8896    | 12.7827   |
| Production Cost | 2.8477       | 0.3923         | 7.258  | 1.91353E-06 | 2.0160    | 3.6794    |
| Promotion Cost  | 2.2783       | 0.2534         | 8.9894 | 1.18387E-07 | 1.7410    | 2.8155    |
| Book            | 7.1661       | 1.8180         | 3.9418 | 0.0012      | 3.3122    | 11.0200   |

Motion Picture Industry Example
Using the full model:
\[ \hat{y} = 7.84 + 2.85x_1 + 2.28x_2 + 7.17x_3 \]
\(R^2\) increases to 96.67% and the standard error is reduced to 3.6895.

Motion Picture Industry Example
Indicator variables revisited:
\[ \hat{y} = 7.84 + 2.85x_1 + 2.28x_2 + 7.17x_3 \]
Note that \(x_3\) takes on the values 0 and 1.

Selecting the Model
- We want to identify the simplest model that adequately accounts for the systematic variation in the dependent variable, y.
- Arbitrarily using all of the independent variables may result in overfitting.

Adjusted R² Statistic
As additional independent variables are added to a model:
- The \(R^2\) statistic can only increase.
- The adjusted \(R^2\) statistic can increase or decrease.
\[ R_a^2 = 1 - \left(\frac{SSE}{SS_{yy}}\right)\left(\frac{n - 1}{n - k - 1}\right), \qquad R_a^2 \le R^2 \]
- The \(R^2\) statistic can be artificially inflated by adding any independent variable to the model.
- We can compare adjusted \(R^2\) values as a heuristic to tell whether adding an additional independent variable really helps to improve a regression model.

Motion Picture Industry Example
Key regression results are:

| Variables in the Model | R²    | Adjusted R² | S_e   | Parameter Estimates                            |
|------------------------|-------|-------------|-------|------------------------------------------------|
| x1                     | 0.751 | 0.738       | 9.506 | a = 5.071, b1 = 5.527                          |
| x1 & x2                | 0.934 | 0.927       | 5.025 | a = 8.152, b1 = 3.267, b2 = 2.367              |
| x1, x2 & x3            | 0.967 | 0.961       | 3.689 | a = 7.836, b1 = 2.848, b2 = 2.278, b3 = 7.166  |

The model using \(x_1\), \(x_2\), and \(x_3\) appears to be best:
- Highest adjusted \(R^2\) and highest \(R^2\)
- Lowest s (most precise prediction intervals)

Estimation and Prediction
- Same as in SLR.
- As in SLR, the difference lies in the standard errors of estimation and of prediction.
- In multiple regression, these standard errors are complex and beyond the scope of this class.
- We will rely on MS Excel output.
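Since SSE / SS_yy equals 1 - R², the adjusted R² formula can be applied directly to the rounded R² values in the summary table for the motion picture models (n = 20 movies). The sketch below is illustrative Python; the function name `adjusted_r_squared` is hypothetical.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 for a model with k independent variables and n observations.

    Uses 1 - (SSE / SS_yy) * (n - 1) / (n - k - 1), with SSE / SS_yy = 1 - R^2.
    """
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


n = 20  # movies in the sample

# Rounded R^2 values from the summary table, keyed by number of variables k.
r_squared_by_k = {1: 0.751, 2: 0.934, 3: 0.967}

adjusted = {k: adjusted_r_squared(r2, n, k) for k, r2 in r_squared_by_k.items()}
for k, value in adjusted.items():
    print(f"k = {k}: R^2 = {r_squared_by_k[k]:.3f}, adjusted R^2 = {value:.3f}")
```

Because the inputs are rounded to three decimals, the results can differ from the table's adjusted R² column in the last digit. The ordering of the three models is unaffected.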

Concerns
- Parameter estimability
  - Inability of the model to estimate parameters because the data are concentrated in one area.
  - The data must include at least one more level of x than the highest order of the x-variable that is included in the model.
- Multicollinearity
  - A relationship between two or more independent variables.
  - Variables contributing the same information.
  - If two or more variables are highly correlated, then we only need one in the model.
- Extrapolation (already discussed in SLR)
- Correlated errors
  - Measurements on the dependent variable are correlated.
  - Time series analysis.

Polynomial Regression
Sometimes the relationship between a dependent and an independent variable is not linear.
[Figure: Selling Price plotted against Square Footage.]
This graph suggests a quadratic relationship between square footage (X) and selling price (Y).

Polynomial Regression
An appropriate regression function in this case might be
\[ \hat{y} = \hat{a} + \hat{b}_1 x_1 + \hat{b}_2 x_1^2 \]
or equivalently,
\[ \hat{y} = \hat{a} + \hat{b}_1 x_1 + \hat{b}_2 x_2, \qquad \text{where } x_2 = x_1^2 \]

Graph of Estimated Quadratic Regression Function
[Figure: Selling Price plotted against Square Footage, with the fitted quadratic curve.]

Fitting a Third Order Polynomial Model
We could also fit a third order polynomial model,
\[ \hat{Y} = \hat{a} + \hat{b}_1 X_1 + \hat{b}_2 X_1^2 + \hat{b}_3 X_1^3 \]
or equivalently,
\[ \hat{Y} = \hat{a} + \hat{b}_1 X_1 + \hat{b}_2 X_2 + \hat{b}_3 X_3, \qquad \text{where } X_2 = X_1^2 \text{ and } X_3 = X_1^3 \]

Graph of Estimated Third Order Polynomial Regression Function
[Figure: Selling Price plotted against Square Footage, with the fitted third order curve.]

Polynomial Regression: Overfitting
- When fitting polynomial models, care must be taken to avoid overfitting.
- The adjusted \(R^2\) statistic can also be used for building/fitting polynomial regression models.
- We can gauge the amount of overfitting by validating the fit, that is, by using a training sample to build the model and a validation sample to examine its estimation or prediction accuracy.
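The training/validation idea can be sketched as follows. Everything in this block is an assumption made for illustration: the data are made up (a quadratic trend plus small fixed deviations, not the selling-price data from the slides), the split simply alternates observations, and the helper functions `solve`, `fit_polynomial`, `predict`, and `mean_squared_error` are hypothetical.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_polynomial(xs, ys, degree):
    """Least squares coefficients [a, b1, ..., b_degree] via the normal equations."""
    design = [[x ** p for p in range(degree + 1)] for x in xs]
    size = degree + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(size)] for i in range(size)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(size)]
    return solve(xtx, xty)


def predict(coefs, x):
    return sum(c * x ** p for p, c in enumerate(coefs))


def mean_squared_error(coefs, xs, ys):
    return sum((y - predict(coefs, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data: y = 50 + 20x + 15x^2 plus small fixed deviations.
xs = [0.9, 1.0, 1.2, 1.3, 1.5, 1.6, 1.8, 1.9, 2.1, 2.2, 2.4]
noise = [0.4, -0.3, 0.2, -0.5, 0.3, 0.1, -0.4, 0.5, -0.2, 0.3, -0.1]
ys = [50 + 20 * x + 15 * x * x + e for x, e in zip(xs, noise)]

# Alternate observations between the training and validation samples.
train_x, train_y = xs[0::2], ys[0::2]
valid_x, valid_y = xs[1::2], ys[1::2]

train_mse, valid_mse = {}, {}
for degree in (1, 2, 3):
    coefs = fit_polynomial(train_x, train_y, degree)  # built on training data only
    train_mse[degree] = mean_squared_error(coefs, train_x, train_y)
    valid_mse[degree] = mean_squared_error(coefs, valid_x, valid_y)
    print(f"degree {degree}: training MSE = {train_mse[degree]:.3f}, "
          f"validation MSE = {valid_mse[degree]:.3f}")
```

The training error can only go down as the degree increases, so it cannot reveal overfitting on its own. The validation error is the one to watch: a higher-order term is worth keeping only if it also improves accuracy on data the model has not seen.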