Example: Multiple linear regression. Least squares regression. Repetition: Simple linear regression. Tron Anders Moger


Example: Multiple linear regression
Tron Anders Moger, 2007

[Figure: scatter plot of birth weight (grams) against mother's weight in pounds]

Repetition: Simple linear regression
We define a model $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, where the errors $\varepsilon_i$ are independent and normally distributed with equal variance $\sigma^2$.
We wish to fit a line as close as possible to the observed data (two normally distributed variables).
Example: Birth weight $= \beta_0 + \beta_1 \cdot$ mother's weight

Least squares regression
[Figure: the same scatter plot with the fitted least squares line; R Sq Linear = 0.035]

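How to compute the line fit with the least squares method?
Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ denote the points in the plane.
Find $a$ and $b$ so that $y = a + bx$ fits the points by minimizing
$S = (a + bx_1 - y_1)^2 + (a + bx_2 - y_2)^2 + \cdots + (a + bx_n - y_n)^2 = \sum (a + bx_i - y_i)^2$
Solution:
$b = \dfrac{n\sum x_i y_i - (\sum x_i)(\sum y_i)}{n\sum x_i^2 - (\sum x_i)^2} = \dfrac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}$
$a = \dfrac{\sum y_i - b\sum x_i}{n} = \bar{y} - b\bar{x}$
where $\bar{x} = \frac{1}{n}\sum x_i$, $\bar{y} = \frac{1}{n}\sum y_i$, and all sums are taken over $i = 1, \ldots, n$.

How do you get this answer?
Differentiate $S$ with respect to $a$ and $b$, and set the results to 0:
$\dfrac{\partial S}{\partial a} = 2\sum (a + bx_i - y_i) = 0$
$\dfrac{\partial S}{\partial b} = 2\sum (a + bx_i - y_i)x_i = 0$
We get:
$na + b(\sum x_i) - \sum y_i = 0$
$a(\sum x_i) + b(\sum x_i^2) - \sum x_i y_i = 0$
These are two equations with two unknowns, and their solution gives the answer.

How close are the data to the fitted line? $R^2$
Define
SSE: Error sum of squares, $\sum (y_i - (a + bx_i))^2$
SSR: Regression sum of squares, $\sum (a + bx_i - \bar{y})^2$
SST: Total sum of squares, $\sum (y_i - \bar{y})^2$
We can show that SST = SSR + SSE.
Define $R^2 = \mathrm{corr}(x, y)^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST}$
$R^2$ is the coefficient of determination.

What is the logic behind $R^2$?
[Figure: a data point $y_i$, the mean $\bar{y}$ and the fitted value $\hat{y}_i = a + bx_i$, illustrating $SST = \sum (y_i - \bar{y})^2$, $SSE = \sum (y_i - \hat{y}_i)^2$ and $SSR = \sum (\hat{y}_i - \bar{y})^2$]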
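The closed-form solution and the definition of $R^2$ can be checked numerically. The following is a minimal sketch in plain Python; the function names `fit_line` and `r_squared` and the five data points are made up for illustration and are not part of the lecture.

```python
def fit_line(xs, ys):
    """Least squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # Slope and intercept from the two normal equations
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


def r_squared(xs, ys, a, b):
    """Coefficient of determination, computed as 1 - SSE/SST."""
    y_bar = sum(ys) / len(ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst


# Made-up illustration data (not from the lecture)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_line(xs, ys)
r2 = r_squared(xs, ys, a, b)
print(a, b, r2)
```

For these points the slope is 1.99 and the intercept 0.05, and $R^2$ is close to 1 because the points lie almost exactly on a line.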

Example: Regression of birth weight with mother's weight as independent variable

Model Summary (Predictors: (Constant), weight in pounds; Dependent Variable: birthweight)
R       R Square   Adjusted R Square   Std. Error of the Estimate
0.186   0.035      0.029               718.24

ANOVA (Predictors: (Constant), weight in pounds; Dependent Variable: birthweight)
             Sum of Squares   df    Mean Square   F       Sig.
Regression   3448881          1     3448881.3     6.686   0.01
Residual     96468171         187   515872.574
Total        99917053         188

Coefficients (Dependent Variable: birthweight)
                   B         Std. Error   Beta    t        Sig.    95% CI for B (Lower, Upper)
(Constant)         2369.67   228.43               10.374   0.000   1919.040, 2820.304
weight in pounds   4.429     1.713        0.186   2.586    0.01    1.050, 7.809

Interpretation:
Have fitted the line Birth weight = 2369.67 + 4.429 * mother's weight
If mother's weight increases by 20 pounds, what is the predicted impact on the infant's birth weight? 4.429 * 20 = 89 grams
What is the predicted birth weight of an infant with a 150 pound mother? 2369.67 + 4.429 * 150 = 3034 grams

But how to answer questions like:
Given that a positive slope (b) has been estimated: does it give a reproducible indication that there is a positive trend, or is it a result of random variation?
What is a confidence interval for the estimated slope?
What is the prediction, with uncertainty, at a new x value?

Confidence intervals for simple regression
In a simple regression model,
$a$ estimates $\beta_0$
$b$ estimates $\beta_1$
$\hat{\sigma}^2 = SSE/(n-2)$ estimates $\sigma^2$
Also, $(b - \beta_1)/S_b \sim t_{n-2}$, where $S_b^2 = \dfrac{\hat{\sigma}^2}{(n-1)s_x^2}$ estimates the variance of $b$.
So a confidence interval for $\beta_1$ is given by $b \pm t_{n-2,\alpha/2} S_b$
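As a quick arithmetic check of the interval formula, the sketch below recomputes the 95% confidence interval for the slope from the slope estimate 4.429 and standard error 1.713 in the birth-weight output. The critical value 1.973 for $t_{187, 0.025}$ is an assumption taken from standard t tables, not from the lecture.

```python
b = 4.429        # estimated slope (grams per pound), from the regression output
s_b = 1.713      # standard error of the slope, from the regression output
t_crit = 1.973   # assumed t quantile t_{187, 0.025} from a t table

# Confidence interval b +/- t * S_b
lower = b - t_crit * s_b
upper = b + t_crit * s_b
print(round(lower, 3), round(upper, 3))
```

The result agrees, up to rounding of the inputs, with the bounds 1.050 and 7.809 in the coefficient table.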

Hypothesis testing for simple regression
Choose hypotheses: $H_0: \beta_1 = 0$, $H_1: \beta_1 \neq 0$
Test statistic: $b/S_b \sim t_{n-2}$
Reject $H_0$ if $b/S_b < -t_{n-2,\alpha/2}$ or $b/S_b > t_{n-2,\alpha/2}$
For the example:
Test $H_0: \beta_{\text{mother's weight}} = 0$ at the 5% significance level
Get 4.429/1.713 = 2.586. Look up the 2.5 and 97.5 percentiles in the t-distribution with 187 degrees of freedom (approximately the normal distribution)
Find p-value < 0.05, reject $H_0$

Prediction from a simple regression model
A regression model can be used to predict the response at a new value $x_{n+1}$
The uncertainty in this prediction comes from two sources:
The uncertainty in the regression line
The uncertainty of any response, given the regression line
A confidence interval for the prediction:
$a + bx_{n+1} \pm t_{n-2,\alpha/2}\,\hat{\sigma}\sqrt{1 + \dfrac{1}{n} + \dfrac{(x_{n+1} - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$

Example: The confidence interval of the predicted birth weight of an infant with a 150 pound mother
Found that the predicted weight was 3034 grams
The confidence interval for the prediction is:
$2369.67 + 4.429 \cdot 150 \pm t_{187,0.025} \cdot 718.24 \cdot \sqrt{1 + 1/189 + (150 - 129.8)^2/175798.5}$, with $t_{187,0.025} \approx 1.96$
Which becomes (1620.9, 4447.1)
The sum $\sum (x_i - \bar{x})^2 = 175798.5$ is not given directly in the SPSS output. It is calculated as $MSE/S_b^2 = 515872/1.713^2$

More than one independent variable: Multiple regression
Assume we have data of the type $(x_{11}, x_{12}, x_{13}, y_1), (x_{21}, x_{22}, x_{23}, y_2), \ldots$
We want to explain $y$ from the x-values by fitting the following model:
$y_i = a + bx_{i1} + cx_{i2} + dx_{i3}$
Just like before, one can produce formulas for $a, b, c, d$ minimizing the sum of the squares of the errors.
$x_1, x_2, x_3$ can be transformations of different variables, or transformations of the same variable
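The prediction interval for a 150 pound mother can be reproduced step by step. This is a sketch using the numbers from the birth-weight example (intercept 2369.67, slope 4.429, standard error of the estimate 718.24, n = 189, mean mother's weight 129.8, sum of squared deviations 175798.5) and the approximate critical value 1.96; the variable names are chosen here for illustration.

```python
import math

a, b = 2369.67, 4.429   # fitted intercept and slope
sigma_hat = 718.24      # standard error of the estimate, sqrt(MSE)
n = 189                 # number of observations (187 residual df + 2)
x_bar = 129.8           # mean mother's weight in pounds
sxx = 175798.5          # sum of squared deviations of x around its mean
t_crit = 1.96           # approximate t quantile with 187 degrees of freedom
x_new = 150             # new mother's weight in pounds

pred = a + b * x_new
# Half-width combines the uncertainty of a single response (the 1)
# with the uncertainty in the fitted line (the other two terms)
half = t_crit * sigma_hat * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)
lower, upper = pred - half, pred + half
print(round(pred), round(lower, 1), round(upper, 1))
```

Almost all of the width comes from the leading 1 under the square root, that is, from the variation of individual birth weights around the line rather than from uncertainty in the line itself.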

Multiple regression model
$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_K x_{Ki} + \varepsilon_i$
The errors $\varepsilon_i$ are independent random (normal) variables with expectation zero and variance $\sigma^2$
The explanatory variables $x_{1i}, x_{2i}, \ldots, x_{Ki}$ cannot be linearly related

New example: Traffic deaths in 1976 (from file crash on textbook CD)
Want to find if there is any relationship between highway death rate (deaths per 1000 per state) in the U.S. and the following variables:
Average car age (in months)
Average car weight (in 1000 pounds)
Percentage light trucks
Percentage imported cars
All data are per state

First: Scatter plots
[Figure: four scatter plots of deaths against carage, vehwt, lghttrks and impcars]

Univariate effects (one independent variable at a time!):

Deaths per 1000 = a + b * car age (in months)

Model Summary (Predictors: (Constant), carage; Dependent Variable: deaths)
R       R Square   Adjusted R Square   Std. Error of the Estimate
0.492   0.242      0.226               0.05206

Coefficients (Dependent Variable: deaths)
             B        Std. Error   Beta     t        Sig.    95% CI for B (Lower, Upper)
(Constant)   4.516    1.134                 3.98     0.000   2.233, 6.800
carage       -0.062   0.016        -0.492   -3.834   0.000   -0.094, -0.029

Hence: If all else is equal, if average car age increases by one month, you get 0.062 fewer deaths per 1000 inhabitants; increase age by 12 months, and you get 12 * 0.062 = 0.74 fewer deaths per 1000 inhabitants

Deaths per 1000 = a + b * car weight (in 1000 pounds)

Model Summary (Predictors: (Constant), vehwt; Dependent Variable: deaths)
R       R Square   Adjusted R Square   Std. Error of the Estimate
0.281   0.079      0.059               0.05740

Coefficients (Dependent Variable: deaths)
             B        Std. Error   Beta    t        Sig.    95% CI for B (Lower, Upper)
(Constant)   -0.271   0.221                -1.227   0.226   -0.716, 0.174
vehwt        0.124    0.062        0.281   1.983    0.053   -0.002, 0.249

Univariate effects cont'd (one independent variable at a time!):

Model Summary (Predictors: (Constant), lghttrks; Dependent Variable: deaths)
R       R Square   Adjusted R Square   Std. Error of the Estimate
0.716   0.512      0.501               0.04178

Coefficients (Dependent Variable: deaths)
             B       Std. Error   Beta    t       Sig.    95% CI for B (Lower, Upper)
(Constant)   0.046   0.018                2.478   0.017   0.009, 0.083
lghttrks     0.007   0.001        0.716   6.947   0.000   0.005, 0.010

Hence: Increasing the proportion of light trucks by 20 means 20 * 0.007 = 0.14 more deaths per 1000 inhabitants

Model Summary (Predictors: (Constant), impcars; Dependent Variable: deaths)
R       R Square   Adjusted R Square   Std. Error of the Estimate
0.308   0.095      0.075               0.05690

Coefficients (Dependent Variable: deaths)
             B        Std. Error   Beta     t             Sig.    95% CI for B (Lower, Upper)
(Constant)   0.206    0.020                 [illegible]   0.000   0.166, 0.246
impcars      -0.004   0.002        -0.308   -2.193        0.033   -0.007, 0.000

Predicted number of deaths per 1000 if the proportion of imported cars is 10%: 0.206 - 0.004 * 10 = 0.17

Building a multiple regression model:
Forward regression:
1. Try all independent variables, one at a time, and keep the variable with the lowest p-value
2. Repeat step 1, with the independent variable from the first round now included in the model
3. Repeat until no more variables can be added to the model (no more significant variables)
Backward regression:
Include all independent variables in the model, and remove the variable with the highest p-value
Continue until only significant variables are left
However: These methods are not always correct to use in practice!

For the traffic deaths, end up with:
Deaths per 1000 = 2.7 - 0.037 * car age + 0.006 * perc. light trucks

Model Summary (Predictors: (Constant), lghttrks, carage; Dependent Variable: deaths)
R       R Square   Adjusted R Square   Std. Error of the Estimate
0.768   0.590      0.572               0.0387

Coefficients (Dependent Variable: deaths)
             B        Std. Error   Beta     t        Sig.    95% CI for B (Lower, Upper)
(Constant)   2.668    0.895                 2.98     0.005   0.865, 4.470
carage       -0.037   0.013        -0.295   -2.930   0.005   -0.063, -0.012
lghttrks     0.006    0.001        0.62     6.18     0.000   0.004, 0.009

Conclusion: Did a multiple linear regression on traffic deaths, with car age, car weight, prop. light trucks and prop. imported cars as independent variables. Car age (in months, β = -0.037, 95% CI = (-0.063, -0.012)) and prop. light trucks (β = 0.006, 95% CI = (0.004, 0.009)) were significant at the 5% level

Check of assumptions:
[Figure: histogram of the regression standardized residuals, dependent variable deaths; Std. Dev. = 0.978, N = 48]
[Figure: normal P-P plot of regression standardized residual, dependent variable deaths; expected against observed cumulative probability]
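The forward regression steps can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the procedure SPSS runs: the helper names `t_statistics` and `forward_select` are hypothetical, the data are synthetic, and instead of computing p-values the sketch admits a variable when its absolute t-statistic is at least 2, a rough stand-in for the 5% level. Within one round all candidate models have the same degrees of freedom, so the largest absolute t corresponds to the lowest p-value.

```python
import numpy as np


def t_statistics(design, y):
    """OLS t-statistics for each column of a design matrix (intercept included)."""
    n, p = design.shape
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = resid @ resid / (n - p)              # residual variance estimate
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return beta / np.sqrt(np.diag(cov))


def forward_select(X, y, names, t_enter=2.0):
    """Add, one at a time, the candidate with the largest |t| until none reaches t_enter."""
    n = len(y)
    chosen, remaining = [], list(range(X.shape[1]))
    while remaining:
        best_j, best_t = None, 0.0
        for j in remaining:
            design = np.column_stack([np.ones(n), X[:, chosen + [j]]])
            t_new = abs(t_statistics(design, y)[-1])   # candidate is the last column
            if t_new > best_t:
                best_j, best_t = j, t_new
        if best_j is None or best_t < t_enter:
            break
        chosen.append(best_j)
        remaining.remove(best_j)
    return [names[j] for j in chosen]


# Synthetic data: only x0 and x2 truly affect y
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=200)
selected = forward_select(X, y, ["x0", "x1", "x2", "x3"])
print(selected)
```

The two real predictors are picked up first. A pure-noise variable can still pass the threshold by chance, which is one reason these stepwise methods are not always correct to use in practice.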

Check of assumptions cont'd:
[Figure: scatterplot of regression standardized residual against regression standardized predicted value, dependent variable deaths]

Least squares estimation
$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_K x_{Ki} + \varepsilon_i$
The least squares estimates of $\beta_0, \beta_1, \ldots, \beta_K$ are the values $b_0, b_1, \ldots, b_K$ minimizing
$SSE = \sum (b_0 + b_1 x_{1i} + b_2 x_{2i} + \cdots + b_K x_{Ki} - y_i)^2$
They can be computed with similar but more complex formulas than those for simple regression

Explanatory power
Defining $\hat{y}_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + \cdots + b_K x_{Ki}$
$SST = \sum (y_i - \bar{y})^2$
$SSE = \sum (y_i - \hat{y}_i)^2$
$SSR = \sum (\hat{y}_i - \bar{y})^2$
We get as before SST = SSR + SSE
We define the coefficient of determination $R^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST}$
We also get that $R^2 = \mathrm{Corr}(y, \hat{y})^2$

Adjusted coefficient of determination
Adding more independent variables will generally increase SSR and decrease SSE
Thus the coefficient of determination will tend to indicate that models with many variables always fit better.
To avoid this effect, the adjusted coefficient of determination may be used:
$\bar{R}^2 = 1 - \dfrac{SSE/(n-K-1)}{SST/(n-1)}$
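The quantities on this page can be computed directly with NumPy. The sketch below fits a model with K = 2 explanatory variables on synthetic data (the coefficients 1.0, 2.0 and -0.5 and all variable names are made up for illustration) and evaluates SST, SSE, SSR, $R^2$ and the adjusted $R^2$.

```python
import numpy as np

# Synthetic data with n observations and K explanatory variables
rng = np.random.default_rng(1)
n, K = 50, 2
X = rng.normal(size=(n, K))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=n)

# Least squares estimates b0, b1, ..., bK (first column of ones is the intercept)
design = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(design, y, rcond=None)
y_hat = design @ b

sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
sse = np.sum((y - y_hat) ** 2)         # error sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)  # regression sum of squares

r2 = 1 - sse / sst
adj_r2 = 1 - (sse / (n - K - 1)) / (sst / (n - 1))
print(b, r2, adj_r2)
```

With an intercept in the model, SST equals SSR + SSE up to floating point error, $R^2$ equals the squared correlation between `y` and `y_hat`, and the adjusted value is always slightly below $R^2$.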

Drawing inference about the model parameters
Similar to simple regression, we get that the following statistic has a t distribution with $n-K-1$ degrees of freedom:
$t_{b_j} = \dfrac{b_j - \beta_j}{s_{b_j}}$
where $b_j$ is the least squares estimate for $\beta_j$ and $s_{b_j}$ is its estimated standard deviation
$s_{b_j}$ is computed from SSE and the correlation between independent variables

Confidence intervals and hypothesis tests
A confidence interval for $\beta_j$ becomes $b_j \pm t_{n-K-1,\alpha/2}\, s_{b_j}$
Testing the hypothesis $H_0: \beta_j = 0$ vs $H_1: \beta_j \neq 0$:
Reject if $\dfrac{b_j}{s_{b_j}} < -t_{n-K-1,\alpha/2}$ or $\dfrac{b_j}{s_{b_j}} > t_{n-K-1,\alpha/2}$

Testing sets of parameters
We can also test the null hypothesis that a specific set of the betas are simultaneously zero. The alternative hypothesis is that at least one beta in the set is nonzero.
The test statistic has an F distribution, and is computed by comparing the SSE in the full model with the SSE obtained when setting the parameters in the set to zero.

Making predictions from the model
As in simple regression, we can use the estimated coefficients to make predictions
As in simple regression, the uncertainty in the predictions has two sources:
The variance around the regression estimate
The variance of the estimated regression model
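For reference, the usual form of the F statistic for testing a set of parameters is written out below. The lecture does not give the formula, so the notation is an assumption: $q$ is the number of betas in the tested set, $SSE_{\text{full}}$ is the error sum of squares of the full model with $K$ explanatory variables, and $SSE_{\text{reduced}}$ is the error sum of squares after setting the $q$ parameters to zero.

```latex
F = \frac{\left(SSE_{\text{reduced}} - SSE_{\text{full}}\right)/q}
         {SSE_{\text{full}}/(n-K-1)}
\;\sim\; F_{q,\;n-K-1} \quad \text{under } H_0
```

Large values of $F$ mean that dropping the set makes the fit much worse, which is evidence against the null hypothesis. With $q = 1$ the statistic is the square of the t statistic for that single coefficient.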

What if the relationship is nonlinear?
The most common thing to do is to categorize the independent variable
E.g. categorize age into 0-20 yrs, 21-40 yrs, 41-60 yrs and so on
Choose a baseline category, and estimate a slope b for each of the other categories
It does not matter what relationship you have between the outcome and the independent variable
Will talk more about this next time

Other options if the relationship is non-linear: Transformed variables
The relationship between variables may not be linear
Example: The natural model may be $y = ae^{bx}$
We want to find $a$ and $b$ so that the curve $y = ae^{bx}$ approximates the points as well as possible
[Figure: scatter plot of data following an exponential curve]

Example (cont.)
When $y = ae^{bx}$, then $\log(y) = \log(a) + bx$
Use standard formulas on the pairs $(x_1, \log(y_1)), (x_2, \log(y_2)), \ldots, (x_n, \log(y_n))$
We get estimates for $\log(a)$ and $b$, and thus $a$ and $b$
[Figure: the same data with the fitted exponential curve]

Another example of transformed variables
Another natural model may be $y = ax^b$
We get that $\log(y) = \log(a) + b\log(x)$
Use standard formulas on the pairs $(\log(x_1), \log(y_1)), (\log(x_2), \log(y_2)), \ldots, (\log(x_n), \log(y_n))$
[Figure: data with a fitted power curve]
Note: In this model, the curve goes through (0,0) (when $b > 0$)
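The log transformation for the exponential model can be sketched in plain Python by reusing the simple least squares formulas on the pairs $(x_i, \log(y_i))$. The function names and the noise-free data generated with a = 0.5 and b = 0.3 are illustrative assumptions; the approach requires all $y_i > 0$.

```python
import math


def fit_line(xs, ys):
    """Least squares intercept and slope for y = intercept + slope*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return intercept, slope


def fit_exponential(xs, ys):
    """Fit y = a*exp(b*x) by linear regression of log(y) on x."""
    log_a, b = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(log_a), b


# Noise-free illustration data generated from a = 0.5, b = 0.3
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.5 * math.exp(0.3 * x) for x in xs]
a, b = fit_exponential(xs, ys)
print(a, b)
```

Note that this minimizes squared errors on the log scale, so with noisy data the estimates generally differ from a direct nonlinear least squares fit of the curve. For the power model $y = ax^b$ the same idea applies with $\log(x_i)$ in place of $x_i$.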

A third example:
Assume data $(x_1, y_1), \ldots, (x_n, y_n)$ seem to follow a third degree polynomial
We use multivariate regression on $(x_1, x_1^2, x_1^3, y_1), (x_2, x_2^2, x_2^3, y_2), \ldots$
We get estimated $a, b, c, d$ in a third degree polynomial curve $y = a + bx + cx^2 + dx^3$
[Figure: scatter plot of data with a fitted third degree polynomial curve]

Doing a regression analysis
Plot the data first, to investigate whether there is a natural relationship
Linear or transformed model?
Are there outliers which will unduly affect the result?
Fit a model. Different models with the same number of parameters may be compared with $R^2$
Check the assumptions!
Make tests / confidence intervals for parameters
A lot of practice is needed!

Conclusion and further options
Regression versus correlation:
Can include more independent variables in regression
Gets a more detailed picture of the effect an independent variable has on the dependent variable
What if the dependent variable only has two possible values? Logistic regression
Similar to linear regression
But the interpretations of the β's are different: they are interpreted as odds-ratios instead of the slope of a line
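The polynomial example is an ordinary multiple regression in which the three explanatory variables are $x$, $x^2$ and $x^3$. A minimal NumPy sketch, using noise-free synthetic data generated from the made-up coefficients a = 1.0, b = -2.0, c = 0.5, d = 0.25:

```python
import numpy as np

# Noise-free illustration data from y = 1 - 2x + 0.5x^2 + 0.25x^3
x = np.linspace(-3.0, 3.0, 25)
y = 1.0 - 2.0 * x + 0.5 * x ** 2 + 0.25 * x ** 3

# Columns: intercept, x, x^2, x^3 (transformations of the same variable)
design = np.column_stack([np.ones_like(x), x, x ** 2, x ** 3])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
print(coef)  # estimated a, b, c, d
```

The model is still linear in the parameters a, b, c, d, which is why the standard least squares machinery applies even though the fitted curve is not a straight line.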