Inference in Simple Regression

Section 3: Inference in Simple Regression

Having derived the probability distribution of the OLS coefficients under assumptions SR1–SR5, we are now in a position to make inferential statements about the population parameters: hypothesis tests and confidence intervals.

(Confidence) Interval estimators

What a confidence interval means: the confidence limits are random variables; the parameter is not. Under the assumptions we have made about the model, 95% of the time our (random) confidence interval will include the actual parameter value.

A confidence interval is a pair of random variables $l$ and $u$ such that $\Pr(l \le \beta_2 \le u) = 0.95$, or another specified confidence level.

Under our assumptions, $b_2 \sim N\!\left(\beta_2,\ \dfrac{\sigma^2}{\sum_i (x_i - \bar{x})^2}\right)$.
- This is true always if SR6 (normality) is satisfied.
- It is true asymptotically if SR6 is not valid but the actual distribution of $e$ has finite fourth moments.

If we know $\sigma^2$, then we can convert $b_2$ to a standard normal variable by subtracting its expected value and dividing by its standard deviation:

$$Z = \frac{b_2 - \beta_2}{\sigma \big/ \sqrt{\sum_i (x_i - \bar{x})^2}} \sim N(0, 1).$$

From the properties of the standard normal, we know that $\Pr(-1.96 \le Z \le 1.96) = 0.95$. With a little algebra:

$$\Pr\!\left(-1.96 \le \frac{b_2 - \beta_2}{\sigma / \sqrt{\sum_i (x_i - \bar{x})^2}} \le 1.96\right) = 0.95,$$

$$\Pr\!\left(b_2 - 1.96\,\frac{\sigma}{\sqrt{\sum_i (x_i - \bar{x})^2}} \le \beta_2 \le b_2 + 1.96\,\frac{\sigma}{\sqrt{\sum_i (x_i - \bar{x})^2}}\right) = 0.95.$$

This is the 95% interval estimate (usually called a confidence interval) for $\beta_2$.
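A purely hypothetical numerical illustration of the last line (the numbers are invented for arithmetic practice, not taken from any dataset in these notes): if $b_2 = 10.2$ and $\sigma\big/\sqrt{\sum_i (x_i - \bar{x})^2} = 2.1$, then

$$10.2 \pm 1.96 \times 2.1 = [10.2 - 4.116,\ 10.2 + 4.116] = [6.08,\ 14.32].$$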

In practice we can't use the formula above, because we do not know $\sigma$. If we replace $\sigma$ by $s$, then the standardized coefficient follows a $t$ distribution with $N - 2$ degrees of freedom rather than the normal distribution:

$$t = \frac{b_2 - \beta_2}{\mathrm{s.e.}(b_2)} = \frac{b_2 - \beta_2}{s \big/ \sqrt{\sum_i (x_i - \bar{x})^2}} \sim t_{N-2}.$$

If $t_c$ is the 5% two-tailed critical value for this $t$ distribution, then

$$\Pr\!\left(b_2 - t_c\,\mathrm{s.e.}(b_2) \le \beta_2 \le b_2 + t_c\,\mathrm{s.e.}(b_2)\right) = 0.95.$$

Explain how to find the critical value, both theoretically and in the tables. Stata (and some other packages) prints out these confidence limits based on assumptions SR1–SR5.

Hypothesis tests about single coefficients

The most common test in econometrics is the t-test of the hypothesis that a single coefficient equals zero. This test is printed out for each regression coefficient in Stata and other statistical packages. Depending on the assumptions of the model (and whether they are valid), the t-statistic may or may not follow Student's t distribution.

The general form for calculating a t-statistic is

$$t = \frac{b_k - c}{\mathrm{s.e.}(b_k)},$$

where $c$ is the hypothetical value (usually zero) that we are testing against and $\mathrm{s.e.}(b_k)$ is the standard error of the coefficient estimator $b_k$. This test statistic is useful because we know its distribution under the null hypothesis that $\beta_k = c$. Thus we can determine how likely or unlikely it is that we would observe the current sample if the null hypothesis is true. This allows us to control the Type I error at significance level $\alpha$.

Using the t-statistic to test $H_0\!: \beta_k = c$ against the two-sided alternative $H_1\!: \beta_k \ne c$:
- Note that the hypothesis to be tested is always expressed in terms of the actual coefficient, not the estimated one.
- Use the formula above to calculate the t statistic. (Stata will print out $b_k$ and its standard error, and also the t value corresponding to $c = 0$.)
- If the absolute value of the calculated t value is greater than the critical value, then reject the null.
- Do an example, including looking up the critical value.
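A minimal Stata sketch of these calculations, assuming a loaded dataset with hypothetical variables y and x (the variable names are placeholders, not from the notes):

```stata
* Confidence limits for the slope (hypothetical variables y and x)
regress y x                                        // prints b2, s.e.(b2), t, p, and the 95% CI
display invttail(e(df_r), 0.025)                   // two-tailed 5% critical value t_c
display _b[x] - invttail(e(df_r), 0.025)*_se[x]    // lower 95% confidence limit
display _b[x] + invttail(e(df_r), 0.025)*_se[x]    // upper 95% confidence limit
regress y x, level(99)                             // same regression, 99% intervals
```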

Alternatively, we can compute the probability (p) value associated with the test: the probability that an outcome at least this inconsistent with the null hypothesis would occur if the null is indeed true.

$$p = \Pr_{H_0}\!\left(\left|\frac{b_k - c}{\mathrm{s.e.}(b_k)}\right| \ge \left|\frac{b_k^{act} - c}{\mathrm{s.e.}(b_k)}\right|\right) = \Pr_{H_0}\!\left(|t| \ge |t^{act}|\right),$$

where "act" refers to the actual observed/calculated value.

If we know the distribution of the t statistic, then we can calculate the last probability from tables.
- Under assumptions SR1–SR5, the t statistic will be asymptotically normal.
- With SR6 it is t with $N - 2$ degrees of freedom in small samples.
- Stata calculates the p value associated with the null hypothesis $\beta_k = 0$ using the t distribution.

Show the diagram corresponding to HGL's Figure 3:
- For a given t, show how to calculate the p value.
- On the same diagram, show the critical values for a test at a given level of significance, and how to decide the result of the test. Note 1.96 as the two-tailed 5% critical value for the normal distribution.
- Then show the symmetry: the p value is the smallest significance level at which the null hypothesis can be rejected.

One-tailed tests, such as $H_0\!: \beta_k = c$ against $H_1\!: \beta_k < c$ (or $H_0\!: \beta_k \ge c$):
- Same basic procedure, but in this case we concentrate the entire rejection region in one tail of the distribution.
- We reject the null if and only if $\Pr[t < t^{act}]$ is less than the significance level (ignoring the right tail of the distribution), and we fail to reject for any positive t value, no matter how large.
- For the other direction, $H_1\!: \beta_k > c$: fail to reject the null for any negative value of t, and reject when $\Pr[t > t^{act}]$ is less than the significance level.

Present some examples of regressions and practice with tests of $\beta = 0$ and of $\beta =$ other values. A good (multiple regression) example with lots of different significance levels: reg gpoints rdr satv satm taking if freshman. Can do just taking to get an almost-significant example for simple regression.
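A sketch of these p-value calculations in Stata, again with hypothetical variables y and x and a hypothetical null value c = 1:

```stata
* p values for H0: beta2 = 1 (hypothetical variables and null value)
regress y x
scalar t_act = (_b[x] - 1)/_se[x]           // t statistic against c = 1
display 2*ttail(e(df_r), abs(t_act))        // two-sided p value
display ttail(e(df_r), t_act)               // one-sided p value for H1: beta2 > 1
display 1 - ttail(e(df_r), t_act)           // one-sided p value for H1: beta2 < 1
```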

Testing linear combinations of parameters

What if we want an interval estimator or hypothesis test for the value of y when $x = x_0$?
- This would be an estimator for $\beta_1 + \beta_2 x_0$. The natural estimator is $b_1 + b_2 x_0$.
- What is the distribution of $b_1 + b_2 x_0$?
  - It is a linear function of $b_1$ and $b_2$, so it is normal (or t) if they are.
  - It is unbiased (under our assumptions): $E(b_1 + b_2 x_0) = E(b_1) + x_0 E(b_2) = \beta_1 + \beta_2 x_0$.
  - $\mathrm{var}(b_1 + b_2 x_0) = \mathrm{var}(b_1) + x_0^2\,\mathrm{var}(b_2) + 2 x_0\,\mathrm{cov}(b_1, b_2)$.
- We can approximate these variances and covariance by their sample estimators, and use the result to calculate a t statistic.
- We can also do hypothesis tests of such linear combinations of coefficients.
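In Stata this can be done with lincom; a sketch, where y, x, and the evaluation point x0 = 10 are all hypothetical:

```stata
* Interval estimate and test for beta1 + beta2*x0 with x0 = 10 (hypothetical)
regress y x
lincom _cons + 10*x        // point estimate, standard error, t, and confidence interval
test _cons + 10*x = 0      // F test of the same linear restriction
```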

Prediction in the simple regression model

One of the most common tasks for which we use econometrics is conditional prediction, or forecasting. We want to answer the question: what would y be if x were some value $x_0$? This is exactly the same problem we discussed above in estimating the distribution of $b_1 + b_2 x_0$, which is the OLS prediction of y for $x = x_0$.

OLS prediction: $\hat{y}_0 = b_1 + b_2 x_0$.
- Because $E(e_0) = 0$, we predict the error term to be zero. We might not do that if we had information about the error term corresponding to our prediction.
- Note that we are assuming $x_0$ to be given. A secondary prediction problem occurs if we must also predict $x_0$.

The forecast error (prediction error) is $f = \hat{y}_0 - y_0$:

$$f = (b_1 + b_2 x_0) - (\beta_1 + \beta_2 x_0 + e_0) = (b_1 - \beta_1) + (b_2 - \beta_2)x_0 - e_0.$$

- $E(f) = 0$ because the OLS coefficients are unbiased, so the OLS predictor is unbiased.
- The OLS predictor is BLUP (best linear unbiased predictor), based on the BLUE OLS estimator.

What is the variance of $\hat{y}_0$ or, equivalently, the variance of f?

$$\mathrm{var}(f) = \mathrm{var}(\hat{y}_0 - y_0) = \mathrm{var}(b_1) + x_0^2\,\mathrm{var}(b_2) + 2x_0\,\mathrm{cov}(b_1, b_2) + \mathrm{var}(e_0).$$

For simple regression under homoskedasticity, these terms combine to

$$\widehat{\mathrm{var}}(f) = \hat{\sigma}^2\left[1 + \frac{1}{N} + \frac{(x_0 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}\right].$$

Prediction error is smaller for:
- Smaller error variance
- Larger sample size (through both the second and third terms)
- Greater sample variation in x
- Observations $x_0$ closer to the mean

With SR6 (normality), or asymptotically under more general assumptions, $f \sim N(0, \mathrm{var}(f))$, because f is a linear function of normal variables with mean zero.

We usually don't know $\sigma^2$, so we must replace it with $s^2$. This makes the distribution t rather than normal. The interval estimate for $y_0$ is

$$\Pr\!\left(\hat{y}_0 - t_c\,\mathrm{s.e.}(f) \le y_0 \le \hat{y}_0 + t_c\,\mathrm{s.e.}(f)\right) = 1 - \alpha,$$

where $t_c$ is the $\alpha/2$ critical value of the t distribution.
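A Stata sketch of point and interval prediction (hypothetical variables; stdf asks predict for the standard error of the forecast):

```stata
* Point prediction and 95% prediction interval at each sample x (hypothetical y, x)
regress y x
predict yhat, xb                                   // yhat = b1 + b2*x
predict sef, stdf                                  // standard error of the forecast, s.e.(f)
generate lo = yhat - invttail(e(df_r), 0.025)*sef  // lower prediction limit
generate hi = yhat + invttail(e(df_r), 0.025)*sef  // upper prediction limit
```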

Measuring goodness of fit

It is always of interest to measure how well our regression line fits the data. There are several measures that are commonly reported.
- Sum of squares due to errors: $SSE = \sum_i \hat{e}_i^2 = \sum_i (y_i - \hat{y}_i)^2$.
- Total sum of squares: $SST = \sum_i (y_i - \bar{y})^2$.
- Sum of squares due to the regression: $SSR = \sum_i (\hat{y}_i - \bar{y})^2$, with $\hat{y}_i = b_1 + b_2 x_i$.
- Warning about notation: some books use SSR for "sum of squared residuals" and SSE to mean "sum of squares explained."

Fundamental regression identity: SST = SSR + SSE. This works because of the enforced orthogonality of $\hat{y}$ and $\hat{e}$. See Appendix 4B.

Standard error of the estimate (regression): this is our estimate of the standard deviation of the error term,

$$s_{\hat{e}} = \sqrt{\frac{\sum_i \hat{e}_i^2}{N - 2}} = \sqrt{\frac{SSE}{N - 2}} = SEE.$$

The standard error of the regression is often (as in Stata) called the root mean squared error, or RMSE.

Coefficient of determination: $R^2$

The $R^2$ coefficient measures the fraction of the variance in the dependent variable that is explained by the covariation with the regressor. It has a range of [0, 1], with $R^2 = 0$ meaning no linear relationship and $R^2 = 1$ meaning a perfect linear fit.

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} \approx 1 - \frac{s_{\hat{e}}^2}{s_y^2}.$$

$R^2$ is also the square of the sample correlation coefficient between $y$ and $\hat{y}$.
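These fit measures are all stored by regress in Stata; a sketch with hypothetical variables:

```stata
* Fit measures stored by -regress- (hypothetical variables y and x)
regress y x
display e(r2)      // R-squared = SSR/SST
display e(rmse)    // standard error of the regression (SEE / root MSE)
display e(mss)     // model (explained) sum of squares: SSR in these notes
display e(rss)     // residual sum of squares: SSE in these notes
```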

Specification issues

Scaling: does it matter how we scale the x and y variables?
- If we add or subtract a constant from either x or y, all that is affected is the intercept term $b_1$. Since we are not usually very interested in the value of the intercept, this change is usually inconsequential.
- If we multiply x by a constant, the slope estimate $b_2$ will be divided by the same constant, as will its standard error, leaving the t statistic unchanged. The estimated intercept is unchanged, as are the residuals, the SEE, and $R^2$.
- If we multiply y by a constant, the slope and intercept estimates will both be multiplied by the same constant, as will their standard errors (leaving the t statistics unchanged) and the SEE.
- None of these transformations has any effect on $R^2$.
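A quick Stata demonstration of the rescaling result (hypothetical variables; x_thousands is an invented name):

```stata
* Rescaling the regressor: slope and s.e. scale inversely; t and R-squared do not change
regress y x
generate x_thousands = x/1000   // measure x in thousands (hypothetical rescaling)
regress y x_thousands           // slope and s.e. are 1000 times larger; t, R2, residuals unchanged
```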

Nonlinear models
- We can easily replace either x or y with powers or logs of the original variables without complicating the estimation.
- What changes is the shape of the relationship between the levels of x and y, and the interpretation of the coefficients.
- See HGL Figure 4.5 and Table 4.

Log-based models
- Many econometric models are specified in log terms.
- Most economic variables are non-negative, so we don't need to worry about taking the log of negative values. (Though many variables can be zero.)
- $d\ln x = dx/x$ = the proportional (percentage) change in x, so the interpretation of coefficients and effects is useful and easy.
- The log-log model is a constant-elasticity specification, with the coefficient being read directly as an elasticity.

Shape of log functions is often reasonable:
- Shies away from the axes
- Monotonic with diminishing returns

Log of regressor only (the "linear-log" model): $y = \beta_1 + \beta_2 \ln x + e$.
- A change of 1% in x changes $\ln x$ by about 0.01, and thus leads to about a $0.01\beta_2$ unit absolute change in y.
- If x increases by z%, it is $(1 + z/100)$ times as large, which means that its log is $\ln x + \ln(1 + z/100)$. If z is small, then the approximation $\ln(1 + z/100) \approx z/100$ is reasonably close. However, you may want to do exact calculations for formal work.
- The partial effect in levels is $\partial y/\partial x = \beta_2/x$: y is monotonically increasing or decreasing in x (depending on the sign of $\beta_2$), but the slope goes to zero as x gets large.

Log of dependent variable only (the "log-linear" model): $\ln y = \beta_1 + \beta_2 x + e$.
- Note that $y = e^{\beta_1 + \beta_2 x + e}$, so this is clearly a different error term than when y is not in log terms.
- A change of z units in x changes $\ln y$ by $\beta_2 z$ units, so it changes y by about $100\,\beta_2 z$ percent. The same approximation issue applies here: the increase of $\beta_2 z$ units in $\ln y$ means that y increases by a factor of $e^{\beta_2 z}$, which is approximately $1 + \beta_2 z$ for small values of $\beta_2 z$. For larger values, and for more formal work, it is best to calculate the exponential directly.
- The partial effect in levels is $\partial y/\partial x = \dfrac{\partial \ln y}{\partial x}\,y = \beta_2 y$. Alternatively, $\partial y/\partial x = \beta_2 e^{\beta_1 + \beta_2 x + e} = \beta_2 y$. The partial effect is increasing in absolute value as y increases. (Note that y must always be positive in this model.)

Log of both regressor and dependent variable (the "log-log" model): $\ln y = \beta_1 + \beta_2 \ln x + e$.
- This also implies $y = e^{\beta_1} e^{\beta_2 \ln x} e^{e} = e^{\beta_1} x^{\beta_2} v$, where $v = e^{e}$. The Cobb-Douglas function takes this form (with a multiplicative error v, usually assumed to be log-normally distributed).
- A change of 1% in x changes $\ln x$ by about 0.01, which changes $\ln y$ by about $0.01\beta_2$, which changes y by about $\beta_2$ percent. (Both of the approximation caveats above apply here.)

- Thus, $\beta_2$ is the point elasticity of y with respect to x. This makes log-log a popular functional form.
- The partial effect in levels is $\dfrac{\partial y}{\partial x} = \dfrac{\partial \ln y}{\partial \ln x}\,\dfrac{y}{x} = \beta_2\,\dfrac{y}{x}$. Alternatively, $\dfrac{\partial y}{\partial x} = \beta_2 e^{\beta_1} x^{\beta_2 - 1} e^{e} = \beta_2\,\dfrac{y}{x}$. The partial effect is constant in elasticity terms, but varies with y and x in level terms.

Which log model to choose? Theory vs. "let the data decide"?
- Theory may suggest that percentage changes are more important than absolute changes for one or both variables. Income is often logged if we think that a doubling of income from $50,000 to $100,000 would be associated with the same change in other variables as a doubling from $100,000 to $200,000 (rather than half as much).
- As suggested by the previous example, logging a variable scales down extreme values. If most of the sample variation is in incomes of a few tens of thousands of dollars (with mean $50,000 and standard deviation $30,000), but you have a few income values of $500,000, these are going to be 15 standard deviations above the mean in level terms, but much less in log terms. The log of 500,000 is only ln(10) = 2.3 units larger than the log of 50,000. The standard deviation of the log would probably be in the range of 0.6 or so, so the highly deviant observations would be less than 4 standard deviations above the mean instead of 15.
- Since we often want our variables to be normally distributed, we might try to decide whether the variable is more likely to be normally or lognormally distributed.

[Figure: the log-normal distribution]
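A sketch of how the three log specifications above would be estimated in Stata (y and x are hypothetical, strictly positive variables):

```stata
* The three log specifications (hypothetical positive variables y and x)
generate lnx = ln(x)
generate lny = ln(y)
regress y lnx      // linear-log: a 1% change in x raises y by about _b[lnx]/100 units
regress lny x      // log-linear: a one-unit change in x raises y by about 100*_b[x] percent
regress lny lnx    // log-log: _b[lnx] is the (constant) elasticity of y with respect to x
```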

Note that the various log models are not nested with one another or with the linear or polynomial models, so t tests cannot discriminate between them.

We can use $R^2$ to compare models only if the dependent variable is the same:
- Linear model with linear-log model
- Log-linear model with log-log model

The Box-Cox (B-C) model nests log and linear terms for both the dependent and independent variables in a nonlinear model. We can estimate the B-C model and test the hypothesis that the relationship is linear or log. The B-C transformation is

$$B(x, \lambda) = \begin{cases} \dfrac{x^{\lambda} - 1}{\lambda}, & \text{if } \lambda \ne 0, \\[1ex] \ln x, & \text{if } \lambda = 0. \end{cases}$$

This is a continuous function that equals $x - 1$ if $\lambda = 1$ and $\ln x$ if $\lambda = 0$. We can do a nonlinear regression of $B(y, \lambda_y)$ on $B(x, \lambda_x)$ and test the two $\lambda$ values to see whether they are zero or one, to determine whether a linear or log specification is preferred for both variables.
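Stata has a built-in boxcox command that does essentially this; a minimal sketch, with hypothetical variables that must both be strictly positive:

```stata
* Box-Cox estimation sketch: model(theta) allows different transformation
* parameters on the left- and right-hand sides; lrtest requests likelihood-ratio
* tests of the standard restricted values of the transformation parameters.
boxcox y x, model(theta) lrtest
```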

Prediction in log and other y-transformed models

If we run the regression $g(y) = \beta_1 + \beta_2 x + e$, how would we predict y? This problem usually arises when $g(y) = \ln y$.

We can predict $\ln y$ by $\widehat{\ln y} = b_1 + b_2 x$, and $E(\widehat{\ln y}) = E(\ln y)$. But

$$E(y) = E\!\left(e^{\ln y}\right) = E\!\left(e^{\beta_1 + \beta_2 x}\, e^{e}\right) = e^{\beta_1 + \beta_2 x}\, E\!\left(e^{e}\right).$$

The problem is that even if $E(e) = 0$, $E(e^{e}) \ne 1$.
- If e is normally distributed with variance $\sigma_e^2$, then $E(e^{e}) = e^{\sigma_e^2/2}$. In that case, we can predict y by $\hat{y}_c = e^{\widehat{\ln y}}\, e^{\hat{\sigma}_e^2/2}$. This is a consistent prediction if the error term is normal.
- In the non-normal case, we can use a simple regression to calculate the appropriate adjustment factor: run a regression of $y$ on $e^{\widehat{\ln y}}$, which is a bivariate regression without a constant term. Then adjust the predictions by multiplying $e^{\widehat{\ln y}}$ by the estimated slope from this auxiliary regression; for the sample observations, the adjusted predictions $\hat{y}_c$ are just the fitted values from the auxiliary regression.
- We can't do a convenient interval predictor, because $\hat{y}_c$ is not normal or t distributed.
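A sketch of both corrections in Stata, assuming the dependent variable has already been logged into a (hypothetical) variable lny:

```stata
* Predicting y from a regression of ln(y) on x (hypothetical variables y, lny, x)
regress lny x
predict lnyhat, xb                                 // predicted ln(y)
generate yhat_n = exp(lnyhat)                      // "natural" predictor (biased low)
generate yhat_c = exp(lnyhat)*exp(e(rmse)^2/2)     // normal-error correction exp(s_e^2/2)
regress y yhat_n, noconstant                       // auxiliary regression for the general case
predict yhat_c2, xb                                // fitted values = corrected predictions
```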

Using residuals

All regression software will have a way to scatter-plot the actual and fitted values, or the residuals, against another variable (x is often most useful). Don't put the residual plot and the actual/fitted plot on the same diagram, because of scaling.

Residuals tell you what you are missing in your regression:
- Functional form: if there is obvious curvature in the actual vs. fitted values, you may need a nonlinear form.
- Heteroskedasticity: if the variance of the residuals seems to be related to x or another variable, then you may need to correct for it. How would you tell this from a residual plot?
- Outliers: are there specific observations that are far from the normal pattern? If so, they may indicate that one or more observations do not follow the same model (violating one of the model assumptions). Or they may suggest an additional explanatory variable that affected y in those observations.
- Are the residuals normally distributed? If the error term is normal, then the residuals should be.

Jarque-Bera test:

$$JB = \frac{N}{6}\left[S^2 + \frac{(K - 3)^2}{4}\right],$$

where S is the sample skewness and K the sample kurtosis. It tests whether the skewness and kurtosis of the variable match the zero and three expected in a normal distribution. There are other normality tests as well.
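A sketch of these residual diagnostics in Stata (hypothetical variables; sktest is Stata's own skewness-kurtosis normality test, while the two display lines compute the JB statistic by hand):

```stata
* Residual diagnostics after a simple regression (hypothetical variables y and x)
regress y x
rvfplot                        // residuals vs. fitted: look for curvature, fanning, outliers
predict ehat, residuals
histogram ehat, normal         // residual distribution with a normal overlay
summarize ehat, detail         // returns r(N), r(skewness), r(kurtosis)
display (r(N)/6)*(r(skewness)^2 + (r(kurtosis)-3)^2/4)               // Jarque-Bera statistic
display chi2tail(2, (r(N)/6)*(r(skewness)^2 + (r(kurtosis)-3)^2/4))  // asymptotic p value
sktest ehat                    // built-in skewness/kurtosis normality test
```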