Inference in the Multiple-Regression Model


Section 5: Inference in the Multiple-Regression Model

Kinds of hypothesis tests in a multiple regression

There are several distinct kinds of hypothesis tests we can run in a multiple regression. Suppose that among the regressors in a Reed Econ 201 grade regression are variables for SAT-math and SAT-verbal:

$$g_i = \beta_1 + \beta_2 SATM_i + \beta_3 SATV_i + e_i$$

- We might want to know if math SAT matters for Econ 201: $H_0: \beta_2 = 0$. Would it make sense for this to be a one-tailed or a two-tailed test? Is it plausible that a higher math SAT would lower Econ 201 performance? Probably a one-tailed test makes more sense.

- We might want to know if either SAT matters. This is a joint test of two simultaneous hypotheses: $H_0: \beta_2 = 0,\ \beta_3 = 0$. The alternative hypothesis is that one or both parts of the null hypothesis fail to hold. If $\beta_2 = 0$ but $\beta_3 \neq 0$, then the null is false and we want to reject it. (A simulation sketch after this list illustrates the joint-versus-individual distinction.)

  The joint test is not the same as separate individual tests on the two coefficients. In general, the two variables are correlated, which means that their coefficient estimators are also correlated. That means that eliminating one of the variables from the equation affects the significance of the other. The joint test asks whether we can delete both variables at once, rather than whether we can delete one variable given that the other is in (or out of) the equation.

  A common example is the situation where the two variables are highly and positively correlated (imperfect but high multicollinearity). In this case, OLS may be able to discern that the two variables are collectively very important, but not which variable it is that is important. Thus, the individual tests of $\beta_2 = 0$ and $\beta_3 = 0$ may fail to reject. (OLS cannot tell for sure that either coefficient is non-zero.) However, the joint null would be strongly rejected. Here, the strong positive correlation between the variables leads to a strong negative correlation between the coefficient estimators. Assuming that the joint effect is positive, leaving one coefficient out (setting it to zero and therefore decreasing it) increases the value of the other. In the case of joint hypotheses, we always use two-tailed tests.

- We might also want to know if the effect of the two scores is the same. The null hypothesis in this case is $H_0: \beta_2 = \beta_3$ against a two-tailed alternative. Note that if this null hypothesis is true, then the model can be written as $g_i = \beta_1 + \beta_2 (SATM_i + SATV_i) + e_i$, and we can use the SAT composite rather than the two separate scores, saving one degree of freedom.
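A minimal simulation sketch (Python with numpy/statsmodels rather than the Stata used in this course; the data and variable names are made up) illustrates the joint-versus-individual distinction:

```python
# Two nearly collinear regressors: individually weak t's, jointly strong F.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 50
satm = rng.normal(size=n)
satv = satm + 0.05 * rng.normal(size=n)   # nearly collinear with satm
g = 1.0 + 0.5 * satm + 0.5 * satv + rng.normal(size=n)

X = sm.add_constant(np.column_stack([satm, satv]))
res = sm.OLS(g, X).fit()
print(res.tvalues[1:])          # individual t statistics: typically insignificant
R = np.array([[0.0, 1.0, 0.0],  # H0: beta2 = 0 and beta3 = 0, one row per restriction
              [0.0, 0.0, 1.0]])
print(res.f_test(R))            # joint F test: rejects decisively
```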

Hypothesis tests on a single coefficient

Hypothesis testing for a single coefficient is identical to the bivariate regression case. The test statistic for $H_0: \beta_k = c$ is

$$t^{act} = \frac{b_k - c}{s.e.(b_k)}.$$

- It is asymptotically N(0, 1) under assumptions MR1–MR5.
- It is distributed as t with N − K degrees of freedom if e is normal.
- Two-tailed test: reject the null of $\beta_k = c$ if p-value $= 2\Phi\left(-|t^{act}|\right) < \alpha$, the chosen level of significance (using the asymptotic normal distribution), or reject if $|t^{act}| > t_{\alpha/2}$ (using the small-sample distribution under the normality assumption).
- Note that Stata uses the t distribution to calculate p-values, not the normal. Which is better? Both are flawed in small samples:
  - The normal is off because the sample is not large enough for convergence to have occurred.
  - The t is off because if the true distribution of e is not normal, then we don't know the small-sample distribution (t → normal as the sample gets large).

Single-coefficient confidence intervals are also identical to the bivariate case. Using the asymptotic (normal) distribution,

$$\Pr\left[\beta_k \in \left(b_k - z_{\alpha/2}\, s.e.(b_k),\ b_k + z_{\alpha/2}\, s.e.(b_k)\right)\right] = 1 - \alpha.$$

If we use the t distribution, all we change is drawing the critical value from the t distribution rather than the normal. Again, Stata uses classical standard errors and the t distribution by default.

Simple hypotheses involving multiple coefficients

Suppose that we want to test the hypothesis $\beta_2 = \beta_3$, or $\beta_2 - \beta_3 = 0$. We can use a t test for this. The estimator of $\beta_2 - \beta_3$ is $b_2 - b_3$, which has variance

$$\operatorname{var}(b_2 - b_3) = \operatorname{var}(b_2) + \operatorname{var}(b_3) - 2\operatorname{cov}(b_2, b_3).$$

The standard error of $b_2 - b_3$ is the square root of the estimated variance, which can be calculated from the estimated covariance matrix of the coefficient vector. The test statistic is

$$t = \frac{(b_2 - b_3) - 0}{s.e.(b_2 - b_3)}.$$

It has the usual distributions: either $t_{N-K}$ or (asymptotically) standard normal.
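A sketch of this calculation (again Python with hypothetical data; the point is the variance arithmetic from the covariance matrix):

```python
# t test of H0: beta2 = beta3, built from the estimated covariance matrix of b.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 50
satm = rng.normal(size=n)
satv = satm + 0.05 * rng.normal(size=n)
g = 1.0 + 0.5 * satm + 0.5 * satv + rng.normal(size=n)
res = sm.OLS(g, sm.add_constant(np.column_stack([satm, satv]))).fit()

b, V = res.params, res.cov_params()    # coefficients and their covariance matrix
se_diff = np.sqrt(V[1, 1] + V[2, 2] - 2 * V[1, 2])
t_act = (b[1] - b[2]) / se_diff
print(t_act)
print(res.t_test([0, 1, -1]))          # the same test computed by statsmodels
```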

Testing joint hypotheses

It is often useful to test joint hypotheses together. This differs from independent tests of the individual coefficients. An example is the joint test that math and verbal SAT scores have no effect on Econ 201 grades against the alternative that one or both scores has an effect.

Some new probability distributions. Tests of joint hypotheses have test statistics that are distributed according to either the F or the χ² distribution. These tests are often called Wald tests and may be quoted either as F or as χ² statistics. (J times the F converges to a χ² asymptotically, so the χ² is more often used for asymptotic cases and the F under the right assumptions in small samples.)

Just as the t distribution varies with the number of degrees of freedom ($t_{N-K}$), the F distribution has two degree-of-freedom parameters: one is the number of restrictions being tested (J) and one is the number of degrees of freedom in the unrestricted model (N − K). The former is often called the numerator degrees of freedom and the latter the denominator degrees of freedom, for reasons we shall see soon.

When there is only one numerator degree of freedom, we are testing only a single hypothesis, and it seems like this should be equivalent to the usual t test. Indeed, if a random variable t follows the $t_{N-K}$ distribution, then its square $t^2$ follows the $F(1, N-K)$ distribution. Since squaring the t statistic obliterates its sign, we lose the option of the one-tailed test when using the F distribution.

Similarly, if z follows a standard normal distribution, then $z^2$ follows a χ² distribution with one degree of freedom. Finally, as the number of denominator degrees of freedom goes to infinity, if a random variable F follows the $F(J, N-K)$ distribution, then JF converges in distribution to a χ² with J degrees of freedom.

Both the F and χ² distributions assign positive probability only to positive values. (Both involve squared values.) Both are humped with long tails on the right, which is where our rejection region lies. The mean of the F distribution is always approximately 1 (exactly $(N-K)/(N-K-2)$). The mean of the χ² distribution is J, its number of degrees of freedom.
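These distributional links are easy to verify numerically (a sketch using scipy's distribution functions; the specific degrees of freedom are arbitrary):

```python
# Check: t^2 vs F(1, dof), z^2 vs chi2(1), and J*F(J, dof) -> chi2(J) as dof grows.
from scipy import stats

dof = 30
t_crit = stats.t.ppf(0.975, dof)              # two-tailed 5% t critical value
print(t_crit**2, stats.f.ppf(0.95, 1, dof))   # equal: t^2 ~ F(1, dof)

z_crit = stats.norm.ppf(0.975)
print(z_crit**2, stats.chi2.ppf(0.95, 1))     # equal: z^2 ~ chi2(1)

J = 3
print(J * stats.f.ppf(0.95, J, 10_000),       # nearly equal for large denominator dof
      stats.chi2.ppf(0.95, J))
```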

General case in matrix notation

Suppose that there are J linear restrictions in the joint null hypothesis. These can be written as a system of linear equations $R\beta = r$, where R is a J × K matrix and r is a J × 1 vector. Each restriction is expressed in one row of this system of equations. For example, the two restrictions $\beta_2 = 0$ and $\beta_3 = 0$ would be expressed in this general matrix notation as

$$\begin{pmatrix} 0 & 1 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 1 & 0 & \cdots & 0 \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \\ \vdots \\ \beta_K \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$

The test statistic is

$$F = \frac{1}{J} \left(Rb - r\right)' \left[ R \hat{\Sigma}_b R' \right]^{-1} \left(Rb - r\right),$$

with $\hat{\Sigma}_b$ equal to the estimated covariance matrix of the coefficient vector. Under the OLS assumptions MR1–MR6, this is distributed as an $F(J, N-K)$. Multiplying the test statistic by J (eliminating the fraction in front) gives a variable that is asymptotically distributed as $\chi^2_J$, so the Wald test can be done either way.

If the restrictions implied by the null hypothesis are perfectly consistent with the data, then the model fits equally well with and without the restrictions, $Rb - r = 0$ holds exactly, and the F statistic is zero. This, obviously, implies acceptance of the null. We reject the null when the (always positive) F statistic is larger than the critical value. The Stata test command gives you a p-value, which is the smallest significance level at which you can reject the null. The same rejection conditions apply if the χ² distribution is used: reject if the test statistic exceeds the critical value (or if the p-value is less than the level of significance).
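The Wald formula can be computed by hand and checked against a canned routine (a Python sketch with hypothetical data; statsmodels' f_test performs the same matrix arithmetic, using the classical covariance matrix by default):

```python
# Matrix Wald F for H0: beta2 = beta3 = 0, computed directly from R, r, b, Sigma_b.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 50
x2, x3 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.5 * x2 + rng.normal(size=n)
res = sm.OLS(y, sm.add_constant(np.column_stack([x2, x3]))).fit()

R = np.array([[0.0, 1.0, 0.0],      # J x K restriction matrix
              [0.0, 0.0, 1.0]])
r = np.zeros(2)                     # J x 1 target vector
d = R @ res.params - r
F = d @ np.linalg.inv(R @ res.cov_params() @ R.T) @ d / len(r)
print(F, res.f_test(R).fvalue)      # the two calculations agree
```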

Alternative calculation of F under classical assumptions

If the classical homoskedastic-error assumption holds, then we can calculate the F statistic by another, equivalent formula that has intuitive appeal. To do this, we run the regression with and without the restrictions (for example, leaving out of the restricted regression those variables whose coefficients are zero under the restrictions). Then we calculate F as

$$F = \frac{\left(SSE_R - SSE_U\right)/J}{SSE_U/(N-K)}.$$

This shows why we think of J as the numerator degrees of freedom and (N − K) as the denominator degrees of freedom.

The numerator of the numerator is the difference between the SSE when the restrictions are imposed and the SSE when the equation is unrestricted. It is always non-negative because the unrestricted model always fits at least as well as the restricted one. The difference is large if the restrictions make a big difference to the fit and small if they don't. Thus, other things equal, we will have a larger F statistic if the equation fits much less well when the restrictions are imposed.

This F statistic (which is the same as the one from the matrix formula as long as $\hat{\Sigma}_b = s^2 (X'X)^{-1}$) follows the $F(J, N-K)$ distribution under classical assumptions. By default, the test command in Stata uses the classical covariance matrix, and in either case it uses the $F(J, N-K)$ distribution rather than the $F(J, \infty)$ or the $\chi^2_J$ to compute the p-value.

Regression F statistic

A common joint significance test is the test that all coefficients except the intercept are zero:

$$H_0: \beta_2 = \beta_3 = \cdots = \beta_K = 0.$$

This is the regression F statistic, and it is printed out by many regression packages (including Stata). In bivariate regression, it is the square of the t statistic on the slope coefficient. If you can't reject the null hypothesis that all of your regressors have zero effect, then you probably have a pretty weak regression!
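A sketch of the restricted-versus-unrestricted calculation (Python, hypothetical data; note that statsmodels labels the SSE as .ssr, the sum of squared residuals):

```python
# F from SSE_R and SSE_U; with an intercept-only restricted model this
# reproduces the overall regression F statistic.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 50
x2, x3 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.5 * x2 + 0.5 * x3 + rng.normal(size=n)

res_u = sm.OLS(y, sm.add_constant(np.column_stack([x2, x3]))).fit()
res_r = sm.OLS(y, np.ones(n)).fit()        # restricted: beta2 = beta3 = 0
J = 2
F = ((res_r.ssr - res_u.ssr) / J) / (res_u.ssr / res_u.df_resid)
print(F, res_u.fvalue)                     # matches the reported regression F
```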

Simple hypotheses involving multiple coefficients by alternative methods

The matrix formula $R\beta = r$ clearly includes the possibility of:

- single rather than multiple restrictions, and
- restrictions involving more than one coefficient.

For example, to test $H_0: \beta_2 = \beta_3$, we could use

$$\begin{pmatrix} 0 & 1 & -1 & 0 & \cdots & 0 \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \vdots \\ \beta_K \end{pmatrix} = 0.$$

This is how Stata does such tests, and it is a perfectly valid way of doing them. An alternative way to test such a simple linear hypothesis is to transform the model into one in which the test of interest is a zero-test of a single coefficient, which will then be printed out by Stata directly.

For the SAT example, the restricted case is one in which only the sum (composite) of the SAT scores matters. Let $SATC \equiv SATM + SATV$. Then the model is

$$\begin{aligned} g_i &= \beta_1 + \beta_2 SATM_i + \beta_3 SATV_i + e_i \\ &= \beta_1 + \beta_2 \left(SATM_i + SATV_i\right) + \left(\beta_3 - \beta_2\right) SATV_i + e_i \\ &= \beta_1 + \beta_2 SATC_i + \left(\beta_3 - \beta_2\right) SATV_i + e_i. \end{aligned}$$

Thus, we can regress $g_i$ on SATC and SATV and test the hypothesis that the coefficient on SATV equals zero. This null hypothesis is $H_0: \beta_3 - \beta_2 = 0$, which is equivalent to $\beta_3 = \beta_2$. This alternative method gives us a t statistic that is exactly the square root of the F statistic that we get by the matrix method, and it should give exactly the same test result.

We can always reformulate the model in a way that allows us to do simple tests of linear combinations of coefficients this way. (This allows us to use the standard t test printed out by Stata rather than using the test command.) Again, we can use either the classical covariance matrix or the robust one; Stata will use the classical one unless the robust option is specified. This method can also be used to calculate restricted least-squares estimates that impose the chosen restrictions.
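A sketch of the reparameterization trick with simulated SAT-style data (all numbers hypothetical): the t statistic on SATV in the transformed model tests $\beta_3 = \beta_2$, and its square equals the Wald F from the original model.

```python
# Transform g = b1 + b2*SATM + b3*SATV + e into g = b1 + b2*SATC + (b3-b2)*SATV + e.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
satm = rng.normal(600.0, 60.0, n)
satv = rng.normal(600.0, 60.0, n)
g = 2.0 + 0.004 * satm + 0.003 * satv + rng.normal(0.0, 0.3, n)

satc = satm + satv                           # composite score
res_t = sm.OLS(g, sm.add_constant(np.column_stack([satc, satv]))).fit()
print(res_t.tvalues[2])                      # t on SATV tests H0: beta3 - beta2 = 0

res_0 = sm.OLS(g, sm.add_constant(np.column_stack([satm, satv]))).fit()
print(res_0.f_test([[0.0, 1.0, -1.0]]))      # same hypothesis; F equals t squared
```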

Some χ² alternative tests

There are several tests that are often used as alternatives to the F test, especially for extended applications that are not OLS. Sometimes these are more convenient to calculate; sometimes they are more appropriate given the assumptions of the model.

Lagrange multiplier test

The Lagrange multiplier test is one that can be easier to compute than the F test. It does not require the estimation of the complete unrestricted model, so it's useful in cases where the unrestricted model is very large or difficult to estimate. Recall that the effects of any omitted variables will be absorbed into the residual (or into the effects of correlated included regressors). Thus it makes sense to test whether an omitted variable should be added by asking whether it is correlated with the residual of the regression from which it has been omitted.

Suppose that we have K regressors, of which we want to test whether the last J coefficients are jointly zero:

$$y_i = \beta_1 + \beta_2 X_{i,2} + \cdots + \beta_{K-J} X_{i,K-J} + \beta_{K-J+1} X_{i,K-J+1} + \cdots + \beta_K X_{i,K} + e_i,$$
$$H_0: \beta_{K-J+1} = \cdots = \beta_K = 0.$$

For the LM test, we regress y on the first K − J regressors, then regress the residuals from that regression on the full set of regressors, including the last J. (Only the J added variables can raise the fit of this auxiliary regression, because the residuals are uncorrelated with the included regressors by construction.) $NR^2$ from the auxiliary regression is asymptotically distributed as a χ² statistic with J degrees of freedom.

Likelihood-ratio test

In maximum-likelihood estimation, the likelihood-ratio test is the predominant test used. If $L_U$ is the maximized value of the likelihood function when there are no restrictions and $L_R$ is the maximized value when the restrictions are imposed, then $2\left(\ln L_U - \ln L_R\right)$ is asymptotically distributed as a χ² statistic with J degrees of freedom. Most maximum-likelihood-based procedures (such as logit, probit, etc.) report the likelihood function in the output, so computing the LR test is very easy: just read the numbers off the restricted and unrestricted estimation outputs and multiply the difference by two.
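A sketch of the LM calculation for J = 1 candidate regressor (Python, hypothetical data; the auxiliary regression uses the full regressor set, as described above):

```python
# LM test: restricted fit omitting x3, then N*R^2 from the auxiliary regression
# of the residuals, compared with chi2(J).
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
n = 200
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
y = 1.0 + 0.5 * x2 + 0.3 * x3 + rng.normal(size=n)

res_r = sm.OLS(y, sm.add_constant(x2)).fit()      # restricted: x3 omitted
aux = sm.OLS(res_r.resid,
             sm.add_constant(np.column_stack([x2, x3]))).fit()
LM = n * aux.rsquared
print(LM, stats.chi2.sf(LM, df=1))                # statistic and p-value
```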

Multivariate confidence sets

Multivariate confidence sets are the multivariate equivalent of confidence intervals for coefficients. For two variables, they are generally ellipse-shaped. As with confidence intervals, if the confidence set for two coefficients excludes the origin, we reject the joint null hypothesis that the two coefficients are jointly zero. More generally, we reject the joint null hypothesis that the two coefficients equal any point that lies outside the confidence set. There doesn't seem to be a way of doing these in Stata.

Goodness of fit

The standard error of the regression is similar to the bivariate case, but with N − K degrees of freedom. There are N pieces of information in the dataset. We use K of them to minimally define the regression function (estimate the K coefficients). There are N − K degrees of freedom left.

$$SER = s_{\hat{e}} = \sqrt{\frac{1}{N-K} \sum_{i=1}^{N} \hat{e}_i^2} = \sqrt{\frac{SSE}{N-K}}.$$

$R^2$ is defined the same way as in the bivariate case: the share of the variance in y that is explained by the set of explanatory variables:

$$R^2 = \frac{\sum_{i=1}^{N} \left(\hat{y}_i - \bar{y}\right)^2}{\sum_{i=1}^{N} \left(y_i - \bar{y}\right)^2} = \frac{SSR}{SST} = 1 - \frac{\sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N} \left(y_i - \bar{y}\right)^2} = 1 - \frac{SSE}{SST}.$$

However, adding a new regressor to the equation always improves R² (unless it is totally uncorrelated with the previous residuals), so we would expect an equation with 10 regressors to have a higher R² than one with only 2. To correct for this, we often use an adjusted R² that corrects for the number of degrees of freedom:

$$\bar{R}^2 = 1 - \frac{\frac{1}{N-K} \sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2}{\frac{1}{N-1} \sum_{i=1}^{N} \left(y_i - \bar{y}\right)^2} = 1 - \frac{s_{\hat{e}}^2}{s_y^2} = 1 - \frac{N-1}{N-K} \cdot \frac{SSE}{SST}.$$

Three properties of $\bar{R}^2$:

- $\bar{R}^2 < R^2$ whenever K > 1.
- Adding a regressor generally decreases SSE but also increases K, so the effect on $\bar{R}^2$ is ambiguous. Choosing a regression to maximize $\bar{R}^2$ is not recommended, but it's better than maximizing R².
- $\bar{R}^2$ can be negative if SSE is close to SST, because $(N-1)/(N-K) > 1$.

Some specification issues

In practice, we never know the exact specification of the model: which variables should be included and what functional form should be used. Thus, we almost always end up trying multiple alternative models and choosing among them based on the results.

Specification search is very dangerous! If you try 20 independent variables that are totally uncorrelated with one another and with the dependent variable, on average one of them (5%) will have a statistically significant t statistic. The maximum of several candidate t statistics does not follow the t or normal distribution, so if you searched five variables and found one that had an apparently significant t, you cannot conclude that it truly has an effect. This process is called data mining or specification searching. Though we all do it, it is very dangerous and inconsistent with a basic assumption of econometrics: that we know the model specification before we approach the data. We shall have more to say about this later in the course.
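The 5% arithmetic is easy to demonstrate by simulation (a sketch; all numbers made up):

```python
# With 20 irrelevant regressors, about one t statistic per regression clears
# the 5% two-tailed critical value purely by chance.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n, k, reps = 100, 20, 200
count = 0
for _ in range(reps):
    X = sm.add_constant(rng.normal(size=(n, k)))
    y = rng.normal(size=n)                   # y is unrelated to every regressor
    res = sm.OLS(y, X).fit()
    count += int((np.abs(res.tvalues[1:]) > 1.96).sum())
print(count / reps)                          # averages about 1 per regression
```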

Interpreting R² and $\bar{R}^2$

- Adding any variable to the regression that has a non-zero estimated coefficient increases R².
- Adding any variable to the regression that has a t statistic greater than one in absolute value increases $\bar{R}^2$. Given that the conventional levels of significance imply critical values much bigger than one, adopting a max-$\bar{R}^2$ criterion would lead us to keep many regressors for which we can't reject the null hypothesis that their effect is zero.
- R² tells us nothing about causality; it is strictly a correlation-based statistic. One cannot infer from a high R² that there are no omitted variables or that the regression is a good one. One cannot infer from a low R² that one has a poor regression or that one has omitted relevant variables.

Including irrelevant variables vs. omitting relevant ones

If we include an irrelevant variable that doesn't need to be in the regression, the expected value of its coefficient is zero. In this case, our regression estimator is inefficient because we are spending a degree of freedom on estimating an unnecessary parameter. However, the estimators of the other coefficients are still unbiased and consistent.

If we omit a variable that belongs in the regression, the estimators of the coefficients of any variables correlated with the omitted variable are biased and inconsistent. This asymmetry suggests erring on the side of including irrelevant variables rather than omitting important ones, especially if the sample is large enough that degrees of freedom are not scarce.

Information criteria

These are statistics measuring the amount of information captured in a set of regressors. Two are commonly used:

- Akaike information criterion: $AIC = \ln\left(\dfrac{SSE}{N}\right) + \dfrac{2K}{N}$
- Schwarz criterion (Bayesian information criterion): $SC = \ln\left(\dfrac{SSE}{N}\right) + \dfrac{K \ln N}{N}$

In both cases, we choose the regression (among nested sets) that minimizes the criterion. Both penalize higher K given N and SSE, the Schwarz criterion more so (once $\ln N > 2$).
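A sketch computing these two criteria for a pair of nested models (Python, hypothetical data). Note that statsmodels' built-in .aic and .bic are likelihood-based and differ from these SSE-based versions only by a scale and a constant for fixed N, so model rankings agree:

```python
# AIC and SC as defined above; choose the specification that minimizes each.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
x2, x3 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.5 * x2 + rng.normal(size=n)      # x3 is irrelevant here

def aic_sc(res):
    n_obs, k = res.nobs, res.df_model + 1    # k counts the intercept too
    aic = np.log(res.ssr / n_obs) + 2 * k / n_obs
    sc = np.log(res.ssr / n_obs) + k * np.log(n_obs) / n_obs
    return aic, sc

for X in (sm.add_constant(x2),
          sm.add_constant(np.column_stack([x2, x3]))):
    print(aic_sc(sm.OLS(y, X).fit()))        # the smaller model should win both
```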

RESET test

One handy test that can indicate misspecification (especially nonlinearities among the variables in the regression) is the RESET test. To use it, first run the linear regression, then re-run the regression adding squares (and perhaps cubes) of the predicted values from the first regression, and test the added term(s). Powers of the predicted value contain powers and cross-products of the x variables, so this can be an easy way of testing whether higher powers of some of the x variables belong in the equation.

Multicollinearity

If one of the x variables is highly correlated with a linear combination of the others, then the X′X matrix will be nearly singular and its inverse will tend to explode. It is important to realize that near-multicollinearity is not a violation of the OLS assumptions.

Remember that the diagonal elements of X′X are proportional to the sample variances of the x variables and the off-diagonal elements to their covariances. If the correlations among the x variables are high, then the covariances are large relative to the variances and X′X is nearly singular. If the determinant of X′X is near zero, then the elements of its inverse will be very large. The variances of the regression coefficients are proportional to the diagonal elements of this inverse, so near-perfect multicollinearity leads to very imprecise estimators. This makes sense: if two regressors are highly correlated with each other, then the OLS algorithm won't be able to figure out which one is affecting y.

Symptoms:

- Low t statistics but a high regression F statistic: the coefficients are collectively, but not individually, significantly different from zero.
- We could have a high F statistic on a few variables jointly but not individually: something affects y, but we can't tell which variable it is.

Variance-inflation factor

The VIF is a measure of how unreliable a coefficient estimate is:

$$\operatorname{var}(b_k) = \frac{\sigma^2}{(N-1)\operatorname{var}(X_k)} \cdot \frac{1}{1 - R_k^2}, \qquad VIF_k = \frac{1}{1 - R_k^2} \geq 1,$$

where $R_k^2$ is from the regression of $X_k$ on the other X variables. You can compute VIFs manually, or download and install the vif command from the Stata Web site.
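Both procedures are straightforward to do by hand (a Python sketch with made-up data; recent versions of statsmodels also provide canned helpers, linear_reset and variance_inflation_factor):

```python
# RESET: add powers of the fitted values and jointly test them.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200
x2 = rng.normal(size=n)
y = 1.0 + 0.5 * x2 + 0.3 * x2**2 + rng.normal(size=n)  # true relation is nonlinear

res1 = sm.OLS(y, sm.add_constant(x2)).fit()            # misspecified linear fit
yhat = res1.fittedvalues
res2 = sm.OLS(y, sm.add_constant(
    np.column_stack([x2, yhat**2, yhat**3]))).fit()
print(res2.f_test(np.array([[0.0, 0.0, 1.0, 0.0],      # joint test of added powers
                            [0.0, 0.0, 0.0, 1.0]])))

# VIF: regress each regressor on the others and form 1 / (1 - R_k^2).
x3 = x2 + 0.1 * rng.normal(size=n)                     # nearly collinear pair
X = np.column_stack([x2, x3])
for k in range(X.shape[1]):
    others = np.delete(X, k, axis=1)
    r2 = sm.OLS(X[:, k], sm.add_constant(others)).fit().rsquared
    print(1.0 / (1.0 - r2))                            # large VIFs flag collinearity
```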

VIF > 10 (or 5) means that 90% (or 80%) of the variance of $X_k$ is explained by the remaining X variables. These are the commonly cited thresholds for worrying about multicollinearity.

What to do about multicollinearity? Get better data, in which the two regressors vary independently. If no additional data are available, one variable might have to be dropped, or one can simply report the (accurate) results of the regression.