MS&E 226: Small Data
Lecture 15: Examples of hypothesis tests (v5)
Ramesh Johari
ramesh.johari@stanford.edu 1 / 32

The recipe 2 / 32

The hypothesis testing recipe
In this lecture we repeatedly apply the following approach. If the true parameter were θ_0, then the test statistic T(Y) should look like it would when the data come from f(y | θ_0). We compare the observed test statistic T_obs to the sampling distribution under θ_0. If the observed T_obs is unlikely under the sampling distribution given θ_0, we reject the null hypothesis that θ = θ_0.
The theory of hypothesis testing relies on finding test statistics T(Y) for which this procedure yields as high a power as possible, at a given size. 3 / 32
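As a minimal illustration of this recipe (not from the slides; the model and numbers below are invented), one can simulate the sampling distribution of a test statistic under θ_0 and see how extreme T_obs is:

# Illustrating the recipe by simulation: where does T_obs fall under theta_0?
set.seed(226)
n      <- 50
theta0 <- 0                                # null value
T_stat <- function(y) mean(y) / (sd(y) / sqrt(length(y)))

# Sampling distribution of T(Y) when the data really come from f(y | theta0):
T_null <- replicate(10000, T_stat(rnorm(n, mean = theta0, sd = 1)))

y_obs <- rnorm(n, mean = 0.5, sd = 1)      # hypothetical observed data
T_obs <- T_stat(y_obs)
mean(abs(T_null) >= abs(T_obs))            # simulated p-value: how unlikely is T_obs under theta0?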

The Wald test 4 / 32

Assumption: Asymptotic normality
We assume that the statistic we are looking at is in fact an estimator ˆθ of a parameter θ that is consistent and asymptotically normal. (Example: the MLE.)
I.e., for large n, the sampling distribution of ˆθ is approximately:
ˆθ ≈ N(θ, ŜE^2),
where θ is the true parameter. 5 / 32

The Wald test
The Wald test uses the test statistic:
T(Y) = (ˆθ − θ_0) / ŜE.
The recipe:
If the true parameter were θ_0, then the sampling distribution of the Wald test statistic should be approximately N(0, 1).
Look at the observed value of the test statistic; call it T_obs.
Under the null, |T_obs| ≤ 1.96 with probability 0.95.
So if we reject the null when |T_obs| > 1.96, the size of the test is 0.05.
More generally, let z_{α/2} be the unique point such that P(|Z| > z_{α/2}) = α, for a standard normal r.v. Z. Then the Wald test of size α rejects the null when |T_obs| > z_{α/2}. 6 / 32
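A minimal sketch of this recipe in R (the estimate, standard error, and null value below are hypothetical placeholders):

# Wald test of H0: theta = theta0 (hypothetical numbers).
theta_hat <- 1.8      # estimate (e.g., an MLE)
se_hat    <- 0.7      # estimated standard error of theta_hat
theta0    <- 1.0      # null value
alpha     <- 0.05

T_obs <- (theta_hat - theta0) / se_hat    # Wald test statistic
z     <- qnorm(1 - alpha / 2)             # z_{alpha/2}; about 1.96 for alpha = 0.05
abs(T_obs) > z                            # TRUE means reject the null at size alpha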

The Wald test: A picture 7 / 32

The Wald test and confidence intervals
Recall that an (asymptotic) 95% confidence interval for the true θ is:
[ˆθ − 1.96 ŜE, ˆθ + 1.96 ŜE].
Thus the Wald test of size 5% is equivalent to rejecting the null if θ_0 is not in the 95% confidence interval.
More generally, recall that an (asymptotic) 1 − α confidence interval for the true θ is:
[ˆθ − z_{α/2} ŜE, ˆθ + z_{α/2} ŜE].
The Wald test of size α is equivalent to rejecting the null if θ_0 is not in the 1 − α confidence interval. 8 / 32

Power of the Wald test
Now suppose the true θ ≠ θ_0. What is the chance we (correctly) reject the null?
Note that in this case, the sampling distribution of the Wald test statistic is still approximately normal with variance 1, but now with mean (θ − θ_0)/ŜE. Therefore the power at θ is approximately P(|Z| > z_{α/2}), where:
Z ~ N((θ − θ_0)/ŜE, 1). 9 / 32
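A sketch of this power approximation in R (the true θ, null value, and standard error below are illustrative assumptions):

# Approximate power of the Wald test at a given true theta (hypothetical numbers).
theta  <- 1.5     # true parameter
theta0 <- 1.0     # null value
se_hat <- 0.2     # (estimated) standard error
alpha  <- 0.05

z     <- qnorm(1 - alpha / 2)
delta <- (theta - theta0) / se_hat                              # mean of the test statistic under theta
power <- pnorm(-z, mean = delta) + 1 - pnorm(z, mean = delta)   # P(|Z| > z) for Z ~ N(delta, 1)
power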

Power of the Wald test: A picture 10 / 32

Example: Significance of an OLS coefficient
Suppose given X, Y, we run a regression and find OLS coefficients ˆβ. We test whether the true β_j is zero or not. The Wald test statistic is ˆβ_j / ŜE_j. If this statistic has magnitude larger than 1.96, then we say the coefficient is statistically significant (at the 95% level).
This is equivalent to the following: if zero is not in the 95% confidence interval for a particular regression coefficient ˆβ_j, the coefficient is statistically significant at the 95% level. 11 / 32
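As a quick sketch in R, using the airquality regression that appears later in this lecture (same model as on the R output slide):

# Wald/t statistics and confidence intervals for OLS coefficients.
fit <- lm(Ozone ~ 1 + Solar.R + Wind + Temp, data = airquality)
summary(fit)$coefficients    # estimates, standard errors, t values, p-values
confint(fit, level = 0.95)   # a coefficient is significant iff 0 is outside its interval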

More on statistical significance
Lots of warnings:
Statistical significance of a coefficient suggests it is worth including in your regression model; but don't forget all the other assumptions that have been made along the way!
Conversely, just because a coefficient is not statistically significant does not mean that it is not important to the model!
Statistical significance is very different from practical significance! Even if zero is not in a confidence interval, the relationship between the corresponding covariate and the outcome may still be quite weak. 12 / 32

p-values
The p-value of a test gives the probability of observing a test statistic as extreme as the one observed, if the null hypothesis were true. For the Wald test:
p = P(|Z| > |T_obs|),
where Z ~ N(0, 1) is a standard normal random variable.
Why? Under the null, the sampling distribution of the Wald test statistic is approximately N(0, 1). 13 / 32
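In R (a sketch, continuing the hypothetical Wald test numbers above):

# Two-sided p-value for an observed Wald statistic T_obs.
p_value <- 2 * pnorm(-abs(T_obs))   # equals P(|Z| > |T_obs|) for Z ~ N(0, 1)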

p-values
Note that:
If the p-value is small, the observed test statistic is very unlikely under the null hypothesis.
In fact, suppose we reject when p < α. This is exactly the same as rejecting when |T_obs| > z_{α/2}.
In other words: the Wald test of size α is obtained by rejecting when the p-value is below α. 14 / 32

p-values: A picture 15 / 32

The sampling distribution of p-values: A picture 16 / 32

Use and misuse of p-values
Why p-values? They are transparent: reporting a result as statistically significant (or not) depends on your chosen value of α. What if my desired α is different (more or less conservative)? p-values allow different people to interpret the data using their own desired α.
But note: the p-value is not the probability that the null hypothesis is true! 17 / 32

The t-test 18 / 32

The z-test
We assume that Y = (Y_1, ..., Y_n) are i.i.d. N(θ, σ^2) random variables.
If we know σ^2: The variance of the sampling distribution of the sample mean Ȳ is σ^2/n, so its exact standard error is SE = σ/√n. Thus if θ = θ_0, then (Ȳ − θ_0)/SE should be N(0, 1).
So we can use (Ȳ − θ_0)/SE as a test statistic, and proceed as we did for the Wald statistic. This is called a z-test.
The only difference from the Wald test is that if we know the Y_i's are normally distributed, then the test statistic is exactly normal even in finite samples. 19 / 32

The t-statistic
What if we don't know σ^2? Let ˆσ^2 be the unbiased estimator of σ^2. Then with ŜE = ˆσ/√n, the statistic
(Ȳ − θ_0) / ŜE
has a Student's t distribution (with n − 1 degrees of freedom) under the null hypothesis that θ = θ_0. This distribution can be used to implement the t-test.
For our purposes, just note that again this looks a lot like a Wald test statistic! Indeed, the t distribution is very close to N(0, 1), even for moderate values of n. 20 / 32
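A sketch in R with simulated data (purely illustrative), checking the hand-computed t statistic against t.test:

# One-sample t-test, by hand and via t.test (simulated data).
set.seed(1)
y      <- rnorm(30, mean = 0.3, sd = 1)   # hypothetical sample
theta0 <- 0

se_hat <- sd(y) / sqrt(length(y))         # uses the unbiased estimator of sigma^2
t_obs  <- (mean(y) - theta0) / se_hat
p_val  <- 2 * pt(-abs(t_obs), df = length(y) - 1)

t.test(y, mu = theta0)                    # reports the same t statistic and p-value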

Example: Linear normal model [∗]
Assume the linear normal model Y_i = X_i β + ε_i with i.i.d. N(0, σ^2) errors ε_i. The OLS estimator is:
ˆβ = (X'X)^(-1) X'Y.
Now note that given X, the sampling distribution of the coefficients is exactly normal, because the coefficients are linear combinations of the Y_i's (which are independent normal random variables). This fact can be used to show that the exact sampling distribution of the test statistic ˆβ_j/ŜE_j under the null that β_j = 0 is also a t distribution. (See [SM], Section 5.6.) 21 / 32

Interpreting regression output in R
R output from a linear regression:

Call:
lm(formula = Ozone ~ 1 + Solar.R + Wind + Temp, data = airquality)

Residuals:
    Min      1Q  Median      3Q     Max
-40.485 -14.219  -3.551  10.097  95.619

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -64.34208   23.05472  -2.791  0.00623 **
Solar.R       0.05982    0.02319   2.580  0.01124 *
Wind         -3.33359    0.65441  -5.094 1.52e-06 ***
Temp          1.65209    0.25353   6.516 2.42e-09 ***
22 / 32

Statistical significance in R
In most statistical software (and in papers), statistical significance is denoted as follows:
*** means statistically significant at the 99.9% level.
** means statistically significant at the 99% level.
* means statistically significant at the 95% level. 23 / 32

The F test in linear regression 24 / 32

Multiple regression coefficients [∗]
We again assume we are in the linear normal model: Y_i = X_i β + ε_i with i.i.d. N(0, σ^2) errors ε_i.
The Wald (or t) test lets us test whether one regression coefficient is zero.
What if we want to know whether multiple regression coefficients are zero? This is equivalent to asking: would a simpler model suffice?
For this purpose we use the F test. 25 / 32

The F statistic [∗]
Suppose we have fit a linear regression model using p covariates. We want to test the null hypothesis that all of the coefficients β_j, j ∈ S (for some subset S) are zero.
Notation:
ˆr : the residuals from the full regression ("unrestricted")
ˆr^(S) : the residuals from the regression excluding the variables in S ("restricted")
The F statistic is:
F = [ (‖ˆr^(S)‖^2 − ‖ˆr‖^2) / |S| ] / [ ‖ˆr‖^2 / (n − p) ],
where ‖·‖^2 denotes the sum of squared residuals, and p counts all fitted coefficients of the full model (including the intercept). 26 / 32

The F test [∗]
The recipe for the F test: The null hypothesis is that β_j = 0 for all j ∈ S. If this is true, then the test statistic has an F distribution with |S| degrees of freedom in the numerator and n − p degrees of freedom in the denominator. We can use the F distribution to determine how unlikely our observed value of the test statistic is, if the null hypothesis were true.
Under the null we expect F ≈ 1. Large values of F suggest we can reject the null. 27 / 32
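A sketch in R of this computation, using the airquality models compared at the end of the lecture (assuming both models are fit to the same complete cases, as anova requires):

# F test by hand: does Temp alone suffice? (airquality data)
dat      <- na.omit(airquality[, c("Ozone", "Solar.R", "Wind", "Temp")])
fit_full <- lm(Ozone ~ 1 + Solar.R + Wind + Temp, data = dat)   # unrestricted
fit_rest <- lm(Ozone ~ 1 + Temp, data = dat)                    # excludes S = {Solar.R, Wind}

n        <- nrow(dat)
p        <- length(coef(fit_full))    # fitted coefficients in the full model (4, incl. intercept)
size_S   <- 2                         # number of coefficients set to zero under the null
rss_full <- sum(resid(fit_full)^2)
rss_rest <- sum(resid(fit_rest)^2)

F_stat <- ((rss_rest - rss_full) / size_S) / (rss_full / (n - p))
p_val  <- pf(F_stat, df1 = size_S, df2 = n - p, lower.tail = FALSE)
# anova(fit_rest, fit_full) reports the same F statistic and p-value.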

The F test [∗]
The F test is a good example of why hypothesis testing is useful: We could implement the Wald test by just looking at the confidence interval for β_j. The same is not true for the F test: we can't determine whether we should reject the null by just looking at individual confidence intervals for each β_j.
The F test is a succinct way to summarize our level of uncertainty about multiple coefficients at once. 28 / 32

The F test [∗]
Statistical software such as R does all the work for you. First, note that regression output always includes information on the F statistic, e.g.:

Call:
lm(formula = Ozone ~ 1 + Solar.R + Wind + Temp, data = airquality)
...
F-statistic: 54.83 on 3 and 107 DF,  p-value: < 2.2e-16

This is always the F test against the null that all coefficients (except the intercept) are zero. What does rejecting this null mean? What is the alternative? 29 / 32

F tests of one model against another [∗]
More generally, you can use R to run an F test of one model against another:

> anova(fm_small, fm_big)
Analysis of Variance Table

Model 1: Ozone ~ 1 + Temp
Model 2: Ozone ~ 1 + Temp + Solar.R + Wind
  Res.Df   RSS Df Sum of Sq     F   Pr(>F)
1    109 62367
2    107 48003  2     14365 16.01 8.27e-07 ***
...
30 / 32

Caution 31 / 32

A word of warning
Used correctly, hypothesis tests are powerful tools to quantify your uncertainty. However, they can easily be misused, as we will see later in the course. Some questions for thought:
Suppose that, with 1000 covariates, you use the t (or Wald) statistic on each coefficient to determine whether to include or exclude it. What might go wrong?
Suppose that you test and compare many models by repeatedly using F tests. What might go wrong? 32 / 32