Econometrics. 4) Statistical inference


30C00200 Econometrics
4) Statistical inference
Timo Kuosmanen, Professor, Ph.D.
http://nomepre.net/index.php/timokuosmanen

Today's topics
  Confidence intervals of parameter estimates
  Student's t-distribution
  Hypothesis testing
  t-test of significance of coefficients
  p-value
  One-sided vs. two-sided tests
  Type I and II errors in hypothesis testing
  Power of a test

Types of statistical inference
  Estimation
    Point estimation
    Interval estimation
  Hypothesis testing

SUMMARY OUTPUT: Excel output of the hedonic model (multiple regression)

Regression Statistics
Multiple R          0,905971
R Square            0,820784
Adjusted R Square   0,81225
Standard Error      80593,26
Observations        67

ANOVA
             df   SS         MS         F          Significance F
Regression   3    1,87E+12   6,25E+11   96,17703   1,76E-23
Residual     63   4,09E+11   6,5E+09
Total        66   2,28E+12

               Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept      141366,7       36401,58         3,883532   0,000249   68623,96    214109,5
size m2        6972,404       857,5745         8,130378   2,11E-11   5258,678    8686,13
nr. bedrooms   -65182,6       20401,6          -3,19498   0,002186   -105952     -24413,3
age            -2820,1        495,0578         -5,6965    3,46E-07   -3809,39    -1830,8

Interval estimation
Definition: Let X be a random sample from a probability distribution with parameter μ. The 100(1-α)% confidence interval for parameter μ is an interval with random endpoints [a(X), b(X)] determined by the random sample X, such that
$$\Pr\big(a(X) \le \mu \le b(X)\big) = 1 - \alpha$$
Interpretation in terms of repeated samples: suppose we draw a large number of random samples X from the population and calculate the confidence interval for each sample. The calculated confidence intervals (which would differ from sample to sample) would contain the true population parameter μ in 100(1-α)% of the samples.

Interval estimation (continued)
The significance level α is usually set at 5% or 1%.
For a given sample X and significance level α, our objective is to calculate the lower limit a and the upper limit b.
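The repeated-sampling interpretation can be checked with a small simulation. The sketch below is only an illustration under simplifying assumptions that are not in the slides: the parameter is a population mean, the data are normal, and the standard deviation is treated as known. The function name `coverage` and all numerical settings are hypothetical.

```python
import random
from statistics import NormalDist

def coverage(n_samples=2000, n=50, mu=10.0, sigma=2.0, alpha=0.05, seed=1):
    """Share of simulated samples whose confidence interval contains mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    half_width = z * sigma / n ** 0.5         # sigma is treated as known (assumption)
    hits = 0
    for _ in range(n_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        # The interval [a(X), b(X)] differs from sample to sample.
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / n_samples

print(f"Share of intervals containing mu: {coverage():.3f}")
```

The printed share should be close to 0.95, up to simulation noise.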

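Asymptotic normality
By the central limit theorem, if the assumptions of exogeneity, homoscedasticity, and serial independence hold, then the OLS slope estimator $b_2$ converges in distribution to the normal distribution:
$$b_2 \overset{a}{\sim} N\!\left(\beta_2,\ \frac{\sigma^2}{(n-1)\,\mathrm{Var}(x)}\right)$$
where $\sigma^2$ is the variance of the disturbance term. Denoting
$$\sigma_{b_2}^2 = \frac{\sigma^2}{(n-1)\,\mathrm{Var}(x)}$$
we have a shorter expression:
$$b_2 \overset{a}{\sim} N\!\left(\beta_2,\ \sigma_{b_2}^2\right)$$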

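Standardization
The result
$$b_2 \overset{a}{\sim} N\!\left(\beta_2,\ \sigma_{b_2}^2\right)$$
implies that
$$(b_2 - \beta_2) \overset{a}{\sim} N\!\left(0,\ \sigma_{b_2}^2\right)$$
and further
$$\frac{b_2 - \beta_2}{\sigma_{b_2}} \overset{a}{\sim} N(0, 1)$$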

Standard normal distribution
Φ is the cumulative distribution function of N(0,1)
  Φ(-1.96) = 0.025
  Φ(1.96) = 0.975
  Φ(-2.576) = 0.005
  Φ(2.576) = 0.995
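These values of Φ can be checked with Python's standard library. This is a minimal sketch using `statistics.NormalDist`; the variable name `phi` is illustrative.

```python
from statistics import NormalDist

phi = NormalDist()  # standard normal: mean 0, standard deviation 1

# Cumulative probabilities at the 5% two-sided cut-offs.
for z in (-1.96, 1.96):
    print(f"Phi({z:+.2f}) = {phi.cdf(z):.3f}")

# Quantiles: the z for which Phi(z) equals a given probability.
for p in (0.975, 0.995):
    print(f"Phi(z) = {p} at z = {phi.inv_cdf(p):.3f}")
```

The 0.975 quantile evaluates to about 1.960 and the 0.995 quantile to about 2.576.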

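95% confidence interval for slope $\beta_2$
Using
$$\frac{b_2 - \beta_2}{\sigma_{b_2}} \overset{a}{\sim} N(0, 1)$$
and Φ(-1.96) = 0.025, Φ(1.96) = 0.975, when n is sufficiently large
$$\Pr\!\left(\frac{b_2 - \beta_2}{\sigma_{b_2}} \le -1.96\right) = 0.025$$
$$\Pr\!\left(\frac{b_2 - \beta_2}{\sigma_{b_2}} \le 1.96\right) = 0.975$$
Thus,
$$\Pr\!\left(-1.96 \le \frac{b_2 - \beta_2}{\sigma_{b_2}} \le 1.96\right) = 0.95$$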

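95% confidence interval for slope $\beta_2$
Modifying
$$\Pr\!\left(-1.96 \le \frac{b_2 - \beta_2}{\sigma_{b_2}} \le 1.96\right) = 0.95$$
$$\Pr\!\left(-1.96\,\sigma_{b_2} \le b_2 - \beta_2 \le 1.96\,\sigma_{b_2}\right) = 0.95$$
$$\Pr\!\left(b_2 - 1.96\,\sigma_{b_2} \le \beta_2 \le b_2 + 1.96\,\sigma_{b_2}\right) = 0.95$$
Thus, the 95% confidence interval takes the form
$$b_2 \pm 1.96\,\sigma_{b_2}$$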

Confidence interval for slope $\beta_2$
Recall the 95% confidence interval
$$b_2 \pm 1.96\,\sigma_{b_2}$$
If n is very large, we can replace the true but unknown standard deviation $\sigma_{b_2}$ with the estimated standard error s.e.($b_2$).
In small samples, however, estimating $\sigma_{b_2}$ introduces an additional source of variation that should be taken into account in the confidence interval. We therefore take the critical value from Student's t distribution rather than from N(0,1) (i.e., instead of the 1.96 used above):
$$b_2 \pm t_{crit} \cdot \mathrm{s.e.}(b_2)$$
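The interval can be computed by hand for a simple regression. The sketch below uses a made-up data set of n = 20 observations (not from the course) and the conventional two-sided 5% critical value for 18 degrees of freedom, 2.101. The function name and the data are hypothetical.

```python
def slope_confidence_interval(x, y, t_crit):
    """OLS slope b2, its standard error, and the interval b2 +/- t_crit * s.e.(b2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)   # equals (n - 1) * Var(x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    ssr = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    s2 = ssr / (n - 2)                         # estimated error variance, df = n - 2
    se_b2 = (s2 / sxx) ** 0.5
    return b2, se_b2, (b2 - t_crit * se_b2, b2 + t_crit * se_b2)

# Hypothetical data: true slope 0.5 plus a small alternating disturbance.
x = list(range(1, 21))
y = [2.0 + 0.5 * xi + (0.5 if xi % 2 == 0 else -0.5) for xi in x]

T_CRIT_18_DF = 2.101  # two-sided 5% critical value of t with 18 df
b2, se_b2, (lower, upper) = slope_confidence_interval(x, y, T_CRIT_18_DF)
print(f"b2 = {b2:.4f}, s.e.(b2) = {se_b2:.4f}, 95% CI = [{lower:.4f}, {upper:.4f}]")
```

With these data the interval contains the slope 0.5 used to generate them and excludes zero.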

Student's t distribution
The t distribution depends on the degrees of freedom (df): it converges to N(0,1) as df increases.
For the OLS estimator, df = n - K, where K is the number of unknown model parameters (β's).

Student's t distribution
Confidence interval:
$$b_2 \pm t_{crit} \cdot \mathrm{s.e.}(b_2)$$
Example of critical t values at the 5% significance level with different sample sizes (df = n-2).
Excel function: =TINV(prob; df)

n       t_crit
20      2.101
50      2.011
100     1.984
200     1.972
500     1.965
1,000   1.962
2,000   1.961
5,000   1.960
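Critical values like these can be reproduced without statistical tables. The sketch below is one possible approach, not necessarily how the slide values were produced: it integrates the t density numerically with Simpson's rule and inverts the distribution function by bisection. The function names `t_pdf`, `t_cdf` and `t_crit` are illustrative.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=1000):
    """CDF via Simpson's rule on [0, |x|], using symmetry around zero."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_crit(alpha, df, two_sided=True):
    """Critical value found by bisection on the CDF."""
    target = 1 - alpha / 2 if two_sided else 1 - alpha
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for n in (20, 50, 100, 5000):
    print(n, round(t_crit(0.05, n - 2), 3))
```

As df grows, the computed values approach the N(0,1) value of 1.960.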

Classic approach: Hypothesis testing
1) State the null hypothesis ($H_0$) and the alternative hypothesis ($H_1$).
2) Specify the probability model under $H_0$ and the necessary assumptions.
3) Compute the test statistic (S) with a known probability distribution under $H_0$.
4) Identify the acceptance and rejection regions, given the known probability distribution of S and the pre-assigned significance level.
5) Accept $H_0$ if S falls within the acceptance region; reject $H_0$ if S falls within the rejection region.

Testing hypotheses concerning β's
Three alternative approaches:
1) Confidence intervals
2) t-test
3) p-value

Tests of β's using confidence intervals
We can use confidence intervals for testing hypotheses. This applies to all two-sided tests.
Significance test:
  $H_0: \beta_2 = 0$ (x has no effect on y)
  $H_1: \beta_2 \ne 0$
Tests of theoretical restrictions:
  $H_0: \beta_2 = \beta^*$ (based on theory)
  $H_1: \beta_2 \ne \beta^*$

Tests of β's using confidence intervals
At the given significance level α:
  State the hypotheses to be tested: $H_0: \beta_2 = \beta^*$, $H_1: \beta_2 \ne \beta^*$
  Estimate the 100(1-α)% confidence interval
  Accept $H_0$ if $\beta^*$ is contained within the 100(1-α)% confidence interval
  Reject $H_0$ if $\beta^*$ falls outside the 100(1-α)% confidence interval
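As a sketch of this decision rule, the 95% limits reported in the Excel output of the hedonic model can be used directly. The dictionary and function names are illustrative, and the restriction value 7000 for the size coefficient is a hypothetical β* chosen only for the example.

```python
# Lower and upper 95% limits copied from the Excel output
# (decimal commas written as decimal points).
CI_95 = {
    "size m2": (5258.678, 8686.13),
    "nr. bedrooms": (-105952.0, -24413.3),
    "age": (-3809.39, -1830.8),
}

def reject_h0(beta_star, interval):
    """Two-sided test of H0: beta = beta_star at the level implied by the interval."""
    lower, upper = interval
    return not (lower <= beta_star <= upper)

# Significance tests: H0: beta = 0 for each slope coefficient.
for name, interval in CI_95.items():
    verdict = "reject H0" if reject_h0(0.0, interval) else "accept H0"
    print(f"{name}: beta* = 0 -> {verdict}")

# A hypothetical theoretical restriction on the size coefficient.
print("size m2: beta* = 7000 ->",
      "reject H0" if reject_h0(7000.0, CI_95["size m2"]) else "accept H0")
```

None of the three intervals contains zero, so each significance test rejects $H_0$ at the 5% level, while the hypothetical restriction lies inside its interval and is not rejected.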


t-test
To derive the test statistic, recall that
$$\frac{b_2 - \beta_2}{\sigma_{b_2}} \overset{a}{\sim} N(0, 1)$$
Further, using the standard error estimated from data, we have
$$\frac{b_2 - \beta_2}{\mathrm{s.e.}(b_2)} \sim t(n-2)$$

t-test
$H_0: \beta_2 = \beta^*$
$H_1: \beta_2 \ne \beta^*$
To test $H_0$, we use the test statistic (t stat):
$$t = \frac{b_2 - \beta^*}{\mathrm{s.e.}(b_2)}$$
If $H_0$ is true, this test statistic follows Student's t distribution with (n-2) degrees of freedom.

t-test
$H_0: \beta_2 = \beta^*$
$H_1: \beta_2 \ne \beta^*$
Acceptance region: if $-t_{crit} < t < t_{crit}$, then maintain $H_0$
Rejection region: if $t < -t_{crit}$ or $t > t_{crit}$, then reject $H_0$

Significance test
In the case of the significance test: $H_0: \beta_2 = 0$, $H_1: \beta_2 \ne 0$
Test statistic:
$$t = \frac{b_2}{\mathrm{s.e.}(b_2)}$$
The value of this test statistic is reported as part of the Stata output.
The reported value should be compared with $t_{crit}$ obtained from statistical tables (or e.g. Excel).
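The reported t statistics can be recomputed from the coefficients and standard errors in the Excel output of the hedonic model. This is a minimal sketch: the names are illustrative, and the critical value 1.998 is an approximate two-sided 5% value for 63 degrees of freedom (n - K = 67 - 4), consistent with the reported 95% limits, since (8686.13 - 6972.404) / 857.5745 ≈ 1.998.

```python
# (coefficient, standard error) from the Excel output, decimal commas as points.
ESTIMATES = {
    "Intercept": (141366.7, 36401.58),
    "size m2": (6972.404, 857.5745),
    "nr. bedrooms": (-65182.6, 20401.6),
    "age": (-2820.1, 495.0578),
}

T_CRIT = 1.998  # approximate two-sided 5% critical value, df = 63 (see lead-in)

def t_statistic(b, se, beta_star=0.0):
    """t = (b - beta*) / s.e.(b); beta* = 0 gives the significance test."""
    return (b - beta_star) / se

def decision(t, t_crit):
    """Two-sided decision rule: reject H0 when |t| exceeds the critical value."""
    return "reject H0" if abs(t) > t_crit else "maintain H0"

for name, (b, se) in ESTIMATES.items():
    t = t_statistic(b, se)
    print(f"{name}: t = {t:.4f} -> {decision(t, T_CRIT)}")
```

The computed values agree with the "t Stat" column of the output, and every coefficient is significant at the 5% level.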


p-value
From the reported t-statistics, it is not always immediately obvious whether a coefficient is statistically significant at the 5%, 1%, or perhaps 10% significance level: the t-statistic must be compared with $t_{crit}$, which depends on n and α.
The p-value gives directly the probability, when $H_0$ is true, of obtaining a t-statistic at least as extreme as the one observed (in a two-sided test, at least as large in absolute value).
Equivalently, the p-value is the smallest significance level α at which $H_0$ can be rejected, i.e. the probability of a Type I error we would have to tolerate in order to reject $H_0$ on the basis of the observed t.
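A two-sided p-value can be computed from a t statistic by integrating the t density. The sketch below is an illustration using only the standard library (Simpson's rule for the integral), applied to t statistics from the Excel output of the hedonic model with df = 67 - 4 = 63. The function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p_value(t_stat, df, steps=2000):
    """Pr(|T| >= |t_stat|) under H0, with T distributed as t(df)."""
    a = abs(t_stat)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3   # Pr(-|t| <= T <= |t|), by symmetry
    return 1 - central_area

DF = 67 - 4  # observations minus estimated coefficients
for name, t in (("Intercept", 3.883532), ("nr. bedrooms", -3.19498), ("age", -5.6965)):
    print(f"{name}: t = {t}, p = {two_sided_p_value(t, DF):.3g}")
```

The results should agree with the "P-value" column of the output up to rounding. A coefficient is significant at level α whenever its p-value is below α.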

One-sided vs. two-sided test
Two-sided test:
  $H_0: \beta_2 = \beta^*$
  $H_1: \beta_2 \ne \beta^*$
One-sided tests:
  $H_0: \beta_2 = \beta^*$
  $H_1: \beta_2 < \beta^*$ [or $H_1: \beta_2 > \beta^*$]
  The sign or direction of the deviation from the null hypothesis is known from theory or experience.


One-sided vs. two-sided test
Example: critical t values at the 5% significance level, df = n-2

n       one-sided   two-sided
20      1.734       2.101
50      1.677       2.011
100     1.661       1.984
200     1.653       1.972
500     1.648       1.965
1000    1.646       1.962
2000    1.646       1.961
5000    1.645       1.960

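One-sided vs. two-sided test
Impacts of one-sided testing (at the same significance level α):
  The critical t value decreases
  It becomes easier to reject $H_0$ when the deviation is in the hypothesized direction
  The power of the test increases against alternatives in that direction; the size remains α, with the whole rejection probability placed in one tail instead of α/2 in each tail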

Comparison of the 3 approaches
Confidence intervals
  + applies to both significance tests and theoretical restrictions
  - two-sided tests only
  - fixed significance level (= 1 - confidence level)
p-value
  - significance test
  - two-sided tests only
  + any significance level can be used
t-statistic
  - significance test (but t-stats for theoretical restrictions can be computed)
  + applies to both one-sided and two-sided tests
  + any significance level can be used
  - need to find the critical value of the t-stat from statistical tables (or Excel)

Interpretation of the test
Important: if $H_0$ is accepted, it does not mean that $H_0$ has been proved to be true. The null hypothesis is assumed to be true from the start of the test; if there is not enough evidence to reject the null, it simply continues to be assumed true.
A statistical test can fail to reject $H_0$ even when $H_0$ is false. This is a matter of the statistical power of the test.

Two possible types of error

                 Accept H0        Reject H0
H0 is true       Correct          Type I error
H0 is false      Type II error    Correct

Size of a test
The probability of a type I error is called the size of the test.
It is controlled directly by setting the significance level α.
Setting α = 5% means that we tolerate a 5% risk of rejecting $H_0$ when it is in fact true.

Power of a test
The power of a test is the probability that it will correctly lead to rejection of a false null hypothesis:
power = 1 - Prob(type II error) = 1 - β
Here β denotes the probability of a type II error, not a regression coefficient.
For a given α, we would like β to be as small as possible (most powerful test).
Tradeoff: decreasing the probability of a type I error increases the probability of a type II error, and vice versa.
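The tradeoff can be made concrete under the large-sample normal approximation. The sketch below is a hedged illustration, not part of the slides: it assumes the standardized estimate $(b_2 - \beta^*)/\sigma_{b_2}$ is N(δ, 1), where δ is a hypothetical standardized deviation of the true slope from $\beta^*$. The function names are illustrative.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution

def power_two_sided(delta, alpha=0.05):
    """Pr(reject H0) when the standardized statistic is N(delta, 1)."""
    z = PHI.inv_cdf(1 - alpha / 2)
    return PHI.cdf(-z - delta) + 1 - PHI.cdf(z - delta)

def power_one_sided(delta, alpha=0.05):
    """Upper-tailed test (H1: beta_2 > beta*): reject when the statistic exceeds z."""
    z = PHI.inv_cdf(1 - alpha)
    return 1 - PHI.cdf(z - delta)

# With delta = 0 the null is true, so the rejection probability is the size alpha.
print(f"delta = 0, alpha = 5%: {power_two_sided(0.0):.3f}")

# A smaller alpha lowers the power, i.e. raises the type II error probability.
for alpha in (0.10, 0.05, 0.01):
    p = power_two_sided(2.0, alpha)
    print(f"delta = 2, alpha = {alpha:.2f}: power = {p:.3f}, Prob(type II) = {1 - p:.3f}")

# One-sided test against a deviation in the hypothesized direction.
print(f"delta = 2, one-sided, alpha = 5%: power = {power_one_sided(2.0):.3f}")
```

For a fixed δ, lowering α from 10% to 1% reduces the power, which is the tradeoff between the two error types described above.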

Next time (Mon 21 Sept)
Topic: Dummy variables