Properties and Hypothesis Testing


Chapter 3 Properties and Hypothesis Testing

3.1 Types of data

The regression techniques developed in previous chapters can be applied to three different kinds of data:

1. Cross-sectional data.
2. Time series data.
3. Panel data.

The first consists of observing various economic units (e.g. firms, countries, households, individuals) at one point in time. For example, we observe the wages, experience and education of many individuals, only once and all at the same time. The second consists of observing the same economic unit at different points in time. For example, we observe daily stock prices over many years. Finally, the third combines the characteristics of the first and the second. That is, we observe various economic units at repeated points in time. For example, we have information about the inflation, unemployment and GDP of a group of countries over many years.

3.2 Assumptions of the model

When the regressors in our econometric model are nonstochastic, we will make the following six assumptions.

1. The model is linear in the parameters and it is correctly specified. Equation 3.1 is linear in $\beta$, while Equation 3.2 is not.

$$Y = \beta_1 + \beta_2 X + u \tag{3.1}$$

$$Y = \beta_1 X^{\beta_2} + u \tag{3.2}$$

2. There is some variation in the regressor in the sample. We need variation in the variable $X$ to identify the relationship. Consider the OLS estimator for $\beta_2$:

$$b_2 = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sum_{i=1}^{n}(X_i-\bar{X})^2}. \tag{3.3}$$

If there is no variation in $X$, then the denominator is zero and we cannot obtain $b_2$.

3. The expected value of the disturbance term is zero.

$$E(u_i)=0 \quad \text{for all } i. \tag{3.4}$$

Some $u_i$ will be negative, some will be positive, but on average they will be zero. If a constant is included in the model, the condition is satisfied automatically, because any nonzero mean of the disturbance is absorbed by the constant $\beta_1$.

4. The disturbance term is homoscedastic. Homoscedasticity means that the variance of the error terms $u_i$ is constant across all observations $i$. Hence, we can write:

$$\sigma^2_{u_i} = \sigma^2_u \quad \text{for all } i. \tag{3.5}$$

Because the error term has zero mean (from assumption 3), the population variance of $u_i$ is equal to:

$$E(u_i^2)=\sigma^2_u \quad \text{for all } i. \tag{3.6}$$

$\sigma^2_u$ is a population parameter; it is therefore unknown and needs to be estimated.

5. The values of the disturbance terms have independent distributions.

$$u_i \text{ is distributed independently of } u_j \quad \text{for all } j \neq i. \tag{3.7}$$

This means that there is no autocorrelation in the error term, so the population covariance between $u_i$ and $u_j$ is zero:

$$\sigma_{u_i u_j} = 0. \tag{3.8}$$

With assumptions 1 through 5, we say that the OLS coefficients are BLUE: Best Linear Unbiased Estimators. They are best because they have the smallest variance among all linear unbiased estimators.

6. The disturbance term has a normal distribution.

$$u_i \sim N(0,\sigma^2_u) \quad \text{for all } i. \tag{3.9}$$

The error term is normally distributed with mean zero and variance $\sigma^2_u$. This assumption becomes useful when performing t tests and F tests and when constructing confidence intervals for $\beta_1$ and $\beta_2$ using the regression results. The justification for this assumption depends on the central limit theorem, which states that if a random variable is the composite result of the effects of a large number of other random variables (that are not necessarily normal), it will have an approximately normal distribution.

3.3 Unbiasedness of the coefficients

Recall that an estimator $\hat{\theta}$ is unbiased if $E(\hat{\theta})=\theta$; that is, the expected value of the estimator is equal to the true population parameter. For the slope coefficient in the OLS regression we have:

$$b_2 = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sum_{i=1}^{n}(X_i-\bar{X})^2} \tag{3.10}$$

$$b_2 = \beta_2 + \frac{\sum_{i=1}^{n}(X_i-\bar{X})u_i}{\sum_{i=1}^{n}(X_i-\bar{X})^2} = \beta_2 + \sum_{i=1}^{n} a_i u_i, \quad \text{where } a_i = \frac{X_i-\bar{X}}{\sum_{j=1}^{n}(X_j-\bar{X})^2}. \tag{3.11}$$

The second expression follows from substituting $Y_i-\bar{Y} = \beta_2(X_i-\bar{X}) + (u_i-\bar{u})$ and noting that $\sum_{i=1}^{n}(X_i-\bar{X})\bar{u} = 0$. This shows that $b_2$ is equal to its true value, $\beta_2$, plus a linear combination of the values of the error terms. If we take expectations of $b_2$ we have:

$$E(b_2)=E(\beta_2)+E\left(\sum_{i=1}^{n} a_i u_i\right) = \beta_2 + \sum_{i=1}^{n} E(a_i u_i)=\beta_2 + \sum_{i=1}^{n} a_i E(u_i)=\beta_2. \tag{3.12}$$

The term $a_i$ goes out of the expectation because $a_i$ is only a function of the constant $X$s. In addition, the last equality holds because $E(u_i)=0$. Hence, $b_2$ is an unbiased estimator of $\beta_2$: $E(b_2)=\beta_2$.

3.4 Precision of the coefficients

We are also interested in how precise $b_1$ and $b_2$ are in estimating the population parameters $\beta_1$ and $\beta_2$. A measure of this precision is given by their population variances:

$$\sigma^2_{b_1} = \sigma^2_u\left(\frac{1}{n} + \frac{\bar{X}^2}{\sum_{i=1}^{n}(X_i-\bar{X})^2}\right), \text{ and} \tag{3.13}$$

$$\sigma^2_{b_2} = \frac{\sigma^2_u}{\sum_{i=1}^{n}(X_i-\bar{X})^2}. \tag{3.14}$$
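The unbiasedness result in Equation 3.12 and the variance formula in Equation 3.14 can be illustrated with a small Monte Carlo experiment. The sketch below is only an illustration under stated assumptions: the regressor values, the parameters `beta1`, `beta2` and `sigma_u`, and the number of replications are all hypothetical choices, not values from the text. It holds $X$ fixed across replications (nonstochastic regressors), draws normal disturbances, and compares the average and the variance of the simulated slopes with $\beta_2$ and $\sigma^2_u / \sum (X_i-\bar{X})^2$.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope estimate b2 as in Equation 3.10."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def simulate_slopes(x, beta1, beta2, sigma_u, replications, seed=0):
    """Re-draw the disturbances many times while keeping X fixed."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(replications):
        y = [beta1 + beta2 * xi + rng.gauss(0.0, sigma_u) for xi in x]
        slopes.append(ols_slope(x, y))
    return slopes


# Hypothetical design: 20 fixed regressor values and assumed true parameters.
x = [float(i) for i in range(1, 21)]
beta1, beta2, sigma_u = 2.0, 0.5, 1.5

slopes = simulate_slopes(x, beta1, beta2, sigma_u, replications=5000)

x_bar = statistics.fmean(x)
sxx = sum((xi - x_bar) ** 2 for xi in x)
theoretical_var = sigma_u ** 2 / sxx          # Equation 3.14

mean_slope = statistics.fmean(slopes)         # should be close to beta2
empirical_var = statistics.variance(slopes)   # should be close to theoretical_var

print(f"mean of b2: {mean_slope:.4f} (true beta2 = {beta2})")
print(f"variance of b2: {empirical_var:.6f} (formula: {theoretical_var:.6f})")
```

With more replications the simulated mean and variance should move closer to the theoretical values; any single run will differ slightly because of simulation noise.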

One concern in the implementation of the above formulas is that $\sigma^2_u$ is an unknown population parameter and needs to be estimated. A natural estimator for this regression variance is the variance of the regression errors. Because the population regression errors $u_i$ are also unknown, we use their sample counterparts, the residuals $e_i$, and adjust for the corresponding degrees of freedom. Hence, we have:

$$S^2_u = \frac{1}{n-2}\sum_{i=1}^{n} e_i^2. \tag{3.15}$$

This $S^2_u$ is the unbiased estimator of $\sigma^2_u$, and $n-2$ are the degrees of freedom. We subtract two from the sample size because we are estimating two parameters: the regression constant and one slope coefficient. Then, we use the following formulas to estimate the standard errors of $b_1$ and $b_2$:

$$S_{b_1} = \sqrt{S^2_u\left(\frac{1}{n} + \frac{\bar{X}^2}{\sum_{i=1}^{n}(X_i-\bar{X})^2}\right)}, \text{ and} \tag{3.16}$$

$$S_{b_2} = \sqrt{\frac{S^2_u}{\sum_{i=1}^{n}(X_i-\bar{X})^2}}. \tag{3.17}$$

3.5 The Gauss-Markov theorem

The Gauss-Markov theorem states that when assumptions 1 through 5 above are satisfied, the OLS estimators are the Best Linear Unbiased Estimators (BLUE) of the regression parameters. Best refers to smallest variance.

3.6 Hypothesis testing

Hypothesis testing is a method of making decisions using data. It starts with the formulation of the null and the alternative hypotheses and then uses a test statistic to assess whether the data are consistent with the null hypothesis.

3.6.1 Formulation of the null hypothesis

The formulation of the null hypothesis starts with a relationship in mind. For example, that the percentage rate of price inflation ($p$) depends on the percentage rate of wage inflation ($w$) following the linear equation:

$$p_i = \beta_1 + \beta_2 w_i + u_i \tag{3.18}$$

Then, you want to test the hypothesis that the price inflation is equal to the wage inflation. This is denoted by $H_0$ and it is known as the null hypothesis. In addition, we also define an alternative hypothesis, denoted by $H_1$, which represents the conclusion of the test if the null hypothesis is rejected. For our example the null and the alternative hypotheses are written as:

$$H_0: \beta_2 = 1 \tag{3.19}$$

$$H_1: \beta_2 \neq 1 \tag{3.20}$$

In general, the null and alternative hypotheses are:

$$H_0: \beta_2 = \beta_2^0 \tag{3.21}$$

$$H_1: \beta_2 \neq \beta_2^0. \tag{3.22}$$

3.6.2 t-tests

Recall that $\beta_2$ is unknown and that we have to use the estimate $b_2$. Then, the decision rule to reject the null hypothesis should compare the estimate $b_2$ with the hypothesized value $\beta_2^0$. Intuitively, if the values are far apart, then there is evidence against the null. This comparison should take into account the fact that $b_2$ is subject to some sampling variation (it is not the actual $\beta_2$). We will use the following statistic:

$$z = \frac{b_2 - \beta_2^0}{\sigma_{b_2}} \tag{3.23}$$

The numerator is just the distance between the regression estimate and the hypothesized value, while the denominator is the standard deviation of $b_2$, given by the square root of the expression in Equation 3.14. $z$ is the number of standard deviations between $b_2$ and $\beta_2^0$. For a known $\sigma_{b_2}$, this statistic follows a standard normal distribution under the null hypothesis. However, $\sigma_{b_2}$ is unknown and we need to use the estimate of the standard error of $b_2$. This estimate is $S_{b_2}$, presented in Equation 3.17. Then we use the following t-statistic:

$$t = \frac{b_2 - \beta_2^0}{S_{b_2}} \tag{3.24}$$

To know if the deviations between $b_2$ and $\beta_2^0$ are significantly large, we compare this t-statistic with the critical values from the table of the t distribution with $n-2$ degrees of freedom. The null hypothesis is not rejected if the following condition is met:

$$-t_{n-2,\alpha/2} \leq \frac{b_2 - \beta_2^0}{S_{b_2}} \leq t_{n-2,\alpha/2} \tag{3.25}$$

Here $t_{n-2,\alpha/2}$ is the notation for the critical value that comes from the t distribution with $n-2$ degrees of freedom at significance level $\alpha$. The significance level is the probability that we reject the null hypothesis when in fact it is true. The rejection regions are illustrated in Figure 3.1.

Fig. 3.1 Acceptance region for the t-test.

3.6.3 Confidence intervals

The confidence interval indicates the reliability of an estimate. The confidence interval for the population parameter $\beta_2$ can be derived from Equation 3.25 in the following way:

$$1-\alpha = P\left(-t_{n-2,\alpha/2} \leq \frac{b_2 - \beta_2}{S_{b_2}} \leq t_{n-2,\alpha/2}\right)$$

$$1-\alpha = P\left(-t_{n-2,\alpha/2}\, S_{b_2} \leq b_2 - \beta_2 \leq t_{n-2,\alpha/2}\, S_{b_2}\right)$$

$$1-\alpha = P\left(b_2 - t_{n-2,\alpha/2}\, S_{b_2} \leq \beta_2 \leq b_2 + t_{n-2,\alpha/2}\, S_{b_2}\right) \tag{3.26}$$

The meaning of the above equation is that, in repeated sampling, the interval between the lower confidence limit $b_2 - t_{n-2,\alpha/2}\, S_{b_2}$ and the upper confidence limit $b_2 + t_{n-2,\alpha/2}\, S_{b_2}$ contains the population parameter $\beta_2$ with probability $(1-\alpha)$, or $100(1-\alpha)\%$.

The p values provide an alternative approach to reporting the significance of regression coefficients or to carrying out more general hypothesis testing. As you can see from Equation 3.25 and Figure 3.1, different significance levels $\alpha$ can yield different conclusions about whether or not the null hypothesis is rejected. The p value of a hypothesis test represents the minimum significance level at which the null is rejected. Then, when the p value is below the significance level $\alpha$ we reject the null.
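The t-test of Equation 3.25 and the confidence interval of Equation 3.26 can be sketched in a few lines of code. This is a minimal illustration, not the text's own procedure: the wage and price inflation figures are made-up data, the function names are hypothetical, and the critical value 2.228 is the tabulated two-sided 5% value of the t distribution with 10 degrees of freedom, which matches the assumed sample of 12 observations. The null hypothesis is $H_0: \beta_2 = 1$, as in Equation 3.19.

```python
import math
import statistics


def fit_simple_ols(x, y):
    """Return b1, b2 and the standard error of b2 (Equations 3.15 and 3.17)."""
    n = len(x)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b1 = y_bar - b2 * x_bar
    rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    s_u2 = rss / (n - 2)              # unbiased estimator of the error variance
    se_b2 = math.sqrt(s_u2 / sxx)     # standard error of the slope
    return b1, b2, se_b2


def t_test_slope(b2, se_b2, beta2_null, t_crit):
    """Two-sided t-test (Equation 3.25) and confidence interval (Equation 3.26)."""
    t_stat = (b2 - beta2_null) / se_b2
    reject = abs(t_stat) > t_crit
    ci = (b2 - t_crit * se_b2, b2 + t_crit * se_b2)
    return t_stat, reject, ci


# Made-up wage inflation (w) and price inflation (p) rates, in percent.
w = [2.0, 3.1, 4.5, 5.2, 6.0, 7.4, 8.1, 9.3, 10.2, 11.0, 12.5, 13.1]
p = [1.8, 3.5, 4.1, 5.6, 5.7, 7.9, 7.8, 9.6, 10.5, 10.6, 12.9, 13.0]

beta2_null = 1.0   # H0: beta2 = 1
t_crit = 2.228     # tabulated t value for 10 degrees of freedom, alpha = 0.05

b1, b2, se_b2 = fit_simple_ols(w, p)
t_stat, reject, ci = t_test_slope(b2, se_b2, beta2_null, t_crit)

print(f"b2 = {b2:.3f}, S_b2 = {se_b2:.3f}, t = {t_stat:.3f}")
print(f"95% confidence interval for beta2: ({ci[0]:.3f}, {ci[1]:.3f})")
print("reject H0" if reject else "do not reject H0")
```

The test and the interval are two views of the same calculation: the null is rejected at level $\alpha$ exactly when the hypothesized value falls outside the $100(1-\alpha)\%$ confidence interval.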

Fig. 3.2 Confidence interval for $\beta_2$.

3.6.4 F test

A useful tool for testing whether there is no relationship between $X$ and $Y$ is the F test. In the simple linear regression model with only one slope coefficient, the null and the alternative in an F test are:

$$H_0: \beta_2 = 0 \tag{3.27}$$

$$H_1: \beta_2 \neq 0. \tag{3.28}$$

This test is built on the idea of testing how good the regression model is at explaining the variation in $Y$. In Equation 2.15 we already separated the variation of $Y$ into its explained and unexplained components. These are:

$$\sum (Y_i-\bar{Y})^2 = \sum (\hat{Y}_i-\bar{Y})^2 + \sum (Y_i-\hat{Y}_i)^2 \tag{3.29}$$

$$TSS = ESS + RSS. \tag{3.30}$$

The total sum of squares (TSS) is the sum of the explained sum of squares (ESS) and the residual sum of squares (RSS). Then, the F statistic for goodness of fit of a regression is written as the explained sum of squares, per explanatory variable, divided by the residual sum of squares, per remaining degree of freedom:

$$F = \frac{ESS/(k-1)}{RSS/(n-k)} \tag{3.31}$$

where $k$ is the total number of coefficients we are estimating, hence $(k-1)$ is the number of slope coefficients. That is, the total number of parameters we are estimating minus the constant parameter. If we divide the numerator and the denominator by $TSS$, then the F statistic can be written in terms of the $R^2$ as follows:

$$F = \frac{(ESS/TSS)/(k-1)}{(RSS/TSS)/(n-k)} = \frac{R^2/(k-1)}{(1-R^2)/(n-k)} \tag{3.32}$$

If this F statistic is greater than the critical value from the table of the F distribution with $(k-1)$ and $(n-k)$ degrees of freedom, $F_{k-1,n-k}$, we reject the null hypothesis and conclude that the regression model does significantly explain the variation in variable $Y$. For the simple regression model with only one slope coefficient, $k = 2$, we have:

$$F = \frac{R^2}{(1-R^2)/(n-2)}. \tag{3.33}$$

If this F statistic is greater than $F_{1,n-2}$ we reject the null hypothesis presented in Equation 3.27.

Fig. 3.3 Regression output in MS Excel.
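The equivalence between the sum-of-squares form of the F statistic (Equation 3.31) and the $R^2$ form (Equation 3.32) can be checked numerically. The sketch below is an illustration on made-up experience and wage data; the function name and the data are hypothetical, and the critical value 4.96 is the tabulated 5% value of the F distribution with 1 and 10 degrees of freedom for the assumed sample of 12 observations.

```python
import math
import statistics


def sums_of_squares(x, y):
    """Return TSS, ESS and RSS for a simple regression with a constant."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b1 = y_bar - b2 * x_bar
    fitted = [b1 + b2 * xi for xi in x]
    tss = sum((yi - y_bar) ** 2 for yi in y)
    ess = sum((fi - y_bar) ** 2 for fi in fitted)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return tss, ess, rss


# Made-up years of experience and hourly wages.
exper = [1, 3, 4, 6, 8, 9, 11, 14, 15, 18, 20, 22]
wage = [4.2, 5.1, 4.8, 5.9, 5.5, 6.4, 6.1, 7.3, 6.8, 7.9, 7.4, 8.6]

n, k = len(exper), 2   # k counts the constant and one slope coefficient
tss, ess, rss = sums_of_squares(exper, wage)

f_from_ss = (ess / (k - 1)) / (rss / (n - k))        # Equation 3.31
r2 = ess / tss
f_from_r2 = (r2 / (k - 1)) / ((1 - r2) / (n - k))    # Equation 3.32

f_crit = 4.96          # tabulated F value for (1, 10) degrees of freedom, alpha = 0.05
reject = f_from_ss > f_crit

print(f"TSS = {tss:.3f}, ESS = {ess:.3f}, RSS = {rss:.3f}, R2 = {r2:.3f}")
print(f"F (sums of squares) = {f_from_ss:.3f}, F (R2) = {f_from_r2:.3f}")
print("reject H0: beta2 = 0" if reject else "do not reject H0: beta2 = 0")
```

The two F values agree up to floating-point error because Equation 3.32 is only Equation 3.31 with the numerator and denominator divided by TSS.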

3.7 Computer output

The computer regression output is very similar across different statistical packages. Figure 3.3 shows the output using MS Excel for the estimation of the following simple regression model:

$$wage_i = \beta_1 + \beta_2\, exper_i + u_i \tag{3.34}$$

To obtain the estimated regression coefficients we use Equations 2.4 and 2.5:

$$b_2 = \frac{\sum (X_i-\bar{X})(Y_i-\bar{Y})}{\sum (X_i-\bar{X})^2} = 0.091 \tag{3.35}$$

$$b_1 = \bar{Y} - b_2\bar{X} = 4.642 \tag{3.36}$$

The total sum of squares, explained sum of squares, and residual sum of squares are obtained using Equation 2.15:

$$TSS = \sum (Y_i-\bar{Y})^2 = 27347.439 \tag{3.37}$$

$$ESS = \sum (\hat{Y}_i-\bar{Y})^2 = 1505.539 \tag{3.38}$$

$$RSS = \sum (Y_i-\hat{Y}_i)^2 = 25841.901 \tag{3.39}$$

The regression $R^2$ comes from Equation 2.18:

$$R^2 = 1 - \frac{\sum e_i^2}{\sum (Y_i-\bar{Y})^2} = 0.055 \tag{3.40}$$

From the square root of Equation 3.15:

$$S_u = \sqrt{\frac{1}{n-2}\sum e_i^2} = 4.532 \tag{3.41}$$

Then, the standard errors of the coefficients are computed using Equations 3.16 and 3.17:

$$S_{b_1} = \sqrt{S^2_u\left(\frac{1}{n} + \frac{\bar{X}^2}{\sum_{i=1}^{n}(X_i-\bar{X})^2}\right)} = 0.233 \tag{3.42}$$

$$S_{b_2} = \sqrt{\frac{S^2_u}{\sum_{i=1}^{n}(X_i-\bar{X})^2}} = 0.011 \tag{3.43}$$

The F statistic uses Equation 3.32:

$$F = \frac{R^2/(k-1)}{(1-R^2)/(n-k)} = 73.291 \tag{3.44}$$

The t statistics use Equation 3.24, with a hypothesized value of zero for each coefficient:

$$t = \frac{b_1}{S_{b_1}} = 19.961 \tag{3.45}$$

$$t = \frac{b_2}{S_{b_2}} = 8.561 \tag{3.46}$$

Finally, for the 95% lower and upper confidence limits, we use Equation 3.26:

$$b_1 - t_{n-2,\alpha/2}\, S_{b_1} = 4.186 \tag{3.47}$$

$$b_1 + t_{n-2,\alpha/2}\, S_{b_1} = 5.099 \tag{3.48}$$

$$b_2 - t_{n-2,\alpha/2}\, S_{b_2} = 0.071 \tag{3.49}$$

$$b_2 + t_{n-2,\alpha/2}\, S_{b_2} = 0.112 \tag{3.50}$$
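The reported figures can be cross-checked against one another using only the numbers in Equations 3.35 to 3.50. The sketch below is a rough consistency check under two assumptions that are not stated in the text: the degrees of freedom $n-2$ are inferred from $S^2_u = RSS/(n-2)$, since the sample size is not reported, and the t critical value is approximated by the standard normal quantile, which is reasonable only because the inferred sample is large. It also uses the fact that, in a simple regression, the F statistic equals the square of the slope's t statistic.

```python
import math
from statistics import NormalDist

# Values as reported (rounded to three decimals) in Equations 3.35 to 3.43.
b1, b2 = 4.642, 0.091
tss, ess, rss = 27347.439, 1505.539, 25841.901
s_u = 4.532
se_b1, se_b2 = 0.233, 0.011
k = 2

r2 = ess / tss                               # Equation 3.40

# Assumption: n - 2 is not reported, so infer it from S_u^2 = RSS / (n - 2).
df = round(rss / s_u ** 2)

f_stat = (r2 / (k - 1)) / ((1 - r2) / df)    # Equation 3.44
t_slope_from_f = math.sqrt(f_stat)           # F = t^2 in a simple regression

# Assumption: large-sample normal approximation to the t critical value.
crit = NormalDist().inv_cdf(0.975)
ci_b1 = (b1 - crit * se_b1, b1 + crit * se_b1)   # Equations 3.47 and 3.48
ci_b2 = (b2 - crit * se_b2, b2 + crit * se_b2)   # Equations 3.49 and 3.50

print(f"R2 = {r2:.3f}, inferred n - 2 = {df}")
print(f"F = {f_stat:.3f}, sqrt(F) = {t_slope_from_f:.3f}")
print(f"95% CI for beta1: ({ci_b1[0]:.3f}, {ci_b1[1]:.3f})")
print(f"95% CI for beta2: ({ci_b2[0]:.3f}, {ci_b2[1]:.3f})")
```

Recomputed values agree with the reported ones only up to rounding, because the inputs are themselves rounded to three decimals. For example, dividing the rounded $b_2$ by the rounded $S_{b_2}$ gives a less accurate t statistic than taking the square root of $F$.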