Testing and Model Selection


This is another digression on general statistics: see PE App C.8.4. The EViews output for least squares, probit and logit includes some statistics relevant to testing hypotheses (e.g. the LR statistic) and to choosing between specifications (e.g. the Akaike information criterion). Here is some probit output from the manual p. 66: the model describes how 3 explanatory variables influence the improvement in a student's grades. Similar statistics are produced when other models are fitted. What do they mean?

To fix ideas, take an elementary testing set-up: 16 independent observations y1, ..., y16 from a N(μ, 1) population, and we wish to test the null hypothesis H0 against the alternative hypothesis HA, where

H0: μ = 10    HA: μ ≠ 10.

We know that the population variance is 1. The usual procedure is to base the test on the sample mean Ȳ. The distribution of Ȳ is known to be N(μ, 1/16). If H0 is true, (Ȳ − 10)/(1/4) is N(0, 1). (The 1/4 comes from sqrt(var(Y)/n).) At significance level 5% we reject H0 if |Ȳ − 10|/(1/4) > z0.025 = 1.96. Suppose ȳ = 9.5; then |ȳ − 10| > 1.96 × 1/4 and we reject H0 at the 5% level. Alternatively we can compute the probability value corresponding to ȳ, viz. 0.0455. Most packages, including EViews, report the outcomes of significance tests in terms of p-values.
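The elementary test can be reproduced in a few lines. A minimal Python sketch (standard library only; the numbers are those of the example above):

```python
from math import erf, sqrt

# The elementary test above: H0: mu = 10 vs HA: mu != 10, with 16
# observations from N(mu, 1), so the sample mean has sd 1/4.
n, mu0, ybar = 16, 10.0, 9.5
se = 1 / sqrt(n)                               # sd of the sample mean = 1/4
z = (ybar - mu0) / se                          # z = -2
Phi = lambda t: 0.5 * (1 + erf(t / sqrt(2)))   # standard normal CDF
p_value = 2 * (1 - Phi(abs(z)))                # two-sided p-value, about 0.0455
reject = abs(z) > 1.96                         # True: reject H0 at the 5% level
print(z, round(p_value, 4), reject)
```

The p-value is the probability, under H0, of a sample mean at least as far from 10 as the one observed.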

What is going on?

1. We compare the MLE of μ (the sample mean, 9.5 here) with the hypothesised value 10 and ask whether the numbers are close, given sampling fluctuations. The variance of the MLE is 1/16. In more complicated situations this is called the Wald test, after Abraham Wald, who discussed its use in more general settings in the 1940s.

There are two other test principles used in econometrics. (It turns out that in the present case all three methods produce the same procedure. In more complicated cases the 3 methods lead to different procedures.) The other methods are best understood using a diagram of ln L(μ; y), the log likelihood function for μ.

[Figure: the log likelihood ln L(μ; y) plotted against μ, for μ between 8 and 12, with its maximum at μ̂ = 9.5.]

2. This method compares the value of the log likelihood at 9.5 (its maximum value) with its value at 10; if the difference is close to zero (taking into account sampling fluctuations) this is evidence in favour of H0. The difference in the log likelihood values is the log of the ratio of the likelihoods, and this is called the likelihood ratio test. It was introduced in the late 1920s by Jerzy Neyman and Egon Pearson.

3. The next method considers the first derivative of the log likelihood at the hypothesised value 10 and asks if it is close to zero, given sampling fluctuations. If it is, the evidence favours

H0. This procedure has two names: the score test, because the slope of the log likelihood is called the score, and the Lagrange multiplier test, because we could look at the multiplier associated with the constraint μ = 10 and reject the hypothesis that μ = 10 if the multiplier is big. The multiplier turns out to be the same as the slope! The score test comes from C. R. Rao in the 1940s and the LM test from S. D. Silvey in the 1950s.

After some algebra, all of these principles lead to the procedure given above. In more complex situations the principles lead to different test statistics.

The trinity in practice

The descriptions of the 3 methods give only the root idea of each method. Econometric models rarely have only one unknown parameter and so the methods, as implemented, are more complex. Also the estimators concerned are not usually exactly normal but only normal in large samples. Important extensions:

θ is a vector.
θ is a vector but we are interested in a function h of θ, say H0: h(θ) = 0 versus HA: h(θ) ≠ 0. In the probit model with 1 explanatory variable, θ = (β0, β1), and we may want to test whether β1 = 0. In the regression model, θ = (β0, β1, σ²), we might be interested in whether β0 = β1, i.e. in testing H0: β0 − β1 = 0.

We describe in general terms how the 3 test principles work when there is a model with parameter vector θ and we wish to test

H0: h(θ) = 0 versus HA: h(θ) ≠ 0.

We call h(θ) = 0 the restriction. The hypothesis H0 says that the restriction is satisfied. There may be more than one restriction being tested; e.g. in regression or probit with several explanatory variables we might test β2 = 0, β3 = 0.

The Wald test takes the unrestricted maximum likelihood estimate θ̂ of θ and asks whether h(θ̂) is close to 0, subject to the usual qualification about sampling variability.

The likelihood ratio test takes the unrestricted maximum likelihood estimate θ̂ and evaluates the likelihood at this value, L(θ̂). It also takes θ̂R, the restricted maximum likelihood estimate, and evaluates the likelihood at this value, L(θ̂R). The likelihood ratio is L(θ̂R)/L(θ̂). The test statistic EViews calculates as the likelihood ratio test is a transform of this, −2 ln[L(θ̂R)/L(θ̂)] = −2[ln L(θ̂R) − ln L(θ̂)].

The Lagrange multiplier test uses θ̂R, the restricted maximum likelihood estimate, and evaluates the score at this point. We ask whether the score is close to 0, with the usual qualification.

There is a large sample theory for these tests, linked to the large sample theory of maximum likelihood estimators. In large samples the estimators are approximately normally distributed and the test statistics are approximately chi-squared.

Notes on the relationship between the standard normal and the chi-squared:

If Z is N(0, 1) then Z² is a chi-squared random variable with 1 degree of freedom, written χ²(1).
If Z1 and Z2 are independent N(0, 1) then Z1² + Z2² is a chi-squared random variable with 2 degrees of freedom, written χ²(2).
If Z1, ..., Zk are independent N(0, 1) then Z1² + ... + Zk² is a chi-squared random variable with k degrees of freedom, written χ²(k).

[Figure: chi-squared probability densities for different degrees of freedom.]
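In the one-parameter N(μ, 1) example the three principles give numerically identical statistics. A minimal sketch (simulated data, recentred so that the sample mean is exactly 9.5 as in the example):

```python
import numpy as np

# The trinity on the elementary example: for N(mu, 1) with known
# variance, Wald, LR and LM all reduce to n*(ybar - mu0)^2.
rng = np.random.default_rng(0)
y = rng.standard_normal(16)
y = y - y.mean() + 9.5                       # force ybar = 9.5
n, mu0, ybar = len(y), 10.0, y.mean()

loglik = lambda mu: -0.5 * np.sum((y - mu) ** 2)   # ln L up to a constant

wald = n * (ybar - mu0) ** 2                 # distance of MLE from mu0
lr = 2 * (loglik(ybar) - loglik(mu0))        # drop in the log likelihood
lm = (np.sum(y - mu0)) ** 2 / n              # score^2 / information
print(wald, lr, lm)                          # all approximately 4.0
```

All three equal n(ȳ − μ0)² = 16 × 0.25 = 4, which is z² for the z = −2 of the elementary test, a χ²(1) value.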

If the hypothesis has 2 components, e.g. β2 = 0, β3 = 0, the test statistic (W, LR or LM) is a chi-squared random variable with 2 degrees of freedom.

The EViews output above contains the values

Log likelihood, ln L = −12.81880
Restricted log likelihood, ln L(θ̂R) = −20.59173
LR statistic (3 df), 2[ln L − ln L(θ̂R)] = 15.54585
Probability(LR stat) = 0.001405

The prob value is based on the upper tail of the χ²(3) distribution: the null hypothesis is β1 = β2 = β3 = 0, i.e. the hypothesis that the explanatory variables do not influence the probability of an improvement.

ML and Model Selection Criteria

One way to choose between different specifications (e.g. between the probit and logit models) is to use a model selection criterion. Several appear in EViews. All are associated with maximum likelihood. The best known is the Akaike information criterion (AIC). The AIC is an adjusted log likelihood value, adjusted for the number of parameters in the model. It is often written

AIC = −(2/n)(ln L − p)

where n is the number of observations and p is the number of estimated parameters, the number of elements in θ.
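The LR statistic and its p-value can be checked directly. A sketch using the two log likelihood values from the output above, together with the closed-form χ²(3) upper-tail probability (which for 3 degrees of freedom is P(X > x) = erfc(√(x/2)) + √(2x/π) e^{−x/2}):

```python
from math import erfc, exp, pi, sqrt

# Recompute the EViews LR statistic from the two log likelihoods
# quoted in the probit output, and its chi-squared(3) p-value.
lnL, lnL_R = -12.81880, -20.59173
lr_stat = 2 * (lnL - lnL_R)                  # about 15.546

# Closed-form upper tail of the chi-squared distribution with 3 df.
p_value = erfc(sqrt(lr_stat / 2)) + sqrt(2 * lr_stat / pi) * exp(-lr_stat / 2)
print(round(lr_stat, 5), round(p_value, 6))  # 15.54586 0.001405
```

The tiny p-value says the data are very unlikely under the hypothesis that none of the explanatory variables matters, so that hypothesis is rejected.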

For a given data set the AIC is calculated for all the models under consideration and the model with the smallest AIC is chosen. (n is the same across models.) In the probit/logit models of modal split there are 2 parameters, β0 and β1, and so p = 2. Here, with an equal number of parameters, the choice between the models would depend on the value of the maximised log likelihood ln L. There are other criteria based on different penalties for the number of parameters. Instead of the fixed p penalty of the AIC, the penalty may also depend on the sample size, n. The Schwarz criterion has (1/2) p ln n instead of p. The Hannan-Quinn criterion has p ln ln n. These criteria were introduced because they have a consistency property. It is reasonable to require that, as the number of observations tends to infinity, the probability of choosing the right model should tend to unity. The AIC (and the R² and adjusted R² from regression) does not have this consistency property. Schwarz and Hannan-Quinn do.
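The three criteria are easy to compute from the maximised log likelihood. A sketch with the formulas as written above; the log likelihood values below are made up for illustration, not taken from any output in the text:

```python
from math import log

# AIC = -(2/n)(ln L - p); Schwarz replaces the penalty p by
# (1/2) p ln n, and Hannan-Quinn replaces it by p ln(ln n).
def aic(lnL, p, n): return -(2 / n) * (lnL - p)
def sic(lnL, p, n): return -(2 / n) * (lnL - 0.5 * p * log(n))
def hq(lnL, p, n):  return -(2 / n) * (lnL - p * log(log(n)))

# Two hypothetical fits to the same data (same n, same p = 2):
n, p = 100, 2
candidates = {"probit": -40.0, "logit": -41.5}   # made-up maximised ln L
best = min(candidates, key=lambda m: aic(candidates[m], p, n))
print(best)   # with equal p, the larger ln L gives the smaller AIC
```

With p equal across models the ranking depends only on ln L, exactly as noted above; the criteria differ only when the models have different numbers of parameters.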

Count Data Poisson Regression

PE ch 16.6 and EViews p. 658ff. The probit and logit models are produced by allowing the probability in the Bernoulli model to vary across individuals as x varies. In the same way, regressors can be hung on the parameters of the other distributions of first year Statistics: the Poisson and the exponential.

Poisson or count data models focus on the number of occurrences of an event. Here the outcome variable is y = 0, 1, 2, 3, ... Examples include:

The number of visits to a doctor a person makes during a year.
The number of children in a household.
The number of televisions in a household.

We have a sample of persons or households which differ in ways (age, health, income, ...) that affect the probability distribution of the number of visits, children or TVs. The probability distribution used is the Poisson. From the first year, if Y is a Poisson random variable, then its probability function is given by

P(Y = y) = p(y) = λ^y e^(−λ)/y!,  y = 0, 1, 2, ...

This probability function has one parameter, λ, which is the mean (and variance) of Y. Suppose that units in our sample vary in accordance with the value of some variable x. The simplest specification of a relationship between λi and xi would be λi = β0 + β1 xi. However, the right hand side might be negative, which is unacceptable. A simple way of guaranteeing that it is positive is to specify

λi = e^(β0 + β1 xi).

This is the Poisson regression model. The parameters can be estimated by maximum likelihood, i.e. by maximising

L(β0, β1; y) = ∏i λi^(yi) e^(−λi)/yi!  where λi = e^(β0 + β1 xi).
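The maximisation is straightforward numerically. A sketch on simulated data (the data and variable names are ours, not from the text), fitting the Poisson regression by Newton-Raphson on the log likelihood:

```python
import numpy as np

# Poisson regression lambda_i = exp(b0 + b1*x_i) fitted by maximum
# likelihood via Newton-Raphson.  Simulated data for illustration.
rng = np.random.default_rng(42)
x = rng.uniform(0, 2, size=200)
y = rng.poisson(np.exp(0.5 + 0.8 * x))        # true (b0, b1) = (0.5, 0.8)

X = np.column_stack([np.ones_like(x), x])
b = np.array([np.log(y.mean()), 0.0])         # crude starting value
for _ in range(25):
    lam = np.exp(X @ b)
    score = X.T @ (y - lam)                   # gradient of ln L
    info = X.T @ (lam[:, None] * X)           # Fisher information
    b = b + np.linalg.solve(info, score)      # Newton step

print(b)   # estimates close to (0.5, 0.8)
```

At the maximum the score X'(y − λ) is zero, the sample analogue of the first-order condition for the likelihood above.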

Another 1st year distribution is the exponential distribution with density

f(y) = λ e^(−λy),  for y > 0 and λ > 0.

The expected value is E(Y) = 1/λ.

[Figure: exponential densities for large and small λ.]

This is the simplest model used for durations or waiting times. Suppose we have a cross-section of individuals who have been unemployed, and there is data on their personal characteristics (age, qualifications, etc.) and how long they were unemployed. Their experience could be modelled using the exponential with the parameter varying across individuals according to

λi = e^(β0 + β1 xi).

(Again λi > 0.) A lot of empirical labour economics is based on this model and extensions of it. EViews does not cover these duration models and we won't consider any empirical examples.
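The relation E(Y) = 1/λ also gives the ML estimator in the simplest case. With a constant λ the log likelihood is ln L = n ln λ − λ Σ yi, maximised at λ̂ = 1/ȳ. A sketch on simulated durations (names and data are ours, for illustration only):

```python
import numpy as np

# Exponential model with constant lambda: the MLE is 1/ybar, since
# d/d(lambda) [n*ln(lambda) - lambda*sum(y)] = n/lambda - sum(y) = 0.
rng = np.random.default_rng(1)
durations = rng.exponential(scale=2.0, size=1000)   # true lambda = 0.5
lam_hat = 1.0 / durations.mean()
print(lam_hat)   # close to 0.5, illustrating E(Y) = 1/lambda
```

In the regression version, λi = e^(β0 + β1 xi) replaces the constant λ and the same log likelihood is maximised over (β0, β1) numerically.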