Testing and Model Selection

This is another digression on general statistics: see PE App C.8.4. The EViews output for least squares, probit and logit includes some statistics relevant to testing hypotheses (e.g. the LR statistic) and to choosing between specifications (e.g. the Akaike information criterion). Here is some probit output from the manual p. 66: the model describes how 3 explanatory variables influence the improvement in a student's grades. Similar statistics are produced when other models are fitted. What do they mean?

To fix ideas, take an elementary testing set-up: 16 independent observations y1, ..., y16 from a N(μ, 1) population, and we wish to test the null hypothesis H0 against the alternative hypothesis HA, where

H0: μ = 10    HA: μ ≠ 10.

We know that the population variance is 1. The usual procedure is to base the test on the sample mean Ȳ. The distribution of Ȳ is known to be N(μ, 1/16). If H0 is true,

(Ȳ − 10)/(1/4)  is N(0, 1).

(The 1/4 comes from √(var(Y)/n) = √(1/16).) At significance level 5% we reject H0 if

|Ȳ − 10|/(1/4) > z0.025 = 1.96.

Suppose ȳ = 9.5; then |ȳ − 10|/(1/4) = 2 > 1.96 and we reject H0 at the 5% level. Alternatively we can compute the probability value corresponding to the observed statistic, viz. 0.0455. Most packages including EViews report the outcomes of significance tests in terms of p-values.
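The arithmetic of this example can be checked numerically. A minimal sketch in Python (the numbers are those of the example above; the standard normal CDF is built from the error function):

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, sigma2 = 16, 1.0          # 16 observations, known variance 1
ybar, mu0 = 9.5, 10.0        # sample mean and hypothesised mean
se = math.sqrt(sigma2 / n)   # standard error of the mean = 1/4

z = (ybar - mu0) / se                       # observed test statistic: -2.0
p_value = 2.0 * (1.0 - normal_cdf(abs(z)))  # two-sided p-value

print(z)                   # -2.0
print(round(p_value, 4))   # 0.0455
```

Since |z| = 2 > 1.96, the 5% test rejects, and the p-value 0.0455 is just below 0.05, as it must be.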
What is going on?

1. We compare the MLE of μ, the sample mean 9.5 here, with the hypothesised value 10 and ask whether the numbers are close, given sampling fluctuations. The variance of the MLE is 1/16. In more complicated situations this is called the Wald test, after Abraham Wald, who discussed its use in more general settings in the 1940s.

There are two other test principles used in econometrics. (It turns out that in the present case all methods produce the same procedure. In more complicated cases the 3 methods lead to different procedures.) The other methods are best understood using a diagram of ln L(μ; y), the log likelihood function for μ.

[Figure: log likelihood ln L(μ; y) plotted against μ for 8 ≤ μ ≤ 11, with its maximum at μ̂ = 9.5.]

2. This method compares the value of the log likelihood at 9.5 (its maximum value) with its value at 10; if the difference is close to zero (taking into account sampling fluctuations) this is evidence in favour of H0. The difference in the log likelihood values is the log of the ratio of the likelihoods, and this is called the likelihood ratio test. It was introduced in the late 1920s by Jerzy Neyman and Egon Pearson.

3. The next method considers the first derivative of the log likelihood at the hypothesised value 10 and asks if it is close to zero given sampling fluctuations. If it is, the evidence favours
H0. This procedure has two names: the score test, because the slope of the log likelihood is called the score, and the Lagrange multiplier test, because we could look at the multiplier associated with the constraint μ = 10 and reject the hypothesis that μ = 10 if the multiplier is big. The multiplier turns out to be the same as the slope! The score test comes from C. R. Rao in the 1940s and the LM test from S. D. Silvey in the 1950s.

After some algebra all of these principles lead to the procedure given above. In more complex situations the principles lead to different test statistics.

The trinity in practice

The descriptions of the 3 methods give only the root idea of each method. Econometric models rarely have only one unknown parameter and so the methods, as implemented, are more complex. Also the estimators concerned are not usually exactly normal but only normal in large samples.

Important extensions: θ is a vector, and we are interested in a function h of θ, say

H0: h(θ) = 0  versus  HA: h(θ) ≠ 0.

In the probit model with 1 explanatory variable (parameters β0, β1) we may want to test whether β1 = 0. In the regression model (parameters β0, β1, β2) we might be interested in whether β1 = β2, i.e. in testing H0: β1 − β2 = 0.

We describe in general terms how the 3 test principles work when there is a model with parameter vector θ and we wish to test
H0: h(θ) = 0 versus HA: h(θ) ≠ 0. We call h(θ) = 0 the restriction. The hypothesis H0 says that the restriction is satisfied. There may be more than one restriction being tested, e.g. in regression or probit with several explanatory variables we might test β2 = 0, β3 = 0.

The Wald test takes the unrestricted maximum likelihood estimate θ̂ and asks whether h(θ̂) is close to 0, subject to the usual qualification about sampling variability.

The likelihood ratio test takes the unrestricted maximum likelihood estimate θ̂ and evaluates the likelihood at this value, L(θ̂). It also takes θ̂R, the restricted maximum likelihood estimate, and evaluates the likelihood at this value, L(θ̂R). The likelihood ratio is λ = L(θ̂R)/L(θ̂). The test statistic EViews calculates as the likelihood ratio test is a transform of this,

−2 ln λ = −2[ln L(θ̂R) − ln L(θ̂)] = 2[ln L(θ̂) − ln L(θ̂R)].

The Lagrange multiplier test uses θ̂R, the restricted maximum likelihood estimate, and evaluates the score at this point. We ask whether the score is close to 0, with the usual qualification.

There is a large sample theory for these tests linked to the large sample theory of maximum likelihood estimators. In large samples the estimators are approximately normally distributed and the test statistics are approximately chi-squared.

Notes on the relationship between the standard normal and χ²:

if Z is N(0, 1) then Z² is a chi-squared random variable with 1 degree of freedom, written χ²(1).
if Z1 and Z2 are independent N(0, 1) then Z1² + Z2² is a chi-squared random variable with 2 degrees of freedom, written χ²(2).
if Z1, ..., Zk are independent N(0, 1) then Z1² + ... + Zk² is a chi-squared random variable with k degrees of freedom, written χ²(k).

[Figure: χ² probability densities for several different degrees of freedom.]
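The χ² facts above are easy to confirm by simulation. A small sketch (the sample size and seed are arbitrary choices), using that a χ²(k) variable has mean k and variance 2k:

```python
import random

random.seed(1)
k, reps = 3, 200_000

# Sum of k squared independent N(0,1) draws is chi-squared with k df.
draws = [sum(random.gauss(0.0, 1.0) ** 2 for _ in range(k))
         for _ in range(reps)]

mean = sum(draws) / reps
var = sum((d - mean) ** 2 for d in draws) / reps

print(round(mean, 2))  # close to k = 3
print(round(var, 2))   # close to 2k = 6
```

The simulated mean and variance match the theoretical values k and 2k up to Monte Carlo error.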
If the hypothesis has q components, e.g. β2 = 0, β3 = 0 (q = 2), the test statistic (W, LR or LM) is a chi-squared random variable with q degrees of freedom.

The EViews output above contains the values

Log likelihood,  ln L(θ̂) = −12.81880
Restricted log likelihood,  ln L(θ̂R) = −20.59173
LR statistic (3 df),  2[ln L(θ̂) − ln L(θ̂R)] = 15.54585
Probability(LR stat)  0.001405

The prob value is based on the upper tail of the χ²(3) distribution: the null hypothesis is β1 = β2 = β3 = 0, i.e. the hypothesis that the explanatory variables do not influence the probability of an improvement.

ML and Model Selection Criteria

One way to choose between different specifications (e.g. between the probit and logit models) is to use a model selection criterion. Several appear in EViews. All are associated with maximum likelihood. The best known is the Akaike information criterion (AIC).

The Akaike information criterion is an adjusted log likelihood value, adjusted for the number of parameters in the model. In the per-observation form EViews reports, it is often written

AIC = −(2/n)(ln L(θ̂) − p)

where n is the number of observations and p is the number of estimated parameters, the number of elements in θ.
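The LR statistic and its p-value can be reproduced from the two log likelihoods in the output. A sketch, using the closed-form upper tail of the χ²(3) distribution (which needs only the error function):

```python
import math

lnL_u = -12.81880   # unrestricted log likelihood (from the output above)
lnL_r = -20.59173   # restricted log likelihood

lr = 2.0 * (lnL_u - lnL_r)   # LR statistic

def chi2_3_sf(x):
    """Upper tail P(X > x) for a chi-squared variable with 3 df (closed form)."""
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))

p_value = chi2_3_sf(lr)
print(round(lr, 5))       # 15.54586
print(round(p_value, 6))  # 0.001405
```

Both numbers agree with the "LR statistic" and "Probability(LR stat)" lines of the EViews output.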
For a given data set the AIC is calculated for all the models under consideration and the model with the smallest AIC chosen. (n is the same across models.) In the probit/logit models of modal split there are 2 parameters, β0 and β1, and so p = 2. Here, with an equal number of parameters, the choice between the models would depend only on the value of the maximised log likelihood ln L(θ̂).

There are other criteria based on different penalties for the number of parameters. Instead of the fixed penalty p of the AIC, the penalty may also depend on the sample size n. The Schwarz criterion has (1/2) p ln n instead of p. The Hannan-Quinn criterion has p ln ln n. These criteria were introduced because they have a consistency property. It is reasonable to require that as the number of observations tends to infinity the probability of choosing the right model should tend to unity. The AIC (and the R² and adjusted R² from regression) do not have this consistency property. Schwarz and Hannan-Quinn do.
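As an illustration, the three criteria can be computed side by side, each in the per-observation form −(2/n)(ln L − penalty). The log likelihood values below are made up for the example (two hypothetical models with p = 2 each, e.g. a probit and a logit fitted to the same data):

```python
import math

def criteria(lnL, p, n):
    """AIC, Schwarz and Hannan-Quinn, each as -(2/n)(lnL - penalty)."""
    aic = -2.0 * (lnL - p) / n
    sc  = -2.0 * (lnL - 0.5 * p * math.log(n)) / n
    hq  = -2.0 * (lnL - p * math.log(math.log(n))) / n
    return aic, sc, hq

n = 21
# Hypothetical maximised log likelihoods; the probit fits slightly better.
lnL_probit, lnL_logit = -6.165, -6.166

aic_p, sc_p, hq_p = criteria(lnL_probit, 2, n)
aic_l, sc_l, hq_l = criteria(lnL_logit, 2, n)

# With equal p the ranking depends only on ln L: the larger ln L wins
# on every criterion (smaller criterion value is better).
print(aic_p < aic_l, sc_p < sc_l, hq_p < hq_l)  # True True True
```

With unequal p the criteria can disagree, because each trades off fit against parameters with a different penalty.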
Count Data Poisson Regression

PE ch 16.6 and EViews 658ff. The probit and logit models are produced by allowing the probability p in the Bernoulli model to vary across individuals as x varies. In the same way regressors can be hung on the parameters of the other distributions of first year Statistics, the Poisson and the exponential.

Poisson or count data models focus on the number of occurrences of an event. Here the outcome variable is y = 0, 1, 2, 3, .... Examples include:

The number of visits to a doctor a person makes during a year.
The number of children in a household.
The number of televisions in a household.

We have a sample of persons or households which differ in ways (age, health, income, ...) that affect the probability distribution of the number of visits, children or TVs. The probability distribution used is the Poisson. From first year, if Y is a Poisson random variable then its probability function is given by

P(Y = y) = p(y) = λ^y e^(−λ)/y!,  y = 0, 1, 2, ....

This probability function has one parameter, λ, which is the mean (and variance) of Y. Suppose that units in our sample vary in accordance with the value of some variable x. The simplest specification of a relationship between λi and xi would be λi = β0 + β1 xi. However the right hand side might be negative, which is unacceptable. A simple way of guaranteeing that it is positive is to specify

λi = e^(β0 + β1 xi).

This is the Poisson regression model. The parameters could be estimated by maximum likelihood, i.e. maximise

L(β0, β1; y) = ∏i λi^(yi) e^(−λi)/yi!  where  λi = e^(β0 + β1 xi).
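The maximisation can be sketched in a few lines with Newton's method, using the analytic score and Hessian of the Poisson log likelihood. The data here are a small made-up example, not from the text:

```python
import math

# Hypothetical count data: counts y_i observed at regressor values x_i.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1, 2, 2, 5, 8]

# Newton's method for the Poisson regression log likelihood
#   l(b0, b1) = sum_i [ y_i (b0 + b1 x_i) - exp(b0 + b1 x_i) - ln(y_i!) ]
b0, b1 = math.log(sum(y) / len(y)), 0.0   # start at the intercept-only fit
for _ in range(50):
    lam = [math.exp(b0 + b1 * xi) for xi in x]
    # Score: sum_i (y_i - lam_i) and sum_i (y_i - lam_i) x_i
    g0 = sum(yi - li for yi, li in zip(y, lam))
    g1 = sum((yi - li) * xi for yi, li, xi in zip(y, lam, x))
    if abs(g0) < 1e-10 and abs(g1) < 1e-10:
        break
    # Entries of minus the Hessian (a 2x2 matrix)
    h00 = sum(lam)
    h01 = sum(li * xi for li, xi in zip(lam, x))
    h11 = sum(li * xi * xi for li, xi in zip(lam, x))
    det = h00 * h11 - h01 * h01
    # Newton step: b <- b + (-H)^{-1} g
    b0 += (h11 * g0 - h01 * g1) / det
    b1 += (h00 * g1 - h01 * g0) / det

lam = [math.exp(b0 + b1 * xi) for xi in x]
# At the MLE the score is zero, so the fitted means reproduce sum(y).
print(round(sum(lam), 6))   # equals sum(y) = 18.0
```

The first score equation forces the fitted means to sum to the observed counts, a standard property of Poisson ML with an intercept; in practice one would use a packaged routine (EViews, or a GLM library) rather than hand-coded Newton.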
Another first year distribution is the exponential distribution with density

f(y) = θ e^(−θy),  for y > 0 and θ > 0.

The expected value is E(Y) = 1/θ.

[Figure: exponential densities f(y) for 0 ≤ y ≤ 2, one with θ large and one with θ small.]

This is the simplest model used for durations or waiting times. Suppose we have a cross-section of individuals who have been unemployed and there is data on their personal characteristics (age, qualifications, etc) and how long they were unemployed. Their experience could be modelled using the exponential with the parameter varying across individuals according to

θi = e^(β0 + β1 xi).

(Again this guarantees θi > 0.) A lot of empirical labour economics is based on this model and extensions of it. EViews does not cover these duration models and we won't consider any empirical examples.
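The E(Y) = 1/θ property is easy to confirm by simulation (the rate value and seed are arbitrary choices):

```python
import random

random.seed(42)
theta = 2.0   # exponential rate parameter
n = 100_000

# random.expovariate takes the rate, so the draws have mean 1/theta.
durations = [random.expovariate(theta) for _ in range(n)]
mean = sum(durations) / n
print(round(mean, 3))   # close to 1/theta = 0.5
```

With a rate of 2 per unit time, the average waiting time is half a unit, as the simulation confirms up to sampling error.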