The Logit Model: Estimation, Testing and Interpretation


Herman J. Bierens
October 25, 2008

1 Introduction to maximum likelihood estimation

1.1 The likelihood function

Consider a random sample $Y_1, \ldots, Y_n$ from the Bernoulli distribution:
$$\Pr[Y_j = 1] = p_0, \qquad \Pr[Y_j = 0] = 1 - p_0,$$
where $p_0$ is unknown. For example, toss a coin $n$ times when you suspect that it is unfair, $p_0 \neq 0.5$, and for each toss $j$ assign $Y_j = 1$ if the outcome is heads and $Y_j = 0$ if the outcome is tails. The question is how to estimate $p_0$, and how to test the null hypothesis that the coin is fair: $p_0 = 0.5$.

The probability function involved can be written as
$$f(y|p_0) = \Pr[Y_j = y] = p_0^y (1 - p_0)^{1-y} = \begin{cases} p_0 & \text{if } y = 1, \\ 1 - p_0 & \text{if } y = 0. \end{cases}$$

Next, let $y_1, \ldots, y_n$ be a given sequence of zeros and ones; thus, each $y_j$ is either 0 or 1. The joint probability function of the random sample $Y_1, Y_2, \ldots, Y_n$ is defined as
$$f_n(y_1, \ldots, y_n | p_0) = \Pr[Y_1 = y_1 \text{ and } Y_2 = y_2 \ldots \text{ and } Y_n = y_n].$$

Because the random variables $Y_1, Y_2, \ldots, Y_n$ are independent, we can write
$$\Pr[Y_1 = y_1 \text{ and } Y_2 = y_2 \ldots \text{ and } Y_n = y_n] = \Pr[Y_1 = y_1] \times \Pr[Y_2 = y_2] \times \cdots \times \Pr[Y_n = y_n] = f(y_1|p_0) \times f(y_2|p_0) \times \cdots \times f(y_n|p_0) = \prod_{j=1}^{n} f(y_j|p_0),$$
hence
$$f_n(y_1, \ldots, y_n|p_0) = \prod_{j=1}^{n} p_0^{y_j}(1 - p_0)^{1 - y_j} = p_0^{\sum_{j=1}^{n} y_j}\,(1 - p_0)^{n - \sum_{j=1}^{n} y_j}.$$
Replacing the given non-random sequence $y_1, \ldots, y_n$ by the random sample $Y_1, Y_2, \ldots, Y_n$, and the unknown probability $p_0$ by a variable $p$ in the interval $(0, 1)$, yields the likelihood function
$$L_n(p) = f_n(Y_1, \ldots, Y_n|p) = p^{\sum_{j=1}^{n} Y_j}\,(1 - p)^{n - \sum_{j=1}^{n} Y_j}.$$
For the case $p = p_0$ the likelihood function can be interpreted as the joint probability of drawing this particular sample $Y_1, \ldots, Y_n$.

1.2 Maximum likelihood estimation

The idea of maximum likelihood (ML) estimation is to choose $p$ such that $L_n(p)$ is maximal; in other words, to choose $p$ such that the probability of drawing this particular sample $Y_1, \ldots, Y_n$ is maximal. Note that maximizing $L_n(p)$ is equivalent to maximizing $\ln(L_n(p))$, i.e.,
$$\ln(L_n(p)) = \sum_{j=1}^{n} Y_j \ln(p) + \left(n - \sum_{j=1}^{n} Y_j\right)\ln(1 - p) = n\left(\bar{Y}\ln(p) + (1 - \bar{Y})\ln(1 - p)\right),$$

where $\bar{Y} = \frac{1}{n}\sum_{j=1}^{n} Y_j$ is the sample mean. Therefore, the ML estimator $\hat{p}$ in this case can be obtained from the first-order condition for a maximum of $\ln(L_n(p))$ at $p = \hat{p}$:
$$0 = \frac{d\ln(L_n(\hat{p}))}{d\hat{p}} = n\left(\bar{Y}\,\frac{d\ln(\hat{p})}{d\hat{p}} + (1 - \bar{Y})\,\frac{d\ln(1 - \hat{p})}{d(1 - \hat{p})}\,\frac{d(1 - \hat{p})}{d\hat{p}}\right) = n\left(\bar{Y}\,\frac{1}{\hat{p}} + (1 - \bar{Y})\,\frac{1}{1 - \hat{p}}\,(-1)\right) = n\,\frac{\bar{Y}(1 - \hat{p}) - (1 - \bar{Y})\hat{p}}{\hat{p}(1 - \hat{p})} = n\,\frac{\bar{Y} - \hat{p}}{\hat{p}(1 - \hat{p})},$$
where we have used the fact that $d\ln(x)/dx = 1/x$. Thus, in this case the ML estimator $\hat{p}$ of $p_0$ is the sample mean: $\hat{p} = \bar{Y}$. Note that this is an unbiased estimator: $E(\hat{p}) = \frac{1}{n}\sum_{j=1}^{n} E(Y_j) = p_0$.
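As a quick numerical check of this closed-form result, here is a minimal sketch in Python (the simulated data, the value $p_0 = 0.6$, and the use of numpy and scipy are assumptions of the example, not part of the lecture note) that maximizes $\ln(L_n(p))$ directly and compares the maximizer with the sample mean:

    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(0)
    y = rng.binomial(1, 0.6, size=200)   # simulated coin tosses with assumed p0 = 0.6

    def neg_log_lik(p):
        # minus ln L_n(p) = -(sum(y) ln(p) + (n - sum(y)) ln(1 - p))
        return -(y.sum() * np.log(p) + (len(y) - y.sum()) * np.log(1 - p))

    res = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")
    print(res.x, y.mean())               # the numerical maximizer equals the sample mean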

1.3 Large sample statistical inference

It can be shown (but this requires advanced probability theory) that if the sample size $n$ is large, then $\sqrt{n}(\hat{p} - p_0)$ is approximately normally distributed, i.e.,
$$\sqrt{n}(\hat{p} - p_0) = \frac{1}{\sqrt{n}}\sum_{j=1}^{n}(Y_j - p_0) \approx N[0, \sigma_0^2],$$
where
$$\sigma_0^2 = \mathrm{var}(Y_j) = E\left[(Y_j - p_0)^2\right] = (1 - p_0)^2 p_0 + (-p_0)^2(1 - p_0) = p_0(1 - p_0).$$
Thus, for large sample size $n$,
$$\frac{\sqrt{n}(\hat{p} - p_0)}{\sqrt{p_0(1 - p_0)}} \approx N[0, 1]. \tag{1}$$
This result can be used to test hypotheses about $p_0$. In particular, under the null hypothesis that the coin is fair, $p_0 = 0.5$, we have
$$\frac{\sqrt{n}(\hat{p} - 0.5)}{\sqrt{0.5 \times 0.5}} = 2\sqrt{n}(\hat{p} - 0.5) \approx N[0, 1].$$
Therefore, $2\sqrt{n}(\hat{p} - 0.5)$ can be used as the test statistic of the standard normal test of the null hypothesis $p_0 = 1/2$, as follows. Recall that for a standard normal random variable $U$, $\Pr[|U| > 1.96] = 0.05$. Thus, under the null hypothesis $p_0 = 1/2$ one would expect that
$$\Pr\left[2\sqrt{n}\,|\hat{p} - 0.5| > 1.96\right] \approx 0.05, \qquad \Pr\left[2\sqrt{n}\,|\hat{p} - 0.5| \leq 1.96\right] \approx 0.95.$$
If $2\sqrt{n}\,|\hat{p} - 0.5| > 1.96$ then we reject the null hypothesis $p_0 = 1/2$ at the 5% significance level, because this is not what one would expect if the null hypothesis were true; if $2\sqrt{n}\,|\hat{p} - 0.5| \leq 1.96$ then we accept this null hypothesis, as the result is then in accordance with the null hypothesis $p_0 = 1/2$.

The result (1) can also be used to endow the unknown probability $p_0$ with a confidence interval, for example the 95% confidence interval, as follows. The result (1) implies
$$\Pr\left[\frac{\sqrt{n}\,|\hat{p} - p_0|}{\sqrt{p_0(1 - p_0)}} \leq 1.96\right] \approx 0.95,$$
which, after some straightforward calculations, can be shown to be equivalent to
$$\Pr\left[\underline{p}_n \leq p_0 \leq \overline{p}_n\right] \approx 0.95,$$
where
$$\underline{p}_n = \frac{n\hat{p} + (1.96)^2/2 - 1.96\sqrt{n\hat{p}(1 - \hat{p}) + (1.96)^2/4}}{n + (1.96)^2}, \qquad \overline{p}_n = \frac{n\hat{p} + (1.96)^2/2 + 1.96\sqrt{n\hat{p}(1 - \hat{p}) + (1.96)^2/4}}{n + (1.96)^2}.$$
The interval $[\underline{p}_n, \overline{p}_n]$ is now the 95% confidence interval for $p_0$.
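To make the test concrete, here is a minimal sketch (simulated tosses of a coin that is in fact fair; the sample size and seed are arbitrary choices for the example):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 400
    y = rng.binomial(1, 0.5, size=n)        # simulated tosses of a fair coin
    p_hat = y.mean()

    z = 2 * np.sqrt(n) * (p_hat - 0.5)      # test statistic under H0: p0 = 1/2
    print(abs(z) > 1.96)                    # True means: reject H0 at the 5% level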

1.4 An application: election polls

Consider a presidential election with two candidates, candidate A and candidate B, and let $p_0$ be the fraction of likely voters who favor candidate A just before the election is held. To predict the outcome of the election, a polling agency draws a random sample of size $n = 3000$, for example, from the population of likely voters.[1] Suppose that 1800 of the respondents express a preference for candidate A, so that the fraction of respondents favoring candidate A is $\hat{p} = 0.6$. Substituting $n = 3000$ and $\hat{p} = 0.6$ in the formulas for $\underline{p}_n$ and $\overline{p}_n$ yields $\underline{p}_n \approx 0.58$ and $\overline{p}_n \approx 0.62$. Thus, the 95% confidence interval of $100 \times p_0$ is $[58, 62]$. The polling results are therefore stated as: 60% of the likely voters will vote for candidate A, with a margin of error of $\pm 2$ points.

[1] How to draw such a sample is beyond the scope of this lecture note.
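The bounds above can be reproduced directly from the formulas for $\underline{p}_n$ and $\overline{p}_n$; a minimal sketch using the poll numbers from this section (the function name is ours, not the note's):

    import numpy as np

    def conf_bounds(p_hat, n, z=1.96):
        # the bounds derived from result (1)
        center = n * p_hat + z**2 / 2
        half = z * np.sqrt(n * p_hat * (1 - p_hat) + z**2 / 4)
        return (center - half) / (n + z**2), (center + half) / (n + z**2)

    lo, hi = conf_bounds(0.6, 3000)
    print(round(lo, 2), round(hi, 2))       # 0.58 0.62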

2 Motivation for maximum likelihood estimation

A more formal motivation for ML estimation is based on the fact that for $0 < x < 1$ and for $x > 1$, $\ln(x) < x - 1$. This is illustrated in the following picture:

[Figure: the graphs of $\ln(x)$ and $x - 1$, with $\ln(x)$ lying below $x - 1$ everywhere except at the tangency point $x = 1$.]

The inequality $\ln(x) < x - 1$ is strict for $x \neq 1$, and $\ln(1) = 0$. Consequently, taking $x = f(Y_j|p)/f(Y_j|p_0)$, we have the inequality
$$\ln\left(\frac{f(Y_j|p)}{f(Y_j|p_0)}\right) \leq \frac{f(Y_j|p)}{f(Y_j|p_0)} - 1.$$
Taking expectations, it follows that
$$E\left[\ln\left(\frac{f(Y_j|p)}{f(Y_j|p_0)}\right)\right] \leq E\left[\frac{f(Y_j|p)}{f(Y_j|p_0)}\right] - 1 = \frac{f(1|p)}{f(1|p_0)}\Pr[Y_j = 1] + \frac{f(0|p)}{f(0|p_0)}\Pr[Y_j = 0] - 1 = \frac{p}{p_0}\,p_0 + \frac{1 - p}{1 - p_0}\,(1 - p_0) - 1 = p + 1 - p - 1 = 0, \tag{2}$$
hence
$$E[\ln(f(Y_j|p))] - E[\ln(f(Y_j|p_0))] = E\left[\ln\left(\frac{f(Y_j|p)}{f(Y_j|p_0)}\right)\right] \leq 0,$$
and therefore
$$E[\ln(L_n(p))] \leq E[\ln(L_n(p_0))]. \tag{3}$$
Thus, $E[\ln(L_n(p))]$ is maximal for $p = p_0$, and it can be shown that this maximum is unique.
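The inequality (3) is easy to verify numerically. A small sketch (with an assumed true value $p_0 = 0.6$) evaluates $E[\ln(f(Y_j|p))] = p_0\ln(p) + (1 - p_0)\ln(1 - p)$ over a grid of $p$ and locates the maximum:

    import numpy as np

    p0 = 0.6                                   # assumed true parameter
    p = np.linspace(0.01, 0.99, 99)            # grid over (0, 1) in steps of 0.01
    expected_loglik = p0 * np.log(p) + (1 - p0) * np.log(1 - p)
    print(p[np.argmax(expected_loglik)])       # 0.6, i.e. the maximum is at p = p0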

3 Maximum likelihood estimation of the Logit model

3.1 The Logit model with one explanatory variable

Next, let $(Y_1, X_1), \ldots, (Y_n, X_n)$ be a random sample from the conditional Logit distribution:
$$\Pr[Y_j = 1|X_j] = \frac{1}{1 + \exp(-\alpha_0 - \beta_0 X_j)}, \tag{4}$$
$$\Pr[Y_j = 0|X_j] = 1 - \Pr[Y_j = 1|X_j] = \frac{\exp(-\alpha_0 - \beta_0 X_j)}{1 + \exp(-\alpha_0 - \beta_0 X_j)},$$
where the $X_j$'s are the explanatory variables and $\alpha_0$ and $\beta_0$ are unknown parameters to be estimated. This model is called a Logit model because
$$\Pr[Y_j = 1|X_j] = F(\alpha_0 + \beta_0 X_j), \tag{5}$$
where
$$F(x) = \frac{1}{1 + \exp(-x)} \tag{6}$$
is the distribution function of the logistic (Logit) distribution. The conditional probability function involved is
$$f(y|X_j, \alpha_0, \beta_0) = \Pr[Y_j = y|X_j] = F(\alpha_0 + \beta_0 X_j)^y\left(1 - F(\alpha_0 + \beta_0 X_j)\right)^{1-y} = \begin{cases} F(\alpha_0 + \beta_0 X_j) & \text{if } y = 1, \\ 1 - F(\alpha_0 + \beta_0 X_j) & \text{if } y = 0. \end{cases}$$
Now the conditional log-likelihood function is
$$\ln(L_n(\alpha, \beta)) = \sum_{j=1}^{n}\ln(f(Y_j|X_j, \alpha, \beta)) = \sum_{j=1}^{n}\left[Y_j\ln(F(\alpha + \beta X_j)) + (1 - Y_j)\ln(1 - F(\alpha + \beta X_j))\right] = \sum_{j=1}^{n}\left[-(1 - Y_j)(\alpha + \beta X_j) - \ln(1 + \exp(-\alpha - \beta X_j))\right]. \tag{7}$$

Similar to (3) we have
$$E[\ln(L_n(\alpha, \beta))|X_1, \ldots, X_n] \leq E[\ln(L_n(\alpha_0, \beta_0))|X_1, \ldots, X_n].$$
Again, this result motivates estimating $\alpha_0$ and $\beta_0$ by maximizing $\ln(L_n(\alpha, \beta))$ with respect to $\alpha$ and $\beta$:
$$\ln(L_n(\hat{\alpha}, \hat{\beta})) = \max_{\alpha, \beta}\ln(L_n(\alpha, \beta)).$$
However, there is no longer an explicit solution for $\hat{\alpha}$ and $\hat{\beta}$; these ML estimators have to be computed numerically. Your econometrics software will do that for you.

3.2 Pseudo t-values

It can be shown that if the sample size $n$ is large, then
$$\sqrt{n}(\hat{\alpha} - \alpha_0) \approx N(0, \sigma_\alpha^2), \qquad \sqrt{n}(\hat{\beta} - \beta_0) \approx N(0, \sigma_\beta^2).$$
Given consistent estimators $\hat{\sigma}_\alpha^2$ and $\hat{\sigma}_\beta^2$ of the unknown variances $\sigma_\alpha^2$ and $\sigma_\beta^2$, respectively (which are computed by your econometrics software), we then have
$$\frac{\sqrt{n}(\hat{\alpha} - \alpha_0)}{\hat{\sigma}_\alpha} \approx N(0, 1), \qquad \frac{\sqrt{n}(\hat{\beta} - \beta_0)}{\hat{\sigma}_\beta} \approx N(0, 1).$$
These results can be used to test whether the coefficients $\alpha_0$ and $\beta_0$ are zero or not. In particular, the null hypothesis $\beta_0 = 0$ is of interest, because this hypothesis implies that the conditional probability $\Pr[Y_j = 1|X_j]$ does not depend on $X_j$. Under the null hypothesis $\beta_0 = 0$ we have
$$\hat{t}_\beta = \frac{\sqrt{n}\,\hat{\beta}}{\hat{\sigma}_\beta} \approx N(0, 1).$$
Recall that the 5% critical value of the two-sided standard normal test is 1.96. Thus, for example, the null hypothesis $\beta_0 = 0$ is rejected at the 5% significance level in favor of the alternative hypothesis $\beta_0 \neq 0$ if $|\hat{t}_\beta| > 1.96$, and accepted if $|\hat{t}_\beta| \leq 1.96$. The statistic $\hat{t}_\beta$ is called the pseudo t-value of $\hat{\beta}$ because it is used in the same way as the t-value in linear regression, and $\hat{\sigma}_\beta$ is called the standard error of $\hat{\beta}$. Your econometric software will report the ML estimators together with their corresponding pseudo t-values and/or standard errors.
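As an illustration of what such software computes, here is a minimal sketch using Python's statsmodels on simulated data (the true values $\alpha_0 = -1$ and $\beta_0 = 2$ are assumptions of the example):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    n = 1000
    x = rng.normal(size=n)
    prob = 1 / (1 + np.exp(-(-1 + 2 * x)))   # Pr[Y=1|X] with alpha0 = -1, beta0 = 2
    y = rng.binomial(1, prob)

    X = sm.add_constant(x)                   # adds the column of ones for alpha
    fit = sm.Logit(y, X).fit(disp=0)         # ML estimation, solved numerically
    print(fit.params)                        # alpha_hat and beta_hat
    print(fit.tvalues)                       # the pseudo t-values
    print(fit.bse)                           # the standard errors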

3.3 The general Logit model

The general Logit model takes the form
$$\Pr[Y_j = 1|X_{1j}, \ldots, X_{kj}] = \frac{1}{1 + \exp(-\beta_1^0 X_{1j} - \ldots - \beta_k^0 X_{kj})} = \frac{1}{1 + \exp\left(-\sum_{i=1}^{k}\beta_i^0 X_{ij}\right)}, \tag{8}$$
where one of the $X_{ij}$'s equals 1 for the constant term, for example $X_{kj} = 1$, and the $\beta_i^0$'s are the true parameter values. This model can be estimated by ML in the same way as before. Thus, the log-likelihood function is
$$\ln(L_n(\beta_1, \ldots, \beta_k)) = \sum_{j=1}^{n}\left[-(1 - Y_j)\sum_{i=1}^{k}\beta_i X_{ij} - \ln\left(1 + \exp\left(-\sum_{i=1}^{k}\beta_i X_{ij}\right)\right)\right], \tag{9}$$
and the ML estimators $\hat{\beta}_1, \ldots, \hat{\beta}_k$ are obtained by maximizing $\ln(L_n(\beta_1, \ldots, \beta_k))$:
$$\ln(L_n(\hat{\beta}_1, \ldots, \hat{\beta}_k)) = \max_{\beta_1, \ldots, \beta_k}\ln(L_n(\beta_1, \ldots, \beta_k)).$$
Again, it can be shown that if $n$ is large, then for $i = 1, \ldots, k$,
$$\sqrt{n}(\hat{\beta}_i - \beta_i^0) \approx N[0, \sigma_i^2].$$
Given consistent estimators $\hat{\sigma}_i^2$ of the variances $\sigma_i^2$, it then follows that
$$\frac{\sqrt{n}(\hat{\beta}_i - \beta_i^0)}{\hat{\sigma}_i} \approx N[0, 1] \quad \text{for } i = 1, \ldots, k.$$
Your econometrics software will report the ML estimators $\hat{\beta}_i$ together with their corresponding pseudo t-values $\hat{t}_i = \sqrt{n}\,\hat{\beta}_i/\hat{\sigma}_i$ and/or standard errors $\hat{\sigma}_i$.

3.4 Testing joint significance

Now suppose you want to test the joint null hypothesis
$$H_0: \beta_1^0 = 0, \; \beta_2^0 = 0, \; \ldots, \; \beta_m^0 = 0, \tag{10}$$
where $m < k$. There are two ways to do that. One way is akin to the F test in linear regression: re-estimate the Logit model under the null hypothesis,
$$\ln(L_n(0, \ldots, 0, \tilde{\beta}_{m+1}, \ldots, \tilde{\beta}_k)) = \max_{\beta_{m+1}, \ldots, \beta_k}\ln(L_n(0, \ldots, 0, \beta_{m+1}, \ldots, \beta_k)),$$
and compare the log-likelihoods.[2] It can be shown that under the null hypothesis (10) and for large samples,
$$LR_m = -2\ln\left(\frac{L_n(0, \ldots, 0, \tilde{\beta}_{m+1}, \ldots, \tilde{\beta}_k)}{L_n(\hat{\beta}_1, \ldots, \hat{\beta}_k)}\right) \approx \chi_m^2,$$
where the number of degrees of freedom $m$ corresponds to the number of restrictions imposed under the null hypothesis. This is the so-called likelihood ratio test, which is conducted right-sided. For example, choose the 5% significance level and look up in the table of the $\chi^2$ distribution the critical value $c$ such that for a $\chi_m^2$ distributed random variable $Z_m$, $\Pr[Z_m > c] = 0.05$. Then the null hypothesis (10) is rejected at the 5% significance level if $LR_m > c$ and accepted if $LR_m \leq c$.

An alternative test of the null hypothesis (10) is the Wald test, which is conducted in the same way as for linear regression models.[3] Under the null hypothesis (10) the Wald test statistic also has a $\chi_m^2$ distribution.

[2] Your econometric software will report the log-likelihood function value.
[3] In EasyReg International the Wald test can be conducted simply by point-and-click.
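Here is a minimal sketch of the likelihood ratio test with statsmodels (a simulated design in which the second regressor is irrelevant, so the null hypothesis holds by construction and $m = 1$):

    import numpy as np
    import statsmodels.api as sm
    from scipy.stats import chi2

    rng = np.random.default_rng(3)
    n = 1000
    x1, x2 = rng.normal(size=n), rng.normal(size=n)   # coefficient on x2 is 0 by design
    y = rng.binomial(1, 1 / (1 + np.exp(-(-1 + 2 * x1))))

    full = sm.Logit(y, sm.add_constant(np.column_stack([x1, x2]))).fit(disp=0)
    restricted = sm.Logit(y, sm.add_constant(x1)).fit(disp=0)

    LR = -2 * (restricted.llf - full.llf)             # LR_m with m = 1 restriction
    c = chi2.ppf(0.95, df=1)                          # right-sided 5% critical value
    print(LR > c)                                     # True means: reject H0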

4 Interpretation of the coefficients of the Logit model

4.1 Marginal effects

Consider the Logit model (5). If $\beta_0 > 0$ then $\Pr[Y_j = 1|X_j] = F(\alpha_0 + \beta_0 X_j)$ is an increasing function of $X_j$:
$$\frac{d\Pr[Y_j = 1|X_j]}{dX_j} = \beta_0 \cdot F'(\alpha_0 + \beta_0 X_j),$$
where $F'$ is the derivative of (6):
$$F'(x) = \frac{\exp(-x)}{(1 + \exp(-x))^2} = \frac{1 + \exp(-x)}{(1 + \exp(-x))^2} - \frac{1}{(1 + \exp(-x))^2} = \frac{1}{1 + \exp(-x)} - \left(\frac{1}{1 + \exp(-x)}\right)^2 = F(x) - F(x)^2 = F(x)(1 - F(x)).$$
Therefore, the marginal effect of $X_j$ on $\Pr[Y_j = 1|X_j]$ depends on $X_j$:
$$\frac{d\Pr[Y_j = 1|X_j]}{dX_j} = \beta_0 \cdot F(\alpha_0 + \beta_0 X_j)\left(1 - F(\alpha_0 + \beta_0 X_j)\right),$$
which renders the interpretation of $\beta_0$ difficult. However, the coefficient $\beta_0$ can be interpreted in terms of relative changes in odds.

4.2 Odds and odds ratios

The odds is the ratio of the probability that something is true divided by the probability that it is not true. Thus, in the Logit case (4),
$$\text{Odds}(X_j) = \frac{\Pr[Y_j = 1|X_j]}{\Pr[Y_j = 0|X_j]} = \frac{F(\alpha_0 + \beta_0 X_j)}{1 - F(\alpha_0 + \beta_0 X_j)} = \exp(\alpha_0 + \beta_0 X_j). \tag{11}$$
The odds ratio is the ratio of two odds for different values of $X_j$, say $X_j = x$ and $X_j = x + \Delta x$, where $\Delta x$ is a small change in $x$:
$$\frac{\text{Odds}(x + \Delta x)}{\text{Odds}(x)} = \frac{\exp(\alpha_0 + \beta_0 x + \beta_0 \Delta x)}{\exp(\alpha_0 + \beta_0 x)} = \exp(\beta_0 \Delta x).$$
Then
$$\lim_{\Delta x \to 0}\frac{1}{\Delta x}\left(\frac{\text{Odds}(x + \Delta x) - \text{Odds}(x)}{\text{Odds}(x)}\right) = \lim_{\Delta x \to 0}\frac{\exp(\beta_0 \Delta x) - 1}{\Delta x} = \beta_0\lim_{\Delta x \to 0}\frac{\exp(\beta_0 \Delta x) - 1}{\beta_0 \Delta x} = \beta_0\left.\frac{d\exp(u)}{du}\right|_{u=0} = \beta_0\exp(0) = \beta_0.$$
Thus, $\beta_0$ may be interpreted as the relative change in the odds due to a small change $\Delta x$ in $X_j$:
$$\frac{\text{Odds}(x + \Delta x) - \text{Odds}(x)}{\text{Odds}(x)} = \frac{\text{Odds}(x + \Delta x)}{\text{Odds}(x)} - 1 \approx \beta_0\,\Delta x. \tag{12}$$
If $X_j$ is a binary variable itself, $X_j = 0$ or $X_j = 1$, then the only reasonable choices for $x + \Delta x$ and $x$ are 1 and 0, respectively, so that then
$$\frac{\text{Odds}(1) - \text{Odds}(0)}{\text{Odds}(0)} = \frac{\text{Odds}(1)}{\text{Odds}(0)} - 1 = \exp(\beta_0) - 1.$$
Only if $\beta_0$ is small may we then use the approximation $\exp(\beta_0) - 1 \approx \beta_0$. If not, one has to interpret $\beta_0$ in terms of the log of the odds ratio involved:
$$\ln\left(\frac{\text{Odds}(1)}{\text{Odds}(0)}\right) = \beta_0.$$
The interpretation of the coefficients $\beta_i^0$, $i = 1, \ldots, k - 1$, in the general Logit model (8) is similar to the case (12):
$$\frac{\text{Odds}(X_{1j}, \ldots, X_{i-1,j}, X_{i,j} + \Delta X_{i,j}, X_{i+1,j}, \ldots, X_{k,j})}{\text{Odds}(X_{1j}, \ldots, X_{i-1,j}, X_{i,j}, X_{i+1,j}, \ldots, X_{k,j})} - 1 \approx \beta_i^0\,\Delta X_{i,j}$$
if $\Delta X_{i,j}$ is small. For example, $\beta_i^0$ may be interpreted as the percentage change in $\text{Odds}(X_{1j}, \ldots, X_{k,j})$ due to a small change $\Delta X_{i,j} = 1/100$ in $X_{i,j}$.
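A short numerical illustration of these interpretations (the coefficient values $\alpha_0 = -1$ and $\beta_0 = 0.05$ are assumed for the example):

    import numpy as np

    alpha0, beta0 = -1.0, 0.05

    def odds(x):
        # Odds(x) = exp(alpha0 + beta0 * x), as in (11)
        return np.exp(alpha0 + beta0 * x)

    x, dx = 10.0, 0.01
    rel_change = (odds(x + dx) - odds(x)) / odds(x)
    print(rel_change, beta0 * dx)            # relative change in odds ~ beta0 * dx

    # for a binary regressor the exact log odds ratio equals beta0
    print(np.log(odds(1) / odds(0)))         # 0.05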

If X j is a binary variable itself, X j 0or X j 1, then the only reasonable choices for x + x and x are 1 and 0, respectively, so that then Odds (1) (1) Odds (0) 1Odds Odds (0) Odds (0) exp(β 0 ) 1. Only if β 0 is small we may then use the approximation exp(β 0 ) 1 β 0. If not, one has to interpret β 0 in terms of the log of the odds ratio involved: Ã! Odds (1) ln β 0. Odds (0) The interpretation of the coefficients β 0 i,i 1,..., k 1 in the general Logit model (8) is similar as in the case (12): Odds (X 1j,..., X i 1,j,X i,j + X i,j,x i+1,j,..., X k,j ) Odds (X 1j,..., X i 1,j,X i,j,x i+1,j,..., X k,j ) 1 β 0 i X i,j if X i,j is small. For example, β 0 i may be interpreted as the percentage change in Odds(X 1j,.., X k,j ) due to a small percentage change 100 X i,j 1 in X i,j. 12