GOV 2001/ 1002/ E-2001 Section 8 Ordered Probit and Zero-Inflated Logit

1 GOV 2001/ 1002/ E-2001 Section 8: Ordered Probit and Zero-Inflated Logit
Solé Prillaman, Harvard University
March 26

2 LOGISTICS
- Reading Assignment: Becker and Kennedy (1992), Harris and Zhao (2007) (sections 1 and 2), and UPM ch
- Re-replication: Due by 6pm Wednesday, April 2 on Canvas.
- Class Party: Tentatively April 19 at Gary's house.

3 RE-REPLICATION
Re-replication is due April 2 at 6pm.
- You will receive all of the replication files from another team.
- It is your responsibility to hand off your replication files. (Re-replication teams are posted on Canvas.)
- Go through the replication code and try to improve it in any way you can.
- Provide a short write-up of thoughts on the replication and ideas for their final paper.
- Aim to be helpful, not critical!

4 OUTLINE
- The Ordered Probit Model
- Zero-Inflated Logistic Regression
- Binomial Model

5 ORDERED CATEGORICAL VARIABLES
Suppose our dependent variable is an ordered scale. For example:
- Customers tell you how much they like your product on a 5-point scale from "a lot" to "very little".
- Voters identify their ideology on a 7-point scale: very liberal, moderately liberal, somewhat liberal, neutral, somewhat conservative, moderately conservative, and very conservative.
- Foreign businesses rate their host country from "not corrupt" to "very corrupt".
What are the problems with using a linear model to study these processes?

6 ORDERED PROBIT: THE INTUITION
How can we derive the ordered probit?
- Suppose there is a latent (unobserved) data distribution, $Y_i^* \sim f_{stn}(y_i^* \mid \mu_i)$.
- This latent distribution has a systematic component, $\mu_i = x_i \beta$.
- Any realizations, $y_i^*$, are completely unobserved.
- What you do observe is whether $y_i^*$ is between some threshold parameters.

7 ORDERED PROBIT: THE INTUITION
Threshold parameters $\tau_j$ for $j = 1, \ldots, m$.
Although $y_i^*$ is unobserved, we do observe which of the categories it falls into.

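8 ORDERED PROBIT: DERIVING THE LIKELIHOOD
In equation form,
$$y_{ij} = \begin{cases} 1 & \text{if } \tau_{j-1} < y_i^* \le \tau_j \\ 0 & \text{otherwise} \end{cases}$$
Our stochastic component is still Bernoulli:
$$\Pr(Y_{ij} \mid \pi) = \pi_{i1}^{y_{i1}} \pi_{i2}^{y_{i2}} \pi_{i3}^{y_{i3}} \cdots$$
where $\sum_{j=1}^{m} \pi_{ij} = 1$.
What does $Y$ look like?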

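9 ORDERED PROBIT: DERIVING THE LIKELIHOOD
Like the regular probit and logit, the key here is deriving $\pi_{ij}$. You use the latent distribution to derive the probability that $y_i^*$ will fall into category $j$:
$$\begin{aligned}
\pi_{ij} = \Pr(Y_{ij} = 1) &= \Pr(\tau_{j-1} < y_i^* < \tau_j) \\
&= \int_{\tau_{j-1}}^{\tau_j} f(y_i^* \mid \mu_i)\, dy_i^* \\
&= \int_{\tau_{j-1}}^{\tau_j} f(y_i^* \mid x_i\beta)\, dy_i^* \\
&= F(\tau_j \mid x_i\beta) - F(\tau_{j-1} \mid x_i\beta) \\
&= \Phi(\tau_j - x_i\beta) - \Phi(\tau_{j-1} - x_i\beta)
\end{aligned}$$
where $F$ is the cumulative distribution function of $Y_i^*$ and $\Phi$ is the CDF of the standardized normal.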
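The category probabilities can be computed directly from the last line of this derivation. The following is a minimal sketch in base R; the threshold and linear-predictor values are made up for illustration, and the padding of the thresholds with $-\infty$ and $+\infty$ is a convenience assumed here so that every category is a difference of two CDF values.

```r
# Hypothetical values, chosen only for illustration
tau <- c(-1, 0, 1.5)       # three thresholds, so four categories
xb  <- c(-0.5, 0.2, 2)     # linear predictors x_i * beta for three observations

# Pad the thresholds so the first and last categories are also differences
cuts <- c(-Inf, tau, Inf)

# cdf[i, j] = Phi(cuts[j] - xb[i])
cdf <- pnorm(outer(-xb, cuts, "+"))

# pi_ij = Phi(tau_j - x_i beta) - Phi(tau_{j-1} - x_i beta)
probs <- cdf[, -1] - cdf[, -ncol(cdf)]

round(probs, 3)
rowSums(probs)             # each row sums to 1
```

Each row of `probs` is one observation's distribution over the four categories; larger values of the linear predictor shift mass toward the higher categories.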

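10 ORDERED PROBIT
The latent model:
1. $Y_i^* \sim f_{stn}(y_i^* \mid \mu_i)$.
2. $\mu_i = X_i \beta$
3. $Y_i^*$ and $Y_j^*$ are independent for all $i \neq j$.
The observed model:
1. $Y_{ij} \sim f_{bern}(y_{ij} \mid \pi_{ij})$.
2. $\pi_{ij} = \Phi(\tau_j - X_i\beta) - \Phi(\tau_{j-1} - X_i\beta)$
Note: for the ordered logit, $Y_i^*$ is distributed logistic and
$$\pi_{ij} = \frac{e^{\tau_j - X_i\beta}}{1 + e^{\tau_j - X_i\beta}} - \frac{e^{\tau_{j-1} - X_i\beta}}{1 + e^{\tau_{j-1} - X_i\beta}}$$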

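11 ORDERED PROBIT: DERIVING THE LIKELIHOOD
We want to generalize to all observations and all categories:
$$\begin{aligned}
L(\tau, \beta \mid y) &= \prod_{i=1}^{n} \Pr(Y_{ij} \mid \pi) \\
&= \prod_{i=1}^{n} \pi_{i1}^{y_{i1}} \pi_{i2}^{y_{i2}} \pi_{i3}^{y_{i3}} \cdots \\
&= \prod_{i=1}^{n} \prod_{j=1}^{m} [\pi_{ij}]^{y_{ij}} \\
&= \prod_{i=1}^{n} \prod_{j=1}^{m} \left[\Phi(\tau_j - X_i\beta) - \Phi(\tau_{j-1} - X_i\beta)\right]^{y_{ij}}
\end{aligned}$$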

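12 ORDERED PROBIT: DERIVING THE LIKELIHOOD
Then we take the log to get the log-likelihood:
$$l(\tau, \beta \mid y) = \ln\left(\prod_{i=1}^{n} \prod_{j=1}^{m} \left[\Phi(\tau_j - X_i\beta) - \Phi(\tau_{j-1} - X_i\beta)\right]^{y_{ij}}\right)$$
$$l(\tau, \beta \mid y) = \sum_{i=1}^{n} \sum_{j=1}^{m} y_{ij} \ln\left[\Phi(\tau_j - X_i\beta) - \Phi(\tau_{j-1} - X_i\beta)\right]$$
How many parameters are there to estimate in this model? $j + k$: one for each of the $j$ thresholds plus one for each of the $k$ coefficients in $\beta$.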

13 WHY DOES X NOT CONTAIN AN INTERCEPT?
In the binary probit model, we have one cutoff point, say $\tau_1$:
$$\Pr(Y = 1 \mid X\beta) = 1 - \Pr(Y = 0 \mid X\beta) = 1 - \Phi(\tau_1 - X\beta)$$
Here, $\tau_1$ is both the cutoff point and the intercept. By including an intercept in $X\beta$ we are setting $\tau_1$ to zero.

14 WHY DOES X NOT CONTAIN AN INTERCEPT?
Now in the ordered probit model, we have more than one cutoff point:
$$P(y_{i1} = 1) = \Pr(X\beta \le \tau_1)$$
$$P(y_{i2} = 1) = \Pr(\tau_1 < X\beta \le \tau_2)$$
If we included an intercept $A$,
$$P(y_{i1} = 1) = \Pr(X\beta + A \le \tau_1)$$
$$P(y_{i2} = 1) = \Pr(\tau_1 < X\beta + A \le \tau_2)$$
Or equivalently we could write this:
$$P(y_{i1} = 1) = \Pr(X\beta \le \tau_1 - A)$$
$$P(y_{i2} = 1) = \Pr(\tau_1 - A < X\beta \le \tau_2 - A)$$
By estimating a cutoff point, we are estimating an intercept.
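A quick numerical check makes the identification argument concrete. This is a sketch under assumed values: the helper `cat.probs`, the thresholds, the linear predictors, and the intercept `A` are all hypothetical and introduced only for illustration. It shows that adding an intercept $A$ to the linear predictor gives the same category probabilities as subtracting $A$ from every threshold.

```r
# Hypothetical helper: ordered probit category probabilities
cat.probs <- function(xb, tau) {
  cuts <- c(-Inf, tau, Inf)
  cdf <- pnorm(outer(-xb, cuts, "+"))
  cdf[, -1, drop = FALSE] - cdf[, -ncol(cdf), drop = FALSE]
}

xb  <- c(-0.5, 0.2, 2)    # made-up linear predictors without an intercept
tau <- c(-1, 0, 1.5)      # made-up thresholds
A   <- 0.7                # made-up intercept

with.intercept     <- cat.probs(xb + A, tau)      # intercept in X beta
shifted.thresholds <- cat.probs(xb, tau - A)      # intercept absorbed by tau

all.equal(with.intercept, shifted.thresholds)
```

Because the two parameterizations produce identical probabilities, the data cannot distinguish them, which is why the intercept is left out of $X$ when all thresholds are estimated.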

15 A WORKING EXAMPLE: COOPERATION ON SANCTIONS
Lisa Martin (1992) asks: what determines cooperation on sanctions?
Her dependent variable, Coop, measures cooperation on a four-point scale.

16 ORDERED PROBIT: COOPERATION ON SANCTIONS
Load the data in R:
    library(Zelig)
    data(sanction)
    head(sanction)
[Output of head(sanction): columns mil, coop, target, import, export, cost, num, ncost; ncost takes labels such as "major loss", "modest loss", and "little effect". Values not recoverable.]

17 ORDERED PROBIT: COOPERATION ON SANCTIONS
We're going to look at the covariates:
- target, which is a measure of the economic health and political stability of the target country
- cost, which is a measure of the cost of the sanctions
- mil, which is a measure of whether or not there is military action in addition to the sanction

18 ORDERED PROBIT: USING ZELIG
We estimate the model using the oprobit model in Zelig:
    z.out <- zelig(factor(coop) ~ target + cost + mil, model="oprobit", data=sanction)
Note that you could use model = "ologit" for the ordered logit and get similar inferences.

19 ORDERED PROBIT: USING ZELIG
What does the output look like?
    z.out
[Output of z.out: the call is echoed, followed by a Coefficients table (Value, Std. Error, t value) for target, cost, and mil, and an Intercepts table with the same columns. Values not recoverable.]
These are a little hard to interpret, so we turn to our bag of tricks.

20 ORDERED PROBIT: USING ZELIG
Suppose we want to compare cooperation when there is or is not military action in addition to the sanction.
    x.low <- setx(z.out, mil = 0)
    x.high <- setx(z.out, mil = 1)

21 ORDERED PROBIT: USING ZELIG
Now we can simulate values using these hypothetical military involvements:
    s.out <- sim(z.out, x = x.low, x1 = x.high)
    summary(s.out)
[Output of summary(s.out): model oprobit, 1000 simulations; values of X and X1 for (Intercept), target, cost, and mil; expected values P(Y=j|X) with mean, sd, 2.5%, and 97.5% columns. Values not recoverable.]

22 ORDERED PROBIT: USING ZELIG
And then you can use the plot(s.out) command to visualize.
[Figure: three panels. Predicted Values: Y|X (percentage of simulations for Y=1 through Y=4); Expected Values: P(Y=j|X) (density); First Differences: P(Y=j|X1)-P(Y=j|X) (density).]

23 ORDERED PROBIT: WITHOUT ZELIG
Make a matrix for the y's indicating which category each one is in:
    y <- sanction$coop
    # Find all of the unique categories of y
    y0 <- sort(unique(y))
    m <- length(y0)
    Z <- matrix(NA, nrow(sanction), m)
    # Fill in our matrix with logical values if
    # the observed value is in each category
    # Remember R can treat logical values as 0/1s
    for (j in 1:m){Z[,j] <- y==y0[j]}
    X <- cbind(sanction$target, sanction$cost, sanction$mil)

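24 ORDERED PROBIT: WITHOUT ZELIG
Create the log-likelihood function:
    ll.oprobit <- function(par, Z, X){
      # The first ncol(X) parameters are the betas; the rest are thresholds
      beta <- par[1:ncol(X)]
      tau <- par[(ncol(X)+1):length(par)]
      ystarmu <- X%*%beta
      m <- length(tau) + 1
      probs <- cprobs <- matrix(nrow=length(ystarmu), ncol=m)
      # Cumulative probabilities Pr(y* <= tau_j)
      for (j in 1:(m-1)){cprobs[,j] <- pnorm(tau[j] - ystarmu)}
      # Category probabilities are differences of cumulative probabilities
      probs[,m] <- 1-cprobs[,m-1]
      probs[,1] <- cprobs[,1]
      for (j in 2:(m-1)){probs[,j] <- cprobs[,j] - cprobs[,(j-1)]}
      # Z is logical, so probs[Z] picks each observation's observed category
      sum(log(probs[Z]))
    }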

25 ORDERED PROBIT: WITHOUT ZELIG
Optimize:
    par <- c(rep(1,3),0,1,2)
    out <- optim(par, ll.oprobit, Z=Z, X=X, method="BFGS", control=list(fnscale=-1))
    out$par
[Output of out$par not recoverable.]
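Since the sanctions output is not shown, one way to see that this approach recovers the parameters is to run it on simulated data where the truth is known. The sketch below is an illustration under assumptions, not the section's own code: the covariates, true coefficients, thresholds, and starting values are invented, the likelihood is a vectorized rewrite of the same formula, and the guard against crossed thresholds is an addition made for this sketch.

```r
set.seed(2001)
n <- 2000
X <- cbind(rnorm(n), rnorm(n))     # two simulated covariates, no intercept column
beta.true <- c(1, -0.5)            # hypothetical true coefficients
tau.true  <- c(-1, 0, 1)           # hypothetical thresholds: four categories

# Latent outcome, then the observed category it falls into
ystar <- as.vector(X %*% beta.true) + rnorm(n)
y <- 1 + (ystar > tau.true[1]) + (ystar > tau.true[2]) + (ystar > tau.true[3])
Z <- outer(y, 1:4, "==")           # logical n x 4 indicator matrix

ll.oprobit <- function(par, Z, X) {
  beta <- par[1:ncol(X)]
  tau  <- par[(ncol(X) + 1):length(par)]
  cuts <- c(-Inf, tau, Inf)
  cdf  <- pnorm(outer(-as.vector(X %*% beta), cuts, "+"))
  p <- (cdf[, -1] - cdf[, -ncol(cdf)])[Z]   # probability of each observed category
  # Guard added for this sketch: reject crossed thresholds or zero probabilities
  if (!all(is.finite(p) & p > 0)) return(-Inf)
  sum(log(p))
}

start <- c(0, 0, -0.5, 0.1, 0.6)   # thresholds must start in increasing order
out <- optim(start, ll.oprobit, Z = Z, X = X, method = "BFGS",
             control = list(fnscale = -1))
round(rbind(truth = c(beta.true, tau.true), estimate = out$par), 2)
```

With a sample this large the estimates should land close to the true values; on real data the same comparison is not available, which is why the simulated check is useful.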

27 WHAT IS ZERO-INFLATION?
Let's return to binary data. What if we knew that something in our data was mismeasured? For example, what if we thought that some of our data were systematically zero rather than randomly zero? This could be when:
1. Some data are spoiled or lost.
2. Survey respondents put zero to an ordered answer on a survey just to get it done.
If our data are mismeasured in some systematic way, our estimates will be off.

28 A WORKING EXAMPLE: FISHING
You're trying to figure out the probability of catching a fish in a lake from a survey. People were asked:
- How many children were in the group
- How many people were in the group
- Whether they caught a fish

29 A WORKING EXAMPLE: FISHING
The problem is, some people didn't even fish! These people have systematically zero fish.

30 ZERO-INFLATED LOGIT MODEL
We're going to assume that whether or not the person fished is the outcome of a Bernoulli trial.
$$Y_i = \begin{cases} 0 & \text{with probability } \psi_i \\ \text{Logistic} & \text{with probability } 1 - \psi_i \end{cases}$$
$\psi_i$ is the probability that you do not fish.
This is a mixture model because our data are a mix of these two types of groups, each with its own data generation process.

31 ZERO-INFLATED LOGIT MODEL
Given that you fished, the logistic model is what we have done before:
1. $Y_i \sim f_{bern}(y_i \mid \pi_i)$.
2. $\pi_i = \frac{1}{1 + e^{-X_i\beta}}$
3. $Y_i$ and $Y_j$ are independent for all $i \neq j$.
So the probability that $Y$ is 0:
$$P(Y_i = 0 \mid \text{fished}) = 1 - \frac{1}{1 + e^{-X_i\beta}}$$
and the probability that $Y$ is 1:
$$P(Y_i = 1 \mid \text{fished}) = \frac{1}{1 + e^{-X_i\beta}}$$

32 ZERO-INFLATED LOGIT MODEL
Given that you did not fish, what is the model?
So the probability that $Y$ is 0:
$$P(Y_i = 0 \mid \text{not fished}) = 1$$
and the probability that $Y$ is 1:
$$P(Y_i = 1 \mid \text{not fished}) = 0$$

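33 ZERO-INFLATED LOGIT MODEL
We can write out the distribution of $Y_i$ as (stochastic component):
$$P(Y_i = y_i \mid \beta, \psi_i) = \begin{cases} \psi_i + (1 - \psi_i)\left(1 - \frac{1}{1 + e^{-X_i\beta}}\right) & \text{if } y_i = 0 \\ (1 - \psi_i)\left(\frac{1}{1 + e^{-X_i\beta}}\right) & \text{if } y_i = 1 \end{cases}$$
And we can put covariates on $\psi$ (systematic component):
$$\psi_i = \frac{1}{1 + e^{-z_i\gamma}}$$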
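The two-step data generation process behind this mixture can be simulated to confirm the probabilities. The sketch below uses made-up constant values for $\psi$ and $\pi$ (no covariates) purely for illustration; the variable names are hypothetical.

```r
set.seed(2001)
n <- 100000
psi     <- 0.3    # hypothetical probability of not fishing
pi.fish <- 0.6    # hypothetical probability of a catch, given fishing

# Step 1: Bernoulli draw for whether each group fished at all
fished <- rbinom(n, 1, 1 - psi)
# Step 2: groups that fished catch a fish with probability pi.fish;
# groups that did not fish are structural zeros
y <- fished * rbinom(n, 1, pi.fish)

# Mixture probabilities from the formula
p1 <- (1 - psi) * pi.fish
p0 <- psi + (1 - psi) * (1 - pi.fish)

c(simulated = mean(y), formula = p1)
c(simulated = mean(y == 0), formula = p0)
```

The zeros come from two sources: groups that never fished and groups that fished but caught nothing. A plain logit would attribute all of them to the second source.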

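34 ZERO-INFLATED LOGIT: DERIVING THE LIKELIHOOD
The likelihood function is proportional to the probability of $Y_i$:
$$\begin{aligned}
L(\beta, \gamma \mid Y_i) &\propto P(Y_i \mid \beta, \gamma) \\
&= \left[\psi_i + (1 - \psi_i)\left(1 - \frac{1}{1 + e^{-X_i\beta}}\right)\right]^{1 - Y_i} \left[(1 - \psi_i)\left(\frac{1}{1 + e^{-X_i\beta}}\right)\right]^{Y_i} \\
&= \left[\frac{1}{1 + e^{-z_i\gamma}} + \left(1 - \frac{1}{1 + e^{-z_i\gamma}}\right)\left(1 - \frac{1}{1 + e^{-X_i\beta}}\right)\right]^{1 - Y_i} \left[\left(1 - \frac{1}{1 + e^{-z_i\gamma}}\right)\left(\frac{1}{1 + e^{-X_i\beta}}\right)\right]^{Y_i}
\end{aligned}$$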

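35 ZERO-INFLATED LOGIT: DERIVING THE LIKELIHOOD
Multiplying over all observations we get:
$$L(\beta, \gamma \mid Y) = \prod_{i=1}^{n} \left[\frac{1}{1 + e^{-z_i\gamma}} + \left(1 - \frac{1}{1 + e^{-z_i\gamma}}\right)\left(1 - \frac{1}{1 + e^{-X_i\beta}}\right)\right]^{1 - Y_i} \left[\left(1 - \frac{1}{1 + e^{-z_i\gamma}}\right)\left(\frac{1}{1 + e^{-X_i\beta}}\right)\right]^{Y_i}$$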

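36 ZERO-INFLATED LOGIT: DERIVING THE LIKELIHOOD
Taking the log we get:
$$\begin{aligned}
l(\beta, \gamma) &= \sum_{i=1}^{n} \left\{ Y_i \ln\left[(1 - \psi_i)\left(\frac{1}{1 + e^{-X_i\beta}}\right)\right] + (1 - Y_i) \ln\left[\psi_i + (1 - \psi_i)\left(1 - \frac{1}{1 + e^{-X_i\beta}}\right)\right] \right\} \\
&= \sum_{i=1}^{n} \left\{ Y_i \ln\left[\left(1 - \frac{1}{1 + e^{-z_i\gamma}}\right)\left(\frac{1}{1 + e^{-X_i\beta}}\right)\right] + (1 - Y_i) \ln\left[\frac{1}{1 + e^{-z_i\gamma}} + \left(1 - \frac{1}{1 + e^{-z_i\gamma}}\right)\left(1 - \frac{1}{1 + e^{-X_i\beta}}\right)\right] \right\}
\end{aligned}$$
How many parameters do we need to estimate?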

37 LET'S PROGRAM THIS IN R
Load and get the data ready:
    fish <- read.table("", sep=",", header=TRUE)  # data location missing in the source
    X <- fish[c("child", "persons")]
    Z <- fish[c("persons")]
    X <- as.matrix(cbind(1,X))
    Z <- as.matrix(cbind(1,Z))
    y <- ifelse(fish$count>0,1,0)

38 LET'S PROGRAM THIS IN R
Write out the log-likelihood function:
    ll.zilogit <- function(par, X, Z, y){
      beta <- par[1:ncol(X)]
      gamma <- par[(ncol(X)+1):length(par)]
      # phi plays the role of psi: the probability of a structural zero
      phi <- 1/(1+exp(-Z%*%gamma))
      # pie: probability of catching a fish given that the group fished
      pie <- 1/(1+exp(-X%*%beta))
      sum(y*log((1-phi)*pie) + (1-y)*(log(phi + (1-phi)*(1-pie))))
    }

39 LET'S PROGRAM THIS IN R
Optimize to get the results:
    par <- rep(1,(ncol(X)+ncol(Z)))
    out <- optim(par, ll.zilogit, Z=Z, X=X, y=y, method="BFGS", control=list(fnscale=-1), hessian=TRUE)
    out$par
[Output of out$par not recoverable.]

40 PLOTTING TO SEE THE RELATIONSHIP
These numbers don't mean a lot to us, so we can plot the predicted probabilities of a group having not fished (i.e., predict $\psi$). First, we have to simulate our gammas:
    varcv.par <- solve(-out$hessian)
    library(mvtnorm)
    sim.pars <- rmvnorm(10000, out$par, varcv.par)
    # Subset to only the parameters we need (gammas)
    # Better to simulate all though
    sim.z <- sim.pars[,(ncol(X)+1):length(par)]
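For readers who want to see what the multivariate normal draw is doing, the same kind of simulation can be sketched in base R with a Cholesky factor. This is an illustrative alternative, not the section's method: the mean vector and variance-covariance matrix below are invented stand-ins for `out$par` and `varcv.par`.

```r
set.seed(2001)
mu <- c(0.5, -1)                                 # hypothetical point estimates
V  <- matrix(c(0.04, 0.01, 0.01, 0.09), 2, 2)    # hypothetical variance-covariance
n.sims <- 10000

# chol() returns upper-triangular R with t(R) %*% R equal to V,
# so standard normal draws times R have covariance V
R <- chol(V)
draws <- matrix(rnorm(n.sims * 2), n.sims, 2) %*% R
draws <- sweep(draws, 2, mu, "+")                # shift each column to its mean

colMeans(draws)
cov(draws)
```

The column means and sample covariance of `draws` should be close to `mu` and `V`, which is the property the simulation step relies on.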

41 PLOTTING TO SEE THE RELATIONSHIP
We then generate predicted probabilities of not fishing for different sized groups.
    person.vec <- seq(1,4)
    Zcovariates <- cbind(1, person.vec)
    exp.holder <- matrix(NA, ncol=4, nrow=10000)
    for(i in 1:length(person.vec)){
      exp.holder[,i] <- 1/(1+exp(-Zcovariates[i,]%*%t(sim.z)))
    }

42 PLOTTING TO SEE THE RELATIONSHIP
Using these numbers, we can plot the densities of probabilities to get a sense of the probability and the uncertainty.
    plot(density(exp.holder[,4]), col="blue", xlim=c(0,1),
         main="Probability of a Structural Zero", xlab="Probability")
    lines(density(exp.holder[,3]), col="red")
    lines(density(exp.holder[,2]), col="green")
    lines(density(exp.holder[,1]), col="black")
    legend(.7,12, legend=c("One Person", "Two People", "Three People", "Four People"),
           col=c("black", "green", "red", "blue"), lty=1)

43 PLOTTING TO SEE THE RELATIONSHIP
[Figure: Probability of a Structural Zero. Density curves over Probability for One Person, Two People, Three People, and Four People.]

45 BINOMIAL MODEL
Suppose our dependent variable is the number of successes in a series of independent trials. For example:
- The number of heads in 10 coin flips.
- The number of times you voted in the last six elections.
- The number of Supreme Court cases the government won in the last ten decisions.
We can use a generalization of the binary model to study these processes.

46 BINOMIAL MODEL
Stochastic Component
$$Y_i \sim \text{Binomial}(y_i \mid \pi_i)$$
$$P(Y_i = y_i \mid \pi_i) = \binom{N}{y_i} \pi_i^{y_i} (1 - \pi_i)^{N - y_i}$$
- $\pi_i^{y_i}$: There are $y_i$ successes, each with probability $\pi_i$.
- $(1 - \pi_i)^{N - y_i}$: There are $N - y_i$ failures, each with probability $1 - \pi_i$.
- $\binom{N}{y_i}$: Number of ways to distribute $y_i$ successes in $N$ trials; order of successes does not matter.

47 BINOMIAL MODEL
Systematic Component
$$\pi_i = \frac{1}{1 + e^{-x_i\beta}}$$
Why?

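48 BINOMIAL MODEL
Derive the likelihood:
$$\begin{aligned}
L(\pi_i \mid y_i) &= P(y_i \mid \pi_i) \\
&= \prod_{i=1}^{n} \binom{N}{y_i} \pi_i^{y_i} (1 - \pi_i)^{N - y_i} \\
\ln L(\pi_i \mid y_i) &= \sum_{i=1}^{n} \left[ \ln\binom{N}{y_i} + \ln \pi_i^{y_i} + \ln(1 - \pi_i)^{N - y_i} \right] \\
&\doteq \sum_{i=1}^{n} \left[ y_i \ln \pi_i + (N - y_i) \ln(1 - \pi_i) \right]
\end{aligned}$$
The last line drops $\ln\binom{N}{y_i}$, which does not depend on $\pi_i$ and so does not change where the maximum is.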
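The role of the binomial coefficient in this derivation can be checked numerically. The following sketch uses a handful of made-up counts and probabilities; it compares the full log-likelihood with R's built-in `dbinom` and confirms that the simplified version differs only by a constant that does not involve $\pi_i$.

```r
N <- 10
y <- c(3, 7, 10, 0, 5)             # hypothetical success counts
p <- c(0.2, 0.6, 0.9, 0.1, 0.5)    # hypothetical success probabilities

# Full log-likelihood, including the binomial coefficient
full <- sum(lchoose(N, y) + y * log(p) + (N - y) * log(1 - p))

# Simplified log-likelihood without the coefficient
kernel <- sum(y * log(p) + (N - y) * log(1 - p))

# The full version matches R's binomial log-density
all.equal(full, sum(dbinom(y, N, p, log = TRUE)))

# The two versions differ by a constant that depends only on N and y
all.equal(full - kernel, sum(lchoose(N, y)))
```

Because the difference is the same for any value of `p`, maximizing either version gives the same estimates.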

49 BINOMIAL MODEL
We can operationalize this in R by coding the log-likelihood up ourselves. First, let's make up some data to play with:
    library(boot)  # provides inv.logit
    x1 <- rnorm(1000,0,1)
    x2 <- rnorm(1000,9,.5)
    pi <- inv.logit(-5 + .4*x1 + .6*x2)
    y <- rbinom(1000,10,pi)

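50 BINOMIAL MODEL
Write out the log-likelihood function:
    ll.binom <- function(par, N, X, y){
      # Inverse logit of the linear predictor
      pi <- 1/(1 + exp(-1*X%*%par))
      # Binomial log-likelihood without the constant binomial coefficient
      out <- sum(y * log(pi) + (N - y)*log(1-pi))
      return(out)
    }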

51 BINOMIAL MODEL
Optimize to get the results:
    my.optim <- optim(par = c(0,0,0), fn = ll.binom, y = y, X = cbind(1,x1,x2), N = 10,
                      method = "BFGS", control=list(fnscale=-1), hessian=TRUE)
    my.optim$par
[Output of my.optim$par not recoverable.]
Given that pi <- inv.logit(-5 + .4*x1 + .6*x2), the output doesn't look too bad.
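One way to sanity-check a hand-coded binomial likelihood is to compare it with `glm`, which fits the same model when given a two-column response of successes and failures. The sketch below is self-contained and makes a few assumptions of its own: it regenerates data with a fixed seed, and it uses base R's `plogis` in place of `inv.logit` so that no extra package is needed.

```r
set.seed(2001)
x1 <- rnorm(1000, 0, 1)
x2 <- rnorm(1000, 9, .5)
truth <- c(-5, .4, .6)
p <- plogis(truth[1] + truth[2] * x1 + truth[3] * x2)  # plogis is the inverse logit
y <- rbinom(1000, 10, p)

# glm takes cbind(successes, failures) as a binomial response
fit <- glm(cbind(y, 10 - y) ~ x1 + x2, family = binomial)
coef(fit)

# The same log-likelihood as coded by hand
ll.binom <- function(par, N, X, y) {
  pi <- 1 / (1 + exp(-1 * X %*% par))
  sum(y * log(pi) + (N - y) * log(1 - pi))
}
X <- cbind(1, x1, x2)

# The maximum likelihood estimates should score at least as well as the truth
c(at.mle = ll.binom(coef(fit), 10, X, y), at.truth = ll.binom(truth, 10, X, y))
```

If `optim` is run on `ll.binom` with the same data, its estimates can be compared against `coef(fit)`; the intercept tends to be the least precisely estimated because `x2` is centered far from zero.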
