Multivariate probit regression using simulated maximum likelihood

Size: px

Start display at page:

Download "Multivariate probit regression using simulated maximum likelihood"

Charla Small
5 years ago
Views:

1 Multivariate probit regression using simulated maximum likelihood Lorenzo Cappellari & Stephen P. Jenkins ISER, University of Essex 1

2 Overview Introduction and motivation The model and the method of Simulated Maximum Likelihood (SML) The mvprobit program (ml, method lf) Illustrations mvprobit in action Further remarks about program use 2

3 Introduction and motivation Evaluation of probit model likelihood functions requires calculation of Normal probability distribution functions. Algorithms exist for accurately calculating accurate univariate and bivariate Normal pdfs, but not for trivariate or higher dimensional Normal distributions (at least not in Stata). Instead,... Recent literature on calculating multivariate Normal pdfs using simulation-based methods Here: multivariate probit model estimated using simulated ML ( GHK simulator): mvprobit cf. triprobit at SSC-IDEAS 3

4 The model M equation multivariate probit model: y * im = β m X im + ε im, m = 1,, M y im = 1 if y * im > 0 and 0 otherwise ε im, m = 1,, M, are error terms distributed as multivariate normal, each with a mean of zero, and variance-covariance matrix V, where V has values of 1 on the leading diagonal and correlations ρ jk = ρ kj as off-diagonal elements. Structure like a SUR model but depvars are binary (and need not have same set of X in every equation) M different choices at a point in time OR choices on one item at M points in time (panel model with free correlations) 4

5 Estimation principles (M = 3) Log-likelihood function for a sample of N independent observations: L = Σ i w i logφ 3 (µ i ; Ω) w i is an optional weight Φ 3 (µ i ; Ω) is standard trivariate normal cdf, where µ i = (K i1 β 1 X i1, K i2 β 2 X i2, K i3 β 3 X i3 ) with K ik = 2y ik 1, for each j, k = 1,,3. Ω has elements Ω jk, where Ω jj = 1 for j = 1,, 3; Ω 21 = Ω 12 = K i1 K i2 ρ 21, Ω 31 = Ω 13 = K i3 K i1 ρ 31, Ω 32 = Ω 23 = K i3 K i2 ρ 32. Log-likelihood function depends on the trivariate standard normal distribution function Φ 3 (.)! Evaluate using Geweke-Hajivassiliou-Keane (GHK) smooth recursive conditioning simulator 5

6 The GHK simulator Exploits the fact that a multivariate normal distribution function can be expressed as the product of sequentially conditioned univariate normal distribution functions which can be easily and accurately evaluated. Trivariate case: 8 joint probabilities corresponding to the eight possible combinations of successes (y im = 1) and failures (y im = 0). Focus on Pr(every outcome is a success): Pr(y 1 = 1, y 2 = 1, y 3 = 1) = Pr(ε 1 β 1 X 1, ε 2 β 2 X 2, ε 3 β 3 X 3 ) = Pr(ε 3 β 3 X 3 ε 2 < β 2 X 2, ε 1 < β 1 X 1 ) Pr(ε 2 < β 2 X 2 ε 1 < β 1 X 1 ) Pr(ε 1 < β 1 X 1 ) Expression involves conditioning upon unobservable variables (that are correlated with each other). However if a good approximation for these conditional distributions can be found, then the likelihood function only requires evaluation of univariate integrals. How may the approximations be derived? 6

7 The GHK simulator (ctd.) Cholesky decomposition of the covariance matrix for the errors: E(εε ) V = Cee C where C is the lower triangular Cholesky matrix corresponding to V and e ~ Φ 3 (0, I 3 ), i.e. three uncorrelated standard normal variates. Hence: ε 1 = C 11 e 1 ε 2 = C 11 e 1 + C 22 e 2 ε 3 = C 31 e 1 + C 32 e 2 + C 33 e 3 and C jk is the jkth element of matrix C. rewrite Pr(three successes) as Pr(ε 1 β 1 X 1, ε 2 β 2 X 2, ε 3 β 3 X 3 ) = Pr[e 3 (β 3 X 3 C 32 e 2 C 31 e 1 )/C 33 e 2 < (β 2 X 2 C 21 e 1 )/C 22, e 1 β 1 X 1 /C 11 ] Pr[e 2 (β 2 X 2 C 21 e 1 )/C 22 e 1 β 1 X 1 /C 11 ] Pr[e 1 β 1 X 1 /C 11 ]. The standard normal variates, e, that now appear in the decomposition are uncorrelated with each other (by construction). The first two conditional probabilities can be further rewritten as unconditional probabilities defined in terms of truncated standard7 normal variates:.

8 The GHK simulator (ctd.) Rewriting in terms of truncated standard normal variates: Pr(ε 1 β 1 X 1, ε 2 β 2 X 2, ε 3 β 3 X 3 ) = Pr[ε 3 (β 3 X 3 C 32 e * 2 C 31 e 1* )/C 33 ] Pr[ε 2 (β 2 X 2 C 21 e 1* )/C 22 ] Pr[ε 1 β 1 X 1 /C 11 ] = Q 3 Q 2 Q 1, say, where e 1 * and e 2 * are truncated univariate standard normal variates with upper truncation points at β 1 X 1 /C 11 and (β 2 X 2 C 21 e 1* )/C 22 respectively. Computation of Q 1 is straightforward and, if one had some specific values for e * 1 and e 2*, then one could also compute Q 2 and Q 3, and hence the overall multivariate probability. Similar arguments for all the other probabilities of success/failure (and for M > 3). 8

9 The GHK simulator (ctd.) GHK simulator: (i) Derive values for e 1*, e * 2 via random draws from upper truncated standard normal distributions with truncation points as above (use random number generator, plus inversion formula); (ii) Recursively compute a multivariate probability value from the Qs. (iii) Replicate R times, and then (iv) Calculate the simulated probability as the arithmetic mean of the values of the simulated probabilities from each replication. SML estimator: ML with the multivariate normal probabilities calculated at each iteration using the GHK simulator numerically intensive! SML estimator consistent as N and R. (Asymptotic, like ML!) Simulation bias 0 when R raised with N. [ (R / N) sufficient.] 9

10 The mvprobit program mvprobit equation1 equation2... equationm [weight] [if exp] [in range] [, draws(#) seed(#) beta0 atrho0(matrix_name) robust cluster(varname) constraints(numlist) level(#) maximize_options ] where each equation is specified as ( [eqname:] depvar [=] [varlist] [, noconstant] ) by may be used; all types of weights allowed; has standard features of estimation commands including access to estimated results; no limit on M (in principle) Options draws(#): number of random draws in simulation (R). Default = 5. seed(#): initial value of random-number seed used in simulation process. (Default = ). beta0: estimates of the marginal probit regressions are reported. atrho0(matrix_name): starting values for the off-diagonal elements of the correlation matrix V that differ from the default starting values (all zero). Remaining options: same as the corresponding ones for biprobit. 10

11 Prediction using mvppred mvppred newvarname_prefix [if exp] [in range] [, statistic] where statistic is one of xb the linear prediction for each equation; the default. stdp the standard error of the linear predictions for each equation. pmarg the marginal success probability for each equation. pall the joint probabilities: (i) Pr(y im = 1, for all m = 1,,M), and (ii) Pr(y im = 0, for all m = 1,,M). Only one statistic may be chosen at a time. For statistics xb, stdp, and pmarg, results are stored in the variables newvarname_prefixi, for equations i = 1,, M. For the pall statistics, results are stored in the variables newvarname_prefix1s for predicted probability (i) and newvarname_prefix0s for predicted probability (ii). [Options for prediction restricted to the all successes and all failures cases = the only two cases that could be programmed without M being fixed. (Number of joint probabilities is 2 M.)] 11

12 Illustrations (1) (i) syntax, options etc., and (ii) accuracy of SML relative to ML (mvprobit vs. biprobit) School data (Pyndyck & Rubenfeld; Stata manuals), N = 95, M = 2 (2) Simulated data, N = 5000, M = 4. School data: private: whether children attend a private school, vote: whether the household head had voted for an increase in the property tax, years: # years family has been at the present residence logptax: log(property tax); loginc:log(income). 12

13 biprobit estimates:. use clear. biprobit (private=years logptax loginc) (vote=years logptax loginc), nolog Seemingly unrelated bivariate probit Number of obs = 95 Wald chi2(6) = 9.59 Log likelihood = Prob > chi2 = Coef. Std. Err. z P> z [95% Conf. Interval] private years logptax loginc _cons vote years logptax loginc _cons /athrho rho Likelihood ratio test of rho=0: chi2(1) = Prob > chi2 = predict p11,p11. predict p00,p00. predict xbb1, xb1. predict xbb2, xb2. predict stdpb1, stdp1. predict stdpb2, stdp2. predict pmargb1, pmarg1. predict pmargb2, pmarg2 Predictions of joint & marginal probabilities, linear predictions, etc. 13

14 mvprobit estimates (R = 100) are close to ML estimates:. mvprobit (private = years logptax loginc) (vote=years logptax loginc), nolog > dr(100) Multivariate probit (SML, # draws = 100) Number of obs = 95 Wald chi2(6) = 9.64 Log likelihood = Prob > chi2 = Coef. Std. Err. z P> z [95% Conf. Interval] private years logptax loginc _cons vote years logptax loginc _cons /atrho rho Likelihood ratio test of rho21 = 0: chi2(1) = Prob > chi2 = mvppred pall, pall (Pr(all zeros), Pr(all ones) will be stored in variables pall0s, pall1s). mvppred xbm, xb (xb will be stored in variables xbmi, i = 1,...,#eqs). mvppred stdpm, stdp (stdp will be stored in variables stdpmi, i = 1,...,#eqs). mvppred pmargm, pmarg (pmarg will be stored in variables pmargmi, i = 1,...,#eqs) Predictions 14

15 SML predictions very similar to ML counterparts. su pall1s p11 pall0s p00 xbm1 xbb1 xbm2 xbb2 /* > */ stdpm1 stdpb1 stdpm2 stdpb2 pmargm1 pmargb1 pmargm2 pmargb2 Variable Obs Mean Std. Dev. Min Max pall1s p pall0s p xbm xbb xbm xbb stdpm stdpb stdpm stdpb pmargm pmargb pmargm pmargb Less similarity between SML and ML with smaller R, we found. More on choice of R below. N = 95 is small, so potential finite sample biases anyway. Raise N and M in simulated data example 15

16 Simulated data: M = 4, N = set seed set obs 5000 obs was 0, now matrix R = (1,.25,.5,.75 \.25, 1,.75,.5 \.5,.75, 1,.75 \.75,.5,.7 > 5, 1). drawnorm u1 u2 u3 u4, corr(r). corr u* (obs=5000) u1 u2 u3 u u u u u Correlation structure (V). ge x1 = uniform()-.5. ge x2 = uniform() + 1/3. ge x3 = 2*uniform() +.5. ge x4 =.5*uniform() - 1/3. * Equations. ge y1s =.5 + 4*x1 + u1. ge y2s = 3 +.5*x1-3*x2 + u2. ge y3s = 1-2*x1 +.4*x2 -.75*x3 + u3. ge y4s = *x1 -.3*x2 + 3*x3 -.4*x4 + u4.. ge y1 = y1s>0. ge y2 = y2s>0. ge y3 = y3s>0. ge y4 = y4s>0 Equations 16

17 . mvprobit (y1=x1) (y2=x1 x2) (y3 = x1 x2 x3)(y4=x1 x2 x3 x4), dr(75) mvprobit estimates (R = 75) Estimates quite close to those in the true model NB warning message at iteration 1 not a problem as model later converged LR test of MVP against M independent univariate probits (V identity matrix) Iteration 0: log likelihood = Warning: cannot do Cholesky factorization of rho matrix Iteration 1: log likelihood = Iteration 2: log likelihood = Iteration 3: log likelihood = Iteration 4: log likelihood = Iteration 5: log likelihood = Multivariate probit (SML, # draws = 75) Number of obs = 5000 Wald chi2(10) = Log likelihood = Prob > chi2 = Coef. Std. Err. z P> z [95% Conf. Interval] y1 x _cons y2 x x _cons y3 x x x _cons y4 x x x x _cons /atrho /atrho /atrho /atrho /atrho /atrho rho rho rho rho rho rho Likelihood ratio test of rho21 = rho31 = rho41 = rho32 = rho42 = rho43 = 0: chi2(6) = Prob > chi2 =

18 Using mvprobit: further remarks Choosing number of random draws: higher R increases accuracy but also run-time; Choosing seed (different random numbers give different simulated probabilities) Re-estimated the earlier models for each value of R = 1,, 150, and several alternative seed values 18

19 SML estimate of ρ 21 : R = 1,, 150; seed = , 999 (a) two-equation model, generated data (N = 5000) Seed = Seed = 999 ρ 21 ( true ) rho21.2 rho Number of random draws Number of random draws (a) two-equation model, School data (N = 95) Seed = Seed = rho rho ρ 21 (ML) Number of random draws Number of random draws With large N, SML estimator OK regardless of seed when R > N 19

20 Using mvprobit: further remarks Run-time: Four-equation model took c hours using Stata 7/SE on Pentium P4/1.4Ghz; c. 5.3 hours using Stata 7/IC on Sun Solaris) increases linearly in R (for given N, M) increasing number of explanatory variables, ceteris paribus, has no big effect atrho option use reduced run-time relatively little (since got good estimates of V within a few iterations anyway) Runtime increases substantially as M increases (several days/weeks if N = 1000s and M > 6 or 7!) Capacity: R*M temporary variables created: set memory For large models (many covariates, many M): set matsize 20

Multivariate probit regression using simulated maximum likelihood

The Stata Journal (2003) 3, Number 3, pp. 278 294 Multivariate probit regression using simulated maximum likelihood Lorenzo Cappellari Università del Piemonte Orientale and University of Essex Stephen