Florian Heiss; Viktor Winschel: Estimation with Numerical Integration on Sparse Grids


Florian Heiss; Viktor Winschel: Estimation with Numerical Integration on Sparse Grids. Munich Discussion Paper No. 2006-15, Department of Economics, University of Munich, Volkswirtschaftliche Fakultät, Ludwig-Maximilians-Universität München. Online at http://epub.ub.uni-muenchen.de/916/

Estimation with Numerical Integration on Sparse Grids

Florian Heiss    Viktor Winschel

April 12, 2006

Abstract

For the estimation of many econometric models, integrals without analytical solutions have to be evaluated. Examples include limited dependent variables and nonlinear panel data models. In the case of one-dimensional integrals, Gaussian quadrature is known to work efficiently for a large class of problems. In higher dimensions, similar approaches discussed in the literature are either very specific and hard to implement or suffer from exponentially rising computational costs in the number of dimensions, a problem known as the curse of dimensionality of numerical integration. We propose a strategy that shares the advantages of Gaussian quadrature methods, is very general and easily implemented, and does not suffer from the curse of dimensionality. Monte Carlo experiments for the random parameters logit model indicate the superior performance of the proposed method over simulation techniques.

JEL Classification: C15, C25, C51
Keywords: Estimation, Quadrature, Simulation, Mixed Logit

We would like to thank Alexander Ludwig, Axel Börsch-Supan, Melanie Lührmann, Daniel McFadden, Paul Ruud, and Joachim Winter for valuable comments and suggestions.
University of Munich, Department of Economics, mail@florian-heiss.de
University of Mannheim, Department of Economics, winschel@rumms.uni-mannheim.de

1 Introduction

Many econometric models imply likelihood and moment functions that involve multidimensional integrals without analytically tractable solutions. This problem arises frequently in microeconometric latent dependent variable (LDV) models in which all or some of the endogenous variables are only partially observed. Other sources include unobserved heterogeneity in nonlinear models and models with dynamically optimizing agents.

There are different approaches for numerical integration. It is well known that Gaussian quadrature can perform very well in the case of one-dimensional integrals of smooth functions, as suggested by Butler and Moffitt (1982). Quadrature can be extended to multiple dimensions. The direct extension is a tensor product of one-dimensional quadrature rules. However, computing costs rise exponentially with the number of dimensions and become prohibitive for more than four or five dimensions. This phenomenon is also known as the curse of dimensionality of numerical integration. The main problem of this product rule is that the class of functions for which it delivers exact results is not restricted to polynomials of a given total order. Unlike in the univariate case, general efficient quadrature rules for this class are much harder to derive directly and often intractable (Judd 1998, Ch. 7.5; Cools 2003). They are therefore usually considered impractical for applied research (Bhat 2001, Geweke 1996). These problems led to the advancement and predominant use of simulation techniques for the numerical approximation of multidimensional integrals in the econometric literature, see for example McFadden (1989) or Börsch-Supan and Hajivassiliou (1993). Hajivassiliou and Ruud (1994) provide an overview of the general approaches to simulation, and Train (2003) provides a textbook treatment with a focus on discrete choice models, one of the major classes of models for which these methods were developed and frequently used.

This paper proposes and investigates the performance of a different approach that can be traced back to Smolyak (1963). Sparse grids integration (SGI) has been advanced in recent research in numerical mathematics, see for example Novak and Ritter (1999). The class of functions for which it is exact is confined to polynomials of a given total order. This dramatically decreases computational costs in higher dimensions. It is based on one-dimensional quadrature but extends it to higher dimensions in a more careful way than the tensor product rule. This implies that it is easily implemented and very general, since only one-dimensional quadrature nodes and weights have to be derived.

After discussing the general approaches to numerical integration, SGI is introduced in Section 3. Section 4 presents the Monte Carlo design and results. They directly address the question of interest for estimation: Which method delivers the best estimates with a given amount of computing costs? The experiments are based on panel data random parameters logit models, which are widely used in applied discrete choice analysis. They vary the panel data dimensions, the number of alternatives, the dimension of unobserved taste components, and the parameterization of the data generating process. The results show that the SGI methods clearly outperform simulations based on both pseudo- and quasi-random numbers. Section 5 concludes.

2 Univariate Numerical Integration

We start the discussion with univariate integration in order to introduce the notation and to lay the ground for an extension to the multivariate case. Write a general univariate integration problem as

    I_1[g] = ∫_Ω g(x) w(x) dx.    (1)

In limited dependent variable (LDV) and other econometric models, integrals often arise in the calculation of expected values. In this case, x represents a random variable, g(x) is a function which is in the nontrivial case nonlinear in x, and w(x) represents the p.d.f. of x with support Ω. The integral I_1[g] is the expected value of g(x). A leading example for this kind of problem in microeconometric panel data models are random effects (RE) models like the RE probit model discussed for example by Butler and Moffitt (1982): x represents the individual random effect, g(x) is the probability of the sequence of observed outcomes conditional on the explanatory variables, parameters, and x. The integral I_1[g] is the marginal (with respect to x) probability. This value is needed for example for the evaluation of the likelihood function.

Since for nonlinear functions g(x) the integral has in general no closed-form solution, it has to be approximated numerically. One possible approach is Monte Carlo simulation. Given a number R of replications, a set of random numbers or nodes [x_1, ..., x_R] is generated such that each x_r is a draw from the distribution characterized by w(x). The simulated integral is then equal to

    S_{1,R}[g] = (1/R) Σ_{r=1}^{R} g(x_r).    (2)
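As a minimal sketch of equation (2), not taken from the paper, the following NumPy snippet simulates E[g(X)] for a standard normal X; the choice of g and the replication count R are arbitrary illustrations:

```python
import numpy as np

# Illustrative sketch of eq. (2): S_{1,R}[g] = (1/R) * sum_r g(x_r),
# where each x_r is a draw from the density w(x), here the standard normal.
rng = np.random.default_rng(0)

def simulate_integral(g, R):
    draws = rng.standard_normal(R)   # x_1, ..., x_R drawn from w(x)
    return g(draws).mean()           # equal-weight average of g(x_r)

# E[X^2] = 1 for X ~ N(0,1); the simulation error shrinks at the sqrt(R) rate.
approx = simulate_integral(lambda x: x**2, 100_000)
```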

Under very weak conditions, the simulated value is unbiased and √R-consistent by a law of large numbers. In the one-dimensional case, other strategies like simple Riemann sums with regularly-spaced nodes are straightforward; we will come back to equivalent strategies when discussing multiple integration.

A somewhat different strategy is taken by Gaussian and related quadrature rules. They provide a formula which delivers the exact integral for a class of functions (typically polynomials) that approximate g(x). Gaussian quadrature rules construct nodes and weights such that the rule is exact for polynomials of a given order with a minimal number of nodes. Butler and Moffitt (1982) suggest using Gaussian quadrature for the RE probit model and show evidence that it works very well compared to simulation.

Define a sequence of quadrature rules V = {V_i : i ∈ N}. Each rule V_i specifies a set of nodes X_i ⊂ R and a corresponding weight function w_i : X_i → R that is appropriate for w and Ω. The index i denotes the increasing level of accuracy, which depends on the number of nodes in X_i. The approximation of I_1[g] by V_i is then given as

    V_i[g] = Σ_{x ∈ X_i} g(x) w_i(x).    (3)

If V_i is a Gaussian rule with R nodes, then V_i[g] = I_1[g] if g is a polynomial of order 2R − 1 or less. The sets of nodes and weights depend on w and Ω, but not on g. While they are not trivial to determine, for the most common cases they are tabulated in the literature and efficient software is available, see for example Miranda and Fackler (2002) for implementations in Matlab. Given nodes and weights, Gaussian quadrature rules are straightforward to implement, since equation (3) merely requires calculating a weighted sum of function values.

For general functions g, the sequence of approximations V_i[g] converges to I_1[g] under weak assumptions as the number of function evaluations rises. A sufficient condition is that g is bounded and Riemann-integrable. There are various results concerning the speed of convergence for additional smoothness properties. For example, if g has n bounded derivatives, many Gaussian quadrature approximations converge to the true value at a rate of R^{−n}. This is much better than the √R-consistency of Monte Carlo simulation if g(x) is sufficiently smooth. A more detailed discussion of Gaussian quadrature can be found in the literature, see for example Davis and Rabinowitz (1984).
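Equation (3) with a Gauss-Hermite rule can be sketched as follows. This is an illustration, not the authors' code: NumPy's `hermgauss` targets the weight exp(−t²), so nodes and weights are rescaled here for the standard normal p.d.f.

```python
import numpy as np

def gauss_hermite_normal(g, R):
    # V_i[g] = sum over nodes of g(x) * w_i(x), eq. (3), with w = N(0,1) p.d.f.
    t, w = np.polynomial.hermite.hermgauss(R)    # rule for weight exp(-t^2)
    x = np.sqrt(2.0) * t                         # substitute x = sqrt(2) * t
    return np.sum((w / np.sqrt(np.pi)) * g(x))   # rescale weights to N(0,1)

# An R-node Gaussian rule is exact for polynomials of order 2R - 1 or less:
# with R = 3 nodes (fifth-order exactness) it recovers E[X^4] = 3 exactly.
val = gauss_hermite_normal(lambda x: x**4, 3)
```

Note that the weighted sum is all the rule requires at run time; the nontrivial work of finding nodes and weights is done once, inside `hermgauss`.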

3 Multivariate Numerical Integration

3.1 Formulation of the Problem

Many econometric models involve integration problems over several dimensions. In the RE probit model mentioned above, these problems arise for example if the error term follows a random process over time, has an error components structure, or multiple outcomes are modeled as in the multinomial probit model. In the Monte Carlo section below, we discuss a multinomial choice model with an error components structure, the mixed or random parameters logit model. Write the integration problem in the multivariate case as

    I_D[g] = ∫_{Ω_1} ⋯ ∫_{Ω_D} g(x_1, ..., x_D) w(x_1, ..., x_D) dx_D ⋯ dx_1.    (4)

We restrict the discussion to the case in which the weight function w(x_1, ..., x_D) can be decomposed as

    w(x_1, ..., x_D) = ∏_{d=1}^{D} w(x_d)    (5)

and where Ω_d = Ω for all d = 1, ..., D. In the interpretation with [x_1, ..., x_D] representing random variables and w(x_1, ..., x_D) their joint p.d.f., this restriction is equivalent to assuming that the random variables are independently and identically distributed. Independence is crucial for the remainder of this paper, while identical distributions are merely assumed for notational convenience to save on another subscript. This structure of the weighting function is less restrictive than it might seem. If independence is violated in the original formulation of the problem, a change of variables often leads to such a structure. If for example z denotes a vector of jointly normally distributed random variables with mean μ and covariance matrix Σ, then x = L^{−1}(z − μ) with LL′ = Σ is distributed i.i.d. standard normal.

Monte Carlo simulation is the most commonly used technique in the econometric literature for the numerical approximation of integrals of the form (4) in the multivariate case D > 1. The only difference to the univariate case in equation (2) is that for each replication, a vector [x_{1,r}, ..., x_{D,r}] is drawn from w(x_1, ..., x_D), at which the function g is evaluated. The result of √R-consistency is independent of the number of dimensions D. This does of course not imply properties of the approximation with a finite number of replications.
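The Cholesky change of variables described above can be sketched as follows. This is an illustration with arbitrary numbers, not from the paper: draws x that are i.i.d. standard normal are mapped to z = μ + Lx, which has the target correlated distribution, so the integrand only ever sees independent components.

```python
import numpy as np

# If z ~ N(mu, Sigma) and L L' = Sigma (Cholesky), then x = L^{-1}(z - mu)
# is i.i.d. standard normal; equivalently z = mu + L x.
mu = np.array([1.0, -2.0])                    # illustrative mean vector
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])                # illustrative covariance matrix
L = np.linalg.cholesky(Sigma)                 # lower-triangular factor

rng = np.random.default_rng(1)
x = rng.standard_normal((100_000, 2))         # independent standard normals
z = mu + x @ L.T                              # draws with the target law

mean_hat = z.mean(axis=0)                     # should be close to mu
cov_hat = np.cov(z, rowvar=False)             # should be close to Sigma
```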

Quasi-Monte Carlo methods and antithetic sampling algorithms distribute the nodes more evenly than pseudo-random nodes generated by a standard random number generator. Therefore, they generally achieve both a better approximation quality with a finite number of replications and in many cases also faster convergence rates.

As argued above, deterministic integration schemes such as Gaussian quadrature often work very well for univariate problems. Their extension to multiple dimensions is not as straightforward as for simulation. A natural goal for a quadrature rule in multiple dimensions is exactness for multivariate polynomials of a given total order, also known as complete polynomials (Judd 1998, Ch. 6.12). To be more specific, consider a D-variate polynomial

    g(x_1, ..., x_D) = Σ_{t=1}^{T} a_t ∏_{d=1}^{D} x_d^{j_{t,d}}    (6)

for some T ∈ N, [a_1, ..., a_T] ∈ R^T, and [j_{t,1}, ..., j_{t,D}] ∈ N_0^D for all t = 1, ..., T. Define the total order of g as the maximal sum of exponents, max_{t=1,...,T} Σ_{d=1}^{D} j_{t,d}. Unlike in the univariate case, general efficient quadrature rules for this class in the multivariate case are much harder to derive directly and often intractable (Judd 1998, Ch. 7.5). As a result, there has been a large literature on very specific problems. For a collection of these rules, see for example Cools (2003). They are not only limited to very narrowly defined problems, but often also difficult to implement. They are therefore usually considered impractical for applied research (Bhat 2001, Geweke 1996).

We focus on two approaches which are easily implemented and cover a wide range of applications, since they merely combine one-dimensional quadrature rules. We first discuss the well-known product rule extension (Tauchen and Hussey 1991) in Section 3.2. It does not restrict its attention to polynomials of a given total order and suffers from exponentially rising cost with increasing dimensions. It is therefore inefficient for moderate dimensions D > 1 and infeasible for higher dimensions, say D > 5.
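The "more evenly distributed" nodes of the quasi-Monte Carlo methods mentioned above are easy to illustrate. The sketch below is not from the paper; it builds the base-2 Van der Corput sequence, the one-dimensional building block of Halton sequences. Mapping such points through the normal quantile function yields quasi-random nodes usable in equation (2).

```python
import numpy as np

def van_der_corput(n, base=2):
    # Radical-inverse sequence: mirror the base-b digits of 1..n about the
    # radix point, producing a low-discrepancy point set on (0, 1).
    seq = np.empty(n)
    for i in range(n):
        q, denom, k = 0.0, 1.0, i + 1
        while k > 0:
            k, rem = divmod(k, base)
            denom *= base
            q += rem / denom
        seq[i] = q
    return seq

# The first 1000 points fill (0, 1) far more evenly than pseudo-random draws.
u = van_der_corput(1000)
```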
In Section 3.3, we suggest using an extension of Gaussian quadrature to higher dimensions which is computationally efficient, easy to implement, and very general.

3.2 Multivariate Quadrature: The Product Rule

Univariate quadrature rules can be extended easily to multiple dimensions by the product rule. Define the tensor product over univariate quadrature rules with potentially different accuracy levels in each dimension, indicated by the multi-index [i_1, ..., i_D], as

    (V_{i_1} ⊗ ⋯ ⊗ V_{i_D})[g] = Σ_{x_1 ∈ X_{i_1}} ⋯ Σ_{x_D ∈ X_{i_D}} g(x_1, ..., x_D) ∏_{d=1}^{D} w_{i_d}(x_d),    (7)

where the nodes X_{i_1}, ..., X_{i_D} and weights w_{i_1}, ..., w_{i_D} are those implied by the underlying one-dimensional quadrature rules V_{i_1}, ..., V_{i_D}. The widely known product rule T_{D,k} for D-variate Gaussian quadrature with accuracy level k (Tauchen and Hussey 1991) is simply this tensor product with the same accuracy in each dimension, (V_k ⊗ ⋯ ⊗ V_k)[g]. If V_k is exact for all univariate polynomials of order o or less, then T_{D,k} is exact for a special class of D-variate polynomials. Instead of a bound on the total order (that is, the sum of exponents), it restricts the maximum exponent of all monomials to be at most o. For example, a second-order Taylor approximation in two dimensions is a polynomial of total order 2: it contains terms in x_1², x_2², and x_1 x_2. Instead of being exact only in this class of functions, the product rule is additionally exact for terms in x_1² x_2, x_1 x_2², and x_1² x_2². In higher dimensions and higher order, the number of these terms, which diminish at a higher rate in the approximations, rises quickly, which makes the product rule inefficient and causes the curse of dimensionality (Judd 1998, Ch. 6.12).

The product rule evaluates the function g at the full grid of points X_k × ⋯ × X_k. In D dimensions, the product rule therefore requires R^D evaluations of the function g if the underlying univariate rule V_k is based on R nodes. This exponential growth of computational costs with the number of dimensions is labelled the curse of dimensionality of multivariate integration. While for example Gaussian quadrature exactly evaluates a univariate polynomial of order 7 with 4 function evaluations, the corresponding product rule with 20 dimensions requires 4^20 = 1,099,511,627,776 evaluations, which is generally prohibitive.
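The product rule of equation (7) can be sketched directly (an illustration, not the paper's code); the loop below also makes the R^D cost visible, since it visits every point of the full grid.

```python
import numpy as np
from itertools import product

def product_rule_normal(g, R, D):
    # Tensor product of the same R-node Gauss-Hermite rule in each of the
    # D dimensions (eq. 7); the grid has R**D nodes.
    t, w = np.polynomial.hermite.hermgauss(R)
    nodes = np.sqrt(2.0) * t             # rescaled for the N(0,1) weight
    weights = w / np.sqrt(np.pi)
    total = 0.0
    for idx in product(range(R), repeat=D):   # all R**D index combinations
        point = nodes[list(idx)]
        total += np.prod(weights[list(idx)]) * g(point)
    return total

# Exact when every monomial has per-dimension exponent at most 2R - 1:
# E[x1^2 * x2^2] = 1 for independent standard normals, recovered with R = 2.
val = product_rule_normal(lambda x: x[0]**2 * x[1]**2, R=2, D=2)
```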
3.3 Multivariate Quadrature on Sparse Grids

We suggest extending univariate quadrature rules to multiple dimensions with a substantially smaller number of function evaluations in higher dimensions than the product rule. This can be achieved by combining the univariate rules in a different way than the product rule. The basic idea goes back to Smolyak (1963) and is a general method for multivariate extensions of univariate approximation and integration operators.

The construction of Smolyak can be defined as follows. For an underlying sequence of univariate quadrature rules, define V_0[g] = 0 and the difference of the approximation when increasing the level of accuracy from i − 1 to i as

    Δ_i[g] = V_i[g] − V_{i−1}[g],  i ∈ N.    (8)

With i = [i_1, ..., i_D], define for any q ∈ N_0

    N_q^D = { i ∈ N^D : Σ_{d=1}^{D} i_d = D + q }    (9)

and N_q^D = ∅ for q < 0. For example, N_2^2 = {[1,3], [2,2], [3,1]}. The Smolyak rule with accuracy level k ∈ N for D-dimensional integration is defined as

    A_{D,k}[g] = Σ_{q=0}^{k−1} Σ_{i ∈ N_q^D} (Δ_{i_1} ⊗ ⋯ ⊗ Δ_{i_D})[g].    (10)

For D = 1, the underlying one-dimensional quadrature rule emerges as a special case:

    A_{1,k}[g] = Σ_{i=1}^{k} (V_i[g] − V_{i−1}[g]) = V_k[g].    (11)

This integration rule is designed to be exact for polynomials of a given total order:

Theorem 1. Assume that the sequence of univariate quadrature rules V = {V_i : i ∈ N} is defined such that V_i is exact for Ω, w, and all univariate polynomials of order 2i − 1 or less. This implies that the Smolyak rule A_{D,k} using V as the univariate basis sequence is exact for D-variate polynomials of total order 2k − 1 or less.

See the appendix for a proof. It is instructive to express A_{D,k}[g] directly in terms of the univariate quadrature rules instead of their differences. It can be written as (Wasilkowski and Woźniakowski 1995)

    A_{D,k}[g] = Σ_{q=k−D}^{k−1} (−1)^{k−1−q} C(D−1, k−1−q) Σ_{i ∈ N_q^D} (V_{i_1} ⊗ ⋯ ⊗ V_{i_D})[g],    (12)

where C(·,·) denotes the binomial coefficient.

[Figure 1: Construction of the sparse grid in two dimensions. Panels: univariate nodes X_1, X_2, X_3; the product rule X_3 ⊗ X_3; the tensor products X_1 ⊗ X_2, X_1 ⊗ X_3, X_2 ⊗ X_1, X_2 ⊗ X_2, X_3 ⊗ X_1; and the resulting sparse grid X_{2,3}.]

This rule is a weighted sum of product rules with different combinations of accuracy levels i = [i_1, ..., i_D]. Their sum is bounded, which has the effect that the tensor product rules with a relatively fine sequence of nodes in one dimension are relatively coarse in the other dimensions. This is analogous to the bound on the sum of exponents for multivariate polynomials of a total order. Figure 1 demonstrates the construction of the sparse grid by the Smolyak rule for a simple example with D = 2 and k = 3. The nodes for a sequence of univariate quadrature rules X_1, X_2, and X_3 are shown in the top of the figure. The product rule X_3 ⊗ X_3 evaluates the function at all two-dimensional combinations of nodes prescribed by X_3, which are shown in the upper right part of the figure. As equation (12) shows, the sparse grids rule combines tensor products of lower degree X_i ⊗ X_j such that 3 ≤ i + j ≤ 4. The nodes of these products as well as the resulting sparse grid are shown in the lower part of the figure.
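Equation (12) can be turned into a compact sketch. The code below is illustrative, not the authors' implementation; the helper names are made up, and the univariate basis is taken to be Gauss-Hermite with V_i using i nodes, which satisfies the exactness requirement of Theorem 1.

```python
import numpy as np
from itertools import product
from math import comb

def gh_rule(n):
    # n-node Gauss-Hermite rule rescaled for the standard normal weight.
    t, w = np.polynomial.hermite.hermgauss(n)
    return np.sqrt(2.0) * t, w / np.sqrt(np.pi)

def multi_indices(D, total):
    # All i in N^D with i_1 + ... + i_D = total and every i_d >= 1.
    if D == 1:
        return [(total,)] if total >= 1 else []
    return [(first,) + rest
            for first in range(1, total - D + 2)
            for rest in multi_indices(D - 1, total - first)]

def smolyak_normal(g, D, k):
    # Smolyak rule A_{D,k}[g], eq. (12); exact for total order <= 2k - 1.
    total = 0.0
    for q in range(max(k - D, 0), k):
        coeff = (-1.0) ** (k - 1 - q) * comb(D - 1, k - 1 - q)
        for i in multi_indices(D, D + q):               # i in N_q^D
            rules = [gh_rule(i_d) for i_d in i]
            for idx in product(*(range(len(r[0])) for r in rules)):
                x = np.array([rules[d][0][idx[d]] for d in range(D)])
                w = np.prod([rules[d][1][idx[d]] for d in range(D)])
                total += coeff * w * g(x)               # weighted tensor term
    return total

# D = 2, k = 2: exact for total order <= 3, e.g. E[x1^2 + x1^2 * x2] = 1.
val = smolyak_normal(lambda x: x[0]**2 + x[0]**2 * x[1], D=2, k=2)
```

In practice one would merge duplicate nodes across the tensor products into a single node-weight table, as the text describes below; the naive loop here is only meant to mirror equation (12) term by term.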

The set of nodes used by the sparse grids rule (12) can be written as

    X_{D,k} = ⋃_{q=k−D}^{k−1} ⋃_{i ∈ N_q^D} (X_{i_1} × ⋯ × X_{i_D}).    (13)

The number of nodes in X_{D,k} depends on the univariate nodes X_1, ..., X_k and can in general not be easily calculated. We first discuss the speed at which it grows as D → ∞ if Gaussian quadrature is used for the sequence of underlying univariate rules. Remember that the product rule T_{D,k} with underlying Gaussian quadrature rules uses k^D nodes, so that the logarithm of the number of nodes is of order O(D) as D → ∞.

Theorem 2. Consider the sparse grids rule A_{D,k} with underlying Gaussian quadrature rules V = {V_i : i ∈ N} such that each X_i used by V_i has i nodes. For a given accuracy k and rising D, the logarithm of the number of nodes in X_{D,k} is of order O(log(D)).

We give a proof in the appendix. At least asymptotically, the number of nodes (its logarithm) does not rise exponentially (linearly) as for the product rule, but only polynomially (logarithmically). This is of course only of limited use in practice since realistic D are far from infinity. We therefore give precise numbers for different dimensions below. Before, we discuss alternatives to Gaussian quadrature as the underlying univariate rules.

Theorem 1 requires that in the sequence of quadrature rules V_1, V_2, ..., each V_i is exact for all univariate polynomials of order 2i − 1 or less. As discussed above, Gaussian quadrature rules achieve this requirement on univariate exactness with a minimal number of i nodes for each V_i. Obviously, a low number of univariate quadrature nodes helps to obtain a low total number of nodes in the sparse grids rule. In the example presented in Figure 1, the sets of univariate nodes are nested in the sense that X_i ⊂ X_j if i ≤ j. Because the nodes are nested, the sets X_1 ⊗ X_2 and X_2 ⊗ X_1 do not add any distinct nodes to the sparse grid, and also the other sets share a substantial number of points. This makes the union of the tensor products a much smaller set than in the other extreme case in which each set contains different nodes, so that X_i ∩ X_j = ∅ if i ≠ j. Gaussian quadrature rules are close to the latter case: generally, only the midpoint is shared by rules with an odd number of nodes.

An example for nested sequences of univariate quadrature rules are Kronrod-Patterson sequences (Patterson 1968). A Kronrod-Patterson rule with accuracy level i adds a number of points to the set of nodes X_{i−1} of the preceding accuracy level and updates the weights. So by design, X_i ⊂ X_j if i < j. The additional nodes are chosen such that a maximum polynomial exactness is achieved. Because of the restriction that all nodes in X_{i−1} are to be reused, Kronrod-Patterson rules generally require a higher number of nodes to achieve the same univariate polynomial exactness as Gaussian quadrature rules, which optimally choose the nodes without the requirement of nested sets.¹ With this approach, the goal in constructing univariate sequences is to add as few nodes as possible to reduce the computational costs, but enough to fulfill the requirements on polynomial exactness of Theorem 1. Petras (2003) discusses this problem for the case of unweighted integration (Gauss-Legendre equivalent) and Genz and Keister (1996) for the normal p.d.f. weights (Gauss-Hermite equivalent). We supply the sequences of both approaches with the accompanying Matlab and Stata code to this paper.

Table 1 shows the number of function evaluations required by different multivariate integration rules to achieve a given degree of polynomial exactness. The product rule suffers from the curse of dimensionality. The number of nodes for the Smolyak rule also rises with the number of dimensions, but substantially slower. As discussed, while in one dimension Gaussian quadrature is more efficient, in higher dimensions the Kronrod-Patterson rules need fewer nodes.

Sparse grids integration can be easily implemented in practice. It is not necessary to construct the grid of nodes and the corresponding weights according to equation (12) for each approximation of the integral. Since they do not depend on the integrand, it suffices to do this once or use precalculated values. We provide general Matlab and Stata code for these calculations. Existing code using simulation has to be changed only by using these nodes and weights instead of drawing random numbers and by replacing raw means of the results with weighted sums.
Table 1: Number of function evaluations

                     Product rule     Smolyak rule
  Dimensions         Gaussian         Gaussian     KP
  Level k = 2, polynomial exactness = 3
  D = 1                         2            2          3
  D = 5                        32           11         11
  D = 10                     1024           21         21
  D = 20                  1048576           41         41
  Level k = 3, polynomial exactness = 5
  D = 1                         3            3          3
  D = 5                       243           66         51
  D = 10                    59049          231        201
  D = 20               3486784401          861        801
  Level k = 4, polynomial exactness = 7
  D = 1                         4            4          7
  D = 5                      1024          286        151
  D = 10                  1048576         1771       1201
  D = 20            1099511627776        12341      10001
  Level k = 5, polynomial exactness = 9
  D = 1                         5            5          7
  D = 5                      3125         1001        391
  D = 10                  9765625        10626       5281
  D = 20           95367431640625       135751      90561

¹ With one or three integration nodes, the Kronrod-Patterson rule and the Gaussian rule coincide. With 2^m − 1 nodes for m > 1, Gaussian rules are exact for polynomials up to order 2(2^m − 1) − 1, whereas Kronrod-Patterson rules are only exact for polynomials up to order 3·2^{m−1} − 1. So the ratio of both approaches 3/4 as m rises.

4 Monte Carlo Experiments

4.1 The Random Parameters Logit Model

When using numerical integration in the context of estimation, the ultimate goal is to achieve good parameter estimates. They obviously depend on the approximation quality of the involved integrals. In this section we present Monte Carlo experiments to assess the relative performance of the numerical integration algorithms. Different random parameters logit (RPL) or mixed logit models are implemented. The RPL model is widely used for studying choices between a finite set of alternatives. McFadden and Train (2000) provide an introduction to this model and a discussion of its estimation by simulation methods. This model has also been used before to study the performance of different simulation methods (Bhat 2001; Hess, Train, and Polak 2006).

Consider a random sample of N individuals. The data has a panel structure, so that for each of the subjects T choices are observed. In each of these choice situations, the individual is confronted with a set of J alternatives and chooses one of them. These alternatives are described by K strictly exogenous attributes. The (K × 1) vectors x_itj collect these attributes of alternative j = 1, ..., J in choice situation t = 1, ..., T of individual i = 1, ..., N. Random utility maximization (RUM) models of discrete choices assume that the individuals pick the alternative which results in the highest utility. The researcher obviously does not observe these utility levels. They are modeled as latent variables for which the observed choices provide an indication. Let the utility that individual i attaches to alternative j in choice situation t be represented by the random coefficients specification

    U_itj = x_itj′ β_i + e_itj.    (14)

It is given by a linear combination of the attributes of the alternative, weighted with individual-specific taste levels β_i. These individual taste levels are distributed across the population according to a parametric joint p.d.f. f(β_i; θ) with support Ψ ⊆ R^K. The i.i.d. random variables e_itj capture unobserved utility components. They are assumed to follow an Extreme Value Type I (or Gumbel) distribution. Our goal is to estimate the parameters θ of the taste level distribution.

Let y_itj denote an indicator variable that has the value 1 if individual i chooses alternative j in choice situation t and 0 otherwise.² Denote the vector of observed individual outcomes as y_i = [y_itj; t = 1, ..., T, j = 1, ..., J] and the matrix of all strictly exogenous variables as x_i = [x_itj; t = 1, ..., T, j = 1, ..., J]. Then, the probability that the underlying random variable Y_i equals the observed realization y_i conditional on x_i and the individual taste levels β_i can be expressed as

    P_i(β_i) = Pr(Y_i = y_i | x_i, β_i) = ∏_{t=1}^{T} ∏_{j=1}^{J} [ exp(x_itj′ β_i) / Σ_{l=1}^{J} exp(x_itl′ β_i) ]^{y_itj}.    (15)

The likelihood contribution of individual i is equal to the joint outcome probability as a function of θ. It can be written as

    P_i(θ) = Pr(Y_i = y_i | x_i, θ) = ∫_Ψ P_i(β_i) f(β_i; θ) dβ_i.    (16)

A solution for this K-dimensional integral does in general not exist in closed form and has to be approximated numerically.

4.2 Approximating the Probabilities

We start with a simple case of the general model.
Let J = 2 so that the model simplifies to a binary choice model. Also let there be only K = 1 explanatory variable, for which x_it2 − x_it1 = 1 for all t. Let the taste level β_i be normally distributed over the population with mean 1 and variance σ², so that β_i = 1 + σz with z ~ N(0, 1). Let the individual have chosen T_1 times alternative 1 and T_2 times alternative 2, so that there are in total T_1 + T_2 observations. The likelihood contribution in equation (16) can then be simplified to

P_i(θ) = ∫ P_i(z) φ(z) dz   with   P_i(z) = (1 + exp(−1 − σz))^{−T_1} (1 + exp(1 + σz))^{−T_2},   (17)

where φ denotes the standard normal p.d.f. This univariate integral can either be simulated or approximated using standard Gaussian quadrature. Figure 2 shows the function P_i(z) for two different cases and depicts the numerical approaches to its integration. The simulated probability with R simulation draws can be represented as the sum of R rectangles, each of which has a width of 1/R and a height that corresponds to the function value at randomly chosen points. Quadrature exactly integrates a polynomial of a given degree that represents an approximation to the integrand. How well P_i(z) is approximated by a low-order polynomial depends on the parameters. With high T and σ², the function has large areas in which it is numerically zero (or unity). These areas create a problem for the polynomial fit.

In Figure 2, two cases are presented. In the simple case of Model 1 with T_1 = 0, T_2 = 1, and σ² = 1, the function P_i(z) and therefore its integral is already well approximated by a third-order polynomial. A ninth-order polynomial is indistinguishable from the original function. In order to integrate this ninth-order polynomial exactly, Gaussian quadrature rules only need R = 5 function evaluations. In the second model with T = 20, T_1 = 8, T_2 = 12, and σ² = 4, the problem of large tails with zero conditional probability is evident. A third-order polynomial does a poor job in approximating the function, and there are noticeable differences between the original function and its ninth-order polynomial approximation.

² Note that by definition, Σ_{j=1}^{J} y_itj = 1 for all i, t.
A 19th-order polynomial, for which Gaussian quadrature needs R = 10 function evaluations, however, is again indistinguishable from the true function. This can of course become arbitrarily problematic with even higher σ² and T. With a sharp and narrow peak, approximation by simulation can have poor properties, too. Intuitively, this is because only a small fraction of simulation draws falls within the nonzero area. This diagnosis directly suggests a remedy. The function can be transformed by a change of variables that leads to a better-behaved function. In the context of simulation, this approach is importance sampling; for univariate quadrature, it works similarly well, see Liu and Pierce (1994). For the remainder of this paper, we will not use these improvements and leave the exploration of their advantages in the multivariate case for future research.

[Figure 2: Approximating probabilities in one dimension. Model 1: T = 1, T_1 = 0, T_2 = 1, σ² = 1; Model 2: T = 20, T_1 = 8, T_2 = 12, σ² = 4. Each model is shown for simulation (R = 20) and quadrature (R = 2, 5, 10).]

For the two models depicted in Figure 2 and two more extreme cases, Table 2 presents performance measures of simulation and Gaussian quadrature approximations of the choice probabilities (17). The numbers presented are absolute errors for the quadrature approximations and root mean square errors for simulations, which have been performed 1,000 times for each model and number of draws. All errors are defined relative to the value obtained by a Riemann sum with 10,000 grid points.

Table 2: Approximating probabilities in one dimension: RMSE

                 R = 2     R = 5     R = 10    R = 100   R = 1000
Model 0: T = 1, T_0 = 0, T_1 = 1, σ² = .25
Simulation       0.0944    0.0569    0.0421    0.0130    0.0043
Quadrature       0.0046    0.0008    0.0002    0.0000
Model 1: T = 1, T_0 = 0, T_1 = 1, σ² = 1
Simulation       0.1846    0.1111    0.0822    0.0253    0.0085
Quadrature       0.0102    0.0012    0.0002    0.0000
Model 2: T = 20, T_0 = 8, T_1 = 12, σ² = 4
Simulation       1.0206    0.6745    0.4841    0.1516    0.0484
Quadrature       0.8115    0.2472    0.0024    0.0000
Model 3: T = 30, T_0 = 10, T_1 = 20, σ² = 9
Simulation       1.5222    0.9658    0.6591    0.2121    0.0677
Quadrature       1.0000    0.6336    0.0402    0.0000
The reported numbers are root mean square errors relative to the true value.

For all models, Gaussian quadrature with 10 nodes performs better than simulation with 1,000 draws.

To study more complex models, we turn to a setup where J = T = 5. The number of explanatory variables K determines the dimensionality of the integration problem. We chose K = 3, 5, 10, and 20 for the Monte Carlo studies reported in Table 3. The individual taste levels are specified as i.i.d. normal random variables with mean 1 and variance 2/K to hold the total variance of U_itj constant as K changes. Instead of using one set of predefined data, we draw 1,000 samples from the joint distribution of y_i and x_i, where the x_itj are specified as independent uniform random variables and the conditional distribution of the Bernoulli random variables y_itj is given in equation (15). For each of these draws, we approximate the joint outcome probability using simulation and Smolyak integration with different numbers of nodes. For the calculation of the mean square errors, we approximate the true value by simulation with 200,000 draws.

The rows denoted as simulation represent simulated probabilities using a standard random number generator. They perform worst in all cases. The quasi Monte Carlo results are obtained using modified latin hypercube sequences (MLHS), which are shown to work well for the estimation of RPL models by Hess, Train, and Polak (2006).
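MLHS draws are built from a single shifted, equispaced grid per dimension that is randomly permuted, with the uniforms then mapped through the inverse normal c.d.f. A rough sketch of this construction (details of the published method may differ; the function name is ours):

```python
import numpy as np
from statistics import NormalDist

def mlhs_normal(R, D, rng):
    """R quasi-random N(0,1) draws in D dimensions via modified latin
    hypercube sampling: one uniform shift per dimension, equispaced
    points, independent random permutations across dimensions."""
    inv_cdf = np.vectorize(NormalDist().inv_cdf)
    u = np.empty((R, D))
    for d in range(D):
        shift = rng.uniform()                       # one shift per dimension
        u[:, d] = rng.permutation((np.arange(R) + shift) / R)
    return inv_cdf(u)
```

By construction, each dimension places exactly one draw in every interval [i/R, (i+1)/R), which is the stratification that reduces the simulation variance relative to pseudo-random draws.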
This method³ works much better than the standard simulation. The product rule performs better than

³ We also experimented with Halton sequences, which do not seem to make much of a difference compared to MLHS. Results can be requested from the authors.

Table 3: RMSE of probabilities: σ² = 2/K, J = T = 5

K = 3          R = 7     8         87        125       495       512
Simulation     0.3373    0.2971    0.0879    0.0776    0.0372    0.0382
Quasi MC       0.2287    0.1802    0.0382    0.0331    0.0161    0.0152
Product rule             0.0669              0.0112              0.0048
Sparse grids   0.0303              0.0050              0.0020
K = 5          R = 11    32        151       243       903       1024
Simulation     0.2663    0.1448    0.0705    0.0536    0.0303    0.0278
Quasi MC       0.1486    0.0796    0.0310    0.0257    0.0127    0.0130
Product rule             0.0567              0.0277              0.0171
Sparse grids   0.0243              0.0049              0.0043
K = 10         R = 21    201       1024      1201
Simulation     0.1987    0.0654    0.0284    0.0255
Quasi MC       0.1076    0.0317    0.0135    0.0128
Product rule                       0.0420
Sparse grids   0.0173    0.0211              0.0035
K = 20         R = 41    801       10001
Simulation     0.1428    0.0324    0.0095
Quasi MC       0.0795    0.0156    0.0048
Sparse grids   0.0125    0.0160    0.0027
The reported numbers are root mean square errors relative to the true value.

both simulation methods in low dimensions, especially K = 3. In five dimensions, its advantage disappears, and in ten dimensions, it is clearly the worst method. For K = 20, we did not obtain results since it is not computationally feasible. In all cases, the sparse grids method clearly outperforms all other methods. Table 5 in the appendix shows results for the more difficult case σ² = 5/K and J = T = 5. While all errors rise, the relative performances remain unchanged.

4.3 Estimating the parameters

One of the main reasons why approximations of the outcome probabilities are interesting is that they are required for most estimators of the parameters θ. We discuss maximum simulated (or approximated) likelihood estimation, such that the estimators are defined as

θ̂ = arg max_θ Σ_i log(P̂_i(θ)),

where P̂_i(θ) is some approximation of the individual joint outcome probability P_i(θ). Alternatively, estimators could be based on simulated scores or moments. We chose this estimator since it is easiest to implement and by far the most widely used for these kinds of models. For a discussion of the various approaches, see for example Hajivassiliou and Ruud (1994). It is clear that the quality of the approximation of P_i(θ) translates into the properties of the estimators.

We specify a number of different models and estimate the parameters using simulation, antithetic sampling, and numerical integration with different degrees of accuracy. As a starting point, a reference model is specified with N = 1000, T = 5, J = 5, K = 10, µ = 1, and σ = 0.5. Then each of these numbers is varied separately to assess their impact on the approximation errors of the different methods. For each of these settings, estimates were obtained for 100 artificial data sets. The K-dimensional vectors of properties of the alternatives x_itj were drawn from a standard uniform distribution. The model parameters µ and σ are constrained to be equal for all properties to simplify the estimation and are estimated for each data set. We use the same methods as discussed in the previous section: pseudo-random Monte Carlo (PMC), quasi-random Monte Carlo (QMC), and sparse grids integration (SGI).

Table 4 shows results for different dimensions of integration. The simulation-based estimates are much better for µ than for σ. This can be explained by the fact that while the simulated probabilities P̂_i(µ, σ) are unbiased for the true values P_i(µ, σ), the log transformation introduces a downward bias. This bias depends on the simulation variance, which in turn depends on σ. This tends to bias σ̂ downwards. As predicted from the results for the approximated probabilities, standard simulation is dominated by quasi-random simulation.
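In one dimension, the maximum approximated likelihood estimator can be sketched end to end: simulate binary panel data with β_i ~ N(µ, σ²), approximate each P_i(θ) by Gauss–Hermite quadrature, and maximize the log-likelihood, here by a coarse grid search. All specifics (sample sizes, grids, function names) are illustrative assumptions, not the authors' design:

```python
import numpy as np

def loglik(counts, T, mu, sigma, nodes=15):
    """Gauss-Hermite approximated log-likelihood of a binary mixed logit
    with beta_i = mu + sigma*z, z ~ N(0,1); counts[k] is the number of
    individuals who chose alternative 2 in exactly k of their T situations."""
    x, w = np.polynomial.hermite.hermgauss(nodes)
    p2 = 1.0 / (1.0 + np.exp(-(mu + sigma * np.sqrt(2.0) * x)))
    ll = 0.0
    for k, n_k in enumerate(counts):
        if n_k:
            # probability of one observed choice sequence, as in eq. (16)
            P_k = np.sum(w / np.sqrt(np.pi) * p2**k * (1.0 - p2)**(T - k))
            ll += n_k * np.log(P_k)
    return ll

def estimate(counts, T, mus, sigmas):
    """Coarse grid-search maximizer of the approximated likelihood."""
    grid = [(m, s) for m in mus for s in sigmas]
    return max(grid, key=lambda p: loglik(counts, T, *p))

# Simulated data: true mu = 1, sigma = 0.5, N = 1000 individuals, T = 10
rng = np.random.default_rng(3)
N, T, mu0, s0 = 1000, 10, 1.0, 0.5
beta = mu0 + s0 * rng.standard_normal(N)
k_i = rng.binomial(T, 1.0 / (1.0 + np.exp(-beta)))   # alt-2 counts per person
counts = np.bincount(k_i, minlength=T + 1)
mu_hat, s_hat = estimate(counts, T, np.arange(0.0, 2.01, 0.1),
                         np.arange(0.05, 1.56, 0.1))
```

Aggregating individuals by their choice counts keeps the grid search cheap; σ is identified by the extra-binomial variation in the counts across individuals.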
SGI is again clearly the best method and, for example, in ten dimensions requires only 21 nodes for the same accuracy for which QMC needs 201 and PMC 1201 function evaluations. In the appendix, results for other model parameter choices are presented. Table 6 shows variations of µ and σ, and Table 7 varies N, T, and J. The basic findings are unaffected by these changes. As σ increases, the approximation error rises and therefore all methods perform worse.⁴ With a larger number of i.i.d. cross-sectional observation units N or longitudinal observations T, the estimators improve. Their relative advantages remain unaffected.

⁴ If σ has a very large value, all methods fail to give reasonable estimation results. As discussed above, adaptive rescaling might solve this problem.
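The downward bias of σ̂ discussed above is a direct consequence of Jensen's inequality: E[log P̂] < log E[P̂] whenever the simulated probability P̂ has positive variance, even though P̂ itself is unbiased. A small illustration, with a stylized stand-in for the simulated probability:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_draw(R, size):
    """Simulated probabilities: each is the mean of R i.i.d. evaluations
    of a logit kernel at normal taste draws (a stylized stand-in)."""
    z = rng.standard_normal((size, R))
    return (1.0 / (1.0 + np.exp(-z))).mean(axis=1)

p_true = 0.5                       # E[1/(1+exp(-z))] = 0.5 by symmetry
reps = p_draw(R=10, size=50_000)   # unbiased in levels ...
bias_log = np.log(reps).mean() - np.log(p_true)   # ... biased down in logs
```

With only R = 10 draws per probability, the average of log P̂ falls noticeably short of log P; the shortfall shrinks with the simulation variance, which is why it also shrinks as R grows.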

Table 4: Errors of the estimated parameters with different K

                     RMSE(µ̂)                       RMSE(σ̂)
R          PMC       QMC       SGI        PMC       QMC       SGI
Dimension K = 2
9          0.0486    0.0458    0.0448     0.3648    0.1648    0.1177
45         0.0458    0.0452    0.0448     0.1712    0.1246    0.1177
961        0.0449    0.0449    0.0448     0.1179    0.1170    0.1177
Dimension K = 4
9          0.0411    0.0361    0.0340     0.3857    0.1985    0.0902
81         0.0341    0.0340    0.0339     0.1319    0.0963    0.0923
1305       0.0338    0.0339    0.0339     0.0921    0.0916    0.0923
Dimension K = 10
21         0.0481    0.0353    0.0272     0.2951    0.1554    0.0668
201        0.0298    0.0277    0.0272     0.0874    0.0708    0.0654
1201       0.0276    0.0271    0.0271     0.0691    0.0662    0.0654
Dimension K = 14
29         0.0507    0.0348    0.0240     0.2493    0.1321    0.0601
393        0.0252    0.0247    0.0252     0.0665    0.0618    0.0632
3361       0.0251    0.0252    0.0249     0.0629    0.0637    0.0631
Dimension K = 20
41         0.0655    0.0465    0.0290     0.2323    0.1390    0.0620
801        0.0298    0.0285    0.0276     0.0634    0.0599    0.0564
10001      0.0289    0.0291    0.0291     0.0594    0.0585    0.0585
The reported numbers are RMSEs relative to the true value.

5 Conclusions

Multidimensional integrals are prevalent in econometric estimation problems. Closed-form solutions exist only for special cases. With a flexible model specification, the researcher frequently has to resort to numerical integration techniques. Previously discussed methods of quadrature in multiple dimensions are either very specific and difficult to implement or suffer from the curse of dimensionality: exponentially rising computational costs with increasing dimensions.

We suggest a method that solves both problems. It merely requires the derivation of quadrature nodes and weights for univariate integration. These are widely available in software implementations and tabulations. This makes the method easy to implement and very broadly applicable.

The reason why product rules suffer from the curse of dimensionality is that the class of functions for which they deliver exact results is not limited to polynomials of a given total order. The proposed method of integration on sparse grids is confined to this set and therefore requires a dramatically lower number of function evaluations in higher dimensions. The increase of computational costs is only polynomial instead of exponential. As a result, this method can be used as an efficient alternative to simulation methods. An intuitive explanation of the advantage of quadrature-based methods over simulation is that quadrature efficiently uses the smoothness properties of the integrand to recover its shape over the whole domain.

After introducing the method and discussing its properties, we present extensive Monte Carlo evidence for the random parameters logit model. The results show that the computational costs to achieve a negligible approximation error are dramatically lower with the suggested approaches than with simulation estimators.

Recent research in numerical mathematics suggests possible refinements of integration on sparse grids. First, instead of predefining an approximation level in terms of the number of nodes, a critical value of the approximation error can be specified and the required number of function evaluations can be determined automatically. Second, the approximation does not have to be refined in each dimension symmetrically. It is also possible to invest more effort in the most relevant dimensions. These dimensions can also be determined automatically in an adaptive fashion (Gerstner and Griebel 2003). Third, quadrature-based methods can be refined to efficiently handle functions that are not well-behaved. This can be achieved either by a change of variables or by piecewise integration. These extensions are left for future research.

Appendix

Proof of Theorem 1

Note that

A_{D,k}[ Σ_{t=1}^{T} a_t ∏_{d=1}^{D} x_d^{j_{t,d}} ] = Σ_{t=1}^{T} a_t A_{D,k}[ ∏_{d=1}^{D} x_d^{j_{t,d}} ].

Therefore, it suffices to establish polynomial exactness for any of the T monomials. Consider g = ∏_{d=1}^{D} x_d^{j_d} for some sequence j_1, ..., j_D with

(i) Σ_{d=1}^{D} j_d ≤ 2k − 1.

The theorem states that this implies A_{D,k}[g] = I_D[g]. For the sequence of underlying univariate quadrature rules V_1, V_2, ... we have by assumption

(ii) V_k[x^j] = I_1[x^j] if j ≤ 2k − 1.

For the univariate case D = 1, we know that A_{1,k} = V_k (see equation (11)), so the theorem follows immediately from (ii). For the multivariate case, a proof is presented via induction over D. Suppose that polynomial exactness has been established for D − 1 dimensions:

(iii) A_{D−1,k̃}[ ∏_{d=1}^{D−1} x_d^{j_d} ] = I_{D−1}[ ∏_{d=1}^{D−1} x_d^{j_d} ] if Σ_{d=1}^{D−1} j_d ≤ 2k̃ − 1.

It remains to be shown that this implies polynomial exactness for D dimensions. First note that because of the multiplicative structure of the monomial integrand,

(Δ_{i_1} ⊗ · · · ⊗ Δ_{i_D})[ ∏_{d=1}^{D} x_d^{j_d} ] = ∏_{d=1}^{D} Δ_{i_d}[ x_d^{j_d} ].

Rewrite the general Smolyak rule (10) by separating the sum over the Dth dimension:

A_{D,k}[g] = Σ_{i_D=1}^{k} Σ_{q=0}^{k−i_D} Σ_{i ∈ N_q^{D−1}} (Δ_{i_1} ⊗ · · · ⊗ Δ_{i_{D−1}} ⊗ Δ_{i_D})[g].

Combining these two expressions, we get

A_{D,k}[ ∏_{d=1}^{D} x_d^{j_d} ] = Σ_{i_D=1}^{k} Δ_{i_D}[ x_D^{j_D} ] · A_{D−1, k−i_D+1}[ ∏_{d=1}^{D−1} x_d^{j_d} ].

By (ii), we know that whenever 2(i_D − 1) > j_D, V_{i_D}[x_D^{j_D}] = V_{i_D−1}[x_D^{j_D}] = I_1[x_D^{j_D}]. Therefore, Δ_{i_D}[x_D^{j_D}] = 0 and the summands are zero unless j_D ≥ 2(i_D − 1). This in turn implies together with (i) that for nonzero summands

Σ_{d=1}^{D−1} j_d ≤ 2(k − i_D + 1) − 1,

and therefore A_{D−1,k−i_D+1}[ ∏_{d=1}^{D−1} x_d^{j_d} ] = I_{D−1}[ ∏_{d=1}^{D−1} x_d^{j_d} ] by (iii). With ī_D = (j_D + 1)/2 for odd j_D and ī_D = j_D/2 + 1 for even j_D, it follows that

A_{D,k}[ ∏_{d=1}^{D} x_d^{j_d} ] = I_{D−1}[ ∏_{d=1}^{D−1} x_d^{j_d} ] · Σ_{i_D=1}^{ī_D} Δ_{i_D}[ x_D^{j_D} ] = I_{D−1}[ ∏_{d=1}^{D−1} x_d^{j_d} ] · V_{ī_D}[ x_D^{j_D} ].

By (ii), V_{ī_D}[x_D^{j_D}] = I_1[x_D^{j_D}] and the theorem follows.

Proof of Theorem 2

Let R_{D,k} denote the number of distinct nodes in X_{D,k}. For D ≥ k, it can be bounded as

R_{D,k} ≤ Σ_{q=0}^{k−1} Σ_{i ∈ N_q^D} ∏_{d=1}^{D} i_d =: R̄_{D,k},   (18)

since the univariate quadrature rule V_i has i nodes. The inequality comes from the fact that the midpoints appear repeatedly in the underlying quadrature rules. Note that the average element of i ∈ N_q^D is (D + q)/D and that the product is maximized if all elements have the same value. Therefore

∏_{d=1}^{D} i_d ≤ ((D + q)/D)^D for all i ∈ N_q^D.

The number of vectors in N_q^D is C(D − 1 + q, D − 1). So

R_{D,k} ≤ R̄_{D,k} ≤ k · C(D − 1 + k, D − 1) · ((D + k)/D)^D.

As D → ∞, ((D + k)/D)^D → exp(k) and C(D − 1 + k, D − 1) = C(D − 1 + k, k) → D^k / k!, and therefore

log(R̄_{D,k}) → k − log((k − 1)!) + k log(D) = O(log(D)),

so that the number of nodes grows only polynomially (of order D^k) in the dimension.
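The bound R̄_{D,k} in (18) can be evaluated with a short dynamic program over the index sets N_q^D, and it reproduces the Gaussian Smolyak column of Table 1 (for D = 1 the rule collapses to V_k with exactly k nodes). The dynamic program below is our own sketch, not code from the paper:

```python
def smolyak_nodes(D, k):
    """Sum over q < k and i in N_q^D of prod(i_d), where rule V_i uses i
    nodes (the non-nested Gaussian case); an upper bound on distinct nodes,
    matching eq. (18)."""
    if D == 1:
        return k                      # A_{1,k} = V_k has k nodes
    g = [1] + [0] * (k - 1)           # g[q] with zero dimensions processed
    for _ in range(D):                # add one dimension at a time
        # new dimension contributes index i_d = j + 1 and consumes j units of q
        g = [sum((j + 1) * g[q - j] for j in range(q + 1)) for q in range(k)]
    return sum(g)

def product_nodes(D, k):
    """Tensor product of D univariate Gaussian rules with k nodes each."""
    return k ** D
```

Comparing `smolyak_nodes(20, 5)` with `product_nodes(20, 5)` (135,751 versus about 9.5 × 10^13) makes the polynomial-versus-exponential growth of Theorem 2 concrete.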

Further results

Table 5: RMSE of probabilities: σ² = 5/K, J = T = 5

K = 3          R = 7     8         87        125       495       512
Simulation     0.7284    0.8184    0.2093    0.1860    0.0902    0.0923
Quasi MC       0.6234    0.5849    0.1498    0.1147    0.0636    0.0600
Product rule             0.2060              0.0480              0.0221
Sparse grids   0.2696              0.0217              0.0055
K = 5          R = 11    32        151       243       903       1024
Simulation     0.6628    0.3921    0.2264    0.1652    0.0807    0.0726
Quasi MC       0.5226    0.3517    0.1550    0.1106    0.0653    0.0564
Product rule             0.1828              0.0917              0.0566
Sparse grids   0.2434              0.0567              0.0151
K = 10         R = 21    201       1024      1201
Simulation     0.6708    0.2025    0.0950    0.0816
Quasi MC       0.5278    0.1783    0.0784    0.0713
Product rule                       0.1574
Sparse grids   0.1798    0.1058              0.0573
K = 20         R = 41    801       10001
Simulation     0.4928    0.1169    0.0348
Quasi MC       0.3971    0.1176    0.0286
Sparse grids   0.1034    0.0756    0.0395
The reported numbers are root mean square errors relative to the true value.

Table 6: Errors of the estimated parameters with different µ and σ

                     RMSE(µ̂)                       RMSE(σ̂)
R          PMC       QMC       SGI        PMC       QMC       SGI
Parameters µ = 0.5, σ = 0.5
21         0.0268    0.0226    0.0208     0.2922    0.1533    0.0625
201        0.0211    0.0208    0.0209     0.0813    0.0638    0.0603
1201       0.0209    0.0209    0.0209     0.0631    0.0625    0.0605
Parameters µ = 2, σ = 0.5
21         0.0842    0.0598    0.0460     0.3050    0.1545    0.0712
201        0.0510    0.0470    0.0475     0.1034    0.0767    0.0738
1201       0.0478    0.0475    0.0472     0.0803    0.0732    0.0735
Parameters µ = 1, σ = 0.25
21         0.0264    0.0253    0.0257     0.1759    0.0984    0.0919
201        0.0250    0.0252    0.0255     0.0929    0.0888    0.0893
1201       0.0255    0.0256    0.0256     0.0868    0.0955    0.0923
Parameters µ = 1, σ = 1
21         0.1070    0.0867    0.0490     0.4095    0.3053    0.1754
201        0.0409    0.0363    0.0343     0.0994    0.0824    0.0746
1201       0.0316    0.0309    0.0303     0.0614    0.0597    0.0553
The reported numbers are RMSEs relative to the true value.

Table 7: Errors of the estimated parameters with different N, T, and J

                     RMSE(µ̂)                       RMSE(σ̂)
R          PMC       QMC       SGI        PMC       QMC       SGI
N = 500
21         0.0511    0.0441    0.0417     0.2993    0.1792    0.1101
201        0.0416    0.0414    0.0418     0.1284    0.1102    0.1126
1201       0.0413    0.0417    0.0417     0.1023    0.1109    0.1121
N = 2000
21         0.0417    0.0275    0.0165     0.3124    0.1530    0.0547
201        0.0185    0.0171    0.0166     0.0769    0.0575    0.0515
1201       0.0168    0.0166    0.0166     0.0548    0.0517    0.0512
T = 3
21         0.0513    0.0414    0.0374     0.3254    0.1836    0.1157
201        0.0378    0.0369    0.0383     0.1283    0.1220    0.1253
1201       0.0380    0.0381    0.0381     0.1208    0.1240    0.1234
T = 10
21         0.0195    0.0189    0.0281     0.2284    0.1375    0.0457
201        0.0253    0.0269    0.0291     0.0572    0.0466    0.0377
1201       0.0284    0.0288    0.0294     0.0393    0.0385    0.0376
J = 3
21         0.0619    0.0456    0.0363     0.2996    0.1485    0.0845
201        0.0389    0.0362    0.0370     0.1080    0.0931    0.0978
1201       0.0376    0.0372    0.0371     0.0972    0.0960    0.0964
J = 10
21         0.0339    0.0271    0.0214     0.2638    0.1415    0.0540
201        0.0226    0.0215    0.0216     0.0705    0.0563    0.0561
1201       0.0215    0.0215    0.0215     0.0569    0.0558    0.0559
The reported numbers are RMSEs relative to the true value.

References

Bhat, C. (2001): "Quasi-Random Maximum Simulated Likelihood Estimation of the Mixed Multinomial Logit Model," Transportation Research B, 35, 677–693.

Börsch-Supan, A., and V. Hajivassiliou (1993): "Smooth Unbiased Multivariate Probability Simulators for Maximum Likelihood Estimation of Limited Dependent Variable Models," Journal of Econometrics, 58, 347–368.

Butler, J. S., and R. Moffitt (1982): "A Computationally Efficient Quadrature Procedure for the One-Factor Multinomial Probit Model," Econometrica, 50(3), 761–764.

Cools, R. (2003): "An Encyclopaedia of Cubature Formulas," Journal of Complexity, 19, 445–453.

Davis, P. J., and P. Rabinowitz (1984): Methods of Numerical Integration. Academic Press, New York, 2nd edn.

Genz, A., and C. Keister (1996): "Fully symmetric interpolatory rules for multiple integrals over infinite regions with Gaussian weights," Journal of Computational and Applied Mathematics, 71, 299–309.

Gerstner, T., and M. Griebel (2003): "Dimension-Adaptive Tensor-Product Quadrature," Computing, 71, 65–87.

Geweke, J. (1996): "Monte Carlo Simulation and Numerical Integration," in Handbook of Computational Economics Vol. 1, ed. by H. M. Amman, D. A. Kendrick, and J. Rust, pp. 731–800. Elsevier Science, Amsterdam.

Hajivassiliou, V. A., and P. A. Ruud (1994): "Classical Estimation Methods for LDV Models Using Simulation," in Handbook of Econometrics Vol. IV, ed. by R. F. Engle and D. L. McFadden, pp. 2383–2441. Elsevier, New York.

Hess, S., K. E. Train, and J. W. Polak (2006): "On the Use of a Modified Latin Hypercube Sampling (MLHS) Method in the Estimation of a Mixed Logit Model for Vehicle Choice," Transportation Research Part B, 40, 147–163.

Judd, K. L. (1998): Numerical Methods in Economics. MIT Press, Cambridge, Mass.

Liu, Q., and D. A. Pierce (1994): "A note on Gauss–Hermite quadrature," Biometrika, 81, 624–629.

McFadden, D. (1989): "A Method of Simulated Moments for Estimation of Discrete Choice Models Without Numerical Integration," Econometrica, 57, 995–1026.

McFadden, D., and K. Train (2000): "Mixed MNL Models for Discrete Response," Journal of Applied Econometrics, 15, 447–470.

Miranda, M. J., and P. L. Fackler (2002): Applied Computational Economics and Finance. MIT Press, Cambridge, MA.

Novak, E., and K. Ritter (1999): "Simple cubature formulas with high polynomial exactness," Constructive Approximation, 15, 499–522.

Patterson, T. N. L. (1968): "The optimum addition of points to quadrature formulae," Mathematics of Computation, 22, 847–856.

Petras, K. (2003): "Smolyak Cubature of Given Polynomial Degree with Few Nodes for Increasing Dimension," Numerische Mathematik, 93, 729–753.

Smolyak, S. A. (1963): "Quadrature and Interpolation Formulas for Tensor Products of Certain Classes of Functions," Soviet Mathematics Doklady, 4, 240–243.

Tauchen, G., and R. Hussey (1991): "Quadrature-Based Methods for Obtaining Approximate Solutions to Nonlinear Asset Pricing Models," Econometrica, 59(2), 371–396.

Train, K. (2003): Discrete Choice Methods with Simulation. Cambridge University Press.

Wasilkowski, G. W., and H. Woźniakowski (1995): "Explicit cost bounds of algorithms for multivariate tensor product problems," Journal of Complexity, 8, 337–392.