Using copulas to model time dependence in stochastic frontier models


Christine Amsler (Michigan State University), Artem Prokhorov (Concordia University), Peter Schmidt (Michigan State University)

May 2011

Abstract

We consider stochastic frontier models in the setting where there is dependence between cross sections. This setting is typical for panel data. Current methods of modeling time dependence in this setting are either unduly restrictive or computationally infeasible. Some impose restrictive assumptions on the nature of dependence, such as the scaling property. Others involve T-dimensional integration, where T is the number of cross sections, which may be large. Moreover, no known multivariate distribution has the commonly used, convenient marginals such as normal/half-normal. We show how to use copulas to resolve these issues. The range of dependence we allow for is unrestricted and the computational task involved is easy compared to the alternatives. Also, the resulting estimators are relatively more efficient. We propose two alternative specifications. One applies a copula function to the distribution of the composed error term. This permits the use of FMLE and GMM. The other applies a copula to the distribution of the one-sided error term. This allows for a simulated FMLE and a semiparametric estimation of inefficiencies. A well-known application demonstrates the usefulness of the two specifications.

JEL Classification: C13

Keywords: Copula, Stochastic Frontier, MLE, GMM, MSLE, MSM

Helpful comments from participants of the Canadian Econometrics Study Group Meeting in Montreal, the International Conference on Panel Data in Bonn and the XI European Workshop on Efficiency and Productivity Analysis in Pisa, as well as seminar participants at University of New South Wales and Ewha University, are gratefully acknowledged.

1 Introduction

We consider the stochastic frontier model of Aigner et al. (1977) and Meeusen and van Den Broeck (1977):

$$y_{it} = x_{it}'\beta + v_{it} - u_{it} \tag{1}$$

In this model, $i$ denotes individual firms, countries, production units, etc., and $t$ denotes time (when there is no ambiguity we omit the subscripts). We assume that the $v_{it}$ are independent over $i$ and $t$ and distributed as $N(0, \sigma_v^2)$. We also assume that the distribution of $u_{it}$ is $N(\mu_{it}, \sigma_{it}^2)^+$, i.e. it is obtained by truncation to the left at zero of the normal distribution with mean $\mu_{it}$ and variance $\sigma_{it}^2$. This is the truncated normal distribution; when $\mu_{it} = 0$ it is known as half-normal. The distribution parameters $(\mu_{it}, \sigma_{it}^2)$ may be constant or they may depend on explanatory variables $z_{it}$. For simplicity, we assume in what follows that $\mu_{it} = 0$ and $\sigma_{it} = \sigma_u$ for any $i$ and $t$, so that $u_{it} \sim N(0, \sigma_u^2)^+$.

The truncated error term $u_{it}$ represents technical inefficiency. Typically we assume it is independent over $i$, but independence over $t$ is not an attractive assumption. One would expect inefficiency to correlate positively over time: firms that are relatively inefficient in one time period will probably also be inefficient in other time periods.

There are basically four approaches currently available in the literature to model dependence in inefficiency over $t$. First, there is the QMLE approach, which assumes independence of $u_{it}$ over $t$ (see, e.g., Battese and Coelli, 1995). A variation of this approach is to assume that the $u_{it}$ are independent conditional on some explanatory variables $z_{it}$ (see, e.g., Wang, 2002). Second, one may assume that $u_{it}$ is time invariant (see, e.g., Pitt and Lee, 1981). Then $u_{it} = u_i$ for all $t$. Third, one may assume a multivariate truncated distribution for $(u_{i1}, \dots, u_{iT})$, e.g., multivariate truncated normal (see, e.g., Pitt and Lee, 1981). Then the likelihood becomes a highly complicated function with $T$-dimensional integrals. Moreover, Horrace (2005) showed that such a joint distribution does not have truncated normal marginals. So if we wish to keep normal/half-normal marginals of the composed error while allowing for dependence over $t$, this approach does not deliver the right marginals.

Finally, one may assume scaling. The scaling property assumes that $u_{it}$ has the multiplicative representation $u_{it} = h(z_{it}; \delta)\, u_{it}^*$, where $u_{it}^*$ is a random variable, $h(z_{it}; \delta)$ is a known function, $\delta$ is a vector of unknown parameters and $z_{it}$ contains variables that affect technical inefficiency. The variables $z_{it}$ may include functions of inputs $x_{it}$ or various characteristics of the environment in which the firm operates. There are many versions of the stochastic frontier model that satisfy the scaling property (see Alvarez et al., 2006, for a survey of such models). In some versions, $u_{it}^*$ is assumed to be time invariant and $z_{it}$ includes a time trend (see, e.g., Battese and Coelli, 1992; Kumbhakar, 1990). Other models assume $u_{it}$ is distributed as $N(0, \sigma_{it}^2)^+$, where $\sigma_{it}^2$ depends on a set of relevant variables $z_{it}$ (e.g., Caudill et al., 1995). The key feature of the assumption is that $u_{it}^*$ does not depend on $z_{it}$, so changes in $z_{it}$ affect only the scale of the distribution of $u_{it}$ (through the nonrandom function $h(z_{it}; \delta)$) but not its shape (determined by the distribution of $u_{it}^*$).

The four approaches discussed above have major drawbacks. The scaling assumption that firms with different $z_{it}$ differ only in the scale, and not in the shape, of the distribution of technical inefficiency is hardly realistic. The assumptions that inefficiencies are not correlated over time or that they are time invariant are even less realistic. Moreover, even if the scaling property holds, there is still the question of how to allow for dependence in $u_{it}^*$. Unless the $u_{it}^*$ are independent or time invariant, we face the same difficulty as with the $u_{it}$: there is no tractable joint distribution with convenient marginals.

In this paper we develop a tractable method of allowing for dependence without scaling, independence or time invariance. We use copulas to do that.
A distinctive feature of copulas is that they allow marginal distributions to be modeled separately from their dependence structure. As a result we have a flexible joint distribution function whose marginals are specified by the researcher. The joint distribution can accommodate any degree of dependence. No simplifying assumptions such as scaling or time invariance are used in constructing the likelihood. Meanwhile, the task of maximizing the joint likelihood remains simple because it involves no $T$-dimensional integration.

The plan of the paper is as follows. Section 2 reviews the traditional QMLE and introduces a score-based GMM estimator. In Section 3, we describe how to use copulas to allow for dependence over time. We use the GMM to motivate two new estimators that arise in this setting. Section 4.2 addresses the computational issues associated with the new copula-based estimators, in which time dependence is unrestricted. Section 4.4 discusses a test for copula validity, especially relevant for models with given marginals. Section 5.2 proposes a copula-based method for estimating inefficiencies. Section 6 provides an empirical illustration of the new copula-based estimators proposed in the paper. Section 7 concludes.

2 Quasi-MLE and score-based GMM

Let $f_v(v)$ and $f_u(u)$ denote the densities of $v$ and $u$, respectively. Then

$$f_v(v) = \frac{1}{\sqrt{2\pi\sigma_v^2}}\, e^{-\frac{v^2}{2\sigma_v^2}}, \qquad f_u(u) = \frac{\sqrt{2}}{\sqrt{\pi\sigma_u^2}}\, e^{-\frac{u^2}{2\sigma_u^2}}, \quad u \ge 0, \tag{2}$$

and the density of the composed error $\varepsilon = v - u$ can be obtained by convolution as follows:

$$f(\varepsilon) = \int_0^\infty f_v(\varepsilon + u)\, f_u(u)\, du \tag{3}$$

$$\phantom{f(\varepsilon)} = \frac{2}{\sigma}\, \phi\!\left(\frac{\varepsilon}{\sigma}\right) \Phi\!\left(-\frac{\varepsilon\lambda}{\sigma}\right), \tag{4}$$

where $\sigma^2 = \sigma_v^2 + \sigma_u^2$, $\lambda = \sigma_u / \sigma_v$, and $\phi$ and $\Phi$ are the standard normal density and distribution functions, respectively (see, e.g., Aigner et al., 1977; Greene, 2003, p. 502).

The QMLE is based on this marginal (single $t$) density of the composed error. Let $\theta = (\beta, \sigma^2, \lambda)$ and let $f_{it}$ denote the density of the composed error term evaluated at $\varepsilon_{it} = v_{it} - u_{it}$. Then

$$f_{it}(\theta) = f(v_{it} - u_{it}) = f(y_{it} - x_{it}'\beta). \tag{5}$$

The QMLE maximizes the sample likelihood obtained under the assumption of independence over $i$ and $t$. That is,

$$\hat\theta_{QMLE} = \arg\max_\theta \sum_i \sum_t \ln f_{it}(\theta). \tag{6}$$

It is well known that, under suitable regularity conditions, the QMLE is consistent even if there is no independence over $t$. The sandwich estimator of the QMLE asymptotic variance matrix is used to obtain the correct standard errors in this case (see, e.g., Hayashi, 2000, Section 8.7). However, the QMLE is generally inefficient. There is a more precise estimator that uses the same information as the QMLE.
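As an illustration of (4) and (6), the following minimal sketch evaluates the normal/half-normal density and the quasi-log-likelihood using only the Python standard library. The function names, the nested-list data layout and the tiny made-up panel are assumptions introduced here for illustration; they are not part of the paper.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal: provides pdf and cdf


def composed_error_pdf(eps, sigma, lam):
    """Density (4): (2/sigma) * phi(eps/sigma) * Phi(-eps*lam/sigma)."""
    z = eps / sigma
    return 2.0 / sigma * _STD.pdf(z) * _STD.cdf(-lam * z)


def quasi_loglik(beta, sigma, lam, y, x):
    """Quasi-log-likelihood (6): sum of ln f_it over all i and t.

    y[i][t] is a scalar output and x[i][t] a list of regressors
    (a hypothetical layout chosen for this sketch).
    """
    total = 0.0
    for y_i, x_i in zip(y, x):
        for y_it, x_it in zip(y_i, x_i):
            eps = y_it - sum(b * xk for b, xk in zip(beta, x_it))
            total += math.log(composed_error_pdf(eps, sigma, lam))
    return total


# Made-up panel: N = 2 firms, T = 2 periods, a constant plus one regressor.
y = [[1.0, 1.2], [0.7, 0.9]]
x = [[[1.0, 0.5], [1.0, 0.6]], [[1.0, 0.4], [1.0, 0.5]]]
print(quasi_loglik([0.5, 1.0], sigma=0.5, lam=1.5, y=y, x=x))
```

In practice this objective would be passed to a numerical optimizer over $(\beta, \sigma, \lambda)$; the sketch only shows how the criterion is assembled.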

To motivate this improved estimator, we rewrite the QMLE as a generalized method of moments (GMM) estimator. Let $s_{it}(\theta)$ denote the score of the density function $f_{it}(\theta)$, i.e.

$$s_{it}(\theta) = \nabla_\theta \ln f_{it}(\theta), \tag{7}$$

where $\nabla_\theta$ denotes the gradient with respect to $\theta$. Note that $s_{it}(\theta)$ is a zero-mean function when evaluated at the true parameter value $\theta_o$, i.e.

$$E\, s_{it}(\theta_o) = 0 \quad \text{for any } i \text{ and } t. \tag{8}$$

Note that the QMLE of $\theta$ solves

$$\sum_i \sum_t s_{it}(\hat\theta_{QMLE}) = 0,$$

which is also the first order condition solved by the GMM estimator based on the moment condition

$$E \sum_t s_{it}(\theta) = 0. \tag{9}$$

Therefore, the QMLE and the GMM estimator based on (9) are identical.

The key to improving the efficiency of the QMLE is to notice that the GMM estimator based on (9) uses a suboptimal weighting matrix. The moment condition in (9) sums the zero-mean score functions $s_{it}$ over $t$, while the theory of optimal GMM suggests using a different, data-driven weighting mechanism. To be more precise, define the vector of $T$ score functions for each individual $i$:

$$s_i(\theta) = \begin{pmatrix} s_{i1}(\theta) \\ \vdots \\ s_{iT}(\theta) \end{pmatrix}.$$

Note that each element of this vector is zero-mean. Thus,

$$E\, s_i(\theta_o) = 0. \tag{10}$$

Under appropriate regularity conditions, applying the optimal GMM machinery to these moment conditions is known to produce an estimator whose asymptotic variance matrix is no larger than that of any other estimator using the same moment conditions, including the QMLE (see, e.g., Hansen, 1982).
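The sense in which (9) is one particular weighting of (10) can be written out explicitly. In the short derivation below, $p = \dim\theta$, and the selection matrix $A$ and the vector of ones $\iota_T$ are notation introduced only for this illustration.

```latex
\sum_{t=1}^{T} s_{it}(\theta) = A\, s_i(\theta),
\qquad
A = \big(\, I_p \;\; I_p \;\; \cdots \;\; I_p \,\big) = \iota_T' \otimes I_p
\quad (p \times pT),
\\[4pt]
\text{so that}\quad
E\Big[\sum_{t=1}^{T} s_{it}(\theta_o)\Big] = A\, E\big[s_i(\theta_o)\big] = 0 .
```

The QMLE therefore uses $p$ fixed linear combinations of the $pT$ moment conditions in (10), whereas optimal GMM chooses the combination from the estimated covariance of $s_i(\theta_o)$.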

In the setting of a general panel data model, Prokhorov and Schmidt (2009) call the optimal GMM estimator based on (10) the improved QMLE (IQMLE). The IQMLE is the solution to the following optimization problem:

$$\hat\theta_{IQMLE} = \arg\min_\theta\; \bar s(\theta)'\, W\, \bar s(\theta), \tag{11}$$

where $\bar s(\theta) = \frac{1}{N} \sum_i s_i(\theta)$ and $W$ is the inverse of the estimated covariance matrix of $s_i(\theta_o)$. This estimator is consistent so long as the QMLE is, but it is in general more efficient. However, neither of the two estimators models time dependence in $u_{it}$.

3 Copulas and full MLE

Let the bold letters $\boldsymbol{\varepsilon}$, $\mathbf{u}$, $\mathbf{v}$ denote vectors of dimension $T$ that contain the entire histories of the respective error terms, so that, e.g., $\mathbf{u}_i = (u_{i1}, \dots, u_{iT})$, and let $f_{\boldsymbol\varepsilon}$, $f_{\mathbf u}$, $f_{\mathbf v}$ denote the respective joint (over all $t$) densities.

Explicit modeling of time dependence between the $u_{it}$ requires a model for the multivariate density $f_{\boldsymbol\varepsilon}$ or $f_{\mathbf u}$. We would like this density to accommodate arbitrary dependence and to have the correct marginal densities. That is, if we use $f_{\mathbf u}$ we would like it to have half-normal marginals; if we use $f_{\boldsymbol\varepsilon}$ we would like it to accommodate normal/half-normal marginal densities of the composed error term. Copulas can do this.

Briefly defined, a copula is a multivariate distribution function with uniform marginals (a rigorous treatment of copulas can be found, e.g., in Nelsen, 2006). If we let $C(w_1, \dots, w_T)$ denote the copula cdf and $c(w_1, \dots, w_T)$ denote the copula density, then each margin $w_t$, $t = 1, \dots, T$, is assumed to come from a uniform distribution on $[0, 1]$. Copula functions usually have at least one parameter that models dependence between the $w$'s. Let $\rho$ denote a generic vector of such parameters. Each parametric copula function is known as a copula family. We provide several examples of copula families in Appendix A.

An important feature of copula functions is that they differ in the range of dependence they can cover.
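As a concrete instance of these definitions, consider the bivariate Farlie-Gumbel-Morgenstern (FGM) family. The expressions below are standard textbook results stated here for illustration, not reproduced from Appendix A.

```latex
C(w_1, w_2; \rho) = w_1 w_2 \big[ 1 + \rho (1 - w_1)(1 - w_2) \big], \qquad \rho \in [-1, 1],
\\[4pt]
c(w_1, w_2; \rho) = 1 + \rho (1 - 2 w_1)(1 - 2 w_2),
\\[4pt]
C(w_1, 1; \rho) = w_1, \qquad C(1, w_2; \rho) = w_2
\quad \text{(uniform margins)},
\\[4pt]
\operatorname{corr}(w_1, w_2) = \frac{\rho/36}{1/12} = \frac{\rho}{3} \in \Big[ -\tfrac{1}{3}, \tfrac{1}{3} \Big].
```

The single parameter $\rho$ controls dependence, and the bound on the implied correlation shows how a family can be restricted in the range of dependence it covers.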
Some copulas can cover the entire range of dependence (they are called comprehensive copulas), while others can only accommodate a restricted range of dependence. For example, suppose $T = 2$ and we measure dependence by Pearson's correlation coefficient. Then the Gaussian, Frank and Plackett copulas are comprehensive, while the Farlie-Gumbel-Morgenstern copula can model correlations only between about $-0.3$ and $0.3$. This will be important in empirical applications because the copula-based likelihood we construct should be able to capture a fairly high degree of dependence present in the data. Clearly, we would like to use comprehensive copulas since we are looking to model an arbitrary degree of dependence.

By a theorem due to Sklar (1959), given the marginals and a continuous joint distribution $H(\xi_1, \dots, \xi_T)$, there exists a unique copula such that

$$H(\xi_1, \dots, \xi_T) = C(F_1(\xi_1), \dots, F_T(\xi_T))$$

or, in terms of densities,

$$h(\xi_1, \dots, \xi_T) = c(F_1(\xi_1), \dots, F_T(\xi_T)) \cdot f_1(\xi_1) \cdots f_T(\xi_T).$$

We now have a flexible joint distribution function whose marginals we can specify in advance.

As mentioned above, there are two ways of using copulas to model dependence over $t$, corresponding to whether we apply a copula to form $f_{\boldsymbol\varepsilon}$ or to form $f_{\mathbf u}$. We now consider each of these possibilities.

4 Application of copula to $(\varepsilon_{i1}, \dots, \varepsilon_{iT})$

We start by using a copula to form $f_{\boldsymbol\varepsilon}$, the joint (over $t$) density of the composed error. Given the normal/half-normal marginals of the $\varepsilon$'s and a copula family $C(w_1, \dots, w_T; \rho)$, the joint density can be written as follows:

$$f_{\boldsymbol\varepsilon}(\boldsymbol\varepsilon_i; \theta, \rho) = c(F_{i1}(\theta), \dots, F_{iT}(\theta); \rho) \cdot f_{i1}(\theta) \cdots f_{iT}(\theta),$$

where, as before, $f_{it}(\theta) \equiv f(\varepsilon_{it}) = f(y_{it} - x_{it}'\beta)$ is the pdf of the composed error term evaluated at $\varepsilon_{it}$ and $F_{it}(\theta) \equiv \int_{-\infty}^{\varepsilon_{it}} f(s)\, ds$ is the corresponding cdf. This approach gives rise to two estimators and several computational issues, which we discuss next.

4.1 FMLE and GMM

Given a sample of size $N$, the joint density $f_{\boldsymbol\varepsilon}(\boldsymbol\varepsilon_i; \theta, \rho)$ permits construction of the sample log-likelihood as follows:

$$\ln L(\theta, \rho) = \sum_{i=1}^{N} \Big( \ln c(F_{i1}(\theta), \dots, F_{iT}(\theta); \rho) + \ln f_{i1}(\theta) + \dots + \ln f_{iT}(\theta) \Big). \tag{12}$$

Because the likelihood is based on the joint distribution of the $\varepsilon$'s, we call the estimator that maximizes it the full maximum likelihood estimator (FMLE). That is,

$$(\hat\theta_{FMLE}, \hat\rho_{FMLE}) = \arg\max_{\theta, \rho}\, \ln L(\theta, \rho). \tag{13}$$

The first term in the summation is what distinguishes the full log-likelihood from the quasi-log-likelihood: the term reflects the dependence between the composed errors at different $t$.

When viewed from the GMM perspective, the FMLE solves first order conditions that are identical to those solved by the GMM estimator based on the moment conditions

$$E \begin{pmatrix} \nabla_\theta \ln c_i(\theta_o, \rho_o) + \nabla_\theta \ln f_{i1}(\theta_o) + \dots + \nabla_\theta \ln f_{iT}(\theta_o) \\ \nabla_\rho \ln c_i(\theta_o, \rho_o) \end{pmatrix} = 0, \tag{14}$$

where $c_i(\theta, \rho) = c(F_{i1}(\theta), \dots, F_{iT}(\theta); \rho)$.

Again, the key to developing a new estimator is the observation that the moment conditions in (14) are obtained by applying a specific (not necessarily optimal) weighting scheme. Let

$$\psi_i(\theta, \rho) \equiv \begin{pmatrix} \nabla_\theta \ln f_{i1}(\theta) \\ \vdots \\ \nabla_\theta \ln f_{iT}(\theta) \\ \nabla_\theta \ln c_i(\theta, \rho) \\ \nabla_\rho \ln c_i(\theta, \rho) \end{pmatrix}$$

and let $(\hat\theta_{GMM}, \hat\rho_{GMM})$ be the optimal GMM estimator based on the moment conditions

$$E\, \psi_i(\theta_o, \rho_o) = 0. \tag{15}$$

That is, $(\hat\theta_{GMM}, \hat\rho_{GMM})$ is the solution to the following optimization problem:

$$(\hat\theta_{GMM}, \hat\rho_{GMM}) = \arg\min_{\theta, \rho}\; \bar\psi(\theta, \rho)'\, W\, \bar\psi(\theta, \rho), \tag{16}$$

where $\bar\psi(\theta, \rho) = \frac{1}{N} \sum_i \psi_i(\theta, \rho)$ and $W$ is the inverse of the estimated covariance matrix of $\psi_i(\theta_o, \rho_o)$. Prokhorov and Schmidt (2009) consider this estimator in the setting of a general panel data model and show that it is consistent so long as the FMLE above is consistent.

Unlike the QMLE and IQMLE, both the FMLE and GMM estimators allow for arbitrary time dependence in panel stochastic frontier models while offering important computational advantages over the available alternatives. We discuss these advantages in the next section.
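To make the structure of (12) concrete, here is a minimal sketch for $T = 2$ with a bivariate Gaussian copula. The choice of copula, the restriction to two periods, the use of Simpson's rule for the cdf and all function names are assumptions made for this illustration; the residuals $\varepsilon_{it} = y_{it} - x_{it}'\beta$ are taken as given.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal pdf, cdf and inverse cdf


def composed_pdf(eps, sigma, lam):
    """Normal/half-normal density of the composed error."""
    z = eps / sigma
    return 2.0 / sigma * _STD.pdf(z) * _STD.cdf(-lam * z)


def composed_cdf(eps, sigma, lam, n=400):
    """F(eps) by Simpson's rule on [-8*sigma, eps]; mass below -8*sigma is negligible."""
    lo = -8.0 * sigma
    if eps <= lo:
        return 1e-12
    h = (eps - lo) / n  # n must be even for Simpson's rule
    total = composed_pdf(lo, sigma, lam) + composed_pdf(eps, sigma, lam)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * composed_pdf(lo + k * h, sigma, lam)
    # Keep the value strictly inside (0, 1) so that inv_cdf is defined.
    return min(max(total * h / 3.0, 1e-12), 1.0 - 1e-12)


def log_gaussian_copula(w1, w2, rho):
    """Log density of the bivariate Gaussian copula with correlation rho."""
    z1, z2 = _STD.inv_cdf(w1), _STD.inv_cdf(w2)
    quad = rho ** 2 * (z1 ** 2 + z2 ** 2) - 2.0 * rho * z1 * z2
    return -0.5 * math.log(1.0 - rho ** 2) - quad / (2.0 * (1.0 - rho ** 2))


def full_loglik(sigma, lam, rho, resid):
    """Log-likelihood (12) for T = 2; resid is a list of (eps_i1, eps_i2) pairs."""
    total = 0.0
    for e1, e2 in resid:
        w1 = composed_cdf(e1, sigma, lam)
        w2 = composed_cdf(e2, sigma, lam)
        total += log_gaussian_copula(w1, w2, rho)      # dependence term
        total += math.log(composed_pdf(e1, sigma, lam))  # marginal terms
        total += math.log(composed_pdf(e2, sigma, lam))
    return total


resid = [(-0.3, -0.1), (0.2, -0.4)]  # made-up residuals
print(full_loglik(0.5, 1.5, 0.6, resid))
```

Setting `rho = 0` makes the copula term vanish, so the function then returns the quasi-log-likelihood; this is the sense in which the first term in (12) carries all of the time dependence.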

4.2 Evaluation of integrals

In this section we show that, by applying a copula to $(\varepsilon_{i1}, \dots, \varepsilon_{iT})$, we circumvent the $T$-dimensional integration imposed by traditional methods. In effect, we replace that task with $T$ evaluations of a one-dimensional integral, which is much simpler.

To demonstrate the advantages of our approach, we first consider what happens if we use a multivariate truncated distribution to form the full log-likelihood. Suppose, for example, that the joint distribution of $\mathbf u$ is multivariate truncated normal with parameter matrix $\Sigma$ (see Pitt and Lee, 1981). Then

$$f_{\mathbf u}(\mathbf u) = (2\pi)^{-T/2} |\Sigma|^{-1/2} \exp\left\{ -\tfrac{1}{2} \mathbf u' \Sigma^{-1} \mathbf u \right\} / P_0, \tag{17}$$

where $P_0$ is the probability that $\mathbf u \ge 0$, which can be expressed as the following $T$-dimensional integral:

$$P_0 = \int_0^\infty \!\!\cdots\! \int_0^\infty (2\pi)^{-T/2} |\Sigma|^{-1/2} \exp\left\{ -\tfrac{1}{2} \mathbf u' \Sigma^{-1} \mathbf u \right\} d\mathbf u. \tag{18}$$

Given $f_{\mathbf u}(\mathbf u)$, the joint density of the composed error vector $\boldsymbol\varepsilon$ can be obtained as yet another $T$-dimensional integral:

$$f_{\boldsymbol\varepsilon}(\boldsymbol\varepsilon) = \int_0^\infty \!\!\cdots\! \int_0^\infty \prod_{t=1}^{T} \frac{1}{\sqrt{2\pi\sigma_v^2}}\, e^{-\frac{(\varepsilon_t + u_t)^2}{2\sigma_v^2}}\, f_{\mathbf u}(\mathbf u)\, d\mathbf u. \tag{19}$$

Note that inside this integral there is the joint density $f_{\mathbf u}(\mathbf u)$, which is itself calculated using $T$-dimensional integration. The resulting likelihood is intractable.

Another difficulty with using the multivariate truncated normal distribution is that its marginals are not truncated normal distributions unless the marginals are independent (see Horrace, 2005, Theorems 4 and 6). So if we want to maintain the assumption of truncated normal marginals and allow for dependence by assuming a truncated normal joint distribution, that approach is not acceptable. Moreover, to our knowledge, no known multivariate distribution, aside from those constructed using copulas, has truncated normal marginals.

Instead of $T$-dimensional integrations, the FMLE and GMM estimators require repeated evaluation of

$$F(\varepsilon) = \int_{-\infty}^{\varepsilon} f(s)\, ds,$$

which serves as an argument of the copula function. Evaluation of this integral is a standard task in one-dimensional integration that can be accomplished in standard econometric software by quadrature or by simulation. [1] Moreover, Tsay et al. (2009) propose an approximation that can be used specifically in the setting of a stochastic frontier model to speed up the evaluations.

Wang et al. (2008) tabulate several quantiles of $F(\varepsilon)$ using an alternative parametrization, in which $\sigma$ is replaced with $\sigma_v \sqrt{1 + \lambda^2}$. They obtain their quantiles by simulation. We chose the parametrization $(\sigma, \lambda)$ rather than $(\sigma_v, \lambda)$ because it bounds the variance of each error component by the value of $\sigma^2$. This is convenient because, given $\sigma$, the quantiles do not become large negative numbers as $\lambda$ increases (the cdf does not flatten out). Our cdf is effectively on the domain $[-4, 4]$ for all $\lambda$. Also, when applying copulas to $\varepsilon$, we prefer using numerical integration rather than simulation. [2]

[1] The normal/half-normal error structure permits an additional shortcut when evaluating $F(\varepsilon)$, the composed error cdf. By the algebraic properties of the cdf, for a fixed $\lambda$, $F_\sigma(\varepsilon) = F_1(\varepsilon/\sigma)$, where the subscript denotes the value of $\sigma$ used in evaluating $F$. Therefore we can express quantiles of $F(\varepsilon)$ for any $\sigma$ as $\sigma$ times the corresponding quantile for $\sigma = 1$. This means that we need to evaluate the integral only at $\sigma = 1$. Then we can obtain its value at any other $\sigma$ by calculating the product.

[2] Our experience suggests that numerical integration by quadrature is somewhat faster than simulation for the same level of precision. A GAUSS code, which can be used to obtain any number of quantiles at desired precision using both methods, is available from the authors.

4.3 MSLE and GMSM

Integral evaluations by simulation generate an approximation error which needs to be accounted for. This section does that.

The basic idea of simulation-based methods is that, under suitable regularity conditions, a simulator converges to the object it simulates. Let $\hat f(\varepsilon)$ and $\hat F(\varepsilon)$ denote the simulation-based estimators of $f(\varepsilon)$ and $F(\varepsilon)$, respectively. We are interested in the properties of the MLE and GMM estimators that use these estimators instead of the true values of the two functions. Such estimation methods fall into the class of Maximum Simulated Likelihood (MSLE) and Generalized Method of Simulated Moments (GMSM) estimators.

Statistical properties of MSLE and GMSM are well studied (see, e.g., McFadden, 1989; Gouriéroux and Monfort, 1991; McFadden and Ruud, 1994). For example, it is well known that the so-called direct simulator

$$\hat f(\varepsilon) = \frac{1}{S} \sum_{s=1}^{S} \phi(\varepsilon + u_s),$$

where $\phi$ denotes the density of $v$ and $u_s$, $s = 1, \dots, S$, are draws from $N(0, \sigma_u^2)^+$, converges to $f(\varepsilon)$ as $S \to \infty$, and

$$\hat F(\varepsilon) = \frac{1}{N} \sum_{e \le \varepsilon} \hat f(e)$$

converges to $F(\varepsilon)$ as $N \to \infty$. Moreover, by a well-known result from Gouriéroux and Monfort (1991), the estimator obtained by maximizing the simulated log-likelihood

$$\sum_i \Big( \ln c(\hat F_{i1}(\theta), \dots, \hat F_{iT}(\theta); \rho) + \ln \hat f_{i1}(\theta) + \dots + \ln \hat f_{iT}(\theta) \Big) \tag{20}$$

has the same asymptotic distribution as the FMLE provided that $S \to \infty$, $N \to \infty$ and $\sqrt{N}/S \to 0$. So the simulation error is ignorable asymptotically.

A similar result is available for the GMSM estimator. McFadden (1989) shows that, under the usual regularity conditions, the GMM estimator based on moment conditions (15), in which $f_{it}$ is replaced with $\hat f_{it}$ and $F_{it}$ is replaced with $\hat F_{it}$, is consistent for $\theta$ even for fixed $S$. However, the asymptotic variance matrix of the GMSM estimator has to be adjusted to reflect the simulation error unless $S \to \infty$.

Compared to the MSLE, the GMSM estimator is more complicated and is known to be numerically unstable (see, e.g., McFadden and Ruud, 1994). Moreover, in settings with large $T$ and relatively small $N$, as in our application, the GMSM may generate too many moment conditions to handle given the sample size. Therefore, the MSLE appears more practical.

4.4 Choosing Copula

There are many tests that can help choose a copula (see, e.g., Genest et al., 2009, for a review of copula goodness-of-fit tests). This section proposes a variation of the test of Prokhorov and Schmidt (2009), which is particularly well suited for the case when we apply a copula to $(\varepsilon_{i1}, \dots, \varepsilon_{iT})$. The test focuses on the copula-based moment conditions, about which we are doubtful, while maintaining the validity of the marginals. [3]

[3] We modify the original test of Prokhorov and Schmidt (2009) to make it suitable for use with the standard QMLE in the first step rather than the IQMLE.

The test proceeds as follows. In the first step, we obtain $\hat\theta_{QMLE}$. In the second step, we apply the GMM with a specific form of the weighting matrix to the copula-based moment conditions

$$E \begin{pmatrix} \nabla_\theta \ln c_i(\hat\theta_{QMLE}, \rho) \\ \nabla_\rho \ln c_i(\hat\theta_{QMLE}, \rho) \end{pmatrix} = 0, \tag{21}$$

where the required form of the weighting matrix is given in Appendix B. This produces an estimate of $\rho$, denoted $\hat\rho$, and the optimized value of the GMM criterion function, denoted $Q(\hat\theta_{QMLE}, \hat\rho)$. To construct the test statistic, we take the product of the sample size and $Q(\hat\theta_{QMLE}, \hat\rho)$. The claim is that, under the null of a correct copula, $N\, Q(\hat\theta_{QMLE}, \hat\rho)$ is asymptotically distributed according to the $\chi^2$ distribution with degrees of freedom equal to the dimension of $\theta$. We reject the null hypothesis that the chosen copula is valid for large values of the test statistic. The proof of this claim is given in Appendix B.

5 Application of copula to $(u_{i1}, \dots, u_{iT})$

We now consider the alternative copula specification. We apply a copula to form the joint distribution of $(u_{i1}, \dots, u_{iT})$ rather than that of $(\varepsilon_{i1}, \dots, \varepsilon_{iT})$. As we mentioned before, a $T$-dimensional integration is needed to form the likelihood in this case. This is similar to the example of the multivariate truncated normal in Section 4.2, but the use of a copula allows us to avoid some of the difficulties associated with that approach.

As before, let $f_{\mathbf u}(\mathbf u; \theta, \rho)$ denote the copula-based joint density of the one-sided error vector and let $f_u(u; \theta)$ denote the marginal density of an individual one-sided error term. Then

$$f_{\mathbf u}(\mathbf u; \theta, \rho) = c(F_u(u_1; \theta), \dots, F_u(u_T; \theta); \rho) \cdot f_u(u_1; \theta) \cdots f_u(u_T; \theta), \tag{22}$$

where $F_u(u; \theta) \equiv \int_0^u f_u(s; \theta)\, ds$ is the cdf of the half-normal error term.
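For the half-normal marginal in (2), the cdf entering (22) and its inverse have closed forms in terms of the standard normal cdf $\Phi$. The short derivation below is a standard result added for illustration.

```latex
F_u(u; \theta)
  = \int_0^u \frac{2}{\sigma_u}\, \phi\!\left(\frac{s}{\sigma_u}\right) ds
  = 2\left[ \Phi\!\left(\frac{u}{\sigma_u}\right) - \frac{1}{2} \right]
  = 2\,\Phi\!\left(\frac{u}{\sigma_u}\right) - 1, \qquad u \ge 0,
\\[4pt]
F_u^{-1}(w) = \sigma_u\, \Phi^{-1}\!\left(\frac{1 + w}{2}\right), \qquad w \in [0, 1).
```

The inverse is what makes drawing from the copula-based distribution of $\mathbf u$ straightforward: uniform copula draws are mapped to half-normal draws one margin at a time.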
The joint density of the composed error vector $\boldsymbol\varepsilon$ can be obtained as follows:

$$f_{\boldsymbol\varepsilon}(\boldsymbol\varepsilon; \theta, \rho) = \int_0^\infty \!\!\cdots\! \int_0^\infty f_{\mathbf v}(\boldsymbol\varepsilon + \mathbf u; \theta)\, f_{\mathbf u}(\mathbf u; \theta, \rho)\, du_1 \cdots du_T = E_{\mathbf u(\theta, \rho)}\, \phi(\boldsymbol\varepsilon + \mathbf u), \tag{23}$$

where $E_{\mathbf u(\theta, \rho)}$ denotes the expectation with respect to $f_{\mathbf u}(\mathbf u; \theta, \rho)$ and $f_{\mathbf v}(\mathbf v; \theta) = \phi(\mathbf v)$ is the multivariate normal pdf of $\mathbf v$, where all $v$'s are independent and have equal variance $\sigma_v^2$. As in the multivariate truncated normal example, this is a $T$-dimensional integral which has no analytical form and is not easy to evaluate using numerical methods. Unlike that example, however, it involves only one $T$-dimensional integral and it has the form of an expectation over a distribution we can easily sample from. We discuss the estimators that are available in this setting, and the issues that arise in implementing them, in the next section.

5.1 Full MSLE

The key benefit of using a copula is that we can simulate from the distribution of $\mathbf u$ and estimate $f_{\boldsymbol\varepsilon}(\boldsymbol\varepsilon)$ by averaging $\phi(\boldsymbol\varepsilon + \mathbf u)$ over draws from that distribution. Let $S$ denote the number of simulations. The direct simulator of $f_{\boldsymbol\varepsilon}(\boldsymbol\varepsilon)$ can be written as follows:

$$\hat f_{\boldsymbol\varepsilon}(\boldsymbol\varepsilon; \theta, \rho) = \frac{1}{S} \sum_{s=1}^{S} \phi(\boldsymbol\varepsilon + \mathbf u_s(\theta, \rho)),$$

where $\mathbf u_s(\theta, \rho)$ is a draw from $f_{\mathbf u}(\mathbf u; \theta, \rho)$ constructed in (22). Then a full simulated log-likelihood can be obtained as follows:

$$\ln L^s(\theta, \rho) = \sum_i \ln \hat f_{\boldsymbol\varepsilon}(\boldsymbol\varepsilon_i; \theta, \rho),$$

where, as before, $\boldsymbol\varepsilon_i = (\varepsilon_{i1}, \dots, \varepsilon_{iT})$ and $\varepsilon_{it} = y_{it} - x_{it}'\beta$. The full maximum simulated likelihood estimator (FMSLE) is then given by

$$(\hat\theta_{FMSLE}, \hat\rho_{FMSLE}) = \arg\max_{\theta, \rho}\, \ln L^s(\theta, \rho). \tag{24}$$

This method is the multivariate extension of the simulation-based estimation of univariate densities discussed in Section 4.3. Similar asymptotic arguments suggest that the asymptotic distribution of $(\hat\theta_{FMSLE}, \hat\rho_{FMSLE})$ is identical to that of the FMLE based on (23), provided that $S \to \infty$, $N \to \infty$, $\sqrt{N}/S \to 0$ and suitable regularity conditions hold (see, e.g., Gouriéroux and Monfort, 1991).

Note that we do not have a GMSM estimator here because we simulate the joint density $f_{\boldsymbol\varepsilon}$ directly. However, we have an estimator of inefficiencies.

5.2 Estimation of Technical Inefficiencies

The use of a copula to form $f_{\mathbf u}$ permits the construction of a new and improved estimator of technical inefficiencies $\mathbf u_i$, $i = 1, \dots, N$.

In a single cross section (single $t$) setting, Jondrow et al. (1982) propose estimating $u_{it}$ by the conditional expectation $E(u_{it} \mid \varepsilon_{it})$, where

$$E(u_{it} \mid \varepsilon_{it}) = \frac{\sigma\lambda}{1 + \lambda^2} \left[ \frac{\phi\!\left(\frac{\varepsilon_{it}\lambda}{\sigma}\right)}{1 - \Phi\!\left(\frac{\varepsilon_{it}\lambda}{\sigma}\right)} - \frac{\varepsilon_{it}\lambda}{\sigma} \right].$$

We have defined above four consistent estimators of $\theta$. Using each of them, we can easily construct an analogous estimator of $E(u_{it} \mid \varepsilon_{it})$ (see, e.g., Greene, 2005). However, this estimator does not explicitly incorporate the dependence information.

We propose estimating technical inefficiencies by $E(\mathbf u \mid \boldsymbol\varepsilon)$, the expected technical inefficiency conditional on the composed error vector. The conditional expectation $E(\mathbf u \mid \boldsymbol\varepsilon)$ is with respect to the conditional density $f(\mathbf u \mid \boldsymbol\varepsilon)$, which we do not know. However, we now know how to form the joint distribution of $(u_1, \dots, u_T)$ using a copula and we can draw from that distribution.

As before, we omit the subscript $i$ and write

$$\mathbf u = \begin{pmatrix} u_1 \\ \vdots \\ u_T \end{pmatrix} = \begin{pmatrix} v_1 \\ \vdots \\ v_T \end{pmatrix} - \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_T \end{pmatrix} = \mathbf v - \boldsymbol\varepsilon,$$

where $u_t \sim N(0, \sigma_u^2)^+$, $\mathbf v \sim N(0, \sigma_v^2 I_T)$, and $\mathbf u$ and $\mathbf v$ are independent. Assume we applied a copula to $\mathbf u$ as discussed above and we now have $f_{\mathbf u}(\mathbf u)$. Suppose for simplicity that we use a one-parameter copula. Then

$$f_{\mathbf u}(\mathbf u; \sigma_u^2, \rho) = c\left[ F_u(u_1; \sigma_u^2), \dots, F_u(u_T; \sigma_u^2); \rho \right] \prod_{t=1}^{T} f_u(u_t; \sigma_u^2).$$

Note that $f_u$ and $F_u$ denote the pdf and cdf, respectively, of the half-normal distribution $N(0, \sigma_u^2)^+$.

Assume for now that $(\sigma_u^2, \sigma_v^2, \rho)$ are known. The key observation is that we can draw from the joint distribution of $\mathbf u$ and $\mathbf v$. First, we draw from the copula-based distribution of $\mathbf u$. This is done as follows: (a) draw $\{w_1, \dots, w_T\}$ from the copula $C$; (b) generate $u_t = F^{-1}(w_t)$, $t = 1, \dots, T$, where $F^{-1}(w)$ is the inverse of the cdf $F_u(u)$. We obtain $S$ such draws $\mathbf u_s$.
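The first sampling step can be sketched as follows for a Gaussian copula with correlation matrix `R`. The Gaussian family, the hand-written Cholesky factorization and the function names are assumptions made for this illustration; the half-normal inverse cdf used is $F^{-1}(w) = \sigma_u \Phi^{-1}((1 + w)/2)$.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()  # standard normal cdf and inverse cdf


def cholesky(R):
    """Lower-triangular L with L L' = R (R symmetric positive definite)."""
    T = len(R)
    L = [[0.0] * T for _ in range(T)]
    for i in range(T):
        for j in range(i + 1):
            acc = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(R[i][i] - acc)
            else:
                L[i][j] = (R[i][j] - acc) / L[j][j]
    return L


def draw_u(R, sigma_u, S, rng):
    """S draws of (u_1, ..., u_T) with half-normal margins and a Gaussian copula."""
    L = cholesky(R)
    T = len(R)
    draws = []
    for _ in range(S):
        e = [rng.gauss(0.0, 1.0) for _ in range(T)]
        z = [sum(L[t][k] * e[k] for k in range(t + 1)) for t in range(T)]
        # (a) copula draw: correlated normals mapped to uniform margins
        w = [min(_STD.cdf(zt), 1.0 - 1e-12) for zt in z]
        # (b) inverse half-normal cdf applied margin by margin
        u = [sigma_u * _STD.inv_cdf((1.0 + wt) / 2.0) for wt in w]
        draws.append(u)
    return draws


rng = random.Random(0)
u_draws = draw_u([[1.0, 0.8], [0.8, 1.0]], sigma_u=2.0, S=20000, rng=rng)
mean_u1 = sum(u[0] for u in u_draws) / len(u_draws)
print(mean_u1, 2.0 * math.sqrt(2.0 / math.pi))  # sample mean vs. half-normal mean
```

Other copula families require a different generator in step (a), while step (b) is unchanged because it depends only on the marginal distribution.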

Second, we draw $\mathbf v_s \sim N(0, \sigma_v^2 I_T)$ and obtain $\boldsymbol\varepsilon_s = \mathbf v_s - \mathbf u_s$, $s = 1, \dots, S$. We now have draws from the joint distribution of $(\mathbf u, \boldsymbol\varepsilon)$ and can estimate $E(\mathbf u \mid \boldsymbol\varepsilon)$ by any nonparametric method, e.g., by the multivariate version of the Nadaraya-Watson estimator

$$\hat E(\mathbf u \mid \boldsymbol\varepsilon) = \frac{\sum_{s=1}^{S} \mathbf u_s\, K\!\left(\frac{\boldsymbol\varepsilon_s - \boldsymbol\varepsilon}{h}\right)}{\sum_{s=1}^{S} K\!\left(\frac{\boldsymbol\varepsilon_s - \boldsymbol\varepsilon}{h}\right)},$$

where $K\!\left(\frac{\boldsymbol\varepsilon_s - \boldsymbol\varepsilon}{h}\right) = \prod_{t=1}^{T} K\!\left(\frac{\varepsilon_{ts} - \varepsilon_t}{h}\right)$, $K(\cdot)$ is a kernel function and $h$ is the bandwidth parameter. It is a standard result that $\hat E(\mathbf u \mid \boldsymbol\varepsilon) \to E(\mathbf u \mid \boldsymbol\varepsilon)$ as $S \to \infty$, $h \to 0$ and $S h^T \to \infty$.

Obviously, this procedure is not feasible unless we know the values of $(\sigma_u^2, \sigma_v^2, \rho)$. But these values are available from the FMSLE. Similarly, to obtain the estimated inefficiencies, we evaluate the estimated conditional expectation at the residuals $\hat\varepsilon_{it} = y_{it} - x_{it}'\hat\beta$, where $\hat\beta$ also comes from the FMSLE.

As alternatives, one may use the (I)QMLE, FMLE or GMM estimates of $(\beta, \sigma_u^2, \sigma_v^2)$. However, these estimators do not produce an estimate of the copula parameter for $\mathbf u$. [4] A way around this difficulty is to draw from the joint distribution of $(\mathbf u, \boldsymbol\varepsilon)$ for different values of $\rho$ until a certain property of the data, e.g., the sample correlation of $\hat\varepsilon_{it}$ over time, is matched by the simulated sample. Because of the various criteria that can be used in matching (especially if $T$ is large and if $\rho$ is not a scalar), this estimator is less practical than the one using the FMSLE.

[4] The FMLE and GMM produce estimates of the dependence parameter in the copula for $\boldsymbol\varepsilon$, but this is not the parameter we need.

6 Empirical Application: Indonesian Rice Farm Production

We demonstrate the use of copulas with a well-known data set from Indonesian rice farms (see, e.g., Lee and Schmidt, 1993; Horrace and Schmidt, 1996, 2000). The data set is a panel of 171 Indonesian rice farmers for whom we have 6 annual observations of rice output (in kg), amount of seeds (in kg), amount of urea (in kg), labor (in hours), land (in hectares), whether pesticides were used, which of 3 rice varieties was planted and in which of 6 regions the farmer's village is located. Therefore, there are 13 explanatory variables besides the constant.
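The simulation-based estimator of $E(\mathbf u \mid \boldsymbol\varepsilon)$ from Section 5.2 can be sketched in a few lines once draws $(\mathbf u_s, \boldsymbol\varepsilon_s)$ are available. The Gaussian product kernel, the bandwidth, the parameter values and the function names below are assumptions made for this illustration. The usage example takes $T = 1$, where no copula is needed, so that the result can be compared with the closed-form expression of Jondrow et al. (1982).

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()  # standard normal pdf and cdf


def nw_conditional_mean(eps, u_draws, eps_draws, h):
    """Nadaraya-Watson estimate of E(u | eps) with a product Gaussian kernel."""
    T = len(eps)
    num = [0.0] * T
    den = 0.0
    for u_s, e_s in zip(u_draws, eps_draws):
        # Product kernel; normalizing constants cancel in the ratio.
        k = math.exp(-0.5 * sum(((e_s[t] - eps[t]) / h) ** 2 for t in range(T)))
        den += k
        for t in range(T):
            num[t] += k * u_s[t]
    return [n / den for n in num]


def jlms(eps, sigma_u, sigma_v):
    """Closed-form E(u | eps) for a single period (normal/half-normal case)."""
    sigma = math.sqrt(sigma_u ** 2 + sigma_v ** 2)
    lam = sigma_u / sigma_v
    b = eps * lam / sigma
    return sigma * lam / (1.0 + lam ** 2) * (_STD.pdf(b) / (1.0 - _STD.cdf(b)) - b)


# Single-period check with made-up parameter values.
rng = random.Random(1)
sigma_u, sigma_v, S = 1.0, 0.5, 20000
u_draws = [[abs(rng.gauss(0.0, sigma_u))] for _ in range(S)]    # half-normal draws
eps_draws = [[rng.gauss(0.0, sigma_v) - u[0]] for u in u_draws]  # eps = v - u
est = nw_conditional_mean([-0.5], u_draws, eps_draws, h=0.1)
exact = jlms(-0.5, sigma_u, sigma_v)
print(est[0], exact)
```

For $T > 1$ the same function applies unchanged, with `u_draws` generated from the copula-based distribution of $\mathbf u$; the number of draws needed grows quickly with $T$ because of the condition $S h^T \to \infty$.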

Table 1: Indonesian rice production: QMLE and Gaussian copula-based FMLE and FMSLE. Columns: HS(2000) (Est.); QMLE (Est., St.Err., Rob.St.Err.); FMLE (Est., St.Err.); FMSLE (Est., St.Err.). Rows: CONST, SEED, UREA, TSP, LABOR, LAND, DP, DV (two variety dummies), DR (five region dummies), σ, λ, ln L. (The numerical entries are missing from this copy of the text.)

We start by stating the results from the QMLE defined in (6). They are given in the first four columns of Table 1. For reference, we give the ML estimates obtained by Horrace and Schmidt (2000) using just the first cross section of the data. Our results are based on all six cross sections. It is important to account for potential correlation between the cross sections. The robust standard errors we report for the QMLE do this. They are obtained using the sandwich formula, while the first set of standard errors is obtained using the estimated Hessian. Unfortunately, Horrace and Schmidt (2000) do not report standard errors of their ML estimation or the log-likelihood value.

The second set of results are the FMLE defined in (13). We use the Gaussian copula to construct a multivariate distribution of $\boldsymbol\varepsilon$. The Gaussian copula is obtained from the multivariate normal distribution with correlation matrix $R$ by plugging in the inverse of the univariate standard normal distribution function $\Phi(x)$ (see Appendix A). It is easy to show (see, e.g., Cherubini et al., 2004, p. 148) that the resulting copula density can be written as follows:

$$c(w_1, \dots, w_T; R) = \frac{1}{\sqrt{|R|}}\, e^{-\frac{1}{2} \zeta' (R^{-1} - I) \zeta},$$

where $\zeta = (\Phi^{-1}(w_1), \dots, \Phi^{-1}(w_T))'$. In the Gaussian copula, the dependence structure between the marginals is represented by the correlation matrix $R$. So our FMLE results contain estimates of a $6 \times 6$ correlation matrix, which we do not report.

The last set of estimates are the FMSLE defined in (24), where we use the Gaussian copula to construct and sample from a multivariate distribution of $\mathbf u$. We use $S = 1000$ draws in evaluating $\hat f_{\boldsymbol\varepsilon}(\boldsymbol\varepsilon)$ by simulation. As in the FMLE, the results include 15 estimated correlations. Now, these are correlations over $t$ of $\Phi^{-1}(F_u(u_t))$, not of $\Phi^{-1}(F_\varepsilon(\varepsilon_t))$, though the two estimated correlation matrices (not reported here) turn out very similar.

Next we consider the two moment-based estimators, namely the IQMLE defined in (11) and the GMM estimator defined in (16). These estimators have some theoretical advantages, such as relatively higher asymptotic efficiency, but in practice they may be infeasible. In this application, for example, we have 16 parameters, 6 time periods and only 171 cross-sectional observations. The vector of moment conditions on which the IQMLE is based is $96 \times 1$ (16 scores for each of the 6 periods). Moreover, the additional vector of copula-based moment conditions employed by the GMM estimator is $15 \times 1$ for the parameters of $R$ and $16 \times 1$ for the parameters of the marginals. So we have as many as 127 moment conditions. Obtaining stable GMM estimates for this problem with only 171 observations proved impossible.
Therefore, instead of estimating the full vector of parameters, we considered a simplified problem of estimating only two parameters, the constant and one slope coefficient, while keeping the other parameters fixed at their QMLE values. We therefore have 12 and 14 moment conditions to use in IQMLE and GMM, respectively. This is a more appropriate setting for moment-based estimation with a sample of size 171. Table 2 reports our estimates along with the test statistic of Section 4.4 for the Gaussian copula. The moment-based estimates in Table 2 have two versions. Version 1 consists of the estimates obtained by two-step GMM. In the first step, we obtain preliminary parameter estimates using the

identity matrix for weighting. In the second step, we estimate the covariance matrix of the moment conditions and update the weighting by the inverse of the covariance matrix. Version 2 is the iterated GMM estimation, in which we carry on the updating for 5 more iterations.

The iterated GMM estimates are more efficient relative to the QMLE and the two-step GMM, which is evidenced by the slightly smaller standard errors of this estimate. On the other hand, the GMM estimators are known to have substantial small sample biases which are coupled with the simulation biases discussed above. The dispersion of the parameter estimates reported in the table suggests that these biases may be an issue here. Moreover, the copula-based estimators are subject to the bias caused by copula misspecification. In the table, we provide the copula validity test statistic we discussed in Section 4.4. The test strongly rejects the null that the Gaussian copula is valid. This indicates that the copula-based estimates GMM-1 and GMM-2, though most efficient relative to the other reported estimates, may suffer from a copula misspecification bias. This can also be seen from the distance of the GMM parameter estimates from the other estimates that do not use a copula and so are robust to copula misspecification.

Table 2: Indonesian rice production: restricted QMLE, IQMLE and Gaussian copula-based GMM
[Columns: QMLE, IQMLE-1, IQMLE-2, GMM-1, GMM-2 (Est., St.Err. for each). Rows: CONST, SEED. Copula validity test statistic: 41.6.]

Finally, we use the procedure outlined in Section 5.2 to estimate technical inefficiencies. As a benchmark we report inefficiency estimates obtained using the analytical expression for $E(u_{it} \mid \varepsilon_{it})$ derived by Jondrow et al. (1982). These estimates are reported in Table 3. The estimates obtained using a copula are reported in Table 4.
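The analytical benchmark mentioned above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the normal/half-normal production frontier with ε = v − u, v ~ N(0, σ_v²) and u ~ |N(0, σ_u²)|, and the common parameterization σ² = σ_u² + σ_v², λ = σ_u/σ_v. The function name `jlms_inefficiency` is hypothetical.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal: provides pdf() and cdf()


def jlms_inefficiency(eps, sigma, lam):
    """Conditional mean E[u | eps] under the assumed normal/half-normal model.

    Assumptions (not taken from the paper's code): eps = v - u,
    sigma^2 = sigma_u^2 + sigma_v^2 and lam = sigma_u / sigma_v.
    """
    # Standard deviation of u given eps, before truncation at zero
    sigma_star = sigma * lam / (1.0 + lam ** 2)
    # b equals -mu_star / sigma_star, where mu_star = -eps * sigma_u^2 / sigma^2
    b = eps * lam / sigma
    # Mean of a normal truncated at zero: sigma_star * (phi(b) / Phi(-b) - b)
    return sigma_star * (_STD.pdf(b) / _STD.cdf(-b) - b)
```

For example, `jlms_inefficiency(-0.2, 0.5, 1.5)` gives the estimated inefficiency for a residual of −0.2; more negative residuals map to larger estimated inefficiency.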
The copula-based estimates employ the Gaussian copula to draw from the joint distribution of u and ε with the number of draws S = 10,000. Given the simulated sample, we use the Gaussian kernel with a common bandwidth parameter chosen by cross-validation to obtain the improved estimates of inefficiencies. For each method, we report

estimated inefficiencies in each time period for ten farms, namely those that are at the .05, .15, ..., .95 fractiles of the sample distribution of inefficiencies for t = 1. The first two columns report the rank and the fractile of the selected farms. The third and the fourth columns contain the sample variations and the sample means over t, respectively, of the inefficiency estimates, which are reported in columns 5 through 10. The last row of the tables reports the corresponding column averages over the ten farms.

Table 3: Indonesian rice production: inefficiency estimates based on JLMS formula
[Columns: Rank, Fractile, σ̂, û, t = 1, t = 2, t = 3, t = 4, t = 5, t = 6. Last row: Average.]

A few important observations can be made from the tables. First, there is a substantial reduction in variability of inefficiencies over t when we use the copula-based estimator. The sample variation σ̂ of the inefficiency estimates over time obtained using the traditional method is several times higher for all farms. The improved estimates of inefficiencies $\hat u_{it}$ vary less over t, reflecting the implied correlation between them. On the other hand, high and low inefficiencies obtained using both methods tend to agree. Neither method produces consistently higher or consistently lower inefficiencies for any of the selected farms. However, overall the copula-based estimator seems to produce a slightly lower average estimated inefficiency. This may be caused by the copula misspecification bias. Finally, previous authors who used this data set have reported an increase

in technical inefficiency in the six-year period and attributed this to the fact that the first three years correspond to a wet growing season while the last three to a dry. Using our new inefficiency estimates we tend to confirm this result.

Table 4: Indonesian rice production: inefficiency estimates based on a copula
[Columns: Rank, Fractile, σ̂, û, t = 1, t = 2, t = 3, t = 4, t = 5, t = 6. Last row: Average.]

7 Concluding Remarks

In this paper we have proposed a new, copula-based, method of accounting for dependence between cross sections in panel stochastic frontier models. Compared to available alternatives, this method is simple and flexible. It allows modeling arbitrary dependence and does not involve much computational cost. We propose several estimators based on this method, illustrate how one can test for copula validity and suggest a simulation-based method of estimating technical inefficiencies.

Advantages of the proposed approach are not only computational. By assuming a copula we assume a joint distribution. This is important for estimation precision. We use the setting of the Generalized Method of Moments (GMM) to point out that copulas contain information that is not in general redundant. So besides the computational advantages, copulas offer improvement in precision of estimation.

An interesting extension of this paper would be to compare robustness of the copula-based estimators we propose with that of the alternative estimators based on the scaling property. After all, both approaches potentially involve an incorrectly specified joint likelihood function. We leave this question for future research.

References

Aigner, D., C. A. K. Lovell, and P. Schmidt (1977): Formulation and estimation of stochastic frontier production function models, Journal of Econometrics, 6.
Alvarez, A., C. Amsler, L. Orea, and P. Schmidt (2006): Interpreting and Testing the Scaling Property in Models where Inefficiency Depends on Firm Characteristics, Journal of Productivity Analysis, 25.
Battese, G. and T. Coelli (1995): A model for technical inefficiency effects in a stochastic frontier production function for panel data, Empirical Economics, 20.
Battese, G. E. and T. J. Coelli (1992): Frontier production functions, technical efficiency and panel data: With application to paddy farmers in India, Journal of Productivity Analysis, 3.
Caudill, S. B., J. M. Ford, and D. M. Gropper (1995): Frontier Estimation and Firm-Specific Inefficiency Measures in the Presence of Heteroscedasticity, Journal of Business and Economic Statistics, 13.
Cherubini, U., E. Luciano, and W. Vecchiato (2004): Copula Methods in Finance, Wiley Finance Series, John Wiley and Sons Ltd.
Genest, C., B. Rémillard, and D. Beaudoin (2009): Goodness-of-fit tests for copulas: A review and a power study, Insurance: Mathematics and Economics, 44.
Gouriéroux, C. and A. Monfort (1991): Simulation Based Inference in Models with Heterogeneity, Annales d'Économie et de Statistique.
Greene, W. (2005): Reconsidering heterogeneity in panel data estimators of the stochastic frontier model, Journal of Econometrics, 126.
Greene, W. H. (2003): Econometric Analysis, Upper Saddle River, NJ: Prentice Hall, 5th ed.
Hansen, L. (1982): Large sample properties of generalized method of moments estimators, Econometrica, 50.
Hayashi, F. (2000): Econometrics, Princeton University Press.
Horrace, W. C. (2005): Some results on the multivariate truncated normal distribution, Journal of Multivariate Analysis, 94.

Horrace, W. C. and P. Schmidt (1996): Confidence statements for efficiency estimates from stochastic frontier models, Journal of Productivity Analysis, 7.
Horrace, W. C. and P. Schmidt (2000): Multiple Comparisons with the Best, with Economic Applications, Journal of Applied Econometrics, 15.
Jondrow, J., C. A. K. Lovell, I. S. Materov, and P. Schmidt (1982): On the estimation of technical inefficiency in the stochastic frontier production function model, Journal of Econometrics, 19.
Kumbhakar, S. C. (1990): Production frontiers, panel data, and time-varying technical inefficiency, Journal of Econometrics, 46.
Lee, Y. H. and P. Schmidt (1993): A Production Frontier Model with Flexible Temporal Variation in Technical Inefficiency, in Measurement of Productive Efficiency: Techniques and Applications, ed. by H. Fried, C. Lovell, and P. Schmidt, Oxford University Press.
McFadden, D. (1989): A Method of Simulated Moments for Estimation of Discrete Response Models Without Numerical Integration, Econometrica, 57.
McFadden, D. and P. A. Ruud (1994): Estimation by Simulation, The Review of Economics and Statistics, 76.
Meeusen, W. and J. van den Broeck (1977): Efficiency Estimation from Cobb-Douglas Production Functions with Composed Error, International Economic Review, 18.
Nelsen, R. B. (2006): An Introduction to Copulas, vol. 139 of Springer Series in Statistics, Springer, 2nd ed.
Pitt, M. M. and L.-F. Lee (1981): The measurement and sources of technical inefficiency in the Indonesian weaving industry, Journal of Development Economics, 9.
Prokhorov, A. and P. Schmidt (2009): Likelihood-based estimation in a panel setting: robustness, redundancy and validity of copulas, Journal of Econometrics, 153.
Sklar, A. (1959): Fonctions de répartition à n dimensions et leurs marges, Publications de l'Institut de Statistique de l'Université de Paris, 8.
Tsay, W.-J., C. J. Huang, T.-T. Fu, and I. L. A. Ho (2009): Maximum Likelihood Estimation of Censored Stochastic Frontier Models: An Application to the Three-Stage DEA Method, IEAS Working Paper: academic research.
Wang, H.-J. (2002): Heteroscedasticity and Non-Monotonic Efficiency Effects of a Stochastic Frontier Model, Journal of Productivity Analysis, 18.

Wang, W. S., C. Amsler, and P. Schmidt (2008): Goodness of fit tests in stochastic frontier models, mimeo, Michigan State University.

A Some copula families

- Independence or product copula:
$$C(w_1, \ldots, w_T) = w_1 \cdots w_T$$
- Gaussian or Normal copula:
$$C(w_1, \ldots, w_T; R) = \Phi_T\left(\Phi^{-1}(w_1), \ldots, \Phi^{-1}(w_T); R\right),$$
where Φ denotes the standard normal cdf and R denotes the correlation matrix.
- Gumbel copula:
$$C(w_1, \ldots, w_T; \rho) = \exp\left\{ -\left[ (-\ln w_1)^{\rho} + \ldots + (-\ln w_T)^{\rho} \right]^{1/\rho} \right\}, \quad \rho \in [1, \infty)$$
- Clayton copula:
$$C(w_1, \ldots, w_T; \rho) = \max\left\{ \left( w_1^{-\rho} + \ldots + w_T^{-\rho} - T + 1 \right)^{-1/\rho}, 0 \right\}, \quad \rho \in [-1, \infty) \setminus \{0\}$$
- Farlie-Gumbel-Morgenstern (FGM) copula:
$$C(w_1, \ldots, w_T; \rho) = \prod_{t=1}^{T} w_t \left( 1 + \sum_{k=2}^{T} \sum_{1 \le j_1 < \ldots < j_k \le T} \rho_{j_1 \ldots j_k} (1 - w_{j_1}) \cdots (1 - w_{j_k}) \right),$$
where $\rho_j \in [-1, 1]$.
- General copula by inversion: start with cdf's $K(x_1, \ldots, x_T)$, $w_1 = F_1(x_1), \ldots, w_T = F_T(x_T)$; obtain $x_1 = F_1^{-1}(w_1), \ldots, x_T = F_T^{-1}(w_T)$ and
$$C(w_1, \ldots, w_T) = K\left(F_1^{-1}(w_1), \ldots, F_T^{-1}(w_T)\right)$$
- Archimedean copulas: start with a generator function $\varphi : (0, 1) \to [0, \infty]$, $\varphi' < 0$ and $\varphi'' > 0$; obtain
$$C(w_1, \ldots, w_T) = \varphi^{-1}\left(\varphi(w_1) + \ldots + \varphi(w_T)\right);$$
multivariate copulas can often be obtained recursively from a bivariate copula (not generally true for other families); e.g., the Gumbel copula is Archimedean with $\varphi(t) = (-\log t)^{\rho}$, the Frank copula with $\varphi(t) = \ln \frac{1 - e^{-\rho}}{1 - e^{-\rho t}}$.
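Two of the families listed above can be sketched in code to make the constructions concrete. This is only an illustration under simplifying assumptions: the Gaussian copula density is written for the bivariate case (T = 2) with a single correlation r, using the density formula c = |R|^{-1/2} exp{−ζ'(R^{-1} − I)ζ/2}, and the Archimedean construction is shown for the Gumbel generator. All function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal: provides inv_cdf()


def gaussian_copula_density(w1, w2, r):
    """Bivariate Gaussian copula density for R = [[1, r], [r, 1]], |r| < 1."""
    z1, z2 = _STD.inv_cdf(w1), _STD.inv_cdf(w2)
    det = 1.0 - r * r  # determinant of R
    # zeta' (R^{-1} - I) zeta written out for the 2 x 2 case
    quad = (r * r * (z1 * z1 + z2 * z2) - 2.0 * r * z1 * z2) / det
    return math.exp(-0.5 * quad) / math.sqrt(det)


def archimedean_copula(generator, generator_inv):
    """Build C(w_1, ..., w_T) = phi^{-1}(phi(w_1) + ... + phi(w_T))."""
    def copula(*w):
        return generator_inv(sum(generator(wt) for wt in w))
    return copula


def gumbel_copula(rho):
    """Gumbel copula from the generator phi(t) = (-ln t)^rho, rho >= 1."""
    return archimedean_copula(
        lambda t: (-math.log(t)) ** rho,       # generator phi
        lambda s: math.exp(-s ** (1.0 / rho)),  # its inverse
    )
```

With ρ = 1 the Gumbel generator reduces to −ln t, so `gumbel_copula(1.0)(w1, w2)` returns the product copula w1·w2; the returned function accepts any number of arguments, so the same call pattern covers T > 2.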

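B Copula validity test

The test is similar to the test in Theorem 9 of Prokhorov and Schmidt (2009). We adapt the test to the first step QMLE. For simplicity, we consider T = 2. For t = 1, 2 and i = 1, ..., N, let

$$f_{ti}(\theta) = f(\varepsilon_{ti}; \theta), \qquad c_i(\theta, \rho) = c\left(F(\varepsilon_{1i}; \theta), F(\varepsilon_{2i}; \theta); \rho\right),$$
$$g_i(\theta) = \nabla_\theta \ln f_{1i}(\theta) + \nabla_\theta \ln f_{2i}(\theta), \qquad \bar g(\theta) = \frac{1}{N} \sum_{i=1}^{N} g_i(\theta),$$
$$r_i(\theta, \rho) = \begin{bmatrix} \nabla_\theta \ln c_i(\theta, \rho) \\ \nabla_\rho \ln c_i(\theta, \rho) \end{bmatrix}, \qquad \bar r(\theta, \rho) = \frac{1}{N} \sum_{i=1}^{N} r_i(\theta, \rho).$$

Let $(\theta_o, \rho_o)$ denote the true parameter values and define the following matrices

$$V^o_{11} \equiv E\, g_i(\theta_o) g_i(\theta_o)', \qquad V^o_{22} \equiv E\, r_i(\theta_o, \rho_o) r_i(\theta_o, \rho_o)', \qquad V^o_{12} = V^{o\prime}_{21} \equiv E\, g_i(\theta_o) r_i(\theta_o, \rho_o)',$$
$$D^o_{11} \equiv E\, \nabla_\theta g_i(\theta_o), \qquad D^o_{21} \equiv E\, \nabla_\theta r_i(\theta_o, \rho_o), \qquad D^o_{22} \equiv E\, \nabla_\rho r_i(\theta_o, \rho_o),$$

where expectations are with respect to the true joint density $h(\varepsilon_1, \varepsilon_2)$.

Proposition 1 Let $\hat\theta_{QMLE}$ be the QMLE of θ and let $\hat\rho$ be obtained by minimizing

$$\bar r(\hat\theta_{QMLE}, \rho)' B_o^{-1}\, \bar r(\hat\theta_{QMLE}, \rho),$$

where

$$B_o = V^o_{22} - D^o_{21} D^{o\,-1}_{11} V^o_{12} - V^o_{21} D^{o\,-1}_{11} D^{o\prime}_{21} + D^o_{21} \left(D^o_{11} V^{o\,-1}_{11} D^o_{11}\right)^{-1} D^{o\prime}_{21}.$$

Then,

$$N\, \bar r(\hat\theta_{QMLE}, \hat\rho)' B_o^{-1}\, \bar r(\hat\theta_{QMLE}, \hat\rho) \overset{a}{\sim} \chi^2_p, \qquad (25)$$

where p is the dimension of θ.

First note that, by standard QMLE results, $\hat\theta_{QMLE}$ satisfies

$$\sqrt{N}(\hat\theta_{QMLE} - \theta_o) = -(D^o_{11})^{-1} \sqrt{N}\, \bar g(\theta_o) + o_p(1). \qquad (26)$$

The first order condition for $\hat\rho$ can be written as

$$\left[\nabla_\rho \bar r(\hat\theta_{QMLE}, \hat\rho)\right]' B_o^{-1}\, \bar r(\hat\theta_{QMLE}, \hat\rho) = 0,$$

so that

$$D^{o\prime}_{22} B_o^{-1} \sqrt{N}\, \bar r(\hat\theta_{QMLE}, \hat\rho) = o_p(1). \qquad (27)$$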

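Now, by the mean-value theorem, we have

$$\sqrt{N}\, \bar r(\hat\theta_{QMLE}, \hat\rho) = \sqrt{N}\, \bar r(\theta_o, \rho_o) + D^o_{21} \sqrt{N}(\hat\theta_{QMLE} - \theta_o) + D^o_{22} \sqrt{N}(\hat\rho - \rho_o) + o_p(1). \qquad (28)$$

Substituting (26) into (28), pre-multiplying by $D^{o\prime}_{22} B_o^{-1}$, and solving for $\sqrt{N}(\hat\rho - \rho_o)$ using (27) yields

$$\sqrt{N}(\hat\rho - \rho_o) = -\left(D^{o\prime}_{22} B_o^{-1} D^o_{22}\right)^{-1} D^{o\prime}_{22} B_o^{-1} \sqrt{N}\, \bar r(\theta_o, \rho_o) + \left(D^{o\prime}_{22} B_o^{-1} D^o_{22}\right)^{-1} D^{o\prime}_{22} B_o^{-1} D^o_{21} D^{o\,-1}_{11} \sqrt{N}\, \bar g(\theta_o) + o_p(1). \qquad (29)$$

Substituting (29) and (26) into (28) and simplifying results in

$$\sqrt{N}\, \bar r(\hat\theta_{QMLE}, \hat\rho) = R_o \sqrt{N}\, \phi(\theta_o, \rho_o) + o_p(1), \qquad (30)$$

where

$$R_o = I - D^o_{22} \left(D^{o\prime}_{22} B_o^{-1} D^o_{22}\right)^{-1} D^{o\prime}_{22} B_o^{-1}, \qquad \phi(\theta_o, \rho_o) = \bar r(\theta_o, \rho_o) - D^o_{21} D^{o\,-1}_{11}\, \bar g(\theta_o).$$

Note that $\sqrt{N}\, \phi(\theta_o, \rho_o) \to N(0, B_o)$, and thus $B_o^{-1/2} \sqrt{N}\, \phi(\theta_o, \rho_o) \to N(0, I)$. Also, note that

$$R_o' B_o^{-1} R_o = B_o^{-1/2} \left[ I - B_o^{-1/2} D^o_{22} \left(D^{o\prime}_{22} B_o^{-1} D^o_{22}\right)^{-1} D^{o\prime}_{22} B_o^{-1/2} \right] B_o^{-1/2}.$$

Thus, the test statistic in (25) can be written as

$$N\, \phi(\theta_o, \rho_o)' B_o^{-1/2}\, P\, B_o^{-1/2}\, \phi(\theta_o, \rho_o) + o_p(1), \qquad (31)$$

i.e. as a quadratic form in standard normals with the coefficient matrix

$$P = I_{p+q} - B_o^{-1/2} D^o_{22} \left(D^{o\prime}_{22} B_o^{-1} D^o_{22}\right)^{-1} D^{o\prime}_{22} B_o^{-1/2}, \qquad (32)$$

where q is the dimension of ρ. This matrix is idempotent: it is the projection matrix orthogonal to $B_o^{-1/2} D^o_{22}$. The χ²-test in (25) follows immediately because $\mathrm{tr}(P) = p + q - \mathrm{rank}(D^o_{22}) = p$.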
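The last step of the argument, that P is idempotent with trace p, can be checked numerically in a simple special case. The sketch below assumes a single copula parameter (q = 1), so that B_o^{-1/2} D_22^o is a vector a and P = I − a a'/(a'a); the numbers in `a` are made up for illustration and the helper names are hypothetical.

```python
def projection_complement(a):
    """Return P = I - a a' / (a' a) for a nonzero vector a (the q = 1 case)."""
    n = len(a)
    norm2 = sum(x * x for x in a)
    return [[(1.0 if i == j else 0.0) - a[i] * a[j] / norm2 for j in range(n)]
            for i in range(n)]


def matmul(x, y):
    """Plain-Python product of two matrices stored as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def trace(x):
    """Sum of the diagonal elements of a square matrix."""
    return sum(x[i][i] for i in range(len(x)))


# Hypothetical example: p = 3 marginal parameters and q = 1 copula parameter;
# the vector a stands in for B_o^{-1/2} D_22^o and has length p + q = 4.
a = [0.4, -1.2, 0.7, 2.0]
P = projection_complement(a)
PP = matmul(P, P)  # equals P up to rounding, since P is idempotent
```

Here `trace(P)` equals p + q − rank(a) = 3 up to rounding error, which is the degrees of freedom of the χ² limit in (25) for this example.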
