Double Bootstrap Confidence Intervals in the Two-Stage DEA Approach
D.K. Chronopoulos, C. Girardone and J.C. Nankervis
Essex Business School, University of Essex
Determinants of efficiency
DEA can be a useful tool that helps managers identify best practices. Efficiency levels may reflect not only the ability of management but also the effect of contextual factors on a firm's performance. A second-stage regression of the efficiency estimates on these factors can help quantify their effects. Understanding these relationships can help managers improve a firm's performance and help policy makers better assess the cost of regulation.
Second Stage Regression: The problem
The dependency problem: efficiency measures estimated with DEA are, by construction, dependent on each other, because each firm's score is measured against a frontier built from the whole sample. The DEA estimator converges at the rate $n^{-2/(p+q+1)}$, where $p$ and $q$ are the numbers of inputs and outputs. This dependency disappears asymptotically, but generally at a rate slower than the usual $\sqrt{n}$ achieved by the truncated or censored MLE. Conventional inference procedures are invalid when the dimensionality of production is greater than 3 ($p + q > 3$) (Xue and Harker 1999; Simar and Wilson 2007).
The suggested solution: bootstrap confidence intervals (Simar and Wilson 2007).
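A little arithmetic shows where the threshold $p + q > 3$ comes from. The DEA rate $n^{-2/(p+q+1)}$ is slower than the parametric $n^{-1/2}$ exactly when $2/(p+q+1) < 1/2$, that is, when $p + q + 1 > 4$. The Python sketch below tabulates this comparison for a few designs. The function names are hypothetical.

```python
from fractions import Fraction


def dea_rate_exponent(p: int, q: int) -> Fraction:
    """Exponent r in the DEA convergence rate n**(-r), with r = 2 / (p + q + 1)."""
    return Fraction(2, p + q + 1)


def slower_than_root_n(p: int, q: int) -> bool:
    """True when n**(-r) shrinks more slowly than the parametric n**(-1/2)."""
    return dea_rate_exponent(p, q) < Fraction(1, 2)


for p, q in [(1, 1), (1, 2), (2, 2), (3, 3)]:
    print(f"p={p}, q={q}: exponent {dea_rate_exponent(p, q)!s}, "
          f"slower than sqrt(n): {slower_than_root_n(p, q)}")
```

The four designs compare as follows:

- $p = q = 1$: the exponent is 2/3, so DEA converges faster than $\sqrt{n}$.
- $p + q = 3$: the exponent is exactly 1/2, which is the boundary case.
- $p = q = 2$: the exponent is 2/5, slower than $\sqrt{n}$.
- $p = q = 3$: the exponent is 2/7, also slower than $\sqrt{n}$.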
Aims
Examine the convergence properties of the coverage rates of alternative bootstrap confidence interval estimators.
Investigate the coverage accuracy of double bootstrap confidence intervals.
Provide a less computationally demanding algorithm for constructing double bootstrap confidence intervals.
Data Generating Process
A firm faces an environmental variable $Z \sim N(2, 4)$. Given $Z$, the production efficiency level $\delta$ is drawn from $f(\delta \mid Z)$. The conditioning operates through the mechanism
$$\delta = Z\beta + \varepsilon, \qquad (1)$$
where $\varepsilon \sim N(0, 1)$ is left-truncated at $1 - Z\beta$, so that $\delta \ge 1$. The inputs are distributed as $x_p \sim U(6, 16)$, $p = 1, \dots, P$. We distinguish between single-output and multi-output technologies.
Single output: $y = \delta^{-1} \sum_{p=1}^{P} x_p^{3/4}$.
Multi-output: $\zeta = \delta^{-1} \sum_{p=1}^{P} x_p^{3/4}$. If $Q \ge 2$, draw $\alpha_1 \sim U(0, 1)$. If $Q > 2$, additionally draw $\alpha_l \sim U\left(0,\ 1 - \sum_{k=1}^{l-1} \alpha_k\right)$ for each $l = 2, \dots, Q-1$. The output mix is then given by $y_q = \alpha_q \zeta$ for $q = 1, \dots, Q-1$, and $y_Q = \left(1 - \sum_{k=1}^{Q-1} \alpha_k\right)\zeta$.
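A minimal standard-library Python sketch of this DGP follows. Several details are not fixed by the slide and are assumed here:

- $\beta$ is set to the placeholder value 0.5.
- The 4 in $N(2, 4)$ is read as a variance, giving a standard deviation of 2.
- The inputs enter through the sum $\sum_p x_p^{3/4}$.
- The truncated errors are drawn by inverse-CDF sampling of the normal tail.

The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()


def left_truncated_normal(a, rng):
    """Draw e ~ N(0, 1) conditioned on e >= a, by inverse-CDF sampling of the tail."""
    tail = 0.5 * math.erfc(a / math.sqrt(2.0))  # P(e >= a) = Phi(-a)
    if tail <= 0.0:
        raise ValueError("truncation point too far in the tail")
    p = 0.0
    while p <= 0.0:  # rng.random() can return exactly 0.0
        p = rng.random() * tail
    return -_STD.inv_cdf(p)  # inv_cdf(p) <= -a, so the result is >= a


def simulate_dgp(n, P, Q, beta=0.5, seed=0):
    """n firms with P inputs and Q outputs; beta = 0.5 is a placeholder value."""
    rng = random.Random(seed)
    firms = []
    for _ in range(n):
        z = rng.gauss(2.0, 2.0)  # Z ~ N(2, 4), read as mean 2 and variance 4
        delta = z * beta + left_truncated_normal(1.0 - z * beta, rng)  # so delta >= 1
        x = [rng.uniform(6.0, 16.0) for _ in range(P)]  # x_p ~ U(6, 16)
        zeta = sum(xp ** 0.75 for xp in x) / delta
        alphas = []
        if Q >= 2:
            alphas.append(rng.uniform(0.0, 1.0))  # alpha_1 ~ U(0, 1)
            for _ in range(2, Q):  # alpha_l ~ U(0, 1 - sum of earlier alphas), l = 2..Q-1
                alphas.append(rng.uniform(0.0, 1.0 - sum(alphas)))
        # y_q = alpha_q * zeta for q < Q and y_Q = (1 - sum alphas) * zeta; Q = 1 gives y = zeta
        y = [a * zeta for a in alphas] + [(1.0 - sum(alphas)) * zeta]
        firms.append({"z": z, "delta": delta, "x": x, "y": y})
    return firms
```

For example, `simulate_dgp(100, P=2, Q=2)` would produce one sample for the p = q = 2 design. Because the outputs always sum to $\zeta$, the mix changes across firms while the aggregate output stays tied to $\delta$.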
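Bootstrap Confidence Intervals
Step 1: Estimate the efficiency levels $\hat\delta$ by DEA.
Step 2: Regress $\hat\delta$ on the environmental variable $Z$ using the truncated regression model to obtain the estimates $\hat\beta$ and $\hat\sigma_\varepsilon$.
Step 3: Construct pseudo-efficiencies $\hat\delta^* = Z\hat\beta + \varepsilon^*$. Draw each $\varepsilon^*$ from the parametric distribution of the errors, $N(0, \hat\sigma_\varepsilon^2)$ left-truncated at $1 - Z\hat\beta$. Regressing $\hat\delta^*$ on $Z$ gives a bootstrap estimate of the parameter of interest, denoted $\hat\beta^*$. Repeat the procedure $J$ times.
The basic bootstrap CI is given by
$$\left[\hat\beta - \left(\hat\beta^*_{((1-\alpha)(J+1))} - \hat\beta\right),\ \hat\beta - \left(\hat\beta^*_{(\alpha(J+1))} - \hat\beta\right)\right].$$
The percentile bootstrap CI is given by
$$\left[\hat\beta^*_{(\alpha(J+1))},\ \hat\beta^*_{((1-\alpha)(J+1))}\right].$$
Here $\hat\beta^*_{(j)}$ denotes the $j$-th smallest of the $J$ bootstrap estimates, so each interval has nominal coverage $1 - 2\alpha$.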
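Steps 2 and 3 and the two intervals can be sketched in standard-library Python. This is an illustration under stated assumptions, not the authors' code:

- The truncated-regression log-likelihood is written out directly and minimised with a small hand-rolled Nelder-Mead routine.
- The model has no intercept, matching $\delta = Z\beta + \varepsilon$ in the DGP.
- The bootstrap errors are left-truncated at $1 - z_i\hat\beta$.
- Step 1 (DEA) is not shown; `delta_hat` stands for its output.

All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()
_SQRT2 = math.sqrt(2.0)


def neg_loglik(theta, z, delta):
    """Negative log-likelihood (constants dropped) of delta_i = z_i*beta + e_i,
    e_i ~ N(0, sigma^2) left-truncated at 1 - z_i*beta, so that delta_i >= 1."""
    beta, log_sigma = theta
    sigma = math.exp(log_sigma)
    if sigma == 0.0:
        return math.inf
    total = 0.0
    for zi, di in zip(z, delta):
        r = (di - zi * beta) / sigma
        tail = 0.5 * math.erfc((1.0 - zi * beta) / (sigma * _SQRT2))  # P(delta_i >= 1)
        total += log_sigma + 0.5 * r * r + math.log(max(tail, 1e-300))  # floor avoids log(0)
    return total


def nelder_mead(f, x0, step=0.1, iters=500):
    """Minimal Nelder-Mead minimiser: reflect, expand, contract, shrink."""
    n = len(x0)
    pts = [list(x0)] + [[x + (step if j == i else 0.0) for j, x in enumerate(x0)] for i in range(n)]
    vals = [f(p) for p in pts]
    for _ in range(iters):
        order = sorted(range(n + 1), key=vals.__getitem__)
        pts, vals = [pts[i] for i in order], [vals[i] for i in order]
        if vals[-1] - vals[0] <= 1e-12 * (1.0 + abs(vals[0])):
            break
        cen = [sum(p[j] for p in pts[:-1]) / n for j in range(n)]
        worst = pts[-1]
        refl = [2 * c - w for c, w in zip(cen, worst)]
        fr = f(refl)
        if fr < vals[0]:
            expd = [3 * c - 2 * w for c, w in zip(cen, worst)]
            fe = f(expd)
            pts[-1], vals[-1] = (expd, fe) if fe < fr else (refl, fr)
        elif fr < vals[-2]:
            pts[-1], vals[-1] = refl, fr
        else:
            con = [(c + w) / 2 for c, w in zip(cen, worst)]
            fc = f(con)
            if fc < vals[-1]:
                pts[-1], vals[-1] = con, fc
            else:  # shrink every point halfway towards the best one
                pts = [pts[0]] + [[(b + x) / 2 for b, x in zip(pts[0], p)] for p in pts[1:]]
                vals = [vals[0]] + [f(p) for p in pts[1:]]
    return pts[min(range(n + 1), key=vals.__getitem__)]


def fit_truncated(z, delta):
    """Step 2: truncated-regression MLE of (beta, sigma), started from OLS through the origin."""
    b0 = sum(a * d for a, d in zip(z, delta)) / sum(a * a for a in z)
    s0 = math.sqrt(sum((d - a * b0) ** 2 for a, d in zip(z, delta)) / len(z)) or 1.0
    beta, log_sigma = nelder_mead(lambda t: neg_loglik(t, z, delta), [b0, math.log(s0)])
    return beta, math.exp(log_sigma)


def draw_truncated_error(a, sigma, rng):
    """e ~ N(0, sigma^2) conditioned on e >= a, by inverse-CDF sampling."""
    tail = 0.5 * math.erfc(a / (sigma * _SQRT2))  # P(e >= a)
    if tail <= 0.0:
        raise ValueError("truncation point too far in the tail")
    p = 0.0
    while p <= 0.0:  # rng.random() can return exactly 0.0
        p = rng.random() * tail
    return -sigma * _STD.inv_cdf(p)


def single_bootstrap(z, delta_hat, J=999, alpha=0.025, seed=0):
    """Steps 2-3 plus basic and percentile CIs with nominal coverage 1 - 2*alpha."""
    rng = random.Random(seed)
    beta_hat, sigma_hat = fit_truncated(z, delta_hat)  # Step 2
    beta_star = []
    for _ in range(J):  # Step 3, repeated J times
        d_star = [zi * beta_hat + draw_truncated_error(1.0 - zi * beta_hat, sigma_hat, rng) for zi in z]
        beta_star.append(fit_truncated(z, d_star)[0])
    s = sorted(beta_star)
    lo = s[round(alpha * (J + 1)) - 1]        # beta*_(alpha(J+1))
    hi = s[round((1 - alpha) * (J + 1)) - 1]  # beta*_((1-alpha)(J+1))
    return beta_hat, (2 * beta_hat - hi, 2 * beta_hat - lo), (lo, hi)
```

Each replication refits the model from scratch, so one single bootstrap costs $J$ truncated-regression fits. This per-sample cost is what the double bootstrap multiplies.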
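Double Bootstrap Confidence Intervals
The true coverage probability of a bootstrap CI frequently differs from its nominal level.
Step 4: For each set of single bootstrap estimates, construct a double bootstrap sample $\hat\delta^{**}_k = Z\hat\beta^* + \varepsilon^{**}_k$. Again use the truncated regression to obtain $\hat\beta^{**}_k$. Repeat the process $K$ times.
Step 5: Compute the statistic
$$U = \frac{\#\left(\hat\beta^{**}_k \le 2\hat\beta^* - \hat\beta\right)}{K}$$
for the basic CI, or
$$U = \frac{\#\left(\hat\beta^{**}_k \le \hat\beta\right)}{K}$$
for the percentile CI. Repeating Steps 4 and 5 for each of the $J$ single bootstrap samples gives $U_1, \dots, U_J$, with order statistics $U_{(j)}$.
The basic double bootstrap CI is given by
$$\left[\hat\beta - \left(\hat\beta^*_{(U_{((1-\alpha)(J+1))}(J+1))} - \hat\beta\right),\ \hat\beta - \left(\hat\beta^*_{(U_{(\alpha(J+1))}(J+1))} - \hat\beta\right)\right].$$
The percentile double bootstrap CI is given by
$$\left[\hat\beta^*_{(U_{(\alpha(J+1))}(J+1))},\ \hat\beta^*_{(U_{((1-\alpha)(J+1))}(J+1))}\right].$$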
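Steps 4 and 5 and the calibrated intervals can be sketched as follows. This is again an illustration, not the authors' implementation, and it relies on three assumptions:

- The truncated-regression estimator is passed in as a callable `fit(z, delta) -> (beta, sigma)`, a hypothetical signature.
- The double bootstrap errors are drawn with the first-level estimate $\hat\sigma^*_j$, which the slide does not state.
- Each calibrated position $U(J+1)$ is rounded and clamped to $1, \dots, J$, which is needed when $U$ is 0 or 1.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()


def draw_truncated_error(a, sigma, rng):
    """e ~ N(0, sigma^2) conditioned on e >= a, by inverse-CDF sampling."""
    tail = 0.5 * math.erfc(a / (sigma * math.sqrt(2.0)))  # P(e >= a)
    if tail <= 0.0:
        raise ValueError("truncation point too far in the tail")
    p = 0.0
    while p <= 0.0:  # rng.random() can return exactly 0.0
        p = rng.random() * tail
    return -sigma * _STD.inv_cdf(p)


def u_statistics(z, beta_hat, first_level, fit, K, rng):
    """Steps 4-5 for every first-level draw (beta*_j, sigma*_j) in first_level.
    fit(z, delta) -> (beta, sigma) stands in for the truncated-regression estimator."""
    u_basic, u_pct = [], []
    for b_star, s_star in first_level:
        inner = []
        for _ in range(K):  # Step 4: K double bootstrap samples around beta*_j
            d2 = [zi * b_star + draw_truncated_error(1.0 - zi * b_star, s_star, rng) for zi in z]
            inner.append(fit(z, d2)[0])
        u_basic.append(sum(b <= 2 * b_star - beta_hat for b in inner) / K)  # basic CI
        u_pct.append(sum(b <= beta_hat for b in inner) / K)                 # percentile CI
    return u_basic, u_pct


def double_bootstrap_cis(beta_hat, beta_star, u_basic, u_pct, alpha=0.025):
    """Calibrated basic and percentile intervals from the J first-level estimates."""
    J = len(beta_star)
    s = sorted(beta_star)

    def calibrated(u, level):
        # beta*_(U_(level*(J+1)) * (J+1)); position rounded and clamped to 1..J (our choice)
        u_level = sorted(u)[round(level * (J + 1)) - 1]
        pos = min(max(round(u_level * (J + 1)), 1), J)
        return s[pos - 1]

    basic = (2 * beta_hat - calibrated(u_basic, 1 - alpha),
             2 * beta_hat - calibrated(u_basic, alpha))
    percentile = (calibrated(u_pct, alpha), calibrated(u_pct, 1 - alpha))
    return basic, percentile
```

If the $U_j$ are exactly uniform, the calibrated positions coincide with $\alpha(J+1)$ and $(1-\alpha)(J+1)$, and the double bootstrap intervals reduce to the single bootstrap ones. The cost is $J \cdot K$ additional fits. For example, $J = K = 999$ would mean roughly a million.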
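Stopping rules for double bootstrap
Suppose $J = 999$; then $U_{(25)}$ and $U_{(975)}$ are required. Start by calculating 50 values of $U$, each using all $K$ double bootstrap estimations, and sort them in increasing order. The 25th value is an upper bound on $U_{(25)}$, and the 26th value is a lower bound on $U_{(975)}$. These bounds hold because adding further values of $U$ can only lower the 25th smallest value and raise the 25th largest. If $U_{51}$ is greater than the current upper bound on $U_{(25)}$ and smaller than the current lower bound on $U_{(975)}$, it cannot be either of the required order statistics. In that case not all $K$ double bootstrap estimations are required: the estimations for $U_{51}$ can stop as soon as its running count places it strictly between the two bounds.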
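One way to implement this stopping rule is sketched below. It is our reading of the rule, not code from Nankervis (2005), and it works as follows:

- Each first-level $U_j$ is accumulated one double bootstrap draw at a time through a hypothetical callable `inner_draw(j, k)`. This callable returns whether $\hat\beta^{**}_{jk}$ falls below the relevant threshold: $2\hat\beta^*_j - \hat\beta$ for the basic CI, or $\hat\beta$ for the percentile CI.
- Two heaps track the $r_{lo}$ smallest and the $J - r_{hi} + 1$ largest exactly computed values. Their tops are therefore the current upper bound on $U_{(r_{lo})}$ and the current lower bound on $U_{(r_{hi})}$. With $J = 999$ and 50 exact values, these tops are the 25th and 26th smallest.
- A $U_j$ whose partial count already places it strictly between the two bounds is abandoned early.

```python
import heapq
import random


def stopped_order_stats(J, K, r_lo, r_hi, inner_draw):
    """Return (U_(r_lo), U_(r_hi), number of inner draws used).

    inner_draw(j, k) -> bool says whether the k-th double bootstrap estimate for
    first-level sample j falls below its threshold (hypothetical callable)."""
    n_top = J - r_hi + 1
    low = []   # max-heap (negated counts): the r_lo smallest exactly computed counts
    high = []  # min-heap: the n_top largest exactly computed counts
    evals = 0
    for j in range(J):
        ub = -low[0] if len(low) == r_lo else K + 1  # upper bound on K * U_(r_lo)
        lb = high[0] if len(high) == n_top else -1   # lower bound on K * U_(r_hi)
        count, exact = 0, True
        for k in range(1, K + 1):
            count += inner_draw(j, k)
            evals += 1
            # The final count lies in [count, count + K - k]; stop if that range is inside (ub, lb).
            if count > ub and count + (K - k) < lb:
                exact = False
                break
        if exact:
            heapq.heappush(low, -count)
            if len(low) > r_lo:
                heapq.heappop(low)   # drop the largest of the small values
            heapq.heappush(high, count)
            if len(high) > n_top:
                heapq.heappop(high)  # drop the smallest of the large values
    return -low[0] / K, high[0] / K, evals


# Usage: compare with a brute-force computation on a made-up table of indicators.
rng = random.Random(7)
J, K = 199, 50  # alpha = 0.025 gives r_lo = 5 and r_hi = 195
probs = [rng.random() for _ in range(J)]
table = [[rng.random() < p for _ in range(K)] for p in probs]
full = sorted(sum(row) / K for row in table)
lo, hi, used = stopped_order_stats(J, K, 5, 195, lambda j, k: table[j][k - 1])
print(lo == full[4], hi == full[194], f"{used} of {J * K} inner draws")
```

An abandoned value lies strictly between the bounds in force at the time. Later, the upper bound can only fall and the lower bound can only rise, so an abandoned value can never turn out to be $U_{(r_{lo})}$ or $U_{(r_{hi})}$. The returned order statistics therefore match a full computation, which the usage example checks by brute force.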
Monte Carlo evidence - Single bootstrap
Table 1. Estimated coverages of confidence intervals generated by conventional and single bootstrap methods. Column pairs give the nominal coverage levels 0.90 and 0.95.
n          Basic bootstrap   Percentile bootstrap   Asymptotic normal approx.
           0.90     0.95     0.90     0.95          0.90     0.95
p = q = 1
100        0.83     0.89     0.88     0.94          0.85     0.90
200        0.86     0.91     0.88     0.93          0.87     0.92
400        0.88     0.93     0.90     0.94          0.89     0.93
1200       0.88     0.94     0.89     0.94          0.89     0.94
5000       0.90     0.95     0.90     0.95          0.90     0.95
10000      0.90     0.96     0.90     0.96          0.90     0.96
15000      0.91     0.95     0.91     0.95          0.91     0.95
p = q = 2
100        0.76     0.81     0.82     0.88          -        -
200        0.80     0.84     0.83     0.89          -        -
400        0.80     0.87     0.83     0.90          -        -
1200       0.83     0.88     0.85     0.90          -        -
5000       0.82     0.90     0.84     0.91          -        -
10000      0.85     0.91     0.85     0.92          -        -
15000      0.87     0.93     0.88     0.93          -        -
p = q = 3
100        0.69     0.73     0.79     0.85          -        -
200        0.76     0.80     0.81     0.87          -        -
400        0.72     0.78     0.77     0.85          -        -
1200       0.69     0.79     0.73     0.83          -        -
5000       0.65     0.75     0.66     0.77          -        -
10000      0.58     0.71     0.59     0.74          -        -
15000      0.60     0.70     0.60     0.71          -        -
Notes: Results based on 1,000 Monte Carlo trials; "-" marks entries not reported.
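Each coverage figure is a proportion over 1,000 Monte Carlo trials. A binomial standard error therefore helps judge which departures from the nominal level are noise. The binomial approximation and the function names below are our own additions.

```python
import math


def coverage(intervals, beta_true):
    """Share of Monte Carlo intervals (lo, hi) that contain the true parameter."""
    return sum(lo <= beta_true <= hi for lo, hi in intervals) / len(intervals)


def mc_standard_error(cov, trials):
    """Binomial standard error of a coverage rate estimated from `trials` replications."""
    return math.sqrt(cov * (1.0 - cov) / trials)


for nominal in (0.90, 0.95):
    se = mc_standard_error(nominal, 1000)
    print(f"nominal {nominal:.2f}: Monte Carlo SE with 1,000 trials = {se:.4f}")
```

With 1,000 trials the standard error is about 0.0095 at a coverage of 0.90 and about 0.0069 at 0.95. A gap of 0.01 from the nominal level is therefore within roughly 1.5 standard errors. By contrast, the p = q = 3 coverages of 0.60 to 0.71 at n = 15000 lie far outside Monte Carlo noise.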
Monte Carlo evidence - Double bootstrap
Table 2. Estimated coverages of confidence intervals generated by percentile single and double bootstrap methods. Column pairs give the nominal coverage levels 0.90 and 0.95.
n          Percentile single bootstrap   Percentile double bootstrap
           0.90     0.95                 0.90     0.95
p = q = 1
100        0.88     0.94                 0.90     0.95
200        0.88     0.93                 0.90     0.95
400        0.90     0.94                 0.91     0.96
p = q = 2
100        0.82     0.88                 0.86     0.93
200        0.83     0.89                 0.86     0.92
400        0.83     0.90                 0.86     0.93
p = q = 3
100        0.79     0.85                 0.83     0.90
200        0.81     0.87                 0.85     0.92
400        0.77     0.85                 0.80     0.90
Notes: Results based on 1,000 Monte Carlo trials.
Conclusions
The correlation among DEA efficiency estimates disappears asymptotically, but not fast enough for conventional second-stage inference, so an alternative method of inference is needed. The bootstrap offers a good alternative, but single bootstrap CIs do not have good coverage rates, and their coverage falls further below the nominal level as p + q increases. This is because the dimensionality problem of the efficiency estimator carries over to the second-stage regression. The double bootstrap offers a substantial improvement, but at a considerable computational cost. This computational burden can be reduced by adopting deterministic stopping rules, in the spirit of Nankervis (2005).