Moment and IV Selection Approaches: A Comparative Simulation Study


Mehmet Caner (North Carolina State University), Esfandiar Maasoumi (Emory University), Juan Andrés Riquelme (North Carolina State University)

August 7, 2014

Abstract

We compare three moment selection approaches, followed by post-selection estimation strategies. The first is the adaptive lasso of Zou (2006), recently extended by Liao (2013) to possibly invalid moments in gmm; in this method we select the valid instruments with the adaptive lasso. The second method is based on the J test, as in Andrews and Lu (2001). The third uses a continuous updating objective (cue) function. This last approach builds on Hong et al. (2003), who propose a penalized generalized empirical likelihood based function to select valid moments; they use empirical likelihood and exponential tilting in their simulations. However, as can be seen in Hong et al. (2003), the J-test-based approach of Andrews and Lu (2001) generally provides better moment selection results than empirical likelihood and exponential tilting. In this article we therefore examine the penalized cue as a third way of selecting valid moments. Following the determination of valid moments, we run unpenalized gmm, unpenalized cue, and the model averaging technique of Okui (2011) to see which has the better post-selection estimator performance for the structural parameters. The simulations address the following questions: which moment selection criterion better selects the valid moments and eliminates the invalid ones? Given the instruments chosen in the first stage, which strategy delivers the best finite sample performance? We find that the adaptive lasso in the model selection stage, coupled with either unpenalized gmm or the moment averaging of Okui, generally delivers the smallest rmse for the second stage coefficient estimators.

Keywords and phrases: Shrinkage, Monte Carlo, Averaging.

North Carolina State University, Department of Economics, 4168 Nelson Hall, Raleigh, NC. mcaner@ncsu.edu. Emory University, Department of Economics, Atlanta, GA. esfandiar.maasoumi@emory.edu. North Carolina State University, Department of Economics. jariquel@ncsu.edu.

1 Introduction

It is not uncommon to encounter a large number of instruments or moment conditions in applications of instrumental variables (iv) or Generalized Method of Moments (gmm) estimators. Some ivs or moments may be invalid, but the researcher does not know a priori which ones. This problem may be adjudicated statistically with the J test, which indicates whether the overidentifying restrictions are valid. If the null is rejected, the researcher needs a moment selection technique that distinguishes between the valid and invalid moment conditions. A few techniques have been proposed, each with advantages (for example, consistency) and disadvantages (such as overwhelming computational demands). In this paper we focus on information-based methods and review three moment selection criteria (msc) used in the current literature: (i) the shrinkage procedure of Liao (2013); (ii) the information-based criteria with gmm in Andrews (1999); and (iii) the information-based criterion using generalized empirical likelihood of Hong et al. (2003). Using Monte Carlo simulations, we compare the performance of these methods in selecting valid moments in linear settings under several relevant scenarios: small and large sample sizes, fixed and increasing numbers of moment conditions, weak and strong identification, local-to-zero moment conditions, and homoskedastic and heteroskedastic errors.

The contribution of our study is a fairly comprehensive comparison of these multistep approaches with each other. The choice of methods was motivated by the following considerations: the adaptive lasso is heavily used in statistics and has computational advantages in large scale problems; the penalized methods in Andrews (1999) and Hong et al. (2003) are not computationally advantageous, but are used by econometricians due to the need to determine valid instruments. Further, these three methods have reasonably strong theoretical underpinnings.

We analyze second stage estimation performance, considering the finite sample properties of the structural parameter estimators. To this end, we employ the model averaging technique of Okui (2011) to obtain better mean squared error and smaller bias for the structural parameters. We then compare Okui (2011) with unpenalized gmm and cue estimation, following the selection of valid instruments in the first stage. We find that the adaptive lasso in the model selection stage, coupled with either unpenalized gmm or the moment averaging of Okui, generally delivers the smallest rmse for the second stage estimation.

There is a large and rich literature on moment selection techniques. Smith (1992) proposes a procedure to compare competing non-nested gmm estimations, allowing for heteroskedasticity and serial correlation. Again in the gmm context, Andrews (1999) proposes a moment selection procedure using information criteria based on a J statistic corrected for the number of moment conditions. This is analogous to the use of the Akaike (aic), Bayesian (bic) and Hannan-Quinn (hqic) information criteria in model selection. He shows that the proposed methods are consistent under suitable assumptions, and also formalizes the downward and upward testing procedures. Downward testing consists of iterative J tests starting from the largest set of moment conditions and proceeding to fewer moment conditions at each iteration, until the null is not rejected; upward testing works in the opposite order. Andrews and Lu (2001) extend these methods to model selection in dynamic panel data structures. Hong et al. (2003) propose a similar approach, using the generalized empirical likelihood defined by Newey and Smith (2000) instead of the J statistic.

A relatively new type of moment selection method is based on shrinkage procedures.

One of the advantages of shrinkage is its computational efficiency, which is consequential especially in high-dimensional contexts. In a brief comparison, Hastie et al. (2009, section 3.6) conclude that shrinkage performs better than alternative model selection techniques in reducing estimation error. Liao (2013) shows that gmm shrinkage procedures have the oracle property in selecting the moment conditions, and that adding additional valid moments improves efficiency for strongly identified parameters. Cheng and Liao (2012) use a similar approach and propose a weighted tuning parameter that allows shrinking both invalid and redundant moments. We chose the three moment selection criteria in this study on the basis of their optimality properties, such as the oracle property, and their good finite sample performance.

For model selection assuming valid instruments, Belloni, Chernozhukov and Hansen (2011) utilize lasso-type estimators in the many-iv case and provide conditions under which the iv approach is asymptotically oracle-efficient. Caner (2009) and Caner and Zhang (2013) also use shrinkage methods for model selection in a gmm context. Canay (2010) proposes the use of trapezoidal kernel weights to shrink the first stage estimators. Kuersteiner and Okui (2010) point out that, despite the advantages of kernel shrinkage estimation, such estimators cannot completely remove the estimation bias and are inflexible once a particular kernel is chosen; they also propose a moment average estimator using the method in Hansen (2007) to construct optimal instruments. Okui (2011) develops a shrinkage method that minimizes the asymptotic mean squared error. An important concern in gmm estimation is the presence of weak ivs (Hausman et al. (2005); Andrews and Stock (2007)). The results in Cheng and Liao (2012) suggest that shrinkage estimation is robust in discarding invalid ivs, but tends to include redundant ivs when identification is weak.

The rest of the paper is organized as follows. In Section 2 we review the msc approaches under comparison. In Section 3 we present the details of our Monte Carlo simulation setups. In Section 4 the main results of our simulation exercises are presented. Section 5 concludes. Standard notation is used for the projection operator P_A = A(A′A)⁻¹A′, where A is a matrix.

2 Theoretical Framework

2.1 Moment Selection Methods

Consider a sequence of random variables {Z_i}_{i=1}^n drawn from an unknown probability distribution. The moment selection problem consists of selecting the r valid moments from a set of q candidates. A minimum set of s ≥ p valid moment conditions is required in order to identify the structural parameter vector θ, where p = dim(θ). The set of q candidate moments can be separated into two subsets, and the model is, for i = 1, …, n,

E[g_S(Z_i, θ_0)] = 0,   S = {1, …, s}   (1)
E[g_{S^c}(Z_i, θ_0)] ?= 0,   S^c = {s + 1, …, q}   (2)

where the sign ?= means that the relationship may not hold for some of the indexes in S^c. We have r = s + s_v, where s is the number of moments in (1) (i.e., those known to be valid) and s_v is the number of valid moments among the q − s moments in (2); S^c represents the moments that may or may not be valid, so 0 ≤ s_v ≤ q − s. Our framework assumes that the researcher knows a priori that s instruments are valid and that these identify the p parameters.

The question is whether to include the rest of the instruments for efficiency considerations. This is the framework used recently by Liao (2013). Note that θ_0 represents the true structural parameter vector of dimension p. The standard gmm estimator of θ_0, denoted θ̂_n, is

θ̂_n = argmin_{θ ∈ Θ} J(θ, W_n),

where W_n is a q × q symmetric and positive definite weight matrix and the objective function (Hansen, 1982) is defined as

J(θ, W_n) = n g_n(θ)′ W_n g_n(θ),   (3)

with g_n(θ) = n⁻¹ Σ_{i=1}^n g(Z_i, θ), and Θ a compact subset of R^p. For ease of notation let g(Z_i, θ) = g_i(θ).

Throughout the paper we consider the following linear model because of its computational advantages, and conduct the comparative examination in this widely used setup:

y = Y θ_0 + ε   (4)
Y = Z π_0 + u   (5)

where y is an n × 1 vector, Y is an n × p matrix of endogenous variables, Z is an n × q matrix of instruments, and ε and u are unobserved random errors with constant second moments that are correlated with each other. We do not deal with control variables in the simulations; this makes no difference in a simulation setup.
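For concreteness, the two-step estimator behind (3) can be written in a few lines for the linear model (4)-(5). The following numpy sketch is ours, not the authors' code; the function names (gmm_linear, solve) are illustrative, and the closed form relies on the moments being linear in θ.

```python
import numpy as np

def gmm_linear(y, Y, Z):
    """Two-step GMM for y = Y @ theta + eps with instrument matrix Z.

    Step 1 uses the identity weight matrix; step 2 re-weights with the
    inverse of the estimated moment covariance (efficient GMM).
    """
    n, q = Z.shape

    def solve(W):
        # For g_n(theta) = Z'(y - Y theta)/n, minimizing n g_n' W g_n
        # gives the closed form (Y'Z W Z'Y)^{-1} Y'Z W Z'y.
        A = Y.T @ Z @ W @ Z.T @ Y
        return np.linalg.solve(A, Y.T @ Z @ W @ Z.T @ y)

    theta1 = solve(np.eye(q))            # step 1: W = I_q
    g = Z * (y - Y @ theta1)[:, None]    # n x q matrix with rows g_i'
    W2 = np.linalg.inv(g.T @ g / n)      # inverse moment covariance
    return solve(W2)                     # step 2: efficient estimate
```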

7 ones,z i1 (s 1) and the set that we suspect may contain invalid instruments Z i2 (q s 1). The sample moment conditions are defined by: g n (θ, β) = 1 n n g i (θ, β), i=1 where g i (θ, β) = (g i1 (θ), g i2 (θ, β) ) with g i1 (θ) = Z i1 (y i Y i θ), g i2 (θ, β) = Z i2 (y i Y i θ) β. The weight matrix for our nonstandard case is calculated as: W n = 1 n n g i ( θ, β)g i ( θ, β), i=1 where θ, β are the first step GMM estimators with I q as the weight matrix. The first method we discuss is the adaptive gmm shrinkage estimation method (Liao, 2013). This method has the advantage of selecting the valid moments and estimate θ in a single step. It consists of adding a slackness parameter vector β 0 to the moment conditions in (2). So the model is: E g i1(θ 0 ) g i2 (θ 0, β 0 ) = 0. and the validity of the moment conditions is verified by inference on whether β 0 = 0 or not. A moment condition j is valid only if β 0j = 0, for j = 1, q s. 6

The adaptive lasso estimators are defined as

(θ̂_n^alasso, β̂_n^alasso) = argmin_{(θ,β) ∈ Θ×B} [ g_n(θ, β)′ W_n g_n(θ, β) + λ_n Σ_{j=1}^{q−s} ω̂_j |β_j| ],   (6)

where Θ × B is the parameter space for (θ, β) and the ω̂_j are data-dependent weights, ω̂_j = 1/|β̃_j|, with β̃_j the unpenalized standard gmm estimator using all q moments. The adaptive lasso (alasso) estimator penalizes the slackness parameters by their ℓ1 norm. This penalty is usually preferred because it delivers the oracle property (β_0j is shrunk to exactly zero for the valid moments) and because the problem can be solved with the lars algorithm (Efron et al., 2004), which represents a great computational advantage. Liao (2013) also considers alternative penalties, such as the bridge and the smoothly clipped absolute deviation, but we focus only on the adaptive lasso estimator because its penalty is convex and easy to estimate compared with the others. The degree of shrinkage is governed by the tuning parameter λ_n ≥ 0: larger values shrink more, and λ_n = 0 corresponds to the unpenalized gmm solution. λ_n is chosen so as to differentiate between valid and invalid moments.
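The criterion in (6) can be evaluated directly. Below is a minimal sketch of the penalized objective; for brevity we minimize it with a generic derivative-free routine, whereas Liao (2013) exploits the lars algorithm, so this illustrates the criterion rather than the recommended solver. The names (alasso_objective, alasso_fit) and the choice of Powell's method are our assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def alasso_objective(params, y, Y, Z1, Z2, W, lam, w_hat):
    """Adaptive-lasso GMM criterion from (6):
    g_n(theta,beta)' W g_n(theta,beta) + lam * sum_j w_hat[j]*|beta_j|,
    with w_hat[j] = 1/|beta_j~| from the unpenalized GMM fit."""
    n, p = Y.shape
    theta, beta = params[:p], params[p:]
    e = y - Y @ theta
    g = np.concatenate([Z1.T @ e / n, Z2.T @ e / n - beta])
    return g @ W @ g + lam * np.dot(w_hat, np.abs(beta))

def alasso_fit(start, y, Y, Z1, Z2, W, lam, w_hat):
    # Powell is derivative-free, so it tolerates the kink at beta_j = 0;
    # it is far slower than LARS and used here only for illustration.
    res = minimize(alasso_objective, start,
                   args=(y, Y, Z1, Z2, W, lam, w_hat), method="Powell")
    return res.x
```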

The second msc that we analyze is due to Andrews (1999), as extended in Andrews and Lu (2001). It consists of a penalization of the J statistic (Hansen, 1982) in equation (3). Following the notation of Andrews (1999), let c ∈ R^{q−s} denote a moment selection vector of zeros and ones, such that the jth element of c is one if the jth moment condition is selected as valid. Let |c| = Σ_{j=1}^{q−s} c_j denote the number of moments selected by c, and let Z_ic be the vector Z_i from which the jth element is deleted whenever the corresponding jth element of c is zero. The corresponding weight matrix W_n^c has dimension (s + |c|) × (s + |c|). The msc estimator objective function has the following general form:

msc_n(c) = J_c(θ, W_n^c) − h(|c|) κ_n,   (7)

where J_c(θ, W_n^c) = n g_n(θ)′ W_n^c g_n(θ) uses the s + |c| moments in the gmm objective function; g_n(θ) is defined immediately below equation (3). In (7) we take W_n^c = [n⁻¹ Σ_{i=1}^n Z_ic Z_ic′ ε̂_i²]⁻¹, where ε̂_i = y_i − Y_i θ̃, and θ̃ is estimated by inefficient gmm with the identity weight matrix, using Z_ic.

The algorithm works as follows. For each instrument combination, we calculate the first-step inefficient gmm with the identity weight matrix; given the inefficient gmm estimates, we set up the new weight matrix as described above and obtain the parameter estimates for the second-stage efficient gmm. We then form (7) for each instrument combination and pick the combination that minimizes (7); the corresponding efficient gmm estimates are the ones that will be used. To be specific, say we have two potentially valid instruments, Z1 and Z2. The possible combinations are Z1 only, Z2 only, and Z1 and Z2 together. First, for Z1 only, we get inefficient gmm estimates, use them to form the weight matrix for the second stage, and obtain efficient gmm estimates for Z1. We repeat the same analysis for Z2, and then for the pair (Z1, Z2). We now have three sets of efficient gmm estimates, and we choose the one that minimizes (7).

The choices of the function h(·) and the constants {κ_n}_{n≥1} lead to different msc. Andrews (1999) uses h(|c|) = |c| − p and three different choices of κ_n, which lead to three moment selection criteria (aic, bic, Hannan-Quinn):

gmmbic: msc_bic,n(c) = J_c(θ, W_n^c) − (|c| − p) ln n
gmmaic: msc_aic,n(c) = J_c(θ, W_n^c) − 2 (|c| − p)
gmmhqic: msc_hqic,n(c) = J_c(θ, W_n^c) − 2.1 (|c| − p) ln ln n

The value 2.1 in gmmhqic is chosen in light of the results in Andrews (1997).
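The exhaustive search this describes is easy to write down. The sketch below implements gmm-bic selection over all subsets of the suspect instruments, with the s known-valid instruments always included; the function names are ours, and the penalty uses the total degree of overidentification, which is our reading of h(|c|) = |c| − p.

```python
import numpy as np
from itertools import combinations

def gmm_solve(y, Y, Z, W):
    # closed-form linear GMM: (Y'Z W Z'Y)^{-1} Y'Z W Z'y
    A = Y.T @ Z @ W @ Z.T @ Y
    return np.linalg.solve(A, Y.T @ Z @ W @ Z.T @ y)

def gmm_bic_select(y, Y, Z1, Z2):
    """GMM-BIC selection, eq. (7) with kappa_n = ln n: for every subset
    c of the columns of Z2, run two-step GMM on [Z1, Z2[:, c]] and keep
    the subset minimizing J_c - (#overidentifying restrictions) ln n."""
    n, p = Y.shape
    best, best_msc = None, np.inf
    for k in range(Z2.shape[1] + 1):
        for c in combinations(range(Z2.shape[1]), k):
            Z = np.hstack([Z1, Z2[:, list(c)]])
            q = Z.shape[1]
            theta1 = gmm_solve(y, Y, Z, np.eye(q))   # inefficient step
            g = Z * (y - Y @ theta1)[:, None]
            W = np.linalg.inv(g.T @ g / n)           # efficient weight
            theta2 = gmm_solve(y, Y, Z, W)
            gbar = Z.T @ (y - Y @ theta2) / n
            J = n * gbar @ W @ gbar                  # J statistic
            msc = J - (q - p) * np.log(n)
            if msc < best_msc:
                best, best_msc = c, msc
    return best, best_msc
```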

For consistency among the methods, we analyze the gmmbic variant in this paper; a bic-based penalty gives selection consistency both for the adaptive lasso and for Andrews and Lu (2001). The results for the aic and hqic cases are available on request.

The third method is due to Hong et al. (2003). Their method is analogous to Andrews and Lu (2001), but the J function is estimated using generalized empirical likelihood and exponential tilting statistics. However, as described in the introduction, we use only the cue-based objective function, owing to the poor performance of empirical likelihood and exponential tilting shown in Hong et al. (2003). The objective function is the same as (7), but the weight matrix is updated continuously together with the parameters until convergence. In this third method the weight matrix is

W_n,cue(θ) = [n⁻¹ Σ_{i=1}^n Z_ic Z_ic′ ε_i(θ)²]⁻¹, where ε_i(θ) = y_i − Y_i θ.
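A sketch of this continuous-updating objective for a single endogenous regressor follows; the names are ours, and the weight uses uncentered moments, which is one common convention rather than necessarily the authors' exact choice.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def cue_objective(theta, y, Y, Z):
    """CUE criterion: same form as the J statistic, but the weight
    matrix is recomputed at every candidate theta instead of being
    fixed at a preliminary estimate.  Y is an n x 1 matrix."""
    n = len(y)
    e = y - Y @ np.atleast_1d(theta)
    g_i = Z * e[:, None]                   # rows are g_i(theta)'
    gbar = g_i.mean(axis=0)
    W = np.linalg.inv(g_i.T @ g_i / n)     # theta-dependent weight
    return n * gbar @ W @ gbar

def cue_fit(y, Y, Z, lo=-10.0, hi=10.0):
    # scalar theta, as in the simulations with one endogenous variable
    return minimize_scalar(cue_objective, bounds=(lo, hi),
                           args=(y, Y, Z), method="bounded").x
```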

2.2 Parameter Estimation

We examine three methods in the second stage of estimation for θ. The first two are unpenalized gmm and unpenalized cue: given the valid instruments, these two methods produce estimates of the structural parameters. An alternative approach, once moment selection has been carried out, is the method proposed by Okui (2011): the shrinkage two stage least squares (stsls) estimator. This method shrinks toward zero some of the weights on the sample moment conditions and requires a minimum set of moments known to be valid before estimation. The stsls estimator is as follows: for a shrinkage parameter m we define P_m = P_{Z_I} + m P_{Z_II} and

θ̂_n,s^stsls = (Y′ P_m Y)⁻¹ Y′ P_m y,

where Z_I contains the s valid moments from the first set, and Z_II contains the moments from the second set of q − s moments/instruments that are selected as valid by a criterion such as the alasso or penalized gmm. The shrinkage parameter m is chosen to minimize a Nagar (1959)-type approximation to the mean squared error. When there is only one endogenous variable, as in our simulation setup, the estimate of the optimal shrinkage parameter is

m̂ = (σ̂²_ε Ŷ′P_{Z_I}Ŷ/n) / (σ̂²_εu r²/n + σ̂²_ε Ŷ′P_{Z_I}Ŷ/n),

where σ̂²_ε and σ̂²_εu are estimates of σ²_ε = E(ε_i²) and σ²_εu = [E(ε_i u_i)]², obtained from a preliminary estimation as described in Okui (2011), and Ŷ is the prediction of Y from a least squares regression on the selected instruments. Okui's (2011) derivation assumes homoskedasticity, and m̂ is valid only under homoskedasticity.
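The estimator fits in a few lines. The sketch below takes the displayed formulas, including m̂ as printed above, at face value; the helper names (proj, stsls) are ours, and it is valid only under homoskedasticity.

```python
import numpy as np

def proj(Z):
    """Projection matrix P_Z = Z (Z'Z)^{-1} Z'."""
    return Z @ np.linalg.solve(Z.T @ Z, Z.T)

def stsls(y, Y, Z_I, Z_II, sig2_e, sig_eu, r):
    """Okui's (2011) shrinkage TSLS with one endogenous regressor
    (y and Y are length-n vectors).  Z_I: the s known-valid
    instruments; Z_II: instruments kept by the first-stage selection;
    sig2_e, sig_eu: preliminary estimates of E(eps^2) and E(eps*u)."""
    n = len(y)
    P_I, P_II = proj(Z_I), proj(Z_II)
    Yhat = proj(np.hstack([Z_I, Z_II])) @ Y    # first-stage fitted values
    a = sig2_e * (Yhat @ P_I @ Yhat) / n
    m = a / (sig_eu**2 * r**2 / n + a)         # optimal shrinkage weight
    P_m = P_I + m * P_II
    return (Y @ P_m @ y) / (Y @ P_m @ Y), m
```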

3 Monte Carlo Simulations

The purpose of the Monte Carlo simulations is to compare the previously described msc approaches in two respects: first, their effectiveness in selecting the correct moment conditions, and second, the performance of the post-selection estimators. We use the data generating process in equations (4) and (5), with one endogenous variable and true value θ_0 = 0.5. We draw (Z, ε̃, u) ~ N(0, Σ), where

Σ = [ σ²_zz I_q   σ_Zε   0_q
      σ_Zε′      σ²_ε   σ_εu
      0_q′       σ_εu   σ²_u ]

is a (q + 2) × (q + 2) symmetric matrix; σ²_zz is the variance of the instruments, I_q is the identity matrix of order q, σ_Zε is a q × 1 vector of covariances between the instruments and the structural error, 0_q is a q × 1 vector of zeros, and σ_εu, σ²_ε and σ²_u are scalars. We impose a heteroskedastic error structure of the form ε_i = ε̃_i ‖Z_i‖, with ‖Z_i‖ = √(Z²_i1 + ⋯ + Z²_iq); the homoskedastic case sets ε_i = ε̃_i.

A moment is valid if E[g(Z_i, θ_0)] = E[Z_i(y_i − Y_i θ_0)] = E[Z_i ε_i] = σ_Zε = 0. We generate invalid moments by constructing the σ_Zε vector in two ways: (1) a constant correlation D ≠ 0 between the instrument and the structural error, and (2) local-to-zero correlations of the form 1/n, 1/√n and 1/∛n, to explore different convergence rates.

In all setups we have q total moments, s of which are known a priori by the researcher to be valid. However, there is a total of r = s + s_v valid moments, so we have to select the s_v valid moments among the remaining q − s. The number of valid and invalid moment conditions is generated in two ways. In the first setup we simulate data with a fixed number of moments: q = 11, s = 3 and r = 7. That is, there are 11 moments, we know that 3 of them are valid, and we have to select among the other 8, of which 4 are valid and 4 are invalid. The errors for this setup are homoskedastic.
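A compact simulator for the Setup 1 design might look as follows. It is a sketch under stated assumptions, not the authors' code: the parameter names are ours, strong identification is taken as every first-stage coefficient equal to 2, and the heteroskedastic variant scales ε̃_i by ‖Z_i‖ as above.

```python
import numpy as np

def simulate(n, q=11, s=3, s_v=4, D=0.2, theta0=0.5, sig2_zz=0.5,
             sig2_e=1.0, sig2_u=0.5, sig_eu=0.5, hetero=False, rng=None):
    """One draw from the Setup-1-style DGP in (4)-(5): the first s + s_v
    instruments are valid (zero covariance with eps~); the remaining
    q - s - s_v covary with eps~ through the constant D."""
    rng = rng or np.random.default_rng()
    sig_Ze = np.zeros(q)
    sig_Ze[s + s_v:] = D                       # invalid moments
    # joint covariance of (Z, eps~, u) as in the display above
    S = np.zeros((q + 2, q + 2))
    S[:q, :q] = sig2_zz * np.eye(q)
    S[:q, q] = S[q, :q] = sig_Ze
    S[q, q], S[q + 1, q + 1] = sig2_e, sig2_u
    S[q, q + 1] = S[q + 1, q] = sig_eu
    draw = rng.multivariate_normal(np.zeros(q + 2), S, size=n)
    Z, e, u = draw[:, :q], draw[:, q], draw[:, q + 1]
    if hetero:
        e = e * np.linalg.norm(Z, axis=1)      # eps_i = eps~_i * ||Z_i||
    pi = np.full(q, 2.0)                       # strong identification
    Y = Z @ pi + u                             # first stage, eq. (5)
    y = Y * theta0 + e                         # structural eq. (4)
    return y, Y, Z                             # Y returned as a vector
```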

In the second setup we allow the number of moments to increase with the sample size: q = √n, s = √q and s_v = (q − s)/2; that is, we have to choose among q − s candidates, half of which are set to be valid. The errors for this setup are heteroskedastic. We refer to these designs as Setup 1 and Setup 2, respectively.

In Setup 1, Σ is constructed as follows. We simulate Z ∈ R^11, divided into three categories: the first set of instruments is known to be strong and valid (s = 3), as required by the mscs described in the previous section; as mentioned before, the next set of instruments is divided into two categories, the first four being valid (s_v = 4) and the last q − r = 4 invalid. The last elements of Σ are

σ_Zε = (0, 0, 0, 0, 0, 0, 0, D, D, D, D)′

in the constant correlation case and

σ_Zε = (0, 0, 0, 0, 0, 0, 0, h/n, h/√n, h/∛n, h/n)′

in the local-to-zero scenario; note that we use three rates for the local-to-zero moments, recycled as needed. We set σ²_ε = 1, σ²_u = 0.5, D = 0.2 and h = 1. For each correlation structure we investigate weak and strong identification scenarios by changing π_0 in equation (5): in the strong identification scenario π_0 = 2·1_11, while in the weak identification case the first three elements of π_0 equal 2 and the remaining eight are set close to zero, with 1_l denoting a row vector of ones of length l. The second setup is constructed in an analogous manner.

We set the variance of the instruments to σ²_zz ∈ {0.5, 1.0} and the covariance between the structural and reduced form errors to σ_εu = 0.5. This gives us two cases: Case 1 with σ²_zz = 0.5 I_q and σ_εu = 0.5, and Case 2 with σ²_zz = 1.0 I_q and σ_εu = 0.5. We have estimated many other cases for the covariance matrix: Case 3 with σ²_zz = I_q and σ_εu = 0.5, Case 4 with σ²_zz = I_q and σ_εu = 0.9, Case 5 with σ²_zz = 2 I_q and σ_εu = 0.5, and Case 6 with σ²_zz = 2 I_q and σ_εu = 0.9. These cases and the local-to-zero ones are available on request.

The simulated sample sizes are n ∈ {50, 100, 250}. All the results in the next section are based on 1000 repetitions.

4 Results

We focus only on the most relevant and salient results of our simulation exercises: Cases 1 and 2 for Setups 1 and 2, using invalid moments with constant correlation with the structural error. We do not present all the simulated scenarios, for economy of space and because the general results presented here hold across all the alternative setups.¹ We focus on the weak and strong identification cases with σ²_zz = 0.5 I_q and σ²_zz = 1 I_q. The analysis of the results addresses two questions: how good are the msc selection procedures, and which technique gives the best estimation of the structural parameter θ_0? The R² of the first stage regression is presented in Table 7; it varies with the strength of the identification and the number of observations.

The moment selection methods are the adaptive lasso (alasso), penalized gmm (gmm_pen) and penalized cue (cue_pen). We have nine post-selection structural parameter estimators. alasso-ma is the estimator obtained by selecting the moments with the adaptive lasso in the first stage and then using the moment averaging estimator of Okui (2011) in the second stage. Setup 1 is the homoskedastic setup, and there we use the optimal m in Okui's (2011) ma method, which works only under homoskedasticity. Setup 2 allows heteroskedasticity, for which Okui's (2011) method, and the m of Section 2.2, are not designed; we nevertheless use the m of Section 2.2 in the Setup 2 simulations to see how this method fares under heteroskedasticity.

¹ We have extensive results for all the moment selection techniques discussed in Section 2, for both fixed and local-to-zero correlation between the instruments and the structural error, available on request.

alasso-gmm and alasso-cue are the estimators that use the adaptive lasso to select the valid moments in the first stage and then use them in unpenalized efficient gmm and unpenalized cue, respectively. For the efficient gmm we have three estimators: gmm is the gmm estimator using the full set of moments; gmm_pen-ma uses the penalized gmm criterion of Andrews and Lu (2001) for moment selection and then Okui's moment averaging estimator in the second stage; gmm_pen-gmm selects the moments in the same way but then estimates the structural parameter by efficient gmm. In the same way, cue denotes the cue estimator using the full set of moments, cue_pen-ma selects the moments with the penalized cue criterion and applies the moment averaging estimator, and cue_pen-cue selects the moments with the penalized cue and estimates θ_0 by cue.

A summary of our results is presented in Tables 1 and 2, for model selection and post-selection estimation performance respectively. Table 1 presents the average ranking of each method on the probability of selecting exactly the valid moments, computed from Tables 3 and 4 for each sample size and strength of identification. In case of a tie the methods get the same ranking (so we can have two first or two second places). From these tables we can see that the adaptive lasso is the best method at perfect moment selection.

Table 2 presents the performance of the post-selection estimation methods, assessed by the rmse. The rankings are based on the relative performance in Tables 5a to 6b, presented by sample size and strength of identification. The estimator with the smallest rmse acquires the rank of 1, and estimators with the same rmse are given the same rank.

Table 1: Summary of the Performance of the Moment Selection Techniques
[table omitted: average rankings of alasso, gmm_pen and cue_pen under Setup 1 and Setup 2]
Note: Figures correspond to the average ranking of each method based on the probability of selecting exactly the valid moments; the latter are in Tables 3 and 4, by sample size and strength of identification. In case of a tie the methods get the same ranking (we can have two first or two second places). alasso, gmm_pen and cue_pen stand for adaptive lasso, penalized gmm and penalized cue, respectively.

The average ranking ranges from 1 to 9 and the frequency of being in the top three from 0 to 12. From Table 2 we conclude that the best estimator is obtained by using the adaptive lasso to select the moments, followed by the moment averaging procedure (alasso-ma). The moment averaging procedure improves estimation for all three moment selection techniques; the worst estimators are obtained with the cue method. In the heteroskedastic setup (Setup 2), alasso-ma (moment averaging in the second stage) is still the best in terms of rmse, though not by as much as in the homoskedastic case (Setup 1). In the next sections we present the detailed analysis of the moment selection and post-selection estimation methods.

4.1 Model Selection

We analyze three msc methods: the adaptive lasso in equation (6), the penalized efficient gmm, and the penalized continuously updated gmm in equation (7). In all cases we adopt the bic criterion. For each method we measure performance by the probability of three events: (1) the method selects exactly the true set of valid moments and none of the invalid ones (perfect selection); (2) it selects only valid moments but not exactly the true set, i.e., no invalid moment is selected yet the selection is not perfect; and (3) it selects at least one invalid moment.

Table 2: Summary of the Performance of the Post Selection Techniques
[table omitted: for Setup 1 and Setup 2, the average ranking and the number of times at the top three for alasso-ma, alasso-gmm, alasso-cue, gmm, gmm_pen-ma, gmm_pen-gmm, cue, cue_pen-ma and cue_pen-cue]
Note: Performance is analyzed in terms of the rmse. The rankings are based on the relative performance in Tables 5a to 6b; the estimator with the smallest value takes the rank of 1, and ties share the same rank. The average ranking ranges from 1 to 9 and the times at the top three from 0 to 12. alasso-ma selects the moments with the adaptive lasso in the first stage and applies Okui's moment averaging estimator in the second stage; alasso-gmm and alasso-cue use the adaptive lasso to select the valid moments and then apply unpenalized efficient gmm and cue, respectively; gmm uses the full set of moments; gmm_pen-ma selects moments with the penalized gmm of Andrews and Lu (2001) and applies Okui's moment averaging estimator in the second stage; gmm_pen-gmm selects moments in the same way and estimates the structural parameter by efficient gmm; cue uses the full set of moments; cue_pen-ma selects the moments with the penalized cue criterion and applies the moment averaging estimator; cue_pen-cue selects the moments with the penalized cue and estimates θ_0 by cue.

The first probability measures how often the criterion is exactly right. The second event is second best: the criterion does not recover the correct number of valid moments, but it still selects only valid ones, which can benefit the second-stage structural parameter estimation in gmm. The third probability shows how badly a moment selection criterion can behave; since invalid moments can badly affect the finite sample bias of the second-stage structural estimates, we would prefer this probability to be low.

The results are presented in Tables 3 and 4 for, respectively, the first setup (fixed number of moments, homoskedastic errors) and the second setup (increasing number of moments, heteroskedastic errors), for the weak and strong identification cases and instrument variances σ²_zz = 0.5 I_q and σ²_zz = I_q.

In Table 3 (Setup 1, σ²_zz = 0.5 I_q) we find that in the smallest sample, n = 50, all three msc approaches behave poorly, particularly gmm_pen, which selects invalid moments with high probability in the weak identification case and with probability 1 in the strong identification case. The best method is the alasso, which selects invalid moments with the lowest probability in both the weak and strong identification cases. The performance of the three methods improves when the sample size increases to n = 100, but their relative positions remain the same: alasso dominates the penalized methods, selecting invalid moments with markedly lower probability in both identification cases. The performance ranking changes when the sample size is increased to n = 250: the alasso still selects invalid moments with the smallest probability, but the penalized cue method now selects perfectly with higher probability in both the strong and weak identification cases than the alasso (0.355 under strong identification).

However, if the objective is to avoid selecting any invalid moments, then the alasso still dominates, selecting an invalid moment with the lowest probability. Since the alasso and the penalized methods are all selection consistent, we can take this as evidence of differences in convergence rates, with the penalized methods converging faster in this case; the differences in performance between the two penalized methods are negligible here, though not in the next case.

In Table 3 (Setup 1) with σ²_zz = 1 I_q, the relative performance of the methods is the same as in the previous case, but with the alasso dominating in all cases and by all criteria. The penalized methods catch up with the alasso as the sample size increases, with cue_pen slightly dominating its counterpart gmm_pen in all cases. In all cases the methods also perform worse than in the setup with σ²_zz = 0.5 I_q.

The conclusions for Setup 2 are the same as those for Setup 1. It is noteworthy that the alasso moves smoothly across the three performance measures (perfect, only-valid and invalid selection), whereas the penalized methods jump from selecting invalid instruments to perfect selection as the sample size increases, with undesirably small probabilities of selecting only valid (but not exactly the true) moments under all our setups.

4.2 Post Selection Performance

In this section we analyze the post-selection performance of the msc in terms of the bias, standard deviation and rmse of the estimator θ̂. For each of the three selection methods we estimate the structural parameter using efficient gmm, continuously updated gmm, and the moment averaging method of Okui (2011). The results for Setups 1 and 2 are presented in Tables 5a, 5b and 6a, 6b, respectively.

Table 3: Probabilities: Moment Selection Criteria. Setup 1
[table omitted: probabilities of perfect selection, only-valid selection and any-invalid selection for alasso, gmm_pen and cue_pen, under weak and strong identification, for n = 50, 100, 250 and σ²_zz ∈ {0.5 I_q, 1 I_q}]
Note: alasso, gmm_pen and cue_pen stand for adaptive lasso, penalized gmm and penalized cue, respectively. The reported numbers are the probabilities that (i) the method selects exactly the valid moments (perfect selection), (ii) the method does not choose any invalid moment (only valid), and (iii) the method selects at least one invalid moment (any invalid). Setup 1 consists of a fixed number of moments and homoskedastic errors: there are 11 moments, 3 known to be valid and 8 unknown, among which 4 are valid and 4 are invalid. In the weak identification case the first three elements of π are 2 and the remaining eight are close to zero; in the strong identification case all elements of π equal 2.

Table 4: Probabilities: Moment Selection Criteria. Setup 2
[table omitted: same layout as Table 3, for Setup 2]
Note: alasso, gmm_pen and cue_pen stand for adaptive lasso, penalized gmm and penalized cue, respectively. The reported numbers are the probabilities of (i) perfect selection, (ii) only-valid selection and (iii) any-invalid selection, as in Table 3. Setup 2 consists of an increasing number of moments and heteroskedastic errors: there are q = √n moments, s = √q of them known to be valid and q − s unknown; of these, (q − s)/2 are valid and (q − s)/2 are invalid. In the weak identification case the first s elements of π are 2 and the remaining q − s are close to zero; in the strong identification case all q elements of π equal 2.

In Table 6a (Setup 2, weak identification, σ²_zz = 0.5 I_q) we find that with sample size n = 50 the best estimator in terms of rmse is alasso-ma, and the worst is the cue. Note that when using the full set of instruments the gmm estimator performs better than the cue in terms of rmse, but the cue has a smaller bias in the weak identification case. With sample size n = 50 the adaptive lasso based methods are also the best in rmse in the strong identification case. As the sample size increases, all the estimators converge to the true value. With σ²_zz = 1 I_q the relative performance remains the same, but the rmses and standard deviations are smaller.

In terms of coverage, alasso-gmm performs best among the specifications considered: in all setups and specifications it comes close to 95% coverage, whereas the other methods cannot replicate this behavior. In terms of bias, in the more relevant Setup 2 (Tables 6a-6b), the adaptive lasso based methods do very well, but so does the penalized cue in the first stage followed by cue in the second stage.

5 Conclusion

We have studied the relative performance of several moment selection techniques, both in selecting the correct moments and in estimating the structural parameter. Our simulations suggest that using the adaptive lasso in the first stage to obtain valid instruments, followed by gmm or moment averaging, delivers the most satisfactory rmse for the structural parameter in both the homoskedastic and heteroskedastic cases. This approach also has important computational benefits, since the estimation can be based on the lars algorithm, which makes it a good practical choice when the number of instruments grows large.

Table 5a: Monte Carlo results for θ̂. Setup 1 (part 1)
[table omitted: mean, standard deviation, bias, rmse and 95% coverage for the nine post-selection estimators, under weak and strong identification, σ²_zz = 0.5 I_q, n = 50, 100, 250]
Note: Setup 1 consists of a fixed number of moments and homoskedastic errors: there are 11 moments, 3 known to be valid and 8 unknown, among which 4 are valid and 4 are invalid. In the weak identification case the first three elements of π are 2 and the remaining eight are close to zero; in the strong identification case all elements of π equal 2. The estimators are as defined in the note to Table 2. 95%c is the coverage of the empirical 95% confidence intervals.

Table 5b: Monte Carlo results for θ̂. Setup 1 (part 2)
[table omitted: same layout as Table 5a, with σ²_zz = 1 I_q]
Note: see Table 5a; the estimators are as defined in the note to Table 2. 95%c is the coverage of the empirical 95% confidence intervals.

Table 6a: Monte Carlo results for θ̂. Setup 2 (part 1)
[table omitted: mean, standard deviation, bias, rmse and 95% coverage for the nine post-selection estimators, under weak and strong identification, σ²_zz = 0.5 I_q, n = 50, 100, 250]
Note: Setup 2 consists of an increasing number of moments and heteroskedastic errors: there are q = √n moments, s = √q known to be valid and q − s unknown; of these, (q − s)/2 are valid and (q − s)/2 are invalid. In the weak identification case the first s elements of π are 2 and the remaining q − s are close to zero; in the strong identification case all q elements of π equal 2. The estimators are as defined in the note to Table 2. 95%c is the coverage of the empirical 95% confidence intervals.

Table 6b: Monte Carlo results for θ̂. Setup 2 (part 2)
[table omitted: same layout as Table 6a, with σ²_zz = 1 I_q]
Note: see Table 6a; the estimators are as defined in the note to Table 2. 95%c is the coverage of the empirical 95% confidence intervals.


More information

Exogeneity tests and weak identification

Exogeneity tests and weak identification Cireq, Cirano, Départ. Sc. Economiques Université de Montréal Jean-Marie Dufour Cireq, Cirano, William Dow Professor of Economics Department of Economics Mcgill University June 20, 2008 Main Contributions

More information

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley Review of Classical Least Squares James L. Powell Department of Economics University of California, Berkeley The Classical Linear Model The object of least squares regression methods is to model and estimate

More information

A Robust Approach to Estimating Production Functions: Replication of the ACF procedure

A Robust Approach to Estimating Production Functions: Replication of the ACF procedure A Robust Approach to Estimating Production Functions: Replication of the ACF procedure Kyoo il Kim Michigan State University Yao Luo University of Toronto Yingjun Su IESR, Jinan University August 2018

More information

Spring 2017 Econ 574 Roger Koenker. Lecture 14 GEE-GMM

Spring 2017 Econ 574 Roger Koenker. Lecture 14 GEE-GMM University of Illinois Department of Economics Spring 2017 Econ 574 Roger Koenker Lecture 14 GEE-GMM Throughout the course we have emphasized methods of estimation and inference based on the principle

More information

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Econometrics Week 8 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 25 Recommended Reading For the today Instrumental Variables Estimation and Two Stage

More information

Data Mining Stat 588

Data Mining Stat 588 Data Mining Stat 588 Lecture 02: Linear Methods for Regression Department of Statistics & Biostatistics Rutgers University September 13 2011 Regression Problem Quantitative generic output variable Y. Generic

More information

Size Distortion and Modi cation of Classical Vuong Tests

Size Distortion and Modi cation of Classical Vuong Tests Size Distortion and Modi cation of Classical Vuong Tests Xiaoxia Shi University of Wisconsin at Madison March 2011 X. Shi (UW-Mdsn) H 0 : LR = 0 IUPUI 1 / 30 Vuong Test (Vuong, 1989) Data fx i g n i=1.

More information

Consistent high-dimensional Bayesian variable selection via penalized credible regions

Consistent high-dimensional Bayesian variable selection via penalized credible regions Consistent high-dimensional Bayesian variable selection via penalized credible regions Howard Bondell bondell@stat.ncsu.edu Joint work with Brian Reich Howard Bondell p. 1 Outline High-Dimensional Variable

More information

Single Equation Linear GMM

Single Equation Linear GMM Single Equation Linear GMM Eric Zivot Winter 2013 Single Equation Linear GMM Consider the linear regression model Engodeneity = z 0 δ 0 + =1 z = 1 vector of explanatory variables δ 0 = 1 vector of unknown

More information

1 Estimation of Persistent Dynamic Panel Data. Motivation

1 Estimation of Persistent Dynamic Panel Data. Motivation 1 Estimation of Persistent Dynamic Panel Data. Motivation Consider the following Dynamic Panel Data (DPD) model y it = y it 1 ρ + x it β + µ i + v it (1.1) with i = {1, 2,..., N} denoting the individual

More information

Testing for Regime Switching in Singaporean Business Cycles

Testing for Regime Switching in Singaporean Business Cycles Testing for Regime Switching in Singaporean Business Cycles Robert Breunig School of Economics Faculty of Economics and Commerce Australian National University and Alison Stegman Research School of Pacific

More information

Inference in Nonparametric Series Estimation with Data-Dependent Number of Series Terms

Inference in Nonparametric Series Estimation with Data-Dependent Number of Series Terms Inference in Nonparametric Series Estimation with Data-Dependent Number of Series Terms Byunghoon ang Department of Economics, University of Wisconsin-Madison First version December 9, 204; Revised November

More information

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data July 2012 Bangkok, Thailand Cosimo Beverelli (World Trade Organization) 1 Content a) Endogeneity b) Instrumental

More information

A Factor Analytical Method to Interactive Effects Dynamic Panel Models with or without Unit Root

A Factor Analytical Method to Interactive Effects Dynamic Panel Models with or without Unit Root A Factor Analytical Method to Interactive Effects Dynamic Panel Models with or without Unit Root Joakim Westerlund Deakin University Australia March 19, 2014 Westerlund (Deakin) Factor Analytical Method

More information

Linear Model Selection and Regularization

Linear Model Selection and Regularization Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In

More information

Sparse Linear Models (10/7/13)

Sparse Linear Models (10/7/13) STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine

More information

Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty

Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty Journal of Data Science 9(2011), 549-564 Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty Masaru Kanba and Kanta Naito Shimane University Abstract: This paper discusses the

More information

Cross-Validation with Confidence

Cross-Validation with Confidence Cross-Validation with Confidence Jing Lei Department of Statistics, Carnegie Mellon University UMN Statistics Seminar, Mar 30, 2017 Overview Parameter est. Model selection Point est. MLE, M-est.,... Cross-validation

More information

Generalized Method of Moment

Generalized Method of Moment Generalized Method of Moment CHUNG-MING KUAN Department of Finance & CRETA National Taiwan University June 16, 2010 C.-M. Kuan (Finance & CRETA, NTU Generalized Method of Moment June 16, 2010 1 / 32 Lecture

More information

ABSTRACT. POST, JUSTIN BLAISE. Methods to Improve Prediction Accuracy under Structural Constraints. (Under the direction of Howard Bondell.

ABSTRACT. POST, JUSTIN BLAISE. Methods to Improve Prediction Accuracy under Structural Constraints. (Under the direction of Howard Bondell. ABSTRACT POST, JUSTIN BLAISE. Methods to Improve Prediction Accuracy under Structural Constraints. (Under the direction of Howard Bondell.) Statisticians are often faced with the difficult task of model

More information

Least Squares Model Averaging. Bruce E. Hansen University of Wisconsin. January 2006 Revised: August 2006

Least Squares Model Averaging. Bruce E. Hansen University of Wisconsin. January 2006 Revised: August 2006 Least Squares Model Averaging Bruce E. Hansen University of Wisconsin January 2006 Revised: August 2006 Introduction This paper developes a model averaging estimator for linear regression. Model averaging

More information

1 Introduction. 2 AIC versus SBIC. Erik Swanson Cori Saviano Li Zha Final Project

1 Introduction. 2 AIC versus SBIC. Erik Swanson Cori Saviano Li Zha Final Project Erik Swanson Cori Saviano Li Zha Final Project 1 Introduction In analyzing time series data, we are posed with the question of how past events influences the current situation. In order to determine this,

More information

DSGE Methods. Estimation of DSGE models: GMM and Indirect Inference. Willi Mutschler, M.Sc.

DSGE Methods. Estimation of DSGE models: GMM and Indirect Inference. Willi Mutschler, M.Sc. DSGE Methods Estimation of DSGE models: GMM and Indirect Inference Willi Mutschler, M.Sc. Institute of Econometrics and Economic Statistics University of Münster willi.mutschler@wiwi.uni-muenster.de Summer

More information

GMM Estimation and Testing II

GMM Estimation and Testing II GMM Estimation and Testing II Whitney Newey October 2007 Hansen, Heaton, and Yaron (1996): In a Monte Carlo example of consumption CAPM, two-step optimal GMM with with many overidentifying restrictions

More information

Model comparison and selection

Model comparison and selection BS2 Statistical Inference, Lectures 9 and 10, Hilary Term 2008 March 2, 2008 Hypothesis testing Consider two alternative models M 1 = {f (x; θ), θ Θ 1 } and M 2 = {f (x; θ), θ Θ 2 } for a sample (X = x)

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Lecture 3. Hypothesis testing. Goodness of Fit. Model diagnostics GLM (Spring, 2018) Lecture 3 1 / 34 Models Let M(X r ) be a model with design matrix X r (with r columns) r n

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 6: Model complexity scores (v3) Ramesh Johari ramesh.johari@stanford.edu Fall 2015 1 / 34 Estimating prediction error 2 / 34 Estimating prediction error We saw how we can estimate

More information

GMM estimation of spatial panels

GMM estimation of spatial panels MRA Munich ersonal ReEc Archive GMM estimation of spatial panels Francesco Moscone and Elisa Tosetti Brunel University 7. April 009 Online at http://mpra.ub.uni-muenchen.de/637/ MRA aper No. 637, posted

More information

Panel Data Models. James L. Powell Department of Economics University of California, Berkeley

Panel Data Models. James L. Powell Department of Economics University of California, Berkeley Panel Data Models James L. Powell Department of Economics University of California, Berkeley Overview Like Zellner s seemingly unrelated regression models, the dependent and explanatory variables for panel

More information

Robust Two Step Confidence Sets, and the Trouble with the First Stage F Statistic

Robust Two Step Confidence Sets, and the Trouble with the First Stage F Statistic Robust Two Step Confidence Sets, and the Trouble with the First Stage F Statistic Isaiah Andrews Discussion by Bruce Hansen September 27, 2014 Discussion (Bruce Hansen) Robust Confidence Sets Sept 27,

More information

Cross-Validation with Confidence

Cross-Validation with Confidence Cross-Validation with Confidence Jing Lei Department of Statistics, Carnegie Mellon University WHOA-PSI Workshop, St Louis, 2017 Quotes from Day 1 and Day 2 Good model or pure model? Occam s razor We really

More information

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract Journal of Data Science,17(1). P. 145-160,2019 DOI:10.6339/JDS.201901_17(1).0007 WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION Wei Xiong *, Maozai Tian 2 1 School of Statistics, University of

More information

Single Equation Linear GMM with Serially Correlated Moment Conditions

Single Equation Linear GMM with Serially Correlated Moment Conditions Single Equation Linear GMM with Serially Correlated Moment Conditions Eric Zivot November 2, 2011 Univariate Time Series Let {y t } be an ergodic-stationary time series with E[y t ]=μ and var(y t )

More information

On GMM Estimation and Inference with Bootstrap Bias-Correction in Linear Panel Data Models

On GMM Estimation and Inference with Bootstrap Bias-Correction in Linear Panel Data Models On GMM Estimation and Inference with Bootstrap Bias-Correction in Linear Panel Data Models Takashi Yamagata y Department of Economics and Related Studies, University of York, Heslington, York, UK January

More information

Single Equation Linear GMM with Serially Correlated Moment Conditions

Single Equation Linear GMM with Serially Correlated Moment Conditions Single Equation Linear GMM with Serially Correlated Moment Conditions Eric Zivot October 28, 2009 Univariate Time Series Let {y t } be an ergodic-stationary time series with E[y t ]=μ and var(y t )

More information

The MNet Estimator. Patrick Breheny. Department of Biostatistics Department of Statistics University of Kentucky. August 2, 2010

The MNet Estimator. Patrick Breheny. Department of Biostatistics Department of Statistics University of Kentucky. August 2, 2010 Department of Biostatistics Department of Statistics University of Kentucky August 2, 2010 Joint work with Jian Huang, Shuangge Ma, and Cun-Hui Zhang Penalized regression methods Penalized methods have

More information

Applied Econometrics (MSc.) Lecture 3 Instrumental Variables

Applied Econometrics (MSc.) Lecture 3 Instrumental Variables Applied Econometrics (MSc.) Lecture 3 Instrumental Variables Estimation - Theory Department of Economics University of Gothenburg December 4, 2014 1/28 Why IV estimation? So far, in OLS, we assumed independence.

More information

GMM, HAC estimators, & Standard Errors for Business Cycle Statistics

GMM, HAC estimators, & Standard Errors for Business Cycle Statistics GMM, HAC estimators, & Standard Errors for Business Cycle Statistics Wouter J. Den Haan London School of Economics c Wouter J. Den Haan Overview Generic GMM problem Estimation Heteroskedastic and Autocorrelation

More information

arxiv: v3 [math.st] 23 May 2016

arxiv: v3 [math.st] 23 May 2016 Inference in partially identified models with many moment arxiv:1604.02309v3 [math.st] 23 May 2016 inequalities using Lasso Federico A. Bugni Mehmet Caner Department of Economics Department of Economics

More information

A Robust Test for Weak Instruments in Stata

A Robust Test for Weak Instruments in Stata A Robust Test for Weak Instruments in Stata José Luis Montiel Olea, Carolin Pflueger, and Su Wang 1 First draft: July 2013 This draft: November 2013 Abstract We introduce and describe a Stata routine ivrobust

More information

Topic 4 Unit Roots. Gerald P. Dwyer. February Clemson University

Topic 4 Unit Roots. Gerald P. Dwyer. February Clemson University Topic 4 Unit Roots Gerald P. Dwyer Clemson University February 2016 Outline 1 Unit Roots Introduction Trend and Difference Stationary Autocorrelations of Series That Have Deterministic or Stochastic Trends

More information

Dynamic panel data methods

Dynamic panel data methods Dynamic panel data methods for cross-section panels Franz Eigner University Vienna Prepared for UK Econometric Methods of Panel Data with Prof. Robert Kunst 27th May 2009 Structure 1 Preliminary considerations

More information

Chapter 11 GMM: General Formulas and Application

Chapter 11 GMM: General Formulas and Application Chapter 11 GMM: General Formulas and Application Main Content General GMM Formulas esting Moments Standard Errors of Anything by Delta Method Using GMM for Regressions Prespecified weighting Matrices and

More information

Or How to select variables Using Bayesian LASSO

Or How to select variables Using Bayesian LASSO Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO On Bayesian Variable Selection

More information

Gravity Models, PPML Estimation and the Bias of the Robust Standard Errors

Gravity Models, PPML Estimation and the Bias of the Robust Standard Errors Gravity Models, PPML Estimation and the Bias of the Robust Standard Errors Michael Pfaffermayr August 23, 2018 Abstract In gravity models with exporter and importer dummies the robust standard errors of

More information

Generalized Elastic Net Regression

Generalized Elastic Net Regression Abstract Generalized Elastic Net Regression Geoffroy MOURET Jean-Jules BRAULT Vahid PARTOVINIA This work presents a variation of the elastic net penalization method. We propose applying a combined l 1

More information

IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors

IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors Laura Mayoral, IAE, Barcelona GSE and University of Gothenburg U. of Gothenburg, May 2015 Roadmap Testing for deviations

More information

Iterative Selection Using Orthogonal Regression Techniques

Iterative Selection Using Orthogonal Regression Techniques Iterative Selection Using Orthogonal Regression Techniques Bradley Turnbull 1, Subhashis Ghosal 1 and Hao Helen Zhang 2 1 Department of Statistics, North Carolina State University, Raleigh, NC, USA 2 Department

More information

Missing dependent variables in panel data models

Missing dependent variables in panel data models Missing dependent variables in panel data models Jason Abrevaya Abstract This paper considers estimation of a fixed-effects model in which the dependent variable may be missing. For cross-sectional units

More information

Introduction to Statistical modeling: handout for Math 489/583

Introduction to Statistical modeling: handout for Math 489/583 Introduction to Statistical modeling: handout for Math 489/583 Statistical modeling occurs when we are trying to model some data using statistical tools. From the start, we recognize that no model is perfect

More information

Oracle Estimation of a Change Point in High Dimensional Quantile Regression

Oracle Estimation of a Change Point in High Dimensional Quantile Regression Oracle Estimation of a Change Point in High Dimensional Quantile Regression Sokbae Lee, Yuan Liao, Myung Hwan Seo, and Youngki Shin arxiv:1603.00235v2 [stat.me] 16 Dec 2016 15 November 2016 Abstract In

More information