Moment and IV Selection Approaches: A Comparative Simulation Study
Mehmet Caner, Esfandiar Maasoumi, Juan Andrés Riquelme

August 7, 2014

Abstract

We compare three moment selection approaches, followed by post-selection estimation strategies. The first is the adaptive lasso of Zou (2006), recently extended by Liao (2013) to possibly invalid moments in gmm; in this method, we select the valid instruments with the adaptive lasso. The second method is based on the J test, as in Andrews and Lu (2001). The third uses a continuous updating objective (cue) function. This last approach is based on Hong et al. (2003), who propose a penalized generalized empirical likelihood based function to select valid moments; they use empirical likelihood and exponential tilting in their simulations. However, the J-test based approach of Andrews and Lu (2001) generally provides better moment selection results than empirical likelihood and exponential tilting, as can be seen in Hong et al. (2003). In this article, we examine the penalized cue as a third way of selecting valid moments. Following a determination of valid moments, we run unpenalized gmm, cue, and the model averaging technique of Okui (2011) to see which has the best post-selection estimator performance for the structural parameters. The simulations are aimed at the following questions: which moment selection criterion can better select the valid moments and eliminate the invalid ones? Given the chosen instruments in the first stage, which strategy delivers the best finite sample performance? We find that the adaptive lasso in the model selection stage, coupled with either unpenalized gmm or the moment averaging of Okui, generally delivers the smallest rmse for the second stage coefficient estimators.

Keywords and phrases: Shrinkage, Monte Carlo, Averaging.

Affiliations: Mehmet Caner, North Carolina State University, Department of Economics, 4168 Nelson Hall, Raleigh, NC, mcaner@ncsu.edu. Esfandiar Maasoumi, Emory University, Department of Economics, Atlanta, GA, esfandiar.maasoumi@emory.edu. Juan Andrés Riquelme, North Carolina State University, Department of Economics, jariquel@ncsu.edu.
1 Introduction

It is not uncommon to encounter a large number of instruments or moment conditions in applications of instrumental variables (iv) or Generalized Method of Moments (gmm) estimators. Some ivs or moments may be invalid, but the researcher does not know a priori which ones. This problem may be adjudicated statistically with the J test, which indicates whether the overidentifying restrictions are valid. If the null is rejected, the researcher needs a moment selection technique that distinguishes between the valid and invalid moment conditions. A few techniques have been proposed, each with advantages (for example, consistency) and disadvantages (such as overwhelming computational demand). In this paper we focus on information-based methods and review three of the moment selection criteria (msc) used in the current literature: (i) the shrinkage procedure of Liao (2013); (ii) the information-based criteria with gmm in Andrews (1999); and (iii) the information-based criterion using generalized empirical likelihood of Hong et al. (2003). Using Monte Carlo simulations, we compare the performance of these methods in selecting valid moments in linear settings under several relevant scenarios: small and large sample sizes, fixed and increasing numbers of moment conditions, weak and strong identification, local-to-zero moment conditions, and homoskedastic and heteroskedastic errors. The contribution of our study is a fairly comprehensive comparison of these multistep approaches with each other. The choice of methods in this study was motivated by the following considerations: the adaptive lasso is heavily used in statistics and has computational advantages in large scale problems; the penalized methods in Andrews (1999) and Hong et al. (2003) are not computationally advantageous, but are used by econometricians due to the need to determine valid instruments. Further, these three methods have reasonably strong
theoretical underpinnings. We analyze second stage estimation performance, considering the finite sample properties of the structural parameter estimators. To this end, we employ Okui's (2011) model averaging technique to obtain better mean squared error and smaller bias for the structural parameters. We then compare Okui (2011) to unpenalized gmm and cue estimation, following selection of valid instruments in the first stage. We find that the adaptive lasso in the model selection stage, coupled with either unpenalized gmm or the moment averaging of Okui, generally delivers the smallest rmse for the second stage estimation.

There is a large and rich literature on moment selection techniques. Smith (1992) proposes a procedure to compare competing non-nested gmm estimations, allowing for heteroskedasticity and serial correlation. Again in the gmm context, Andrews (1999) proposes a moment selection procedure using information criteria based on a J statistic corrected for the number of moment conditions. This is analogous to the use of the Akaike (aic), Bayesian (bic) and Hannan-Quinn (hqic) information criteria in model selection. He shows that the proposed methods are consistent under suitable assumptions, and also formalizes the downward and upward testing procedures. Downward testing consists of iterative J tests starting from the largest set of moment conditions and proceeding down to fewer moment conditions at each iteration, until the null is not rejected; upward testing works in the opposite order. Andrews and Lu (2001) extend these methods to model selection in dynamic panel data structures. Hong et al. (2003) propose a similar approach using the generalized empirical likelihood defined by Newey and Smith (2000) instead of the J statistic. A relatively new class of moment selection methods is based on shrinkage
procedures. One advantage of shrinkage is its computational efficiency, which is consequential especially in high-dimensional contexts. In a brief comparison, Hastie et al. (2009, section 3.6) conclude that the shrinkage method performs better than alternative model selection techniques in reducing the estimation error. Liao (2013) shows that gmm shrinkage procedures have the oracle property in selecting the moment conditions, and that adding additional valid moments improves efficiency for strongly identified parameters. Cheng and Liao (2012) use a similar approach and propose a weighted tuning parameter that shrinks invalid and redundant moments. We chose the three moment selection criteria based on their optimality properties, such as the oracle property, and good finite sample performance. For model selection, assuming valid instruments, Belloni, Chernozhukov and Hansen (2011) utilize lasso-type estimators in the many-iv case and provide conditions under which the iv approach is asymptotically oracle-efficient. Caner (2009) and Caner and Zhang (2013) also use shrinkage methods for model selection in a gmm context. Canay (2010) proposes the use of trapezoidal kernel weights to shrink the first stage estimators. Kuersteiner and Okui (2010) point out that, despite the advantages of kernel shrinkage estimation, such estimators cannot completely remove the estimation bias and are inflexible once a particular kernel is chosen. They also propose a moment average estimator using the method in Hansen (2007) to construct optimal instruments. Okui (2011) develops a shrinkage method that minimizes the asymptotic mean squared error. An important concern in gmm estimation is the presence of weak ivs (Hausman et al. (2005); Andrews and Stock (2007)). The results in Cheng and Liao (2012) suggest that shrinkage estimation is robust in discarding invalid ivs, but tends to include redundant ivs when identification is weak.
The rest of the paper is organized as follows: in Section 2 we review the msc approaches under comparison; in Section 3 we present the details of our Monte Carlo simulation setups; in Section 4 we present the main results of our simulation exercises; Section 5 concludes. Standard notation is used for the projection operator P_A = A(A'A)^{-1}A', where A is a matrix.

2 Theoretical Framework

2.1 Moment Selection Methods

Consider a sequence of random variables {Z_i}_{i=1}^n drawn from an unknown probability distribution. The moment selection problem consists of selecting the r valid moments from a set of q candidates. A minimum set of s >= p valid moment conditions is required in order to identify the structural parameter vector θ, where p = dim(θ). The set of q candidate moments can be separated into two subsets, and the model is, for i = 1, ..., n,

    E[g_S(Z_i, θ_0)] = 0,        S = {1, ..., s}            (1)
    E[g_{S^c}(Z_i, θ_0)] =? 0,   S^c = {s + 1, ..., q}      (2)

where the sign "=?" means that the relationship may not hold for some of the indexes in S^c. We have r = s + s_v, where s is the number of moments in (1) (i.e., those deemed to be valid) and s_v is the number of valid moments among the q - s moments in (2); S^c represents the moments that may or may not be valid. Thus 0 <= s_v <= q - s. Our framework assumes that the researcher knows a priori that s instruments are valid, and that they can identify the p parameters. The
question is whether to include the rest of the instruments for efficiency considerations. This is the framework used recently by Liao (2013). Note that θ_0 represents the true structural parameter vector of dimension p. The standard gmm estimator of θ_0, denoted by θ̂_n, is

    θ̂_n = argmin_{θ ∈ Θ} J(θ, W_n),

where W_n is a q x q symmetric and positive definite weight matrix and the objective function (Hansen, 1982) is defined as

    J(θ, W_n) = n g_n(θ)' W_n g_n(θ),    (3)

with g_n(θ) = n^{-1} Σ_{i=1}^n g(Z_i, θ), and Θ a compact subset of R^p. For ease of notation let g(Z_i, θ) = g_i(θ). Throughout the paper we consider the following linear model because of its computational advantages, and conduct the comparative examination in this widely used setup:

    y = Y θ_0 + ε    (4)
    Y = Z π_0 + u    (5)

where y is an n x 1 vector, Y is an n x p matrix of endogenous variables, Z is an n x q matrix of instruments, and ε and u are unobserved random errors with constant second moments and correlated with each other. We do not deal with control variables in the simulations; this makes no difference in a simulation setup. We make the following departure from the standard gmm in (3). The set of instruments is divided into valid
7 ones,z i1 (s 1) and the set that we suspect may contain invalid instruments Z i2 (q s 1). The sample moment conditions are defined by: g n (θ, β) = 1 n n g i (θ, β), i=1 where g i (θ, β) = (g i1 (θ), g i2 (θ, β) ) with g i1 (θ) = Z i1 (y i Y i θ), g i2 (θ, β) = Z i2 (y i Y i θ) β. The weight matrix for our nonstandard case is calculated as: W n = 1 n n g i ( θ, β)g i ( θ, β), i=1 where θ, β are the first step GMM estimators with I q as the weight matrix. The first method we discuss is the adaptive gmm shrinkage estimation method (Liao, 2013). This method has the advantage of selecting the valid moments and estimate θ in a single step. It consists of adding a slackness parameter vector β 0 to the moment conditions in (2). So the model is: E g i1(θ 0 ) g i2 (θ 0, β 0 ) = 0. and the validity of the moment conditions is verified by inference on whether β 0 = 0 or not. A moment condition j is valid only if β 0j = 0, for j = 1, q s. 6
The adaptive lasso estimators are defined as

    (θ̂_n^alasso, β̂_n^alasso) = argmin_{(θ,β) ∈ Θ_B} [ g_n(θ, β)' W_n g_n(θ, β) + λ_n Σ_{j=1}^{q-s} ω̂_j |β_j| ]    (6)

where Θ_B is the parameter space for (θ, β), the ω̂_j are data-dependent weights with ω̂_j = 1/|β̃_j|, and β̃_j is the unpenalized standard gmm estimator using all q moments. The adaptive lasso (alasso) estimator penalizes the slackness parameter by its l_1 norm. This penalty is usually preferred because it has the oracle property (β̂_j is shrunk to exactly zero for the valid moments) and because the problem can be solved with the lars algorithm (Efron et al., 2004), which represents a great computational advantage. Liao (2013) also considers alternatives to the adaptive lasso, such as the bridge and smoothly clipped absolute deviation penalties, but we focus only on the adaptive lasso estimator because its penalty is convex and easy to estimate compared with the others. The degree of shrinkage is governed by the tuning parameter λ_n >= 0: large values shrink more, and λ_n = 0 corresponds to the gmm solution. λ_n is chosen to differentiate between valid and invalid moments.

The second msc that we analyze is by Andrews (1999), extended in Andrews and Lu (2001). It consists of a penalization of the J statistic (Hansen, 1982) in equation (3). Following the notation of Andrews (1999), let c ∈ {0, 1}^{q-s} denote a moment selection vector of zeros and ones such that, if the jth moment condition is selected, the jth element of c is one. Let |c| = Σ_{j=1}^{q-s} c_j denote the number of moments selected by c, and let Z_{ic} be the vector Z_i from which the jth element is deleted if the corresponding jth element of c is zero. The corresponding weight matrix W_n^c is of dimension (s + |c|) x (s + |c|). The
msc estimator objective function has the following general form:

    MSC_n(c) = J_c(θ, W_n^c) - h(|c|) κ_n,    (7)

where J_c(θ, W_n^c) = n g_n(θ)' W_n^c g_n(θ) uses the s + |c| moments in the gmm objective function; g_n(θ) is defined immediately below equation (3). In (7) we have W_n^c = [ n^{-1} Σ_{i=1}^n Z_{ic} Z_{ic}' ε̃_i² ]^{-1}, where ε̃_i = y_i - Y_i'θ̃ and θ̃ is estimated through inefficient gmm, using Z_{ic}, with the identity matrix as the weight. The algorithm works as follows. For each instrument combination, we calculate the first step inefficient gmm with the identity weight matrix; given the inefficient gmm estimates, we set up the new weight matrix as described above and obtain the parameter estimates for the second stage efficient gmm. We then form (7) for each instrument combination and pick the combination that minimizes (7); the corresponding efficient gmm estimates are the ones used. To be specific, say we have two potentially valid instruments, Z1 and Z2. The possible combinations are Z1 only, Z2 only, and Z1 and Z2 together. First, for Z1 only, we obtain inefficient gmm estimates and use them to construct the second stage weight matrix, which yields the efficient gmm estimates for Z1. We repeat the same steps for Z2, and then for Z1 and Z2 together. We now have three sets of efficient gmm estimates, and we choose the one that minimizes (7). The choices of the function h(·) and {κ_n}_{n>=1} lead to different msc. Andrews (1999) uses h(|c|) = |c| - p and three different choices of κ_n that lead to three moment selection criteria (aic, bic, Hannan-Quinn):

    gmm-bic:  MSC_bic,n(c)  = J_c(θ, W_n^c) - (|c| - p) ln n
    gmm-aic:  MSC_aic,n(c)  = J_c(θ, W_n^c) - 2 (|c| - p)
    gmm-hqic: MSC_hqic,n(c) = J_c(θ, W_n^c) - 2.1 (|c| - p) ln ln n
where the value 2.1 in gmm-hqic is chosen in light of the results in Andrews (1997). For consistency among the methods we analyze the gmm-bic method in this paper; a bic-based penalty gives selection consistency in both the adaptive lasso and Andrews and Lu (2001). The results for the aic and hqic cases are available on request.

The third method is by Hong et al. (2003). Their method is analogous to Andrews and Lu (2001), but the J function is estimated using generalized empirical likelihood and exponential tilting statistics. However, we only use the cue-based objective function, as described in the introduction, due to the poor performance of empirical likelihood and exponential tilting shown in Hong et al. (2003). The objective function is the same as (7), but the weight matrix is updated continuously together with the parameters until convergence. In this third method, the weight matrix is W_{n,cue} = [ n^{-1} Σ_{i=1}^n Z_{ic} Z_{ic}' ε_i(θ)² ]^{-1}, where ε_i(θ) = y_i - Y_i'θ.

2.2 Parameter Estimation

There are three methods that we examine in the second stage of parameter estimation for θ. The first two are unpenalized gmm and unpenalized cue; given valid instruments, these two methods deliver estimates of the structural parameters. An alternative approach for parameter estimation after the moment selection has been done is the method proposed by Okui (2011): the shrinkage two stage least squares (stsls) estimator. This method shrinks toward zero some of the weights on the sample moment conditions and requires a minimum set of moments known to be valid before estimation. The stsls estimation is as follows:
for a shrinkage parameter m we define P_m = P_{Z_I} + m P_{Z_II} and the stsls estimator as

    θ̂_{n,s}^{stsls} = (Y' P_m Y)^{-1} Y' P_m y,

where Z_I represents the s valid moments from the first set, and Z_II represents the moments from the second set of q - s moments/instruments that are selected as valid by an information criterion such as alasso or gmm. The shrinkage parameter m is chosen to minimize a Nagar (1959)-type approximation to the mean squared error. When there is only one endogenous variable, as in our simulation setup, the estimate of the optimal shrinkage parameter is

    m̂ = (σ̂_ε² Ŷ'P_{Z_I}Ŷ / n) / (σ̂_εu² r²/n + σ̂_ε² Ŷ'P_{Z_I}Ŷ / n),

where σ̂_ε² and σ̂_εu are the estimates of σ_ε² = E(ε_i²) and σ_εu = E(ε_i u_i) obtained from a preliminary estimation as described in Okui (2011), and Ŷ is the prediction of Y based on a least squares regression on the selected instruments. Okui (2011) works only under homoskedasticity, and m̂ is valid only under homoskedasticity.

3 Monte Carlo Simulations

The purpose of the Monte Carlo simulations is to compare the previously described msc approaches in two respects: first, effectiveness in selecting the correct moment conditions; second, performance of the post-selection estimators. We use the data generating process in equations (4) and (5). We have only one endogenous variable
and set the true θ_0 = 0.5. We employ (Z, ε, u) ~ N(0, Σ), where

    Σ = [ σ_zz² I_q   σ_Zε    0_q
          σ_Zε'       σ_ε²    σ_εu
          0_q'        σ_εu    σ_u² ]

is a (q + 2) x (q + 2) symmetric matrix; σ_zz² is the variance of the instruments, I_q is an identity matrix of order q, σ_Zε is a q x 1 vector of covariances between the instruments and the structural error, 0_q is a q x 1 vector of zeros, and σ_εu, σ_ε² and σ_u² are scalars. We impose a heteroskedastic error structure of the form ε_i* = ε_i ||Z_i||, with ||Z_i|| = (Z_{i1}² + ... + Z_{iq}²)^{1/2}. A moment is valid if E[g(Z_i, θ_0)] = E[Z_i(y_i - Y_i'θ_0)] = E[Z_i ε_i] = σ_Zε = 0. We generate invalid moments by constructing σ_Zε vectors in two ways: (1) a constant correlation D ≠ 0 between the instrument and the structural error, and (2) local-to-zero correlations of the form 1/n, 1/n^{1/2} and 1/n^{1/3} to explore different convergence rates. The homoskedastic case sets ε_i* = ε_i. In all setups we have q total moments, s of them known a priori to the researcher to be valid. However, there is a total of r = s + s_v valid moments, so we have to select the s_v valid moments among the q - s candidates. The number of valid and invalid moment conditions is generated in two ways. In the first setup we simulate data with a fixed number of moments: q = 11, s = 3 and r = 7. That is, there are 11 moments and we know that 3 of them are valid; we have to select among the other 8, of which we set 4 as valid and 4 as invalid. The errors in this setup are homoskedastic. In the second setup we allow the number of valid moments to increase with the
sample size: q = n, s = q and s_v = (q - s)/2; that is, we have to choose among q - s candidates and we set half of them as valid. The errors in this setup are heteroskedastic. These are denoted Setup 1 and Setup 2, respectively. In our Setup 1, Σ is constructed as follows. We simulate Z ∈ R^11 divided into three categories: the first set of instruments is known to be strong and valid (s = 3), as required by the mscs described in the previous section. As mentioned before, the next set of instruments is divided into two categories: the first four instruments are valid (s_v = 4) and the last q - r = 4 are invalid. The last elements of Σ are

    σ_Zε = (0, 0, 0, 0, 0, 0, 0, D, D, D, D)'

in the constant correlation case and

    σ_Zε = (0, 0, 0, 0, 0, 0, 0, h/n, h/n^{1/2}, h/n^{1/3}, h/n)'

in the local-to-zero scenario. Note that we use three rates for the local-to-zero moments, which are recycled as needed. We set σ_ε² = 1, σ_u² = 0.5, D = 0.2 and h = 1. For each correlation structure we investigate weak and strong identification scenarios by changing π_0 in equation (5). In the strong identification scenario we have π_0 = 2·1_11, and in the weak identification case we have π_0 = (2·1_3, ·1_8), with 1_l being a row vector of ones of length l. The second setup is constructed in an analogous manner. We set the variance of the instruments σ_zz² I_q with σ_zz² ∈ {0.5, 1.0}, and the covariance between the structural and reduced form errors σ_εu = 0.5. This gives us two cases: in Case 1, σ_zz² = 0.5 and cov_ue = 0.5; in Case 2, σ_zz² = 1.0 and cov_ue = 0.5. We have estimated many other cases for the covariance matrix: in Case 3, σ_zz² = 1 and cov_ue = 0.5; in Case 4, σ_zz² = 1 and cov_ue = 0.9; in Case 5, σ_zz² = 2 and cov_ue = 0.5; and in Case 6, σ_zz² = 2 and cov_ue = 0.9. These cases and the local-to-zero ones are available on request.
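The Setup 1 design can be sketched numerically as follows. The parameter values are taken from the text above; the variable names and the draw size are our own illustrative choices:

```python
import numpy as np

# Setup 1 covariance: (Z, eps, u) ~ N(0, Sigma) with q = 11,
# sigma_zz^2 = 0.5, sigma_eps^2 = 1, sigma_u^2 = 0.5, sigma_epsu = 0.5,
# and sigma_Zeps = (0,...,0, D, D, D, D)' making the last 4 moments invalid.
q, D = 11, 0.2
sigma_Zeps = np.concatenate([np.zeros(7), np.full(4, D)])

Sigma = np.zeros((q + 2, q + 2))
Sigma[:q, :q] = 0.5 * np.eye(q)                 # sigma_zz^2 * I_q
Sigma[:q, q] = Sigma[q, :q] = sigma_Zeps        # cov(Z, eps): invalidity
Sigma[q, q] = 1.0                               # sigma_eps^2
Sigma[q, q + 1] = Sigma[q + 1, q] = 0.5         # sigma_epsu (endogeneity)
Sigma[q + 1, q + 1] = 0.5                       # sigma_u^2

rng = np.random.default_rng(0)
draw = rng.multivariate_normal(np.zeros(q + 2), Sigma, size=250)
Z, eps, u = draw[:, :q], draw[:, q], draw[:, q + 1]

# Heteroskedastic variant (Setup 2 errors): eps_i* = eps_i * ||Z_i||.
eps_het = eps * np.linalg.norm(Z, axis=1)
```

Given these draws, Y and y follow from equations (4) and (5) for a chosen π_0 and θ_0 = 0.5.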
The simulated sample sizes are n ∈ {50, 100, 250}. All the results in the next section are based on 1000 repetitions.

4 Results

We focus only on the most relevant and salient results of our simulation exercises: Cases 1 and 2 for Setups 1 and 2, using invalid moments with constant correlation with the structural error. We do not present all our simulated scenarios, for economy of space and because the general results presented here hold across all the alternative setups.¹ We focus on the weak and strong identification cases with σ_zz² = 0.5 I_q and σ_zz² = 1 I_q. The analysis of the results is done with reference to two questions: how good are the msc selection procedures, and which technique gives the best estimation of the structural parameter θ_0? The R² of the first stage regression is presented in Table 7; it varies with the strength of the identification and the number of observations. The moment selection methods are the adaptive lasso (alasso), penalized gmm (gmm_pen) and penalized cue (cue_pen). We have nine post-selection structural parameter estimators. alasso ma is the estimator obtained by selecting the moments using the adaptive lasso method in the first stage and then using Okui's (2011) moment averaging estimator in the second stage. Setup 1 is the homoskedastic setup, and we use the optimal m in Okui's (2011) ma method, which works only under homoskedasticity. Setup 2 allows heteroskedasticity, for which Okui's (2011) method is not designed; the m in Section 2.2 is derived under homoskedasticity only, but we still use it in the Setup 2 simulations to see how the method fares

¹ We have extensive results for all the moment selection techniques discussed in Section 2, for fixed correlation and local-to-zero correlation between the instruments and the structural error, available on request.
under heteroskedasticity. alasso gmm and alasso cue are the estimators that use the adaptive lasso to select the valid moments in the first stage and then use them in unpenalized efficient gmm and cue, respectively. For the efficient gmm we have three estimators: gmm is the gmm estimator using the full set of moments; gmm_pen ma uses the penalized gmm estimator of Andrews and Lu (2001) for model selection and then Okui's moment averaging estimator in the second stage; gmm_pen gmm selects the moments in the same way but then estimates the structural parameter using efficient gmm. In the same way, cue denotes the cue estimator using the full set of moments, cue_pen ma is the estimator obtained by selecting the moments using the penalized cue criterion and using these moments in the moment averaging estimator, and cue_pen cue selects the moments using penalized cue and estimates θ_0 using the cue estimator.

A summary of our results is presented in Tables 1 and 2, for model selection and post-selection estimation performance respectively. In Table 1 we present the average ranking of each method on the probability of selecting the exact valid moments in Tables 3 and 4, for each sample size and strength of identification. In case of a tie the methods get the same ranking (and we can have two first or two second places). From these tables we can see that the adaptive lasso method is the best in perfect moment selection. In Table 2 we present the performance of the post-selection estimation methods, assessed by the rmse. The rankings are based on the relative performance in Tables 5a to 6b, presented by sample size and strength of identification. The estimator with the smallest value acquires the rank of 1; two estimators with the same rmse are given the same rank. The Average Ranking ranges from 1 to 9
Table 1: Summary of the Performance of the Moment Selection Techniques

                 Setup 1    Setup 2
    alasso
    gmm_pen
    cue_pen

    Figures correspond to the average ranking of each method based on the probability of selecting the exact valid moments. The latter are in Tables 3 and 4, by sample size and strength of identification. In case of a tie the methods get the same ranking (we can have two first or two second places). alasso, gmm_pen and cue_pen stand for adaptive lasso, penalized gmm and penalized cue, respectively.

and the frequency of being in the Top Three ranges from 0 to 12. From Table 2 we conclude that the best estimator is obtained by using the adaptive lasso method to select the moments, followed by the moment averaging procedure (alasso ma). The moment averaging procedure improves estimation for all three moment selection techniques. The worst estimators are obtained with the cue method. In the heteroskedastic setup (Setup 2) adaptive lasso-ma (moment averaging in the second stage) is still the best in terms of rmse, but not as good as in the homoskedastic case (Setup 1). In the next sections we present the detailed analysis of the moment selection and post-selection estimation methods.

4.1 Model Selection

We analyze three msc methods: the adaptive lasso in equation (6), the penalized efficient gmm, and the penalized continuously updated gmm in equation (7). In all cases we adopt the bic criterion. For each method we measure its performance by the probability of three events: (1) the method selects the true number of valid moments and none of the invalid ones (perfect selection); (2) it selects only valid moments, but strictly fewer or more than the true number of valid ones, and does not select any invalid moment at the same time; and (3),
Table 2: Summary of the Performance of the Post Selection Techniques

                         Setup 1                    Setup 2
                   Average   Times at the     Average   Times at the
                   Ranking   top three        Ranking   top three
    alasso ma
    alasso gmm
    alasso cue
    gmm
    gmm_pen ma
    gmm_pen gmm
    cue
    cue_pen ma
    cue_pen cue

    The performance is analyzed in terms of the rmse. The rankings are based on the relative performance in Tables 5a to 6b. The estimator with the smallest value takes the rank of 1; if there is a tie, the estimators are given the same rank. The average ranking ranges from 1 to 9 and the times at the top three from 0 to 12. alasso ma is the estimator obtained by selecting the moments using the adaptive lasso method in the first stage and then using Okui's moment averaging estimator in the second stage. alasso gmm and alasso cue are the estimators that use the adaptive lasso to select the valid moments in the first stage and then use them in unpenalized efficient gmm and cue, respectively. For the efficient gmm we have three estimators: gmm is the gmm estimator using the full set of moments; gmm_pen ma uses the penalized gmm estimator of Andrews and Lu (2001) for model selection and then Okui's moment averaging estimator in the second stage; gmm_pen gmm selects the moments in the same way but then estimates the structural parameter using efficient gmm. In the same way, cue denotes the cue estimator using the full set of moments, cue_pen ma is the estimator obtained by selecting the moments using the penalized cue criterion and using these moments in the moment averaging estimator, and cue_pen cue selects the moments using penalized cue and estimates θ_0 using the cue estimator.
it selects at least one invalid moment. The first probability is the probability of being perfect. The second outcome is second best: we do not choose the correct number of valid moments, but we still choose only valid ones, and one can benefit from this in the second stage of structural parameter estimation in gmm. The third probability shows how badly a moment selection criterion can behave; since invalid moments can badly affect the finite sample bias of the second stage structural parameter estimates, we would prefer this probability to be low. The results are presented in Tables 3 and 4 for, respectively, the first setup (fixed number of moments, homoskedastic case) and the second setup (increasing number of moments, heteroskedastic case), for the weak and strong identification cases and instrument variances σ_zz² = 0.5 I_q and σ_zz² = I_q. In Table 3, Setup 1, for σ_zz² = 0.5 I_q, we find that in the smallest sample, n = 50, all three msc approaches behave poorly, particularly gmm_pen, which selects invalid moments with high probability in the weak identification case and with probability 1 in the strong identification case. The best method is the alasso, which selects invalid moments with the lowest probability in both the weak and strong identification cases. The performance of the three methods improves when the sample size increases to n = 100, but their relative positions remain the same: alasso dominates the penalized methods, selecting invalid moments with much smaller probability than the penalized methods in both identification cases. The performance ranking changes when the sample size is increased to n = 250: the alasso still selects invalid moments with the smallest probability, but the penalized cue method now selects perfectly with higher probability in the strong and weak identification cases, compared with
0.355 for the alasso in the strong identification case. However, if the objective is to avoid selecting any invalid moments, then the alasso still dominates, selecting invalid moments with the smallest probability. Since the alasso and the penalized methods are all selection consistent, we can take this as evidence of differences in convergence rates, with the penalized methods converging faster in this case; the differences in the performance of the two penalized methods are negligible. This is not true for the next case. In Table 3, Setup 1, when σ_zz² = 1 I_q, the relative performance of the methods is the same as in the previous case, but with the alasso dominating in all cases and criteria. However, the penalized methods catch up with the performance of the alasso as the sample size increases, with cue_pen slightly dominating its counterpart gmm_pen in all cases. Also, in all cases the methods behave poorly compared with the setup with σ_zz² = 0.5 I_q. The conclusions for Setup 2 are the same as those for Setup 1. It is noteworthy that the alasso moves smoothly between the three performance measures (perfect, only-valid and invalid selection), whereas the penalized methods jump from selecting invalid instruments to perfect selection as the sample size increases, with undesirably small probabilities of selecting only valid moments (but not perfectly) under all our setups.

4.2 Post Selection Performance

In this section we analyze the post-selection performance of the msc in terms of the bias, standard deviation and rmse of the estimate θ̂. For each of the three methods we estimate the structural parameter using efficient and continuously updated gmm, and the moment averaging method in Okui (2011). The results for Setups 1 and
Table 3: Probabilities: Moment Selection Criteria. Setup 1

[Numeric entries were not preserved in this transcription. Rows: alasso, gmm pen, cue pen, for n = 50, 100, 250; columns: Perfect selection, Only Valid, Any Invalid, under weak and strong identification; panels: σ²_zz = 0.5 I_q and σ²_zz = I_q.]

Note: alasso, gmm pen and cue pen stand for adaptive lasso, penalized gmm and penalized cue, respectively. The reported numbers are the probabilities that (i) Perfect selection: the method selects exactly the valid moments; (ii) Only Valid: the method does not choose any invalid moment; and (iii) Any Invalid: the method selects at least one invalid moment. Setup 1 consists of a fixed number of moments and homoskedastic errors. There are 11 moments, 3 known to be valid and 8 of unknown validity; among these 8, 4 are valid and 4 are invalid. π = (2·1_3, …) in the weak identification case and π = … in the strong identification case (remaining values of π not preserved).
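As an illustration of how the three probabilities reported in Tables 3 and 4 can be computed, here is a minimal sketch over hypothetical Monte Carlo selection outcomes. The valid/invalid index sets and the toy selector below are assumptions for illustration, not the paper's design:

```python
import numpy as np

rng = np.random.default_rng(0)
valid = {0, 1, 2, 3}    # indices of the truly valid moments (assumed)
invalid = {4, 5, 6, 7}  # truly invalid moments (assumed)

# Hypothetical output of a moment selection criterion over R Monte Carlo
# replications: each replication yields the set of selected moment indices.
R = 1000
selections = []
for _ in range(R):
    # toy selector: keeps each valid moment w.p. 0.9 and each invalid w.p. 0.2
    s = {j for j in valid if rng.random() < 0.9}
    s |= {j for j in invalid if rng.random() < 0.2}
    selections.append(s)

# The three events partition the replications, so the probabilities sum to 1.
perfect = np.mean([s == valid for s in selections])            # exactly the valid set
only_valid = np.mean([s < valid for s in selections])          # no invalid, but not all valid
any_invalid = np.mean([len(s & invalid) > 0 for s in selections])

print(f"perfect={perfect:.3f}  only_valid={only_valid:.3f}  any_invalid={any_invalid:.3f}")
```

With a real criterion, `selections` would come from rerunning the selection procedure on each simulated dataset; only the event definitions matter here.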
Table 4: Probabilities: Moment Selection Criteria. Setup 2

[Numeric entries were not preserved in this transcription. Rows: alasso, gmm pen, cue pen, for n = 50, 100, 250; columns: Perfect selection, Only Valid, Any Invalid, under weak and strong identification; panels: σ²_zz = 0.5 I_q and σ²_zz = I_q.]

Note: alasso, gmm pen and cue pen stand for adaptive lasso, penalized gmm and penalized cue, respectively. The reported numbers are the probabilities that (i) Perfect selection: the method selects exactly the valid moments; (ii) Only Valid: the method does not choose any invalid moment; and (iii) Any Invalid: the method selects at least one invalid moment. Setup 2 consists of an increasing number of moments and heteroskedastic errors. There are q = n moments, s known to be valid and q − s of unknown validity; of these, (q − s)/2 are valid and (q − s)/2 are invalid. π = (2·1_s, …) in the weak identification case and π = 2·1_q in the strong identification case.
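The J-test-based selection of Andrews and Lu (2001) that underlies gmm pen can be sketched as follows: for each candidate instrument set, compute the two-step gmm J statistic and subtract a BIC-type bonus for the number of overidentifying restrictions, then pick the set minimizing the criterion. This is a simplified single-regressor sketch on toy data, not the paper's exact design:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n, theta0 = 500, 1.0

# Toy design: instruments z0, z1 are valid; z2 is invalid (enters the error).
z = rng.standard_normal((n, 3))
x = z[:, 0] + z[:, 1] + rng.standard_normal(n)   # endogenous regressor
u = 0.5 * z[:, 2] + rng.standard_normal(n)       # z2 correlated with the error
y = x * theta0 + u

def gmm_j(cols):
    """Two-step GMM J statistic for the single-regressor model with z[:, cols]."""
    Z = z[:, list(cols)]
    A = Z.T @ x / n
    b = Z.T @ y / n
    th1 = (A @ b) / (A @ A)                      # first step: identity weight
    Ze = Z * (y - x * th1)[:, None]
    W = np.linalg.inv(Ze.T @ Ze / n)             # efficient weight matrix
    th2 = (A @ W @ b) / (A @ W @ A)              # second step
    g = Z.T @ (y - x * th2) / n
    return n * g @ W @ g

# BIC-type criterion: J - (#moments - #parameters) * ln(n), 1 parameter here.
crit = {c: gmm_j(c) - (len(c) - 1) * np.log(n)
        for k in (2, 3) for c in combinations(range(3), k)}
chosen = min(crit, key=crit.get)
print("selected instrument set:", chosen)
```

The invalid instrument inflates J far beyond the ln(n) bonus, so the criterion should settle on the valid pair (0, 1) in this design.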
2 are presented in Tables 5a, 5b and 6a, 6b, respectively.

In Table 6a, Setup 2, with weak identification and σ²_zz = 0.5 I_q we find that with sample size n = 50 the best estimator is alasso ma, with the smallest rmse, and the worst is cue, with the largest rmse. Note that when using the full set of instruments the gmm estimator performs better than cue in terms of rmse, but cue has a smaller bias in the weak identification case. With sample size n = 50 the adaptive lasso based methods are also the best in rmse in the strong identification case. As the sample size increases, all the estimators converge to the true value. With σ²_zz = I_q the relative performance remains the same, but the rmse and the standard deviations are smaller.

In terms of coverage, alasso gmm performs the best among the specifications considered: in all setups and specifications alasso gmm comes close to 95% coverage, whereas the other methods do not replicate this behavior. In terms of bias, in the more relevant Setup 2 (Tables 6a-b), the adaptive lasso based methods do very well, but so does penalized cue in the first stage followed by cue in the second stage.

5 Conclusion

We have studied the relative performance of several moment selection techniques, both in selecting the correct moments and in estimating the structural parameter. Our simulations suggest that using adaptive lasso in the first stage to obtain valid instruments, followed by gmm or moment averaging, delivers the most satisfactory rmse for the structural parameter in both the homoskedastic and heteroskedastic cases. This approach also has important computational benefits, since estimation can be based on the lars algorithm, which makes it a good practical choice when the number of instruments grows large.
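The continuously updated gmm objective used throughout is Q_n(θ) = n g_n(θ)' Ω_n(θ)^{-1} g_n(θ), with the weight matrix re-evaluated at every θ. A minimal numerical sketch on toy data, using a grid search over a scalar parameter rather than a full optimizer (the design is illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(2)
n, theta0 = 400, 1.0
z = rng.standard_normal((n, 4))                      # 4 valid instruments
x = z @ np.array([1.0, 1.0, 0.5, 0.5]) + rng.standard_normal(n)
y = x * theta0 + rng.standard_normal(n)

def cue_obj(theta):
    u = y - x * theta
    g = z.T @ u / n                                  # sample moment vector
    S = (z * u[:, None]).T @ (z * u[:, None]) / n    # continuously updated weight
    return n * g @ np.linalg.solve(S, g)

# crude grid search; adequate for a scalar parameter in a sketch
grid = np.linspace(0.0, 2.0, 401)
theta_hat = grid[np.argmin([cue_obj(t) for t in grid])]
print("CUE estimate:", round(theta_hat, 3))
```

Unlike two-step gmm, no preliminary estimate is needed: the weight matrix S is recomputed inside the objective at each trial value of θ.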
Table 5a: Monte Carlo results for θ̂. Setup 1. (Part 1)

[Numeric entries were not preserved in this transcription. Rows: alasso ma, alasso gmm, alasso cue, gmm, gmm pen ma, gmm pen gmm, cue, cue pen ma, cue pen cue, for n = 50, 100, 250 with σ²_zz = 0.5 I_q; columns: Mean, Sd, Bias, rmse, 95%c, under weak and strong identification.]

Note: Setup 1 consists of a fixed number of moments and homoskedastic errors. There are 11 moments, 3 known to be valid and 8 of unknown validity; among these 8, 4 are valid and 4 are invalid. π = (2·1_3, …) in the weak identification case and π = … in the strong identification case (remaining values of π not preserved). alasso ma is the estimator obtained by selecting the moments with the adaptive lasso in the first stage and then applying Okui's moment averaging estimator in the second stage. alasso gmm and alasso cue use the adaptive lasso to select the valid moments in the first stage and then use them in unpenalized efficient gmm and cue, respectively. For efficient gmm there are three estimators: gmm is the gmm estimator using the full set of moments; gmm pen ma uses the penalized gmm criterion of Andrews and Lu (2001) for moment selection and then Okui's moment averaging estimator in the second stage; gmm pen gmm selects the moments in the same way but then estimates the structural parameter by efficient gmm. Analogously, cue denotes the cue estimator using the full set of moments, cue pen ma selects the moments using the penalized cue criterion and uses them in the moment averaging estimator, and cue pen cue selects the moments using penalized cue and estimates θ_0 with the cue estimator. 95%c is the coverage of the empirical 95% confidence intervals.
Table 5b: Monte Carlo results for θ̂. Setup 1. (Part 2)

[Numeric entries were not preserved in this transcription. Rows: alasso ma, alasso gmm, alasso cue, gmm, gmm pen ma, gmm pen gmm, cue, cue pen ma, cue pen cue, for n = 50, 100, 250 with σ²_zz = I_q; columns: Mean, Sd, Bias, rmse, 95%c, under weak and strong identification.]

Note: Setup 1 consists of a fixed number of moments and homoskedastic errors. There are 11 moments, 3 known to be valid and 8 of unknown validity; among these 8, 4 are valid and 4 are invalid. π = (2·1_3, …) in the weak identification case and π = … in the strong identification case (remaining values of π not preserved). alasso ma is the estimator obtained by selecting the moments with the adaptive lasso in the first stage and then applying Okui's moment averaging estimator in the second stage. alasso gmm and alasso cue use the adaptive lasso to select the valid moments in the first stage and then use them in unpenalized efficient gmm and cue, respectively. For efficient gmm there are three estimators: gmm is the gmm estimator using the full set of moments; gmm pen ma uses the penalized gmm criterion of Andrews and Lu (2001) for moment selection and then Okui's moment averaging estimator in the second stage; gmm pen gmm selects the moments in the same way but then estimates the structural parameter by efficient gmm. Analogously, cue denotes the cue estimator using the full set of moments, cue pen ma selects the moments using the penalized cue criterion and uses them in the moment averaging estimator, and cue pen cue selects the moments using penalized cue and estimates θ_0 with the cue estimator. 95%c is the coverage of the empirical 95% confidence intervals.
Table 6a: Monte Carlo results for θ̂. Setup 2. (Part 1)

[Numeric entries were not preserved in this transcription. Rows: alasso ma, alasso gmm, alasso cue, gmm, gmm pen ma, gmm pen gmm, cue, cue pen ma, cue pen cue, for n = 50, 100, 250 with σ²_zz = 0.5 I_q; columns: Mean, Sd, Bias, rmse, 95%c, under weak and strong identification.]

Note: Setup 2 consists of an increasing number of moments and heteroskedastic errors. There are q = n moments, s known to be valid and q − s of unknown validity; of these, (q − s)/2 are valid and (q − s)/2 are invalid. π = (2·1_s, …) in the weak identification case and π = 2·1_q in the strong identification case. alasso ma is the estimator obtained by selecting the moments with the adaptive lasso and then applying Okui's moment averaging estimator; alasso gmm and alasso cue use the adaptive lasso to select the valid moments and then use them in efficient gmm and cue, respectively; gmm is the gmm estimator using the full set of moments; gmm pen ma uses penalized gmm for moment selection and then the moment averaging estimator; gmm pen gmm selects the moments in the same way and estimates the structural parameter by efficient gmm. cue denotes the cue estimator using the full set of moments, cue pen ma selects the moments using the penalized cue criterion and uses them in the moment averaging estimator, and cue pen cue selects the moments using penalized cue and estimates θ_0 with the cue estimator. 95%c is the coverage of the empirical 95% confidence intervals.
Table 6b: Monte Carlo results for θ̂. Setup 2. (Part 2)

[Numeric entries were not preserved in this transcription. Rows: alasso ma, alasso gmm, alasso cue, gmm, gmm pen ma, gmm pen gmm, cue, cue pen ma, cue pen cue, for n = 50, 100, 250 with σ²_zz = I_q; columns: Mean, Sd, Bias, rmse, 95%c, under weak and strong identification.]

Note: Setup 2 consists of an increasing number of moments and heteroskedastic errors. There are q = n moments, s known to be valid and q − s of unknown validity; of these, (q − s)/2 are valid and (q − s)/2 are invalid. π = (2·1_s, …) in the weak identification case and π = 2·1_q in the strong identification case. alasso ma is the estimator obtained by selecting the moments with the adaptive lasso and then applying Okui's moment averaging estimator; alasso gmm and alasso cue use the adaptive lasso to select the valid moments and then use them in efficient gmm and cue, respectively; gmm is the gmm estimator using the full set of moments; gmm pen ma uses penalized gmm for moment selection and then the moment averaging estimator; gmm pen gmm selects the moments in the same way and estimates the structural parameter by efficient gmm. cue denotes the cue estimator using the full set of moments, cue pen ma selects the moments using the penalized cue criterion and uses them in the moment averaging estimator, and cue pen cue selects the moments using penalized cue and estimates θ_0 with the cue estimator. 95%c is the coverage of the empirical 95% confidence intervals.
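The columns of Tables 5a–6b (Mean, Sd, Bias, rmse, 95%c) can be reproduced from Monte Carlo output as follows. The draws, the bias of 0.02, and the constant standard error below are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(3)
theta0 = 1.0

# Hypothetical Monte Carlo output: R point estimates and their standard errors.
R = 2000
theta_hat = theta0 + 0.02 + 0.1 * rng.standard_normal(R)   # toy draws, small bias
se = np.full(R, 0.1)

mean = theta_hat.mean()
sd = theta_hat.std(ddof=1)
bias = mean - theta0
rmse = np.sqrt(np.mean((theta_hat - theta0) ** 2))
lo, hi = theta_hat - 1.96 * se, theta_hat + 1.96 * se
coverage = np.mean((lo <= theta0) & (theta0 <= hi))        # empirical 95% coverage

print(f"mean={mean:.3f} sd={sd:.3f} bias={bias:.3f} rmse={rmse:.3f} cov={coverage:.3f}")
```

Note that rmse² ≈ bias² + sd², so a method can have small bias yet large rmse, which is why the text ranks estimators on rmse but reports both.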
More informationOn GMM Estimation and Inference with Bootstrap Bias-Correction in Linear Panel Data Models
On GMM Estimation and Inference with Bootstrap Bias-Correction in Linear Panel Data Models Takashi Yamagata y Department of Economics and Related Studies, University of York, Heslington, York, UK January
More informationSingle Equation Linear GMM with Serially Correlated Moment Conditions
Single Equation Linear GMM with Serially Correlated Moment Conditions Eric Zivot October 28, 2009 Univariate Time Series Let {y t } be an ergodic-stationary time series with E[y t ]=μ and var(y t )
More informationThe MNet Estimator. Patrick Breheny. Department of Biostatistics Department of Statistics University of Kentucky. August 2, 2010
Department of Biostatistics Department of Statistics University of Kentucky August 2, 2010 Joint work with Jian Huang, Shuangge Ma, and Cun-Hui Zhang Penalized regression methods Penalized methods have
More informationApplied Econometrics (MSc.) Lecture 3 Instrumental Variables
Applied Econometrics (MSc.) Lecture 3 Instrumental Variables Estimation - Theory Department of Economics University of Gothenburg December 4, 2014 1/28 Why IV estimation? So far, in OLS, we assumed independence.
More informationGMM, HAC estimators, & Standard Errors for Business Cycle Statistics
GMM, HAC estimators, & Standard Errors for Business Cycle Statistics Wouter J. Den Haan London School of Economics c Wouter J. Den Haan Overview Generic GMM problem Estimation Heteroskedastic and Autocorrelation
More informationarxiv: v3 [math.st] 23 May 2016
Inference in partially identified models with many moment arxiv:1604.02309v3 [math.st] 23 May 2016 inequalities using Lasso Federico A. Bugni Mehmet Caner Department of Economics Department of Economics
More informationA Robust Test for Weak Instruments in Stata
A Robust Test for Weak Instruments in Stata José Luis Montiel Olea, Carolin Pflueger, and Su Wang 1 First draft: July 2013 This draft: November 2013 Abstract We introduce and describe a Stata routine ivrobust
More informationTopic 4 Unit Roots. Gerald P. Dwyer. February Clemson University
Topic 4 Unit Roots Gerald P. Dwyer Clemson University February 2016 Outline 1 Unit Roots Introduction Trend and Difference Stationary Autocorrelations of Series That Have Deterministic or Stochastic Trends
More informationDynamic panel data methods
Dynamic panel data methods for cross-section panels Franz Eigner University Vienna Prepared for UK Econometric Methods of Panel Data with Prof. Robert Kunst 27th May 2009 Structure 1 Preliminary considerations
More informationChapter 11 GMM: General Formulas and Application
Chapter 11 GMM: General Formulas and Application Main Content General GMM Formulas esting Moments Standard Errors of Anything by Delta Method Using GMM for Regressions Prespecified weighting Matrices and
More informationOr How to select variables Using Bayesian LASSO
Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO On Bayesian Variable Selection
More informationGravity Models, PPML Estimation and the Bias of the Robust Standard Errors
Gravity Models, PPML Estimation and the Bias of the Robust Standard Errors Michael Pfaffermayr August 23, 2018 Abstract In gravity models with exporter and importer dummies the robust standard errors of
More informationGeneralized Elastic Net Regression
Abstract Generalized Elastic Net Regression Geoffroy MOURET Jean-Jules BRAULT Vahid PARTOVINIA This work presents a variation of the elastic net penalization method. We propose applying a combined l 1
More informationIV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors
IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors Laura Mayoral, IAE, Barcelona GSE and University of Gothenburg U. of Gothenburg, May 2015 Roadmap Testing for deviations
More informationIterative Selection Using Orthogonal Regression Techniques
Iterative Selection Using Orthogonal Regression Techniques Bradley Turnbull 1, Subhashis Ghosal 1 and Hao Helen Zhang 2 1 Department of Statistics, North Carolina State University, Raleigh, NC, USA 2 Department
More informationMissing dependent variables in panel data models
Missing dependent variables in panel data models Jason Abrevaya Abstract This paper considers estimation of a fixed-effects model in which the dependent variable may be missing. For cross-sectional units
More informationIntroduction to Statistical modeling: handout for Math 489/583
Introduction to Statistical modeling: handout for Math 489/583 Statistical modeling occurs when we are trying to model some data using statistical tools. From the start, we recognize that no model is perfect
More informationOracle Estimation of a Change Point in High Dimensional Quantile Regression
Oracle Estimation of a Change Point in High Dimensional Quantile Regression Sokbae Lee, Yuan Liao, Myung Hwan Seo, and Youngki Shin arxiv:1603.00235v2 [stat.me] 16 Dec 2016 15 November 2016 Abstract In
More information