Moment and IV Selection Approaches: A Comparative Simulation Study


Mehmet Caner (North Carolina State University), Esfandiar Maasoumi (Emory University), Juan Andrés Riquelme (North Carolina State University)

August 7, 2014

Abstract

We compare three moment selection approaches, followed by post-selection estimation strategies. The first is the adaptive lasso of Zou (2006), recently extended by Liao (2013) to possibly invalid moments in gmm; in this method we select the valid instruments with the adaptive lasso. The second method is based on the J test, as in Andrews and Lu (2001). The third uses a continuous updating objective (cue) function. This last approach builds on Hong et al. (2003), who propose a penalized generalized empirical likelihood based function to select valid moments; they use empirical likelihood and exponential tilting in their simulations. However, as can be seen in Hong et al. (2003), the J-test-based approach of Andrews and Lu (2001) generally provides better moment selection results than empirical likelihood and exponential tilting. In this article we therefore examine the penalized cue as a third way of selecting valid moments. Following the determination of valid moments, we run unpenalized gmm, unpenalized cue, and the model averaging technique of Okui (2011) to see which has the better post-selection estimator performance for the structural parameters. The simulations address the following questions: which moment selection criterion better selects the valid moments and eliminates the invalid ones? Given the instruments chosen in the first stage, which strategy delivers the best finite sample performance? We find that the adaptive lasso in the model selection stage, coupled with either unpenalized gmm or the moment averaging of Okui, generally delivers the smallest rmse for the second stage coefficient estimators.

Keywords and phrases: Shrinkage, Monte Carlo, Averaging.

North Carolina State University, Department of Economics, 4168 Nelson Hall, Raleigh, NC. mcaner@ncsu.edu. Emory University, Department of Economics, Atlanta, GA. esfandiar.maasoumi@emory.edu. North Carolina State University, Department of Economics. jariquel@ncsu.edu.

1 Introduction

It is not uncommon to encounter a large number of instruments or moment conditions in applications of instrumental variables (iv) or Generalized Method of Moments (gmm) estimators. Some ivs or moments may be invalid, but the researcher does not know a priori which ones. This problem may be adjudicated statistically with the J test, which indicates whether the overidentifying restrictions are valid. If the null is rejected, the researcher needs a moment selection technique that distinguishes between the valid and invalid moment conditions. A few techniques have been proposed, each with advantages (for example, consistency) and disadvantages (such as overwhelming computational demands). In this paper we focus on information-based methods and review three moment selection criteria (msc) used in the current literature: (i) the shrinkage procedure of Liao (2013); (ii) the information-based criteria with gmm in Andrews (1999); and (iii) the information-based criterion using generalized empirical likelihood of Hong et al. (2003). Using Monte Carlo simulations, we compare the performance of these methods in selecting valid moments in linear settings under several relevant scenarios: small and large sample sizes, fixed and increasing numbers of moment conditions, weak and strong identification, local-to-zero moment conditions, and homoskedastic and heteroskedastic errors.

The contribution of our study is a fairly comprehensive comparison of these multistep approaches with each other. The choice of methods was motivated by the following considerations: the adaptive lasso is heavily used in statistics and has computational advantages in large scale problems; the penalized methods in Andrews (1999) and Hong et al. (2003) are not computationally advantageous, but are used by econometricians due to the need to determine valid instruments. Further, these three methods have reasonably strong theoretical underpinnings.

We analyze second stage estimation performance, considering the finite sample properties of the structural parameter estimators. To this end, we employ the model averaging technique of Okui (2011) to obtain better mean squared error and smaller bias for the structural parameters. We then compare Okui (2011) with unpenalized gmm and cue estimation, following the selection of valid instruments in the first stage. We find that the adaptive lasso in the model selection stage, coupled with either unpenalized gmm or the moment averaging of Okui, generally delivers the smallest rmse for the second stage estimation.

There is a large and rich literature on moment selection techniques. Smith (1992) proposes a procedure to compare competing non-nested gmm estimations, allowing for heteroskedasticity and serial correlation. Again in the gmm context, Andrews (1999) proposes a moment selection procedure using information criteria based on a J statistic corrected for the number of moment conditions. This is analogous to the use of the Akaike (aic), Bayesian (bic) and Hannan-Quinn (hqic) information criteria in model selection. He shows that the proposed methods are consistent under suitable assumptions, and also formalizes the downward and upward testing procedures. Downward testing consists of iterative J tests starting from the largest set of moment conditions and proceeding to fewer moment conditions at each iteration, until the null is not rejected; upward testing works in the opposite order. Andrews and Lu (2001) extend these methods to model selection in dynamic panel data structures. Hong et al. (2003) propose a similar approach, using the generalized empirical likelihood defined by Newey and Smith (2000) instead of the J statistic.

A relatively new type of moment selection method is based on shrinkage procedures.

One of the advantages of shrinkage is its computational efficiency, which is consequential especially in high-dimensional contexts. In a brief comparison, Hastie et al. (2009, section 3.6) conclude that shrinkage performs better than alternative model selection techniques in reducing estimation error. Liao (2013) shows that gmm shrinkage procedures have the oracle property in selecting the moment conditions, and that adding additional valid moments improves efficiency for strongly identified parameters. Cheng and Liao (2012) use a similar approach and propose a weighted tuning parameter that allows shrinking both invalid and redundant moments. We chose the three moment selection criteria in this study on the basis of their optimality properties, such as the oracle property, and their good finite sample performance.

For model selection assuming valid instruments, Belloni, Chernozhukov and Hansen (2011) utilize lasso-type estimators in the many-iv case and provide conditions under which the iv approach is asymptotically oracle-efficient. Caner (2009) and Caner and Zhang (2013) also use shrinkage methods for model selection in a gmm context. Canay (2010) proposes the use of trapezoidal kernel weights to shrink the first stage estimators. Kuersteiner and Okui (2010) point out that, despite the advantages of kernel shrinkage estimation, such estimators cannot completely remove the estimation bias and are inflexible once a particular kernel is chosen; they also propose a moment average estimator using the method in Hansen (2007) to construct optimal instruments. Okui (2011) develops a shrinkage method that minimizes the asymptotic mean squared error. An important concern in gmm estimation is the presence of weak ivs (Hausman et al. (2005); Andrews and Stock (2007)). The results in Cheng and Liao (2012) suggest that shrinkage estimation is robust in discarding invalid ivs, but tends to include redundant ivs when identification is weak.

The rest of the paper is organized as follows. In Section 2 we review the msc approaches under comparison. In Section 3 we present the details of our Monte Carlo simulation setups. In Section 4 the main results of our simulation exercises are presented. Section 5 concludes. Standard notation is used for the projection operator P_A = A(A′A)⁻¹A′, where A is a matrix.

2 Theoretical Framework

2.1 Moment Selection Methods

Consider a sequence of random variables {Z_i}_{i=1}^n drawn from an unknown probability distribution. The moment selection problem consists of selecting the r valid moments from a set of q candidates. A minimum set of s ≥ p valid moment conditions is required in order to identify the structural parameter vector θ, where p = dim(θ). The set of q candidate moments can be separated into two subsets, and the model is, for i = 1, …, n,

E[g_S(Z_i, θ_0)] = 0,   S = {1, …, s}   (1)
E[g_{S^c}(Z_i, θ_0)] ?= 0,   S^c = {s + 1, …, q}   (2)

where the sign ?= means that the relationship may not hold for some of the indexes in S^c. We have r = s + s_v, where s is the number of moments in (1) (i.e., those known to be valid) and s_v is the number of valid moments among the q − s moments in (2); S^c represents the moments that may or may not be valid, so 0 ≤ s_v ≤ q − s. Our framework assumes that the researcher knows a priori that s instruments are valid and that these identify the p parameters.

The question is whether to include the rest of the instruments for efficiency considerations. This is the framework used recently by Liao (2013). Note that θ_0 represents the true structural parameter vector of dimension p. The standard gmm estimator of θ_0, denoted θ̂_n, is

θ̂_n = argmin_{θ ∈ Θ} J(θ, W_n),

where W_n is a q × q symmetric and positive definite weight matrix and the objective function (Hansen, 1982) is defined as

J(θ, W_n) = n g_n(θ)′ W_n g_n(θ),   (3)

with g_n(θ) = n⁻¹ Σ_{i=1}^n g(Z_i, θ), and Θ a compact subset of R^p. For ease of notation let g(Z_i, θ) = g_i(θ).

Throughout the paper we consider the following linear model because of its computational advantages, and conduct the comparative examination in this widely used setup:

y = Y θ_0 + ε   (4)
Y = Z π_0 + u   (5)

where y is an n × 1 vector, Y is an n × p matrix of endogenous variables, Z is an n × q matrix of instruments, and ε and u are unobserved random errors with constant second moments that are correlated with each other. We do not deal with control variables in the simulations; this makes no difference in a simulation setup.
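For concreteness, the two-step estimator behind (3) can be written in a few lines for the linear model (4)-(5). The following numpy sketch is ours, not the authors' code; the function names (gmm_linear, solve) are illustrative, and the closed form relies on the moments being linear in θ.

```python
import numpy as np

def gmm_linear(y, Y, Z):
    """Two-step GMM for y = Y @ theta + eps with instrument matrix Z.

    Step 1 uses the identity weight matrix; step 2 re-weights with the
    inverse of the estimated moment covariance (efficient GMM).
    """
    n, q = Z.shape

    def solve(W):
        # For g_n(theta) = Z'(y - Y theta)/n, minimizing n g_n' W g_n
        # gives the closed form (Y'Z W Z'Y)^{-1} Y'Z W Z'y.
        A = Y.T @ Z @ W @ Z.T @ Y
        return np.linalg.solve(A, Y.T @ Z @ W @ Z.T @ y)

    theta1 = solve(np.eye(q))            # step 1: W = I_q
    g = Z * (y - Y @ theta1)[:, None]    # n x q matrix with rows g_i'
    W2 = np.linalg.inv(g.T @ g / n)      # inverse moment covariance
    return solve(W2)                     # step 2: efficient estimate
```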

7 ones,z i1 (s 1) and the set that we suspect may contain invalid instruments Z i2 (q s 1). The sample moment conditions are defined by: g n (θ, β) = 1 n n g i (θ, β), i=1 where g i (θ, β) = (g i1 (θ), g i2 (θ, β) ) with g i1 (θ) = Z i1 (y i Y i θ), g i2 (θ, β) = Z i2 (y i Y i θ) β. The weight matrix for our nonstandard case is calculated as: W n = 1 n n g i ( θ, β)g i ( θ, β), i=1 where θ, β are the first step GMM estimators with I q as the weight matrix. The first method we discuss is the adaptive gmm shrinkage estimation method (Liao, 2013). This method has the advantage of selecting the valid moments and estimate θ in a single step. It consists of adding a slackness parameter vector β 0 to the moment conditions in (2). So the model is: E g i1(θ 0 ) g i2 (θ 0, β 0 ) = 0. and the validity of the moment conditions is verified by inference on whether β 0 = 0 or not. A moment condition j is valid only if β 0j = 0, for j = 1, q s. 6

The adaptive lasso estimators are defined as

(θ̂_n^alasso, β̂_n^alasso) = argmin_{(θ,β) ∈ Θ×B} [ g_n(θ, β)′ W_n g_n(θ, β) + λ_n Σ_{j=1}^{q−s} ω̂_j |β_j| ],   (6)

where Θ × B is the parameter space for (θ, β) and the ω̂_j are data-dependent weights, ω̂_j = 1/|β̃_j|, with β̃_j the unpenalized standard gmm estimator using all q moments. The adaptive lasso (alasso) estimator penalizes the slackness parameters by their ℓ1 norm. This penalty is usually preferred because it delivers the oracle property (β_0j is shrunk to exactly zero for the valid moments) and because the problem can be solved with the lars algorithm (Efron et al., 2004), which represents a great computational advantage. Liao (2013) also considers alternative penalties, such as the bridge and the smoothly clipped absolute deviation, but we focus only on the adaptive lasso estimator because its penalty is convex and easy to estimate compared with the others. The degree of shrinkage is governed by the tuning parameter λ_n ≥ 0: larger values shrink more, and λ_n = 0 corresponds to the unpenalized gmm solution. λ_n is chosen so as to differentiate between valid and invalid moments.
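The criterion in (6) can be evaluated directly. Below is a minimal sketch of the penalized objective; for brevity we minimize it with a generic derivative-free routine, whereas Liao (2013) exploits the lars algorithm, so this illustrates the criterion rather than the recommended solver. The names (alasso_objective, alasso_fit) and the choice of Powell's method are our assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def alasso_objective(params, y, Y, Z1, Z2, W, lam, w_hat):
    """Adaptive-lasso GMM criterion from (6):
    g_n(theta,beta)' W g_n(theta,beta) + lam * sum_j w_hat[j]*|beta_j|,
    with w_hat[j] = 1/|beta_j~| from the unpenalized GMM fit."""
    n, p = Y.shape
    theta, beta = params[:p], params[p:]
    e = y - Y @ theta
    g = np.concatenate([Z1.T @ e / n, Z2.T @ e / n - beta])
    return g @ W @ g + lam * np.dot(w_hat, np.abs(beta))

def alasso_fit(start, y, Y, Z1, Z2, W, lam, w_hat):
    # Powell is derivative-free, so it tolerates the kink at beta_j = 0;
    # it is far slower than LARS and used here only for illustration.
    res = minimize(alasso_objective, start,
                   args=(y, Y, Z1, Z2, W, lam, w_hat), method="Powell")
    return res.x
```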

The second msc that we analyze is due to Andrews (1999), as extended in Andrews and Lu (2001). It consists of a penalization of the J statistic (Hansen, 1982) in equation (3). Following the notation of Andrews (1999), let c ∈ R^{q−s} denote a moment selection vector of zeros and ones, such that the jth element of c is one if the jth moment condition is selected as valid. Let |c| = Σ_{j=1}^{q−s} c_j denote the number of moments selected by c, and let Z_ic be the vector Z_i from which the jth element is deleted whenever the corresponding jth element of c is zero. The corresponding weight matrix W_n^c has dimension (s + |c|) × (s + |c|). The msc estimator objective function has the following general form:

msc_n(c) = J_c(θ, W_n^c) − h(|c|) κ_n,   (7)

where J_c(θ, W_n^c) = n g_n(θ)′ W_n^c g_n(θ) uses the s + |c| moments in the gmm objective function; g_n(θ) is defined immediately below equation (3). In (7) we take W_n^c = [n⁻¹ Σ_{i=1}^n Z_ic Z_ic′ ε̂_i²]⁻¹, where ε̂_i = y_i − Y_i θ̃, and θ̃ is estimated by inefficient gmm with the identity weight matrix, using Z_ic.

The algorithm works as follows. For each instrument combination, we calculate the first-step inefficient gmm with the identity weight matrix; given the inefficient gmm estimates, we set up the new weight matrix as described above and obtain the parameter estimates for the second-stage efficient gmm. We then form (7) for each instrument combination and pick the combination that minimizes (7); the corresponding efficient gmm estimates are the ones that will be used. To be specific, say we have two potentially valid instruments, Z1 and Z2. The possible combinations are Z1 only, Z2 only, and Z1 and Z2 together. First, for Z1 only, we get inefficient gmm estimates, use them to form the weight matrix for the second stage, and obtain efficient gmm estimates for Z1. We repeat the same analysis for Z2, and then for the pair (Z1, Z2). We now have three sets of efficient gmm estimates, and we choose the one that minimizes (7).

The choices of the function h(·) and the constants {κ_n}_{n≥1} lead to different msc. Andrews (1999) uses h(|c|) = |c| − p and three different choices of κ_n, which lead to three moment selection criteria (aic, bic, Hannan-Quinn):

gmmbic: msc_bic,n(c) = J_c(θ, W_n^c) − (|c| − p) ln n
gmmaic: msc_aic,n(c) = J_c(θ, W_n^c) − 2 (|c| − p)
gmmhqic: msc_hqic,n(c) = J_c(θ, W_n^c) − 2.1 (|c| − p) ln ln n

The value 2.1 in gmmhqic is chosen in light of the results in Andrews (1997).
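The exhaustive search this describes is easy to write down. The sketch below implements gmm-bic selection over all subsets of the suspect instruments, with the s known-valid instruments always included; the function names are ours, and the penalty uses the total degree of overidentification, which is our reading of h(|c|) = |c| − p.

```python
import numpy as np
from itertools import combinations

def gmm_solve(y, Y, Z, W):
    # closed-form linear GMM: (Y'Z W Z'Y)^{-1} Y'Z W Z'y
    A = Y.T @ Z @ W @ Z.T @ Y
    return np.linalg.solve(A, Y.T @ Z @ W @ Z.T @ y)

def gmm_bic_select(y, Y, Z1, Z2):
    """GMM-BIC selection, eq. (7) with kappa_n = ln n: for every subset
    c of the columns of Z2, run two-step GMM on [Z1, Z2[:, c]] and keep
    the subset minimizing J_c - (#overidentifying restrictions) ln n."""
    n, p = Y.shape
    best, best_msc = None, np.inf
    for k in range(Z2.shape[1] + 1):
        for c in combinations(range(Z2.shape[1]), k):
            Z = np.hstack([Z1, Z2[:, list(c)]])
            q = Z.shape[1]
            theta1 = gmm_solve(y, Y, Z, np.eye(q))   # inefficient step
            g = Z * (y - Y @ theta1)[:, None]
            W = np.linalg.inv(g.T @ g / n)           # efficient weight
            theta2 = gmm_solve(y, Y, Z, W)
            gbar = Z.T @ (y - Y @ theta2) / n
            J = n * gbar @ W @ gbar                  # J statistic
            msc = J - (q - p) * np.log(n)
            if msc < best_msc:
                best, best_msc = c, msc
    return best, best_msc
```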

For consistency among the methods, we analyze the gmmbic variant in this paper; a bic-based penalty gives selection consistency both for the adaptive lasso and for Andrews and Lu (2001). The results for the aic and hqic cases are available on request.

The third method is due to Hong et al. (2003). Their method is analogous to Andrews and Lu (2001), but the J function is estimated using generalized empirical likelihood and exponential tilting statistics. However, as described in the introduction, we use only the cue-based objective function, owing to the poor performance of empirical likelihood and exponential tilting shown in Hong et al. (2003). The objective function is the same as (7), but the weight matrix is updated continuously together with the parameters until convergence. In this third method the weight matrix is

W_n,cue(θ) = [n⁻¹ Σ_{i=1}^n Z_ic Z_ic′ ε_i(θ)²]⁻¹, where ε_i(θ) = y_i − Y_i θ.
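A sketch of this continuous-updating objective for a single endogenous regressor follows; the names are ours, and the weight uses uncentered moments, which is one common convention rather than necessarily the authors' exact choice.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def cue_objective(theta, y, Y, Z):
    """CUE criterion: same form as the J statistic, but the weight
    matrix is recomputed at every candidate theta instead of being
    fixed at a preliminary estimate.  Y is an n x 1 matrix."""
    n = len(y)
    e = y - Y @ np.atleast_1d(theta)
    g_i = Z * e[:, None]                   # rows are g_i(theta)'
    gbar = g_i.mean(axis=0)
    W = np.linalg.inv(g_i.T @ g_i / n)     # theta-dependent weight
    return n * gbar @ W @ gbar

def cue_fit(y, Y, Z, lo=-10.0, hi=10.0):
    # scalar theta, as in the simulations with one endogenous variable
    return minimize_scalar(cue_objective, bounds=(lo, hi),
                           args=(y, Y, Z), method="bounded").x
```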

2.2 Parameter Estimation

We examine three methods in the second stage of estimation for θ. The first two are unpenalized gmm and unpenalized cue: given the valid instruments, these two methods produce estimates of the structural parameters. An alternative approach, once moment selection has been carried out, is the method proposed by Okui (2011): the shrinkage two stage least squares (stsls) estimator. This method shrinks toward zero some of the weights on the sample moment conditions and requires a minimum set of moments known to be valid before estimation. The stsls estimator is as follows: for a shrinkage parameter m we define P_m = P_{Z_I} + m P_{Z_II} and

θ̂_n,s^stsls = (Y′ P_m Y)⁻¹ Y′ P_m y,

where Z_I contains the s valid moments from the first set, and Z_II contains the moments from the second set of q − s moments/instruments that are selected as valid by a criterion such as the alasso or penalized gmm. The shrinkage parameter m is chosen to minimize a Nagar (1959)-type approximation to the mean squared error. When there is only one endogenous variable, as in our simulation setup, the estimate of the optimal shrinkage parameter is

m̂ = (σ̂²_ε Ŷ′P_{Z_I}Ŷ/n) / (σ̂²_εu r²/n + σ̂²_ε Ŷ′P_{Z_I}Ŷ/n),

where σ̂²_ε and σ̂²_εu are estimates of σ²_ε = E(ε_i²) and σ²_εu = [E(ε_i u_i)]², obtained from a preliminary estimation as described in Okui (2011), and Ŷ is the prediction of Y from a least squares regression on the selected instruments. Okui's (2011) derivation assumes homoskedasticity, and m̂ is valid only under homoskedasticity.
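The estimator fits in a few lines. The sketch below takes the displayed formulas, including m̂ as printed above, at face value; the helper names (proj, stsls) are ours, and it is valid only under homoskedasticity.

```python
import numpy as np

def proj(Z):
    """Projection matrix P_Z = Z (Z'Z)^{-1} Z'."""
    return Z @ np.linalg.solve(Z.T @ Z, Z.T)

def stsls(y, Y, Z_I, Z_II, sig2_e, sig_eu, r):
    """Okui's (2011) shrinkage TSLS with one endogenous regressor
    (y and Y are length-n vectors).  Z_I: the s known-valid
    instruments; Z_II: instruments kept by the first-stage selection;
    sig2_e, sig_eu: preliminary estimates of E(eps^2) and E(eps*u)."""
    n = len(y)
    P_I, P_II = proj(Z_I), proj(Z_II)
    Yhat = proj(np.hstack([Z_I, Z_II])) @ Y    # first-stage fitted values
    a = sig2_e * (Yhat @ P_I @ Yhat) / n
    m = a / (sig_eu**2 * r**2 / n + a)         # optimal shrinkage weight
    P_m = P_I + m * P_II
    return (Y @ P_m @ y) / (Y @ P_m @ Y), m
```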

3 Monte Carlo Simulations

The purpose of the Monte Carlo simulations is to compare the previously described msc approaches in two respects: first, their effectiveness in selecting the correct moment conditions, and second, the performance of the post-selection estimators. We use the data generating process in equations (4) and (5), with one endogenous variable and true value θ_0 = 0.5. We draw (Z, ε̃, u) ~ N(0, Σ), where

Σ = [ σ²_zz I_q   σ_Zε   0_q
      σ_Zε′      σ²_ε   σ_εu
      0_q′       σ_εu   σ²_u ]

is a (q + 2) × (q + 2) symmetric matrix; σ²_zz is the variance of the instruments, I_q is the identity matrix of order q, σ_Zε is a q × 1 vector of covariances between the instruments and the structural error, 0_q is a q × 1 vector of zeros, and σ_εu, σ²_ε and σ²_u are scalars. We impose a heteroskedastic error structure of the form ε_i = ε̃_i ‖Z_i‖, with ‖Z_i‖ = √(Z²_i1 + ⋯ + Z²_iq); the homoskedastic case sets ε_i = ε̃_i.

A moment is valid if E[g(Z_i, θ_0)] = E[Z_i(y_i − Y_i θ_0)] = E[Z_i ε_i] = σ_Zε = 0. We generate invalid moments by constructing the σ_Zε vector in two ways: (1) a constant correlation D ≠ 0 between the instrument and the structural error, and (2) local-to-zero correlations of the form 1/n, 1/√n and 1/∛n, to explore different convergence rates.

In all setups we have q total moments, s of which are known a priori by the researcher to be valid. However, there is a total of r = s + s_v valid moments, so we have to select the s_v valid moments among the remaining q − s. The number of valid and invalid moment conditions is generated in two ways. In the first setup we simulate data with a fixed number of moments: q = 11, s = 3 and r = 7. That is, there are 11 moments, we know that 3 of them are valid, and we have to select among the other 8, of which 4 are valid and 4 are invalid. The errors for this setup are homoskedastic.
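A compact simulator for the Setup 1 design might look as follows. It is a sketch under stated assumptions, not the authors' code: the parameter names are ours, strong identification is taken as every first-stage coefficient equal to 2, and the heteroskedastic variant scales ε̃_i by ‖Z_i‖ as above.

```python
import numpy as np

def simulate(n, q=11, s=3, s_v=4, D=0.2, theta0=0.5, sig2_zz=0.5,
             sig2_e=1.0, sig2_u=0.5, sig_eu=0.5, hetero=False, rng=None):
    """One draw from the Setup-1-style DGP in (4)-(5): the first s + s_v
    instruments are valid (zero covariance with eps~); the remaining
    q - s - s_v covary with eps~ through the constant D."""
    rng = rng or np.random.default_rng()
    sig_Ze = np.zeros(q)
    sig_Ze[s + s_v:] = D                       # invalid moments
    # joint covariance of (Z, eps~, u) as in the display above
    S = np.zeros((q + 2, q + 2))
    S[:q, :q] = sig2_zz * np.eye(q)
    S[:q, q] = S[q, :q] = sig_Ze
    S[q, q], S[q + 1, q + 1] = sig2_e, sig2_u
    S[q, q + 1] = S[q + 1, q] = sig_eu
    draw = rng.multivariate_normal(np.zeros(q + 2), S, size=n)
    Z, e, u = draw[:, :q], draw[:, q], draw[:, q + 1]
    if hetero:
        e = e * np.linalg.norm(Z, axis=1)      # eps_i = eps~_i * ||Z_i||
    pi = np.full(q, 2.0)                       # strong identification
    Y = Z @ pi + u                             # first stage, eq. (5)
    y = Y * theta0 + e                         # structural eq. (4)
    return y, Y, Z                             # Y returned as a vector
```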

In the second setup we allow the number of moments to increase with the sample size: q = √n, s = √q and s_v = (q − s)/2; that is, we have to choose among q − s candidates, half of which are set to be valid. The errors for this setup are heteroskedastic. We refer to these designs as Setup 1 and Setup 2, respectively.

In Setup 1, Σ is constructed as follows. We simulate Z ∈ R^11, divided into three categories: the first set of instruments is known to be strong and valid (s = 3), as required by the mscs described in the previous section; as mentioned before, the next set of instruments is divided into two categories, the first four being valid (s_v = 4) and the last q − r = 4 invalid. The last elements of Σ are

σ_Zε = (0, 0, 0, 0, 0, 0, 0, D, D, D, D)′

in the constant correlation case and

σ_Zε = (0, 0, 0, 0, 0, 0, 0, h/n, h/√n, h/∛n, h/n)′

in the local-to-zero scenario; note that we use three rates for the local-to-zero moments, recycled as needed. We set σ²_ε = 1, σ²_u = 0.5, D = 0.2 and h = 1. For each correlation structure we investigate weak and strong identification scenarios by changing π_0 in equation (5): in the strong identification scenario π_0 = 2·1_11, while in the weak identification case the first three elements of π_0 equal 2 and the remaining eight are set close to zero, with 1_l denoting a row vector of ones of length l. The second setup is constructed in an analogous manner.

We set the variance of the instruments to σ²_zz ∈ {0.5, 1.0} and the covariance between the structural and reduced form errors to σ_εu = 0.5. This gives us two cases: Case 1 with σ²_zz = 0.5 I_q and σ_εu = 0.5, and Case 2 with σ²_zz = 1.0 I_q and σ_εu = 0.5. We have estimated many other cases for the covariance matrix: Case 3 with σ²_zz = I_q and σ_εu = 0.5, Case 4 with σ²_zz = I_q and σ_εu = 0.9, Case 5 with σ²_zz = 2 I_q and σ_εu = 0.5, and Case 6 with σ²_zz = 2 I_q and σ_εu = 0.9. These cases and the local-to-zero ones are available on request.

The simulated sample sizes are n ∈ {50, 100, 250}. All the results in the next section are based on 1000 repetitions.

4 Results

We focus only on the most relevant and salient results of our simulation exercises: Cases 1 and 2 for Setups 1 and 2, using invalid moments with constant correlation with the structural error. We do not present all the simulated scenarios, for economy of space and because the general results presented here hold across all the alternative setups.¹ We focus on the weak and strong identification cases with σ²_zz = 0.5 I_q and σ²_zz = 1 I_q. The analysis of the results addresses two questions: how good are the msc selection procedures, and which technique gives the best estimation of the structural parameter θ_0? The R² of the first stage regression is presented in Table 7; it varies with the strength of the identification and the number of observations.

The moment selection methods are the adaptive lasso (alasso), penalized gmm (gmm_pen) and penalized cue (cue_pen). We have nine post-selection structural parameter estimators. alasso-ma is the estimator obtained by selecting the moments with the adaptive lasso in the first stage and then using the moment averaging estimator of Okui (2011) in the second stage. Setup 1 is the homoskedastic setup, and there we use the optimal m in Okui's (2011) ma method, which works only under homoskedasticity. Setup 2 allows heteroskedasticity, for which Okui's (2011) method, and the m of Section 2.2, are not designed; we nevertheless use the m of Section 2.2 in the Setup 2 simulations to see how this method fares under heteroskedasticity.

¹ We have extensive results for all the moment selection techniques discussed in Section 2, for both fixed and local-to-zero correlation between the instruments and the structural error, available on request.

alasso-gmm and alasso-cue are the estimators that use the adaptive lasso to select the valid moments in the first stage and then use them in unpenalized efficient gmm and unpenalized cue, respectively. For the efficient gmm we have three estimators: gmm is the gmm estimator using the full set of moments; gmm_pen-ma uses the penalized gmm criterion of Andrews and Lu (2001) for moment selection and then Okui's moment averaging estimator in the second stage; gmm_pen-gmm selects the moments in the same way but then estimates the structural parameter by efficient gmm. In the same way, cue denotes the cue estimator using the full set of moments, cue_pen-ma selects the moments with the penalized cue criterion and applies the moment averaging estimator, and cue_pen-cue selects the moments with the penalized cue and estimates θ_0 by cue.

A summary of our results is presented in Tables 1 and 2, for model selection and post-selection estimation performance respectively. Table 1 presents the average ranking of each method on the probability of selecting exactly the valid moments, computed from Tables 3 and 4 for each sample size and strength of identification. In case of a tie the methods get the same ranking (so we can have two first or two second places). From these tables we can see that the adaptive lasso is the best method at perfect moment selection.

Table 2 presents the performance of the post-selection estimation methods, assessed by the rmse. The rankings are based on the relative performance in Tables 5a to 6b, presented by sample size and strength of identification. The estimator with the smallest rmse acquires the rank of 1, and estimators with the same rmse are given the same rank.

Table 1: Summary of the Performance of the Moment Selection Techniques
[table omitted: average rankings of alasso, gmm_pen and cue_pen under Setup 1 and Setup 2]
Note: Figures correspond to the average ranking of each method based on the probability of selecting exactly the valid moments; the latter are in Tables 3 and 4, by sample size and strength of identification. In case of a tie the methods get the same ranking (we can have two first or two second places). alasso, gmm_pen and cue_pen stand for adaptive lasso, penalized gmm and penalized cue, respectively.

The average ranking ranges from 1 to 9 and the frequency of being in the top three from 0 to 12. From Table 2 we conclude that the best estimator is obtained by using the adaptive lasso to select the moments, followed by the moment averaging procedure (alasso-ma). The moment averaging procedure improves estimation for all three moment selection techniques; the worst estimators are obtained with the cue method. In the heteroskedastic setup (Setup 2), alasso-ma (moment averaging in the second stage) is still the best in terms of rmse, though not by as much as in the homoskedastic case (Setup 1). In the next sections we present the detailed analysis of the moment selection and post-selection estimation methods.

4.1 Model Selection

We analyze three msc methods: the adaptive lasso in equation (6), the penalized efficient gmm, and the penalized continuously updated gmm in equation (7). In all cases we adopt the bic criterion. For each method we measure performance by the probability of three events: (1) the method selects exactly the true set of valid moments and none of the invalid ones (perfect selection); (2) it selects only valid moments but not exactly the true set, i.e., no invalid moment is selected yet the selection is not perfect; and (3) it selects at least one invalid moment.

Table 2: Summary of the Performance of the Post Selection Techniques
[table omitted: for Setup 1 and Setup 2, the average ranking and the number of times at the top three for alasso-ma, alasso-gmm, alasso-cue, gmm, gmm_pen-ma, gmm_pen-gmm, cue, cue_pen-ma and cue_pen-cue]
Note: Performance is analyzed in terms of the rmse. The rankings are based on the relative performance in Tables 5a to 6b; the estimator with the smallest value takes the rank of 1, and ties share the same rank. The average ranking ranges from 1 to 9 and the times at the top three from 0 to 12. alasso-ma selects the moments with the adaptive lasso in the first stage and applies Okui's moment averaging estimator in the second stage; alasso-gmm and alasso-cue use the adaptive lasso to select the valid moments and then apply unpenalized efficient gmm and cue, respectively; gmm uses the full set of moments; gmm_pen-ma selects moments with the penalized gmm of Andrews and Lu (2001) and applies Okui's moment averaging estimator in the second stage; gmm_pen-gmm selects moments in the same way and estimates the structural parameter by efficient gmm; cue uses the full set of moments; cue_pen-ma selects the moments with the penalized cue criterion and applies the moment averaging estimator; cue_pen-cue selects the moments with the penalized cue and estimates θ_0 by cue.

The first probability measures how often the criterion is exactly right. The second event is second best: the criterion does not recover the correct number of valid moments, but it still selects only valid ones, which can benefit the second-stage structural parameter estimation in gmm. The third probability shows how badly a moment selection criterion can behave; since invalid moments can badly affect the finite sample bias of the second-stage structural estimates, we would prefer this probability to be low.

The results are presented in Tables 3 and 4 for, respectively, the first setup (fixed number of moments, homoskedastic errors) and the second setup (increasing number of moments, heteroskedastic errors), for the weak and strong identification cases and instrument variances σ²_zz = 0.5 I_q and σ²_zz = I_q.

In Table 3 (Setup 1, σ²_zz = 0.5 I_q) we find that in the smallest sample, n = 50, all three msc approaches behave poorly, particularly gmm_pen, which selects invalid moments with high probability in the weak identification case and with probability 1 in the strong identification case. The best method is the alasso, which selects invalid moments with the lowest probability in both the weak and strong identification cases. The performance of the three methods improves when the sample size increases to n = 100, but their relative positions remain the same: alasso dominates the penalized methods, selecting invalid moments with markedly lower probability in both identification cases. The performance ranking changes when the sample size is increased to n = 250: the alasso still selects invalid moments with the smallest probability, but the penalized cue method now selects perfectly with higher probability in both the strong and weak identification cases than the alasso (0.355 under strong identification).

However, if the objective is to avoid selecting any invalid moments, then the alasso still dominates, selecting an invalid moment with the lowest probability. Since the alasso and the penalized methods are all selection consistent, we can take this as evidence of differences in convergence rates, with the penalized methods converging faster in this case; the differences in performance between the two penalized methods are negligible here, though not in the next case.

In Table 3 (Setup 1) with σ²_zz = 1 I_q, the relative performance of the methods is the same as in the previous case, but with the alasso dominating in all cases and by all criteria. The penalized methods catch up with the alasso as the sample size increases, with cue_pen slightly dominating its counterpart gmm_pen in all cases. In all cases the methods also perform worse than in the setup with σ²_zz = 0.5 I_q.

The conclusions for Setup 2 are the same as those for Setup 1. It is noteworthy that the alasso moves smoothly across the three performance measures (perfect, only-valid and invalid selection), whereas the penalized methods jump from selecting invalid instruments to perfect selection as the sample size increases, with undesirably small probabilities of selecting only valid (but not exactly the true) moments under all our setups.

4.2 Post Selection Performance

In this section we analyze the post-selection performance of the msc in terms of the bias, standard deviation and rmse of the estimator θ̂. For each of the three selection methods we estimate the structural parameter using efficient gmm, continuously updated gmm, and the moment averaging method of Okui (2011). The results for Setups 1 and 2 are presented in Tables 5a, 5b and 6a, 6b, respectively.

Table 3: Probabilities: Moment Selection Criteria. Setup 1
[table omitted: probabilities of perfect selection, only-valid selection and any-invalid selection for alasso, gmm_pen and cue_pen, under weak and strong identification, for n = 50, 100, 250 and σ²_zz ∈ {0.5 I_q, 1 I_q}]
Note: alasso, gmm_pen and cue_pen stand for adaptive lasso, penalized gmm and penalized cue, respectively. The reported numbers are the probabilities that (i) the method selects exactly the valid moments (perfect selection), (ii) the method does not choose any invalid moment (only valid), and (iii) the method selects at least one invalid moment (any invalid). Setup 1 consists of a fixed number of moments and homoskedastic errors: there are 11 moments, 3 known to be valid and 8 unknown, among which 4 are valid and 4 are invalid. In the weak identification case the first three elements of π are 2 and the remaining eight are close to zero; in the strong identification case all elements of π equal 2.

Table 4: Probabilities: Moment Selection Criteria. Setup 2
[table omitted: same layout as Table 3, for Setup 2]
Note: alasso, gmm_pen and cue_pen stand for adaptive lasso, penalized gmm and penalized cue, respectively. The reported numbers are the probabilities of (i) perfect selection, (ii) only-valid selection and (iii) any-invalid selection, as in Table 3. Setup 2 consists of an increasing number of moments and heteroskedastic errors: there are q = √n moments, s = √q of them known to be valid and q − s unknown; of these, (q − s)/2 are valid and (q − s)/2 are invalid. In the weak identification case the first s elements of π are 2 and the remaining q − s are close to zero; in the strong identification case all q elements of π equal 2.

In Table 6a (Setup 2, weak identification, σ²_zz = 0.5 I_q) we find that with sample size n = 50 the best estimator in terms of rmse is alasso-ma, and the worst is the cue. Note that when using the full set of instruments the gmm estimator performs better than the cue in terms of rmse, but the cue has a smaller bias in the weak identification case. With sample size n = 50 the adaptive lasso based methods are also the best in rmse in the strong identification case. As the sample size increases, all the estimators converge to the true value. With σ²_zz = 1 I_q the relative performance remains the same, but the rmses and standard deviations are smaller.

In terms of coverage, alasso-gmm performs best among the specifications considered: in all setups and specifications it comes close to 95% coverage, whereas the other methods cannot replicate this behavior. In terms of bias, in the more relevant Setup 2 (Tables 6a-6b), the adaptive lasso based methods do very well, but so does the penalized cue in the first stage followed by cue in the second stage.

5 Conclusion

We have studied the relative performance of several moment selection techniques, both in selecting the correct moments and in estimating the structural parameter. Our simulations suggest that using the adaptive lasso in the first stage to obtain valid instruments, followed by gmm or moment averaging, delivers the most satisfactory rmse for the structural parameter in both the homoskedastic and heteroskedastic cases. This approach also has important computational benefits, since the estimation can be based on the lars algorithm, which makes it a good practical choice when the number of instruments grows large.

Table 5a: Monte Carlo results for θ̂. Setup 1 (part 1)
[table omitted: mean, standard deviation, bias, rmse and 95% coverage for the nine post-selection estimators, under weak and strong identification, σ²_zz = 0.5 I_q, n = 50, 100, 250]
Note: Setup 1 consists of a fixed number of moments and homoskedastic errors: there are 11 moments, 3 known to be valid and 8 unknown, among which 4 are valid and 4 are invalid. In the weak identification case the first three elements of π are 2 and the remaining eight are close to zero; in the strong identification case all elements of π equal 2. The estimators are as defined in the note to Table 2. 95%c is the coverage of the empirical 95% confidence intervals.

Table 5b: Monte Carlo results for θ̂. Setup 1 (part 2)
[table omitted: same layout as Table 5a, with σ²_zz = 1 I_q]
Note: see Table 5a; the estimators are as defined in the note to Table 2. 95%c is the coverage of the empirical 95% confidence intervals.

Table 6a: Monte Carlo results for θ̂. Setup 2 (part 1)
[table omitted: mean, standard deviation, bias, rmse and 95% coverage for the nine post-selection estimators, under weak and strong identification, σ²_zz = 0.5 I_q, n = 50, 100, 250]
Note: Setup 2 consists of an increasing number of moments and heteroskedastic errors: there are q = √n moments, s = √q known to be valid and q − s unknown; of these, (q − s)/2 are valid and (q − s)/2 are invalid. In the weak identification case the first s elements of π are 2 and the remaining q − s are close to zero; in the strong identification case all q elements of π equal 2. The estimators are as defined in the note to Table 2. 95%c is the coverage of the empirical 95% confidence intervals.

Table 6b: Monte Carlo results for θ̂. Setup 2 (part 2)
[table omitted: same layout as Table 6a, with σ²_zz = 1 I_q]
Note: see Table 6a; the estimators are as defined in the note to Table 2. 95%c is the coverage of the empirical 95% confidence intervals.


More information

Exogeneity tests and weak identification

Exogeneity tests and weak identification Cireq, Cirano, Départ. Sc. Economiques Université de Montréal Jean-Marie Dufour Cireq, Cirano, William Dow Professor of Economics Department of Economics Mcgill University June 20, 2008 Main Contributions

More information

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley Review of Classical Least Squares James L. Powell Department of Economics University of California, Berkeley The Classical Linear Model The object of least squares regression methods is to model and estimate

More information

A Robust Approach to Estimating Production Functions: Replication of the ACF procedure

A Robust Approach to Estimating Production Functions: Replication of the ACF procedure A Robust Approach to Estimating Production Functions: Replication of the ACF procedure Kyoo il Kim Michigan State University Yao Luo University of Toronto Yingjun Su IESR, Jinan University August 2018

More information

Spring 2017 Econ 574 Roger Koenker. Lecture 14 GEE-GMM

Spring 2017 Econ 574 Roger Koenker. Lecture 14 GEE-GMM University of Illinois Department of Economics Spring 2017 Econ 574 Roger Koenker Lecture 14 GEE-GMM Throughout the course we have emphasized methods of estimation and inference based on the principle

More information

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Econometrics Week 8 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 25 Recommended Reading For the today Instrumental Variables Estimation and Two Stage

More information

Data Mining Stat 588

Data Mining Stat 588 Data Mining Stat 588 Lecture 02: Linear Methods for Regression Department of Statistics & Biostatistics Rutgers University September 13 2011 Regression Problem Quantitative generic output variable Y. Generic

More information

Size Distortion and Modi cation of Classical Vuong Tests

Size Distortion and Modi cation of Classical Vuong Tests Size Distortion and Modi cation of Classical Vuong Tests Xiaoxia Shi University of Wisconsin at Madison March 2011 X. Shi (UW-Mdsn) H 0 : LR = 0 IUPUI 1 / 30 Vuong Test (Vuong, 1989) Data fx i g n i=1.

More information

Consistent high-dimensional Bayesian variable selection via penalized credible regions

Consistent high-dimensional Bayesian variable selection via penalized credible regions Consistent high-dimensional Bayesian variable selection via penalized credible regions Howard Bondell bondell@stat.ncsu.edu Joint work with Brian Reich Howard Bondell p. 1 Outline High-Dimensional Variable

More information

Single Equation Linear GMM

Single Equation Linear GMM Single Equation Linear GMM Eric Zivot Winter 2013 Single Equation Linear GMM Consider the linear regression model Engodeneity = z 0 δ 0 + =1 z = 1 vector of explanatory variables δ 0 = 1 vector of unknown

More information

1 Estimation of Persistent Dynamic Panel Data. Motivation

1 Estimation of Persistent Dynamic Panel Data. Motivation 1 Estimation of Persistent Dynamic Panel Data. Motivation Consider the following Dynamic Panel Data (DPD) model y it = y it 1 ρ + x it β + µ i + v it (1.1) with i = {1, 2,..., N} denoting the individual

More information

Testing for Regime Switching in Singaporean Business Cycles

Testing for Regime Switching in Singaporean Business Cycles Testing for Regime Switching in Singaporean Business Cycles Robert Breunig School of Economics Faculty of Economics and Commerce Australian National University and Alison Stegman Research School of Pacific

More information

Inference in Nonparametric Series Estimation with Data-Dependent Number of Series Terms

Inference in Nonparametric Series Estimation with Data-Dependent Number of Series Terms Inference in Nonparametric Series Estimation with Data-Dependent Number of Series Terms Byunghoon ang Department of Economics, University of Wisconsin-Madison First version December 9, 204; Revised November

More information

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data July 2012 Bangkok, Thailand Cosimo Beverelli (World Trade Organization) 1 Content a) Endogeneity b) Instrumental

More information

A Factor Analytical Method to Interactive Effects Dynamic Panel Models with or without Unit Root

A Factor Analytical Method to Interactive Effects Dynamic Panel Models with or without Unit Root A Factor Analytical Method to Interactive Effects Dynamic Panel Models with or without Unit Root Joakim Westerlund Deakin University Australia March 19, 2014 Westerlund (Deakin) Factor Analytical Method

More information

Linear Model Selection and Regularization

Linear Model Selection and Regularization Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In

More information

Sparse Linear Models (10/7/13)

Sparse Linear Models (10/7/13) STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine

More information

Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty

Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty Journal of Data Science 9(2011), 549-564 Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty Masaru Kanba and Kanta Naito Shimane University Abstract: This paper discusses the

More information

Cross-Validation with Confidence

Cross-Validation with Confidence Cross-Validation with Confidence Jing Lei Department of Statistics, Carnegie Mellon University UMN Statistics Seminar, Mar 30, 2017 Overview Parameter est. Model selection Point est. MLE, M-est.,... Cross-validation

More information

Generalized Method of Moment

Generalized Method of Moment Generalized Method of Moment CHUNG-MING KUAN Department of Finance & CRETA National Taiwan University June 16, 2010 C.-M. Kuan (Finance & CRETA, NTU Generalized Method of Moment June 16, 2010 1 / 32 Lecture

More information

ABSTRACT. POST, JUSTIN BLAISE. Methods to Improve Prediction Accuracy under Structural Constraints. (Under the direction of Howard Bondell.

ABSTRACT. POST, JUSTIN BLAISE. Methods to Improve Prediction Accuracy under Structural Constraints. (Under the direction of Howard Bondell. ABSTRACT POST, JUSTIN BLAISE. Methods to Improve Prediction Accuracy under Structural Constraints. (Under the direction of Howard Bondell.) Statisticians are often faced with the difficult task of model

More information

Least Squares Model Averaging. Bruce E. Hansen University of Wisconsin. January 2006 Revised: August 2006

Least Squares Model Averaging. Bruce E. Hansen University of Wisconsin. January 2006 Revised: August 2006 Least Squares Model Averaging Bruce E. Hansen University of Wisconsin January 2006 Revised: August 2006 Introduction This paper developes a model averaging estimator for linear regression. Model averaging

More information

1 Introduction. 2 AIC versus SBIC. Erik Swanson Cori Saviano Li Zha Final Project

1 Introduction. 2 AIC versus SBIC. Erik Swanson Cori Saviano Li Zha Final Project Erik Swanson Cori Saviano Li Zha Final Project 1 Introduction In analyzing time series data, we are posed with the question of how past events influences the current situation. In order to determine this,

More information

DSGE Methods. Estimation of DSGE models: GMM and Indirect Inference. Willi Mutschler, M.Sc.

DSGE Methods. Estimation of DSGE models: GMM and Indirect Inference. Willi Mutschler, M.Sc. DSGE Methods Estimation of DSGE models: GMM and Indirect Inference Willi Mutschler, M.Sc. Institute of Econometrics and Economic Statistics University of Münster willi.mutschler@wiwi.uni-muenster.de Summer

More information

GMM Estimation and Testing II

GMM Estimation and Testing II GMM Estimation and Testing II Whitney Newey October 2007 Hansen, Heaton, and Yaron (1996): In a Monte Carlo example of consumption CAPM, two-step optimal GMM with with many overidentifying restrictions

More information

Model comparison and selection

Model comparison and selection BS2 Statistical Inference, Lectures 9 and 10, Hilary Term 2008 March 2, 2008 Hypothesis testing Consider two alternative models M 1 = {f (x; θ), θ Θ 1 } and M 2 = {f (x; θ), θ Θ 2 } for a sample (X = x)

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Lecture 3. Hypothesis testing. Goodness of Fit. Model diagnostics GLM (Spring, 2018) Lecture 3 1 / 34 Models Let M(X r ) be a model with design matrix X r (with r columns) r n

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 6: Model complexity scores (v3) Ramesh Johari ramesh.johari@stanford.edu Fall 2015 1 / 34 Estimating prediction error 2 / 34 Estimating prediction error We saw how we can estimate

More information

GMM estimation of spatial panels

GMM estimation of spatial panels MRA Munich ersonal ReEc Archive GMM estimation of spatial panels Francesco Moscone and Elisa Tosetti Brunel University 7. April 009 Online at http://mpra.ub.uni-muenchen.de/637/ MRA aper No. 637, posted

More information

Panel Data Models. James L. Powell Department of Economics University of California, Berkeley

Panel Data Models. James L. Powell Department of Economics University of California, Berkeley Panel Data Models James L. Powell Department of Economics University of California, Berkeley Overview Like Zellner s seemingly unrelated regression models, the dependent and explanatory variables for panel

More information

Robust Two Step Confidence Sets, and the Trouble with the First Stage F Statistic

Robust Two Step Confidence Sets, and the Trouble with the First Stage F Statistic Robust Two Step Confidence Sets, and the Trouble with the First Stage F Statistic Isaiah Andrews Discussion by Bruce Hansen September 27, 2014 Discussion (Bruce Hansen) Robust Confidence Sets Sept 27,

More information

Cross-Validation with Confidence

Cross-Validation with Confidence Cross-Validation with Confidence Jing Lei Department of Statistics, Carnegie Mellon University WHOA-PSI Workshop, St Louis, 2017 Quotes from Day 1 and Day 2 Good model or pure model? Occam s razor We really

More information

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract Journal of Data Science,17(1). P. 145-160,2019 DOI:10.6339/JDS.201901_17(1).0007 WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION Wei Xiong *, Maozai Tian 2 1 School of Statistics, University of

More information

Single Equation Linear GMM with Serially Correlated Moment Conditions

Single Equation Linear GMM with Serially Correlated Moment Conditions Single Equation Linear GMM with Serially Correlated Moment Conditions Eric Zivot November 2, 2011 Univariate Time Series Let {y t } be an ergodic-stationary time series with E[y t ]=μ and var(y t )

More information

On GMM Estimation and Inference with Bootstrap Bias-Correction in Linear Panel Data Models

On GMM Estimation and Inference with Bootstrap Bias-Correction in Linear Panel Data Models On GMM Estimation and Inference with Bootstrap Bias-Correction in Linear Panel Data Models Takashi Yamagata y Department of Economics and Related Studies, University of York, Heslington, York, UK January

More information

Single Equation Linear GMM with Serially Correlated Moment Conditions

Single Equation Linear GMM with Serially Correlated Moment Conditions Single Equation Linear GMM with Serially Correlated Moment Conditions Eric Zivot October 28, 2009 Univariate Time Series Let {y t } be an ergodic-stationary time series with E[y t ]=μ and var(y t )

More information

The MNet Estimator. Patrick Breheny. Department of Biostatistics Department of Statistics University of Kentucky. August 2, 2010

The MNet Estimator. Patrick Breheny. Department of Biostatistics Department of Statistics University of Kentucky. August 2, 2010 Department of Biostatistics Department of Statistics University of Kentucky August 2, 2010 Joint work with Jian Huang, Shuangge Ma, and Cun-Hui Zhang Penalized regression methods Penalized methods have

More information

Applied Econometrics (MSc.) Lecture 3 Instrumental Variables

Applied Econometrics (MSc.) Lecture 3 Instrumental Variables Applied Econometrics (MSc.) Lecture 3 Instrumental Variables Estimation - Theory Department of Economics University of Gothenburg December 4, 2014 1/28 Why IV estimation? So far, in OLS, we assumed independence.

More information

GMM, HAC estimators, & Standard Errors for Business Cycle Statistics

GMM, HAC estimators, & Standard Errors for Business Cycle Statistics GMM, HAC estimators, & Standard Errors for Business Cycle Statistics Wouter J. Den Haan London School of Economics c Wouter J. Den Haan Overview Generic GMM problem Estimation Heteroskedastic and Autocorrelation

More information

arxiv: v3 [math.st] 23 May 2016

arxiv: v3 [math.st] 23 May 2016 Inference in partially identified models with many moment arxiv:1604.02309v3 [math.st] 23 May 2016 inequalities using Lasso Federico A. Bugni Mehmet Caner Department of Economics Department of Economics

More information

A Robust Test for Weak Instruments in Stata

A Robust Test for Weak Instruments in Stata A Robust Test for Weak Instruments in Stata José Luis Montiel Olea, Carolin Pflueger, and Su Wang 1 First draft: July 2013 This draft: November 2013 Abstract We introduce and describe a Stata routine ivrobust

More information

Topic 4 Unit Roots. Gerald P. Dwyer. February Clemson University

Topic 4 Unit Roots. Gerald P. Dwyer. February Clemson University Topic 4 Unit Roots Gerald P. Dwyer Clemson University February 2016 Outline 1 Unit Roots Introduction Trend and Difference Stationary Autocorrelations of Series That Have Deterministic or Stochastic Trends

More information

Dynamic panel data methods

Dynamic panel data methods Dynamic panel data methods for cross-section panels Franz Eigner University Vienna Prepared for UK Econometric Methods of Panel Data with Prof. Robert Kunst 27th May 2009 Structure 1 Preliminary considerations

More information

Chapter 11 GMM: General Formulas and Application

Chapter 11 GMM: General Formulas and Application Chapter 11 GMM: General Formulas and Application Main Content General GMM Formulas esting Moments Standard Errors of Anything by Delta Method Using GMM for Regressions Prespecified weighting Matrices and

More information

Or How to select variables Using Bayesian LASSO

Or How to select variables Using Bayesian LASSO Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO On Bayesian Variable Selection

More information

Gravity Models, PPML Estimation and the Bias of the Robust Standard Errors

Gravity Models, PPML Estimation and the Bias of the Robust Standard Errors Gravity Models, PPML Estimation and the Bias of the Robust Standard Errors Michael Pfaffermayr August 23, 2018 Abstract In gravity models with exporter and importer dummies the robust standard errors of

More information

Generalized Elastic Net Regression

Generalized Elastic Net Regression Abstract Generalized Elastic Net Regression Geoffroy MOURET Jean-Jules BRAULT Vahid PARTOVINIA This work presents a variation of the elastic net penalization method. We propose applying a combined l 1

More information

IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors

IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors Laura Mayoral, IAE, Barcelona GSE and University of Gothenburg U. of Gothenburg, May 2015 Roadmap Testing for deviations

More information

Iterative Selection Using Orthogonal Regression Techniques

Iterative Selection Using Orthogonal Regression Techniques Iterative Selection Using Orthogonal Regression Techniques Bradley Turnbull 1, Subhashis Ghosal 1 and Hao Helen Zhang 2 1 Department of Statistics, North Carolina State University, Raleigh, NC, USA 2 Department

More information

Missing dependent variables in panel data models

Missing dependent variables in panel data models Missing dependent variables in panel data models Jason Abrevaya Abstract This paper considers estimation of a fixed-effects model in which the dependent variable may be missing. For cross-sectional units

More information

Introduction to Statistical modeling: handout for Math 489/583

Introduction to Statistical modeling: handout for Math 489/583 Introduction to Statistical modeling: handout for Math 489/583 Statistical modeling occurs when we are trying to model some data using statistical tools. From the start, we recognize that no model is perfect

More information

Oracle Estimation of a Change Point in High Dimensional Quantile Regression

Oracle Estimation of a Change Point in High Dimensional Quantile Regression Oracle Estimation of a Change Point in High Dimensional Quantile Regression Sokbae Lee, Yuan Liao, Myung Hwan Seo, and Youngki Shin arxiv:1603.00235v2 [stat.me] 16 Dec 2016 15 November 2016 Abstract In

More information