Nuclear Norm Regularized Estimation of Panel Regression Models

Nuclear Norm Regularized Estimation of Panel Regression Models

Hyungsik Roger Moon    Martin Weidner

February 2018

Abstract

In this paper we investigate panel regression models with interactive fixed effects. We propose two new estimation methods that are based on minimizing convex objective functions. The first estimation method minimizes the sum of squared residuals with a nuclear (trace) norm regularization. The second estimation method minimizes the nuclear norm of the residuals. We first establish the consistency of the two estimators, and then show how to use them as preliminary estimators to construct an estimator that is asymptotically equivalent to the QMLE in Bai (2009) and Moon and Weidner (2017). For this, we propose an iteration procedure and derive its asymptotic properties.

Keywords: Interactive Fixed Effects, Factors, Nuclear Norm Regularization, Nuclear Norm Minimization, Convex Optimization, Iterative Estimation

1 Introduction

Interactive fixed effects panel regression models have been widely studied in the recent panel literature. The interactive fixed effects model parsimoniously represents heterogeneity in both dimensions of the panel and includes the conventional additive error component model as a special case.

[Author affiliations: Moon — Department of Economics, University of Southern California, Los Angeles, CA (moonr@usc.edu), and School of Economics, Yonsei University, Seoul, Korea. Weidner — Department of Economics, University College London, Gower Street, London WC1E 6BT, U.K., and CeMMAP (m.weidner@ucl.ac.uk).]

One of the widely used estimation methods for the interactive fixed effects panel regression is the fixed effect approach, or principal component approach in a linear model, which treats the interactive fixed effects in factor form as parameters to estimate. For example, Bai (2009), Moon and Weidner (2015), and Moon and Weidner (2017) investigated asymptotic properties of the fixed effect least squares estimator for linear panel regression models with interactive fixed effects, and Chen, Fernandez-Val, and Weidner (2014) studied the fixed effect maximum likelihood estimator for nonlinear panel regressions with interactive fixed effects. The main advantage of the fixed effect or least squares (LS) approach is that it does not restrict the relationship between the unobserved heterogeneity and the observed explanatory variables. On the other hand, it also has drawbacks. First, the computation of the fixed effects estimator requires solving a non-convex optimization problem with respect to high dimensional fixed effects parameters. Even in a linear regression, where a closed form profile objective function is available, the non-convexity can become a serious practical problem if the dimension of the regression coefficients is large. The second problem is a theoretical issue. When the dimension of the factors, or equivalently the rank of the interactive fixed effects matrix, is unknown, the fixed effect estimator of the coefficients of low-rank regressors can be inconsistent.

As the main contribution of the paper, we propose two new estimation methods that overcome these two problems of the existing fixed effect estimation of the interactive fixed effects panel model. The first estimation method minimizes the sum of squared residuals with a nuclear (trace) norm regularization. The second estimation method minimizes the nuclear norm of the residuals. These two estimators are based on convex objective functions and overcome the computational problem of the fixed effect estimation. Moreover, the nuclear norm regularized estimation overcomes the potential inconsistency of the low-rank regressor coefficients through regularization. Here the role of the regularization is similar to that in the penalized sieve method (see, e.g., Chen's survey of sieve estimation). Nuclear norm penalized estimation has been widely studied in the machine learning and statistical learning literature, and very recently in the econometrics literature (Bai and Ng 2017; Athey, Bayati, Doudchenko, Imbens, and Khosravi 2017).

[Footnote 1: Other estimation methods for panel regressions with interactive fixed effects include the quasi-difference approach (e.g., Holtz-Eakin, Newey, and Rosen 1988), the common correlated random effects method (e.g., Pesaran 2006), the decision theoretic approach (e.g., Chamberlain and Moreira 2009), and Lasso-type shrinkage methods on fixed effects (e.g., Cheng, Liao, and Schorfheide 2016; Lu and Su 2016; Su, Shi, and Phillips 2016). Earlier papers on the fixed effect approach include Kiefer (1980) and Ahn, Lee, and Schmidt (2001).]

The main difference is that in our setup the parameters of interest are the regression coefficients, while the high dimensional

low-rank matrix is incidental. In most machine learning and statistical learning papers, the parameter of interest is the high dimensional low-rank matrix (the interactive fixed effects), and the nuclear norm regularization is used to complete the matrix (e.g., Recht, Fazel, and Parrilo 2010, and Hastie, Tibshirani, and Wainwright 2015 for recent surveys) or to deal with missing observations in panels (Bai and Ng 2017; Athey, Bayati, Doudchenko, Imbens, and Khosravi 2017). The nuclear norm minimization estimation, as far as we know, is new to the literature and has not been studied before.

In this paper we establish the asymptotics of the two estimators. As for the nuclear norm regularized estimation, we show that with a proper choice of the regularization parameter the nuclear norm regularized estimator of the interactive fixed effects matrix is consistent in the Frobenius norm, and the nuclear norm regularized estimator of the regression coefficients is close to $\sqrt{\min(N,T)}$-consistent. As for the nuclear norm minimization estimation, we show that under certain regularity conditions the estimator minimizing the nuclear norm of the residuals is $\sqrt{\min(N,T)}$-consistent. We also show how to use these two estimators as preliminary estimators to construct an estimator that is asymptotically equivalent to the QMLE in Bai (2009) and Moon and Weidner (2017). For this, we propose an iteration procedure that achieves the asymptotic equivalence.

The paper is organized as follows. Section 2 introduces the model. In Section 3 we discuss the two aforementioned problems of the fixed effect estimation method as the main motivation of the paper. In Section 4 we define the nuclear norm regularized estimator, introduce regularity conditions, and derive the consistency of the estimator. In Section 5 we define the nuclear norm minimization estimation method and discuss how it is related to the regularization method in Section 4. We also derive the consistency of that estimator. In Section 6 we show how to use these two estimators as preliminary estimators and construct an estimator through iterations that achieves asymptotic equivalence to the fixed effect estimator. In Section 7 we show how to extend the analysis to nonlinear panel regressions with interactive fixed effects. Section 8 investigates finite sample properties of the estimators of Sections 4, 5, and 6 through small scale Monte Carlo simulations. Section 9 concludes the paper. All technical derivations and proofs are presented in the appendix.

Notation: For a matrix $A$, we denote $P_A = A(A'A)^{-1}A'$ and $M_A = I - P_A$. For a matrix $A$, $\mathrm{vec}(A)$ is the vector that stacks the columns of $A$. Denote $\mathrm{mat}(\cdot)$ as the inverse operator of $\mathrm{vec}(\cdot)$, so that for $a = \mathrm{vec}(A)$ we have $\mathrm{mat}(a) = A$. For matrices $A_1,\ldots,A_K$ and a $K$-vector $\alpha = (\alpha_1,\ldots,\alpha_K)'$, we denote $\alpha\cdot A := \sum_{k=1}^{K}\alpha_k A_k$. For an $N\times T$ matrix $A$, we denote by $s_r(A)$ the $r$-th largest singular value of $A$, for $r = 1,\ldots,\min(N,T)$.

2 Model

Consider a linear panel regression with interactive fixed effects,
$$y_{it} = \beta_0'x_{it} + \gamma_{0,it} + e_{it}, \qquad \text{where } \gamma_{0,it} = \lambda_{0,i}'f_{0,t},$$
or in matrix and vector notation
$$Y = \sum_{k=1}^{K} X_k\,\beta_{0,k} + \Gamma_0 + E =: \beta_0\cdot X + \Gamma_0 + E, \qquad (2.1)$$
where $Y, X_1,\ldots,X_K \in \mathbb{R}^{N\times T}$ are observed $N\times T$ matrices of panel data, $E \in \mathbb{R}^{N\times T}$ represents the panel of unobserved errors, and $\beta_0 = (\beta_{0,1},\ldots,\beta_{0,K})'$ is the main parameter of interest. The parameter $\Gamma_0 = \lambda_0 f_0'$ represents the interactive fixed effects, where $\lambda_0 \in \mathbb{R}^{N\times R_0}$ and $f_0 \in \mathbb{R}^{T\times R_0}$ with $R_0 \le \min(N,T)$. We use small letters to denote vectorized variables and parameters. Let $y = \mathrm{vec}(Y)$, $x_k = \mathrm{vec}(X_k)$, $\gamma_0 = \mathrm{vec}(\Gamma_0)$, and $e = \mathrm{vec}(E)$. Define $x = (x_1,\ldots,x_K)$. We can express the model (2.1) as
$$y = x\beta_0 + \gamma_0 + e = x\beta_0 + (f_0\otimes\lambda_0)\,\mathrm{vec}(I_{R_0}) + e. \qquad (2.2)$$
The main goal of the paper is to estimate $\beta_0$ in the presence of the interactive fixed effects in the panel.

3 Motivations

3.1 Computation

One of the popular estimation methods is the least squares (LS), or quasi-maximum likelihood, method. Let $L(\beta,\Gamma)$ be the quasi-Gaussian likelihood objective function of the parameters $(\beta,\Gamma)$,
$$L(\beta,\Gamma) = \tfrac{1}{2}\,\big\|Y - \beta\cdot X - \Gamma\big\|_F^2.$$

The LS estimator of $\beta_0$ minimizes the objective function imposing the interactive fixed effects restriction $\Gamma = \lambda f'$ on $\Gamma$:
$$\hat\beta_{LS} := \mathrm{argmin}_\beta\;\min_{\lambda\in\mathbb{R}^{N\times R_0},\,f\in\mathbb{R}^{T\times R_0}} L(\beta,\lambda f').$$
It is known that under regularity conditions, as $N,T\to\infty$ with $N/T\to\kappa^2$, the LS estimator $\hat\beta_{LS}$ is $\sqrt{NT}$-consistent and asymptotically normal with a bias in the limit (e.g., Bai 2009, Moon and Weidner 2015, and Moon and Weidner 2017). The bias in the limiting distribution is caused by heteroskedasticity of the error term $e_{it}$ across $i$ or over $t$ (e.g., Bai 2009) and by sequential exogeneity of the regressor $x_{it}$ (e.g., Moon and Weidner 2017).

Even with all these nice theoretical properties of the LS estimation, its computation can be a practical issue when estimating the interactive fixed effects panel regression model by the least squares approach. The reason is that it requires optimizing a non-convex objective function with respect to $(\beta,\lambda,f)$. To see this, notice that the interactive fixed effects least squares objective function is
$$L(\beta,\lambda f') = L(\beta,\Gamma) \qquad \text{subject to } \mathrm{rank}(\Gamma) = R_0.$$
The non-convexity problem arises because, even though the unconstrained objective function $L(\beta,\Gamma)$ is convex in $(\beta,\Gamma)$, imposing the rank restriction $\mathrm{rank}(\Gamma) = R_0$ makes the optimization problem non-convex. In the case of a linear regression like (2.1), we have a closed form profile objective function as a function of $\beta$. The LS estimator $\hat\beta_{LS}$ minimizes $L(\beta,\Gamma)$ imposing the rank restriction $\Gamma = \lambda f'$:
$$\hat\beta_{LS} = \mathrm{argmin}_\beta\;\min_{\lambda\in\mathbb{R}^{N\times R_0},\,f\in\mathbb{R}^{T\times R_0}} \tfrac{1}{2}\big\|Y - \beta\cdot X - \lambda f'\big\|_F^2 = \mathrm{argmin}_\beta\;\tfrac{1}{2}\sum_{r=R_0+1}^{\min\{N,T\}} \big[s_r(Y - \beta\cdot X)\big]^2. \qquad (3.1)$$
In this case the profile objective function has a closed form, and the non-convexity matters only for optimizing the profile objective function with respect to $\beta$. However, even in this case, if the dimension of $\beta$ is high, then it is not easy to find the global minimum quickly, and it may require a long and careful numerical search. In the appendix we provide an example of a non-convex profile LS objective function.
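To make the computational point concrete, the following minimal numpy sketch (not taken from the paper; the simulated design is an illustrative assumption) evaluates the closed-form profile objective in (3.1) on a grid of β values. Plotting the resulting curve for designs in which the regressor is correlated with the factor structure illustrates the possible non-convexity in β.

```python
import numpy as np

def profile_ls_objective(beta, Y, X_list, R0):
    """Profile LS objective of (3.1): (1/2) * sum of squared singular values
    of Y - beta.X beyond the first R0 (Eckart-Young)."""
    resid = Y - sum(b * Xk for b, Xk in zip(np.atleast_1d(beta), X_list))
    s = np.linalg.svd(resid, compute_uv=False)
    return 0.5 * np.sum(s[R0:] ** 2)

# Illustrative simulated panel with one regressor correlated with the factors.
rng = np.random.default_rng(0)
N, T, R0, beta0 = 100, 100, 2, 1.0
lam, f = rng.standard_normal((N, R0)), rng.standard_normal((T, R0))
X = rng.standard_normal((N, T)) + lam @ f.T
Y = beta0 * X + lam @ f.T + rng.standard_normal((N, T))

grid = np.linspace(-1.0, 3.0, 81)
values = [profile_ls_objective(b, Y, [X], R0) for b in grid]
print("argmin on grid:", grid[int(np.argmin(values))])
```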

When the panel regression model is nonlinear, such as a panel binary choice regression (e.g., Chen, Fernandez-Val, and Weidner 2014), optimizing a non-convex objective function with respect to the high-dimensional parameters $\lambda$ and $f$ can be challenging.

3.2 Low-rank Regressors and Unknown Number of Factors

When estimating a panel regression with interactive fixed effects, both the regression coefficients $\beta_0$ and the number of factors $R_0$ are usually unknown. If both $N$ and $T$ are sufficiently large and all factors are sufficiently strong, then it is known how to estimate $\beta_0$ and $R_0$, as long as all regressors are high-rank regressors, by which we mean that the rank of the $N\times T$ matrix $X_k$ is large for all $k$. Namely, in that case the consistency proofs of Bai (2009) and Moon and Weidner (2015) are applicable to the LS estimator of $\beta$ even if the number of factors used in the estimation is too large, that is, if only an upper bound on $R_0$ is known. The preliminary estimator $\hat\beta_{pre}$ obtained in that way can be shown to be $\sqrt{\min(N,T)}$-consistent, and one can therefore consistently estimate the number of factors from $Y - \hat\beta_{pre}\cdot X$ using methods for pure factor models without regressors, e.g. the procedure in Bai and Ng (2002).

However, if there are also low-rank regressors, that is, if $\mathrm{rank}(X_k)$ is very small for some $k$, then the LS estimator with interactive fixed effects cannot be used as a preliminary estimator if the number of factors is unknown. In Moon and Weidner (2017) we give a consistency proof that allows for low-rank regressors, but only if the number of factors is known, and this LS estimator is indeed not robust to using the wrong number of factors.

To illustrate this, consider the following DGP for the regression model $y_{it} = \beta_0\,x_{it} + \lambda_{0,i}f_{0,t} + e_{it}$ with $K = 1$ regressor and $R_0 = 1$ factor,
$$x_{it} = w_i v_t, \qquad \lambda_{0,i} = w_i + \tilde\lambda_i, \qquad f_{0,t} = v_t + \tilde f_t, \qquad e_{it} = \sum_{j=1}^{N}\sum_{s=1}^{T}\frac{1 + w_i w_j}{\sqrt{N}}\,u_{js}\,\frac{1 + v_s v_t}{\sqrt{T}},$$
where $\tilde\lambda_i$, $\tilde f_t$ and $u_{it}$ are all mutually independent and standard normally distributed, and $w = (w_1,\ldots,w_N)'$ and $v = (v_1,\ldots,v_T)'$ are non-random with $\|w\|^2 = N$ and $\|v\|^2 = T$. One can just set $w$ and $v$ equal to the vector of ones, in which case $x_{it} = 1$ becomes a constant regressor. It is easy to verify that this DGP satisfies all regularity conditions for the consistency proof in Moon and Weidner (2017); that is, we know from that paper that the LS estimator with the correct number of factors $R = R_0 = 1$ is consistent. However, it can be shown that the LS estimator with interactive fixed effects is inconsistent in this example whenever any other number of factors is chosen, both $R = 0$ and $R > 1$. Specifically, if $R = 0$ is chosen, then it is easy to see that the resulting OLS estimator converges to $\beta_0 + 1$, because $x_{it}$ and $\lambda_{0,i}f_{0,t}$ have a sample correlation that converges to one. For $R > 1$ the large $N,T$ limit of the estimator depends on the ratio of $N$ and $T$, but in any case the limiting profile LS objective

function turns out to have a local maximum at $\beta_0$, instead of a local minimum, and the LS estimator is never consistent for $R > 1$ (see the appendix).

This example illustrates a more general problem: if low-rank regressors are present, the LS estimator for $\beta$ is in general only consistent if we use the correct number of factors $R_0$ in the estimation, but in order to estimate $R_0$ consistently, we first need a consistent estimator for $\beta$. This argument is obviously circular. To overcome this problem we require an alternative initial estimator for $\beta$ which is not the LS estimator with interactive fixed effects. One candidate is the common correlated effects estimator of Pesaran (2006), but the requirements for this estimator are quite different from those of the LS estimator, and we therefore do not explore this possibility further here. Preliminary consistent estimators for $\beta_0$ that are much more closely related to the LS estimator with interactive fixed effects are the nuclear norm regularized and the nuclear norm minimization estimators discussed in this paper. Those estimators allow estimation of $\beta$ without having to specify a number of factors first.

4 Nuclear Norm Regularized Estimation

Before we introduce the first of our estimators, we introduce a matrix norm called the nuclear norm. For an $N\times T$ matrix $\Gamma$, the nuclear norm of $\Gamma$, denoted $\|\Gamma\|_*$, is defined by
$$\|\Gamma\|_* := \sum_{r=1}^{\min(N,T)} s_r(\Gamma) = \mathrm{Tr}\big[(\Gamma\Gamma')^{1/2}\big] = \sup_{\{B:\,\|B\|\le 1\}} \mathrm{Tr}(B'\Gamma),$$
where $\|B\|$ is the spectral norm of the matrix $B$. In the literature, the nuclear norm is also called the trace norm, the Schatten 1-norm, or the Ky Fan $n$-norm. An important property of the nuclear norm is that $\|\Gamma\|_*$ is convex in $\Gamma$. Also, denoting by $\|A\|_F$ the Frobenius norm of a matrix $A$, we have $\|A\| \le \|A\|_F \le \|A\|_*$ and $|\mathrm{Tr}(A'B)| \le \|A\|_*\,\|B\|$.

Our first estimator minimizes the least squares objective function $L(\beta,\Gamma)$ with a regularization of the nuclear norm of the nuisance parameter $\Gamma$. Define
$$Q_\psi(\beta,\Gamma) = L(\beta,\Gamma) + \psi\,\|\Gamma\|_*,$$
where $\psi = \psi_{NT}$ is a regularization parameter such that $\mathrm{plim}_{N,T\to\infty}\,\psi_{NT}/\sqrt{NT} = 0$. The nuclear norm regularized estimators of $(\beta_0,\Gamma_0)$ are
$$\big(\hat\beta_\psi,\,\hat\Gamma_\psi\big) = \mathrm{argmin}_{\beta,\Gamma}\; Q_\psi(\beta,\Gamma). \qquad (4.1)$$
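As a quick numerical illustration of these norm facts (a reader's sketch, not part of the paper's formal development), one can verify the chain ‖A‖ ≤ ‖A‖_F ≤ ‖A‖_* and the trace inequality |Tr(A'B)| ≤ ‖A‖_* ‖B‖ on randomly generated matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 20))
B = rng.standard_normal((30, 20))

spec = np.linalg.norm(A, 2)                        # spectral norm: largest singular value
frob = np.linalg.norm(A, 'fro')                    # Frobenius norm
nuc = np.linalg.svd(A, compute_uv=False).sum()     # nuclear norm: sum of singular values

assert spec <= frob + 1e-10 and frob <= nuc + 1e-10
# Hoelder-type inequality |Tr(A'B)| <= ||A||_* * ||B|| (spectral norm of B)
assert abs(np.trace(A.T @ B)) <= nuc * np.linalg.norm(B, 2) + 1e-10
print(spec, frob, nuc)
```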

An important feature of the objective function $Q_\psi(\beta,\Gamma)$ is its convexity in $(\beta,\Gamma)$. This is true because the least squares objective function $L(\beta,\Gamma)$ is convex in $(\beta,\Gamma)$ and the penalty term $\|\Gamma\|_*$ is also convex in $\Gamma$. Therefore, the profile penalized objective function $\min_{\Gamma\in\mathbb{R}^{N\times T}} Q_\psi(\beta,\Gamma)$ is also convex in $\beta$. However, since the nuclear norm $\|\Gamma\|_*$ is not globally differentiable, the penalized objective function $Q_\psi(\beta,\Gamma)$ and its profile function $\min_{\Gamma\in\mathbb{R}^{N\times T}} Q_\psi(\beta,\Gamma)$ are not globally differentiable.

When the outcome equation is linear in $\beta$, as in (2.1), we can compute $(\hat\beta_\psi,\hat\Gamma_\psi)$ in (4.1) easily. First, we show that in the linear regression (2.1) it is possible to derive the profiled penalized objective function and the penalized estimator $\hat\Gamma_\psi$ of $\Gamma_0$ in closed form.

Lemma 1. (a) The profile penalized objective function $\min_{\Gamma\in\mathbb{R}^{N\times T}} Q_\psi(\beta,\Gamma)$ has the following closed form:
$$\min_{\Gamma\in\mathbb{R}^{N\times T}} Q_\psi(\beta,\Gamma) = \sum_{r=1}^{\min(N,T)}\Big[\tfrac{1}{2}\big(\min\big\{\psi,\,s_r(Y-\beta\cdot X)\big\}\big)^2 + \psi\,\max\big\{0,\;s_r(Y-\beta\cdot X)-\psi\big\}\Big].$$
(b) For given $\beta$, let $U_{Y-\beta\cdot X}$ and $V_{Y-\beta\cdot X}$ be the left and right singular vector matrices of $Y-\beta\cdot X$, respectively. Then
$$\hat\Gamma_\psi = U_{Y-\hat\beta_\psi\cdot X}\;\mathrm{diag}(\hat\gamma)\;V_{Y-\hat\beta_\psi\cdot X}'\,, \qquad \text{where } \hat\gamma_r = \max\big[0,\;s_r\big(Y-\hat\beta_\psi\cdot X\big)-\psi\big].$$

Having $\min_{\Gamma\in\mathbb{R}^{N\times T}} Q_\psi(\beta,\Gamma)$ in closed form, and noticing that it is convex in $\beta$, one can compute $\hat\beta_\psi$ using various optimization algorithms for convex functions (e.g., see chapter 5 of Hastie, Tibshirani, and Wainwright 2015). When the dimension of $\beta$ is small, one may even use a grid search.
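A minimal sketch of Lemma 1 in code (assuming the objective as reconstructed above, with least squares term ½‖Y − β·X − Γ‖²_F): for a given β, the penalized estimator of Γ soft-thresholds the singular values of Y − β·X at the level ψ, and the profile objective is the sum in Lemma 1(a).

```python
import numpy as np

def gamma_hat(beta, Y, X_list, psi):
    """Closed-form penalized estimator of Gamma for given beta (Lemma 1(b)):
    soft-threshold the singular values of Y - beta.X at the level psi."""
    resid = Y - sum(b * Xk for b, Xk in zip(np.atleast_1d(beta), X_list))
    U, s, Vt = np.linalg.svd(resid, full_matrices=False)
    return U @ np.diag(np.maximum(s - psi, 0.0)) @ Vt

def profile_objective(beta, Y, X_list, psi):
    """Profile penalized objective min_Gamma Q_psi(beta, Gamma), as in Lemma 1(a)."""
    resid = Y - sum(b * Xk for b, Xk in zip(np.atleast_1d(beta), X_list))
    s = np.linalg.svd(resid, compute_uv=False)
    return np.sum(0.5 * np.minimum(psi, s) ** 2 + psi * np.maximum(s - psi, 0.0))
```

Because the profile objective is convex in β, it can be minimized by a generic convex optimizer or, for a small-dimensional β, by a grid search.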

4.1 Consistency of $\hat\Gamma_\psi$ and $\hat\beta_\psi$

In this subsection we establish the consistency of $(\hat\beta_\psi,\hat\Gamma_\psi)$. Suppose that $\Gamma_0 \ne 0$; we discuss the case $\Gamma_0 = 0$ in the remark that follows Theorem 1. A critical condition required for the consistency of $\hat\Gamma_\psi$, and consequently of $\hat\beta_\psi$, is the following condition, called restricted strong convexity.

Assumption 1 (Restricted Strong Convexity). Let
$$\mathcal{C} = \Big\{\Theta\in\mathbb{R}^{N\times T}\;:\;\big\|M_{\lambda_0}\Theta M_{f_0}\big\|_* \le 3\,\big\|\Theta - M_{\lambda_0}\Theta M_{f_0}\big\|_*\Big\}.$$
Let there exist $\mu > 0$, independent of $N$ and $T$, such that for any $\theta\in\mathbb{R}^{NT}$ with $\mathrm{mat}(\theta)\in\mathcal{C}$ we have
$$\theta'M_x\,\theta \ge \mu\,\theta'\theta, \qquad \text{for all } N,T.$$

This condition assumes that the quadratic term $\frac{1}{2}(\gamma-\gamma_0)'M_x(\gamma-\gamma_0)$ of the profile objective function $\min_\beta L(\beta,\Gamma)$ is bounded below by the strictly convex function $\frac{\mu}{2}(\gamma-\gamma_0)'(\gamma-\gamma_0)$ whenever $\Gamma-\Gamma_0$ belongs to the cone $\mathcal{C}$. Notice that without any restriction on the parameter we cannot find a strictly positive constant $\mu > 0$ such that $(\gamma-\gamma_0)'M_x(\gamma-\gamma_0) \ge \mu\,(\gamma-\gamma_0)'(\gamma-\gamma_0)$ holds for all $\Gamma$. Assumption 1 states that if we restrict the parameter set to $\mathcal{C}$, then we can find a strictly convex lower bound for the quadratic term of the profile objective. The restricted strong convexity condition in Assumption 1 corresponds to the restricted strong convexity condition in Negahban, Ravikumar, Wainwright, and Yu (2012), and it plays the same role as the restricted eigenvalue condition in the recent LASSO literature (e.g., see Candès and Tao 2007 and Bickel, Ritov, and Tsybakov 2009). In the following subsection we discuss Assumption 1 in more detail and provide further sufficient conditions under which the assumption is satisfied.

Lemma 2 (Convergence Rate of $\hat\Gamma_\psi$). Let Assumption 1 hold and assume that
$$\psi \ge 2\,\big\|\mathrm{mat}(M_x\,e)\big\|. \qquad (4.2)$$
Then we have
$$\big\|\hat\Gamma_\psi - \Gamma_0\big\|_F \le \frac{3\sqrt{2R_0}\,\psi}{\mu}.$$

Next we introduce further regularity conditions for the consistency of $\hat\Gamma_\psi$ and $\hat\beta_\psi$.

Assumption 2 (Regularity Conditions). (i) $\|E\| = O_p\big(\sqrt{\max(N,T)}\big)$; (ii) $\frac{1}{\sqrt{NT}}\,x'e = O_p(1)$; (iii) $\frac{1}{NT}\,x'x \to_p \Sigma_x > 0$; (iv) $\psi = \psi_{NT} = c\,\sqrt{\max(N,T)}$, where $c > 0$ and $\psi_{NT}/\sqrt{NT}\to 0$.

Remarks on Assumption 2. The conditions in Assumption 2 are weak and quite general.
(a) Various examples of $E$ that satisfy Assumption 2(i) can be found in Moon and Weidner (2017). These include weakly dependent errors and non-identical but independent sequences of errors.

(b) Assumption 2(ii) is satisfied if the regressors are exogenous with respect to the error, $E[x_{it}e_{it}] = 0$, and $x_{it}e_{it}$ are weakly correlated over $t$ and across $i$, so that $\frac{1}{NT}\sum_{i,j=1}^{N}\sum_{t,s=1}^{T} E\big[x_{k,it}\,x_{l,js}\,e_{it}\,e_{js}\big]$ is bounded asymptotically.

(c) Assumption 2(iii) is the standard full rank condition on the regressors.

(d) Assumption 2(iv) restricts the choice of the regularization parameter $\psi$. The requirement on $c$ in Assumption 2(iv) is needed to satisfy the regularity condition (4.2) of Lemma 2. To see this in more detail, since
$$\mathrm{mat}(M_x\,e) = E - \sum_{k=1}^{K}\hat E_k, \qquad \text{where } \hat E_k = X_k\,(x_k'x_k)^{-1}x_k'e,$$
we have
$$\big\|\mathrm{mat}(M_x\,e)\big\| \le \|E\| + \sum_{k=1}^{K}\big\|\hat E_k\big\| \le \|E\| + \sum_{k=1}^{K}\|X_k\|\,(x_k'x_k)^{-1}\big|x_k'e\big| = \|E\|\,\Big(1 + O_p\big(\|E\|^{-1}\big)\Big).$$
Then choosing $c \ge 2\big(1 + O_p(\|E\|^{-1})\big)\,\|E\|\big/\sqrt{\max(N,T)}$ makes $\psi$ satisfy (4.2) with probability approaching one.

The following theorem shows the consistency of $\hat\Gamma_\psi$ and $\hat\beta_\psi$.

Theorem 1. Let Assumption 1 hold with $\mu$ independent of the sample size, and let Assumption 2 hold. Then we have
(a) $\dfrac{1}{\sqrt{NT}}\,\big\|\hat\Gamma_\psi - \Gamma_0\big\|_F \le O_p\Big(\dfrac{\sqrt{R_0}\,\psi}{\sqrt{NT}}\Big) = O_p\Big(\dfrac{\sqrt{R_0}\,c}{\sqrt{\min(N,T)}}\Big)$;
(b) $\big\|\hat\beta_\psi - \beta_0\big\| \le O_p\Big(\dfrac{\sqrt{R_0}\,\psi}{\sqrt{NT}}\Big) = O_p\Big(\dfrac{\sqrt{R_0}\,c}{\sqrt{\min(N,T)}}\Big)$.

Remark. A special case is $\Gamma_0 = 0$. In this case, with the regularization parameter $\psi$ satisfying (4.2), it follows that
$$\hat\Gamma_\psi = \Gamma_0 = 0. \qquad (4.3)$$
A proof of this is available in the appendix. In this case, the regularized estimator of $\beta$ becomes the pooled OLS estimator, $\hat\beta_\psi = (x'x)^{-1}x'y$.
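As an illustration of the computation discussed after Lemma 1, the following hedged sketch (hypothetical helper names; K = 1 for simplicity, and the same reconstructed objective as above) minimizes the convex profile objective over a scalar β, and also shows the pooled OLS estimator to which β̂_ψ reduces in the Γ_0 = 0 case of the remark:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def beta_psi_hat(Y, X, psi, bounds=(-10.0, 10.0)):
    """Nuclear norm regularized estimator of beta for K = 1, obtained by
    minimizing the convex profile objective of Lemma 1(a) over beta."""
    def profile(beta):
        s = np.linalg.svd(Y - beta * X, compute_uv=False)
        return np.sum(0.5 * np.minimum(psi, s) ** 2 + psi * np.maximum(s - psi, 0.0))
    return minimize_scalar(profile, bounds=bounds, method='bounded').x

def pooled_ols(Y, X):
    """Pooled OLS, the limiting case of the regularized estimator when Gamma_0 = 0
    and the threshold psi exceeds all singular values of the residual matrix."""
    return np.sum(X * Y) / np.sum(X * X)
```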

4.2 Sufficient Conditions for Restricted Strong Convexity

In this section we discuss Assumption 1 in more detail. The requirement in Assumption 1 is to bound $\theta'M_x\theta$ from below by a strictly convex function. To gain some intuition, suppose that the regressor is scalar and assume that $\|X\|_F = (x'x)^{1/2} = 1$, without loss of generality, because the projection operator $M_x$ is invariant to a change of scale. Also assume that $\theta \ne 0$. Then
$$\theta'M_x\theta = \theta'\theta - (\theta'x)^2 = \theta'\theta\left(1 - \frac{(\theta'x)^2}{\theta'\theta}\right) = \theta'\theta\;x'\!\left(I - \frac{\theta\theta'}{\theta'\theta}\right)\!x \;\ge\; \theta'\theta\,\min_{\theta:\,\mathrm{mat}(\theta)\in\mathcal{C}}\, x'M_\theta\,x.$$
In this case, if the limit of the distance between the regressor and the restricted parameter set is positive, Assumption 1 is satisfied: it suffices that $\mu := \liminf_{N,T}\,\min_{\theta:\,\mathrm{mat}(\theta)\in\mathcal{C}} x'M_\theta x$, the squared distance between the normalized regressor $x$ and the convex cone $\mathcal{C}$, is positive. An obvious necessary condition for this is that the normalized regressor does not belong to the cone $\mathcal{C}$, that is, $\|M_{\lambda_0}XM_{f_0}\|_* > 3\,\|X - M_{\lambda_0}XM_{f_0}\|_*$. The following Lemma 3 generalizes this.

Let $H(A,\mathcal{C})$ be the distance between a matrix $A\in\mathbb{R}^{N\times T}$ and the cone $\mathcal{C}$,
$$H(A,\mathcal{C}) := \min_{B\in\mathcal{C}}\,\mathrm{Tr}\big[(A-B)'(A-B)\big].$$

Assumption 3. We assume that there exists a positive constant $\mu > 0$ such that for any $\alpha\in\mathbb{R}^{K}$ with $\alpha'x'x\,\alpha = 1$, the regressors $X_1,\ldots,X_K$ satisfy
$$H\big(\alpha\cdot X,\,\mathcal{C}\big) \ge \mu > 0 \qquad \text{wpa1}.$$

Lemma 3. Suppose that Assumption 3 holds. Then there exists a positive constant $\mu > 0$ such that
$$\theta'M_x\theta \ge \mu\,\theta'\theta$$
for any $\Theta\in\mathcal{C}$.

Assumption 3 is a high level condition for the restricted strong convexity. In what follows we introduce a set of lower level conditions under which Assumption 3 is satisfied. The lower

level conditions restrict the singular values of $M_{\lambda_0}(\alpha\cdot X)M_{f_0}$.

Assumption 4. Suppose that $\alpha\in\mathbb{R}^K$ with $\alpha'x'x\,\alpha = 1$. Let $s_1 \ge s_2 \ge s_3 \ge \ldots \ge s_{\min(N,T)} \ge 0$ be the singular values of the $N\times T$ matrix $M_{\lambda_0}(\alpha\cdot X)M_{f_0}$. We assume that the following conditions hold uniformly in $\alpha$ with $\alpha'x'x\,\alpha = 1$:
(i) $\|\alpha\cdot X\| = O_p(1)$;
(ii) $\sum_{r=q}^{\min(N,T)} s_r^2 \ge c > 0$ wpa1;
(iii) $\sum_{r=1}^{q} s_r\,(s_r - s_q) \to_P 0$.

Lemma 4. Suppose that an $N\times T$ random matrix $\alpha\cdot X$, with $\alpha\in\mathbb{R}^K$ and $\alpha'x'x\,\alpha = 1$, satisfies Assumption 4. Then it satisfies Assumption 3, and thus also Assumption 1, with $\mu = c$.

Remarks.
(a) When $X$ is a high-rank regressor and the $s_q$'s are of order $O_p\big(\min(N,T)^{-1/2}\big)$, we can choose, for example, $q = \min(N,T)/2$, for $N,T$ converging to infinity at the same rate. Then it is easy to verify the sufficient conditions (i), (ii) and (iii) for, e.g., $X_{it}$ i.i.d. $N(0,\sigma^2)$ from well-known random matrix theory results. More generally, we can explicitly verify (i), (ii) and (iii) if $X$ has an approximate factor structure $X = \lambda^x f^{x\prime} + E^x$.
(b) For a low-rank regressor with $\mathrm{rank}(X) = 1$ we have singular values $s_1 = \|M_{\lambda_0}XM_{f_0}\|$ and $s_r = 0$ for all $r \ge 2$. In that case, using the notation $a(q)$ and $b(q)$ from the proof of Lemma 4 in the appendix, the minimization over $q$ reduces to $\min\big\{a(1),\,\big(\max\{0,\,b(2)\}\big)^2\big\}$. Thus, Assumption 3 is satisfied if wpa1 we have
$$\big\|M_{\lambda_0}XM_{f_0}\big\| - 3\sqrt{2R_0}\,\big\|X - M_{\lambda_0}XM_{f_0}\big\|_* \ge c > 0$$
for some constant $c$. This last condition simply demands that the part of $X$ that cannot be explained by $\lambda_0$ and $f_0$ needs to be sufficiently larger than the part of $X$ that can be explained by either $\lambda_0$ or $f_0$. This is a sufficient condition for Assumption 1.

An analysis that is specialized towards low-rank regressors would likely give a weaker condition for Assumption 1 in this case.

5 Nuclear Norm Minimization Estimation

In this section we consider the estimator
$$\tilde\beta = \mathrm{argmin}_\beta\;\big\|Y - \beta\cdot X\big\|_*. \qquad (5.1)$$
To motivate the nuclear norm minimization estimation in (5.1), let $\hat\Gamma_\psi(\beta) := \mathrm{argmin}_\Gamma\,Q_\psi(\beta,\Gamma)$. For given $N,T$, suppose that we choose a very small regularization penalty $\psi > 0$ such that $\psi \le \min_r s_r(Y - \beta\cdot X)$. Then, since $\hat\Gamma_\psi(\beta) = Y - \beta\cdot X - \psi\,U_\beta V_\beta'$ by Lemma 1, where $U_\beta$ and $V_\beta$ denote the singular vector matrices of $Y - \beta\cdot X$, we have
$$Q_\psi\big(\beta,\hat\Gamma_\psi(\beta)\big) = \tfrac{1}{2}\,\big\|\psi\,U_\beta V_\beta'\big\|_F^2 + \psi\,\big\|Y - \beta\cdot X - \psi\,U_\beta V_\beta'\big\|_*.$$
Then, for fixed $N,T$, letting $\psi\to 0$, we have, uniformly in $\beta$,³
$$\lim_{\psi\to 0}\,\frac{1}{\psi}\,Q_\psi\big(\beta,\hat\Gamma_\psi(\beta)\big) = \big\|Y - \beta\cdot X\big\|_*. \qquad (5.2)$$

[Footnote 3: In this case, $\tilde\beta = \lim_{\psi\to 0}\hat\beta_\psi$ (5.3) for fixed $N,T$. However, this relation between $\hat\beta_\psi$ and $\tilde\beta$ is not actually important for the following discussion of $\tilde\beta$; one can simply consider (5.1) as the definition of $\tilde\beta$.]

An important practical advantage of $\tilde\beta$ over both $\hat\beta_{LS}$ and $\hat\beta_\psi$ is that neither a number of factors $R_0$ nor a regularization parameter $\psi$ needs to be chosen. However, a drawback of the estimator $\tilde\beta$ is that for its consistency we do not allow low-rank regressors.

In what follows we consider the special case $K = 1$ for illustrative purposes. The general case of multiple regressors is presented in the appendix, and we provide a proof of the theorem under the corresponding general assumption.

Assumption 5. Suppose that $K = 1$. As $N,T\to\infty$ with $N > T$, the following conditions are satisfied:

(i) We assume $\|E\| = O_p(\sqrt{N})$.
(ii) We assume that there exists a finite positive constant $c_{up}$ such that wpa1 we have $\frac{1}{T\sqrt{N}}\,\|E\|_* \le c_{up}$.
(iii) We assume $\frac{1}{\sqrt{NT}}\,\|X\|_F = O_p(1)$.
(iv) Let $U_E S_E V_E'$ be the singular value decomposition of $M_{\lambda_0}EM_{f_0}$.⁴ We assume $\mathrm{Tr}\big(X'U_E V_E'\big) = O_p(1)$.
(v) We assume that there exists a constant $c_{low} > 0$ such that wpa1 $\frac{1}{T\sqrt{N}}\,\big\|M_{\lambda_0}XM_{f_0}\big\|_* \ge c_{low}$.
(vi) Let $U_x S_x V_x' = M_{\lambda_0}XM_{f_0}$ be the singular value decomposition of the matrix $M_{\lambda_0}XM_{f_0}$. We assume that there exists $c_x\in(0,1)$ such that wpa1
$$\mathrm{Tr}\big(U_E'\,U_x\,S_x\,U_x'\,U_E\big) \le (1 - c_x)\,\mathrm{Tr}(S_x).$$

[Footnote 4: That is, $U_E S_E V_E' = M_{\lambda_0}EM_{f_0}$, where $U_E$ is an $N\times\mathrm{rank}(M_{\lambda_0}EM_{f_0})$ matrix of singular vectors, $S_E$ is a $\mathrm{rank}(M_{\lambda_0}EM_{f_0})\times\mathrm{rank}(M_{\lambda_0}EM_{f_0})$ diagonal matrix, and $V_E$ is a $T\times\mathrm{rank}(M_{\lambda_0}EM_{f_0})$ matrix of singular vectors.]

Theorem 2. Under Assumption 5, $\sqrt{T}\,\big(\tilde\beta - \beta_0\big) = O_p(1)$.

Remarks on Assumption 5.
(a) We consider a limit with $N > T$ here. Alternatively, we could consider a limit with $T > N$, but then we also need to replace $N$ by $T$, and $X$ by $X'$, in the remaining assumptions.
(b) Conditions (i) and (ii) are satisfied under weak restrictions on the error term $e_{it}$. Various examples of DGPs for condition (i) are found in Moon and Weidner (2017).
(c) Condition (iii) is standard. It is satisfied if $\frac{1}{NT}\sum_{i,t} x_{it}^2 = O_p(1)$, which holds if $\sup_{it} E[x_{it}^2]$ is finite.

(d) Condition (iv) is a high level condition and will be satisfied if
$$\sup_r\,E\big[(V_{E,r}'\,X'\,U_{E,r})^2\big] \le M \qquad (5.4)$$
for some finite constant $M$, where $U_{E,r}$ and $V_{E,r}$ are the $r$-th columns of $U_E$ and $V_E$, respectively. An example of DGPs for $X$ and $E$ that satisfies condition (5.4) is Assumption LL(i) and (ii) in Moon and Weidner (2015). A proof of this is similar to that of Lemma A.4 and we omit it. This example allows the regressor to be a lagged dependent variable.
(e) Condition (v) rules out low-rank regressors, for which we typically have $\|M_{\lambda_0}XM_{f_0}\|_* = O_p(\sqrt{NT})$, but it is satisfied generically for high-rank regressors, for which $M_{\lambda_0}XM_{f_0}$ has $T$ singular values of order $\sqrt{N}$, so that $\|M_{\lambda_0}XM_{f_0}\|_*$ is of order $T\sqrt{N}$. It is not surprising that we need to rule out low-rank regressors here, because the estimator $\tilde\beta$ does not use any information about $R_0$, so that $\Gamma_0$ cannot be distinguished from a low-rank regressor.

6 Post Nuclear Norm Regularization/Minimization Estimation

In Sections 4 and 5 we showed that $\hat\beta_\psi$ and $\tilde\beta$ are consistent, but with a much slower convergence rate than that of the LS estimator $\hat\beta_{LS}$. In this section we investigate how to construct an estimator that is asymptotically equivalent to the LS estimator and yet easy to compute. Our suggestion is to use $\hat\beta_\psi$ or $\tilde\beta$ as a preliminary estimator and to iterate the estimation of $\Gamma_0 = \lambda_0 f_0'$ and $\beta_0$. First we consider the case where $R_0$ is known. When $R_0$ is unknown, we use a consistent estimate of $R_0$; after we introduce the main result of the section we discuss how to estimate $R_0$ consistently.

Our iteration method is defined as follows. We start with $s = 1$.

Step 1: In the first step, $s = 1$, we set $\hat\beta^{(1)} := \hat\beta_\psi$ or $\tilde\beta$.

Step 2: We estimate the factor loadings and the factors from the step-$s$ residuals $Y - \hat\beta^{(s)}\cdot X$ by the principal components method:
$$\big(\hat\lambda^{(s+1)},\,\hat f^{(s+1)}\big) := \mathrm{argmin}_{\lambda\in\mathbb{R}^{N\times R_0},\;f\in\mathbb{R}^{T\times R_0}}\;\big\|Y - \hat\beta^{(s)}\cdot X - \lambda f'\big\|_F^2.$$

Step 3: We update the step-$s$ estimator $\hat\beta^{(s)}$ by
$$\hat\beta^{(s+1)} := \mathrm{argmin}_\beta\,\min_{G,H}\,\big\|Y - \beta\cdot X - \big(\hat\lambda^{(s+1)}G' + H\,\hat f^{(s+1)\prime}\big)\big\|_F^2 = \Big(x'\big(M_{\hat f^{(s+1)}}\otimes M_{\hat\lambda^{(s+1)}}\big)x\Big)^{-1} x'\big(M_{\hat f^{(s+1)}}\otimes M_{\hat\lambda^{(s+1)}}\big)y. \qquad (6.1)$$

Step 4: Increase $s$ to $s+1$ and repeat Steps 2, 3, and 4.

In the following theorem we show that after two iterations, $\hat\beta^{(s)}$ with $s = 3, 4, \ldots$ achieves asymptotic equivalence to the LS estimator $\hat\beta_{LS}$.

Theorem 3. Let Assumption 1 hold with $\mu$ independent of the sample size. Let Assumption 2 hold with $c\to\infty$ slowly, such that $c^4/\min(N,T)\to 0$. Also assume the assumptions of Lemma 10 in the appendix. Then, as $N,T\to\infty$ with $N/T\to\kappa^2$ for some $0 < \kappa < \infty$, we have
(a) $\hat\beta^{(2)} - \beta_0 = O_p\big(c^2/\min(N,T)\big)$;
(b) $\sqrt{NT}\,\big(\hat\beta^{(s)} - \hat\beta_{LS}\big) = o_p(1)$, for all $s = 3, 4, \ldots$

When $R_0$ is unknown, we use a consistent estimate of it, say $\hat R$, in Step 2. One can estimate $R_0$ in various ways. Within the nuclear norm regularization framework, we may consider the following. Suppose that $\psi_{NT}$ is the regularization parameter used in $\hat\beta_\psi$. Choose $\psi'_{NT} > 0$ such that
$$\frac{\psi_{NT}}{\psi'_{NT}} = o(1), \qquad \frac{\psi'_{NT}}{\sqrt{\max(N,T)\,\min(N,T)}} = o(1). \qquad (6.2)$$
Then we estimate $R_0$ by
$$\hat R_{\psi'} := \sum_{r=1}^{\min(N,T)} \mathbb{1}\big\{s_r\big(Y - \hat\beta_\psi\cdot X\big) \ge \psi'_{NT}\big\}. \qquad (6.3)$$

Lemma 5. Assume the assumptions of Theorem 1. Also assume the strong factor assumption. Then $P\{\hat R_{\psi'} \ne R_0\} \to 0$.
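A compact sketch of one possible implementation of Steps 2-4 (illustrative only; the normalization of the principal components estimates and the helper names are our own choices, and the update uses the regression form of (6.1) written in terms of the projected regressors):

```python
import numpy as np

def pc_factors(resid, R0):
    """Step 2: principal components estimate of (lambda, f) from the residual matrix."""
    U, s, Vt = np.linalg.svd(resid, full_matrices=False)
    lam = U[:, :R0] * s[:R0]          # N x R0 loadings (one normalization choice)
    f = Vt[:R0, :].T                  # T x R0 factors
    return lam, f

def proj_out(A):
    """Annihilator matrix M_A = I - A (A'A)^{-1} A' (assumes A has full column rank)."""
    Q, _ = np.linalg.qr(A)
    return np.eye(A.shape[0]) - Q @ Q.T

def update_beta(Y, X_list, lam, f):
    """Step 3: the closed-form update (6.1), implemented as a regression of
    M_lam Y M_f on the projected regressors M_lam X_k M_f."""
    Ml, Mf = proj_out(lam), proj_out(f)
    Xp = [Ml @ Xk @ Mf for Xk in X_list]
    A = np.array([[np.sum(Xk * Xl) for Xl in Xp] for Xk in Xp])
    b = np.array([np.sum(Xk * (Ml @ Y @ Mf)) for Xk in Xp])
    return np.linalg.solve(A, b)

def iterate(Y, X_list, beta_init, R0, n_iter=3):
    """Steps 2-4, starting from a preliminary estimator (e.g. beta_psi or beta_tilde)."""
    beta = np.atleast_1d(np.asarray(beta_init, dtype=float))
    for _ in range(n_iter):
        resid = Y - sum(b * Xk for b, Xk in zip(beta, X_list))
        lam, f = pc_factors(resid, R0)
        beta = update_beta(Y, X_list, lam, f)
    return beta
```

Starting from β̂_ψ or β̃ and running two or more passes corresponds to the post estimators β̂^{(3)}, β̂^{(4)} compared in the Monte Carlo section below.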

Table 1: Monte Carlo Results — bias and standard deviation of POLS, $\hat\beta_{LS}$, $\hat\beta_\psi$, $\tilde\beta$, and the post estimators $\hat\beta_\psi^{(3)}$ and $\hat\beta_\psi^{(4)}$, for $(N,T) = (50,50)$ and $(100,100)$.

7 A Small Scale Monte Carlo Simulation

We consider a simple linear model with one regressor and two factors:
$$y_{it} = \beta_0\,x_{it} + \sum_{r=1}^{2}\lambda_{0,ir}f_{0,tr} + e_{it}, \qquad x_{it} = 1 + e_{x,it} + \sum_{r=1}^{2}\big(\lambda_{0,ir} + \lambda_{x,ir}\big)\big(f_{0,tr} + f_{0,t-1,r}\big),$$
where $f_{0,tr} \sim$ iid $N(0,1)$, $\lambda_{0,ir}, \lambda_{x,ir} \sim$ iid $N(1,1)$, $e_{x,it}, e_{it} \sim$ iid $N(0,1)$, all mutually independent. We set $(N,T) = (50,50), (100,100)$ and $\psi = \sqrt{\log(N)\,\max(N,T)}$. With this design, we compare the finite sample properties of $\hat\beta_{LS}$, $\hat\beta_\psi$, $\tilde\beta$, and the post estimators $\hat\beta_\psi^{(3)}$ and $\hat\beta_\psi^{(4)}$ based on $\hat\beta_\psi$ as a preliminary estimator. The results for the post estimates based on $\tilde\beta$ are very similar and we omit them.
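For reference, a sketch of how simulated data from this design can be generated (the lag structure of the factors entering x_it, the zero initialization of the lagged factor, and the exact N(1,1) specification of the loadings are assumptions of this illustration):

```python
import numpy as np

def simulate_panel(N, T, beta0=1.0, R=2, seed=0):
    """Simulated design of the Monte Carlo section (as reconstructed): one regressor,
    two factors, with the regressor correlated with loadings and (lagged) factors."""
    rng = np.random.default_rng(seed)
    f0 = rng.standard_normal((T, R))                 # f_{0,tr} ~ iid N(0,1)
    lam0 = 1.0 + rng.standard_normal((N, R))         # lambda_{0,ir} ~ iid N(1,1)
    lam_x = 1.0 + rng.standard_normal((N, R))        # lambda_{x,ir} ~ iid N(1,1)
    e_x = rng.standard_normal((N, T))
    e = rng.standard_normal((N, T))
    f0_lag = np.vstack([np.zeros((1, R)), f0[:-1]])  # f_{0,t-1,r}, zero-initialized
    X = 1.0 + e_x + (lam0 + lam_x) @ (f0 + f0_lag).T
    Y = beta0 * X + lam0 @ f0.T + e
    return Y, X

Y, X = simulate_panel(50, 50)
```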

8 Conclusions

In this paper we analyze two new estimation methods for interactive fixed effects panel regressions: (i) nuclear norm penalized estimation and (ii) nuclear norm minimizing estimation. These new estimation methods optimize convex functions of the parameters and can be applied to models where the LS estimator cannot estimate the parameters consistently. We establish consistency of the new estimators of the regression coefficients, and we show how to use them as preliminary estimators to achieve asymptotic equivalence to the LS estimator. There are several ongoing extensions, including developing a unified method to deal with heterogeneous coefficients, sieve estimation of nonparametric components, high-dimensional regressors, and data dependent choice of the penalty term and inference.

Appendix: Proofs and Technical Derivations

A Motivating Examples

A.1 Example of a Non-convex LS Profile Objective Function

As an example of the non-convex LS profile objective function in (3.1), we consider the following linear model with one regressor and two factors:
$$y_{it} = \beta_0\,x_{it} + \sum_{r=1}^{2}\lambda_{0,ir}f_{0,tr} + e_{it}, \qquad x_{it} = 0.04\,e_{x,it} + \lambda_{0,i}'f_{0,t} + \lambda_{x,i}f_{x,t},$$
where $\lambda_{0,i} = (\lambda_{0,i1},\lambda_{0,i2})'$ with components iid $N(1,1)$, $f_{0,t} = (f_{0,t1},f_{0,t2})'$ with components iid $N(1,1)$, $\lambda_{x,i} \sim$ iid $\chi^2$, $f_{x,t} \sim$ iid $\chi^2$, $e_{x,it}, e_{it} \sim$ iid $N(0,1)$, and $\{\lambda_{0,i}\}$, $\{f_{0,t}\}$, $\{\lambda_{x,i}\}$, $\{f_{x,t}\}$, $\{e_{x,it}\}$, $\{e_{it}\}$ are independent of each other. With $(N,T) = (100,100)$, we generate the panel data $(y_{it}, x_{it})$ and plot the LS objective function in the following figure.

A.2 Example of Inconsistent LS Estimation

We plot some realizations of the LS objective function for the low-rank regressor example of Section 3.2. We also plot the nuclear norm regularized objective functions for various values of the regularization penalty parameter.

20 B Useful Facts on Matrix Algebra We introduce some useful facts used in the proofs. Lemma 6. Suppose that A and B are two matrices with ranks of A and B are ranka and rankb, respectively. i A A A ranka A ranka A. ii AB A B. iii AB A B A B. iv AB = 0, A B = 0 implies A + B = max{ A, B }. v A B = 0 implies A + B A + B. Lemma 7. Suppose that N T matrix A has a singular value decomposition, A = U A S A V A, where U A R N minn,t, V A R T minn,t, and S A R minn,t minn,t is the diagonal matrix whose elements are s A,..., s minn,t A. Suppose that S R minn,t minn,t is the diagonal matrix consisting of s s minn,t 0. Then, Proof. Since min min U R N minn,t : U U=I V R T minn,t : V V =I A USV = S A S. we have A USV = TrA A TrA USV + TrS = TrS A TrA USV + TrS, min min U R N minn,t : U U=I V R T minn,t : V V =I = TrS A + TrS max A USV max U R N minn,t : U U=I V R T minn,t : V V =I TrA USV. B. According to Lemma 8, for any U U = I and V V = I, TrA USV is bounded uniformly by TrS A S because TrA USV = minn,t k= minn,t k= s k Aσ k USV s k As k = TrS A S. 0

21 Therefore, max max U R N minn,t : U U=I V R T minn,t : V V =I TrA USV TrS A S. For an lower bound of TrA USV, we choose U = U A and V = V A, which leads B. max max U R N minn,t : U U=I V R T minn,t : V V =I TrA USV TrA U A SV A = TrV A S A U AU A SV A = TrS A S. B.3 From B. and B.3, we have max max U R N minn,t : U U=I V R T minn,t : V V =I TrA USV = TrS A S, and so B. = TrS A + TrS TrS A S = S A S, as required. Lemma 8. Let A, B be N T matrices whose singular values are s A s minn,t A and s B s minn,t B, respectively. Then, TrA B minn,t s k As k B. k= Proof. See Theorem of Horn and Johnson 99. C Proofs of Section 4 Recall that the rank of Γ 0 = λ 0 f 0 is R 0 minn, T. Throughout the rest of the appendix, we use the following singular value decomposition of Γ 0, Γ 0 = USV, C. where U R N R 0 with U U = I R0, V R T R 0 with V V = I R0, S is the R 0 R 0 diagonal matrix of singular values of Γ 0. Suppose that f 0 is normalized as T f 0 f 0 = I R0. Then, we have f 0 = T V λ 0 = US T. First, we provide a proof of Lemma. Proof of Lemma.

22 Part a. Notice that min Q ψβ, Γ Γ R N T = min Γ R N T = min { γ R minn,t γ 0} minn,t r= { } [s r Y β X Γ] + ψ s r Γ min min { U R N minn,t U U=I N} { V R T minn,t V V =I T } = min { γ R minn,t γ 0} = min { γ R minn,t γ 0} = = minn,t r= minn,t r= min { α R α 0} minn,t r= minn,t r= minn,t r= { [sr Y β X UdiagγV ] + ψ γ r } { [sr Y β X UY β X diagγv Y β X] + ψ γ r } {[s r Y β X γ r ] + ψ γ r } { } [s r Y β X α] + ψ α [ { min ψ, s r Y β X } ] + ψ max {0, s r Y β X ψ }. Here, in the first step we rewrote the sum of squared residuals as the sum of squared singular values. In the second step we introduced the singular value decomposition Γ = UdiagγV, thus replacing the minimization over Γ by a minimization over the singular vector matrices U and V and the singular value vector γ. In the third step we use that for any given γ the minimizing U and V are equal to the corresponding singular vector matrices of Y β X, denoted by U Y β X and V Y β X as follows by B.3 in the proof of Lemma 7 below and Lemma 7. Here notice also that the minimizing singular vectors might not be unique, in particular they are not unique for γ r = 0.. The fourth step follows by Lemma 7. The fifth step interchanges the sum and the minimization. The final step solves the minimization problem over α the optimal α is α = max{0, s r Y β X ψ }. Then, we have the required result for Part a. Part b. The required result of Part b follows as a corollary of Step 3 and Step 6 of the proof of Part a. Next, we introduce some notation we will use in the following proofs. Let QΓ := inf β Q ψβ, Γ LΓ := inf β Lβ, Γ

23 be the profile objective functions of Q ψ β, Γ and Lβ, Γ, respectively, that concentrate out parameter β. We use notation Θ := Γ Γ 0 and θ := vecθ. Proof of Lemma. # Step : Use assumption i to show Θ ψ C 0 By definition, we have 0 QΓ 0 + Θ ψ QΓ 0 = LΓ 0 + Θ ψ LΓ 0 + ψ Γ 0 + Θ ψ Γ 0. Let Θ ψ := Γ ψ Γ 0, θ ψ := vec Θ ψ, Θ ψ, := M U0 Θψ M V0 and Θ ψ, := Θ ψ M U0 Θψ M V0 Then LΓ 0 + Θ ψ LΓ 0 = θ ψ M x θ ψ e M x θψ e M x θψ = Tr Θ ψ matm xe Θ ψ matm x e ψ Θ ψ ψ Θ ψ, ψ Θ ψ,. Here the first inequality holds since θ ψ M x θ ψ 0, the second inequality holds by the Hölder inequality, the third inequality holds by assumption i, and the last inequality holds by the triangle inequality. We furthermore have ψ Γ 0 + Θ ψ Γ 0 = ψ Γ 0 + Θ ψ, + Θ ψ, Γ 0 ψ Γ 0 + Θ ψ, Γ 0 ψ Θ ψ, = ψ Θ ψ, ψ Θ ψ,. Therefore, 0 LΓ 0 + Θ ψ LΓ 0 + ψ Γ 0 + Θ ψ Γ 0 ψ Θ ψ, ψ Θ ψ, + ψ Θ ψ, ψ Θ ψ, = ψ Θ ψ, 3 Θ ψ,. 3

24 Thus, we have Θ ψ C := { B R N T M U BM V 3 B M U BM V }. # Step : Also use assumption ii to show the final result: Using assumption ii and the same derivation as above we find QΓ 0 + Θ ψ QΓ 0 θ ψ M x θ ψ e M x θψ Because 0 QΓ 0 + Θ ψ QΓ 0 we thus have µ Θ ψ + ψ µ Θ ψ 3 ψ Θ ψ,. µ Θ ψ 3 ψ Θ ψ, 0. Θ ψ, 3 Θ ψ, Since the rank of Θ ψ, is at most R 0 e.g., see Recht, Fazel, and Parrilo 00, we have Θ ψ, R 0 Θ ψ, and we also have Θ ψ, Θ ψ. Therefore, Θ ψ 3 ψ R0 Θ ψ 0, µ and therefore Θ ψ 3 ψ R0. µ Proof of Theorem. Part a follows by Lemma 3 and the definition of ψ in Assumption. Part b. Let ˆβΓ = x x x y γ. Then, by definion we have ˆβ ψ β 0 := ˆβˆΓ ψ β 0 = x x x e x ˆγ ψ γ 0. 4

25 Under Assumption, x x = Op and e x = O p. Also, by Part a we have x ˆγ ψ γ 0 X ˆΓ ψ Γ 0 = O p maxn, T / ψ = O p. Combining these, we can deduce the required result for Part b. Proof of 4.3 in Remarks. Since M x is positive semi-definite and e M x γ ψ Γ ψ matm x e by Hölder inequality, we have 0 Q Γ ψ QΓ 0 = γ ψ γ 0 M x γ ψ γ 0 e M x γ ψ γ 0 + ψ Γ ψ Γ 0 e M x γ ψ γ 0 + ψ Γ ψ Γ 0 Γ ψ Γ 0 matm x e + ψ Γ ψ Γ 0 = ψ matm x e Γ ψ Γ 0. The required result follows since ψ matm x e > 0. Proof of Lemma 3. Recall the definition x = [x,..., x K ], K, where x k = vecx k. Case i. If θ = 0, the required result holds for any constant µ > 0. Now assume that θ 0. Case ii. If θ x = 0, then the required result holds for µ = because Case iii. We assume that θ 0 and θ x 0. Also, x 0. θ θ θ xx x x θ = θ θ. 5

26 Define x θ = Pxθ P xθ, and X θ := mat x θ. Then, for any Θ C and Θ 0, we have θ θ θ xx x x θ = θ θ θ x θ x θ θ by the definition of x θ = Θ θ x θ x θ θ θ since θ 0 θ = Θ x θθ θ θ θ x θ = Θ xθ P θ x θ Θ min x θ veca A C = Θ = Θ x θ x θ x θθ θ θ θ x θ H X θ, C, C. where the inequality holds because matp θ x θ C since Θ C and C is a cone. Notice that x θ = P xθ P x θ = x α, where α = x x x θ θ x x x x θ / and α x x α =. This implies X θ = α X with α α =. Therefore, we have C. Θ min α x x α= H α Then, the required result of the lemma follows by Assumption 3. X, C. Proof of Lemma 4. For notational simplicity, we write X α for α X. For given N T matrix X, and N R 0 matrix λ 0, and T R 0 matrix f 0, we want to find a lower bound on ν := min X α Θ Θ R N T s.t. M λ0 ΘM f0 3 Θ M λ0 ΘM f0. 6

27 By definition, we have X α Θ = M λ0 X α M f0 M λ0 ΘM f0 + X α M λ0 X α M f0 Θ M λ0 ΘM f0 Also, rankθ M λ0 ΘM f0 R 0 e.g., see Lemma 3.4 of Recht, Fazel, and Parrilo 00, and therefore Θ M λ0 ΘM f0 R 0 Θ M λ0 ΘM f0. Using this we find. ν min Θ R N T { } M λ0 X α M f0 M λ0 ΘM f0 + X α M λ0 X α M f0 Θ M λ0 ΘM f0 s.t. M λ0 ΘM f0 3 R 0 Θ M λ0 ΘM f0. Here, we have weakened the constraint allowing more values for Θ, and the minimizing value therefore weakly decreases. It is easy to see that for ω 0 we have X α M λ0 X α M f0 ω = min X α M λ0 X α M f0 Θ M λ0 ΘM f0 Θ R N T s.t. Θ M λ0 ΘM f0 = ω, because the optimal Θ M λ0 ΘM f0 here equals equals X α M λ0 X α M f0 rescaled by a non-negative number. We therefore have ν min ω 0 Let min Θ R N T M λ0 X α M f0 M λ0 ΘM f0 + X α M λ0 X α M f0 ω s.t. M λ0 ΘM f0 3 R 0 ω. M λ0 X α M f0 = be the singular value decomposition of M λ0 X α M f0 minn,t R 0 r= s r v r w r, singular vectors v r R N and w r R T. The optimal M λ0 ΘM f0 has the form minn,t R 0 r= with singular values s r 0 and normalized max0, s r β v r w r, in the last optimization problem 7

28 for some β 0. Here, β = 0 occurs if the constraint is not binding, that is, if M λ0 X α M f0 3 R 0 ω. We therefore have ν minn,t R 0 min ω 0, β 0 r= r= s r max0, s r β + X α M λ0 X α M f0 ω minn,t R 0 s.t. max0, s r β 3 R 0 ω. { } minn,t Here, the optimal ω equals max X α M λ0 X α M f0, 3 R0 R 0 r= max0, s r β, and we thus have ν min β 0 minn,t R 0 r= [ mins r, β + max 0, 3 R 0 minn,t R 0 r= max0, s r β X α M λ0 X α M f0 Let = s 0 > s... s minn,t R0 s minn,t R0 + = 0. For any β 0 there exists q be such that β [s q+, s q. We can therefore write ν min q {0,,,...,minN,T R 0 } min β [s q+,s q + [ q β + minn,t R 0 r=q+ s r { q } ] max 0, 3 s r β I{q } X α M λ0 X α M f0. R 0 Here, for a given q the objective function is strictly increasing, and the optimal β is always equal to the lower bound of the interval [s q+, s q. Using this and also shifting q q we obtain ν r= min aq + [max {0, bq}], q {,,...,minn,t R 0 } ]. where aq = q s q + [ bq = 3 R 0 q minn,t r=q s r, ] s r s q I{q } X α M λ0 X α M f0. r= Notice that aq is nonnegative and weakly decreasing and bq is weakly increasing. Then, for 8

29 any integer valued sequence q between and minn, T R 0 such that bq > 0, min aq + [max {0, bq}] q {,,...,minn,t R 0 } { = min min aq + [max {0, bq}], min q {,,...,q } { } min min aq, min [max {0, bq}] q {,,...,q } q {q +,...,minn,t R 0 } min { aq, bq + }. q {q +,...,minn,t R 0 } aq + [max {0, bq}] } Then, Assumption 4 thus guarantees that ν c. The definition of ν together with Lemma 3 thus guarantees that Assumptions 3 and Assumption are satisfied with µ = c. D Proofs of Section 5 In this section, we introduce conditions for Theorem with K -regressors and provide a proof of the general case. Assumption 6 K-regressor Case. Let there exist symmetric idempotent T T matrices Q k = Q k, such that Q k V = 0, for all k {,..., K}, and Q k Q l = 0, for all k, l {,..., K}. Suppose that N > T. As N, T, we assume the following conditions hold. i,ii Assume Assumption 5i and ii. iii For all k {,..., K}, we assume X k = O p. vi Let U E S E V E be the singular value decomposition of M λ 0 EM f0. We assume Tr X k U EV E = O p for all k {,..., K}. v We assume that there exists a constant c low > 0 such that wpa T N / M U X k M V Q k c low, for all k {,..., K}. vi For k =,..., K let U k S k V k = M UX k M V Q k be the singular value decomposition of the matrix M U X k M V Q k. We assume that there exists c x 0, such that wpa U k U E c x for all k =,..., K Remark For t {,,..., T }, let e t be the t th unit vector of dimension T. For k {,..., K}, let A k = e k T/K +, e k T/K +,... e kt/k be a T T/K matrix, and let P Ak be the 9

30 projector onto the column space of A k. Also define f 0,k = P Ak f 0 and B k = M f0,k A k. Then, for K > one possible choice for Q k in Assumption 6 vi is given by Q k = P Bk = M f0,k P Ak. The discussion of Assumption 6 vi is then analogous to the K = case, except that for the k th regressor only the time periods k T/K + to kt/k are used in the assumption, that is, we need enough variation in the k th regressor within those time periods. Other choices of Q k are also conceivable. Now we prove Theorem under Assumption 6, which is more general than Assumption 5. Proof of Theorem. Before we start the proof, remind the following sigular value decompositions, Γ 0 = USV, M λ0 EM f0 = U E S E V E, and M λ 0 X k M f0 = M U X k M V = U k X k V k used in what follows. The proof consists of two steps. that will be In the first step, we show that the local minimizer that minimizes the objective function Q β in a convex neighborhood of β 0 defined by is T - consistent. B := { β : c x c low c up K β In the second step, we show that the local minimizer is the global minimizer, for which we use convexity of the objective function Q β. Step. By definition of the nuclear norm, we have Q β = Γ 0 + E β X = } sup Tr [ Γ 0 + E β X A ]. {A : A } To obtain a lower bound on Q β we choose the following matrix A in the above minimization, A β = UV + a β U EV E a β K K where M UE = I N U E U E and a β [0, ] is given by k= a β = c x c low c up K β. sgn β k M UE U k V k, 30

31 We have A β, because A β = max UV, a β U EV E a β K sgn β k M UE U k V k K k= max UV, a β U E V E + a β K M K UE U k V k k= { max UV, a β } U E V E + a K β M UE U k V k K =. Here, for the first line, we used that UV is orthogonal to a β U EV E a β K K k= sgn β k M UE U k V k in both matrix dimension, which implies that the spectral norm of the sum is equal to the maximum of the spectral norms see Lemma 6iv. For the second line, we used that the columns of U E V E are orthogonal to the columns of K k= M U E U k V k, which implies that the squared spectral norm of the sum is bounded by the sum of the squared spectral norms see Lemma 6v. For the third, we used that the rows of M UE U k V k and M U E U l V l are orthogonal for k l see Lemma 6v. In the final line we used that UV = U E V E = and that M U E U k V k for all k. With this choice of A = A β we obtain the following lower bound for the objective function; for all β B, Q β Tr [ Γ 0 + E β X A β ] k= = Γ 0 + Tr E UV + Tr [ β X UV ] + a β M UEM V + a β Tr [ β X U E V E ] + a β K where we used the following: K k= β k Tr [ X k M U E U k V k], Tr Γ 0UV = Tr V SU UV = TrS = Γ 0 Tr E U E V E = Tr E MU EV U + M U EV U U E V E = TrMU EV U U E V E = TrS E = M U EM V Tr Γ 0U E V E = Tr V SU U E V E = 0 Tr [ Γ 0M UE U k V k ] [ = Tr V SUMUE U k V k ] [ = Tr V SUUk V k ] [ + Tr V SUPUE U k V k] = 0 Tr [ E M UE U k V k] = 0. We furthermore have Qβ 0 = Γ 0 + E. Thus, applying the assumptions of Lemma??, and also 3

32 using a β a β a4 β, we obtain for β B, Q β Q β 0 Tr [ Γ 0 + E β X A β ] Γ0 + E a β K K k= β k Tr [ X k M U E U k V k ] a β M UEM V Γ 0 + E Γ 0 M U EM V + Tr E UV a4 β M UEM V + a β Tr [ β X U E V E ] [ + Tr β X UV ] =: B B B 3 + B 4 B 5 + B 6. D. Here we bound B from below by B = a β K = a β K a β c x K K k= K k= K k= a β c x c low K a β c x c low K β k Tr [ X k M U E U k V k ] β k [ Tr S k Tr U EU k S k U k U ] E β k Tr S k = a β c x K T N K β k k= T N β, K k= β k M U X k M V Q k where the first inequality holds by Assumption 6vi and the second inequality holds by Assumption 6v. We bound B from above by B = a β M UEM V a β E + P U E + EP V + P U EP V a β E + 3R 0 E a β T cup N + T O p wpa a β T N c up wpa, where the first inequality holds by the triangle inequality, the second inequality holds by Lemma 6i and the third and the fourth inequalities follow by Assumption 5i and ii. 3

33 We bound term B 3 from above by B 3 = Γ 0 + E Γ 0 M U EM V E M U EM V = P U E + EP V P U EP V P U E + EP V + P U EP V 3R 0 E O p N where the second inequality holds by the triangle inequality and the third inequality holds by Lemma 6i. For B 4, by Hölder s inequality we have B 4 = Tr E UV E UV = O p N. For B 5, denoting O P + as a stochastically strictly positive and bounded term and using similar arguments for the bound of term B, we obtain B 5 = a4 β M UEM V = O P + a 4 β T N = O P + β 4 T N. For B 6, we have B 6 = a β Tr [ β X U E V E ] [ + Tr β X UV ] = O p β, where the last equality holds since TrX k U E V E = O p by Assumption 6vi, and TrX k UV X k UV = O p under Assumption 6iii. Notice that our choice for a β above is such that a β c x c low K β a β c up is maximized, which guarantees that B B is positive, namely B B T N c xc low K c up β. Combining the above, for any β B, we have T N {Q β Q β 0 } c xc low β + O p β + O p T + O P + β 4, K c up T which holds uniformly over β B i.e. none of the constants hidden in the O p. notation depends on β. Let β := argmin β B Q β be the local minimizer in a convex neighbood B of β 0. Notice that since β 0 B, Q β Q β 0 33

34 by definition. Therefore, we have This implies 0 T N Q β Q β 0 c xc low K c up β β 0 + O p O P + T T β β 0 + O p T + O P + β β 0 4. c x c low + O P + β β 0 β β 0 + O p β β 0 K c up T c xc low K c up β β 0 + O p T β β 0. From this we deduce β β 0 = O p. D. T Step. Let β B, that is, α β =. Write β := β β 0. From D. with a β =, we can bound Q β Q β 0 from below by T N Q β Q β 0 c x c low β K c up + O p β T + O p T + O P + β 4 = 4 c up + O p T cup K c up K + O p + O P + T c x c low c x c low > 0 wpa, where the equality holds since β = cup K c x c low. Since Q β is convex and has unique minimum, the local minimum at β is is also the global minimum asymptotically. Therefore, asymptotically β = β wpa. Then, combining this with the T consistency result of the local minimizer in D., we have the required result of the theorem. E Proofs of Section 6 Before we start the proof of Theorem 3, we introduce several helpful lemmas. 34

35 For parameter β, define λβ, fβ := argmin λ R N R 0,f R T R 0 Y β X and the corresponding residuals Êβ := Y β X λβ fβ. Define M λβ := I N λβ λβ λβ λβ, M f β := I N fβ fβ fβ fβ. E. Lemma 9. Assume the assumptions of Theorem S.9. of the supplementary appendix of Moon and Weidner 07. Then, we have the following expansions M λβ = M λ0 + M K λ,e + M λ,e β k β 0,k M k= M f β = M f0 + M K f,e + M f,e β k β 0,k M k= λ,k + Mrem λ f,k + Mrem f where the spectral norms of the remainders satisfy for any series η 0: sup {β: β β 0 η } sup {β: β β 0 η } β, β, M rem β λ β β 0 + / E β β 0 + 3/ E 3 = O p, M rem β f β β 0 + / E β β 0 + 3/ E 3 = O p, and the expansion coefficients are given by M λ,e = M λ 0 E f 0 f 0f 0 λ 0λ 0 λ 0 λ 0 λ 0λ 0 f 0f 0 f 0 E M λ0, M λ,k = M λ 0 X k f 0 f 0f 0 λ 0λ 0 λ 0 λ 0 λ 0λ 0 f 0f 0 f 0 X k M λ 0, M λ,e = M λ 0 E f 0 f 0f 0 λ 0λ 0 λ 0 E f 0 f 0f 0 λ 0λ 0 λ 0 + λ 0 λ 0λ 0 f 0f 0 f 0 E λ 0 λ 0λ 0 f 0f 0 f 0 E M λ0 M λ0 E M f0 E λ 0 λ 0λ 0 f 0f 0 λ 0λ 0 λ 0 λ 0 λ 0λ 0 f 0f 0 λ 0λ 0 λ 0 E M f0 E M λ0 M λ0 E f 0 f 0f 0 λ 0λ 0 f 0f 0 f 0 E M λ0 + λ 0 λ 0λ 0 f 0f 0 f 0 E M λ0 e f 0 f 0f 0 λ 0λ 0 λ 0, 35

36 analogously M f,e = M f 0 E λ 0 λ 0λ 0 f 0f 0 f 0 f 0 f 0f 0 λ 0λ 0 λ 0 E M f0, M f,k = M f 0 X k λ 0 λ 0λ 0 f 0f 0 f 0 f 0 f 0f 0 λ 0λ 0 λ 0 X k M f0, M f,e = M f 0 E λ 0 λ 0λ 0 f 0f 0 f 0 e λ 0 λ 0λ 0 f 0f 0 f 0 + f 0 f 0f 0 λ 0λ 0 λ 0 E f 0 f 0f 0 λ 0λ 0 λ 0 E M f0 M f0 E M λ0 E f 0 f 0f 0 λ 0λ 0 f 0f 0 f 0 f 0 f 0f 0 λ 0λ 0 f 0f 0 f 0 E M λ0 E M f0 M f0 E λ 0 λ 0λ 0 f 0f 0 λ 0λ 0 λ 0 E M f0 + f 0 f 0f 0 λ 0λ 0 λ 0 E M f0 E λ 0 λ 0λ 0 f 0f 0 f 0. Lemma 0. Suppose that the assumptions of Theorem S.9. of the supplementary appendix of Moon and Weidner 07 and of Theorem 4.4 hold. Then, the following hold. a M λ,e, M f,e = O pn /. b M λ,e, M f,e = O pn. c M λ,k, M f,k = O p for all k =,..., K. Proof. See the proof of Corollary S.0. in the supplementary appendix of Moon and Weidner 07, which is available in Moon and Weidner 05. Suppose that β s is a preliminary estimator of β 0 such that β s β 0 = O p δ s with δ s 0. Since y = xβ 0 + f 0 λ 0 veci R0 + e, we can write β s+ β 0 [ = x M ˆf s+ Mˆλs+ [ x M ˆf s+ Mˆλs+ ] x f 0 λ 0 veci R0 + x M ˆf s+ Mˆλs+ ] e. Lemma. Under the Assumptions in Theorem 3, the following hold. a Suppose that δ s 0. Then, M x ˆf s+ Mˆλs+ x = x M f0 M λ0 x + O p. For δ = c maxn,t /, we have / b c x M ˆf Mˆλ f 0 λ 0 veci R0 = O p, 36


10-725/36-725: Convex Optimization Prerequisite Topics 10-725/36-725: Convex Optimization Prerequisite Topics February 3, 2015 This is meant to be a brief, informal refresher of some topics that will form building blocks in this course. The content of the

More information

Repeated observations on the same cross-section of individual units. Important advantages relative to pure cross-section data

Repeated observations on the same cross-section of individual units. Important advantages relative to pure cross-section data Panel data Repeated observations on the same cross-section of individual units. Important advantages relative to pure cross-section data - possible to control for some unobserved heterogeneity - possible

More information

A Course in Applied Econometrics Lecture 4: Linear Panel Data Models, II. Jeff Wooldridge IRP Lectures, UW Madison, August 2008

A Course in Applied Econometrics Lecture 4: Linear Panel Data Models, II. Jeff Wooldridge IRP Lectures, UW Madison, August 2008 A Course in Applied Econometrics Lecture 4: Linear Panel Data Models, II Jeff Wooldridge IRP Lectures, UW Madison, August 2008 5. Estimating Production Functions Using Proxy Variables 6. Pseudo Panels

More information

Math Camp Lecture 4: Linear Algebra. Xiao Yu Wang. Aug 2010 MIT. Xiao Yu Wang (MIT) Math Camp /10 1 / 88

Math Camp Lecture 4: Linear Algebra. Xiao Yu Wang. Aug 2010 MIT. Xiao Yu Wang (MIT) Math Camp /10 1 / 88 Math Camp 2010 Lecture 4: Linear Algebra Xiao Yu Wang MIT Aug 2010 Xiao Yu Wang (MIT) Math Camp 2010 08/10 1 / 88 Linear Algebra Game Plan Vector Spaces Linear Transformations and Matrices Determinant

More information

Least Squares Estimation-Finite-Sample Properties

Least Squares Estimation-Finite-Sample Properties Least Squares Estimation-Finite-Sample Properties Ping Yu School of Economics and Finance The University of Hong Kong Ping Yu (HKU) Finite-Sample 1 / 29 Terminology and Assumptions 1 Terminology and Assumptions

More information

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley Review of Classical Least Squares James L. Powell Department of Economics University of California, Berkeley The Classical Linear Model The object of least squares regression methods is to model and estimate

More information

Lecture 8: Linear Algebra Background

Lecture 8: Linear Algebra Background CSE 521: Design and Analysis of Algorithms I Winter 2017 Lecture 8: Linear Algebra Background Lecturer: Shayan Oveis Gharan 2/1/2017 Scribe: Swati Padmanabhan Disclaimer: These notes have not been subjected

More information

Inference For High Dimensional M-estimates: Fixed Design Results

Inference For High Dimensional M-estimates: Fixed Design Results Inference For High Dimensional M-estimates: Fixed Design Results Lihua Lei, Peter Bickel and Noureddine El Karoui Department of Statistics, UC Berkeley Berkeley-Stanford Econometrics Jamboree, 2017 1/49

More information

The properties of L p -GMM estimators

The properties of L p -GMM estimators The properties of L p -GMM estimators Robert de Jong and Chirok Han Michigan State University February 2000 Abstract This paper considers Generalized Method of Moment-type estimators for which a criterion

More information

Multi-stage convex relaxation approach for low-rank structured PSD matrix recovery

Multi-stage convex relaxation approach for low-rank structured PSD matrix recovery Multi-stage convex relaxation approach for low-rank structured PSD matrix recovery Department of Mathematics & Risk Management Institute National University of Singapore (Based on a joint work with Shujun

More information

Inference For High Dimensional M-estimates. Fixed Design Results

Inference For High Dimensional M-estimates. Fixed Design Results : Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and

More information

Panel Data Models. James L. Powell Department of Economics University of California, Berkeley

Panel Data Models. James L. Powell Department of Economics University of California, Berkeley Panel Data Models James L. Powell Department of Economics University of California, Berkeley Overview Like Zellner s seemingly unrelated regression models, the dependent and explanatory variables for panel

More information

Multiple Regression Model: I

Multiple Regression Model: I Multiple Regression Model: I Suppose the data are generated according to y i 1 x i1 2 x i2 K x ik u i i 1...n Define y 1 x 11 x 1K 1 u 1 y y n X x n1 x nk K u u n So y n, X nxk, K, u n Rks: In many applications,

More information

GMM estimation of spatial panels

GMM estimation of spatial panels MRA Munich ersonal ReEc Archive GMM estimation of spatial panels Francesco Moscone and Elisa Tosetti Brunel University 7. April 009 Online at http://mpra.ub.uni-muenchen.de/637/ MRA aper No. 637, posted

More information

Matrix Support Functional and its Applications

Matrix Support Functional and its Applications Matrix Support Functional and its Applications James V Burke Mathematics, University of Washington Joint work with Yuan Gao (UW) and Tim Hoheisel (McGill), CORS, Banff 2016 June 1, 2016 Connections What

More information

High-dimensional covariance estimation based on Gaussian graphical models

High-dimensional covariance estimation based on Gaussian graphical models High-dimensional covariance estimation based on Gaussian graphical models Shuheng Zhou Department of Statistics, The University of Michigan, Ann Arbor IMA workshop on High Dimensional Phenomena Sept. 26,

More information

Singular Value Decomposition

Singular Value Decomposition Chapter 6 Singular Value Decomposition In Chapter 5, we derived a number of algorithms for computing the eigenvalues and eigenvectors of matrices A R n n. Having developed this machinery, we complete our

More information

Shrinkage Estimation of Dynamic Panel Data Models with Interactive Fixed Effects

Shrinkage Estimation of Dynamic Panel Data Models with Interactive Fixed Effects Shrinkage Estimation of Dynamic Panel Data Models with Interactive Fixed Effects un Lu andliangjunsu Department of Economics, HKUST School of Economics, Singapore Management University February, 5 Abstract

More information

Optimization methods

Optimization methods Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,

More information

14 Singular Value Decomposition

14 Singular Value Decomposition 14 Singular Value Decomposition For any high-dimensional data analysis, one s first thought should often be: can I use an SVD? The singular value decomposition is an invaluable analysis tool for dealing

More information

Orthogonal Matching Pursuit for Sparse Signal Recovery With Noise

Orthogonal Matching Pursuit for Sparse Signal Recovery With Noise Orthogonal Matching Pursuit for Sparse Signal Recovery With Noise The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation As Published

More information

Matrix Algebra, part 2

Matrix Algebra, part 2 Matrix Algebra, part 2 Ming-Ching Luoh 2005.9.12 1 / 38 Diagonalization and Spectral Decomposition of a Matrix Optimization 2 / 38 Diagonalization and Spectral Decomposition of a Matrix Also called Eigenvalues

More information

Jianhua Z. Huang, Haipeng Shen, Andreas Buja

Jianhua Z. Huang, Haipeng Shen, Andreas Buja Several Flawed Approaches to Penalized SVDs A supplementary note to The analysis of two-way functional data using two-way regularized singular value decompositions Jianhua Z. Huang, Haipeng Shen, Andreas

More information

Lecture 2 Part 1 Optimization

Lecture 2 Part 1 Optimization Lecture 2 Part 1 Optimization (January 16, 2015) Mu Zhu University of Waterloo Need for Optimization E(y x), P(y x) want to go after them first, model some examples last week then, estimate didn t discuss

More information

1 Appendix A: Matrix Algebra

1 Appendix A: Matrix Algebra Appendix A: Matrix Algebra. Definitions Matrix A =[ ]=[A] Symmetric matrix: = for all and Diagonal matrix: 6=0if = but =0if 6= Scalar matrix: the diagonal matrix of = Identity matrix: the scalar matrix

More information

High-dimensional Statistical Models

High-dimensional Statistical Models High-dimensional Statistical Models Pradeep Ravikumar UT Austin MLSS 2014 1 Curse of Dimensionality Statistical Learning: Given n observations from p(x; θ ), where θ R p, recover signal/parameter θ. For

More information

Quaderni di Dipartimento. Estimation Methods in Panel Data Models with Observed and Unobserved Components: a Monte Carlo Study

Quaderni di Dipartimento. Estimation Methods in Panel Data Models with Observed and Unobserved Components: a Monte Carlo Study Quaderni di Dipartimento Estimation Methods in Panel Data Models with Observed and Unobserved Components: a Monte Carlo Study Carolina Castagnetti (Università di Pavia) Eduardo Rossi (Università di Pavia)

More information

Estimation of random coefficients logit demand models with interactive fixed effects

Estimation of random coefficients logit demand models with interactive fixed effects Estimation of random coefficients logit demand models with interactive fixed effects Hyungsik Roger Moon Matthew Shum Martin Weidner The Institute for Fiscal Studies Department of Economics, UCL cemmap

More information

arxiv: v1 [math.na] 1 Sep 2018

arxiv: v1 [math.na] 1 Sep 2018 On the perturbation of an L -orthogonal projection Xuefeng Xu arxiv:18090000v1 [mathna] 1 Sep 018 September 5 018 Abstract The L -orthogonal projection is an important mathematical tool in scientific computing

More information

What s New in Econometrics? Lecture 14 Quantile Methods

What s New in Econometrics? Lecture 14 Quantile Methods What s New in Econometrics? Lecture 14 Quantile Methods Jeff Wooldridge NBER Summer Institute, 2007 1. Reminders About Means, Medians, and Quantiles 2. Some Useful Asymptotic Results 3. Quantile Regression

More information

Rank minimization via the γ 2 norm

Rank minimization via the γ 2 norm Rank minimization via the γ 2 norm Troy Lee Columbia University Adi Shraibman Weizmann Institute Rank Minimization Problem Consider the following problem min X rank(x) A i, X b i for i = 1,..., k Arises

More information

Linear Algebra Massoud Malek

Linear Algebra Massoud Malek CSUEB Linear Algebra Massoud Malek Inner Product and Normed Space In all that follows, the n n identity matrix is denoted by I n, the n n zero matrix by Z n, and the zero vector by θ n An inner product

More information

VISCOSITY SOLUTIONS. We follow Han and Lin, Elliptic Partial Differential Equations, 5.

VISCOSITY SOLUTIONS. We follow Han and Lin, Elliptic Partial Differential Equations, 5. VISCOSITY SOLUTIONS PETER HINTZ We follow Han and Lin, Elliptic Partial Differential Equations, 5. 1. Motivation Throughout, we will assume that Ω R n is a bounded and connected domain and that a ij C(Ω)

More information

Delta Theorem in the Age of High Dimensions

Delta Theorem in the Age of High Dimensions Delta Theorem in the Age of High Dimensions Mehmet Caner Department of Economics Ohio State University December 15, 2016 Abstract We provide a new version of delta theorem, that takes into account of high

More information

Estimation of Random Coefficients Logit Demand Models with Interactive Fixed Effects

Estimation of Random Coefficients Logit Demand Models with Interactive Fixed Effects Estimation of Random Coefficients Logit Demand Models with Interactive Fixed Effects Hyungsik Roger Moon Matthew Shum Martin Weidner First draft: October 2009; This draft: October 4, 205 Abstract We extend

More information

Spectral inequalities and equalities involving products of matrices

Spectral inequalities and equalities involving products of matrices Spectral inequalities and equalities involving products of matrices Chi-Kwong Li 1 Department of Mathematics, College of William & Mary, Williamsburg, Virginia 23187 (ckli@math.wm.edu) Yiu-Tung Poon Department

More information

Nonparametric panel data models with interactive fixed effects

Nonparametric panel data models with interactive fixed effects Nonparametric panel data models with interactive fixed effects Joachim Freyberger Department of Economics, University of Wisconsin - Madison November 3, 2012 Abstract This paper studies nonparametric panel

More information

Quantile Regression for Panel Data Models with Fixed Effects and Small T : Identification and Estimation

Quantile Regression for Panel Data Models with Fixed Effects and Small T : Identification and Estimation Quantile Regression for Panel Data Models with Fixed Effects and Small T : Identification and Estimation Maria Ponomareva University of Western Ontario May 8, 2011 Abstract This paper proposes a moments-based

More information

Section 3.9. Matrix Norm

Section 3.9. Matrix Norm 3.9. Matrix Norm 1 Section 3.9. Matrix Norm Note. We define several matrix norms, some similar to vector norms and some reflecting how multiplication by a matrix affects the norm of a vector. We use matrix

More information

Specification Test for Instrumental Variables Regression with Many Instruments

Specification Test for Instrumental Variables Regression with Many Instruments Specification Test for Instrumental Variables Regression with Many Instruments Yoonseok Lee and Ryo Okui April 009 Preliminary; comments are welcome Abstract This paper considers specification testing

More information

A Factor Analytical Method to Interactive Effects Dynamic Panel Models with or without Unit Root

A Factor Analytical Method to Interactive Effects Dynamic Panel Models with or without Unit Root A Factor Analytical Method to Interactive Effects Dynamic Panel Models with or without Unit Root Joakim Westerlund Deakin University Australia March 19, 2014 Westerlund (Deakin) Factor Analytical Method

More information

Proximal Gradient Descent and Acceleration. Ryan Tibshirani Convex Optimization /36-725

Proximal Gradient Descent and Acceleration. Ryan Tibshirani Convex Optimization /36-725 Proximal Gradient Descent and Acceleration Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: subgradient method Consider the problem min f(x) with f convex, and dom(f) = R n. Subgradient method:

More information

Lecture 6: Dynamic panel models 1

Lecture 6: Dynamic panel models 1 Lecture 6: Dynamic panel models 1 Ragnar Nymoen Department of Economics, UiO 16 February 2010 Main issues and references Pre-determinedness and endogeneity of lagged regressors in FE model, and RE model

More information

Computational and Statistical Aspects of Statistical Machine Learning. John Lafferty Department of Statistics Retreat Gleacher Center

Computational and Statistical Aspects of Statistical Machine Learning. John Lafferty Department of Statistics Retreat Gleacher Center Computational and Statistical Aspects of Statistical Machine Learning John Lafferty Department of Statistics Retreat Gleacher Center Outline Modern nonparametric inference for high dimensional data Nonparametric

More information

High-dimensional Statistics

High-dimensional Statistics High-dimensional Statistics Pradeep Ravikumar UT Austin Outline 1. High Dimensional Data : Large p, small n 2. Sparsity 3. Group Sparsity 4. Low Rank 1 Curse of Dimensionality Statistical Learning: Given

More information

A direct formulation for sparse PCA using semidefinite programming

A direct formulation for sparse PCA using semidefinite programming A direct formulation for sparse PCA using semidefinite programming A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley Available online at www.princeton.edu/~aspremon

More information

Matrix estimation by Universal Singular Value Thresholding

Matrix estimation by Universal Singular Value Thresholding Matrix estimation by Universal Singular Value Thresholding Courant Institute, NYU Let us begin with an example: Suppose that we have an undirected random graph G on n vertices. Model: There is a real symmetric

More information

Estimation of Random Coefficients Logit Demand Models with Interactive Fixed Effects

Estimation of Random Coefficients Logit Demand Models with Interactive Fixed Effects Estimation of Random Coefficients Logit Demand Models with Interactive Fixed Effects Hyungsik Roger Moon Matthew Shum Martin Weidner First draft: October 2009; This draft: September 5, 204 Abstract We

More information

Lecture 10: Panel Data

Lecture 10: Panel Data Lecture 10: Instructor: Department of Economics Stanford University 2011 Random Effect Estimator: β R y it = x itβ + u it u it = α i + ɛ it i = 1,..., N, t = 1,..., T E (α i x i ) = E (ɛ it x i ) = 0.

More information

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2013 PROBLEM SET 2

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2013 PROBLEM SET 2 STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2013 PROBLEM SET 2 1. You are not allowed to use the svd for this problem, i.e. no arguments should depend on the svd of A or A. Let W be a subspace of C n. The

More information

Program Evaluation with High-Dimensional Data

Program Evaluation with High-Dimensional Data Program Evaluation with High-Dimensional Data Alexandre Belloni Duke Victor Chernozhukov MIT Iván Fernández-Val BU Christian Hansen Booth ESWC 215 August 17, 215 Introduction Goal is to perform inference

More information

Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001)

Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001) Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001) Presented by Yang Zhao March 5, 2010 1 / 36 Outlines 2 / 36 Motivation

More information

Spring 2017 Econ 574 Roger Koenker. Lecture 14 GEE-GMM

Spring 2017 Econ 574 Roger Koenker. Lecture 14 GEE-GMM University of Illinois Department of Economics Spring 2017 Econ 574 Roger Koenker Lecture 14 GEE-GMM Throughout the course we have emphasized methods of estimation and inference based on the principle

More information

Learning discrete graphical models via generalized inverse covariance matrices

Learning discrete graphical models via generalized inverse covariance matrices Learning discrete graphical models via generalized inverse covariance matrices Duzhe Wang, Yiming Lv, Yongjoon Kim, Young Lee Department of Statistics University of Wisconsin-Madison {dwang282, lv23, ykim676,

More information

Panel Threshold Regression Models with Endogenous Threshold Variables

Panel Threshold Regression Models with Endogenous Threshold Variables Panel Threshold Regression Models with Endogenous Threshold Variables Chien-Ho Wang National Taipei University Eric S. Lin National Tsing Hua University This Version: June 29, 2010 Abstract This paper

More information

Short T Panels - Review

Short T Panels - Review Short T Panels - Review We have looked at methods for estimating parameters on time-varying explanatory variables consistently in panels with many cross-section observation units but a small number of

More information

1 Motivation for Instrumental Variable (IV) Regression

1 Motivation for Instrumental Variable (IV) Regression ECON 370: IV & 2SLS 1 Instrumental Variables Estimation and Two Stage Least Squares Econometric Methods, ECON 370 Let s get back to the thiking in terms of cross sectional (or pooled cross sectional) data

More information

SPARSE MODELS AND METHODS FOR OPTIMAL INSTRUMENTS WITH AN APPLICATION TO EMINENT DOMAIN

SPARSE MODELS AND METHODS FOR OPTIMAL INSTRUMENTS WITH AN APPLICATION TO EMINENT DOMAIN SPARSE MODELS AND METHODS FOR OPTIMAL INSTRUMENTS WITH AN APPLICATION TO EMINENT DOMAIN A. BELLONI, D. CHEN, V. CHERNOZHUKOV, AND C. HANSEN Abstract. We develop results for the use of LASSO and Post-LASSO

More information

Introduction to Quantitative Techniques for MSc Programmes SCHOOL OF ECONOMICS, MATHEMATICS AND STATISTICS MALET STREET LONDON WC1E 7HX

Introduction to Quantitative Techniques for MSc Programmes SCHOOL OF ECONOMICS, MATHEMATICS AND STATISTICS MALET STREET LONDON WC1E 7HX Introduction to Quantitative Techniques for MSc Programmes SCHOOL OF ECONOMICS, MATHEMATICS AND STATISTICS MALET STREET LONDON WC1E 7HX September 2007 MSc Sep Intro QT 1 Who are these course for? The September

More information

Duke University, Department of Electrical and Computer Engineering Optimization for Scientists and Engineers c Alex Bronstein, 2014

Duke University, Department of Electrical and Computer Engineering Optimization for Scientists and Engineers c Alex Bronstein, 2014 Duke University, Department of Electrical and Computer Engineering Optimization for Scientists and Engineers c Alex Bronstein, 2014 Linear Algebra A Brief Reminder Purpose. The purpose of this document

More information

Estimation of random coefficients logit demand models with interactive fixed effects

Estimation of random coefficients logit demand models with interactive fixed effects Estimation of random coefficients logit demand models with interactive fixed effects Hyungsik Roger Moon Matthew Shum Martin Weidner The Institute for Fiscal Studies Department of Economics, UCL cemmap

More information

A Robust Approach to Estimating Production Functions: Replication of the ACF procedure

A Robust Approach to Estimating Production Functions: Replication of the ACF procedure A Robust Approach to Estimating Production Functions: Replication of the ACF procedure Kyoo il Kim Michigan State University Yao Luo University of Toronto Yingjun Su IESR, Jinan University August 2018

More information

Applied Health Economics (for B.Sc.)

Applied Health Economics (for B.Sc.) Applied Health Economics (for B.Sc.) Helmut Farbmacher Department of Economics University of Mannheim Autumn Semester 2017 Outlook 1 Linear models (OLS, Omitted variables, 2SLS) 2 Limited and qualitative

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Notes for EE7C (Spring 08): Convex Optimization and Approximation Instructor: Moritz Hardt Email: hardt+ee7c@berkeley.edu Graduate Instructor: Max Simchowitz Email: msimchow+ee7c@berkeley.edu October

More information

ON ILL-POSEDNESS OF NONPARAMETRIC INSTRUMENTAL VARIABLE REGRESSION WITH CONVEXITY CONSTRAINTS

ON ILL-POSEDNESS OF NONPARAMETRIC INSTRUMENTAL VARIABLE REGRESSION WITH CONVEXITY CONSTRAINTS ON ILL-POSEDNESS OF NONPARAMETRIC INSTRUMENTAL VARIABLE REGRESSION WITH CONVEXITY CONSTRAINTS Olivier Scaillet a * This draft: July 2016. Abstract This note shows that adding monotonicity or convexity

More information

Linear Algebra for Machine Learning. Sargur N. Srihari

Linear Algebra for Machine Learning. Sargur N. Srihari Linear Algebra for Machine Learning Sargur N. srihari@cedar.buffalo.edu 1 Overview Linear Algebra is based on continuous math rather than discrete math Computer scientists have little experience with it

More information

Total Least Squares Approach in Regression Methods

Total Least Squares Approach in Regression Methods WDS'08 Proceedings of Contributed Papers, Part I, 88 93, 2008. ISBN 978-80-7378-065-4 MATFYZPRESS Total Least Squares Approach in Regression Methods M. Pešta Charles University, Faculty of Mathematics

More information

The exact bias of S 2 in linear panel regressions with spatial autocorrelation SFB 823. Discussion Paper. Christoph Hanck, Walter Krämer

The exact bias of S 2 in linear panel regressions with spatial autocorrelation SFB 823. Discussion Paper. Christoph Hanck, Walter Krämer SFB 83 The exact bias of S in linear panel regressions with spatial autocorrelation Discussion Paper Christoph Hanck, Walter Krämer Nr. 8/00 The exact bias of S in linear panel regressions with spatial

More information

Econometric Methods for Panel Data

Econometric Methods for Panel Data Based on the books by Baltagi: Econometric Analysis of Panel Data and by Hsiao: Analysis of Panel Data Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies

More information

Least squares under convex constraint

Least squares under convex constraint Stanford University Questions Let Z be an n-dimensional standard Gaussian random vector. Let µ be a point in R n and let Y = Z + µ. We are interested in estimating µ from the data vector Y, under the assumption

More information

Gaussian Graphical Models and Graphical Lasso

Gaussian Graphical Models and Graphical Lasso ELE 538B: Sparsity, Structure and Inference Gaussian Graphical Models and Graphical Lasso Yuxin Chen Princeton University, Spring 2017 Multivariate Gaussians Consider a random vector x N (0, Σ) with pdf

More information

Introduction to Numerical Linear Algebra II

Introduction to Numerical Linear Algebra II Introduction to Numerical Linear Algebra II Petros Drineas These slides were prepared by Ilse Ipsen for the 2015 Gene Golub SIAM Summer School on RandNLA 1 / 49 Overview We will cover this material in

More information

Matrix Completion Methods. for Causal Panel Data Models

Matrix Completion Methods. for Causal Panel Data Models Matrix Completion Methods for Causal Panel Data Models Susan Athey, Mohsen Bayati, Nikolay Doudchenko, Guido Imbens, & Khashayar Khosravi (Stanford University) NBER Summer Institute Cambridge, July 27th,

More information