Nonparametric Instrumental Variables Identification and Estimation of Nonseparable Panel Models

Bradley Setzler*

December 8, 2016

Abstract

This paper considers identification and estimation of ceteris paribus effects of continuous regressors in nonseparable panel models with instrumental variables. Building on the insight that the passing of time creates a triangular structure in which shocks realized in the future are excluded at present, a novel recursive control function is shown to control for persistence in the unobservables. The recursive control function achieves nonparametric identification for any number of time periods and endogenous regressors, without restrictions on the joint dependence of unobservables over time or time homogeneity, but requires that the instruments satisfy a full support condition. It is then shown that a semiparametric version of the recursive control function identifies the model under semiparametric shape restrictions even if the instruments have small support. Semiparametric estimators based on quantile regression are introduced that do not suffer from a curse of dimensionality and perform well in small samples. In a novel empirical application to the population of matched firms and workers in Norway, the ceteris paribus elasticities of firm production with respect to capital and labor are estimated without separability of productivity shocks from capital or labor.

* PhD Candidate, Department of Economics, University of Chicago, bradley.setzler@gmail.com. I acknowledge the support of the NSF Graduate Research Fellowship and the NIH National Institute on Aging Predoctoral Fellowship. I am grateful to Stéphane Bonhomme, Magne Mogstad, Azeem Shaikh, and Alex Torgovitsky for valuable input and guidance.

JEL classification: C14, C33, C36, C51

Keywords: Control variables, Instrumental variables, Nonseparable model, Nonparametric identification, Panel data, Production function

Contents

1 Introduction
2 Illustration of the Identification Approach
  2.1 Motivation from Heterogeneous Firm Productivity
  2.2 Illustration of the Identification Approach
  2.3 Discussion and Preview of General Results
3 Nonparametric Identification
  3.1 Triangular Panel Model
  3.2 Identifying Assumptions
  3.3 Identification Results
4 Semiparametric Identification and Estimation
  4.1 Random Coefficient and Markov Chain Assumptions
  4.2 Identification with Small-support Instruments
  4.3 Practical Estimators and Monte Carlo Study
5 Data and Empirical Results
  5.1 Norwegian Tax Records on Firms and Workers
  5.2 Empirical Results
6 Conclusions
References
A Appendix
  A.1 Proof of Theorem 1
  A.2 Proof of Theorem 2
  A.3 Proof of Theorem
  A.4 Proof of Theorem

1 Introduction

This paper is concerned with the identification of the ceteris paribus effect of an input $X_t$ on an output $Y_t$ at time $t$. Such an effect is challenging to identify when the unobservable determinants of $Y_t$ enter nonseparably and depend on the unobservable determinants of $X_t$. An existing literature achieves identification of this effect by assuming that the distribution of the nonseparable unobservables and their relationship to $X_t$ and $Y_t$ are constant over time, which Chernozhukov et al. (2013, 2015) refer to as time being randomly assigned, or time-homogeneity. This assumption is satisfied when all dependence over time (persistence) in the nonseparable unobservables is due to time-invariant unobservables (fixed effects), which I will refer to as the fixed effects assumption. The fixed effects assumption is central to identification results in nonseparable panel models by Altonji and Matzkin (2005), Bester and Hansen (2012), Graham and Powell (2012), and Chernozhukov et al. (2013, 2015). However, the fixed effects assumption rules out many common persistence structures, including first-order Markov chains (e.g., the random walk) and, more generally, $k$th-order Markov chains (e.g., the $k$th-order autoregressive process). This literature also rules out impulse responses, that is, ceteris paribus effects of $X_t$ or of the unobservables at $t$ on $Y_{t'}$, $t' > t$. By contrast, this paper considers the case in which the relationship between $X_t$, $Y_t$, and the unobservables changes over time: new unobservables arrive at $t$, and both $X_t$ and $Y_t$ depend on new as well as lagged unobservables that arrived before $t$. New and lagged unobservables have unrestricted dependence, so persistence may be due to fixed effects, random walks, $k$th-order autoregressions, or any other joint distribution of unobservables over time. Furthermore, $X_t$ and the unobservables at $t$ may affect $Y_{t'}$, $t' > t$, allowing for impulse responses. Any of these relationships may vary with time without restriction. The model may grow in complexity over time, as more observables and unobservables accumulate.

While the weak restrictions considered in this paper violate time-homogeneity in general and the fixed effects assumption in particular, this paper achieves identification using the triangularity generated by time: unobservables that arrive after $t$ are excluded from the relationship between $X_t$ and $Y_t$. Results from the literature on the identification of triangular models in cross-sectional contexts using instrumental variables, especially results by Imbens and Newey (2009), are extended to identify the triangular panel model using a recursive control function, introduced here, that controls for the persistence in unobservables over time. Identification is achieved for any dimension of the inputs $X_t$ and any number of time periods, and the approach simplifies to existing cross-sectional identification results when there is only one time period. The outcome models identified in this paper include those identified in the existing literature cited above, as shown explicitly through examples in the text. However, like the related cross-sectional literature, my approach requires assumptions on the input equations, such as monotonic responses and the availability of instrumental variables, which are not required in the literature that imposes the fixed effects assumption. My identification approach also requires observing the beginning time period of the process, so that there exists an observed time period at which there are no lagged unobservables to consider.
Identification from the nonparametric recursive control function introduced here places strong demands on the data, as it requires that the initial time period be observed and that the data include instrumental variables satisfying a full support condition. Any corresponding estimator that controls for all lagged unobservables will necessarily suffer a curse of dimensionality, as the number of lagged unobservables grows with time. As a practical solution, I show that a semiparametric version of the control function identifies the nonseparable panel model under shape restrictions that maintain the qualitative features of interest, even when the instruments have small support, and without a curse of dimensionality. I do so by extending to a panel context the cross-sectional insight of Ma and Koenker (2006), Jun (2009), and, especially, Masten and Torgovitsky (2016), and by imposing that the unobservables satisfy the $k$th-order Markov assumption, where $k$ is finite. In small-sample Monte Carlo studies with binary instruments, the estimator is shown to suffer little bias, while other potential estimators are shown to be severely biased, with biases that worsen over time. Note that, in both the nonparametric and semiparametric approaches, identification is proven separately for the case in which lagged unobservables in the outcome equation also enter the input equation, and for the case in which measurement error in the output equation is of infinite dimension.

In an empirical application, I study the productivity of capital and labor in the population of firms in Norway using a 12-year panel of administrative balance sheet data matched to tax records for the population of workers in Norway. The existing literature on firm productivity under endogeneity assumes that the elasticities of firm production with respect to capital and labor are constant across firms, a property inherent to the traditional Cobb-Douglas specification of the firm production function (Marschak and Andrews, 1944). I relax this assumption by allowing these elasticities to depend nonseparably on unobserved firm productivity, so that firm productivity shocks may be labor-augmenting, capital-augmenting, or both. Like Olley and Pakes (1996) and the related literature, I control flexibly for unobserved productivity using monotonicity of the input equations. Unlike Olley and Pakes (1996) and the related literature, I permit nonseparability of productivity shocks, allow productivity shocks to have greater than first-order Markov dependence, and allow the firm to observe its own productivity shock imperfectly.
2 Illustration of the Identification Approach

2.1 Motivation from Heterogeneous Firm Productivity

To introduce notation, motivate the relaxation of time-homogeneity, and introduce the empirical context, consider the model of Olley and Pakes (1996), but with heterogeneity added to the production function as well as to the information set. For simplicity, ignore the depreciation of capital, the exit option of firms, and exogenous covariates. Denote $Y_t$ as revenues, $X_t$ as the capital stock, and $U_t$ as unobserved productivity, which is assumed to be continuously distributed and scalar. The firm's production function is

$$Y_t = g_t(X_t, U_t, \epsilon_t), \tag{2.1}$$

where $\epsilon_t$ denotes i.i.d. measurement error, which may be vector-valued. Combining equations (1) and (3) from Olley and Pakes (1996), the profit-maximizing firm solves

$$V_t(X_t, U_t) = \max_{X_{t+1}} E\left[ Y_t - C_t(X_{t+1} - X_t, Z_t) + \beta V_{t+1}(X_{t+1}, U_{t+1}) \mid J_t \right], \tag{2.2}$$

where $C_t$ is the cost function, $Z_t$ is the price of capital, $\beta$ is the discount rate, and $J_t$ is the information set. The solution to the firm's problem is

$$X_{t+1} = h_t(X_t, Z_t, J_t), \tag{2.3}$$

which is equation (5) from Olley and Pakes (1996), but with a general information set.

Consider three cases that motivate the identification challenge addressed by this paper, which existing approaches cannot address. For the first case, suppose $U_t$ follows a first-order Markov chain and $U_t \in J_t$. Then, (2.1) and (2.3) have the form

$$Y_t = g_t(X_t, U_t, \epsilon_t), \qquad X_t = h_{t-1}(X_{t-1}, Z_{t-1}, U_{t-1}), \tag{2.4}$$

which is equivalent to the model of Olley and Pakes (1996) when $g_t$ has the Cobb-Douglas specification and $Z_t$ is constant across firms. The identification approach of Olley and Pakes (1996), Levinsohn and Petrin (2003), and a related literature uses the insight that the only unobservable in the output equation that may depend on $X_t$, namely $U_t$, also appears in the input equation for $X_{t+1}$. This motivates isolating $U_t$ from an input equation so that it can be controlled for as if it were observable when estimating the output equation. In practice, the equations for optimal investment or material expenditure at $t$ are often used instead of $X_{t+1}$ to isolate $U_t$; see the survey by Ackerberg et al. (2015).
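Assuming, as in Olley and Pakes (1996), that the input policy is strictly increasing in productivity, the isolation step for the first case can be written as an inversion of the input equation in (2.4) shifted forward one period, where $h_t^{-1}(\cdot\,; x, z)$ denotes the inverse of the map $u \mapsto h_t(x, z, u)$:

$$X_{t+1} = h_t(X_t, Z_t, U_t) \quad \Longrightarrow \quad U_t = h_t^{-1}(X_{t+1}; X_t, Z_t),$$

$$Y_t = g_t\big(X_t,\; h_t^{-1}(X_{t+1}; X_t, Z_t),\; \epsilon_t\big).$$

Conditional on $(X_t, X_{t+1}, Z_t)$, productivity $U_t$ is therefore pinned down, which is what allows it to be treated as an observable control in the output equation.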

For the second case, suppose $U_t \notin J_t$ but $V_t \in J_{t-1}$, where $V_t$ is a signal about $U_t$. For example, $V_t = U_t + \zeta_t$, where $\zeta_t$ is an i.i.d. measurement error. Then, (2.1) and (2.3) become

$$Y_t = g_t(X_t, U_t, \epsilon_t), \qquad X_t = h_{t-1}(X_{t-1}, Z_{t-1}, V_t). \tag{2.5}$$

This differs from the model of Olley and Pakes (1996) in that the unobservable in the output equation that may depend on $X_t$, namely $U_t$, is not identical to the unobservable that can be isolated from the input equation for $X_{t+1}$, namely $V_{t+1}$, so the identification approach from this literature does not apply. Instead, (2.5) is the triangular model developed in a cross-sectional context by Chesher (2003) and Imbens and Newey (2009), where the price of capital, $Z_t$, is now interpreted as an excluded instrument. Briefly, if $h_{t-1}$ is monotonic in $V_t$ and $Z_{t-1} \perp (V_t, U_t)$, then (2.5) satisfies the key identifying assumptions of Imbens and Newey (2009) when $Z_{t-1}$ also satisfies a full-support condition, and those of Torgovitsky (2015) when $\epsilon_t$ is excluded from $g_t$ and $Z_t$ varies at each value in the support of $X_t$.

For the final case, consider again the second case, but allow the process $U_t$ to be a Markov chain of unrestricted order ($k = \infty$). The firm's solution is

$$Y_t = g_t(X_t, U_t, \epsilon_t), \qquad X_t = h_{t-1}(X_{t-1}, Z_{t-1}, V_1, V_2, \ldots, V_t), \tag{2.6}$$

where the solution includes all lags of $V_t$ because, e.g., $V_1$ is a predictor of $U_1$ and $U_1$ is a predictor of $U_t$ when $k = \infty$. Model (2.6) differs from the model of Chesher (2003) and Imbens and Newey (2009) in that there are multiple unobservables in the input equation for $X_t$. If $U_t$ were a fixed effect, i.e., $U_t = U$ for all $t$, then the identification approach of Chernozhukov et al. (2013, 2015) would apply. Otherwise, this model has not previously been identified without separability; it is identified in this paper.
This paper also identifies more general cases that are difficult to motivate in the firm production model but may be relevant in other contexts, e.g., allowing $V_t$, $U_t$, and $X_t$ to appear in the equation for $Y_{t'}$, $t' > t$.

2.2 Illustration of the Identification Approach

Here, I consider a correlated random coefficients (Heckman and Vytlacil, 1998) specification of the firm productivity model in (2.6). I show identification in two time periods with a binary instrument to illustrate the key insight of this paper in a simplified context. Omitting the measurement error $\epsilon_t$ for simplicity, a random coefficients specification of the firm production function in (2.1) is

$$Y_t = A_t(U_t) + B_t(U_t) X_t, \tag{2.7}$$

where $A_t(U_t) = U_t + \alpha_t$, $B_t(U_t) = U_t + \beta_t$, $U_t$ is an unobserved random variable normalized to have mean zero, and $\alpha_t > 0$, $\beta_t > 0$ are constants. The parameter of interest is the average marginal effect of $X_t$ on $Y_t$, defined as $E[B_t(U_t)]$, which equals $\beta_t$ since $U_t$ has mean zero.

If $X_t \perp U_t$, then linear regression of $Y_t$ on a constant and $X_t$ identifies $\beta_t$ as the slope on $X_t$ (Amemiya, 1985). Formally, if $X_t \perp U_t$, the OLS estimator of $\beta_t$ has probability limit

$$\frac{\mathrm{Cov}(Y_t, X_t)}{\mathrm{Var}(X_t)} = \frac{\mathrm{Cov}\big(U_t + \alpha_t + (U_t + \beta_t) X_t,\, X_t\big)}{\mathrm{Var}(X_t)} = \frac{\mathrm{Cov}\big((U_t + \beta_t) X_t,\, X_t\big)}{\mathrm{Var}(X_t)} = \beta_t, \tag{2.8}$$

where the second equality uses $\mathrm{Cov}(U_t, X_t) = 0$ and the third equality uses

$$\mathrm{Cov}\big((U_t + \beta_t) X_t,\, X_t\big) = E\big[(U_t + \beta_t) X_t^2\big] - E\big[(U_t + \beta_t) X_t\big] E[X_t] = \beta_t \mathrm{Var}(X_t), \tag{2.9}$$

since $E[U_t X_t] = 0$ and $E[U_t X_t^2] = 0$.

The identification challenge is that $X_t$ depends on $U_t$. However, suppose there exists a control function $R_t$ such that $X_t \perp U_t \mid R_t = r_t$. Then, if $\mathrm{Var}(X_t \mid R_t = r_t) > 0$, OLS local to $R_t = r_t$ has probability limit

$$\frac{\mathrm{Cov}(Y_t, X_t \mid R_t = r_t)}{\mathrm{Var}(X_t \mid R_t = r_t)} = \frac{\mathrm{Cov}\big(U_t + \alpha_t + (U_t + \beta_t) X_t,\, X_t \mid R_t = r_t\big)}{\mathrm{Var}(X_t \mid R_t = r_t)} = \beta_t + E[U_t \mid R_t = r_t], \tag{2.10}$$

since conditional independence implies $\mathrm{Cov}(U_t, X_t \mid R_t = r_t) = 0$ and $\mathrm{Cov}(U_t X_t, X_t \mid R_t = r_t) = E[U_t \mid R_t = r_t]\big(E[X_t^2 \mid R_t = r_t] - E[X_t \mid R_t = r_t]^2\big) = E[U_t \mid R_t = r_t]\,\mathrm{Var}(X_t \mid R_t = r_t)$. Averaging (2.10) over the distribution of $R_t$ then identifies $\beta_t$, because $E\big[E[U_t \mid R_t]\big] = E[U_t] = 0$.

In order to recover a control function $R_t$ satisfying $X_t \perp U_t \mid R_t = r_t$, consider the equations that determine $X_t$. Let $Z_t \in \{0, 1\}$, and let $V_1$ and $V_2$ have any continuous joint distribution.
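The role of the control function in (2.8) through (2.10) can be checked by simulation. The sketch below is illustrative and is not the paper's estimator. It assumes a hypothetical data-generating process: $\alpha_t = 1$, $\beta_t = 2$, an input unobservable $V_t \sim \mathrm{Uniform}(-0.5, 0.5)$, $U_t = V_t$ plus small Gaussian noise, a binary instrument $Z_t$, and a first stage $X_t = (V_t + 1)(1 + Z_t)$. The control $R_t = F_{V_t}(V_t)$ is treated as known (an oracle) and binned; each within-bin OLS slope estimates $E[B_t(U_t) \mid R_t \text{ in the bin}]$, and the bin-size-weighted average of these slopes targets $\beta_t$. The helper names `ols_slope` and `control_function_slope` are hypothetical.

```python
import numpy as np


def ols_slope(x, y):
    """Slope from regressing y on a constant and x: Cov(y, x) / Var(x)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


def control_function_slope(x, y, r, n_bins=20):
    """Frequency-weighted average of OLS slopes computed within bins of r.

    Each within-bin slope estimates E[B_t | R_t in bin]; weighting by bin
    size averages these over the distribution of R_t.
    """
    bins = np.minimum((r * n_bins).astype(int), n_bins - 1)
    slopes, weights = [], []
    for b in range(n_bins):
        in_bin = bins == b
        # Support condition: X must still vary once R_t is held (nearly) fixed.
        if in_bin.sum() > 2 and np.var(x[in_bin]) > 0:
            slopes.append(ols_slope(x[in_bin], y[in_bin]))
            weights.append(in_bin.sum())
    return np.average(slopes, weights=weights)


rng = np.random.default_rng(0)
n, alpha, beta = 200_000, 1.0, 2.0
v = rng.uniform(-0.5, 0.5, n)          # hypothetical input unobservable V_t
u = v + 0.1 * rng.standard_normal(n)   # output unobservable U_t: mean zero, depends on V_t
z = rng.integers(0, 2, n)              # binary instrument, independent of (U_t, V_t)

# Case 1: X_t independent of U_t, so plain OLS recovers beta as in (2.8)-(2.9).
x_ind = (rng.uniform(-0.5, 0.5, n) + 1.0) * (1.0 + z)
y_ind = (u + alpha) + (u + beta) * x_ind
ols_indep = ols_slope(x_ind, y_ind)

# Case 2: X_t strictly increasing in V_t, hence dependent on U_t; plain OLS is biased.
x = (v + 1.0) * (1.0 + z)
y = (u + alpha) + (u + beta) * x
ols_endog = ols_slope(x, y)

# Oracle control R_t = F_V(V_t) = V_t + 0.5; within a bin, X_t varies through Z_t.
r = v + 0.5
cf = control_function_slope(x, y, r)

print(f"OLS, X independent of U:   {ols_indep:.3f}")
print(f"OLS, X dependent on U:     {ols_endog:.3f}")
print(f"Averaged local OLS slopes: {cf:.3f}")
```

With these settings, plain OLS should be close to 2 when $X_t$ is independent of $U_t$ and well above 2 when it is not, while the averaged local slopes should again be close to 2. The binary instrument is what keeps $\mathrm{Var}(X_t \mid R_t = r_t) > 0$ within each bin.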
A random coefficients specification of (2.6) at $t = 1$ is

$$X_1 = \tau_1(V_1) + \theta_1(V_1) Z_0, \tag{2.11}$$

where $\tau_1(V_1) = V_1 + \tau_1$, $\gamma_1(V_1) = V_1 + \gamma_1$, and $\theta_1(V_1) = V_1 + \theta_1$, and a random coefficients specification of (2.6) at $t = 2$ is

$$X_2 = \tau_2(V_1, V_2) + \gamma_2(V_1, V_2) X_1 + \theta_2(V_1, V_2) Z_1, \tag{2.12}$$

where $\tau_2(V_1, V_2) = V_2 + \tau_2 + \psi_2(V_1)$, $\gamma_2(V_1, V_2) = V_1 + V_2 + \gamma_2$, and $\theta_2(V_1, V_2) = V_1 + V_2 + \theta_2$.

To identify $\beta_1$, notice that the rank of $X_1$ given $Z_0$ is equal to the rank of $V_1$ given $Z_0$, because $X_1$ is a strictly increasing function of $V_1$ given $Z_0$. Formally, denoting $F_{A \mid B}(a \mid b) \equiv \Pr(A \le a \mid B = b)$ for any random variables $A$ and $B$, the rank of $X_1$ given $Z_0$ is defined by $R_1 \equiv F_{X_1 \mid Z_0}(X_1 \mid Z_0)$. The monotonicity of the input equation in $V_1$ implies $R_1 = F_{V_1 \mid Z_0}(V_1 \mid Z_0)$. Suppose also that $Z_0 \perp (V_1, U_1) \mid X_0$. Then, the conditioning on $Z_0$ can be dropped, so that $R_1 = F_{V_1}(V_1)$. Thus, the data $(X_1, Z_0)$ are sufficient to recover the unconditional rank of $V_1$. Because $V_1$ is continuously distributed, $F_{V_1}^{-1}$ exists. Fixing $R_1 = r_1$, there exists $v_1$ such that $v_1 = F_{V_1}^{-1}(r_1)$. Finally, $X_1 \perp U_1 \mid V_1 = v_1$ because all dependence between $X_1$ and $U_1$ is due to $V_1$, and conditioning on $R_1 = r_1$ is equivalent to conditioning on $V_1 = v_1$, so $X_1 \perp U_1 \mid R_1 = r_1$, and $\beta_1$ is identified by (2.10), subject to the support condition that $\mathrm{Var}(X_1 \mid R_1 = r_1) > 0$ for all $r_1 \in [0, 1]$. See Masten and Torgovitsky (2016) for additional details about and extensions to this result.

The previous result is known, while the following result is, to my knowledge, novel. In order to identify $\beta_2$, notice that $X_2$ is a monotonic function of $V_2$ given not only the observables $X_1$ and $Z_1$ but also $V_1$, so that $F_{X_2 \mid V_1, X_1, Z_1}(X_2 \mid V_1, X_1, Z_1) = F_{V_2 \mid V_1, X_1, Z_1}(V_2 \mid V_1, X_1, Z_1)$. While $V_1$ is unobservable, it was shown above that conditioning on $V_1 = v_1$ is equivalent to conditioning on $R_1 = r_1$. Defining $R_2 \equiv F_{X_2 \mid R_1, X_1, Z_1}(X_2 \mid R_1, X_1, Z_1)$, it follows that $R_2 = F_{V_2 \mid V_1, X_1, Z_1}(V_2 \mid V_1, X_1, Z_1)$.
Since $X_1$ depends only on $V_1$ and $Z_0$, and $V_1$ is already conditioned on, $X_1$ can be replaced with $Z_0$, so that $R_2 = F_{V_2 \mid V_1, Z_0, Z_1}(V_2 \mid V_1, Z_0, Z_1)$. Suppose that $(Z_0, Z_1) \perp (U_2, V_2) \mid V_1$. Then the conditioning on $(Z_0, Z_1)$ can be dropped, so $R_2 = F_{V_2 \mid V_1}(V_2 \mid V_1)$. Thus, $(X_1, X_2, Z_0, Z_1)$ are sufficient data to recover the conditional rank of $V_2$ given $V_1$. Assuming $V_2$ is continuously distributed at each value in the support of $V_1$, $F_{V_2 \mid V_1}^{-1}$ exists. Fixing $(R_1, R_2) = (r_1, r_2)$, there exists $(v_1, v_2)$ such that $v_1 = F_{V_1}^{-1}(r_1)$ and $v_2 = F_{V_2 \mid V_1}^{-1}(r_2 \mid v_1)$. Thus, conditioning jointly on $(V_1, V_2) = (v_1, v_2)$ is equivalent to conditioning jointly on $(R_1, R_2) = (r_1, r_2)$, so $(X_1, X_2) \perp U_2 \mid (V_1, V_2)$ implies $(X_1, X_2) \perp U_2 \mid (R_1, R_2)$. Hence $(R_1, R_2)$ provides a (vector-valued) control function, and $\beta_2$ is identified by (2.10), subject to the support condition that $\mathrm{Var}(X_2 \mid R_1 = r_1, R_2 = r_2) > 0$ for all $(r_1, r_2) \in [0, 1]^2$.

This is the insight of this paper: the recursively constructed vector of control functions $(R_1, R_2)$ permits identification of the effect of $X_t$ on $Y_t$ with unrestricted persistence of the nonseparable unobservables over time, under monotonicity restrictions on the unobservables and independence and support conditions on the instruments.

2.3 Discussion and Preview of General Results

Before proceeding to general results, it is useful to briefly discuss three properties of the identification results above, which will be discussed in greater detail in a general context below.

First, the identification results above did not impose any parametric restrictions on the joint distribution of the unobservables over time. For example, it was not necessary to assume that the dependence between $V_1$ and $V_2$ is due to a fixed effect, as required by Chernozhukov et al. (2013, 2015). The fixed effect assumption is a special case: e.g., if $V_t = V + \eta_t$, where $\eta_t$ is i.i.d., the above identification arguments for $\beta_1$ and $\beta_2$ remain valid.

Second, this identification approach allows the number of nonseparable unobservables to exceed the number of instruments in the input equation, and does not impose independence of all unobservables from all instruments over time. In (2.12), there are two nonseparable unobservables, $V_1$ and $V_2$, but only one instrument, $Z_1$, and $Z_1$ may be arbitrarily dependent on $(V_1, U_1)$. It is shown below that, with $T$ time periods, identification may be achieved even when the dimension of the unobservables is many times greater than that of the instruments, and the instruments and unobservables may evolve jointly over time.
Third, identification at $t = 2$ requires that one of the unobservables, $V_2$, enter the input equation monotonically and satisfy a conditional independence condition with respect to the instrument. However, the other unobservables in the input equation need not satisfy monotonicity and may be correlated with the instrument at $t = 2$. For example, in (2.12), $\partial X_2 / \partial V_2 = 1 + X_1 + Z_1$, which is always positive because the capital stock $X_1$ is nonnegative and $Z_1 \in \{0, 1\}$, while $\partial X_2 / \partial V_1 = \psi_2'(V_1) + X_1 + Z_1$, which may be positive for some values of $V_1$ and negative for others. This may be less objectionable than the assumption that there is only one unobservable and that it enters monotonically; see the discussion by Hoderlein and Mammen (2007) and Kasy (2011).

3 Nonparametric Identification

3.1 Triangular Panel Model

This subsection presents the notation and equations required for the general statement of the triangular panel model, which is the most general model identified in this paper. Time is indexed by $t = 1, 2, \ldots, T$, where $T$ is the final observed time period. We use uppercase letters to denote random variables, lowercase letters to denote constants, and bold font to denote vectors. For any random variable $A_t$, denote the history of $A_t$ by $A^{(t)} \equiv (A_1, A_2, \ldots, A_t)$. For convenience, denote $A^{(0)} \equiv \emptyset$. The CDF of $A_t$ is denoted $F_{A_t}$, and the conditional CDF of $A_t$ given $B_t$ is denoted $F_{A_t \mid B_t}$.

The observable random variables are defined as follows: let $X_{t,k}$ denote the $k$th scalar input, $\mathbf{X}_t \equiv (X_{t,1}, X_{t,2}, \ldots, X_{t,K})$ the vector of inputs at time $t$, $Y_t$ the scalar output, $\mathbf{Z}_t$ the vector of instruments at time $t$, and $\mathbf{W}$ the vector of exogenous observable heterogeneity. The observed data are $(Y^{(T)}, \mathbf{W}, \mathbf{X}^{(T)}, \mathbf{Z}^{(T)})$.¹

The unobservable random variables are defined as follows: let $V_{t,k}$ denote the scalar unobservable heterogeneity associated with input $X_{t,k}$, and $\mathbf{V}_t \equiv (V_{t,1}, V_{t,2}, \ldots, V_{t,K})$ the vector of input heterogeneity at time $t$. Let $U_t$ denote the scalar unobservable heterogeneity associated with output $Y_t$. Let $\epsilon_t$ denote measurement error associated with $Y_t$, with possibly infinite dimension.
The triangular panel model is defined by

$$
\begin{aligned}
X_{t,k} &= h_{t,k}\big(\mathbf{W}, \mathbf{X}^{(t-1)}, \mathbf{Z}^{(t)}, U^{(t-1)}, \mathbf{V}^{(t-1)}, V_{t,k}\big), \quad k = 1, 2, \ldots, K, \\
Y_t &= g_t\big(\mathbf{W}, \mathbf{X}^{(t)}, U^{(t)}, \mathbf{V}^{(t)}, \epsilon_t\big),
\end{aligned}
\tag{3.1}
$$

where the conditional CDF of the history of the unobservables, $F_{\mathbf{V}^{(T)}, U^{(T)} \mid \mathbf{X}^{(t-1)}, \mathbf{Z}^{(t-1)}, \mathbf{V}^{(t-1)}, U^{(t-1)}}$, is unrestricted.² The set of equations for $\mathbf{X}_t$ is referred to as the first stage, or the input equations, at $t$, and the equation for $Y_t$ as the second stage, or the output equation, at $t$.

¹ All identification results will hold if there are multiple outputs or if the exogenous covariates vary over time in a known way (e.g., age increases by one unit in each time period), but I focus on a single output $Y_t$ and time-invariant exogenous observables $\mathbf{W}$ for simplicity.

Remark 2.1: Notice that in the initial period $t = 1$ the model reduces to

$$X_{1,k} = h_{1,k}(\mathbf{W}, \mathbf{Z}_1, V_{1,k}), \quad k = 1, 2, \ldots, K, \qquad Y_1 = g_1(\mathbf{W}, \mathbf{X}_1, U_1, \mathbf{V}_1, \epsilon_1). \tag{3.2}$$

If there is only one endogenous variable ($K = 1$), this is the triangular model identified by Imbens and Newey (2009), so the triangular model of Imbens and Newey (2009) can be thought of as defining the initial conditions for the triangular panel model considered here.

Remark 2.2: There are more first-stage unobservables than endogenous inputs if $t \ge 2$. In particular, $1 + (K + 1)(t - 1)$ unobservables appear in $h_{t,k}$, which is greater than $K$ if $t \ge 2$ and increases by $K + 1$ in each time period. The second stage similarly adds $K + 1$ endogenous unobservables in each period, as well as the exogenous unobservables $\epsilon_t$, which may be infinite in dimension.

Remark 2.3: Suppose that (i) the output equation is memoryless, in the sense that no lagged observables or unobservables are included; (ii) $\mathbf{V}_t = \mathbf{V}_1$ for all $t$, so that $\mathbf{V}_t$ is a fixed effect; and (iii) the only time-varying unobservables included in the output equation are purely transitory (i.e., $U_t = 0$ for all $t$). Then the output equation of (3.1) reduces to

$$Y_t = g_t(\mathbf{W}, \mathbf{X}_t, \mathbf{V}_1, \epsilon_t), \tag{3.3}$$

which is the nonseparable fixed effects model studied by Altonji and Matzkin (2005) and Chernozhukov et al. (2013, 2015).
Thus, the output model in (3.1) is strictly more general than the nonseparable fixed effects output model.

² In Skorokhod's representation, the unobservables evolve according to

$$V_{t,k} = \zeta_t^k\big(\mathbf{X}^{(t-1)}, \mathbf{Z}^{(t-1)}, \mathbf{V}^{(t-1)}, U^{(t-1)}, \nu_t^k\big), \quad k = 1, 2, \ldots, K, \qquad U_t = \zeta_t^U\big(\mathbf{X}^{(t-1)}, \mathbf{Z}^{(t-1)}, \mathbf{V}^{(t-1)}, U^{(t-1)}, \nu_t^U\big),$$

where $\nu_t^k \perp (\mathbf{V}^{(t-1)}, U^{(t-1)})$ with $\nu_t^k \sim \mathrm{Uniform}(0, 1)$ for $k = 1, 2, \ldots, K$, $\nu_t^U \perp (\mathbf{V}^{(t-1)}, \epsilon^{(t-1)})$ with $\nu_t^U \sim \mathrm{Uniform}(0, 1)$, and the functions $(\zeta_t^1, \zeta_t^2, \ldots, \zeta_t^K, \zeta_t^U)$ and the copula of $(\nu_t^1, \nu_t^2, \ldots, \nu_t^K, \nu_t^U)$ are unrestricted. This representation makes explicit that unobservables at time $t$ may depend generally on inputs and instruments from previous time periods, and may follow any persistent process, such as a random walk or a moving average process.

3.2 Identifying Assumptions

For notational convenience, define the historical conditioning set at $t$ by

$$M_t \equiv \big(\mathbf{X}^{(t-1)}, \mathbf{Z}^{(t-1)}, U^{(t-1)}, \mathbf{V}^{(t-1)}\big). \tag{3.4}$$

Identification in this section relies on the conditional rank device (or control function)

$$R_{t,k} \equiv F_{X_{t,k} \mid \mathbf{W}, \mathbf{Z}_t, M_t}(X_{t,k} \mid \mathbf{W}, \mathbf{Z}_t, M_t), \tag{3.5}$$

which extends the conditional rank device of Imbens and Newey (2009) to a panel environment. The identifying assumptions imposed on the triangular panel model are as follows:

A.1 $(U_t, \mathbf{V}_t) \perp (\mathbf{Z}_t, \mathbf{W}) \mid M_t$;
A.2 conditional on $M_t$, $V_{t,k}$ is a continuously distributed scalar with conditional distribution $F_{V_{t,k} \mid M_t}$ that is strictly increasing on its support;
A.3 $h_{t,k}$ is strictly monotonic in $V_{t,k}$ with probability one;
A.4 $\mathrm{Support}\big(\mathbf{R}^{(t)} \mid \mathbf{W} = \mathbf{w}, \mathbf{X}^{(t)} = \mathbf{x}^{(t)}\big) = [0, 1]^{tK}$ for all $(\mathbf{w}, \mathbf{x}^{(t)}) \in \mathrm{Support}\big(\mathbf{W}, \mathbf{X}^{(t)}\big)$;
A.5 $\epsilon_t \perp \epsilon_{t'}$ for all $t' \ne t$, and $\epsilon_t \perp (\mathbf{W}, M_{T+1})$; and
A.6 $E[|Y_t|] < \infty$.

A.1 extends assumption (i) of Theorem 1 of Imbens and Newey (2009) to a panel environment. Because it only requires contemporaneous independence at time $t$, it permits $\mathbf{V}_t$ to depend generally on any lagged values of the instruments. For example, in the motivating model of firms and workers discussed above, this allows labor productivity to respond to changes in capital costs, e.g., through on-the-job training.

A.2 extends assumption (ii) of Theorem 1 of Imbens and Newey (2009) to a panel environment. This assumption means that $V_{t,k}$ has a smooth distribution conditional on lagged values of observables and unobservables. This rules out models in which the error follows a discrete Markov chain with finite support, which is often assumed in empirical macroeconomic models.

A.3 extends model equation (2.2) of Imbens and Newey (2009) to a panel environment. The strength of first-stage monotonicity assumptions in cross-sectional contexts has been discussed by Hoderlein and Mammen (2007) and Torgovitsky (2015), and a test has been developed by Hoderlein et al. (2016). However, this restriction is much less severe here, as it allows for many unobservables in the first stage, $U^{(t-1)}$ and $\mathbf{V}^{(t-1)}$, that do not satisfy monotonicity, while only one unobservable, $V_{t,k}$, must satisfy monotonicity.

A.4 extends assumption 2 of Imbens and Newey (2009) to a panel environment. It requires that the instruments have sufficient variation to trace out the full support of the conditional distribution of $V_{t,k}$ at each time period. A.4 is testable and depends crucially on the instruments available in the data. Full support assumptions of this form are often found not to hold empirically, so the next section of this paper presents semiparametric model restrictions under which A.4 is not necessary; A.4 is maintained in this section.

A.5 defines explicitly what it means for $\epsilon_t$ to be measurement error in $Y_t$: it is strictly independent of all observable and unobservable model components other than $Y_t$. A.6 ensures that the outcome has a finite first moment.

Lastly, we introduce two alternative assumptions. Only one of them is imposed at a time, and substantially different identification results are achieved depending on which of the two is imposed for each $t$:

B.1 $U^{(t)}$ is excluded from $h_{t,k}$ for all $k$; and
B.2 $g_t$ excludes $\epsilon_t$ and is strictly monotonic in $U_t$ with probability one.

B.1 allows second-stage unobservables to have infinite dimension, but imposes that second-stage unobservables are excluded from the first stage. In practice, this allows us to integrate out the second-stage unobservables in order to identify averages of $g_t$, without needing to account for second-stage unobservables in the first stage.
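Under B.1, the recursive use of the conditional rank device (3.5) can be sketched for the two-period illustration of Section 2.2. The simulation below is a minimal sketch under stated assumptions, not the paper's estimator. It uses first stages of the form (2.11) and (2.12) with hypothetical values $\tau_1 = \theta_1 = \tau_2 = \gamma_2 = \theta_2 = 1$ and $\psi_2(V_1) = -V_1^2$, a hypothetical persistent process $V_2 = 0.5 V_1 + 0.5 E_2$ with $V_1, E_2 \sim \mathrm{Uniform}(0, 1)$ (so that $F_{V_2 \mid V_1}(V_2 \mid V_1) = E_2$), $U_2 = 2(V_2 - 0.5)$ plus noise, and $\beta_2 = 2$ in (2.7). Conditioning on the continuous $R_1$ is approximated by 50 bins, and the last step averages local OLS slopes over cells of $(R_1, R_2)$. The function names are hypothetical.

```python
import numpy as np


def conditional_rank(x, cell):
    """Empirical CDF of x evaluated at each observation, computed within cells."""
    ranks = np.empty(len(x))
    for c in np.unique(cell):
        in_cell = cell == c
        # argsort of argsort gives 0-based ranks; shift and scale to (0, 1].
        ranks[in_cell] = (np.argsort(np.argsort(x[in_cell])) + 1) / in_cell.sum()
    return ranks


def bin_index(r, n_bins):
    """Map ranks in [0, 1] to integer bins 0, ..., n_bins - 1."""
    return np.minimum((r * n_bins).astype(int), n_bins - 1)


def averaged_local_slope(x, y, cell):
    """Frequency-weighted average of within-cell OLS slopes of y on x."""
    slopes, weights = [], []
    for c in np.unique(cell):
        in_cell = cell == c
        if in_cell.sum() > 2 and np.var(x[in_cell]) > 0:
            slopes.append(np.cov(x[in_cell], y[in_cell])[0, 1] / np.var(x[in_cell], ddof=1))
            weights.append(in_cell.sum())
    return np.average(slopes, weights=weights)


rng = np.random.default_rng(1)
n, beta2 = 200_000, 2.0
v1 = rng.uniform(0.0, 1.0, n)
e2 = rng.uniform(0.0, 1.0, n)
v2 = 0.5 * v1 + 0.5 * e2                      # persistent: F_{V2|V1}(V2 | V1) = e2
u2 = 2.0 * (v2 - 0.5) + 0.1 * rng.standard_normal(n)  # mean zero, depends on V2
z0 = rng.integers(0, 2, n)                    # binary instruments, independent of all else
z1 = rng.integers(0, 2, n)

# Random-coefficient first stages in the form of (2.11)-(2.12); U2 is excluded (B.1).
x1 = (v1 + 1.0) + (v1 + 1.0) * z0
x2 = (v2 + 1.0 - v1**2) + (v1 + v2 + 1.0) * x1 + (v1 + v2 + 1.0) * z1
y2 = (u2 + 1.0) + (u2 + beta2) * x2           # output equation (2.7) at t = 2

# Step 1: R1 = F_{X1|Z0}(X1 | Z0).
r1 = conditional_rank(x1, z0)
# Step 2: R2 = F_{X2|R1,X1,Z1}. Given R1 and Z0, X1 is pinned down, so condition
# on (bin of R1, Z0, Z1); binning approximates conditioning on a continuous R1.
r1_bin = bin_index(r1, 50)
r2 = conditional_rank(x2, r1_bin * 4 + z0 * 2 + z1)
# Step 3: local OLS within cells of the vector control (R1, R2), then average.
cf = averaged_local_slope(x2, y2, r1_bin * 10 + bin_index(r2, 10))
naive = np.cov(x2, y2)[0, 1] / np.var(x2, ddof=1)

print(f"corr(R1, V1) = {np.corrcoef(r1, v1)[0, 1]:.4f}")
print(f"corr(R2, E2) = {np.corrcoef(r2, e2)[0, 1]:.4f}")
print(f"plain OLS slope: {naive:.3f}, averaged local slopes: {cf:.3f}")
```

The recovered ranks should track $V_1$ and $E_2$ closely, the plain OLS slope should be biased upward, and the averaged local slopes should be close to $\beta_2 = 2$. Finer bins reduce the approximation error from conditioning on $R_1$ at the cost of fewer observations per cell, which is the curse of dimensionality discussed in Section 4 in miniature.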

B.2 is analogous to the restriction imposed on the model of Imbens and Newey (2009) by Torgovitsky (2015) in order to relax the full-support condition on the ranks. However, it is less objectionable, as it allows for many unobservables in the output equation, $(U^{(t)}, \mathbf{V}^{(t)})$, with $g_t$ only required to be monotonic in one of these unobservables, $U_t$.

3.3 Identification Results

The nonparametric identification results of this paper are now presented in two theorems.

Theorem 1 (Nonparametric identification of the triangular panel model with infinite-dimensional second-stage unobservables). Suppose assumptions A.1 through A.6 and B.1 hold. Then, for each $t$,
(a) $R_{t,k} = F_{V_{t,k} \mid M_t}(V_{t,k} \mid M_t)$ for all $k$;
(b) $(U_t, \epsilon_t) \perp \mathbf{X}_t \mid \mathbf{R}^{(t)}, M_t$; and
(c) $E\big[E[Y_t \mid \mathbf{W} = \mathbf{w}, \mathbf{X}^{(t)} = \mathbf{x}^{(t)}, \mathbf{R}^{(t)}]\big] = E\big[g_t(\mathbf{w}, \mathbf{x}^{(t)}, U^{(t)}, \mathbf{V}^{(t)}, \epsilon_t)\big]$.

Proof. See Appendix A.1. The proof is carried out by induction. First, the result is established at time $t = 1$ following Imbens and Newey (2009), recalling that the model in the initial time period $t = 1$ differs from the model identified by Imbens and Newey (2009) only in that there are multiple endogenous inputs (see Remark 2.1). Second, it is established, using the conditional rank device, that the theorem holds for arbitrary $t \ge 2$ if it holds for $t - 1$. The second step relies crucially on the exclusion of $U^{(t)}$ from the first stage, as this allows for the isolation of $V_{t,k}$ through the one-to-one mapping between $V_{t,k}$ and $X_{t,k}$ for each $t, k$, and the isolation of each $V_{t,k}$ permits controlling for $\mathbf{V}^{(t-1)}$ in the second stage to solve the endogeneity problem.

By contrast, the next theorem identifies similar objects when including $U^{(t)}$ in the first stage but excluding the infinite-dimensional exogenous unobservable $\epsilon_t$ from the second stage:

Theorem 2 (Nonparametric identification of the triangular panel model with output unobservables included recursively in the input equations). Suppose assumptions A.1 through A.6 and B.2 hold. Furthermore, define

$$Q_t \equiv F_{Y_t \mid \mathbf{W}, \mathbf{V}_t, M_t}(Y_t \mid \mathbf{W}, \mathbf{V}_t, M_t). \tag{3.6}$$

Then,
(a) $R_{t,k} = F_{V_{t,k} \mid M_t}(V_{t,k} \mid M_t)$ for all $k$, and $Q_t = F_{U_t \mid \mathbf{V}_t, M_t}(U_t \mid \mathbf{V}_t, M_t)$;
(b) $U_t \perp \mathbf{X}_t \mid (\mathbf{R}_t, M_t)$; and
(c) $E\big[E[Y_t \mid \mathbf{W} = \mathbf{w}, \mathbf{X}^{(t)} = \mathbf{x}^{(t)}, \mathbf{R}^{(t)}, Q^{(t)}]\big] = E\big[g_t(\mathbf{w}, \mathbf{x}^{(t)}, U^{(t)}, \mathbf{V}^{(t)})\big]$.

Proof. See Appendix A.2. The proof is very similar to that of Theorem 1, with the additional challenge of proving the second statement in Theorem 2(a) by induction.

4 Semiparametric Identification and Estimation

4.1 Random Coefficient and Markov Chain Assumptions

There are three reasons why directly implementing the sample counterpart to the nonparametric estimator in Theorem 1(a) or Theorem 2(a) for the input equations and Theorem 1(c) or Theorem 2(c) for the output equations is impractical. First, the estimator suffers from a cross-sectional curse of dimensionality: even at time $t = 1$, there may be many controls to include in the nonparametric regression, as discussed by Imbens and Newey (2009) and Torgovitsky (2015). Second, the estimator suffers from a panel curse of dimensionality: because new conditional rank devices, instruments, and inputs are added to the model over time as part of the histories of these variables, the number of controls to include in the nonparametric regression increases rapidly over time. Third, full-support instruments are often unavailable: instruments often have small support, so an identification approach that relies on large-support instruments is not always useful.

In order to address the cross-sectional curse of dimensionality as well as the unavailability of full-support instruments, this section follows the approach of Ma and Koenker (2006), Jun (2009), and Masten and Torgovitsky (2016), who impose a semiparametric functional form in both the input and output equations. I extend this approach to a panel environment: $V_{t,k}$ enters the input equation for $X_{t,k}$ through correlated random coefficients. When B.1 is imposed (so that $U_t$ is excluded from the first stage), $\epsilon_t$ enters the output equation for $Y_t$ through correlated random coefficients; when B.2 is imposed (so that $\epsilon_t$ is excluded), $U_t$ enters the output equation for $Y_t$ through correlated random coefficients. The specification may include transformations of the observable variables, such as squared terms or interaction terms, which Masten and Torgovitsky (2016) refer to as derived endogenous variables, in contrast to (untransformed) basic endogenous variables. The panel context has the added complexity that lagged unobservables are included in the basic endogenous variables, and may also be included in the derived endogenous variables. Note that, although this semiparametric restriction maintains nonseparability between contemporaneous unobservables and any basic or derived endogenous variables, it requires us to explicitly specify any interaction between lagged unobservables and any other endogenous variables, or interactions among any observables. The fully nonparametric approach can be thought of as including all possible (infinitely many) derived endogenous variables, so the semiparametric approach avoids the curse of dimensionality by choosing a finite subset of these.
To address the panel curse of dimensionality, this section proposes imposing an $s$th-order Markov-like assumption on the panel model, which I will refer to as the memory limit of order $s$. The memory limit has two parts: an exclusion part and a Markov chain part. In the exclusion part, any observables and unobservables dated $t - s$ or earlier are excluded from the input and output equations at $t$. In the Markov chain part, the unobservables are jointly assumed to follow a Markov chain of order $s$. For notation, the history of any random variable $A_t$ is defined above as $A^{(t)} \equiv (A_1, A_2, \ldots, A_t)$. This section additionally defines the memory of $A_t$ as $A^{(t,s)} \equiv (A_{t-s+1}, A_{t-s+2}, \ldots, A_t)$, which has $s$ components. For convenience, denote $A^{(t,0)} \equiv \emptyset$. Memory limits are common in panel applications. A first-order Markov chain (such as a random walk) satisfies $s = 1$, an autoregressive process of order 2 satisfies $s = 2$, and the nonseparable fixed effects model of Chernozhukov et al. (2015) satisfies $s = 0$ (memoryless).

Combining the assumptions above with the memory limit $s$, the triangular panel model in (3.1) simplifies to
$$X_{t,k} = M_t' \Pi_{t,k}(V_{t,k}), \quad k = 1, 2, \ldots, K, \qquad Y_t = P_t' \Psi_t(U_t, \varepsilon_t), \tag{4.1}$$
where $\Pi_{t,k}$ and $\Psi_t$ are vector-valued random coefficients. The matrices $M_t$ and $P_t$ include lagged observable and unobservable components, including derived variables:
$$M_t = m_t(W, Z^{(t,s)}, U^{(t-1,s)}, V^{(t-1,s)}), \qquad P_t = p_t(W, X^{(t,s)}, U^{(t-1,s)}, V^{(t-1,s)}), \tag{4.2}$$
where $m_t$ and $p_t$ select the basic and derived endogenous variables to include. That $M_t$ and $P_t$ include unobserved components is part of the identification challenge, and this feature does not arise in the correlated random coefficients literature cited above. For example, suppose $K = 1$ (so that the $k$ subscript can be omitted), $Z_t$ is scalar, and $W = (1)$ (the intercept only). If $s = 0$ and only basic endogenous variables are selected by $m_t$ and $p_t$, the model is
$$X_t = (1, Z_t, Z_{t-1})' \Pi_t(V_t), \qquad Y_t = (1, X_t, V_{t-1}, X_t V_{t-1})' \Psi_t(U_t, \varepsilon_t), \tag{4.3}$$
which is the illustrative example considered in Subsection 2.2.
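The memory $A^{(t,s)}$ is simply the last $s$ entries of the history $A^{(t)}$. The sketch below implements both. It also counts how many lagged scalars enter $M_t$ under the full history versus a memory of order $s$, which illustrates how the memory limit tames the panel curse of dimensionality.

The sketch rests on three assumptions:
- The functions are hypothetical helpers.
- When $t < s$, the memory is truncated at period 1.
- The count treats $Z$, $U$, and each of the $K$ components of $V$ as scalars, and ignores $W$ and derived variables.

```python
def history(a, t):
    """A^{(t)} = (A_1, ..., A_t); `a` stores A_1, A_2, ... at positions 0, 1, ..."""
    return tuple(a[:t])


def memory(a, t, s):
    """A^{(t,s)} = (A_{t-s+1}, ..., A_t), the last s entries of the history.

    A^{(t,0)} is empty. Assumption of this sketch: when t < s the memory is
    truncated at period 1 rather than padded.
    """
    if s < 0:
        raise ValueError("memory order s must be non-negative")
    return tuple(a[max(t - s, 0):t])


def n_lagged_scalars(t, s=None, K=1):
    """Scalars in (Z^{(t,s)}, U^{(t-1,s)}, V^{(t-1,s)}); s=None means full history."""
    z_len = t if s is None else min(s, t)
    lag_len = (t - 1) if s is None else min(s, t - 1)
    return z_len + lag_len + K * lag_len


periods = ["A1", "A2", "A3", "A4", "A5"]
print(history(periods, 3))    # A^{(3)}
print(memory(periods, 4, 2))  # A^{(4,2)} = (A_3, A_4)
print(memory(periods, 4, 0))  # A^{(4,0)} is empty
for t in (2, 5, 10):
    # full-history count grows linearly in t; the s = 1 count stays fixed
    print(t, n_lagged_scalars(t, K=2), n_lagged_scalars(t, s=1, K=2))
```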
If $s = 1$ and all interactions between lagged unobservables and current observables are selected, the model is
$$X_t = (1, Z_t, Z_{t-1}, U_{t-1}, V_{t-1}, Z_t V_{t-1}, Z_t U_{t-1})' \Pi_t(V_t), \qquad Y_t = (1, X_t, X_{t-1}, U_{t-1}, V_{t-1}, X_t V_{t-1}, X_t U_{t-1})' \Psi_t(U_t, \varepsilon_t). \tag{4.4}$$
This is already a very flexible specification. For example, the marginal effect of $X_t$ on $Y_t$ is $(0, 1, 0, 0, 0, V_{t-1}, U_{t-1})' \Psi_t(U_t, \varepsilon_t)$, which includes many dimensions of heterogeneity arising from unobservables, observables, and their interactions.
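To see how (4.4) encodes heterogeneity, the sketch below builds the output-equation regressor vector $P_t$ for $s = 1$. It then computes the marginal effect of $X_t$ as the derivative of $P_t$ with respect to $X_t$, times the coefficient vector. The coefficient values and helper names are hypothetical, and `psi` stands in for one realization of $\Psi_t(U_t, \varepsilon_t)$.

```python
def p_row(x_t, x_lag, u_lag, v_lag):
    """Output-equation regressors in (4.4):
    (1, X_t, X_{t-1}, U_{t-1}, V_{t-1}, X_t V_{t-1}, X_t U_{t-1})."""
    return [1.0, x_t, x_lag, u_lag, v_lag, x_t * v_lag, x_t * u_lag]


def dp_dx(u_lag, v_lag):
    """Derivative of p_row with respect to X_t: (0, 1, 0, 0, 0, V_{t-1}, U_{t-1})."""
    return [0.0, 1.0, 0.0, 0.0, 0.0, v_lag, u_lag]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def marginal_effect(psi, u_lag, v_lag):
    """Marginal effect of X_t on Y_t for one realization psi of Psi_t(U_t, eps_t)."""
    return dot(dp_dx(u_lag, v_lag), psi)


# Hypothetical coefficient realization and lagged values.
psi = [0.5, 1.0, 0.2, 0.3, 0.1, 0.4, -0.2]
u_lag, v_lag, x_lag = 0.5, 0.25, 1.0

me = marginal_effect(psi, u_lag, v_lag)
# Given the lags, P_t' psi is linear in X_t, so a finite difference matches the derivative.
step = 1e-3
fd = (dot(p_row(2.0 + step, x_lag, u_lag, v_lag), psi)
      - dot(p_row(2.0, x_lag, u_lag, v_lag), psi)) / step
print(me, fd)
```

Changing `u_lag` or `v_lag` changes the marginal effect even with `psi` held fixed. This is the observable-by-unobservable heterogeneity the specification allows.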

Remark 3.1: In (4.1), the model at the initial period $t = 1$ reduces to
$$X_{1,k} = (W, Z_1)' \Pi_{1,k}(V_{1,k}), \quad k = 1, 2, \ldots, K, \qquad Y_1 = (W, X_1)' \Psi_1(\varepsilon_1), \tag{4.5}$$
where $U_1$ is ignored to simplify notation. This is exactly the model identified by Masten and Torgovitsky (2016). Furthermore, if $\varepsilon_1$ is scalar and $K = 1$, this is exactly the model identified by Ma and Koenker (2006) under additional parametric restrictions and by Jun (2009) without additional parametric restrictions. As a result, the model of Jun (2009) can be thought of as defining the initial conditions if $\varepsilon_1$ is scalar and $K = 1$, and the model of Masten and Torgovitsky (2016) can be thought of as defining the initial conditions otherwise.

Remark 3.2: In (4.1), there are more unobservables in the input equation than endogenous inputs in the output equation at $t$ if $t \geq 2$ and $s \geq 1$. In particular, if only the basic endogenous variables are included, $K + 2$ unobservables are nonseparable while the remaining $s(K+1) - K - 1$ unobservables are separable from the observables. If desired, interactions between lagged unobservables and endogenous observables can be specified as in (4.4).

Remark 3.3: In (4.1), suppose $s = 0$ (memoryless), $\varepsilon_t$ is excluded from the output equation, and $U_t = \nu + \eta_t$, where $\nu$ is a fixed effect and $\eta_t$ is a strictly exogenous measurement error. Then the output equation at $t$ becomes
$$Y_t = (W, X_t)' \Psi_t(U_t), \tag{4.6}$$
which is the random coefficients panel model considered by Graham and Powell (2012) and is similar to the random location-scale panel model considered by Chernozhukov et al. (2015).

4.2 Identification with Small-support Instruments

This subsection first presents identification results with high-dimensional unobservables in the output equation under B.1. It then presents identification results that permit lagged unobservables from the output equation to enter the input equation. Throughout this section, A.1, A.2, A.3, A.5, and A.6 are assumed to hold. The conditional rank devices and the instrumental support assumption A.4 are modified to take advantage of the functional form restrictions.

For identification with high-dimensional unobservables in the output equation, consider the conditional rank device
$$R_{t,k} = \int_0^1 \mathbb{1}[M_t' \Pi_{t,k}(r) \leq X_{t,k}] \, dr, \tag{4.7}$$
which makes use of the pre-arrangement operator of Chernozhukov et al. (2010). Furthermore, replace A.4 by:

A.4* $E[M_t M_t' \mid R_t = r]$ and $E[P_t P_t' \mid R_t = r]$ are invertible for almost every $r \in [0, 1]^K$.

Because $Z_t$ is a component of $M_t$, the first part of A.4* requires that there be variation in the instruments at each value of the conditional rank device. Instruments with small support may well satisfy this condition. It is the familiar instrument relevance condition from linear instrumental variables settings, but local to $R_t$. The following theorem holds:

Theorem 3 (Semiparametric identification of the triangular panel model with infinite-dimensional output equation unobservables). Suppose that assumptions A.1, A.2, A.3, A.4*, A.5, A.6, and B.1 hold. Then, for each $t$,
(a) $R_{t,k} = F_{V_{t,k} \mid V^{(t-1)}}(V_{t,k} \mid V^{(t-1)})$ for all $k$;
(b) $(U_t, V_t, \varepsilon_t) \perp X_t \mid (W, X^{(t-1)}, R^{(t-1)})$; and
(c) $E\big[E[P_t P_t' \mid R_t]^{-1} E[P_t Y_t \mid R_t]\big] = E[\Psi_t(U_t, V_t, \varepsilon_t)]$.

Proof. See Appendix C.

The proof follows by induction. At $t = 1$, the theorem can be shown following Masten and Torgovitsky (2016), as the model at $t = 1$ is identical to the model they identify. It is then shown that the theorem holds at $t$ if it holds at $t - 1$, using reasoning similar to the proof of Theorem 1. In particular, it is shown that $M_t$ and $P_t$ are recovered from the conditional rank device at $t$, given $M_t$ and $P_{t-1}$.
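The integral in (4.7) can be approximated on a grid of ranks by counting the share of grid points $r$ at which the fitted quantile $M_t'\Pi_{t,k}(r)$ lies at or below the observed input. Counting avoids inverting the fitted curve, which is why the device tolerates non-monotone (crossing) quantile curves.

The sketch replaces estimated quantile-regression coefficients with a known, hypothetical coefficient function `pi_period1`. It uses the period-1 design of the Monte Carlo study, $X_1 = V_1 + Z_1 V_1$, for which $M_1 = (1, Z_1)$ and $\Pi_1(r) = (r, r)$ when $V_1$ is uniform. The device should therefore return approximately $V_1$.

```python
def rank_device(m_row, pi_fn, x, n_grid=1000):
    """Midpoint-rule approximation of R = int_0^1 1[m' pi(r) <= x] dr, as in (4.7).

    pi_fn(r) returns the coefficient vector at rank r (a hypothetical stand-in
    for estimated quantile-regression coefficients). Counting grid points instead
    of inverting r -> m' pi(r) means crossing quantile curves do not break it.
    """
    hits = 0
    for j in range(n_grid):
        r = (j + 0.5) / n_grid
        fitted = sum(mi * ci for mi, ci in zip(m_row, pi_fn(r)))
        hits += fitted <= x
    return hits / n_grid


def pi_period1(r):
    # X_1 = V_1 + Z_1 V_1 = (1, Z_1)' (V_1, V_1), so Pi_1(r) = (r, r)
    return (r, r)


v, z = 0.37, 1.0
x = v + z * v
print(rank_device((1.0, z), pi_period1, x))  # approximately v
```

In practice `pi_fn` would be the linear quantile-regression coefficients estimated on a grid of ranks, and the grid size controls the discretization error of the integral.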

Next, identification results are presented under B.2. To this end, define the conditional rank devices
$$R_{t,k} = \int_0^1 \mathbb{1}[M_t' \Pi_{t,k}(Q_{t-1}, r) \leq X_{t,k}] \, dr \quad \text{for all } k, \tag{4.8}$$
and
$$Q_t = \int_0^1 \mathbb{1}[P_t' \Psi_t(q, R_t) \leq Y_t] \, dq. \tag{4.9}$$
In place of A.4, consider the following assumption:

A.4** $E[M_t M_t' \mid R_t = r]$ and $E[P_t P_t' \mid R_t = r]$ are invertible for almost every $r \in [0, 1]^K$, and $\Psi_t$ is strictly monotonic in $U_t$ with probability one.

The invertibility conditions in A.4** are analogous to those of A.4*, but incorporate the conditional rank devices $R_{t,k}$ and $Q_t$. The following theorem holds:

Theorem 4 (Semiparametric identification of the triangular panel model with lagged unobservables from the output equation included in the input equation). Suppose that assumptions A.1, A.2, A.3, A.4**, A.5, A.6, and B.2 hold. Then, for each $t$,
(a) $R_{t,k} = F_{V_{t,k} \mid M_t}(V_{t,k} \mid M_t)$ for all $k$, and $Q_t = F_{U_t \mid V_t, M_t}(U_t \mid V_t, M_t)$;
(b) $(U_t, V_t, \varepsilon_t) \perp X_t \mid (W, X^{(t-1)}, R^{(t-1)})$; and
(c) $E\big[E[P_t P_t' \mid R_t]^{-1} E[P_t Y_t \mid R_t]\big] = E[\Psi_t(V_t, U_t)]$.

Proof. See Appendix D.

As with Theorem 3, the theorem can be shown at $t = 1$ following Masten and Torgovitsky (2016), as the model at $t = 1$ is identical to the model they identify. It is then shown that the theorem holds at $t$ if it holds at $t - 1$, using reasoning similar to the proof of Theorem 2. In particular, it is shown that $M_t$ and $P_t$ are recovered from both the input and output equation conditional rank devices at $t$, given $M_t$ and $P_{t-1}$.
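Under B.2, the two devices are computed in sequence within a period. Equation (4.8) uses the previous period's output rank $Q_{t-1}$ to obtain $R_{t,k}$, and (4.9) then uses $R_t$ to obtain $Q_t$. The sketch below follows this order for $K = 1$ with hypothetical, known coefficient functions, chosen only so that the true ranks are easy to check:

- Input equation: $M_t = (1, Z_t)$ with $\Pi_t(q, r) = (r(1+q), r)$, so that $X_t = V_t(1 + Q_{t-1} + Z_t)$.
- Output equation: $P_t = (1, X_t)$ with $\Psi_t(q, r) = (q, r)$, so that $Y_t = U_t + R_t X_t$, which is strictly increasing in $U_t$ as A.4** requires.

```python
def integrate_indicator(fitted_at, threshold, n_grid=1000):
    """Midpoint-rule approximation of int_0^1 1[fitted_at(a) <= threshold] da."""
    hits = sum(fitted_at((j + 0.5) / n_grid) <= threshold for j in range(n_grid))
    return hits / n_grid


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def pi_t(q, r):
    # hypothetical input-equation coefficients Pi_t(q, r)
    return (r * (1.0 + q), r)


def psi_t(q, r):
    # hypothetical output-equation coefficients Psi_t(q, r)
    return (q, r)


def ranks_under_b2(z_t, x_t, y_t, q_prev):
    """Compute R_t via (4.8) and then Q_t via (4.9), for K = 1."""
    m_row = (1.0, z_t)
    # (4.8): input rank, evaluated at last period's output rank Q_{t-1}
    r_t = integrate_indicator(lambda r: dot(m_row, pi_t(q_prev, r)), x_t)
    p_row = (1.0, x_t)
    # (4.9): output rank, evaluated at the just-computed input rank R_t
    q_t = integrate_indicator(lambda q: dot(p_row, psi_t(q, r_t)), y_t)
    return r_t, q_t


v, u, q_prev, z = 0.3, 0.6, 0.5, 1.0
x = v * (1.0 + q_prev + z)
y = u + v * x
print(ranks_under_b2(z, x, y, q_prev))  # approximately (v, u)
```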

4.3 Practical Estimators and Monte Carlo Study

Estimators that directly implement the identification approaches of Theorems 3 and 4 are now proposed. These estimators are practical in that they do not suffer from a curse of dimensionality and do not rely on the instruments having large support. Data are then simulated from models that satisfy the assumptions here but violate the assumptions of existing identification approaches. The new estimators are shown to outperform existing approaches on the simulated data, even in small samples.

Consider first the identification result of Theorem 3. At $t = 1$, Masten and Torgovitsky (2016) show how to recover $R_1$ in two steps. First, run the linear quantile regression of $X_{1,k}$ on $(W, Z_1)$, using the method of Koenker and Bassett (1978), to estimate $\Pi_{1,k}(r)$ at various ranks $r$. Second, apply the pre-arrangement operator of Chernozhukov et al. (2010). To extend the implementation of Masten and Torgovitsky (2016) to a panel environment, iteratively replace $M_t(W, Z^{(t,s)}, X^{(t-1,s)}, R^{(t-1,s)})$ by its estimate $\hat{M}_t(W, Z^{(t,s)}, X^{(t-1,s)}, \hat{R}^{(t-1,s)})$. Likewise, iteratively replace $P_t(W, X^{(t,s)}, R^{(t-1,s)})$ by $\hat{P}_t(W, X^{(t,s)}, \hat{R}^{(t-1,s)})$. In particular, the estimator for the conditional rank device at $t$ is
$$\hat{R}_{t,k} = \int_0^1 \mathbb{1}[\hat{M}_t' \hat{\Pi}_{t,k}(r) \leq X_{t,k}] \, dr, \tag{4.10}$$
while the estimator for $\Psi_t(r)$ is
$$\hat{\Psi}_t(r) = \left( \sum_{i=1}^N \kappa\!\left(\frac{\hat{R}_{t,i} - r}{h}\right) \hat{P}_{t,i} \hat{P}_{t,i}' \right)^{-1} \left( \sum_{i=1}^N \kappa\!\left(\frac{\hat{R}_{t,i} - r}{h}\right) \hat{P}_{t,i} Y_{t,i} \right). \tag{4.11}$$
Because $R_t$ has the uniform distribution by construction, the average effect of $X_t$ on $Y_t$ can be estimated by $\int_0^1 \hat{\Psi}_t(r) \, dr$, where the integral is replaced by a numerical approximation in practice. Together, (4.10) and (4.11) define the estimator that directly implements the identification strategy of Theorem 3.
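Equation (4.11) is a kernel-weighted least squares regression of $Y_t$ on $\hat{P}_t$. It gives more weight to observations whose estimated rank is close to $r$. Averaging $\hat{\Psi}_t(r)$ over a grid of $r$ approximates $\int_0^1 \hat{\Psi}_t(r)\,dr$.

The sketch below is written in plain Python for a scalar rank ($K = 1$) and uses the biweight kernel mentioned in the notes to Table 1. Several choices are assumptions of this sketch: the midpoint grid, the bandwidth, and the toy data. The ranks are also taken as given rather than estimated. The kernel's normalizing constant cancels in (4.11), so it does not affect the estimate.

```python
def biweight(u):
    """Biweight (quartic) kernel: (15/16)(1 - u^2)^2 on |u| <= 1, zero outside."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(aug[i][col]))
        if abs(aug[piv][col]) < 1e-12:
            # local design matrix not invertible: the sample analog of A.4* fails at this r
            raise ValueError("singular kernel-weighted design matrix")
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(n):
            if i != col:
                f = aug[i][col] / aug[col][col]
                aug[i] = [x - f * y for x, y in zip(aug[i], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def psi_hat(r, ranks, P, Y, h):
    """Kernel-weighted least squares estimate of Psi_t(r), as in (4.11)."""
    k = len(P[0])
    A = [[0.0] * k for _ in range(k)]
    b = [0.0] * k
    for r_i, p_i, y_i in zip(ranks, P, Y):
        w = biweight((r_i - r) / h)
        if w == 0.0:
            continue
        for j in range(k):
            b[j] += w * p_i[j] * y_i
            for m in range(k):
                A[j][m] += w * p_i[j] * p_i[m]
    return solve(A, b)


def average_psi(ranks, P, Y, h, n_grid=50):
    """Midpoint-rule approximation of int_0^1 Psi_hat(r) dr."""
    grid = [(g + 0.5) / n_grid for g in range(n_grid)]
    estimates = [psi_hat(r, ranks, P, Y, h) for r in grid]
    return [sum(e[j] for e in estimates) / n_grid for j in range(len(P[0]))]


# Toy data (assumption): evenly spaced ranks, binary regressor, Psi_t(r) = (r, 2r).
N = 2000
ranks = [(i + 0.5) / N for i in range(N)]
P = [(1.0, float(i % 2)) for i in range(N)]
Y = [r + 2.0 * r * p[1] for r, p in zip(ranks, P)]
print(average_psi(ranks, P, Y, h=0.1))  # near the average of Psi_t(r), i.e. (0.5, 1.0)
```

The explicit `ValueError` mirrors A.4*. If, near some rank $r$, the instruments or regressors do not vary, the local design matrix is singular and $\hat{\Psi}_t(r)$ is not defined.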
Next, to implement the estimator corresponding to Theorem 4, iteratively replace $M_t(W, Z^{(t,s)}, X^{(t-1,s)}, Q^{(t-2,s)}, R^{(t-1,s)})$ by $\hat{M}_t(W, Z^{(t,s)}, X^{(t-1,s)}, \hat{Q}^{(t-2,s)}, \hat{R}^{(t-1,s)})$. Likewise, iteratively replace $P_t(W, X^{(t,s)}, Q^{(t-1,s)}, R^{(t-1,s)})$ by $\hat{P}_t(W, X^{(t,s)}, \hat{Q}^{(t-1,s)}, \hat{R}^{(t-1,s)})$.

The estimators for the conditional rank devices at $t$ are
$$\hat{R}_{t,k} = \int_0^1 \mathbb{1}[\hat{M}_t' \hat{\Pi}_{t,k}(\hat{Q}_t, r) \leq X_{t,k}] \, dr \tag{4.12}$$
and
$$\hat{Q}_t = \int_0^1 \mathbb{1}[\hat{P}_t' \hat{\Psi}_t(q, \hat{R}_t) \leq Y_t] \, dq, \tag{4.13}$$
after which the estimator for the output equation in (4.11) is applied. The conditional rank devices can be implemented using the kernel-weighted approximations
$$\hat{R}_{t,k} \approx \sum_{r \in S^K} \sum_{q \in S} \mathbb{1}[\hat{M}_t' \hat{\Pi}_{t,k}(q, r) \leq X_{t,k}] \, \kappa\!\left(\frac{\hat{Q}_t - q}{h}\right) \tag{4.14}$$
and
$$\hat{Q}_t \approx \sum_{q \in S} \sum_{r \in S^K} \mathbb{1}[\hat{P}_t' \hat{\Psi}_t(q, r) \leq Y_t] \, \kappa\!\left(\frac{\hat{R}_t - r}{h}\right), \tag{4.15}$$
where $S$ is a discrete approximation of the unit interval.

Now, consider a model that satisfies the assumptions of Theorem 3 with $K = 1$, $s = 3$, and bivariate measurement error in the outcome equation. At time $t = 1$, specify
$$X_1 = V_1 + Z_1 V_1, \qquad Y_1 = A_1 + B_1 X_1.$$
Next, at time $t = 2$, specify
$$X_2 = V_2 + (Z_2 + V_1) V_2, \qquad Y_2 = A_2 + X_2 B_2 + V_1.$$
Lastly, for $t = 3, 4, \ldots, T$, specify
$$X_t = V_t + (Z_t + V_{t-1}) V_t, \qquad Y_t = A_t + X_t B_t + V_{t-2} + V_{t-1}.$$
The remaining components are specified as follows:
- $T = 10$;
- $A_t = V_t + \varepsilon^A_t$ and $B_t = 2(V_t + \varepsilon^B_t)$, with $\varepsilon^j_t \overset{iid}{\sim} N(0, 1/10)$ for $j = A, B$;
- $Z_t \overset{iid}{\sim} \text{Bernoulli}(1/2)$;
- $(V_1, V_2, \ldots, V_T)$ is drawn from the Gaussian copula with a covariance matrix whose diagonal elements are 1 and whose off-diagonal elements are 1/10.

It follows that $E[B_t] = 1$, which is the parameter of interest.

Table 1, Panel A, demonstrates the success of the new estimator relative to the other potential estimators. OLS and TSLS exhibit large positive biases. The method of Masten and Torgovitsky (2016), applied to the cross-sectional data, recovers the true parameter at time $t = 1$. It has a small bias at $t = 2$ due to the omission of $V_{t-1}$, and a larger bias at $t > 2$ due to the omission of both $V_{t-1}$ and $V_{t-2}$. The new estimator is very close to the true parameter in all time periods, as it explicitly accounts for the history of $V_t$.

Table 1, Panel B, provides additional information on why the new estimator performs better than the cross-sectional approach. The cross-sectional estimator recovers $V_t$ at $t = 1$ but not at $t > 1$, while the new estimator also recovers $V_t$ at $t > 1$ with little bias. Of particular interest, $V_t$ is still recovered with little bias at $t =$

Table 1: Monte Carlo Study of Estimation Bias

Panel A. Output Equation Estimation Bias: bias in the estimator of E[B_t] = 1 (standard errors in parentheses)

                                        t = 1     t = 2     t = 3     t = 5     t = 10
Ordinary Least Squares                 (0.011)   (0.012)   (0.012)   (0.012)   (0.012)
Two Stage Least Squares                (0.036)   (0.050)   (0.046)   (0.048)   (0.048)
Cross-sectional Masten & Torgovitsky   (0.063)   (0.116)   (0.192)   (0.188)   (0.192)
New Estimator                          (0.063)   (0.064)   (0.059)   (0.075)   (0.078)

Panel B. Recovery of Ranks from Input Equation: correlation between $\hat{V}_t$ and $V_t$

                                        t = 1     t = 2     t = 3     t = 5     t = 10
Cross-sectional Masten & Torgovitsky
New Estimator

Notes: Estimates of bias and correlation are averaged across 100 simulations of size N = 5,000. The kernel κ is biweight with bandwidth h = …; standard errors are based on B = 40 bootstrap draws.
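The Monte Carlo design above can be simulated directly. The sketch below is a plain-Python rendering that only generates data and does not reproduce the estimators in Table 1. The function name and output layout are hypothetical, and it makes three further assumptions:
- The latent Gaussian vector behind the copula is built from a single common factor, which yields unit variances and pairwise correlation 1/10.
- $N(0, 1/10)$ is read as a normal distribution with variance 1/10.
- The general-period input equation is taken as $X_t = V_t + (Z_t + V_{t-1})V_t$, matching the pattern of periods 1 and 2.

```python
import math
import random
from statistics import NormalDist


def simulate_panel(n, T=10, rho=0.1, noise_var=0.1, seed=0):
    """Draw n units from the Monte Carlo design; each unit is a dict of length-T lists."""
    rng = random.Random(seed)
    std_normal_cdf = NormalDist().cdf
    sd = math.sqrt(noise_var)  # assumption: 1/10 is the variance of the noise
    units = []
    for _ in range(n):
        # One-factor construction: unit variances, pairwise correlation rho.
        common = rng.gauss(0.0, 1.0)
        latent = [math.sqrt(rho) * common + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
                  for _ in range(T)]
        V = [std_normal_cdf(g) for g in latent]  # Gaussian copula: uniform marginals
        Z, X, Y, B = [], [], [], []
        for t in range(T):  # index t = 0 corresponds to period 1
            z = 1.0 if rng.random() < 0.5 else 0.0
            a = V[t] + rng.gauss(0.0, sd)
            b = 2.0 * (V[t] + rng.gauss(0.0, sd))
            lag1 = V[t - 1] if t >= 1 else 0.0  # V_{t-1}, absent in period 1
            lag2 = V[t - 2] if t >= 2 else 0.0  # V_{t-2}, absent in periods 1 and 2
            x = V[t] + (z + lag1) * V[t]
            Z.append(z)
            X.append(x)
            B.append(b)
            Y.append(a + b * x + lag1 + lag2)
        units.append({"V": V, "Z": Z, "X": X, "Y": Y, "B": B})
    return units


panel = simulate_panel(2000)
mean_B = sum(u["B"][t] for u in panel for t in range(10)) / (2000 * 10)
print(round(mean_B, 3))  # should be close to E[B_t] = 1
```

Because the copula gives $V_t$ a uniform marginal distribution, $E[V_t] = 1/2$, so $E[B_t] = 2 \cdot (1/2) = 1$. The simulated average of `B` checks this.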

5 Data and Empirical Results

5.1 Norwegian Tax Records on Firms and Workers

The empirical context is the population of firms in Norway, observed in a 12-year panel of annual tax records beginning in 2003. Each observation-year includes the following variables:
- gross revenues;
- labor costs;
- intermediate input costs (other than labor costs);
- accumulated capital (fixed assets);
- industry code;
- municipal code;
- the year the firm was founded.

I form value added as the difference between gross revenues and intermediate input costs. Furthermore, to permit the identification of a panel process with persistence of at least three periods, I require that all variables be observed for at least three adjacent periods. Because the identification approach requires that the initial period of production be observed, I only consider firms founded within the data window. The resulting sample size is 19,613.

To construct the local price of labor, I match the region and industry of the firm to the earnings tax records of all workers in the same region and industry. Assuming that each firm is a price-taker, prices within the industry and region are independent of firm-specific productivity shocks. Furthermore, as the price of labor rises, firms are expected to respond by changing capital investment and labor expenditures. Together, these imply that local prices are both relevant and independent of firm-specific unobservables, which are the conditions required for the local price of labor to serve as a valid instrumental variable.

5.2 Empirical Results

There are three empirical questions of interest. First, what is the average ceteris paribus elasticity of production with respect to capital and labor? This is the question addressed by the existing literature, including Olley and Pakes (1996) and Levinsohn and Petrin (2003). Second, do these averages vary over time?
Time variation is ruled out ex ante by pooled models that assume elasticities do not vary with time, including those employed by Levinsohn and Petrin (2003). Third, do the elasticities of production with respect to capital and labor vary with the firm productivity shock? This possibility is ruled out ex ante by the Cobb-Douglas specification standard in the literature (Ackerberg et al., 2015).

Figure 1: Time Variation in Average Elasticities of Capital and Labor. Panels: (a) Capital, (b) Labor. Series: OLS; Levinsohn Petrin; Het. Effects, Years 1, 2, and 3.

Figure 1 demonstrates that the average elasticity of production with respect to capital rises over the first three years of firm production. The pooled estimator of Levinsohn and Petrin (2003) successfully recovers the average productivity of labor across all years, but misses the year-to-year increase. Figure 2 demonstrates that firm productivity shocks augment the productivity of labor, as higher firm productivity raises the elasticity with respect to labor. By contrast, higher firm productivity does not appear to augment the elasticity with respect to capital.

6 Conclusions

This paper has developed an approach to identify ceteris paribus effects of continuous regressors in nonseparable panel models without time homogeneity. It starts from the insight that the passing of time creates a triangular structure in which shocks realized in the future are excluded at present. From this insight, a novel recursive control function was proven to control for persistence in the unobservables. To overcome the curse of dimensionality and deal with the empirical reality that many instruments have limited support, it was shown that the recursive control function identifies the model under semiparametric restrictions even if the instruments have
