Unpublished appendices from "The Relationship between Firm Size and Firm Growth in the U.S. Manufacturing Sector," Bronwyn H. Hall, Journal of Industrial Economics 35 (June 1987): 583-606.

Appendix A: The time series behavior of employment growth

In this paper I present evidence that sample selection or attrition introduces very little bias into growth rate equations over time periods of approximately five to ten years. In this appendix I therefore present the results of a time series analysis of three sets of firms (those in the sample from 1972 to 1979, from 1976 to 1983, and the combined sample from 1972 to 1983) with some confidence that these results are not much biased by the exclusion of entrants and exiters. In Tables A1 and A2 I show the covariance matrix of the logarithm of employment over time for the first two samples of firms, both in levels and in first differences. Note that in both cases, the overall mean for each year has been removed from the data. These tables suggest that the log employment time series has the following characteristics: it has an AR component with a root near one, and possibly a small MA component or higher order AR terms. In addition, the hypothesis that the variance of growth rates is equal across the years can be rejected. Accordingly, I parametrize the process as a standard ARMA model with the variance of the innovation changing over time:

    (1 - \alpha_1 L - \alpha_2 L^2) Y_t = (1 - \mu L) \epsilon_t,    E\epsilon_t^2 = \sigma_t^2    (1)

Under the assumption of multivariate normality of the \epsilon_t, it is possible to estimate the parameters of this process by maximum likelihood, and the covariance matrices are a sufficient statistic for the problem. The method by which I perform this estimation is described in Hall (1979). MaCurdy (1982) has shown that these estimates are consistent even if the disturbances are not multivariate normal, although the estimated standard errors will no longer be correct.
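As a concrete illustration of equation (1), the following sketch simulates an ARMA(2,1) process with a time-varying innovation variance. The function name, presample treatment, and parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def simulate_arma21(alpha1, alpha2, mu, sigma, y_init=(0.0, 0.0), seed=0):
    """Simulate (1 - alpha1*L - alpha2*L^2) Y_t = (1 - mu*L) eps_t,
    where Var(eps_t) = sigma[t]**2 may change over time (heteroskedastic
    innovations).  y_init holds two presample values (Y_{-1}, Y_0)."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma)        # one innovation per period
    T = len(sigma)
    y = np.empty(T)
    y_lag2, y_lag1 = y_init
    e_lag = 0.0
    for t in range(T):
        y[t] = alpha1 * y_lag1 + alpha2 * y_lag2 + eps[t] - mu * e_lag
        y_lag2, y_lag1, e_lag = y_lag1, y[t], eps[t]
    return y, eps

# With alpha1 = alpha2 = mu = 0 the process collapses to Y_t = eps_t.
y, eps = simulate_arma21(0.0, 0.0, 0.0, sigma=np.full(5, 0.3))
```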
The differenced matrix was also estimated with industry means removed for each year to control for possible industry effects of the oil price shocks in 1973-74 and 1978-79, but this made little difference, reducing the diagonal elements by about five percent and leaving the off-diagonal elements essentially unchanged. The likelihood function being maximized can be written as follows:

    log L = -(N/2) [ T log 2\pi + log|\Omega(\theta)| + tr( Y'Y \Omega^{-1}(\theta) ) ]

where N is the number of firms, T is the number of time periods, and Y'Y and \Omega(\theta) are the observed covariance matrix of the data (normalized by N) and the predicted covariance matrix from the model, respectively.
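The likelihood above can be evaluated directly from the sample covariance matrix, which is what makes the covariance matrices sufficient statistics. A minimal numpy sketch, with a model covariance for the random walk case (function names are mine, and Y'Y is assumed already normalized by N):

```python
import numpy as np

def cov_loglik(S, Omega, N):
    """log L = -(N/2) [ T log 2pi + log|Omega| + tr(S Omega^{-1}) ],
    where S is the T x T sample covariance of the data (Y'Y / N) and
    Omega = Omega(theta) is the model covariance matrix."""
    T = S.shape[0]
    sign, logdet = np.linalg.slogdet(Omega)
    if sign <= 0:
        return -np.inf                  # Omega must be positive definite
    return -0.5 * N * (T * np.log(2.0 * np.pi) + logdet
                       + np.trace(np.linalg.solve(Omega, S)))

def random_walk_omega(v):
    """Covariance implied by a random walk with free initial variance:
    Y_t = Y_{t-1} + eps_t, v = (Var(Y_1), Var(eps_2), ..., Var(eps_T)),
    so Cov(Y_t, Y_s) = v_1 + ... + v_{min(t,s)}."""
    c = np.cumsum(v)
    idx = np.arange(len(v))
    return c[np.minimum.outer(idx, idx)]
```

Since the unconstrained maximum is attained at Omega = S, relative log likelihoods of the kind reported in Table A3 are differences cov_loglik(S, Omega_hat, N) - cov_loglik(S, S, N).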
Before describing the results of my estimation of the model, I need to say something about the treatment of initial conditions. I have assumed that the process for each firm began at a random time in the past and at a random level, and accordingly I have estimated the initial variance as a free parameter (in the case of the AR(2) and ARMA(2,1) models, two initial variances and a covariance are free). Justification for this procedure is provided both by Anderson and Hsiao (1982) and MaCurdy (1985). It will not be correct if the unknown initial condition is a fixed constant. It is difficult to conceive of an experiment with these data that would distinguish the two possibilities, although the smooth lognormality of the size distribution gives me some confidence that the first assumption is not unreasonable. The consequence of this treatment of initial conditions is to add two or three more parameters when the model is expanded to include a second order term, rather than only one. This procedure has a tendency to increase the log likelihood by more than is accounted for by the additional AR parameter, because the first two variances and the associated covariance are now estimated freely. For example, this accounts for the fact that the 1972-79 data prefer the ARMA(2,1) strongly, even though this model seems to have redundant roots (compare .757 and .748). Using these assumptions about initial conditions, I estimated the ARMA model on three sets of data: the two samples from 1972 to 1979 and 1976 to 1983 shown in Tables A1 and A2, and a longer sample from 1972 to 1983 that contained 96 firms. The results are shown in Tables A3 and A4, and they are essentially the same across the three samples. In Table A3 I show the value of the log likelihood obtained for six different ARMA models, including the ARMA(0,0) or Martingale/random walk model, as well as an unconstrained model that allows for a free covariance matrix across time.
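To make the initial-conditions treatment concrete, here is a sketch of the model covariance \Omega(\theta) for a first-order autoregression in levels with the initial variance estimated freely, in the spirit of the estimation above (the code and notation are mine, for illustration only):

```python
import numpy as np

def ar1_omega(alpha, v1, sig2):
    """Model covariance Omega(theta) for Y_t = alpha*Y_{t-1} + eps_t,
    with Var(Y_1) = v1 a free parameter (the random initial condition)
    and time-varying innovation variances sig2[k] = Var(eps_{k+2})."""
    T = 1 + len(sig2)
    var = np.empty(T)
    var[0] = v1
    for t in range(1, T):
        var[t] = alpha**2 * var[t - 1] + sig2[t - 1]
    Om = np.empty((T, T))
    for t in range(T):
        for s in range(T):
            Om[t, s] = alpha**abs(t - s) * var[min(t, s)]
    return Om
```

Setting v1 to the stationary value sig2/(1 - alpha**2) (with constant sig2) reproduces a stationary AR(1), while a free v1 lets the cross-section begin at an arbitrary dispersion.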
In the table it can be seen that the gain in likelihood per degree of freedom is vastly greater in going from a simple random walk to an ARMA(2,1) model than in going from the ARMA(2,1) to the unconstrained model. The Akaike information criterion suggests that either the AR(2) or the ARMA(2,1) is to be preferred among the levels models, while the ARMA(1,1) is preferred for the first differenced data. In Table A4 I show the estimated values of the roots of the different processes. For example, the ARMA(2,1) estimates for 1972-1979 suggest that the employment process can be described as follows:

    (1 - .76 L)(1 - 0.98 L) Y_t = (1 - .75 L) \epsilon_t

It can be seen from this table that the estimates obtained with first differenced data are entirely consistent with those obtained using levels, because the dominant effect in the latter case is one autoregressive root near unity. It is also the case that both the ARMA(2,1) in levels and the ARMA(1,1) in differences have near redundant roots for the 1972 to 1979 period (the t-statistic for equality is 0.9), while in the later period the roots are stable and significantly different from each other. I conclude that an adequate representation of the time series behavior of the data is ARIMA(1,1,1), with a possible preference for a slightly simpler model in the case of the earlier period because of the unstable and near redundant roots. In the paper I interpret these time series results in the context of several slightly more informative models.
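The roots reported in Table A4 are the factors of the lag polynomial: (1 - aL)(1 - bL) = 1 - alpha1*L - alpha2*L^2 implies alpha1 = a + b and alpha2 = -a*b, so a and b solve x^2 - alpha1*x - alpha2 = 0. A small sketch, using the rounded factors quoted above purely for illustration:

```python
import numpy as np

def ar2_factor_roots(alpha1, alpha2):
    """Return (a, b) such that (1 - a*L)(1 - b*L) = 1 - alpha1*L - alpha2*L^2.
    Multiplying out gives alpha1 = a + b and alpha2 = -a*b, so a and b are
    the roots of x**2 - alpha1*x - alpha2 = 0."""
    r = np.roots([1.0, -alpha1, -alpha2])
    return tuple(sorted(r.real, reverse=True))   # real roots in these examples

# Recover the factors 0.98 and 0.76 from the implied ARMA coefficients.
a, b = ar2_factor_roots(0.98 + 0.76, -(0.98 * 0.76))
```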
Appendix B: Sample selection and stochastic threshold models

In this appendix I derive the relationship of the stochastic threshold model to the generalized Tobit (sample selection) model and discuss the consequences of the identifying assumptions used in estimating each model. Although there is nothing new here, the literature on Tobit models (Amemiya 1984, Maddala 1983) does not seem to contain a discussion of the connection between the two models. Such a connection is useful, because it implies that the same computer program can be used to estimate either model.

First I present the standard censored regression model with a stochastic threshold due to Nelson (1977). Denote the size of the firm in the second period as y_{2i} and the unobserved threshold below which the firm will drop out of the sample as y_{1i}. Then the model can be written as follows:

    y_{2i} = X_i \beta_2 + u_{2i}    if y_{2i} \geq y_{1i}
    y_{2i} not observed              if y_{2i} < y_{1i}
    y_{1i} = X_i \beta_1 + u_{1i}

The disturbance vector u_i = (u_{1i}, u_{2i}) has a bivariate normal distribution with mean zero and variance

    E uu' = [ \sigma_1^2             \rho\sigma_1\sigma_2
              \rho\sigma_1\sigma_2   \sigma_2^2           ]

The X_i includes all the exogenous and predetermined variables for the model, including the size in the initial period. Some of the \beta's may be zero if there are exclusion restrictions. Nelson shows that this model requires at least one exclusion restriction or the restriction \rho = 0 in order to identify all the parameters. The above model can be rewritten as a standard generalized Tobit model of the following form:

    y_{2i} = X_i \beta_2 + \nu_{2i}    if z_i = y_{2i} - y_{1i} \geq 0
    y_{2i} not observed                if z_i = y_{2i} - y_{1i} < 0
    z_i = X_i \delta + \nu_{1i},    where \delta = \beta_2 - \beta_1

with \nu_{1i} = u_{2i} - u_{1i} and \nu_{2i} = u_{2i}. The covariance matrix of the disturbances is now the following:

    E \nu\nu' = [ \omega_1^2                 \lambda\omega_1\omega_2
                  \lambda\omega_1\omega_2    \omega_2^2              ]
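The change of variables \nu_1 = u_2 - u_1, \nu_2 = u_2 behind this rewriting implies \omega_1^2 = \sigma_1^2 - 2\rho\sigma_1\sigma_2 + \sigma_2^2, \omega_2 = \sigma_2, and \lambda\omega_1\omega_2 = \sigma_2^2 - \rho\sigma_1\sigma_2. A sketch that checks the mapping by simulation (the function name and parameter values are illustrative):

```python
import numpy as np

def threshold_to_selection(sigma1, sigma2, rho):
    """Map the Nelson threshold-model covariance parameters (sigma1,
    sigma2, rho) of u = (u1, u2) into the selection-model parameters
    (omega1, omega2, lam) of nu = (u2 - u1, u2)."""
    omega2 = sigma2
    omega1 = np.sqrt(sigma1**2 - 2 * rho * sigma1 * sigma2 + sigma2**2)
    lam = (sigma2**2 - rho * sigma1 * sigma2) / (omega1 * omega2)
    return omega1, omega2, lam

# Monte Carlo check of the mapping.
rng = np.random.default_rng(1)
s1, s2, r = 1.0, 0.8, 0.3
cov = np.array([[s1**2, r * s1 * s2], [r * s1 * s2, s2**2]])
u = rng.multivariate_normal([0.0, 0.0], cov, size=200_000)
nu1, nu2 = u[:, 1] - u[:, 0], u[:, 1]
o1, o2, lam = threshold_to_selection(s1, s2, r)
```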
It is customary when estimating this model to normalize the residual variance of the unobserved latent variable z_i to be unity, so that its disturbance is \nu_{1i}/\omega_1 and the coefficient vector estimated is \delta/\omega_1. Because z is completely unobserved, this normalization is innocuous and \beta_2 is still completely identified. However, estimation of the sample selection model is not sufficient to identify the parameters of the stochastic threshold model. It can easily be shown that the relationship between the covariance matrices of the two sets of disturbances is the following:

    E uu' = [ \omega_1^2 - 2\lambda\omega_1\omega_2 + \omega_2^2    \omega_2^2 - \lambda\omega_1\omega_2
              \omega_2^2 - \lambda\omega_1\omega_2                  \omega_2^2                            ]

so that we have

    \sigma_2 = \omega_2
    \beta_1 = \beta_2 - \omega_1 (\delta/\omega_1)
    \rho = (\sigma_2^2 - \lambda\omega_1\sigma_2) / \sigma_1\sigma_2
    \sigma_1^2 = \sigma_2^2 - 2\lambda\omega_1\sigma_2 + \omega_1^2

Given estimates of \delta/\omega_1, \lambda, and \sigma_2, we will need \omega_1 in order to identify the parameters of the stochastic threshold model. As Nelson showed, identification can be achieved either with an exclusion restriction on one of the \beta's or by setting the correlation between the two equations, \rho, to zero. Therefore, the identifying assumption I used in the sample selection model is not sufficient to identify the parameters of the stochastic threshold model. On the other hand, in the presence of one of the identifying assumptions for the Nelson model, it is no longer necessary to normalize the variance of \nu_{1i} to unity. Thus the stochastic threshold model is in some sense a special case of the more general sample selection model. In this paper I chose to use the more general model in order to capture the notion that the firm may drop out of the sample for reasons other than a size threshold.

Appendix C: Testing for heteroskedasticity in the sample selection model

This appendix develops a Lagrange Multiplier test for heteroskedasticity of the disturbance in the regression equation of the sample selection model, following a test suggested by Lee and Maddala (1985) for the Tobit model.
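These identities invert directly: given the selection-model estimates \delta/\omega_1, \lambda, and \sigma_2, together with a value of \omega_1 pinned down by one of Nelson's identifying restrictions, the threshold-model parameters follow in a few lines. A sketch (names and numbers illustrative):

```python
import numpy as np

def selection_to_threshold(delta_over_omega1, lam, sigma2, omega1, beta2):
    """Recover the stochastic-threshold parameters (beta1, sigma1, rho)
    from generalized-Tobit estimates, given a value for omega1."""
    beta1 = beta2 - omega1 * np.asarray(delta_over_omega1)
    sigma1 = np.sqrt(sigma2**2 - 2.0 * lam * omega1 * sigma2 + omega1**2)
    rho = (sigma2**2 - lam * omega1 * sigma2) / (sigma1 * sigma2)
    return beta1, sigma1, rho

# Round trip: start from threshold parameters, map to the selection
# parametrization, and recover them.
s1, s2, r = 1.0, 0.8, 0.3
o1 = np.sqrt(s1**2 - 2 * r * s1 * s2 + s2**2)
lam = (s2**2 - r * s1 * s2) / (o1 * s2)
beta1, s1_hat, r_hat = selection_to_threshold(0.5, lam, s2, o1, 0.7)
```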
The alternative being considered is that the disturbance \nu_{2i} of the growth rate equation has a variance that is a (possibly nonlinear) function of the regressors, in particular, of firm size:

    \sigma_{2i}^2 = G( \alpha + \gamma' X_i )
Under the null hypothesis, the vector \gamma is zero and the disturbance is therefore homoskedastic. This test has the usual properties of an LM test: it is asymptotically locally most powerful against the alternative being considered. As in Lee and Maddala, it turns out that the exact form of G does not matter, since it is being approximated by linear functions of X_i near \gamma = 0. The likelihood function for the generalized Tobit model outlined in the text is the following (omitting constants):

    log L = \sum_0 log[1 - \Phi(Z_i \delta)]
          + \sum_1 { log \Phi[ (Z_i \delta + \rho\nu_{2i}/\sigma_2) / (1-\rho^2)^{1/2} ] - \nu_{2i}^2/2\sigma_2^2 - log \sigma_2 }

where the summation over 0 and 1 denotes the sum over not observed and observed data respectively, and \Phi(.) denotes the standard normal cumulative distribution function. Differentiating with respect to \sigma_2^2 (under the null of homoskedasticity), I obtain the following expression for the score:

    \partial log L / \partial \sigma_2^2 |_{H_0} = \sum_1 (1/2\sigma_2^2) [ -(\rho\nu_{2i}/\sigma_2)(1-\rho^2)^{-1/2} \lambda( (Z_i \delta + \rho\nu_{2i}/\sigma_2)/(1-\rho^2)^{1/2} ) + \nu_{2i}^2/\sigma_2^2 - 1 ]    (2)

where \lambda(.) denotes the inverse Mills ratio, \phi(.)/\Phi(.). The LM (score) test for \gamma = 0 is then a test that

    \partial log L / \partial \gamma |_{\gamma=0} = G'(\alpha) \sum_1 [ \partial log L / \partial \sigma_{2i}^2 ] X_i = 0

where the degrees of freedom for the test are the number of regressors in X_i and all quantities are evaluated at the maximum likelihood estimates obtained under the null hypothesis. Note that because we are testing for heteroskedasticity of \nu_{2i} only and not of \nu_{1i}, only the observations for which y_{2i} is observed enter the test statistic, in contrast to the Tobit model case, where the disturbance of the selection equation and the regression equation are the same. To perform the actual test, I use the regression methodology of Breusch and Pagan (1979), which implicitly estimates the variance of this statistic from its sample variance. This computation is invariant to any renormalization which does not depend on the observations, so that the G'(\alpha) term drops out. The quantity which is regressed on the X_i to perform the test is given in the square brackets of equation (2).
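The test can be implemented in a few lines: form the bracketed score quantity above from the null-model estimates, regress it on X_i with a constant, and use N·R^2 as a chi-squared statistic. A sketch under my own variable names (scipy's normal pdf/cdf supply the Mills ratio); the N·R^2 regression form sidesteps computing the exact variance of the score, matching the Breusch-Pagan approach described above:

```python
import numpy as np
from scipy.stats import norm

def lm_het_stat(nu, sigma, rho, zdelta, X):
    """LM statistic for heteroskedasticity of nu_2 in the selection model:
    N * R^2 from regressing the bracketed score quantity on X (plus a
    constant); chi-squared with dim(X) degrees of freedom."""
    a = (zdelta + rho * nu / sigma) / np.sqrt(1.0 - rho**2)
    mills = norm.pdf(a) / norm.cdf(a)        # inverse Mills ratio
    q = (nu**2 / sigma**2 - 1.0
         - (rho * nu / sigma) * mills / np.sqrt(1.0 - rho**2))
    Z = np.column_stack([np.ones(len(q)), X])
    coef, *_ = np.linalg.lstsq(Z, q, rcond=None)
    resid = q - Z @ coef
    r2 = 1.0 - resid.var() / q.var()
    return len(q) * r2
```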
Note that if the estimated \rho is zero, this is the conventional LM test for heteroskedasticity, in which \nu_{2i}^2 is regressed on a constant and the X_i.
Table A1: Log Employment Covariance Matrix, 349 Firms

Levels
         1972    1973    1974    1975    1976    1977    1978    1979
1972     .76
1973     .704    .678
1974     .695    .670    .705
1975     .655    .63     .666    .655
1976     .60     .587    .63     .6      .60
1977     .575    .559    .594    .586    .577    .58
1978     .546    .53     .568    .600    .55     .560    .566
1979     .54     .530    .573    .563    .557    .570    .58     .643

First differences
          1973-72   1974-73   1975-74   1976-75   1977-76   1978-77   1979-78
1973-72   0.034
1974-73   0.009     0.047
1975-74   0.00     -0.000     0.073
1976-75  -0.003    -0.0006   -0.0007    0.033
1977-76   0.0038   -0.000    -0.009    -0.000     0.095
1978-77   0.0030    0.008     0.0000   -0.00     -0.0035    0.075
1979-78   0.004     0.0066   -0.003    -0.003    -0.0039   -0.0059    0.0466

Overall year means removed. The asymptotic s.e. is approximately 0.09 for the levels and 0.00 for the first differences.
Table A2: Log Employment Covariance Matrix, 098 Firms

Levels
         1976    1977    1978    1979    1980    1981    1982    1983
1976     .95
1977     .9      .90
1978     .87     .86     .84
1979     .83     .8      .8      .8
1980     .8      .8      .79     .8      .83
1981     .80     .79     .78     .80     .83     .87
1982     .78     .78     .77     .79     .8      .86     .9
1983     .73     .74     .73     .75     .78     .8      .87     .89

First differences
          1977-76   1978-77   1979-78   1980-79   1981-80   1982-81   1983-82
1977-76   0.0333
1978-77   0.0035    0.034
1979-78   0.0040    0.003     0.0347
1980-79  -0.00      0.008     0.0054    0.0405
1981-80   0.0000    0.004     0.0040    0.0086    0.0345
1982-81   0.009     0.00      0.0000    0.0064   -0.0005    0.056
1983-82   0.004     0.0035   -0.006    -0.005    -0.000     0.0058    0.0597

Overall year means removed. The asymptotic s.e. is approximately 0.0 for the levels and 0.005 for the first differences.
Table A3: Time Series Estimates for Log Employment

                        1972-79            1976-83            1972-83
Model               # params   Log L   # params   Log L   # params   Log L
Levels
Random walk             8      -5.3        8     -40.9       12     -07.0
AR(1)                   9     -07.0        9      -5.7       13     -65.9
MA(1)                  10      -.9        10     -96.3       14      -6.7
ARMA(1,1)              11     -86.0       11     -76.        15     -33.7
AR(2)                  11     -85.        11     -69.9       15      -3.
ARMA(2,1)              13     -36.4       13     -55.8       17      -7.
Unconstrained          36       0.0       36       0.0       78       0.0
                              (5.)              (406.6)             (89.0)
First differences
Random walk             7     -87.6        7      -7.9       11     -93.4
MA(1)                   9     -74.7        9     -89.9       13      -6.9
ARMA(1,1)              10      -9.9       10     -73.        14      -4.6
Unconstrained          28       0.0       28       0.0       66       0.0
                             (757.4)             (79.6)             (36.0)

The logarithm of the likelihood is measured relative to the unconstrained model, which freely fits each covariance to a separate parameter. The actual values of the unconstrained log likelihoods are shown in parentheses below each column.
Table A4: Parameter Estimates for the Time Series Models

                              Roots of AR process              Root of MA process
Levels
AR(1)       1972-79    0.989 (0.00)      0                      0
            1976-83    0.99  (0.00)      0                      0
            1972-83    0.990 (0.00)      0                      0
MA(1)       1972-79    0                 0                     -0.053 (0.0)
            1976-83    0                 0                     -0.095 (0.0)
            1972-83    0                 0                     -0.074 (0.00)
ARMA(1,1)   1972-79    0.99  (0.00)      0                     -0.0553 (0.0)
            1976-83    0.990 (0.00)      0                     -0.05 (0.03)
            1972-83    0.99  (0.00)      0                     -0.0765 (0.0)
AR(2)       1972-79    0.99  (0.07)      0.060 (0.08)           0
            1976-83    0.990 (0.08)      0.0   (0.0)            0
            1972-83    0.990 (0.003)     0.08  (0.006)          0
ARMA(2,1)   1972-79    0.984 (0.44)      .757  (0.94)           .748 (0.38)
            1976-83    0.99  (0.75)      0.574 (0.388)          0.449 (0.096)
            1972-83    1.000 (0.767)     0.939 (0.0)            0.99 (0.057)

First differences
MA(1)       1972-79    0                                       -0.054 (0.0)
            1976-83    0                                       -0.05 (0.05)
            1972-83    0                                       -0.076 (0.0)
ARMA(1,1)   1972-79    .745  (0.47)                             .737 (0.50)
            1976-83    0.553 (0.097)                            0.43 (0.0)
            1972-83    0.878 (0.05)                             0.8  (0.056)

Standard errors in parentheses.