Returns to Tenure
Christopher Taber
Department of Economics, University of Wisconsin-Madison
March 31, 2008

Outline
1 Basic Framework
2 Abraham and Farber
3 Altonji and Shakotko
4 Topel

1 Basic Framework

Let's start with a basic framework for thinking about turnover and job-specific human capital. The first important concept is the return to seniority. If there is firm-specific human capital and the worker has some bargaining power, then the longer a worker has worked at a firm, the higher the wage will be. This is called the seniority effect, or the returns to tenure.

Another reason for a seniority effect comes from principal-agent considerations. If the firm is worried about the worker shirking, it might want to backload pay: if the worker gets caught shirking, they get fired and lose the deferred pay. This only works at the firm level, since if the worker switches jobs, the new firm has no obligation to honor the deferred pay.

Matching component

The second issue is the matching component. People are better at some jobs than others. For example, I am not so bad at economics, but I would be pretty bad working in an auto shop. There are multiple reasons why this is important, although we need some friction or lack of information so that people don't just move to the best job immediately.

Putting these together, we get an expression like this:

$$w_{ijt} = \alpha T_{ijt} + \beta E_{it} + \theta_i + \eta_{ij} + \varepsilon_{ijt}$$

(I don't like this notation, but everyone else uses it, so I will), where

Variable              Definition
$w_{ijt}$             Wage
$T_{ijt}$             Tenure
$E_{it}$              Experience
$\theta_i$            Permanent individual component
$\eta_{ij}$           Firm/worker match
$\varepsilon_{ijt}$   Transitory error term

So how will this work? If a worker is sitting at a firm and gets an outside offer, what do they worry about? Why can't we just run OLS to estimate the model? The biggest problem is that $T_{ijt}$ is likely to be positively correlated with $\eta_{ij}$: people who are particularly well matched to a job are likely to stay at that job for a long time. To deal with this we need an instrument, a variable that is correlated with $T_{ijt}$ but uncorrelated with $\eta_{ij}$. A number of different papers have examined this question.
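
To make the selection problem concrete, here is a minimal Python sketch under a hypothetical data-generating process: all parameter values are made up, match quality $\eta_{ij}$ raises the expected completed duration of the job, tenure is observed at a uniformly random point of the spell, and experience at the start of the job is drawn independently of everything else. OLS of $w_{ijt}$ on $T_{ijt}$ and $E_{it}$ then overstates $\alpha$.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Hypothetical parameters for illustration only (not estimates from any paper)
alpha, beta = 0.02, 0.04

theta = rng.normal(0.0, 0.3, n)            # permanent individual component
eta = rng.normal(0.0, 0.3, n)              # firm/worker match component
E0 = rng.uniform(0.0, 20.0, n)             # experience when the job started
# Better matches last longer: the mean completed duration rises with eta
D = rng.exponential(np.exp(1.5 + 2.0 * eta))
T = rng.uniform(0.0, 1.0, n) * D           # tenure at a random point of the spell
E = E0 + T                                 # total experience
eps = rng.normal(0.0, 0.1, n)
w = alpha * T + beta * E + theta + eta + eps

# OLS of w on a constant, tenure and experience
X = np.column_stack([np.ones(n), T, E])
coef = np.linalg.lstsq(X, w, rcond=None)[0]
print(f"true alpha = {alpha:.3f}, OLS alpha = {coef[1]:.3f}")
print(f"true beta  = {beta:.3f}, OLS beta  = {coef[2]:.3f}")
```

The experience coefficient stays close to the assumed $\beta$ here only because the sketch makes starting experience independent of the match; the Abraham and Farber setup relaxes exactly that.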

2 Abraham and Farber

The first paper we look at is Abraham and Farber. There is another issue here: once a job has started, $T_{ijt}$ and $E_{it}$ are perfectly collinear. Abraham and Farber choose a slightly different parameterization. They define (my notation) $E^0_{ij}$ as the level of experience at the beginning of the job. This means that $E_{it} = E^0_{ij} + T_{ijt}$.

Put this into the parameterization:

$$\begin{aligned}
w_{ijt} &= \alpha T_{ijt} + \beta E_{it} + \theta_i + \eta_{ij} + \varepsilon_{ijt} \\
&= \alpha T_{ijt} + \beta\left(E^0_{ij} + T_{ijt}\right) + \theta_i + \eta_{ij} + \varepsilon_{ijt} \\
&= (\alpha + \beta)T_{ijt} + \beta E^0_{ij} + \theta_i + \eta_{ij} + \varepsilon_{ijt} \\
&\equiv \tilde\alpha T_{ijt} + \beta E^0_{ij} + \theta_i + \eta_{ij} + \varepsilon_{ijt}
\end{aligned}$$

They begin by worrying about the relationship between $E^0_{ij}$ and $\eta_{ij}$. People who have been in the labor market for a long time will tend to have found better matches, so these variables will tend to be positively related.

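Abraham and Farber capture this by writing the regression of $\eta_{ij}$ on $E^0_{ij}$ as

$$\eta_{ij} = \delta E^0_{ij} + \phi_{ij}.$$

Substituting in:

$$\begin{aligned}
w_{ijt} &= \tilde\alpha T_{ijt} + \beta E^0_{ij} + \theta_i + \eta_{ij} + \varepsilon_{ijt} \\
&= \tilde\alpha T_{ijt} + (\beta + \delta)E^0_{ij} + \theta_i + \phi_{ij} + \varepsilon_{ijt}
\end{aligned}$$

Note that $\delta$ is really part of the causal return to experience, so $\beta + \delta$ represents the full return to experience.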

We have still not solved the problem that $T_{ijt}$ is likely to be related to $\phi_{ij}$. Let $D_{ij}$ be the completed duration of the job (not typically observed). They define

$$D_{ij} = \gamma\eta_{ij} + \epsilon_{ij} = \delta\gamma E^0_{ij} + \gamma\phi_{ij} + \epsilon_{ij}.$$

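They argue that in a given cross section

$$E(T_{ijt}) = \tfrac{1}{2}E(D_{ij}),$$

that is, on average a job observed at a point in time is halfway through its completed duration. Thus

$$T_{ijt} = \tfrac{1}{2}D_{ij} + \xi_{ijt} = \frac{\delta\gamma}{2}E^0_{ij} + \frac{\gamma}{2}\phi_{ij} + \frac{1}{2}\epsilon_{ij} + \xi_{ijt}.$$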

They argue that this makes $\xi_{ijt}$ a natural instrument for $T_{ijt}$: it is correlated with $T_{ijt}$ by design, and it is uncorrelated with everything else by design. Thus, if you observe completed duration, you can construct the instrument as $T_{ijt} - D_{ij}/2$. More generally, if spells are not on average half of completed duration, you can just run the regression

$$T_{ijt} = \omega_0 + \omega_1 D_{ij} + \xi_{ijt}$$

and use the residual.
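
The instrument can be sketched in a simulation. Everything here is an assumption for illustration: the parameter values, a lognormal completed duration that rises with $\eta_{ij}$ (a positive, nonlinear stand-in for $D_{ij} = \gamma\eta_{ij} + \epsilon_{ij}$), and tenure drawn uniformly over the spell so that $E(T_{ijt} \mid D_{ij}) = D_{ij}/2$. The code builds $\xi_{ijt}$ as the residual from regressing $T_{ijt}$ on $D_{ij}$ and uses it as an instrument for tenure, with $E^0_{ij}$ serving as its own instrument.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical parameter values, chosen only for illustration
alpha_tilde, beta, delta = 0.07, 0.05, 0.01

E0 = rng.uniform(0.0, 20.0, n)                 # experience at the start of the job
theta = rng.normal(0.0, 0.3, n)                # permanent individual component
phi = rng.normal(0.0, 0.3, n)                  # part of the match not predicted by E0
eta = delta * E0 + phi                         # match component
# Completed duration: longer for better matches (assumed lognormal so it stays positive)
D = np.exp(1.5 + eta + rng.normal(0.0, 0.5, n))
T = rng.uniform(0.0, 1.0, n) * D               # tenure at a uniformly random point of the job
eps = rng.normal(0.0, 0.1, n)
w = alpha_tilde * T + beta * E0 + theta + eta + eps

ones = np.ones(n)
X = np.column_stack([ones, T, E0])

# OLS is contaminated by the correlation between T and phi
b_ols = np.linalg.lstsq(X, w, rcond=None)[0]

# Instrument: residual xi from the regression T = omega0 + omega1 * D + xi
W = np.column_stack([ones, D])
xi = T - W @ np.linalg.lstsq(W, T, rcond=None)[0]

# Just-identified IV: instruments (1, xi, E0) for regressors (1, T, E0)
Z = np.column_stack([ones, xi, E0])
b_iv = np.linalg.solve(Z.T @ X, Z.T @ w)

print(f"OLS tenure coefficient: {b_ols[1]:.4f}")
print(f"IV  tenure coefficient: {b_iv[1]:.4f}  (assumed alpha_tilde = {alpha_tilde})")
```

In this setup OLS overstates the tenure coefficient while the IV estimate lands close to the assumed $\tilde\alpha$; the sketch ignores standard errors, which real applications would need.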

There is also a simpler but similar approach: we can just include $D_{ij}$ in the regression model,

$$w_{ijt} = \tilde\alpha T_{ijt} + (\beta + \delta)E^0_{ij} + \rho D_{ij} + \theta_i + \phi_{ij} + \varepsilon_{ijt}.$$

The idea is quite simply that, conditional on $D_{ij}$, there is no reason for $T_{ijt}$ to be correlated with $\phi_{ij}$. Let's see what this converges to. First, let's remind ourselves of a couple of tricks for figuring this stuff out. Think about a regression where we partition our model into

$$Y = X_1\beta_1 + X_2\beta_2 + U.$$

Using the partitioned inverse formula, it is straightforward to show that $\hat\beta_1$ can be obtained by:
1. Regress $X_1$ on $X_2$.
2. Take the residuals from these regressions.
3. Regress $Y$ on the residuals.

The other fact is that if we know $\hat\beta_1$, we can obtain $\hat\beta_2$ by regressing $Y - X_1\hat\beta_1$ on $X_2$.
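
Both facts (the first is often called the Frisch-Waugh-Lovell theorem) are easy to check numerically. This is a minimal sketch on arbitrary simulated data; the helper `ols` is a name defined here for illustration, not a library function.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000

# Arbitrary simulated data: X2 includes a constant, X1 is correlated with X2
X2 = np.column_stack([np.ones(n), rng.normal(size=n)])
X1 = (0.5 * X2[:, 1] + rng.normal(size=n)).reshape(-1, 1)
Y = 2.0 * X1[:, 0] - 1.0 + 0.7 * X2[:, 1] + rng.normal(size=n)

def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Full regression of Y on [X1, X2]
beta_full = ols(Y, np.column_stack([X1, X2]))

# Steps 1-2: residualize X1 on X2; step 3: regress Y on those residuals
X1_resid = X1 - X2 @ ols(X1, X2)
beta1_fwl = ols(Y, X1_resid)

# Given beta1_hat, recover beta2_hat by regressing Y - X1 beta1_hat on X2
beta2_fwl = ols(Y - X1 @ beta1_fwl, X2)

print(beta_full)
print(beta1_fwl, beta2_fwl)
```

Both printed lines contain the same numbers (up to floating-point error), because the equalities are algebraic properties of least squares rather than features of this particular sample.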

This is straightforward to see, since the normal equations for $\hat\beta_2$ are

$$X_2'\left(Y - X_1\hat\beta_1 - X_2\hat\beta_2\right) = 0.$$

Now let's use this to look at the problem above:

$$w_{ijt} = \tilde\alpha T_{ijt} + (\beta + \delta)E^0_{ij} + \rho D_{ij} + \theta_i + \phi_{ij} + \varepsilon_{ijt}$$

First, think about the plim of the estimator of $\tilde\alpha$.

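Since $T_{ijt}$ will be uncorrelated with $E^0_{ij}$ conditional on $D_{ij}$, regressing $T_{ijt}$ on $D_{ij}$ and $E^0_{ij}$ yields

$$T_{ijt} = \omega_0 + \omega_1 D_{ij} + \xi_{ijt},$$

so, by the first trick,

$$\begin{aligned}
\operatorname{plim}\hat{\tilde\alpha} &= \frac{\operatorname{cov}(\xi_{ijt}, w_{ijt})}{\operatorname{var}(\xi_{ijt})} \\
&= \frac{\operatorname{cov}\left(\xi_{ijt}, \tilde\alpha T_{ijt} + (\beta + \delta)E^0_{ij} + \rho D_{ij} + \theta_i + \phi_{ij} + \varepsilon_{ijt}\right)}{\operatorname{var}(\xi_{ijt})} \\
&= \tilde\alpha\frac{\operatorname{cov}(\xi_{ijt}, T_{ijt})}{\operatorname{var}(\xi_{ijt})} \\
&= \tilde\alpha\frac{\operatorname{cov}(\xi_{ijt}, \omega_0 + \omega_1 D_{ij} + \xi_{ijt})}{\operatorname{var}(\xi_{ijt})} \\
&= \tilde\alpha.
\end{aligned}$$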

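But now what about the coefficient on experience? Let's think about the moment conditions from the regression. Treating all variables as deviations from their means, and using the second trick together with the fact that $\tilde\alpha$ is estimated consistently, the coefficients $b_1$ on $E^0_{ij}$ and $b_2$ on $D_{ij}$ satisfy

$$E\left[E^0_{ij}\left(w_{ijt} - \tilde\alpha T_{ijt} - b_1 E^0_{ij} - b_2 D_{ij}\right)\right] = 0$$
$$E\left[D_{ij}\left(w_{ijt} - \tilde\alpha T_{ijt} - b_1 E^0_{ij} - b_2 D_{ij}\right)\right] = 0$$

so that

$$E\left[E^0_{ij}\left(\beta E^0_{ij} + \theta_i + \eta_{ij} + \varepsilon_{ijt}\right)\right] = b_1\operatorname{var}(E^0_{ij}) + b_2\operatorname{cov}(E^0_{ij}, D_{ij})$$
$$E\left[D_{ij}\left(\beta E^0_{ij} + \theta_i + \eta_{ij} + \varepsilon_{ijt}\right)\right] = b_1\operatorname{cov}(E^0_{ij}, D_{ij}) + b_2\operatorname{var}(D_{ij})$$

To economize on space, let $V$ denote variance and $C$ covariance. Using $\eta_{ij} = \delta E^0_{ij} + \phi_{ij}$ and $D_{ij} = \gamma\eta_{ij} + \epsilon_{ij}$ (with $\theta_i$, $\varepsilon_{ijt}$ and $\epsilon_{ij}$ uncorrelated with each other and with $E^0_{ij}$ and $\eta_{ij}$):

$$\beta V(E^0_{ij}) + \delta V(E^0_{ij}) = b_1 V(E^0_{ij}) + b_2 C(E^0_{ij}, D_{ij})$$
$$\beta C(E^0_{ij}, D_{ij}) + \gamma V(\eta_{ij}) = b_1 C(E^0_{ij}, D_{ij}) + b_2 V(D_{ij})$$

Solving for $b_1$:

$$\begin{aligned}
b_1 &= \frac{\left[\beta V(E^0_{ij}) + \delta V(E^0_{ij})\right]V(D_{ij}) - \left[\beta C(E^0_{ij}, D_{ij}) + \gamma V(\eta_{ij})\right]C(E^0_{ij}, D_{ij})}{V(E^0_{ij})V(D_{ij}) - C(E^0_{ij}, D_{ij})^2} \\
&= \beta + \frac{\delta V(E^0_{ij})V(D_{ij}) - \gamma V(\eta_{ij})C(E^0_{ij}, D_{ij})}{V(E^0_{ij})V(D_{ij}) - C(E^0_{ij}, D_{ij})^2} \\
&= \beta + \frac{\delta V(E^0_{ij})V(\gamma\eta_{ij} + \epsilon_{ij}) - \gamma V(\eta_{ij})C\left(E^0_{ij}, \gamma\left[\delta E^0_{ij} + \phi_{ij}\right] + \epsilon_{ij}\right)}{V(E^0_{ij})V(\gamma\eta_{ij} + \epsilon_{ij}) - C\left(E^0_{ij}, \gamma\left[\delta E^0_{ij} + \phi_{ij}\right] + \epsilon_{ij}\right)^2} \\
&= \beta + \delta\frac{V(E^0_{ij})V(\epsilon_{ij})}{V(E^0_{ij})V(\gamma\eta_{ij} + \epsilon_{ij}) - \left[\gamma\delta V(E^0_{ij})\right]^2}
\end{aligned}$$

The fraction multiplying $\delta$ simplifies to $V(\epsilon_{ij})/\left[\gamma^2 V(\phi_{ij}) + V(\epsilon_{ij})\right]$, so it must be less than 1, and (for $\delta > 0$) the coefficient is biased downward relative to the full return to experience $\beta + \delta$. Thus we understate the returns to experience. This means that we will overstate the net returns to tenure, estimated as $\tilde\alpha - b_1$. At least this gives us some idea of how big this effect could be.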
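
The bias in $b_1$ can be checked by simulating the linear model exactly as written, with $T_{ijt} = D_{ij}/2 + \xi_{ijt}$ and $\xi_{ijt}$ independent of everything else. The parameter values are hypothetical. The sketch regresses $w_{ijt}$ on $T_{ijt}$, $E^0_{ij}$ and $D_{ij}$, and compares the coefficient on $E^0_{ij}$ with $\beta + \delta V(\epsilon_{ij})/\left[\gamma^2 V(\phi_{ij}) + V(\epsilon_{ij})\right]$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

# Hypothetical parameters chosen only for illustration
alpha_tilde, beta, delta, gamma = 0.07, 0.05, 0.02, 2.0
sd_phi, sd_epsD = 0.3, 1.0

E0 = rng.uniform(0.0, 20.0, n)
theta = rng.normal(0.0, 0.3, n)
phi = rng.normal(0.0, sd_phi, n)
eta = delta * E0 + phi                              # eta = delta*E0 + phi
D = gamma * eta + rng.normal(0.0, sd_epsD, n)       # D = gamma*eta + epsilon (linear, as in the text)
T = 0.5 * D + rng.normal(0.0, 1.0, n)               # T = D/2 + xi, xi independent of everything else
w = alpha_tilde * T + beta * E0 + theta + eta + rng.normal(0.0, 0.1, n)

# Regression that controls for completed duration
X = np.column_stack([np.ones(n), T, E0, D])
coef = np.linalg.lstsq(X, w, rcond=None)[0]

# plim of the E0 coefficient implied by the bias formula
shrink = sd_epsD**2 / (gamma**2 * sd_phi**2 + sd_epsD**2)
b1_formula = beta + delta * shrink

print(f"tenure coef: {coef[1]:.4f} (alpha_tilde = {alpha_tilde})")
print(f"E0 coef: {coef[2]:.4f}, formula: {b1_formula:.4f}, beta + delta = {beta + delta:.4f}")
```

With these values the shrinkage factor is $1/1.36$, so the coefficient on $E^0_{ij}$ settles near 0.0647 rather than the full return of 0.07, while the tenure coefficient stays close to $\tilde\alpha$.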

Obviously there is one remaining big problem: in many cases we do not know $D_{ij}$ exactly. They come up with a way of imputing it, using a proportional hazard Weibull model

$$\Pr(D_{ij} \ge t) = \exp(-\lambda t^{\tau}), \qquad \lambda = e^{Z\Gamma}.$$

You can estimate $\Gamma$ and $\tau$ by maximum likelihood.

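Then, assuming that all jobs end at age 65, it is straightforward to show that

$$E(D \mid D > S_f, Z) = \frac{1}{\exp(-\lambda S_f^{\tau})}\int_{S_f}^{S_{65}} \lambda\tau t^{\tau} e^{-\lambda t^{\tau}}\,dt + \frac{\exp(-\lambda S_{65}^{\tau})}{\exp(-\lambda S_f^{\tau})}S_{65},$$

where $S_f$ is the last observed tenure on the job and $S_{65}$ is the tenure the worker would reach at age 65. They use this as a proxy for $D_{ij}$ when they don't observe it. Let's look at the results.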
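
A sketch of the expected-duration calculation, assuming $\lambda = e^{Z\Gamma}$ has already been estimated; the values of `lam`, `tau`, `S_f` and `S_65` below are made up. It evaluates the formula with a simple trapezoid rule and checks it against Monte Carlo draws from the same Weibull distribution, with durations capped at $S_{65}$.

```python
import numpy as np

# Hypothetical values for illustration: lam stands in for exp(Z * Gamma)
lam, tau = 0.1, 0.8
S_f, S_65 = 3.0, 30.0     # last observed tenure and tenure at age 65 (years)

def survival(t):
    """Weibull survivor function Pr(D >= t) = exp(-lam * t**tau)."""
    return np.exp(-lam * t**tau)

def expected_duration(S_f, S_65, n_grid=20_001):
    """E(D | D > S_f) when every job is assumed to end by S_65."""
    t = np.linspace(S_f, S_65, n_grid)
    integrand = lam * tau * t**tau * np.exp(-lam * t**tau)                  # t * f(t)
    integral = np.sum((integrand[1:] + integrand[:-1]) * np.diff(t)) / 2.0  # trapezoid rule
    return (integral + S_65 * survival(S_65)) / survival(S_f)

# Monte Carlo check: draw Weibull durations by inverting the survivor function
rng = np.random.default_rng(4)
draws = (-np.log(rng.uniform(size=1_000_000)) / lam) ** (1.0 / tau)
capped = np.minimum(draws, S_65)          # jobs end at age 65 at the latest
mc = capped[draws > S_f].mean()

analytic = expected_duration(S_f, S_65)
print(f"formula: {analytic:.3f}, simulation: {mc:.3f}")
```

The trapezoid rule is used only to keep the example dependency-free; any standard quadrature routine would do the same job.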

There are a number of problems with this approach that we might be worried about. What are the main ones?

3 Altonji and Shakotko

The next paper we look at is Altonji and Shakotko (RES, 1987). They take the same basic setup. Let's focus on the specification

$$w_{ijt} = \alpha T_{ijt} + \beta E_{it} + \theta_i + \eta_{ij} + \varepsilon_{ijt}.$$

They actually include higher-order terms and some other stuff, but that is not important for the main idea. We are worried that $T_{ijt}$ is correlated with $\theta_i$ and $\eta_{ij}$.

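Let $\tau_{ij}$ be the set of $t$ for which we observe individual $i$ on job $j$, and $N_{ij}$ the number of such observations. They then use as their instrument

$$\tilde T_{ijt} \equiv T_{ijt} - \bar T_{ij}, \qquad \bar T_{ij} \equiv \frac{1}{N_{ij}}\sum_{t \in \tau_{ij}} T_{ijt}.$$

This has the really cool feature of being uncorrelated with $\theta_i$ and $\eta_{ij}$ by construction: $\tilde T_{ijt}$ sums to zero within each job, while $\theta_i$ and $\eta_{ij}$ are constant within the job.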
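
A minimal sketch of how this instrument is built from panel data, assuming a hypothetical balanced panel in which each job is observed in five consecutive years (one job per person, made-up parameter values). Because $\tilde T_{ijt}$ sums to zero within each job, its sample covariance with any job-level variable is exactly zero, even though raw tenure is correlated with the match effect.

```python
import numpy as np

rng = np.random.default_rng(5)
n_jobs, n_obs = 1_000, 5          # hypothetical balanced panel: each job observed 5 times

theta = rng.normal(0.0, 0.3, n_jobs)          # individual effect (one job per person here)
eta = rng.normal(0.0, 0.3, n_jobs)            # match effect
# Good matches tend to be observed at higher tenure
T0 = rng.uniform(0.0, 1.0, n_jobs) * np.exp(1.5 + eta)
T = T0[:, None] + np.arange(n_obs)[None, :]   # tenure in each observed period, shape (n_jobs, n_obs)

# Altonji-Shakotko instrument: deviation of tenure from its job-specific mean
T_bar = T.mean(axis=1, keepdims=True)
T_tilde = T - T_bar

# Job-level effects repeated over each job's observations (matches row-major ravel)
eta_long = np.repeat(eta, n_obs)
theta_long = np.repeat(theta, n_obs)

print("corr(T, eta):        ", np.corrcoef(T.ravel(), eta_long)[0, 1])
print("corr(T_tilde, eta):  ", np.corrcoef(T_tilde.ravel(), eta_long)[0, 1])
print("corr(T_tilde, theta):", np.corrcoef(T_tilde.ravel(), theta_long)[0, 1])
```

With consecutive annual observations, $\tilde T_{ijt}$ follows the same pattern ($-2, -1, 0, 1, 2$) in every job, which also shows that the instrument's variation comes only from the passage of time within the job.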

There is one major problem, though: $E_{it}$ is still likely to be positively correlated with $\eta_{ij}$. In general, we think this means that the estimate of $\beta$ is likely to be biased upward. Since $T_{ijt}$ and $E_{it}$ tend to be positively related, this means that the estimate of $\alpha$ will tend to be biased downward.

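Let's try to work this out formally in a way similar to before. The IV estimates $b_1$ and $b_2$ satisfy the moment conditions

$$E\left[\tilde T_{ijt}\left(w_{ijt} - b_1 T_{ijt} - b_2 E_{it}\right)\right] = 0, \qquad E\left[E_{it}\left(w_{ijt} - b_1 T_{ijt} - b_2 E_{it}\right)\right] = 0,$$

which give

$$\operatorname{cov}\left(\tilde T_{ijt}, \alpha T_{ijt} + \beta E_{it} + \theta_i + \eta_{ij} + \varepsilon_{ijt}\right) = \operatorname{cov}\left(\tilde T_{ijt}, b_1 T_{ijt} + b_2 E_{it}\right)$$
$$\operatorname{cov}\left(E_{it}, \alpha T_{ijt} + \beta E_{it} + \theta_i + \eta_{ij} + \varepsilon_{ijt}\right) = \operatorname{cov}\left(E_{it}, b_1 T_{ijt} + b_2 E_{it}\right)$$

Since $\tilde T_{ijt}$ is uncorrelated with $\theta_i$ and $\eta_{ij}$, and assuming $\varepsilon_{ijt}$ is uncorrelated with both instruments and $\theta_i$ with $E_{it}$ (again $C$ is covariance and $V$ variance):

$$\alpha C(\tilde T_{ijt}, T_{ijt}) + \beta C(\tilde T_{ijt}, E_{it}) = b_1 C(\tilde T_{ijt}, T_{ijt}) + b_2 C(\tilde T_{ijt}, E_{it})$$
$$\alpha C(E_{it}, T_{ijt}) + \beta V(E_{it}) + C(E_{it}, \eta_{ij}) = b_1 C(E_{it}, T_{ijt}) + b_2 V(E_{it})$$

Solving the equations gives

$$\begin{aligned}
b_1 &= \frac{\left[\alpha C(\tilde T_{ijt}, T_{ijt}) + \beta C(\tilde T_{ijt}, E_{it})\right]V(E_{it}) - \left[\alpha C(E_{it}, T_{ijt}) + \beta V(E_{it}) + C(E_{it}, \eta_{ij})\right]C(\tilde T_{ijt}, E_{it})}{C(\tilde T_{ijt}, T_{ijt})V(E_{it}) - C(E_{it}, T_{ijt})C(\tilde T_{ijt}, E_{it})} \\
&= \alpha - \frac{\operatorname{cov}(E_{it}, \eta_{ij})\operatorname{cov}(\tilde T_{ijt}, E_{it})}{\operatorname{cov}(\tilde T_{ijt}, T_{ijt})\operatorname{var}(E_{it}) - \operatorname{cov}(E_{it}, T_{ijt})\operatorname{cov}(\tilde T_{ijt}, E_{it})}
\end{aligned}$$

So the estimator is biased.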
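
The bias formula can be verified in a simulated panel. This is a sketch under made-up assumptions: experience at the start of the job raises match quality ($\eta_{ij} = \delta E^0_{ij} + \phi_{ij}$), better matches are observed at higher tenure, and each job is observed for five consecutive years. The IV estimate uses instruments $(1, \tilde T_{ijt}, E_{it})$ for regressors $(1, T_{ijt}, E_{it})$ and is compared with $\alpha$ minus the bias term computed from sample covariances.

```python
import numpy as np

rng = np.random.default_rng(6)
n_jobs, n_obs = 100_000, 5

# Hypothetical parameters for illustration only
alpha, beta, delta = 0.02, 0.04, 0.01

E0 = rng.uniform(0.0, 20.0, n_jobs)              # experience at job start
theta = rng.normal(0.0, 0.3, n_jobs)
eta = delta * E0 + rng.normal(0.0, 0.3, n_jobs)  # experienced workers have better matches
T0 = rng.uniform(0.0, 1.0, n_jobs) * np.exp(1.5 + eta)

T = T0[:, None] + np.arange(n_obs)[None, :]      # tenure, shape (n_jobs, n_obs)
E = E0[:, None] + T                              # experience
T_tilde = T - T.mean(axis=1, keepdims=True)      # AS instrument
eta_p = np.broadcast_to(eta[:, None], T.shape)   # job effects in panel shape
w = (alpha * T + beta * E + theta[:, None] + eta[:, None]
     + rng.normal(0.0, 0.1, T.shape))

# Stack the panel into long format (job-major order)
T, E, T_tilde, eta_p, w = (a.ravel() for a in (T, E, T_tilde, eta_p, w))
ones = np.ones_like(T)

# IV: instruments (1, T_tilde, E) for regressors (1, T, E)
X = np.column_stack([ones, T, E])
Z = np.column_stack([ones, T_tilde, E])
b_iv = np.linalg.solve(Z.T @ X, Z.T @ w)

def C(a, b):
    """Sample covariance of two arrays."""
    return np.cov(a, b)[0, 1]

bias = C(E, eta_p) * C(T_tilde, E) / (C(T_tilde, T) * C(E, E) - C(E, T) * C(T_tilde, E))
print(f"IV alpha: {b_iv[1]:.4f}, alpha - bias: {alpha - bias:.4f}, true alpha: {alpha}")
```

Within a job, $\tilde T_{ijt}$ moves one-for-one with $E_{it}$ as well as $T_{ijt}$, so the split between $\alpha$ and $\beta$ comes from cross-sectional variation in $E_{it}$, which is exactly where $\operatorname{cov}(E_{it}, \eta_{ij})$ contaminates the estimate.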

They come back and deal with this later, but let's forget about it for now. Let's just focus on the IV1 estimator; the other estimators try to get more efficient estimates in various ways. They also use $T^2_{ijt}$ and $OLDJOB_{ijt} \equiv 1(T_{ijt} > 0)$. The point of the OLDJOB variable is to allow more flexibility in the relationship between tenure and wages.

They find small effects. Next, they worry about the possible bias. One thing to compare their estimates to is fixed-effects estimation of

$$w_{ijt} = \alpha_1 T_{ijt} + \alpha_2 T^2_{ijt} + \alpha_3 OLDJOB_{ijt} + \beta_1 E_{it} + \beta_2 E^2_{it} + \theta_i + \eta_{ij} + \varepsilon_{ijt}.$$

Note that in this case $\alpha_1$ and $\beta_1$ cannot be separately estimated, since $T_{ijt}$ and $E_{it}$ are perfectly collinear within a job. However, we can estimate their sum and all of the other coefficients, and compare those to the IV estimates. They are quite similar, and you can't reject that they are equal.
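
The collinearity problem is easy to see numerically. The sketch below uses a hypothetical panel with only the linear terms (no $T^2$, OLDJOB or $E^2$, for brevity) and made-up coefficients: after subtracting job means, tenure and experience become the same column, so the fixed-effects design has rank one and only $\alpha_1 + \beta_1$ is identified.

```python
import numpy as np

rng = np.random.default_rng(7)
n_jobs, n_obs = 2_000, 5

# Hypothetical coefficients for illustration only
alpha1, beta1 = 0.03, 0.04

E0 = rng.uniform(0.0, 20.0, n_jobs)
T0 = rng.uniform(0.0, 10.0, n_jobs)
job_effect = rng.normal(0.0, 0.4, n_jobs)       # theta_i + eta_ij, constant within the job

T = T0[:, None] + np.arange(n_obs)[None, :]
E = E0[:, None] + T
w = alpha1 * T + beta1 * E + job_effect[:, None] + rng.normal(0.0, 0.1, T.shape)

def within(x):
    """Subtract job-specific means (the fixed-effects transformation)."""
    return (x - x.mean(axis=1, keepdims=True)).ravel()

T_w, E_w, w_w = within(T), within(E), within(w)

# Within a job, demeaned tenure and demeaned experience are identical...
X_both = np.column_stack([T_w, E_w])
print("rank of [T, E] after demeaning:", np.linalg.matrix_rank(X_both))

# ...so only the sum alpha1 + beta1 is identified
slope = (T_w @ w_w) / (T_w @ T_w)
print(f"FE slope: {slope:.4f}, alpha1 + beta1 = {alpha1 + beta1}")
```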

Another thing they tried was to instrument for experience using $\tilde E_{ijt}$; this led to implausible (and I think imprecise) estimates. Finally, they try making a bunch of assumptions about $\operatorname{cov}(E_{it}, \eta_{ij})$ to get an idea of what the bias might look like. These results are presented in the following table.

Altogether, Altonji and Shakotko would claim that the returns to tenure are small.

4 Topel

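Topel takes a different approach and gets a different answer. Let's focus on the same basic model:

$$w_{ijt} = \alpha T_{ijt} + \beta E_{it} + \theta_i + \eta_{ij} + \varepsilon_{ijt}.$$

He notices that for people who do not change jobs, tenure and experience each rise by one between periods, so

$$\begin{aligned}
w_{ijt} - w_{ijt-1} &= \alpha\left(T_{ijt} - T_{ijt-1}\right) + \beta\left(E_{it} - E_{it-1}\right) + \varepsilon_{ijt} - \varepsilon_{ijt-1} \\
&= \alpha + \beta + \varepsilon_{ijt} - \varepsilon_{ijt-1}.
\end{aligned}$$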

Thus one can get a consistent estimate of $\alpha + \beta$ by differencing within jobs (a fixed-effects approach), which removes $\theta_i$ and $\eta_{ij}$. If we knew $\beta$, we would be done. How might we estimate $\beta$? Think about using only people at the time of hire. For them, $T_{ijt} = 0$, so for new hires the wage equation is

$$w_{ijt} = \beta E_{it} + \theta_i + \eta_{ij} + \varepsilon_{ijt}.$$

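Will OLS give us consistent estimates of $\beta$? No: since $E_{it}$ is likely positively correlated with $\eta_{ij}$, it will be biased upward. We then estimate $\alpha$ as

$$\hat\alpha = \widehat{\alpha + \beta} - \hat\beta.$$

Thus, if $\hat\beta$ is biased upward, $\hat\alpha$ will be biased downward, so he interprets his estimate as a lower bound on the effect.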

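He actually does something related, but better. We showed before that

$$w_{ijt} = (\alpha + \beta)T_{ijt} + \beta E^0_{ij} + \theta_i + \eta_{ij} + \varepsilon_{ijt},$$

so

$$w_{ijt} - \widehat{\alpha + \beta}\,T_{ijt} \approx \beta E^0_{ij} + \theta_i + \eta_{ij} + \varepsilon_{ijt}.$$

He implements the second stage by regressing $w_{ijt} - \widehat{\alpha + \beta}\,T_{ijt}$ on $E^0_{ij}$. Let's look at the results.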
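
A sketch of this two-step procedure on simulated data. The assumptions are illustrative only: $\eta_{ij} = \delta E^0_{ij} + \phi_{ij}$ so that experience at the start of the job is correlated with the match, tenure at first observation is drawn independently of the match (to isolate the experience channel), and each job is observed for five consecutive years. Step one averages within-job wage growth; step two regresses $w_{ijt} - \widehat{\alpha + \beta}\,T_{ijt}$ on $E^0_{ij}$.

```python
import numpy as np

rng = np.random.default_rng(8)
n_jobs, n_obs = 50_000, 5

# Hypothetical parameters for illustration only
alpha, beta, delta = 0.03, 0.04, 0.01

E0 = rng.uniform(0.0, 20.0, n_jobs)                  # experience at start of job
theta = rng.normal(0.0, 0.3, n_jobs)
eta = delta * E0 + rng.normal(0.0, 0.3, n_jobs)       # match quality rises with E0
T0 = rng.uniform(0.0, 10.0, n_jobs)                   # tenure at first observation (independent of eta)

T = T0[:, None] + np.arange(n_obs)[None, :]
E = E0[:, None] + T
w = alpha * T + beta * E + theta[:, None] + eta[:, None] + rng.normal(0.0, 0.1, T.shape)

# Step 1: within-job wage growth among stayers identifies alpha + beta
dw = np.diff(w, axis=1)                               # = alpha + beta + change in the transitory error
a_plus_b = dw.mean()

# Step 2: regress w - (alpha+beta)_hat * T on E0 (with a constant)
y2 = (w - a_plus_b * T).ravel()
X2 = np.column_stack([np.ones(y2.size), np.repeat(E0, n_obs)])
beta_hat = np.linalg.lstsq(X2, y2, rcond=None)[0][1]
alpha_hat = a_plus_b - beta_hat

print(f"(alpha+beta)_hat = {a_plus_b:.4f}  (true {alpha + beta})")
print(f"beta_hat  = {beta_hat:.4f}  (true {beta})")
print(f"alpha_hat = {alpha_hat:.4f}  (true {alpha})")
```

In this setup the second step picks up $\beta + \delta$ rather than $\beta$, so $\hat\alpha$ comes out below the true $\alpha$, which is the lower-bound logic in the text.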

Why are these results so different from those of Abraham and Farber and of Altonji and Shakotko? Topel claims that the difference from Abraham and Farber comes primarily from an assumption they make. Take their model as

$$w_{ijt} = (\alpha + \beta)T_{ijt} + (\beta + \delta)E^0_{ij} + \rho D_{ij} + \theta_i + \phi_{ij} + \varepsilon_{ijt}.$$

Their key idea is that controlling for $D_{ij}$ will allow us to get consistent (or nearly consistent) estimates.

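Topel points out that one can rewrite this model as

$$w_{ijt} = (\alpha + \beta)\bar T_{ij} + (\alpha + \beta)\left(T_{ijt} - \bar T_{ij}\right) + (\beta + \delta)E^0_{ij} + \rho D_{ij} + \theta_i + \phi_{ij} + \varepsilon_{ijt},$$

where $\bar T_{ij}$ is average tenure over the observations on the job. Since $T_{ijt} - \bar T_{ij}$ is orthogonal to everything else in the model, the coefficient in front of it is essentially the fixed-effects estimator.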
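
The orthogonality point can be checked directly: because $T_{ijt} - \bar T_{ij}$ sums to zero within each job, it is orthogonal to the constant and to every job-level regressor, so its coefficient in the full regression equals the within-job (fixed-effects) slope. A sketch with hypothetical simulated data; the helper `per_period` is a name made up here.

```python
import numpy as np

rng = np.random.default_rng(9)
n_jobs, n_obs = 5_000, 4

# Hypothetical job-level variables, constant within a job
E0 = rng.uniform(0.0, 20.0, n_jobs)                   # experience at job start
T0 = rng.uniform(0.0, 10.0, n_jobs)                   # tenure at first observation
D = T0 + n_obs + rng.exponential(5.0, n_jobs)         # completed duration, beyond the last observation
job_effect = 0.02 * D + rng.normal(0.0, 0.3, n_jobs)  # match component correlated with D

T = T0[:, None] + np.arange(n_obs)[None, :]           # tenure, shape (n_jobs, n_obs)
w = 0.07 * T + 0.05 * E0[:, None] + job_effect[:, None] + rng.normal(0.0, 0.1, T.shape)

def per_period(x):
    """Repeat a job-level array once for each observed period."""
    return np.repeat(x, n_obs)

T_bar = T.mean(axis=1)
T_dev = (T - T_bar[:, None]).ravel()                  # T_ijt - T_bar_ij

# Regression with tenure split into its job mean and the deviation from it
X = np.column_stack([np.ones(T_dev.size), per_period(T_bar), T_dev,
                     per_period(E0), per_period(D)])
coef = np.linalg.lstsq(X, w.ravel(), rcond=None)[0]

# Within-job (fixed-effects) slope on tenure
w_dev = (w - w.mean(axis=1, keepdims=True)).ravel()
fe = (T_dev @ w_dev) / (T_dev @ T_dev)

print(f"coefficient on T - T_bar: {coef[2]:.6f}")
print(f"fixed-effects slope:      {fe:.6f}")
```

The equality is a property of least squares, not of these particular simulated data; what the data decide is whether the coefficient on $\bar T_{ij}$ matches it.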

He interprets what AF do as restricting these two coefficients to be the same. However, this restriction is testable. When he separates things out in this way, this is what he finds.

Note that the AS idea is similar to Topel's. Both take advantage of the fact that $T_{ijt} - \bar T_{ij}$ is exogenous, and both have a coefficient on $E$ that is biased upward. Topel claims that the difference between his results and AS's comes from the facts that:
1. Altonji's estimate is biased downward.
2. Measurement error is a big deal.
3. Altonji and Shakotko do not control for time effects as well.

Here is his evidence on the subject.

Altonji did not concede. He came back with a response to Topel (jointly written with Williams). They claim that after dealing with all of these issues more closely, you get effects that are smaller than Topel's but bigger than AS's. In their preferred estimates, the effect of ten years of seniority on log wages is approximately 0.11. I do not want to get into all of these details, but if you are interested in this subject, you should read those papers.