Do Schools Matter for High Math Achievement? Evidence from the American Mathematics Competitions Glenn Ellison and Ashley Swanson Online Appendix

VOL. NO. DO SCHOOLS MATTER FOR HIGH MATH ACHIEVEMENT? 43

1 Appendix Tables

Predicted # of schools
Count Actual Poisson NB Semi-P
,3 975,3,39 357 68 355 362 2 35 267 4 39 3 64 94 67 65 4 39 29 36 35 5 8 8 22 2 6 5 2 4 3 7 7 9 9 8 7 6 9 4 5 5 2 4 4 3 3 3 2 3 2 2 3 2 2 4 5 6 7 8 9 2+ 3 4 4
log-likelihood -2,63.6 -,899. -,893.6
2 84.5E+8 5.5 6.5
p-value..745.688

Table 5: Actual vs. predicted distribution of counts of high-scorers across schools

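44 THE AMERICAN ECONOMIC REVIEW

2 Proofs

Proof of Proposition 1

It is standard that under model 1
\[ Y_i \sim NB\left(\frac{1}{\alpha},\; \frac{\alpha e^{X_i\beta}}{1+\alpha e^{X_i\beta}}\right) \]
and under model 2
\[ Y_i \sim NB\left(\frac{\theta(X_i)}{g(x_i)},\; 1-e^{-g(x_i)}\right). \]
(See Boswell and Patil (1970) or Karlin (1966, p. 345).) Hence, the distributions in the two models are identical if
\[ \frac{1}{\alpha} = \frac{\theta(X_i)}{g(x_i)}; \quad\text{and}\quad \frac{\alpha e^{X_i\beta}}{1+\alpha e^{X_i\beta}} = 1-e^{-g(x_i)}. \]
The first holds for all $X_i$ if $g(x_i) = \alpha\,\theta(X_i)$. The second then holds if
\[ \frac{1}{1+\alpha e^{X_i\beta}} = e^{-g(x_i)}, \]
which holds for $g(x_i) = \log\left(1+\alpha e^{X_i\beta}\right)$.

Proof of Proposition 2

Applying the result of Proposition 1 to the outcome of this model conditional on $u_i$, we see that the conditional distribution is
\[ NB\left(\frac{1}{\alpha_p},\; \frac{\alpha_p e^{X_i\beta}u_i}{1+\alpha_p e^{X_i\beta}u_i}\right). \]
The mean and variance of a $NB(r, p)$ distribution are
\[ E(Y) = \frac{rp}{1-p} \quad\text{and}\quad Var(Y) = \frac{rp}{(1-p)^2} = \frac{E(Y)}{1-p}. \]
This gives
\[ E(Y_i \mid X_i, u_i) = e^{X_i\beta}u_i \quad\text{and}\quad Var(Y_i \mid X_i, u_i) = E(Y_i \mid X_i, u_i) + \alpha_p\, E(Y_i \mid X_i, u_i)^2. \]
The result on the expectation of $Y_i \mid X_i$ follows from iterated expectations:
\[ E(Y_i \mid X_i) = E_{u_i}\left(E(Y_i \mid X_i, u_i)\right) = E_{u_i}\left(e^{X_i\beta}u_i\right) = e^{X_i\beta}. \]
And the formula for the variance follows from the conditional variance formula:
\[
\begin{aligned}
Var(Y_i \mid X_i) &= E_{u_i} Var(Y_i \mid X_i, u_i) + Var_{u_i} E(Y_i \mid X_i, u_i) \\
&= E_{u_i}\left(e^{X_i\beta}u_i + \alpha_p e^{2X_i\beta}u_i^2\right) + Var_{u_i}\left(e^{X_i\beta}u_i\right) \\
&= e^{X_i\beta}E u_i + \alpha_p e^{2X_i\beta}\left(Var(u_i) + (E u_i)^2\right) + e^{2X_i\beta}Var(u_i) \\
&= e^{X_i\beta} + \alpha_p e^{2X_i\beta}(\alpha_u + 1) + e^{2X_i\beta}\alpha_u \\
&= e^{X_i\beta} + e^{2X_i\beta}\left(\alpha_p\alpha_u + \alpha_p + \alpha_u\right).
\end{aligned}
\]

Proof of Proposition 3

Suppose the $Y_{it}$ are generated as described. Then using Proposition 2 and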

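iterated expectations over $t$ we have
\[ E(Y_{it} \mid X_i) = \tfrac{1}{2}E(Y_{i1} \mid X_i) + \tfrac{1}{2}E(Y_{i2} \mid X_i) = \tfrac{1}{2}e^{X_i\beta} + \tfrac{1}{2}e^{X_i\beta} = e^{X_i\beta}. \]
The conditional variance formula gives
\[ Var(Y_{it} \mid X_i) = E_t\, Var(Y_{it} \mid X_i, t) + Var_t\left(E(Y_{it} \mid X_i, t)\right). \]
The first term on the RHS of this expression is just the variance in the single period model given by Proposition 2, and the second term is zero, so we find
\[ Var(Y_{it} \mid X_i) = e^{X_i\beta} + e^{2X_i\beta}\left(\alpha_p\alpha_u + \alpha_p + \alpha_u\right). \]
The mean of $Y_i \mid X_i$ follows from an identical calculation:
\[ E(Y_i \mid X_i) = E(Y_{i1} + Y_{i2} \mid X_i) = E(Y_{i1} \mid X_i) + E(Y_{i2} \mid X_i) = 2e^{X_i\beta}. \]
The variance is a little more complicated. We have
\[ Var(Y_i \mid X_i) = Var(Y_{i1} + Y_{i2} \mid X_i) = Var(Y_{i1} \mid X_i) + Var(Y_{i2} \mid X_i) + 2\,Cov(Y_{i1}, Y_{i2} \mid X_i). \]
To find the covariance we condition on $u_i$ and use the fact that $Y_{i1}$ and $Y_{i2}$ are conditionally independent given $X_i$ and $u_i$:
\[
\begin{aligned}
Cov(Y_{i1}, Y_{i2} \mid X_i) &= E(Y_{i1}Y_{i2} \mid X_i) - E(Y_{i1} \mid X_i)E(Y_{i2} \mid X_i) \\
&= E_{u_i}\left(E(Y_{i1}Y_{i2} \mid X_i, u_i)\right) - e^{2X_i\beta} \\
&= E_{u_i}\left(E(Y_{i1} \mid X_i, u_i)E(Y_{i2} \mid X_i, u_i)\right) - e^{2X_i\beta} \\
&= E_{u_i}\left(e^{X_i\beta}u_i\, e^{X_i\beta}u_i\right) - e^{2X_i\beta} \\
&= e^{2X_i\beta}E_{u_i}\left(u_i^2 - 1\right) = e^{2X_i\beta}\alpha_u.
\end{aligned}
\]
Plugging back into the formula for the variance we find
\[
\begin{aligned}
Var(Y_i \mid X_i) &= 2\left(e^{X_i\beta} + e^{2X_i\beta}\left(\alpha_u + \alpha_p + \alpha_u\alpha_p\right)\right) + 2e^{2X_i\beta}\alpha_u \\
&= 2e^{X_i\beta} + \left(2e^{X_i\beta}\right)^2\left(\alpha_u + \tfrac{1}{2}\alpha_p + \tfrac{1}{2}\alpha_p\alpha_u\right).
\end{aligned}
\]
Write $\alpha$ for the overdispersion parameter of the single-period count and $\tilde\alpha$ for that of the two-period total. Using these formulas we will have $Var(Y_{it} \mid X_i) = E(Y_{it} \mid X_i) + \alpha\, E(Y_{it} \mid X_i)^2$ and $Var(Y_i \mid X_i) = E(Y_i \mid X_i) + \tilde\alpha\, E(Y_i \mid X_i)^2$ if and only if two conditions hold:
\[ \alpha = \alpha_u + \alpha_p + \alpha_u\alpha_p; \quad\text{and}\quad \tilde\alpha = \alpha_u + \tfrac{1}{2}\alpha_p + \tfrac{1}{2}\alpha_u\alpha_p. \]
The first equation can hold for nonnegative $(\alpha_u, \alpha_p)$ only if $\alpha_u \in [0, \alpha]$. Given any such $\alpha_u$ the first equation will hold for a unique $\alpha_p$:
\[ \alpha_p(\alpha_u) = \frac{\alpha - \alpha_u}{1 + \alpha_u}. \]
Given this value for $\alpha_p$ we have $\alpha_p + \alpha_p\alpha_u = \alpha - \alpha_u$, so the second equation becomes $\tilde\alpha = \alpha_u + \tfrac{1}{2}(\alpha - \alpha_u)$, which is true for $\alpha_u = 2\tilde\alpha - \alpha$. The formula for $\alpha_p$ follows by substitution.

Proof of Proposition 4

In this proof $\alpha$ denotes the parameter of the generalized Laguerre polynomials $L_j^{(\alpha)}$. Let the density $f(x)$ be represented as
\[ f(x) = x^{\alpha}e^{-x}\sum_{j=0}^{\infty} g_j L_j^{(\alpha)}(x). \]
The distribution of $y_i$ is then described by
\[
\begin{aligned}
\Pr\{y_i = k \mid z_i\} &= \int_0^{\infty} e^{-e^{z_i\beta}u_i}\,\frac{\left(e^{z_i\beta}u_i\right)^k}{k!}\, u_i^{\alpha}e^{-u_i}\left(\sum_{j=0}^{\infty} g_j L_j^{(\alpha)}(u_i)\right) du_i \\
&= \int_0^{\infty} e^{-\left(e^{z_i\beta}+1\right)u_i}\, e^{k z_i\beta}\,\frac{u_i^k}{k!}\, u_i^{\alpha}\left(\sum_{j=0}^{\infty} g_j L_j^{(\alpha)}(u_i)\right) du_i.
\end{aligned}
\]
Let $\tilde z_i = \left(e^{z_i\beta}+1\right)u_i$, so $d\tilde z_i = \left(e^{z_i\beta}+1\right)du_i$. Then
\[
\begin{aligned}
\Pr\{y_i = k \mid z_i\} &= \int_0^{\infty} e^{-\tilde z_i}\,\frac{e^{k z_i\beta}}{\left(e^{z_i\beta}+1\right)^{k}}\,\frac{\tilde z_i^{k}}{k!}\left(\frac{\tilde z_i}{e^{z_i\beta}+1}\right)^{\alpha}\left(\sum_{j=0}^{\infty} g_j L_j^{(\alpha)}\!\left(\frac{\tilde z_i}{e^{z_i\beta}+1}\right)\right)\frac{d\tilde z_i}{e^{z_i\beta}+1} \\
&= \frac{e^{k z_i\beta}}{\left(e^{z_i\beta}+1\right)^{k+\alpha+1}}\int_0^{\infty} e^{-\tilde z_i}\,\tilde z_i^{\alpha}\,\frac{\tilde z_i^{k}}{k!}\left(\sum_{j=0}^{\infty} g_j L_j^{(\alpha)}\!\left(\frac{\tilde z_i}{e^{z_i\beta}+1}\right)\right) d\tilde z_i.
\end{aligned}
\]
To simplify we use two well-known identities: the monomial formula for Laguerre polynomials,
\[ \frac{u_i^k}{k!} = \sum_{j=0}^{k} (-1)^j\binom{k+\alpha}{k-j} L_j^{(\alpha)}(u_i), \]
and the series expansion
\[ L_l^{(\alpha)}\!\left(\frac{u_i}{\gamma+1}\right) = (\gamma+1)^{-l}\sum_{m=0}^{l}\binom{l+\alpha}{l-m}\gamma^{\,l-m} L_m^{(\alpha)}(u_i). \]
The former implies that
\[ \frac{\tilde z_i^{k}}{k!} = \sum_{j=0}^{k} (-1)^j\binom{k+\alpha}{k-j} L_j^{(\alpha)}(\tilde z_i) \]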

and the latter implies that
\[ L_l^{(\alpha)}\!\left(\frac{\tilde z_i}{e^{z_i\beta}+1}\right) = \left(e^{z_i\beta}+1\right)^{-l}\sum_{m=0}^{l}\binom{l+\alpha}{l-m} e^{(l-m)z_i\beta} L_m^{(\alpha)}(\tilde z_i). \]
Substituting these formulas into the formula for $y_i$ gives
\[ \Pr\{y_i = k \mid z_i\} = \frac{e^{k z_i\beta}}{\left(e^{z_i\beta}+1\right)^{k+\alpha+1}}\int_0^{\infty} e^{-\tilde z_i}\tilde z_i^{\alpha}\left(\sum_{j=0}^{k}(-1)^j\binom{k+\alpha}{k-j}L_j^{(\alpha)}(\tilde z_i)\right)\left(\sum_{l=0}^{\infty}\frac{g_l}{\left(e^{z_i\beta}+1\right)^{l}}\sum_{m=0}^{l}\binom{l+\alpha}{l-m}e^{(l-m)z_i\beta}L_m^{(\alpha)}(\tilde z_i)\right)d\tilde z_i. \]
Laguerre polynomials are orthogonal with
\[ \int_0^{\infty}\tilde z_i^{\alpha}e^{-\tilde z_i}L_n^{(\alpha)}(\tilde z_i)L_m^{(\alpha)}(\tilde z_i)\,d\tilde z_i = \begin{cases} 0 & \text{if } m \neq n \\ \dfrac{\Gamma(n+\alpha+1)}{n!} & \text{if } m = n. \end{cases} \]
Using this, the formula for $y_i$ simplifies to
\[
\begin{aligned}
\Pr\{y_i = k \mid z_i\} &= \frac{e^{k z_i\beta}}{\left(e^{z_i\beta}+1\right)^{k+\alpha+1}}\sum_{j=0}^{k}(-1)^j\binom{k+\alpha}{k-j}\frac{\Gamma(j+\alpha+1)}{j!}\left(\sum_{l=j}^{\infty} g_l\binom{l+\alpha}{l-j}\frac{e^{(l-j)z_i\beta}}{\left(e^{z_i\beta}+1\right)^{l}}\right) \\
&= \frac{e^{k z_i\beta}\,\Gamma(k+\alpha+1)}{\left(e^{z_i\beta}+1\right)^{k+\alpha+1}}\sum_{j=0}^{k}\frac{(-1)^j}{j!\,(k-j)!}\left(\sum_{l=j}^{\infty} g_l\binom{l+\alpha}{l-j}\frac{e^{(l-j)z_i\beta}}{\left(e^{z_i\beta}+1\right)^{l}}\right).
\end{aligned}
\]
This completes the proof.

The conditions given in the text for $u$ to be a valid density, for $E(u) = 1$, and the expression for $Var(u)$ can be derived by applying the same orthogonality formula, using $L_0^{(\alpha)}(x) = 1$, $L_1^{(\alpha)}(x) = -x + (\alpha+1)$, and $L_2^{(\alpha)}(x) = \frac{x^2}{2} - (\alpha+2)x + \frac{(\alpha+1)(\alpha+2)}{2}$.

3 Bootstrap Procedure

We obtained standard errors for our semiparametric estimates and confidence bands for the distribution of unobserved heterogeneity using both parametric and nonparametric bootstrapping procedures. In each iteration of the bootstrap, we generate a simulated dataset $\{\tilde y_i, \tilde z_i\}_{i=1}^{1,984}$, then estimate the parameters ($\beta$, the series coefficients $g$ up to $g_N$, and $\alpha$) using the semiparametric estimation procedure described in Section IV. Standard errors are calculated as the standard deviation of each estimated parameter across simulations. For example, writing $S$ for the number of simulations, $\hat\beta^{(s)}$ for the estimate from simulation $s$, and $\bar\beta$ for the mean of these estimates, the standard error of $\hat\beta$ is calculated as
\[ SE(\hat\beta) = \sqrt{\frac{1}{S}\sum_{s=1}^{S}\left(\hat\beta^{(s)} - \bar\beta\right)^2}. \]

Another functional of interest is a 95% confidence band on the estimated density and CDF of unobserved heterogeneity. For each value of $u$ and for each simulation of the bootstrap, we calculate the density $\tilde f$ and CDF $\tilde F$ as those generated by that simulation's estimated parameter vector. Denote as $\tilde f_p(u)$ the $p$th percentile of $\tilde f(u)$ across simulations; then the 95% confidence band for $\hat f(u)$ is $\left[\tilde f_{2.5}(u), \tilde f_{97.5}(u)\right]$. The confidence band for $\hat F$ is calculated similarly. Confidence bands for $u$ below 3 and for $u$ above 3 are shown in Section IV for the production of AMC high-scorers and in Section V for the production of SAT high-scorers.

In each simulation of the parametric bootstrap, we use the parameter estimates obtained using our semiparametric procedure to generate simulated outcomes. First, we draw a random sample $\tilde z$ of size 1,984 (with replacement) from the set of covariates $z$ listed in Table 3. We also draw a random sample $\tilde u$ of size 1,984 from the CDF $\hat F$, which we estimated using the procedure in Section IV on the true dataset. For each $i = 1, \ldots, 1{,}984$, we then form the rate $e^{\tilde z_i\hat\beta}\tilde u_i$ and draw $\tilde y_i^{P}$ from a Poisson distribution with that rate parameter. Finally, we re-estimate all parameters on the simulated dataset $(\tilde y^{P}, \tilde z)$.

The nonparametric bootstrap proceeds similarly, except that we use the empirical distribution of $y$ rather than the estimated theoretical distribution of $y$. That is, for each simulation, we draw a random sample $(\tilde y^{NP}, \tilde z)$ of size 1,984 (with replacement) from the set of outcomes $y$ and covariates $z$, then re-estimate all parameters on the simulated dataset $(\tilde y^{NP}, \tilde z)$.
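The two resampling schemes described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the estimation code used in the paper: the semiparametric estimator is replaced by a hypothetical closed-form stand-in (`fit_scale`, the Poisson MLE of a single scale parameter), the covariate index is collapsed into a known positive weight `w`, and the estimated heterogeneity distribution is represented by a user-supplied sampler `draw_u`. All function and parameter names are invented for the example.

```python
import math
import random
import statistics


def poisson_draw(rate, rng):
    """Draw one Poisson variate by Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def fit_scale(y, w):
    """Hypothetical stand-in estimator: Poisson MLE of c in y_i ~ Poisson(c * w_i)."""
    return sum(y) / sum(w)


def parametric_bootstrap_se(y, w, draw_u, n_sims, rng):
    """Resample covariates, then redraw heterogeneity and outcomes from the fitted model."""
    c_hat = fit_scale(y, w)
    estimates = []
    for _ in range(n_sims):
        w_sim = rng.choices(w, k=len(w))        # covariates, with replacement
        u_sim = [draw_u(rng) for _ in w_sim]    # heterogeneity from the fitted law
        y_sim = [poisson_draw(c_hat * wi * ui, rng) for wi, ui in zip(w_sim, u_sim)]
        estimates.append(fit_scale(y_sim, w_sim))
    return statistics.stdev(estimates)          # sample SD across simulations


def nonparametric_bootstrap_se(y, w, n_sims, rng):
    """Resample observed (outcome, covariate) pairs with replacement."""
    pairs = list(zip(y, w))
    estimates = []
    for _ in range(n_sims):
        sample = rng.choices(pairs, k=len(pairs))
        estimates.append(fit_scale([p[0] for p in sample], [p[1] for p in sample]))
    return statistics.stdev(estimates)
```

The only difference between the two functions is where the simulated outcomes come from: the parametric version draws them from the fitted model, while the nonparametric version reuses the observed pairs. Note that `statistics.stdev` divides by the number of simulations minus one, which is a choice made for this sketch.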
As in the semiparametric estimation on our full sample, the results of each bootstrap estimation may depend on the starting values chosen; in our results, we present those estimates for which the likelihood is highest after trying numerous starting values.⁵⁷ We begin each bootstrap by running a trial bootstrap of 2 simulations for several candidate starting values: those resulting in the highest likelihood in the full sample estimation and the center of each range of starting values for which the resulting likelihood is close to that of the best starting values. We then use the values that provide the highest average log-likelihood in the trial bootstrap as the starting values in the full bootstrap.

57 In practice, we used starting values from either a Poisson or negative binomial regression, along with one of two potential sets of starting values for our parameters $\alpha$ and the series coefficients $g$ up to $g_N$. The first set of parameters we tried was the best-fit parameters of the candidate distributions described in Appendix A.2, so that the optimization would be allowed to converge to a number of differently-shaped distributions. We also tried setting each $g_i = 0$ and varying $\alpha$ between -.9 and 2. The latter approach often yielded the highest likelihood.

If our model is specified correctly, then the parametric bootstrap is more efficient; if the model is misspecified, then the nonparametric bootstrap will be more appropriate. See Efron and Tibshirani (1993) for a discussion. In our application, neither procedure provides smaller or larger standard errors or confidence bands across all parameters or outcomes, but parametric standard errors are often slightly smaller, and parametric bands are often slightly narrower and smoother. In the body of the paper, we present the results of the parametric bootstrap, but our interpretation of the results is unaffected by the choice of bootstrap procedure.

4 Simulations

The simulations implemented our estimation procedure on datasets created by drawing each $z_i$ from a uniform distribution with support [ , ]; drawing each $u_i$ from the desired error distribution; forming the rate $e^{z_i\beta}u_i$, where $\beta = [-4.27, , , , ., ., .2]$; and drawing $y_i$ from a Poisson distribution with that rate parameter. Each simulated variable included 2,5 observations. The distributions of the simulated covariates and the values for $\beta$ were chosen so that the mean and variance of the simulated $e^{z_i\beta}$ would roughly match the mean and variance of the fitted values in a negative binomial regression of the count of AMC 12 high-scorers on school-level covariates. The $u_i$ were chosen from one of three distributions depending on the simulation: an exponential distribution with mean 1 and standard deviation 1, a lognormal distribution with mean 1 and variance 3, and a uniform distribution on [0, 2].
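The data-generating process just described can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names, the covariate support arguments `z_low` and `z_high`, and the coefficient vector passed by the caller are all assumptions made for the example. The lognormal draw uses the standard moment mapping (log-scale variance $\sigma^2 = \log(1 + v)$ and log-scale mean $-\sigma^2/2$ give mean 1 and variance $v$), and the default $v = 3$ is the variance quoted in the text.

```python
import math
import random


def draw_u(kind, rng, lognormal_variance=3.0):
    """Draw one mean-one multiplicative effect u from the named distribution."""
    if kind == "exponential":      # mean 1, standard deviation 1
        return rng.expovariate(1.0)
    if kind == "lognormal":        # mean 1, variance lognormal_variance
        sigma2 = math.log(1.0 + lognormal_variance)
        return rng.lognormvariate(-0.5 * sigma2, math.sqrt(sigma2))
    if kind == "uniform":          # uniform on [0, 2], mean 1
        return rng.uniform(0.0, 2.0)
    raise ValueError(f"unknown distribution: {kind}")


def poisson_draw(rate, rng):
    """Draw one Poisson variate by Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_dataset(n_obs, beta, kind, rng, z_low, z_high):
    """Return a list of (z, y): z holds one covariate per slope in beta[1:], y is a count."""
    data = []
    for _ in range(n_obs):
        z = [rng.uniform(z_low, z_high) for _ in beta[1:]]
        index = beta[0] + sum(b * zi for b, zi in zip(beta[1:], z))
        rate = math.exp(index) * draw_u(kind, rng)   # Poisson rate e^{z beta} * u
        data.append((z, poisson_draw(rate, rng)))
    return data
```

A call such as `simulate_dataset(500, [-1.0, 0.5, 0.5], "uniform", random.Random(1), 0.0, 1.0)` uses placeholder coefficients and support chosen only to make the sketch runnable.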
The motivation for these choices was to demonstrate the performance of our procedure for a diverse set of underlying distributions: the exponential distribution is within the class of models being estimated even for the smallest $N$ we consider, the lognormal distribution cannot be fit perfectly with a finite $N$ and has a thicker upper tail, and the uniform distribution is a more challenging distribution to reproduce with a series expansion. We estimated the model using $N$ = , 2, 4, 6, and 8 terms.⁵⁸

The estimated coefficients $\hat\beta$ on the observed characteristics are fairly precise and show almost no bias. Table 6 presents some summary statistics on the estimates for simulations with $N = 8$ Laguerre polynomials.⁵⁹ The first column lists the true values for the coefficients on each simulated covariate. The next three columns list the mean and standard deviation (in parentheses) of the estimates across the simulated datasets for each simulated distribution. There are no notable differences across heterogeneity distributions in the consistency or precision of the estimated $\hat\beta$'s.

58 For these estimations we did not restrict $g_1$ to be $\alpha / \Gamma(\alpha+2)$ and instead ensured that the estimated distributions have mean 1 by rescaling the preliminary estimates by dividing by the mean.
59 Summary statistics for estimates of $\hat\beta$ using $N$ = , 2, 4, 6 are similar.

                         Mean and SD of estimated coefficients
Variable   True Coeffs.  Exponential u   Lognormal u   Uniform u
Constant   -4.27         -4.269          -4.265        -4.2777
                         (.536)          (.57)         (.9)
z_1        .             .997            .9977         .9984
                         (.55)           (.593)        (.76)
z_2        .             .               .             .26
                         (.537)          (.424)        (.4)
z_3        .             .9995           .999          .9
                         (.37)           (.377)        (.269)
z_4        .             .994            .993          .998
                         (.27)           (.54)         (.9)
z_5        .             .997            .996          .
                         (.26)           (.27)         (.5)
z_6        .2            .996            .994          .23
                         (.84)           (.25)         (.32)

Notes: True and estimated coefficients from semi-parametric model estimation using simulated data, varying the distribution of underlying heterogeneity. Results displayed for the exponential (1) distribution, the lognormal (1, 3) distribution, and the uniform [0, 2] distribution with 2,5 simulated observations. Mean estimates across simulated datasets shown; standard deviations in parentheses.

Table 6: Estimated coefficients on observed characteristics in simulations

Table 7 provides some statistics on how well the model was able to estimate the distribution of unobserved heterogeneity. The rows correspond to the distribution from which the $u$'s were drawn. The columns correspond to the number $N$ of Laguerre polynomials used in the estimations. The metric used to measure performance is integrated squared error (ISE): if the estimated density function from simulation run $i$ is $\hat f_i(x)$, where the true data generation process has unobserved heterogeneity from distribution $f(x)$, the ISE of that estimated density is $\int \left(\hat f_i(x) - f(x)\right)^2 dx$. The values in Table 7 are median ISE across simulation runs.

                          Median ISE for various models
True distribution of u    N =      N = 2    N = 4    N = 6    N = 8
Exponential               .        .45      .4       .2       .243
Lognormal                 .33      .5       .9       .48      .67
Uniform [0, 2]            .55      .449     .833     .795     .9

Notes: Median integrated squared error of estimated distributions from semi-parametric model estimation using simulated data, varying the distribution of underlying heterogeneity. Results displayed for the exponential (1) distribution, the lognormal (1, 3) distribution, and the uniform [0, 2] distribution with 2,5 simulated observations. Median ISE across simulated datasets shown, varying the number of Laguerre polynomials.

Table 7: Goodness of fit of estimated distributions of unobserved heterogeneity in simulations: median MISE for various models and true distributions

The exponential model fits fairly well for all $N$. As one would expect, the fit with the smallest $N$ is best: the true model is in that class and estimating additional unnecessary parameters just increases the scope for overfitting. The fit worsens gradually as $N$ increases, but never becomes terrible; at $N = 8$, the worst fit, the median ISE is .24. To get a feel for the magnitudes, the MISE would be .2 if the density of an exponential distribution were over- or under-estimated by % at every value of $u$.
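To make the ISE metric concrete, the sketch below approximates $\int (\hat f(x) - f(x))^2 dx$ with a trapezoid rule on a truncated interval. The function names, the truncation point, and the grid size are choices made for this illustration and do not come from the paper. The example pair of densities (exponential with mean 1 against uniform on [0, 2]) is used only because its ISE has the closed form $e^{-2}$, which gives a convenient check; it is not one of the reported results.

```python
import math
import statistics


def integrated_squared_error(f_hat, f_true, lower, upper, n_steps=20000):
    """Trapezoid-rule approximation to the integral of (f_hat - f_true)^2 on [lower, upper]."""
    step = (upper - lower) / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        x = lower + i * step
        weight = 0.5 if i in (0, n_steps) else 1.0   # half weight at the two endpoints
        total += weight * (f_hat(x) - f_true(x)) ** 2
    return total * step


def exponential_pdf(x):
    """Density of the exponential distribution with mean 1."""
    return math.exp(-x) if x >= 0 else 0.0


def uniform_0_2_pdf(x):
    """Density of the uniform distribution on [0, 2]."""
    return 0.5 if 0.0 <= x <= 2.0 else 0.0


def median_ise(ise_by_run):
    """Median ISE across simulation runs, the summary reported per table cell."""
    return statistics.median(ise_by_run)
```

Calling `integrated_squared_error(uniform_0_2_pdf, exponential_pdf, 0.0, 40.0)` should land close to `math.exp(-2)`; the upper limit of 40 is an arbitrary truncation beyond which both densities are negligible.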
Note also that the exponential distribution with mean 1 is the gamma distribution involved in the Poisson-gamma justification for the negative binomial when $\alpha = 1$. Hence, the estimates of this model can provide a sense for how well our semiparametric model will estimate the distribution of underlying heterogeneity in a case where the negative binomial is correctly specified.

The lognormal distribution does not fit as well for the smallest $N$. This should be expected: the lognormal is not a member of the parametric family we are estimating and indeed no matter what is estimated the ISE cannot possibly be below .7. Larger $N$ make it theoretically possible to fit the distribution much better (the parameter vectors that give distributions closest to the true lognormal have ISEs of .756, .2, .4, and .2 for $N$ = 2, 4, 6, and 8 respectively), but again there is the offsetting effect that there is more scope for overfitting. The tradeoff between the two effects results in fairly similar fits across the range of $N$. The median ISE is smallest for the $N = 2$ model.

The fits to the uniform distribution are much worse. Here, there is no parameter combination that produces a very good fit when $N$ is small, and overfitting becomes a concern when $N$ is large.⁶⁰ The best fit is obtained for $N = 6$, where the median ISE is 45% lower than the median ISE for the worst fit of $N = 2$.

Figure 5 provides a graphical illustration of the performance of our method. In each of the three panels we present the true distribution in bold and three estimated distributions corresponding to the simulations (using $N = 4$) that were at the 25th percentile, the 50th percentile, and the 75th percentile in the MISE measure of goodness of fit. In the exponential and log-normal cases the estimated distributions seem to fit reasonably well for values of $u$ around the mean ($u = 1$) and to fit quite well for higher values of $u$. The estimated distributions are farther from the truth at low values of $u$. This should be expected: once we are considering a population of schools in which all schools will in practice have zero or one high-scoring student per year, a single year's data will not allow one to say whether all schools are identical or whether there is heterogeneity. Also as expected, our method performs somewhat poorly for the uniform distribution with its bounded support. However, we are encouraged to note that, even for this difficult case, the estimated distribution does mostly spread out the mass over the correct [0, 2] interval.

60 Theoretical lower bounds coming from the parameter vectors that make the estimated distributions as close as possible to the true distribution are ISEs of .877, .456, .397, .273, .269 for $N$ = , 2, 4, 6, 8.
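Picking the simulation runs that sit at given percentiles of the ISE distribution, as done for the figure, can be sketched as below. The nearest-rank convention and the function name are assumptions made for this illustration; the text does not say which percentile convention was used.

```python
def runs_at_percentiles(ise_by_run, percentiles=(25, 50, 75)):
    """Return {percentile: index of the run whose ISE sits at that percentile}.

    Uses a nearest-rank rule on the runs sorted by ISE (an assumed convention).
    """
    order = sorted(range(len(ise_by_run)), key=ise_by_run.__getitem__)
    picks = {}
    for p in percentiles:
        rank = round(p / 100 * (len(order) - 1))   # position in the sorted list
        picks[p] = order[rank]
    return picks
```

With five runs whose ISEs are `[0.5, 0.1, 0.3, 0.2, 0.4]`, the function returns the indices of the second-, third-, and fourth-smallest ISE for the 25th, 50th, and 75th percentiles.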

[Figure panel: "Simulation Results: Uniform Distribution, N=4". Vertical axis: density f(u). Horizontal axis: multiplicative school effect u. Plotted series: 25th percentile, Median, 75th percentile, True Distribution.]

Figure 5: Actual vs. Estimated Distributions: 25th, 50th, and 75th percentile fits in simulations