Do Schools Matter for High Math Achievement? Evidence from the American Mathematics Competitions Glenn Ellison and Ashley Swanson Online Appendix


1 Appendix Tables

                 Predicted # of schools
Count   Actual   Poisson   NB   Semi-P
,3 975,3,39 357 68 355 362 2 35 267 4 39 3 64 94 67 65 4 39 29 36 35 5 8 8 22 2 6 5 2 4 3 7 7 9 9 8 7 6 9 4 5 5 2 4 4 3 3 3 2 3 2 2 3 2 2 4 5 6 7 8 9 2+ 3 4 4
Log-likelihood   -2,63.6   -,899.   -,893.6
2   84.5E+8   5.5   6.5
p-value   ..745.688

Table 5: Actual vs. predicted distribution of counts of high-scorers across schools

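THE AMERICAN ECONOMIC REVIEW

2 Proofs

Proof of Proposition 1

It is standard that under model 1

$$Y_i \sim NB\left(\frac{1}{\alpha},\ \frac{\alpha e^{X_i\beta}}{1+\alpha e^{X_i\beta}}\right)$$

and under model 2

$$Y_i \sim NB\left(\frac{\theta(X_i)}{g(X_i)},\ 1-e^{-g(X_i)}\right).$$

(See Boswell and Patil (1970) or Karlin (1966, p. 345).) Hence, the distributions in the two models are identical if

$$\frac{1}{\alpha} = \frac{\theta(X_i)}{g(X_i)}; \quad\text{and}\quad \frac{\alpha e^{X_i\beta}}{1+\alpha e^{X_i\beta}} = 1-e^{-g(X_i)}.$$

The first holds for all $X_i$ if $g(X_i) = \alpha\,\theta(X_i)$. The second then holds if

$$\frac{1}{1+\alpha e^{X_i\beta}} = e^{-g(X_i)},$$

which holds for $g(X_i) = \log\left(1+\alpha e^{X_i\beta}\right)$.

Proof of Proposition 2

Applying the result of Proposition 1 to the outcome of this model conditional on $u_i$, we see that the conditional distribution is

$$NB\left(\frac{1}{\alpha_p},\ \frac{\alpha_p e^{X_i\beta}u_i}{1+\alpha_p e^{X_i\beta}u_i}\right).$$

The mean and variance of a $NB(r, p)$ distribution are $E(Y) = \frac{rp}{1-p}$ and $Var(Y) = \frac{rp}{(1-p)^2} = \frac{E(Y)}{1-p}$. This gives $E(Y_i \mid X_i, u_i) = e^{X_i\beta}u_i$ and $Var(Y_i \mid X_i, u_i) = E(Y_i \mid X_i, u_i) + \alpha_p E(Y_i \mid X_i, u_i)^2$.

The result on the expectation of $Y_i \mid X_i$ follows from iterated expectations:

$$E(Y_i \mid X_i) = E_{u_i}\left(E(Y_i \mid X_i, u_i)\right) = E_{u_i}\left(e^{X_i\beta}u_i\right) = e^{X_i\beta}.$$

And the formula for the variance follows from the conditional variance formula:

$$\begin{aligned}
Var(Y_i \mid X_i) &= E_{u_i}Var(Y_i \mid X_i, u_i) + Var_{u_i}E(Y_i \mid X_i, u_i) \\
&= E_{u_i}\left(e^{X_i\beta}u_i + \alpha_p e^{2X_i\beta}u_i^2\right) + Var_{u_i}\left(e^{X_i\beta}u_i\right) \\
&= e^{X_i\beta}E u_i + \alpha_p e^{2X_i\beta}\left(Var(u_i) + (E u_i)^2\right) + e^{2X_i\beta}Var(u_i) \\
&= e^{X_i\beta} + \alpha_p e^{2X_i\beta}(\alpha_u + 1) + e^{2X_i\beta}\alpha_u \\
&= e^{X_i\beta} + e^{2X_i\beta}(\alpha_p\alpha_u + \alpha_p + \alpha_u).
\end{aligned}$$

Proof of Proposition 3

Suppose the $Y_{it}$ are generated as described. Then using Proposition 2 and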

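iterated expectations over $t$ we have

$$E(Y_{it} \mid X_i) = \tfrac{1}{2}E(Y_{i1} \mid X_i) + \tfrac{1}{2}E(Y_{i2} \mid X_i) = \tfrac{1}{2}e^{X_i\beta} + \tfrac{1}{2}e^{X_i\beta} = e^{X_i\beta}.$$

The conditional variance formula gives

$$Var(Y_{it} \mid X_i) = E_t Var(Y_{it} \mid X_i, t) + Var_t\left(E(Y_{it} \mid X_i, t)\right).$$

The first term on the RHS of this expression is just the variance in the single period model given by Proposition 2, and the second term is zero, so we find

$$Var(Y_{it} \mid X_i) = e^{X_i\beta} + e^{2X_i\beta}(\alpha_p\alpha_u + \alpha_p + \alpha_u).$$

The mean of $Y_i \mid X_i$ follows from an identical calculation:

$$E(Y_i \mid X_i) = E(Y_{i1} + Y_{i2} \mid X_i) = E(Y_{i1} \mid X_i) + E(Y_{i2} \mid X_i) = 2e^{X_i\beta}.$$

The variance is a little more complicated. We have

$$Var(Y_i \mid X_i) = Var(Y_{i1} + Y_{i2} \mid X_i) = Var(Y_{i1} \mid X_i) + Var(Y_{i2} \mid X_i) + 2Cov(Y_{i1}, Y_{i2} \mid X_i).$$

To find the covariance we condition on $u_i$ and use the fact that $Y_{i1}$ and $Y_{i2}$ are conditionally independent given $X_i$ and $u_i$:

$$\begin{aligned}
Cov(Y_{i1}, Y_{i2} \mid X_i) &= E(Y_{i1}Y_{i2} \mid X_i) - E(Y_{i1} \mid X_i)E(Y_{i2} \mid X_i) \\
&= E_{u_i}\left(E(Y_{i1}Y_{i2} \mid X_i, u_i)\right) - e^{2X_i\beta} \\
&= E_{u_i}\left(E(Y_{i1} \mid X_i, u_i)E(Y_{i2} \mid X_i, u_i)\right) - e^{2X_i\beta} \\
&= E_{u_i}\left(e^{X_i\beta}u_i\, e^{X_i\beta}u_i\right) - e^{2X_i\beta} \\
&= e^{2X_i\beta}\left(E_{u_i}(u_i^2) - 1\right) \\
&= e^{2X_i\beta}\alpha_u.
\end{aligned}$$

Plugging back into the formula for the variance we find

$$\begin{aligned}
Var(Y_i \mid X_i) &= 2\left(e^{X_i\beta} + e^{2X_i\beta}(\alpha_u + \alpha_p + \alpha_u\alpha_p)\right) + 2e^{2X_i\beta}\alpha_u \\
&= 2e^{X_i\beta} + \left(2e^{X_i\beta}\right)^2\left(\alpha_u + \tfrac{1}{2}\alpha_p + \tfrac{1}{2}\alpha_p\alpha_u\right).
\end{aligned}$$

Using these formulas we will have $Var(Y_{it} \mid X_i) = E(Y_{it} \mid X_i) + \alpha E(Y_{it} \mid X_i)^2$ and $Var(Y_i \mid X_i) = E(Y_i \mid X_i) + \tilde{\alpha} E(Y_i \mid X_i)^2$ if and only if two conditions hold:

$$\alpha = \alpha_u + \alpha_p + \alpha_u\alpha_p; \quad\text{and}\quad \tilde{\alpha} = \alpha_u + \tfrac{1}{2}\alpha_p + \tfrac{1}{2}\alpha_u\alpha_p.$$

The first equation can hold for nonnegative $(\alpha_u, \alpha_p)$ only if $\alpha_u \in [0, \alpha]$. Given any such $\alpha_u$ the first equation will hold for a unique $\alpha_p$: $\alpha_p(\alpha_u) \equiv \frac{\alpha - \alpha_u}{1 + \alpha_u}$. Given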

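this value for $\alpha_p$ we have $\alpha_p + \alpha_p\alpha_u = \alpha - \alpha_u$, so the second equation becomes $\tilde{\alpha} = \alpha_u + \tfrac{1}{2}(\alpha - \alpha_u)$, which is true for $\alpha_u = 2\tilde{\alpha} - \alpha$. The formula for $\alpha_p$ follows by substitution.

Proof of Proposition 4

Let the density $f(x)$ be represented as $f(x) = x^{\alpha}e^{-x}\sum_{j=0}^{\infty} g_j L_j^{(\alpha)}(x)$. The distribution of $y_i$ is then described by

$$\begin{aligned}
\Pr\{y_i = k \mid z_i\} &= \int_0^{\infty} e^{-e^{z_i\beta}u_i}\,\frac{\left(e^{z_i\beta}u_i\right)^k}{k!}\, u_i^{\alpha}e^{-u_i}\left(\sum_{j=0}^{\infty} g_j L_j^{(\alpha)}(u_i)\right) du_i \\
&= e^{k z_i\beta}\int_0^{\infty} e^{-(e^{z_i\beta}+1)u_i}\,\frac{u_i^k}{k!}\, u_i^{\alpha}\left(\sum_{j=0}^{\infty} g_j L_j^{(\alpha)}(u_i)\right) du_i.
\end{aligned}$$

Let $\tilde{z}_i = \left(e^{z_i\beta} + 1\right)u_i$, so $d\tilde{z}_i = \left(e^{z_i\beta} + 1\right)du_i$. Then

$$\Pr\{y_i = k \mid z_i\} = \frac{e^{k z_i\beta}}{\left(e^{z_i\beta}+1\right)^{k+\alpha+1}}\int_0^{\infty} e^{-\tilde{z}_i}\,\frac{\tilde{z}_i^k}{k!}\,\tilde{z}_i^{\alpha}\left(\sum_{j=0}^{\infty} g_j L_j^{(\alpha)}\!\left(\frac{\tilde{z}_i}{e^{z_i\beta}+1}\right)\right) d\tilde{z}_i.$$

To simplify we use two well-known identities: the monomial formula for Laguerre polynomials,

$$\frac{u_i^k}{k!} = \sum_{l=0}^{k} (-1)^l \binom{k+\alpha}{k-l} L_l^{(\alpha)}(u_i),$$

and the series expansion

$$L_j^{(\alpha)}\!\left(\frac{u_i}{\gamma+1}\right) = (\gamma+1)^{-j}\sum_{l=0}^{j}\binom{j+\alpha}{j-l}\gamma^{\,j-l} L_l^{(\alpha)}(u_i).$$

The former implies that

$$\frac{\tilde{z}_i^k}{k!} = \sum_{l=0}^{k} (-1)^l \binom{k+\alpha}{k-l} L_l^{(\alpha)}(\tilde{z}_i)$$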

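and the latter implies that

$$L_j^{(\alpha)}\!\left(\frac{\tilde{z}_i}{e^{z_i\beta}+1}\right) = \left(e^{z_i\beta}+1\right)^{-j}\sum_{l=0}^{j}\binom{j+\alpha}{j-l} e^{(j-l) z_i\beta} L_l^{(\alpha)}(\tilde{z}_i).$$

Substituting these formulas into the formula for $y_i$ gives

$$\Pr\{y_i = k \mid z_i\} = \frac{e^{k z_i\beta}}{\left(e^{z_i\beta}+1\right)^{k+\alpha+1}}\int_0^{\infty} \tilde{z}_i^{\alpha}e^{-\tilde{z}_i}\left(\sum_{l=0}^{k} (-1)^l \binom{k+\alpha}{k-l} L_l^{(\alpha)}(\tilde{z}_i)\right)\left(\sum_{j=0}^{\infty} \frac{g_j}{\left(e^{z_i\beta}+1\right)^{j}}\sum_{l'=0}^{j}\binom{j+\alpha}{j-l'} e^{(j-l') z_i\beta} L_{l'}^{(\alpha)}(\tilde{z}_i)\right) d\tilde{z}_i.$$

Laguerre polynomials are orthogonal with

$$\int_0^{\infty} \tilde{z}_i^{\alpha}e^{-\tilde{z}_i} L_n^{(\alpha)}(\tilde{z}_i) L_m^{(\alpha)}(\tilde{z}_i)\, d\tilde{z}_i = \begin{cases} 0 & \text{if } m \neq n \\ \dfrac{\Gamma(n+\alpha+1)}{n!} & \text{if } m = n. \end{cases}$$

Using this, the formula for $y_i$ simplifies to

$$\begin{aligned}
\Pr\{y_i = k \mid z_i\} &= \frac{e^{k z_i\beta}}{\left(e^{z_i\beta}+1\right)^{k+\alpha+1}}\sum_{l=0}^{k} (-1)^l \binom{k+\alpha}{k-l}\frac{\Gamma(l+\alpha+1)}{l!}\sum_{j=l}^{\infty} g_j \binom{j+\alpha}{j-l}\frac{e^{(j-l) z_i\beta}}{\left(e^{z_i\beta}+1\right)^{j}} \\
&= \frac{e^{k z_i\beta}\,\Gamma(k+\alpha+1)}{k!\left(e^{z_i\beta}+1\right)^{k+\alpha+1}}\sum_{l=0}^{k} (-1)^l \binom{k}{l}\sum_{j=l}^{\infty} g_j \binom{j+\alpha}{j-l}\frac{e^{(j-l) z_i\beta}}{\left(e^{z_i\beta}+1\right)^{j}}.
\end{aligned}$$

This completes the proof.

The conditions given in the text for $f$ to be a valid density, for $E(u) = 1$, and the expression for $Var(u)$ can be derived by applying the formula

$$\int_0^{\infty} \tilde{z}_i^{\alpha}e^{-\tilde{z}_i} L_n^{(\alpha)}(\tilde{z}_i) L_m^{(\alpha)}(\tilde{z}_i)\, d\tilde{z}_i = \begin{cases} 0 & \text{if } m \neq n \\ \dfrac{\Gamma(n+\alpha+1)}{n!} & \text{if } m = n \end{cases}$$

with n = and m = , 2, 3 using $L_0^{(\alpha)}(x) = 1$, $L_1^{(\alpha)}(x) = -x + (\alpha+1)$, and $L_2^{(\alpha)}(x) = \frac{x^2}{2} - (\alpha+2)x + \frac{(\alpha+1)(\alpha+2)}{2}$.

3 Bootstrap Procedure

We obtained standard errors for our semiparametric estimates and confidence bands for the distribution of unobserved heterogeneity using both parametric and nonparametric bootstrapping procedures. In each iteration of the bootstrap, we generate a simulated dataset $\{\tilde{y}_i, \tilde{z}_i\}$, $i = 1, \dots,$ ,984, then estimate the parameters $\tilde{\beta}, \tilde{g}, \dots, \tilde{g}_N, \tilde{\alpha}$ using the semiparametric estimation procedure described in Section IV. Standard errors are calculated as the standard deviation of each estimated parameter across , simulations. For example, the standard error of $\hat{\beta}$ is calculated as

$$SE(\hat{\beta}) = \sqrt{\frac{1}{S}\sum_{j=1}^{S}\left(\tilde{\beta}_j - \hat{\beta}\right)^2},$$

where $S$ is the number of simulations and $\tilde{\beta}_j$ is the estimate from simulation $j$.

Another functional of interest is a 95% confidence band on the estimated density and CDF of unobserved heterogeneity. For each $u \in (0, \infty)$ and for each simulation of the bootstrap, we calculate the density $\tilde{f}$ and CDF $\tilde{F}$ as those generated by the parameter vector $(\tilde{\beta}, \tilde{g}, \dots, \tilde{g}_N, \tilde{\alpha})$. Denote as $\tilde{f}_p(u)$ the $p$th percentile of $\tilde{f}(u)$ across , simulations; then the 95% confidence band for $\hat{f}(u)$ is $\left[\tilde{f}_{2.5}(u), \tilde{f}_{97.5}(u)\right]$. The confidence band for $\hat{F}$ is calculated similarly. Confidence bands for u ∈ (, 3) and u ∈ (3, ) are shown in Section IV for the production of AMC high-scorers and in Section V for the production of SAT high-scorers.

In each simulation of the parametric bootstrap, we use the parameter estimates obtained using our semiparametric procedure to generate simulated outcomes. First, we draw a random sample $\tilde{z}$ of size ,984 (with replacement) from the set of covariates $z$ listed in Table 3. We also draw a random sample $\tilde{u}$ of size ,984 from the CDF $\hat{F}$, which we estimated using the procedure in Section IV on the true dataset. For each $i = 1, \dots,$ ,984, we then generate $\tilde{\lambda}_i = e^{\tilde{z}_i\hat{\beta}}\tilde{u}_i$ and draw $\tilde{y}_i^P$ from a Poisson distribution with rate parameter $\tilde{\lambda}_i$. Finally, we estimate $(\tilde{\beta}^P, \tilde{g}^P, \dots, \tilde{g}_N^P, \tilde{\alpha}^P)$ on the simulated dataset $(\tilde{y}^P, \tilde{z})$.

The nonparametric bootstrap proceeds similarly, except that we use the empirical distribution of $y$ rather than the estimated theoretical distribution of $y$. That is, for each simulation, we draw a random sample $(\tilde{y}^{NP}, \tilde{z})$ of size ,984 (with replacement) from the set of outcomes $y$ and covariates $z$, then estimate $(\tilde{\beta}^{NP}, \tilde{g}^{NP}, \dots, \tilde{g}_N^{NP}, \tilde{\alpha}^{NP})$ on the simulated dataset $(\tilde{y}^{NP}, \tilde{z})$.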
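The resampling and summary steps of the nonparametric bootstrap can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: `mean_count` is a hypothetical stand-in for the semiparametric estimator, the toy data are invented, and the standard error uses the sample standard deviation (an n - 1 divisor), which may differ from the divisor used in the paper.

```python
import random
import statistics


def percentile(values, p):
    """Return the p-th percentile (0 <= p <= 100) using linear interpolation."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * p / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def nonparametric_bootstrap(y, z, estimator, n_sims, rng):
    """Resample (y_i, z_i) pairs with replacement and re-estimate on each draw."""
    n = len(y)
    estimates = []
    for _ in range(n_sims):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(estimator([y[i] for i in idx], [z[i] for i in idx]))
    return estimates


def summarize(estimates):
    """Bootstrap standard error and 95% percentile band of a scalar estimate."""
    se = statistics.stdev(estimates)  # sample SD across simulations (n - 1 divisor)
    band = (percentile(estimates, 2.5), percentile(estimates, 97.5))
    return se, band


def mean_count(y, z):
    """Hypothetical stand-in estimator: the sample mean of y (ignores z)."""
    return sum(y) / len(y)


rng = random.Random(0)
z = [rng.random() for _ in range(200)]          # toy covariate (assumption)
y = [rng.randrange(0, 4) for _ in range(200)]   # toy counts in {0, 1, 2, 3}
estimates = nonparametric_bootstrap(y, z, mean_count, n_sims=500, rng=rng)
se, (low, high) = summarize(estimates)
print(f"SE = {se:.3f}, 95% band = [{low:.3f}, {high:.3f}]")
```

The same `summarize` step would apply pointwise to a density: evaluating each bootstrap density at a fixed u and taking the 2.5th and 97.5th percentiles gives one point of the confidence band.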
As in the semiparametric estimation on our full sample, the results of each bootstrap estimation may depend on the starting values chosen; in our results, we present those estimates for which the likelihood is highest after trying numerous starting values.⁵⁷ We begin each bootstrap by running a trial bootstrap of 2 simulations for several candidate starting values: those resulting in the highest likelihood in the full sample estimation and the center of each range of starting values for which the resulting likelihood is close to that of the best starting values. We then use the values that provide the highest average log-likelihood in the trial bootstrap as the starting values in the full bootstrap.

⁵⁷ In practice, we used starting values from either a Poisson or negative binomial regression, along with one of two potential sets of starting values for our parameters $\alpha, g, \dots, g_N$. The first set of parameters we tried was the best-fit parameters of the candidate distributions described in Appendix A.2, so that the optimization would be allowed to converge to a number of differently-shaped distributions. We also tried setting each $g_i$ = and varying $\alpha$ between -.9 and 2. The latter approach often yielded the highest likelihood.

If our model is specified correctly, then the parametric bootstrap is more efficient; if the model is misspecified, then the nonparametric bootstrap will be more appropriate. See Efron and Tibshirani (1993) for a discussion. In our application, neither procedure provides smaller or larger standard errors or confidence bands across all parameters or outcomes, but parametric standard errors are often slightly smaller, and parametric bands are often slightly narrower and smoother. In the body of the paper, we present the results of the parametric bootstrap, but our interpretation of the results is unaffected by the choice of bootstrap procedure.

4 Simulations

The simulations implemented our estimation procedure on datasets created by drawing each $z_i$ from a uniform distribution with support [, ]; drawing each $u_i$ from the desired error distribution; forming $\lambda_i = e^{z_i\beta}u_i$, where $\beta$ = [-4.27, , , , ., ., .2]; and drawing $y_i$ from a Poisson distribution with rate parameter $\lambda_i$. Each simulated variable included 2,5 observations. The distributions of the simulated covariates and the values for $\beta$ were chosen so that the mean and variance of the simulated $e^{z_i\beta}$ would roughly match the mean and variance of the fitted values in a negative binomial regression of the count of AMC 12 high-scorers on school-level covariates. The $u_i$ were chosen from one of three distributions depending on the simulation: an exponential distribution with mean 1 and standard deviation 1, a lognormal distribution with mean 1 and variance 3, and a uniform distribution on [0, 2].
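The data-generating process described above can be sketched as follows. This is only an illustration under stated assumptions: the coefficient vector `BETA`, the Uniform[0, 1) covariate support, the sample size, and the default lognormal variance are placeholders chosen for the example, not values confirmed by the text; the Poisson sampler is Knuth's multiplication method, which is adequate for the small rates used here.

```python
import math
import random


def draw_poisson(rate, rng):
    """Draw from Poisson(rate) with Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def draw_u(dist, rng, lognormal_var=3.0):
    """Draw a mean-one multiplicative effect u from the named distribution."""
    if dist == "exponential":
        return rng.expovariate(1.0)              # mean 1, SD 1
    if dist == "uniform":
        return rng.uniform(0.0, 2.0)             # mean 1, bounded support
    if dist == "lognormal":
        # With mean fixed at 1, Var(u) = exp(sigma^2) - 1.
        sigma2 = math.log(1.0 + lognormal_var)
        return rng.lognormvariate(-sigma2 / 2.0, math.sqrt(sigma2))
    raise ValueError(f"unknown distribution: {dist}")


def simulate(n_obs, beta, dist, rng):
    """Return (y, z): Poisson counts with rate exp(z_i beta) * u_i, and covariates."""
    ys, zs = [], []
    for _ in range(n_obs):
        z = [rng.random() for _ in beta[1:]]     # assumed Uniform[0, 1) covariates
        index = beta[0] + sum(b * x for b, x in zip(beta[1:], z))
        rate = math.exp(index) * draw_u(dist, rng)
        ys.append(draw_poisson(rate, rng))
        zs.append(z)
    return ys, zs


BETA = [-1.0, 0.5, 0.2]   # hypothetical coefficients, not the paper's values
rng = random.Random(1)
y, z = simulate(2000, BETA, "exponential", rng)
print("mean simulated count:", sum(y) / len(y))
```

Because each u distribution is normalized to mean one, the conditional mean of y given z is exp(z beta) in all three designs; only the shape of the heterogeneity differs.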
The motivation for these choices was to demonstrate the performance of our procedure for a diverse set of underlying distributions: the exponential distribution is within the class of models being estimated even if N = , the lognormal distribution cannot be fit perfectly with a finite N and has a thicker upper tail, and the uniform distribution is a more challenging distribution to reproduce with a series expansion. We estimated the model using N = , 2, 4, 6, and 8 terms.⁵⁸ The estimated coefficients $\hat{\beta}$ on the observed characteristics are fairly precise and show almost no bias. Table 6 presents some summary statistics on the estimates for simulations with N = 8 Laguerre polynomials.⁵⁹ The first column lists the true values for the coefficients on each simulated covariate. The next three columns list the mean and standard deviation (in parentheses) of the estimates across the simulated datasets for each simulated distribution. There are no notable differences across heterogeneity distributions in the consistency or precision of the estimated $\hat{\beta}$s.

⁵⁸ For these estimations we did not restrict $g_1$ to be $\alpha/\Gamma(\alpha+2)$ and instead ensured that the estimated distributions have mean 1 by rescaling the preliminary estimates by dividing by the mean.

⁵⁹ Summary statistics for estimates of $\hat{\beta}$ using N = , 2, 4, 6 are similar.
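The restrictions that make the series a mean-one density can be traced in a short worked derivation. It is a sketch that assumes the representation $f(x) = x^{\alpha}e^{-x}\sum_{j \ge 0} g_j L_j^{(\alpha)}(x)$ from Proposition 4 with the standard indexing $L_0^{(\alpha)}(x) = 1$; the monomial expansions of $x$ and $x^2$ are standard identities rather than statements taken from the paper.

```latex
% Orthogonality: \int_0^\infty x^\alpha e^{-x} L_n^{(\alpha)} L_m^{(\alpha)} dx
%              = \Gamma(n+\alpha+1)/n!  if m = n, and 0 otherwise.
\begin{align*}
1 &= L_0^{(\alpha)}(x)
  &&\Rightarrow\ \int_0^\infty f(x)\,dx = g_0\,\Gamma(\alpha+1) = 1
  \ \Rightarrow\ g_0 = \frac{1}{\Gamma(\alpha+1)}, \\
x &= (\alpha+1)L_0^{(\alpha)}(x) - L_1^{(\alpha)}(x)
  &&\Rightarrow\ E(u) = (\alpha+1) - g_1\,\Gamma(\alpha+2) = 1
  \ \Rightarrow\ g_1 = \frac{\alpha}{\Gamma(\alpha+2)}, \\
x^2 &= 2L_2^{(\alpha)}(x) - 2(\alpha+2)L_1^{(\alpha)}(x) + (\alpha+1)(\alpha+2)L_0^{(\alpha)}(x)
  &&\Rightarrow\ Var(u) = E(u^2) - 1 = g_2\,\Gamma(\alpha+3) + 1 - \alpha - \alpha^2 .
\end{align*}
```

As a check, the exponential case ($\alpha = 0$, $g_1 = g_2 = 0$) gives $g_0 = 1$, $E(u) = 1$, and $Var(u) = 1$.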

                            Mean and SD of estimated coefficients
Variable    True coeffs.    Exponential u    Lognormal u    Uniform u
Constant    -4.27           -4.269           -4.265         -4.2777
                            (.536)           (.57)          (.9)
z_1         .               .997             .9977          .9984
                            (.55)            (.593)         (.76)
z_2         .               .                .              .26
                            (.537)           (.424)         (.4)
z_3         .               .9995            .999           .9
                            (.37)            (.377)         (.269)
z_4         .               .994             .993           .998
                            (.27)            (.54)          (.9)
z_5         .               .997             .996           .
                            (.26)            (.27)          (.5)
z_6         .2              .996             .994           .23
                            (.84)            (.25)          (.32)

Notes: True and estimated coefficients from semi-parametric model estimation using simulated data, varying the distribution of underlying heterogeneity. Results displayed for the exponential (1) distribution, the lognormal (1, 3) distribution, and the uniform [0, 2] distribution with 2,5 simulated observations. Mean estimates across , simulated datasets shown; standard deviations in parentheses.

Table 6: Estimated coefficients on observed characteristics in simulations

Table 7 provides some statistics on how well the model was able to estimate the distribution of unobserved heterogeneity. The rows correspond to the distribution from which the $u$'s were drawn. The columns correspond to the number N of Laguerre polynomials used in the estimations. The metric used to measure performance is integrated squared error (ISE): if the estimated density function from simulation run $i$ is $\hat{f}_i(x)$, where the true data generation process has unobserved heterogeneity from distribution $f(x)$, the ISE of that estimated density is $\int \left(\hat{f}_i(x) - f(x)\right)^2 dx$. The values in Table 7 are median ISE across , simulation runs.

                          Median ISE for various models
True distribution of u    N =     N = 2    N = 4    N = 6    N = 8
Exponential               .       .45      .4       .2       .243
Lognormal                 .33     .5       .9       .48      .67
Uniform [0, 2]            .55     .449     .833     .795     .9

Notes: Median integrated squared error of estimated distributions from semi-parametric model estimation using simulated data, varying the distribution of underlying heterogeneity. Results displayed for the exponential (1) distribution, the lognormal (1, 3) distribution, and the uniform [0, 2] distribution with 2,5 simulated observations. Median ISE across , simulated datasets shown, varying the number of Laguerre polynomials.

Table 7: Goodness of fit of estimated distributions of unobserved heterogeneity in simulations: median MISE for various models and true distributions

The exponential model fits fairly well for all N. As one would expect, the N = fit is best: the true model is in the N = class and estimating additional unnecessary parameters just increases the scope for overfitting. The fit worsens gradually as N increases, but never becomes terrible; at N = 8, the worst fit, the median ISE is .24. To get a feel for the magnitudes, the MISE would be .2 if the density of an exponential distribution were over- or under-estimated by % at every value of u.
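To make the ISE metric concrete, the sketch below approximates the integral with the trapezoid rule for a mean-one exponential density that is overestimated by the same proportion at every point. The proportional error `SHIFT`, the integration range, and the step count are hypothetical choices for illustration and are not taken from the paper. For a proportional error c the integral has the closed form c²/2, which the numerical value should reproduce.

```python
import math


def integrated_squared_error(f_hat, f_true, lower, upper, n_steps=20000):
    """Approximate the integral of (f_hat - f_true)^2 over [lower, upper] (trapezoid rule)."""
    step = (upper - lower) / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        x = lower + i * step
        weight = 0.5 if i in (0, n_steps) else 1.0   # half weight at the endpoints
        total += weight * (f_hat(x) - f_true(x)) ** 2
    return total * step


def exponential_density(x):
    """Density of the exponential distribution with mean one."""
    return math.exp(-x)


SHIFT = 0.2  # hypothetical proportional error, chosen only for illustration


def inflated_density(x):
    """Exponential density overestimated by the same proportion at every x."""
    return (1.0 + SHIFT) * exponential_density(x)


# The upper limit 40 truncates a tail whose contribution is negligible.
ise = integrated_squared_error(inflated_density, exponential_density, 0.0, 40.0)
print("numerical ISE:", ise)
print("closed form c^2 / 2:", SHIFT ** 2 / 2)
```

In a simulation study, the same function would be applied to each estimated density and the median taken across runs.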
Note also that the exponential distribution with mean 1 is the gamma distribution involved in the Poisson-gamma justification for the negative binomial when $\alpha = 1$. Hence, the estimates of this model can provide a sense for how well our semiparametric model will estimate the distribution of underlying heterogeneity in a case where the negative binomial is correctly specified.

The lognormal distribution does not fit as well when N = . This should be expected: the lognormal is not a member of the parametric family we are estimating and indeed no matter what is estimated the ISE cannot possibly be below .7. Larger N make it theoretically possible to fit the distribution much better (the parameter vectors that give distributions closest to the true lognormal have ISEs of .756, .2, .4, and .2 for N = 2, 4, 6, and 8 respectively), but again there is the offsetting effect that there is more scope for overfitting. The tradeoff between the two effects results in fairly similar fits across the range of N. The median ISE is smallest for the N = 2 model.

The fits to the uniform distribution are much worse. Here, there is no parameter combination that produces a very good fit when N is small, and overfitting becomes a concern when N is large.⁶⁰ The best fit is obtained for N = 6, where the median ISE is 45% lower than the median ISE for the worst fit of N = 2.

Figure 5 provides a graphical illustration of the performance of our method. In each of the three panels we present the true distribution in bold and three estimated distributions corresponding to the simulations (using N = 4) that were at the 25th percentile, the 50th percentile, and the 75th percentile in the MISE measure of goodness of fit. In the exponential and log-normal cases the estimated distributions seem to fit reasonably well for values of $u$ around the mean ($u = 1$) and to fit quite well for higher values of $u$. The estimated distributions are farther from the truth at low values of $u$. This should be expected: once we are considering a population of schools in which all schools will in practice have zero or one high-scoring student per year, a single year's data will not allow one to say whether all schools are identical or whether there is heterogeneity. Also as expected, our method performs somewhat poorly for the uniform distribution with its bounded support. However, we are encouraged to note that, even for this difficult case, the estimated distribution does mostly spread out the mass over the correct [0, 2] interval.

⁶⁰ Theoretical lower bounds coming from the parameter vectors that make the estimated distributions as close as possible to the true distribution are ISEs of .877, .456, .397, .273, .269 for N = , 2, 4, 6, 8.

[Figure panel: Simulation Results: Uniform Distribution, N=4. Vertical axis: density f(u). Horizontal axis: multiplicative school effect u. Series: 25th percentile, Median, 75th percentile, True Distribution.]

Figure 5: Actual vs. Estimated Distributions: 25th, 50th, and 75th percentile fits in simulations