Notes O Media ad Quatile Regressio James L. Powell Departmet of Ecoomics Uiversity of Califoria, Berkeley Coditioal Media Restrictios ad Least Absolute Deviatios It is well-kow that the expected value of a radom variable Y miimizes the expected squared deviatio betwee Y ad a costat; that is, µ Y E[Y ] = argmie(y c) 2, c assumig E Y 2 is fiite. (I fact, it is oly ecessary to assume E Y is fiite, if the miimad is ormalized by subtractig Y 2, i.e., µ Y E[Y ] = argmie[(y c) 2 Y 2 ] c = argmi[c 2 2cE[Y ]], c ad this ormalizatio has o effect o the solutio if the stroger coditio holds.) It is less well-kow that a media of Y, defied as ay umber η Y for which Pr{Y η Y } 2 ad Pr{Y η Y } 2, miimizes a expected absolute deviatio criterio, η Y =argmi c E[ Y c Y ], though the solutio to this miimizatio problem eed ot be uique. Whe the c.d.f. F Y is strictly icreasig everywhere (i.e., Y is cotiuously distributed with positive desity), the uiqueess is ot a issue, ad η Y = F Y (/2). I this case, the first-order coditio for miimizatio of E[ Y c Y ] is for sg(u) the sig (or sigum ) fuctio here defied to be right-cotiuous. 0= E[sg(Y c)], sg(u) 2 {u <0},
Thus, just as least squares (LS) estimatio is the atural geeralizatio of the sample mea to estimatio of regressio coefficiets, least absolute deviatios (LAD) estimatio is the geeralizatio of the sample media to the liear regressio cotext. For the liear structural equatio y i = x 0 iβ 0 + ε i, if the error terms ε i are assumed to have (uique) coditioal media zero give the regressors x i,i.e. E[sg(ε i ) x i ] = 0, E[sg(ε i c) x i ] 6= 0 if c = c(x i ) 6= 0, the the true regressio coefficiets β 0 are idetified by β 0 =argmi β E[ y i x 0 iβ ε i ], ad, give a i.i.d. sample of size from this model, a atural sample aalogue to β 0 is ˆβ arg mi β arg mi β y i x 0 iβ S (β), a slightly-ostadard extremum estimator (because the miimad is ot twice cotiuously differetiable for all β. Cosistecy of LAD Demostratio of cosistecy of ˆβ is straightforward, because the LAD miimad S (β) is clearly cotiuous i β with probability oe; i fact, S (β) is covex i β, so cosistecy follows if S ca be show to coverge poitwise to a fuctio that is uiquely miimized at the true value β 0. (Typically we eed to show uiform covergece, but poitwise covergece of covex fuctios implies their uiform covergece o compact subsets.) To prove cosistecy, we eed to impose some coditios o the model; here are some coditios that will suffice: A. The data {(y i,x 0 i )0 } are idepedet ad idetically distributed across i; A2. The regressors have bouded secod momet, i.e., E[ x i 2 ] <. A3. The error terms ε i are cotiously distributed give x i, with coditioal desity f(ε x i ) satisfyig the coditioal media restrictio, i.e. Z 0 f(λ x i )dλ = 2. A4. The regressors ad error desity satisfy a local idetificatio coditio amely, the matrix is positive defiite. C E[f(0 x i )x i x 0 i] 2
Note that momets of y i or ε i eed ot exist uder these assumptios, which is why LAD estimatio is attractive for heavy-tailed error distributios. Coditio A4. combies a uique media assumptio (implied by positivity of the coditioal desity f(ε x i ) at ε =0) with the usual full-rak assumptio o the secod momets of the regressors. Imposig these coditios, the first step i the cosistecy proof is to calculate the probability limit of the miimad. To avoid assumig E[ y i ] <, it is coveiet to ormalize the miimad S (β) by subtractig off its value at the true parameter β 0, which clearly does ot affect the miimizig value ˆβ. That is, where ˆβ = argmis (β) S (β 0 ) β = argmi [ y i x 0 β iβ y i x 0 iβ 0 ] = argmi [ ε i x 0 β iδ ε i ], δ β β 0. But sice x i δ ε i x 0 iδ ε i x i δ by the triagle ad Cauchy-Schwarz iequalities, the ormalized miimad is a sample average of i.i.d. radom variables with fiite first (ad eve secod) momets uder coditio A2, so by Khitchie s Law of Large Numbers, S (β) S (β 0 ) p S(δ) E[S (β) S (β 0 )] = E[ ε i x 0 iδ ε i ] = E (ε i x 0 iδ)sg{ε i x 0 iδ} ε i sg{ε i } = E (ε i x 0 iδ)(sg{ε i x 0 iδ} sg{ε i }) " Z # 0 = E 2 [λ (x 0 iδ)]f(λ x i )dλ x 0 i δ, where the secod-to-last equality uses the fact that E[(x 0 i δ)sg{ε i}] =E[E[(x 0 i δ)sg{ε i} x i ]] = 0. (The itegral i the last equality is well-defied for both positive ad egative values of x 0 iδ, uder the stadard covetio R b a df = R a b df.) By ispectio, the limit S(δ) equals zero at δ = β β 0 =0, ad is o-egative otherwise (sice the sig of the itegrad is the same as the sig of the lower limit x 0 i δ). Furthermore, sice S (β) S (β 0 ) is covex for all, so is its probability limit S(β β 0 );thus,ifβ = β 0 is a uique local miimizer, it is also a global miimizer, implyig cosistecy of ˆβ. But by Leibitz rule, S(δ) δ S(0) δ = 2E[x i = 0, Z 0 x 0 i δ f(λ x i )dλ], 3
ad 2 S(δ) δ δ 0 = 2E[x i x 0 i f(x 0 iδ x i )], 2 S(0) δ δ 0 = 2E[x i x 0 i f(0 x i )] 2C, which is positive defiite by coditio A4. So δ =0=β β 0 is ideed a uique local (ad global) miimizer of S(δ) = S(β β 0 ), ad thus ˆβ p β 0. To geeralize this cosistecy result to oliear media regressio models y i = g(x i, β 0 )+ε i, 0 = E[sg{ε i } x i ], the regularity coditios would have to be stregtheed, sice covexity of the correspodig LAD miimad P i y i g(x i, β) i β is o loger assured. Stadard coditios would iclude the assumptio that the LAD criterio is miimized over a compact parameter space B (ad ot over all of R p ), ad a uiform Lipschitz cotiuity coditio o the media regressio fuctio g(x i, β) would typically be imposed, with the Lipschitz term assumed to have fiite momets. Fially, the idetificatio coditio A4 would have to be stregtheed to a global idetificatio coditio, such as: A4. For some τ > 0, the coditioal desity f(λ x i ) > τ if λ < τ, ad Pr{ g(x i, β) g(x i, β 0 ) τ} > 0 if β 6= β 0. Asymptotic Normality of LAD Returig to the liear LAD estimator, while demostratio of cosistecy of ˆβ ivolves routie applicatio of asymptotic argumets for extremum estimators, demostratio of -cosistecy ad asymptotic ormality is complicated by the factthattheladcriterio S(β) is ot cotiuously differetiable i β. For compariso, cosider the stadard theory for extremum estimators, where the estimator ˆθ is defied to miimize (or maximize) a twice-differetiable criterio, e.g., ˆθ =argmi θ Θ ρ(z i, θ), ad (for large ) to satisfy a first-order coditio for a iterior miimum, 0 = ρ(z i, ˆθ) θ ψ i (ˆθ), assumig cosistecy of ˆθ for a (iterior) parameter θ 0 has bee established. The true value θ 0 satisfies the correspodig populatio first-order coditio 0=E[ψ i (θ 0 )]; 4
derivatio of the asymptotic distributio of ˆθ is based upo a Taylor s series expasio of the sample first-order coditio for ˆθ aroud ˆθ = θ 0 : 0 ψ i (ˆθ) (*) ψ i (θ 0 )+ " hich is solved for ˆθ to yield the asymptotic liearity expressio ˆθ = θ 0 + H0 # ψ i (θ 0 ) θ 0 (ˆθ θ 0 )+o p ( ˆθ θ 0 ), µ ψ i (θ 0 )+o p, where ψi (θ 0 ) H 0 E θ 0 2 ρ(z i, θ 0 ) = E θ θ 0 is mius oe times the expected Hessia of the origial miimad. From the liearity expressio, it follows that (ˆθ θ0 ) d N (0,H0 V 0H0 ), where V 0 is the asymptotic covariace matrix of the sample average of ψ i (θ 0 ), i.e., ψ i (θ 0 ) d N (0,V 0 ), which is established by appeal to a suitable cetral limit theorem. I the LAD case (where θ β), the criterio fuctio ρ(z i, θ) = y i x 0 iβ is ot cotiuously differetiable at values of β for which y i subgradiet = x 0 iβ; furthermore, the (discotiuous) ρ(z i, θ) θ = sg{y i x 0 iβ}x i itself has a derivative that is idetically zero wherever it is defied. Thus the Taylor s expasio (*) is ot applicable to this problem, eve though a approximate first-order coditio sg{y i x 0 iˆβ}x i = o p µ ca be established for this problem. This coditio ca be show to hold by showig that each elemet of the subgradiet of the LAD criterio, whe evaluated at the miimizig value ˆβ, is bouded i magitude 5
by the differece betwee the right ad left derivatives of the criterio, so that sg{y i x 0 iˆβ}x {y i i = x 0 iˆβ}x i " # {y i = x 0 iˆβ} max i µ = K o p, where K dim{β} ad E[ x i 2 ] < by A2. Though the subgradiet for the LAD miimizatio is ot differetiable, its expected value ρ(zi, θ) E = E sg{y i x 0 θ iβ}x i = E[sg{ε i x 0 iδ}x i ] "Ã Z! # x 0 i δ = 2E f(λ x i )dλ x i is differetiable i δ = β β 0. The Taylor s series expasio would thus be applicable if the order of the expectatio (over y i ad x i ) ad differetiatio (over θ) could somehow be iterchaged. To do this rigorously, a stochastic equicotiuity coditio o the sample average momet fuctio Ψ (β) 0 sg{y i x 0 iβ}x i must be established; specifically, the stochastic equicotiuity coditio is that, for ay ˆβ p β, h Ψ (ˆβ) Ψ (β 0 ) E Ψ (β) Ψ (β 0 ) β=ˆβi p 0, or, writte alteratively, h Ψ (β) E[ Ψ (β)] β=ˆβ Ψ (β 0 ) E[ Ψ (β 0 )] i p 0. (**) x i Ituitively, while we would expect the ormalized differece Ψ (β) E[ Ψ (β)] to have a limitig ormal distributio for each fixed value of β by a cetral limit theorem, the stochastic equicotiuity coditio specifies that the ormalized differece, evaluated at the cosistet estimator ˆβ, is asymptotically equivalet to its value evaluated at β 0 = plim ˆβ. Such a coditio ca be established for this LAD (ad related quatile regressio) problems usig empirical process theory; oce it has bee established, it ca be used to derive the asymptotic ormal distributio of ˆβ. Isertig the previous results that Ψ (ˆβ) µ sg{y i x 0 iˆβ}x i = o p ad E[ Ψ (β 0 )] E sg{y i x 0 iβ 0 }x i = E[sg{ε i }x i ] = 0 6
ito (**), it follows that h Ψ (β 0 ) E[ Ψ (β)] β=ˆβi p 0, ad a mea-value expasio of E[ Ψ (β)] β=ˆβ aroud ˆβ = β 0 yields ³ˆβ β0 = H0 Ψ (β 0 )+o p () = H0 sg{ε i }x i + o p (), where ow H 0 E[ Ψ (β)] β 0 β=β0 = E [sg{y i x 0 i β}x i] β 0 = 2E[f(0 x i )x i x 0 i] 2C, assumed positive defiite i A4 above. Applicatio of the Lideberg-Levy cetral limit theorem to Ψ (β 0 ) yields the asymptotic distributio of the LAD estimator ˆβ as β=β0 (ˆβ β0 ) d N (0, 4 C DC ), for ³ sg{yi D = E x 0 iβ}x i sg{yi x 0 0 iβ}x i ³ sg{yi = E x 0 iβ} 2 xi x 0 i = E[x i x 0 i]. I the special case where ε i is idepedet of x i, so the coditioal desity f(0 x i ) equals the margial desity f(0), the asymptotic distributio simplifies to (ˆβ β0 ) d N (0, [2f(0)] 2 D ). Alteratively, for the oliear media regressio model y i = g(x i, β 0 )+ε i, 0 = E[sg{ε i } x i ], the relevat matrices C ad D would be defied as C g(xi, β E 0 ) β D E g(x i, β 0 ) β 0, f(0 x i ) g(x i, β 0 ) β which reduce to the previous defiitios whe g(x i, β) x 0 i β. g(x i, β 0 ) β 0, 7
Asymptotic Covariace Matrix Estimatio To use the asymptotic ormality of ˆβ to do the usual large-sample iferece o β 0, cosistet estimators of the matrices C E[f(0 x i )x i x 0 i ] ad D E[x ix 0 i ] must be costructed. The latter is easy; clearly ˆD p D. x i x 0 i Cosistetly estimatig the matrix C is trickier, due to the presece of the ukow coditioal desity fuctio f(0 x i ); while the error desity might be parametrized, ad its (fiite-dimesioal) parameter vector cosistetly estimated usig stadard methods, this would ru couter to the spirit of the LAD theory so far, which does ot rely upo a parametric form for the error terms. A alterative, oparametric estimatio strategy ca be based upo kerel estimatio methods for desity fuctios. A specific form for a estimator is Ĉ h i h { y i x 0 iˆβ h/2} x i x 0 i, where h = h is a user-chose badwidth term that is assumed to ted to zero as the sample size icreases. The term h { u h/2} (which is evaluated at u = y i x 0 iˆβ) is essetially a umerical derivative of the fuctio sg{u}, based upo the small perturbatio h. It ca be show that Ĉ p C as, provided that h = h 0 i such a way that h, usig the sorts of mea ad variace calculatios used to demostrate cosistecy of the stadard kerel desity estimator. A geeralizatio of this estimator would be " Ã!# Ĉ h y i x 0 K iˆβ x i x 0 h i, where the kerel fuctio K( ) satisfies Z K(u)du = (for example, K(u) could be a desity fuctio for a cotiuous radom variable). The estimator Ĉ is a special case with K(u) ={ u /2}, the desity for a uiform radom variable o the iterval [ /2, /2]. Ad both the estimators for C ad D ca be exteded to the oliear media regressio model by replacig the terms x i x 0 i with the more geeral [ g(x i, ˆβ)/ β] h ˆβ)/ βi 0 g(x i,. Quatile Regressio 8