Notes On Median and Quantile Regression. James L. Powell Department of Economics University of California, Berkeley

Similar documents
Statistical Inference Based on Extremum Estimators

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator

11 THE GMM ESTIMATION

ECONOMETRIC THEORY. MODULE XIII Lecture - 34 Asymptotic Theory and Stochastic Regressors

Lecture 19: Convergence

An Introduction to Asymptotic Theory

Notes On Nonparametric Density Estimation. James L. Powell Department of Economics University of California, Berkeley

Linear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d

Sequences and Series of Functions

Lecture 33: Bootstrap

Solution to Chapter 2 Analytical Exercises

Slide Set 13 Linear Model with Endogenous Regressors and the GMM estimator

Topic 9: Sampling Distributions of Estimators

LECTURE 8: ASYMPTOTICS I

Asymptotic Results for the Linear Regression Model

Machine Learning Brett Bernstein

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n.

Efficient GMM LECTURE 12 GMM II

Kernel density estimator

Topic 9: Sampling Distributions of Estimators

REAL ANALYSIS II: PROBLEM SET 1 - SOLUTIONS

7.1 Convergence of sequences of random variables

2.2. Central limit theorem.

Exponential Families and Bayesian Inference

Topic 9: Sampling Distributions of Estimators

Let us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f.

of the matrix is =-85, so it is not positive definite. Thus, the first

Regression with an Evaporating Logarithmic Trend

Probability and Statistics

MA Advanced Econometrics: Properties of Least Squares Estimators

Study the bias (due to the nite dimensional approximation) and variance of the estimators

Rates of Convergence by Moduli of Continuity

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss

Quantile regression with multilayer perceptrons.

The variance of a sum of independent variables is the sum of their variances, since covariances are zero. Therefore. V (xi )= n n 2 σ2 = σ2.

Distribution of Random Samples & Limit theorems

NANYANG TECHNOLOGICAL UNIVERSITY SYLLABUS FOR ENTRANCE EXAMINATION FOR INTERNATIONAL STUDENTS AO-LEVEL MATHEMATICS

Random Variables, Sampling and Estimation

Discrete Mathematics for CS Spring 2008 David Wagner Note 22

Point Estimation: properties of estimators 1 FINITE-SAMPLE PROPERTIES. finite-sample properties (CB 7.3) large-sample properties (CB 10.

Fall 2013 MTH431/531 Real analysis Section Notes

Lecture 7: Properties of Random Samples

4. Partial Sums and the Central Limit Theorem

1 Covariance Estimation

Properties and Hypothesis Testing

32 estimating the cumulative distribution function

Lecture 20: Multivariate convergence and the Central Limit Theorem

Bull. Korean Math. Soc. 36 (1999), No. 3, pp. 451{457 THE STRONG CONSISTENCY OF NONLINEAR REGRESSION QUANTILES ESTIMATORS Seung Hoe Choi and Hae Kyung

Convergence of random variables. (telegram style notes) P.J.C. Spreij

Rank tests and regression rank scores tests in measurement error models

Large Sample Theory. Convergence. Central Limit Theorems Asymptotic Distribution Delta Method. Convergence in Probability Convergence in Distribution

Notes 19 : Martingale CLT

NYU Center for Data Science: DS-GA 1003 Machine Learning and Computational Statistics (Spring 2018)

SEMIPARAMETRIC SINGLE-INDEX MODELS. Joel L. Horowitz Department of Economics Northwestern University

17. Joint distributions of extreme order statistics Lehmann 5.1; Ferguson 15

The central limit theorem for Student s distribution. Problem Karim M. Abadir and Jan R. Magnus. Econometric Theory, 19, 1195 (2003)

Chapter 3. Strong convergence. 3.1 Definition of almost sure convergence

Lecture 15: Density estimation

1 = δ2 (0, ), Y Y n nδ. , T n = Y Y n n. ( U n,k + X ) ( f U n,k + Y ) n 2n f U n,k + θ Y ) 2 E X1 2 X1

LECTURE 14 NOTES. A sequence of α-level tests {ϕ n (x)} is consistent if

Unbiased Estimation. February 7-12, 2008

Learning Theory: Lecture Notes

Statistical and Mathematical Methods DS-GA 1002 December 8, Sample Final Problems Solutions

We are mainly going to be concerned with power series in x, such as. (x)} converges - that is, lims N n

This section is optional.

Math 61CM - Solutions to homework 3

Self-normalized deviation inequalities with application to t-statistic

Econ 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student s t distribution 1.

EFFECTIVE WLLN, SLLN, AND CLT IN STATISTICAL MODELS

Berry-Esseen bounds for self-normalized martingales

Machine Learning Brett Bernstein

Mathematical Statistics - MS

Asymptotics. Hypothesis Testing UMP. Asymptotic Tests and p-values

Empirical Processes: Glivenko Cantelli Theorems

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 2 9/9/2013. Large Deviations for i.i.d. Random Variables

Random Matrices with Blocks of Intermediate Scale Strongly Correlated Band Matrices

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.

The Growth of Functions. Theoretical Supplement

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS

First Year Quantitative Comp Exam Spring, Part I - 203A. f X (x) = 0 otherwise

An Introduction to Randomized Algorithms

Algebra of Least Squares

Advanced Stochastic Processes.

MAS111 Convergence and Continuity

Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara

Statistical Properties of OLS estimators

LAD ASYMPTOTICS UNDER CONDITIONAL HETEROSKEDASTICITY WITH POSSIBLY INFINITE ERROR DENSITIES. Jin Seo Cho, Chirok Han, and Peter C. B.

January 25, 2017 INTRODUCTION TO MATHEMATICAL STATISTICS

McGill University Math 354: Honors Analysis 3 Fall 2012 Solutions to selected problems

ECE 8527: Introduction to Machine Learning and Pattern Recognition Midterm # 1. Vaishali Amin Fall, 2015

7.1 Convergence of sequences of random variables

Lecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting

The standard deviation of the mean

Probability 2 - Notes 10. Lemma. If X is a random variable and g(x) 0 for all x in the support of f X, then P(g(X) 1) E[g(X)].

Output Analysis and Run-Length Control

Singular Continuous Measures by Michael Pejic 5/14/10

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1

STA Object Data Analysis - A List of Projects. January 18, 2018

Math Solutions to homework 6

This exam contains 19 pages (including this cover page) and 10 questions. A Formulae sheet is provided with the exam.

Transcription:

Notes O Media ad Quatile Regressio James L. Powell Departmet of Ecoomics Uiversity of Califoria, Berkeley Coditioal Media Restrictios ad Least Absolute Deviatios It is well-kow that the expected value of a radom variable Y miimizes the expected squared deviatio betwee Y ad a costat; that is, µ Y E[Y ] = argmie(y c) 2, c assumig E Y 2 is fiite. (I fact, it is oly ecessary to assume E Y is fiite, if the miimad is ormalized by subtractig Y 2, i.e., µ Y E[Y ] = argmie[(y c) 2 Y 2 ] c = argmi[c 2 2cE[Y ]], c ad this ormalizatio has o effect o the solutio if the stroger coditio holds.) It is less well-kow that a media of Y, defied as ay umber η Y for which Pr{Y η Y } 2 ad Pr{Y η Y } 2, miimizes a expected absolute deviatio criterio, η Y =argmi c E[ Y c Y ], though the solutio to this miimizatio problem eed ot be uique. Whe the c.d.f. F Y is strictly icreasig everywhere (i.e., Y is cotiuously distributed with positive desity), the uiqueess is ot a issue, ad η Y = F Y (/2). I this case, the first-order coditio for miimizatio of E[ Y c Y ] is for sg(u) the sig (or sigum ) fuctio here defied to be right-cotiuous. 0= E[sg(Y c)], sg(u) 2 {u <0},

Thus, just as least squares (LS) estimatio is the atural geeralizatio of the sample mea to estimatio of regressio coefficiets, least absolute deviatios (LAD) estimatio is the geeralizatio of the sample media to the liear regressio cotext. For the liear structural equatio y i = x 0 iβ 0 + ε i, if the error terms ε i are assumed to have (uique) coditioal media zero give the regressors x i,i.e. E[sg(ε i ) x i ] = 0, E[sg(ε i c) x i ] 6= 0 if c = c(x i ) 6= 0, the the true regressio coefficiets β 0 are idetified by β 0 =argmi β E[ y i x 0 iβ ε i ], ad, give a i.i.d. sample of size from this model, a atural sample aalogue to β 0 is ˆβ arg mi β arg mi β y i x 0 iβ S (β), a slightly-ostadard extremum estimator (because the miimad is ot twice cotiuously differetiable for all β. Cosistecy of LAD Demostratio of cosistecy of ˆβ is straightforward, because the LAD miimad S (β) is clearly cotiuous i β with probability oe; i fact, S (β) is covex i β, so cosistecy follows if S ca be show to coverge poitwise to a fuctio that is uiquely miimized at the true value β 0. (Typically we eed to show uiform covergece, but poitwise covergece of covex fuctios implies their uiform covergece o compact subsets.) To prove cosistecy, we eed to impose some coditios o the model; here are some coditios that will suffice: A. The data {(y i,x 0 i )0 } are idepedet ad idetically distributed across i; A2. The regressors have bouded secod momet, i.e., E[ x i 2 ] <. A3. The error terms ε i are cotiously distributed give x i, with coditioal desity f(ε x i ) satisfyig the coditioal media restrictio, i.e. Z 0 f(λ x i )dλ = 2. A4. The regressors ad error desity satisfy a local idetificatio coditio amely, the matrix is positive defiite. C E[f(0 x i )x i x 0 i] 2

Note that momets of y i or ε i eed ot exist uder these assumptios, which is why LAD estimatio is attractive for heavy-tailed error distributios. Coditio A4. combies a uique media assumptio (implied by positivity of the coditioal desity f(ε x i ) at ε =0) with the usual full-rak assumptio o the secod momets of the regressors. Imposig these coditios, the first step i the cosistecy proof is to calculate the probability limit of the miimad. To avoid assumig E[ y i ] <, it is coveiet to ormalize the miimad S (β) by subtractig off its value at the true parameter β 0, which clearly does ot affect the miimizig value ˆβ. That is, where ˆβ = argmis (β) S (β 0 ) β = argmi [ y i x 0 β iβ y i x 0 iβ 0 ] = argmi [ ε i x 0 β iδ ε i ], δ β β 0. But sice x i δ ε i x 0 iδ ε i x i δ by the triagle ad Cauchy-Schwarz iequalities, the ormalized miimad is a sample average of i.i.d. radom variables with fiite first (ad eve secod) momets uder coditio A2, so by Khitchie s Law of Large Numbers, S (β) S (β 0 ) p S(δ) E[S (β) S (β 0 )] = E[ ε i x 0 iδ ε i ] = E (ε i x 0 iδ)sg{ε i x 0 iδ} ε i sg{ε i } = E (ε i x 0 iδ)(sg{ε i x 0 iδ} sg{ε i }) " Z # 0 = E 2 [λ (x 0 iδ)]f(λ x i )dλ x 0 i δ, where the secod-to-last equality uses the fact that E[(x 0 i δ)sg{ε i}] =E[E[(x 0 i δ)sg{ε i} x i ]] = 0. (The itegral i the last equality is well-defied for both positive ad egative values of x 0 iδ, uder the stadard covetio R b a df = R a b df.) By ispectio, the limit S(δ) equals zero at δ = β β 0 =0, ad is o-egative otherwise (sice the sig of the itegrad is the same as the sig of the lower limit x 0 i δ). Furthermore, sice S (β) S (β 0 ) is covex for all, so is its probability limit S(β β 0 );thus,ifβ = β 0 is a uique local miimizer, it is also a global miimizer, implyig cosistecy of ˆβ. But by Leibitz rule, S(δ) δ S(0) δ = 2E[x i = 0, Z 0 x 0 i δ f(λ x i )dλ], 3

ad 2 S(δ) δ δ 0 = 2E[x i x 0 i f(x 0 iδ x i )], 2 S(0) δ δ 0 = 2E[x i x 0 i f(0 x i )] 2C, which is positive defiite by coditio A4. So δ =0=β β 0 is ideed a uique local (ad global) miimizer of S(δ) = S(β β 0 ), ad thus ˆβ p β 0. To geeralize this cosistecy result to oliear media regressio models y i = g(x i, β 0 )+ε i, 0 = E[sg{ε i } x i ], the regularity coditios would have to be stregtheed, sice covexity of the correspodig LAD miimad P i y i g(x i, β) i β is o loger assured. Stadard coditios would iclude the assumptio that the LAD criterio is miimized over a compact parameter space B (ad ot over all of R p ), ad a uiform Lipschitz cotiuity coditio o the media regressio fuctio g(x i, β) would typically be imposed, with the Lipschitz term assumed to have fiite momets. Fially, the idetificatio coditio A4 would have to be stregtheed to a global idetificatio coditio, such as: A4. For some τ > 0, the coditioal desity f(λ x i ) > τ if λ < τ, ad Pr{ g(x i, β) g(x i, β 0 ) τ} > 0 if β 6= β 0. Asymptotic Normality of LAD Returig to the liear LAD estimator, while demostratio of cosistecy of ˆβ ivolves routie applicatio of asymptotic argumets for extremum estimators, demostratio of -cosistecy ad asymptotic ormality is complicated by the factthattheladcriterio S(β) is ot cotiuously differetiable i β. For compariso, cosider the stadard theory for extremum estimators, where the estimator ˆθ is defied to miimize (or maximize) a twice-differetiable criterio, e.g., ˆθ =argmi θ Θ ρ(z i, θ), ad (for large ) to satisfy a first-order coditio for a iterior miimum, 0 = ρ(z i, ˆθ) θ ψ i (ˆθ), assumig cosistecy of ˆθ for a (iterior) parameter θ 0 has bee established. The true value θ 0 satisfies the correspodig populatio first-order coditio 0=E[ψ i (θ 0 )]; 4

derivatio of the asymptotic distributio of ˆθ is based upo a Taylor s series expasio of the sample first-order coditio for ˆθ aroud ˆθ = θ 0 : 0 ψ i (ˆθ) (*) ψ i (θ 0 )+ " hich is solved for ˆθ to yield the asymptotic liearity expressio ˆθ = θ 0 + H0 # ψ i (θ 0 ) θ 0 (ˆθ θ 0 )+o p ( ˆθ θ 0 ), µ ψ i (θ 0 )+o p, where ψi (θ 0 ) H 0 E θ 0 2 ρ(z i, θ 0 ) = E θ θ 0 is mius oe times the expected Hessia of the origial miimad. From the liearity expressio, it follows that (ˆθ θ0 ) d N (0,H0 V 0H0 ), where V 0 is the asymptotic covariace matrix of the sample average of ψ i (θ 0 ), i.e., ψ i (θ 0 ) d N (0,V 0 ), which is established by appeal to a suitable cetral limit theorem. I the LAD case (where θ β), the criterio fuctio ρ(z i, θ) = y i x 0 iβ is ot cotiuously differetiable at values of β for which y i subgradiet = x 0 iβ; furthermore, the (discotiuous) ρ(z i, θ) θ = sg{y i x 0 iβ}x i itself has a derivative that is idetically zero wherever it is defied. Thus the Taylor s expasio (*) is ot applicable to this problem, eve though a approximate first-order coditio sg{y i x 0 iˆβ}x i = o p µ ca be established for this problem. This coditio ca be show to hold by showig that each elemet of the subgradiet of the LAD criterio, whe evaluated at the miimizig value ˆβ, is bouded i magitude 5

by the differece betwee the right ad left derivatives of the criterio, so that sg{y i x 0 iˆβ}x {y i i = x 0 iˆβ}x i " # {y i = x 0 iˆβ} max i µ = K o p, where K dim{β} ad E[ x i 2 ] < by A2. Though the subgradiet for the LAD miimizatio is ot differetiable, its expected value ρ(zi, θ) E = E sg{y i x 0 θ iβ}x i = E[sg{ε i x 0 iδ}x i ] "Ã Z! # x 0 i δ = 2E f(λ x i )dλ x i is differetiable i δ = β β 0. The Taylor s series expasio would thus be applicable if the order of the expectatio (over y i ad x i ) ad differetiatio (over θ) could somehow be iterchaged. To do this rigorously, a stochastic equicotiuity coditio o the sample average momet fuctio Ψ (β) 0 sg{y i x 0 iβ}x i must be established; specifically, the stochastic equicotiuity coditio is that, for ay ˆβ p β, h Ψ (ˆβ) Ψ (β 0 ) E Ψ (β) Ψ (β 0 ) β=ˆβi p 0, or, writte alteratively, h Ψ (β) E[ Ψ (β)] β=ˆβ Ψ (β 0 ) E[ Ψ (β 0 )] i p 0. (**) x i Ituitively, while we would expect the ormalized differece Ψ (β) E[ Ψ (β)] to have a limitig ormal distributio for each fixed value of β by a cetral limit theorem, the stochastic equicotiuity coditio specifies that the ormalized differece, evaluated at the cosistet estimator ˆβ, is asymptotically equivalet to its value evaluated at β 0 = plim ˆβ. Such a coditio ca be established for this LAD (ad related quatile regressio) problems usig empirical process theory; oce it has bee established, it ca be used to derive the asymptotic ormal distributio of ˆβ. Isertig the previous results that Ψ (ˆβ) µ sg{y i x 0 iˆβ}x i = o p ad E[ Ψ (β 0 )] E sg{y i x 0 iβ 0 }x i = E[sg{ε i }x i ] = 0 6

ito (**), it follows that h Ψ (β 0 ) E[ Ψ (β)] β=ˆβi p 0, ad a mea-value expasio of E[ Ψ (β)] β=ˆβ aroud ˆβ = β 0 yields ³ˆβ β0 = H0 Ψ (β 0 )+o p () = H0 sg{ε i }x i + o p (), where ow H 0 E[ Ψ (β)] β 0 β=β0 = E [sg{y i x 0 i β}x i] β 0 = 2E[f(0 x i )x i x 0 i] 2C, assumed positive defiite i A4 above. Applicatio of the Lideberg-Levy cetral limit theorem to Ψ (β 0 ) yields the asymptotic distributio of the LAD estimator ˆβ as β=β0 (ˆβ β0 ) d N (0, 4 C DC ), for ³ sg{yi D = E x 0 iβ}x i sg{yi x 0 0 iβ}x i ³ sg{yi = E x 0 iβ} 2 xi x 0 i = E[x i x 0 i]. I the special case where ε i is idepedet of x i, so the coditioal desity f(0 x i ) equals the margial desity f(0), the asymptotic distributio simplifies to (ˆβ β0 ) d N (0, [2f(0)] 2 D ). Alteratively, for the oliear media regressio model y i = g(x i, β 0 )+ε i, 0 = E[sg{ε i } x i ], the relevat matrices C ad D would be defied as C g(xi, β E 0 ) β D E g(x i, β 0 ) β 0, f(0 x i ) g(x i, β 0 ) β which reduce to the previous defiitios whe g(x i, β) x 0 i β. g(x i, β 0 ) β 0, 7

Asymptotic Covariace Matrix Estimatio To use the asymptotic ormality of ˆβ to do the usual large-sample iferece o β 0, cosistet estimators of the matrices C E[f(0 x i )x i x 0 i ] ad D E[x ix 0 i ] must be costructed. The latter is easy; clearly ˆD p D. x i x 0 i Cosistetly estimatig the matrix C is trickier, due to the presece of the ukow coditioal desity fuctio f(0 x i ); while the error desity might be parametrized, ad its (fiite-dimesioal) parameter vector cosistetly estimated usig stadard methods, this would ru couter to the spirit of the LAD theory so far, which does ot rely upo a parametric form for the error terms. A alterative, oparametric estimatio strategy ca be based upo kerel estimatio methods for desity fuctios. A specific form for a estimator is Ĉ h i h { y i x 0 iˆβ h/2} x i x 0 i, where h = h is a user-chose badwidth term that is assumed to ted to zero as the sample size icreases. The term h { u h/2} (which is evaluated at u = y i x 0 iˆβ) is essetially a umerical derivative of the fuctio sg{u}, based upo the small perturbatio h. It ca be show that Ĉ p C as, provided that h = h 0 i such a way that h, usig the sorts of mea ad variace calculatios used to demostrate cosistecy of the stadard kerel desity estimator. A geeralizatio of this estimator would be " Ã!# Ĉ h y i x 0 K iˆβ x i x 0 h i, where the kerel fuctio K( ) satisfies Z K(u)du = (for example, K(u) could be a desity fuctio for a cotiuous radom variable). The estimator Ĉ is a special case with K(u) ={ u /2}, the desity for a uiform radom variable o the iterval [ /2, /2]. Ad both the estimators for C ad D ca be exteded to the oliear media regressio model by replacig the terms x i x 0 i with the more geeral [ g(x i, ˆβ)/ β] h ˆβ)/ βi 0 g(x i,. Quatile Regressio 8