Quantile methods. Class Notes Manuel Arellano December 1, Let F (r) =Pr(Y r). Forτ (0, 1), theτth population quantile of Y is defined to be

Similar documents
What s New in Econometrics? Lecture 14 Quantile Methods

New Developments in Econometrics Lecture 16: Quantile Estimation

UNIVERSITY OF CALIFORNIA Spring Economics 241A Econometrics

Chapter 8. Quantile Regression and Quantile Treatment Effects

Quantile Processes for Semi and Nonparametric Regression

Flexible Estimation of Treatment Effect Parameters

Binary Models with Endogenous Explanatory Variables

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction

Working Paper No Maximum score type estimators

Lecture 2: Consistency of M-estimators

Econ 273B Advanced Econometrics Spring

Syllabus. By Joan Llull. Microeconometrics. IDEA PhD Program. Fall Chapter 1: Introduction and a Brief Review of Relevant Tools

Generalized Method of Moments Estimation

Truncated Regression Model and Nonparametric Estimation for Gifted and Talented Education Program

Econometric Analysis of Cross Section and Panel Data

Unconditional Quantile Regression with Endogenous Regressors

Discrete Dependent Variable Models

Tobit and Interval Censored Regression Model

A Note on Implementing Box-Cox Quantile Regression

Asymmetric least squares estimation and testing

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods. Jeff Wooldridge IRP Lectures, UW Madison, August 2008

Comments on: Panel Data Analysis Advantages and Challenges. Manuel Arellano CEMFI, Madrid November 2006

CALCULATION METHOD FOR NONLINEAR DYNAMIC LEAST-ABSOLUTE DEVIATIONS ESTIMATOR

Generated Covariates in Nonparametric Estimation: A Short Review.

IDENTIFICATION OF THE BINARY CHOICE MODEL WITH MISCLASSIFICATION

A Local Generalized Method of Moments Estimator

The relationship between treatment parameters within a latent variable framework

Density estimation Nonparametric conditional mean estimation Semiparametric conditional mean estimation. Nonparametrics. Gabriel Montes-Rojas

3 Joint Distributions 71

Identification and Estimation of Marginal Effects in Nonlinear Panel Models 1

Closest Moment Estimation under General Conditions

Lecture 9: Quantile Methods 2

The properties of L p -GMM estimators

Microeconometrics. C. Hsiao (2014), Analysis of Panel Data, 3rd edition. Cambridge, University Press.

Analogy Principle. Asymptotic Theory Part II. James J. Heckman University of Chicago. Econ 312 This draft, April 5, 2006

Problem Set 3: Bootstrap, Quantile Regression and MCMC Methods. MIT , Fall Due: Wednesday, 07 November 2007, 5:00 PM

Ultra High Dimensional Variable Selection with Endogenous Variables

Quantile and Average Effects in Nonseparable Panel Models 1

Tobit and Selection Models

Informational Content in Static and Dynamic Discrete Response Panel Data Models

A Simple Way to Calculate Confidence Intervals for Partially Identified Parameters. By Tiemen Woutersen. Draft, September

Quantile regression and heteroskedasticity

Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions

On IV estimation of the dynamic binary panel data model with fixed effects

Conditions for the Existence of Control Functions in Nonseparable Simultaneous Equations Models 1

Statistics: Learning models from data

Locally Robust Semiparametric Estimation

Multiscale Adaptive Inference on Conditional Moment Inequalities

APEC 8212: Econometric Analysis II

GMM and SMM. 1. Hansen, L Large Sample Properties of Generalized Method of Moments Estimators, Econometrica, 50, p

The Numerical Delta Method and Bootstrap

IV Quantile Regression for Group-level Treatments, with an Application to the Distributional Effects of Trade

What s New in Econometrics. Lecture 13

Closest Moment Estimation under General Conditions

Sources of Inequality: Additive Decomposition of the Gini Coefficient.

Course Description. Course Requirements

ESTIMATION OF NONPARAMETRIC MODELS WITH SIMULTANEITY

Issues on quantile autoregression

A Resampling Method on Pivotal Estimating Functions

ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Spring 2013 Instructor: Victor Aguirregabiria

Expecting the Unexpected: Uniform Quantile Regression Bands with an application to Investor Sentiments

Discussion Paper Series

Bayesian Interpretations of Heteroskedastic Consistent Covariance Estimators Using the Informed Bayesian Bootstrap

1. GENERAL DESCRIPTION

Quantile Regression under Misspecification, with an Application to the U.S. Wage Structure

Identification in Nonparametric Limited Dependent Variable Models with Simultaneity and Unobserved Heterogeneity

A Bootstrap Test for Conditional Symmetry

Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions

Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix

An Encompassing Test for Non-Nested Quantile Regression Models

Identification and Estimation of Partially Linear Censored Regression Models with Unknown Heteroscedasticity

The Influence Function of Semiparametric Estimators

Trends in the Relative Distribution of Wages by Gender and Cohorts in Brazil ( )

Intermediate Econometrics

Robust covariance estimation for quantile regression

QUANTILE REGRESSION WITH CENSORING AND ENDOGENEITY

Estimation of Treatment Effects under Essential Heterogeneity

Unconditional Quantile Regressions

On the L p -quantiles and the Student t distribution

Department of Economics. Working Papers

Simple Estimators for Monotone Index Models

Recovering Copulae from Conditional Quantiles

Generalized quantiles as risk measures

Statistical Properties of Numerical Derivatives

A Goodness-of-fit Test for Copulas

Introduction to Statistical Analysis

Partial Identification of Nonseparable Models using Binary Instruments

Causal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies

ON ILL-POSEDNESS OF NONPARAMETRIC INSTRUMENTAL VARIABLE REGRESSION WITH CONVEXITY CONSTRAINTS

11. Bootstrap Methods

Nonparametric Identication of a Binary Random Factor in Cross Section Data and

Birkbeck Working Papers in Economics & Finance

Nonparametric Methods

ECO Class 6 Nonparametric Econometrics

Simple Estimators for Semiparametric Multinomial Choice Models

ECO 2403 TOPICS IN ECONOMETRICS

Improving linear quantile regression for

ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Winter 2014 Instructor: Victor Aguirregabiria

Bayesian Interpretations of Heteroskedastic Consistent Covariance. Estimators Using the Informed Bayesian Bootstrap

Transcription:

Quantile methods Class Notes Manuel Arellano December 1, 2009 1 Unconditional quantiles Let F (r) =Pr(Y r). Forτ (0, 1), theτth population quantile of Y is defined to be Q τ (Y ) q τ F 1 (τ) =inf{r : F (r) τ}. F 1 (τ) is a generalized inverse function. support of F and hence often unbounded. It is a left-continuous function with range equal to the A simple example Suppose that Y is discrete with pmf Pr (Y = s) =0.2 for s {1, 2, 3, 4, 5}. For τ =0.25, 0.5, 0.75, wehave {r : F (r) 0.25} = {r : r 2} q 0.25 = F 1 (0.25) = 2 {r : F (r) 0.50} = {r : r 3} q 0.5 = F 1 (0.50) = 3 {r : F (r) 0.75} = {r : r 4} q 0.75 = F 1 (0.75) = 4. Asymmetric absolute loss function). For τ (0, 1) Let us define the check function (or asymmetric absolute loss ρ τ (u) =[τ1 (u 0) + (1 τ) 1 (u <0)] u =[τ 1 (u <0)] u. Note that ρ τ (u) is a continuous piecewise linear function, but nondifferentiable at u =0. We should think of u as an individual error u = y r and ρ τ (u) as the loss associated with u. 1 Using ρ τ (u) as a specification of loss, it is well known that q τ minimizes expected loss: s 0 (r) E [ρ τ (Y r)] = τ Z r (y r) df (y) (1 τ) Z r (y r) df (y). Any element of {r : F (r) =τ} minimizes expected loss. If the solution is unique, it coincides with q τ as definedabove. Ifnot,wehaveanintervalofτth quantiles and the smallest element is chosen so that the quantile function is left-continuous (by convention). In decision theory the situation is as follows: we need a predictor or point estimate for a random variable with posterior cdf F. It turns out that the τth quantile is the optimal predictor that minimizes expected loss when loss is described by the τth check function. 1 An alternative shorter notation is ρ τ (u) =τu + +(1 τ) u where u + = 1 (u 0) u and u = 1 (u <0) u. 1

Equivariance of quantiles under monotone transformations This is an interesting property of quantiles not shared by expectations. Let g (.) be a nondecreasing function. Then, for any random variable Y Q τ [g (Y )] = g [Q τ (Y )]. Thus, the quantiles of g (Y ) coincide with the transformed quantiles of Y. To see this note that Pr [Y Q τ (Y )] = τ Pr (g (Y ) g [Q τ (Y )]) = τ. Sample quantiles by the empirical cdf: Given a random sample {Y 1,...,Y N } we obtain sample quantiles replacing F F N (r) = 1 N 1(Y i r). That is, we choose bq τ = FN 1 (τ) inf {r : F N (r) τ}, which minimizes Z s N (r) = ρ τ (y r) df N (y) = 1 N ρ τ (Y i r). (1) An important advantage of expressing the calculation of sample quantiles as an optimization problem, as opposed to a problem of ordering the observations, is computational (specially in the regression context). The optimization perspective is also useful for studying statistical properties. Linear program representation An alternative presentation of the minimization of (1) is min r,u + i,u i subject to 2 τu + i +(1 τ) u i Y i r = u + i u i, u+ i 0,u i 0, (i =1,...,N) ª 2N where here u + i,u i denote 2N artificial additional arguments, which allow us to represent the original problem in the form of a linear program. A linear program takes the form: 3 min x c 0 x subject to Ax b, x 0. The simplex algorithm for numerical solution of this problem was created by George Dantzig in 1947. 2 Note that u + u = 1 (u 0) u 1 (u <0) u = 1 (u 0) u + 1 (u <0) u = u. 3 See Koenker, 2005, section 6.1, for an introduction oriented to quantiles. 2

Nonsmoothness in sample but smoothness in population The sample objective function s N (r) is continuous but not differentiable for all r. Moreover, the gradient or moment condition b N (r) = 1 N [1(Y i r) τ] is not continuous in r. Note that if each Y i is distinct, so that we can reorder the observations to satisfy Y 1 <Y 2 <...<Y N, for all τ we have b N (bq τ ) F N (bq τ ) τ 1 N. Despite lack of smoothness in s N (r) or b N (r), smoothness of the distribution of the data can smooth their population counterparts. Suppose that F is differentiable at q τ with positive derivative f (q τ ),thens 0 (r) is twice continuously differentiable with derivatives: 4 d dr E [ρ τ (Y r)] = τ [1 F (r)] + (1 τ) F (r) =F (r) τ E [1(Y r) τ] d 2 dr 2 E [ρ τ (Y r)] = f (r). Consistency Consistency of sample quantiles follows from the theorem by Newey and McFadden (1994) that we discussed in a previous class note. This theorem relies on continuity of the limiting objective function and uniform convergence. The quantile sample objective function s N (r) is continuous and convex in r. Suppose that F is such that s 0 (r) is uniquely maximized at q τ.bythelawof large numbers s N (r) converges pointwise to s 0 (r). Then use the fact that pointwise convergence of convex functions implies uniform convergence on compact sets. 5 Asymptotic normality The asymptotic normality of sample quantiles cannot be established in the standard way because of the nondifferentiability of the objective function. However, it has long been known that under suitable conditions sample quantiles are asymptotically normal and there are direct approaches to establish the result. 6 Here we just re-state the asymptotic normality result 4 The required derivatives are: d r (y r) f (y) dy = d r yf (y) dy d [rf (r)] = rf (r) [F (r)+rf (r)] = F (r) dr dr dr and d (y r) f (y) dy = d yf (y) dy d dr r dr r dr {r [1 F (r)]} = d r yf (y) dy yf (y) dy d {r [1 F (r)]} dr dr = rf (r) {[1 F (r)] rf (r)} = [1 F (r)]. 5 See Amemiya (1985, p. 150) for a proof of consistency of the median, and Koenker (2005, p. 117 119) for conditional and unconditional quantiles. 6 See for example the proofs in Cox and Hinkley (1974, p. 468) and Amemiya (1985, p. 148 150). 3

for unconditional quantiles following the discussion in the class note on nonsmooth GMM around Newey and McFadden s theorems. The general idea is that as long as the limiting objective function is differentiable the familiar approach for differentiable problems is possible if a stochastic equicontinuity assumption holds. Fix 0 < τ < 1. IfF is differentiable at q τ with positive derivative f (q τ ),then N (bqτ q τ )= 1 N N X Consequently, N (bqτ q τ ) d N µ 0, 1 (Y i q τ ) τ f (q τ ) τ (1 τ) [f (q τ )] 2. + o p (1). The term τ (1 τ) in the numerator of the asymptotic variance tends to make bq τ more precise in the tails, whereas the density term in the denominator tends to make bq τ less precise in regions of low density. Typically the latter effect will dominate so that quantiles closer to the extremes will be estimated with less precision. Computing standard errors Theasymptoticnormalityresultjustifies the large N approximation bf (bq τ ) p N (bqτ q τ ) N (0, 1) τ (1 τ) where b f (bq τ ) is a consistent estimator of f (q τ ). 7 Since F (r + h) F (r h) 1 f (r) =lim lim E [1( Y r h)], h 0 2h h 0 2h an obvious possibility is to use the histogram estimator bf (r) = F N (r + h N ) F N (r h N ) 2h N = 1 2Nh N = 1 2Nh N 1( Y i r h N ) for some h N > 0 sequence such that h N 0 as N. 8 Thus, bf (bq τ )= 1 2Nh N 1( Y i bq τ h N ). [1(Y i r + h N ) 1 (Y i r h N )] Other alternatives are kernel estimators for f (q τ ), the bootstrap, or directly obtain an approximate confidence interval using the normal approximation to the binomial distribution (Chamberlain, 1994; Koenker, 2005, p. 73). 7 Alternativelywecanusethedensityf Uτ (r) of the error U τ = Y q τ noting that f (q τ )=f Uτ (0). 8 Asufficient condition for consistency is Nh N. One possibility is h N = an 1/3 for some a>0. 4

2 Conditional quantiles Consider the conditional distribution of Y given X: Pr (Y r X) =F (r; X) and denote the τth quantile of Y given X as Q τ (Y X) q τ (X) F 1 (τ; X). Now quantiles minimize expected asymmetric absolute loss in a conditional sense: q τ (X) =argmin c E [ρ τ (Y c) X]. Suppose that q τ (X) satisfies a parametric model q τ (X) =g (X, β τ ),then β τ =argmaxe [ρ τ (Y g (X, b))]. b Also, since in general Pr (Y q τ (X) X) =τ or E [1 (Y q τ (X)) τ X] =0, it turns out that β τ solves moment conditions of the form E {h (X)[1(Y g (X, β τ )) τ]} =0. Conditional quantiles in a location-scale model The standardized variable in a locationscale model of Y X has a distribution that is independent of X. Namely, letting E (Y X) =μ (X) and Var(Y X) =σ 2 (X), thevariable V = Y μ (X) σ (X) is distributed independently of X according to some cdf G. Thus, in a location scale model all dependence of Y on X occurs through mean translations and variance re-scaling. An example is the classical normal regression model: Y X N X 0 β, σ 2. In the location-scale model: µ Y μ (X) Pr (Y r X) =Pr σ (X) r μ (X) σ (X) µ r μ (X) X = G σ (X) and µ Qτ (Y X) μ (X) G = τ σ (X) 5

or Q τ (Y X) =μ (X)+σ (X) G 1 (τ) so that Q τ (Y X) = μ(x) σ (X) + G 1 (τ). X j X j X j Under homoskedasticity, Q τ (Y X) / X j is the same at all quantiles since they only differ by a constant term. More generally, in a location-scale model the relative change between two quantiles ln [Q τ 1 (Y X) Q τ 2 (Y X)] / X j is the same for any pair (τ 1, τ 2 ). These assumptions have been found to be too restrictive in studies of the distribution of individual earnings conditioned on education and labor market experience. In the classical normal regression model Q τ (Y X) =X 0 β + σφ 1 (τ). 3 Quantile regression A linear regression is an optimal linear predictor that minimizes average quadratic loss. Given data {Y i,x i } N OLS sample coefficients are given by bβ OLS =argmin b Yi Xib 0 2. If E (Y X) is linear it coincides with the least squares population predictor, so that β b OLS consistently estimates E (Y X) / X. As is well known the median may be preferable to the mean if the distribution is long-tailed. The median lacks the sensitivity to extreme values of the mean and may represent the position of an asymmetric distribution better than the mean. For similar reasons in the regression context one may be interested in median regression. That is, an optimal predictor that minimizes average absolute loss: bβ LAD =argmin b Yi Xib 0. If med (Y X) is linear it coincides with the least absolute deviation (LAD) population predictor, so that β b LAD consistently estimates med (Y X) / X. The idea can be generalized to quantiles other than τ = 0.5 by considering optimal predictors that minimize average asymmetric absolute loss: bβ (τ) =argmin b ρ τ Yi Xib 0. As before if Q τ (Y X) is linear b β (τ) consistently estimates Q τ (Y X) / X. Clearly, b β LAD = b β (0.5). 6

Structural representation Define U such that F (Y ; X) =U. It turns out that U is uniformly distributed independently of X between 0 and 1. 9 Also Y = F 1 (U; X) with U X U (0, 1). This is sometimes called the Skorohod representation. For example, the Skorohod representation of the Gaussian linear regression model is Y = X 0 β + σv with V = Φ 1 (U), sothatv X N (0, 1). Linear quantile model the linear quantile regression A semiparametric alternative to the normal linear regression model is Y = X 0 β (U) U X U (0, 1). where β (u) is a nonparametric function, such that x 0 β (u) is strictly increasing in u for each value of x in the support of X. Thus, it is a semiparametric one-factor random coefficients model. This model nests linear regression as a special case and allows for interactions between observable and unobservable determinants of Y. Partitioning β (U) into intercept and slope components β (U) = β0 (U), β 1 (U) 0 0, the normal linear regression arises as a particular case of the linear quantile model with β 1 (U) =β 1 and β 0 (U) =β 0 + σφ 1 (U). The error U can be understood as the rank of a particular unit in the population. For example of ability, in a situation where Y denotes log earnings and X contains education and labor market experience. The practical usefulness of this model is that for given τ (0, 1) estimation of β (τ) can be easily obtained as the τ-th quantile linear regression coefficient, since Q τ (Y X) =X 0 β (τ). 4 Asymptotic inference for quantile regression Using our earlier results, the first and second derivatives of the limiting objective function can be obtained as b E ρ τ Y X 0 b = E X 1 Y X 0 b τ ª 2 b b 0 E ρ τ Y X 0 b = E f X 0 b X XX 0 = H (b) Moreover, under some regularity conditions we can use Newey and McFadden s asymptotic normality theorem, leading to h i N bβ (τ) β (τ) 1 = H 1 X N 0 X i 1 Yi Xiβ 0 (τ) τ ª + o p (1). N 9 Note that if Pr (Y r X) =F (r; X) then Pr (F (Y ; X) F (r; X) X) =F (r; X) or Pr (U s X) =s. 7

where H 0 = H (β (τ)) is the Hessian of the limit objective function at the truth, and 1 X N X i 1 Yi Xiβ 0 (τ) τ ª d N (0,V0 ) N where ³ 1 V 0 = E Yi Xiβ 0 (τ) τ ª 2 Xi Xi 0 = τ (1 τ) E X i Xi 0. Note that the last equality follows under the assumption of linearity of conditional quantiles. Thus, h i N bβ d (τ) β (τ) N (0,W0 ) where W 0 = H 0 1 V 0 H 0 1. To get a consistent estimate of W 0 we need consistent estimates of H 0 and V 0. A simple estimator of H 0 suggested in Powell (1984, 1986), which mimics the histogram estimator discussed above, is as follows: bh = 1 2Nh N 1 ³ Yi Xi 0 β b (τ) h N X i Xi. 0 This estimator is motivated by the following iterated expectations argument: H 0 = E f X 0 β (τ) X XX 0 1 lim h 0 2h E E 1( Y X 0 β (τ) h) X XX 0 ª 1 = lim h 0 2h E 1( Y X 0 β (τ) h)xx 0. If the quantile function is correctly specified a consistent estimate of V 0 is bv = τ (1 τ) 1 N X i Xi. 0 Otherwise, a fully robust estimator can be obtained using ev = 1 N n h 1 Y i Xi 0 β b i 2 (τ) τo Xi Xi 0 Finally, if U τ = Y X 0 β (τ) is independent of X (as in the location model) it turns out that H 0 = f Uτ (0) E X i Xi 0 so that W 0 = τ (1 τ) E [f Uτ (0)] 2 Xi Xi 0 1, 8

which can be consistently estimated as à τ (1 τ) 1 cw NR = h i 2 X i Xi 0 bfuτ N (0) where bf Uτ (0) = 1 2Nh N! 1 1( Y i Xi 0 β b (τ) h N ). In summary, we have considered three different alternative estimators for standard errors: A nonrobust variance matrix estimator under independence W c NR, a robust estimator under correct specification: W c R = H b 1 V b H b 1, and a fully robust estimator under misspecification: W c FR = H b 1 V e H b 1. 5 Further topics Censored regression quantiles Powell s estimators. Chamberlain s minimum distance approach. Honoré (1992) panel data approaches. Instrumental variable models Chernozhukov and Hansen (2006) estimators. Chesher (2003) and Ma and Koenker (2006). Treatment effect perspectives. Crossings and rearrangements. Decompositions: Machado and Mata (2005) counterfactuals. Quantile regression under misspecification (Angrist et al, 2006). Asymptotic efficiency: GLS and optimal instrument arguments. Functional inference. 9

References [1] Amemiya, T. (1985): Advanced Econometrics, Blackwell. [2] Angrist, J., V. Chernozhukov, and I. Fernández-Val (2006): Quantile Regression under Misspecification with an Application to the U.S. Wage Structure, Econometrica, 74, 539-563. [3] Chamberlain, G. (1994): Quantile Regression, Censoring, and the Structure of Wages, in C.A. Sims (ed.), Advances in Econometrics, Sixth World Congress, vol. 1, Cambridge. [4] Chernozhukov, V. and C. Hansen (2006): Instrumental Quantile Regression Inference for Structural and Treatment Effect Models, Journal of Econometrics, 132, 491-525. [5] Chesher, A. (2003): Identification in Nonseparable Models, Econometrica, 71, 1405-1441. [6] Cox, D. R. andd. V. Hinkley(1974): Theoretical Statistics, Chapman and Hall, London. [7] Honoré, B. (1992): Trimmed LAD and Least Squares Estimation of Truncated and Censored Regression Models with Fixed Effects, Econometrica, 60, 533-565. [8] Koenker, R. and G. Basset (1978): Regression Quantiles, Econometrica, 46, 33-50. [9] Koenker, R. (2005): Quantile Regression, Cambridge University Press. [10] Ma, L. and R. Koenker (2006): Quantile Regression Methods for Recursive Structural Equation Models, Journal of Econometrics, 134, 471-506. [11] Machado, J. and J. Mata (2005): Counterfactual Decomposition of Changes in Wage Distributions using Quantile Regression, Journal of Applied Econometrics, 20, 445-465. [12] Newey, W. and D. McFadden (1994): Large Sample Estimation and Hypothesis Testing, in R. Engle and D. McFadden (eds.), Handbook of Econometrics, Vol. 4, Elsevier. [13] Powell, J. L. (1984), Least Absolute Deviations Estimation for the Censored Regression Model, Journal of Econometrics, 25, 303-25. [14] Powell, J. L. (1986): Censored Regression Quantiles, Journal of Econometrics, 32, 143-155. 10