Asymmetric least squares estimation and testing
Whitney Newey and James Powell
Princeton University and University of Wisconsin-Madison
January 27, 2012
Outline: ALS estimators; large sample properties; asymptotic relative efficiencies of alternative tests; conclusion
Linear model
Data $\{(y_i, x_i),\ i = 1, \dots, n\}$ are from $y_i = x_i'\beta_0 + u_i$, where $\{x_i\}$ is a sequence of regression vectors of dimension $p$ with first component $x_{i1} = 1$ and $\{u_i\}$ is a sequence of scalar error terms.
Regression quantile (RQ) estimator
$$\hat b(\theta) = \arg\min_{\beta \in R^p} Q_n(\beta; \theta) = \arg\min_{\beta \in R^p} \sum_{i=1}^{n} r_\theta(y_i - x_i'\beta),$$
for fixed values of $\theta$ in $(0,1)$, where $r_\theta(\lambda) = |\theta - 1(\lambda < 0)|\,|\lambda|$ and $1(A)$ denotes the indicator function for the event $A$.
Regression quantile (RQ) estimator
Homoskedasticity case: $\operatorname{plim} \hat b(\theta) = \beta_0 + \eta(\theta) e_1$, where $e_j$ denotes the $j$th unit vector and $\eta(\theta) = F^{-1}(\theta)$ is the quantile function for the error term $u_i$.
Heteroskedasticity case: the probability limits for the slope coefficients will vary with $\theta$, with differences depending on the joint distribution of $u_i$ and $x_i$.
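Asymmetric least squares (ALS) estimator
Asymmetric least squares loss function, for $\tau$ in $(0,1)$:
$$\rho_\tau(\lambda) = |\tau - 1(\lambda < 0)|\,\lambda^2.$$
ALS estimator:
$$\hat\beta(\tau) = \arg\min_{\beta} R_n(\beta; \tau) = \arg\min_{\beta} \sum_{i=1}^{n} \rho_\tau(y_i - x_i'\beta).$$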
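The two loss functions differ only in how the weighted residual enters: absolutely for RQ, squared for ALS. A minimal Python sketch of both follows; the function names `check_loss` and `als_loss` are illustrative and do not come from the slides.

```python
def check_loss(lam, theta):
    """RQ loss r_theta(lam) = |theta - 1(lam < 0)| * |lam|."""
    weight = abs(theta - (1.0 if lam < 0 else 0.0))
    return weight * abs(lam)


def als_loss(lam, tau):
    """ALS loss rho_tau(lam) = |tau - 1(lam < 0)| * lam**2."""
    weight = abs(tau - (1.0 if lam < 0 else 0.0))
    return weight * lam ** 2


# Non-negative residuals get weight tau, negative residuals weight 1 - tau.
for lam in (-2.0, -0.5, 0.5, 2.0):
    print(lam, check_loss(lam, 0.9), als_loss(lam, 0.9))
```

At $\tau = 1/2$ the ALS loss is half the squared residual, so the ALS estimator reduces to ordinary least squares.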
τth expectile: $\mu(\tau)$
Expectile function:
$$\mu(\tau) = \arg\min_{m} E[\rho_\tau(Y - m) - \rho_\tau(Y)],$$
where the expectation is taken with respect to the distribution of the random variable $Y$, which is assumed to have finite mean.
$\mu(\tau)$ is the solution of the equation
$$\mu(\tau) - E(Y) = \frac{2\tau - 1}{1 - \tau} \int_{[\mu(\tau), \infty)} (y - \mu(\tau))\,dF(y),$$
where $F(y)$ is the c.d.f. of $Y$. Thus $\mu(\tau)$ is determined by the properties of the expectation of the random variable $Y$ conditional on $Y$ being in a tail of the distribution.
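For a finite sample, the expectile of the empirical distribution can be computed by repeatedly taking a weighted mean, with weight $\tau$ above the current guess and $1 - \tau$ below it. The sketch below is one way to do this, not necessarily the authors' procedure. The names `sample_expectile` and `identity_gap`, the tolerance, and the iteration cap are all assumptions made for illustration. `identity_gap` evaluates the difference between the two sides of the expectile equation, with $F$ replaced by the empirical c.d.f.

```python
def sample_expectile(ys, tau, tol=1e-12, max_iter=1000):
    """tau-th expectile of the empirical distribution of ys."""
    mu = sum(ys) / len(ys)  # the 0.5-expectile is the mean
    for _ in range(max_iter):
        # weight |tau - 1(y < mu)|: tau at or above mu, 1 - tau below it
        w = [tau if y >= mu else 1.0 - tau for y in ys]
        new_mu = sum(wi * y for wi, y in zip(w, ys)) / sum(w)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


def identity_gap(ys, tau):
    """Left side minus right side of the expectile equation (empirical F)."""
    n = len(ys)
    mu = sample_expectile(ys, tau)
    mean = sum(ys) / n
    upper_tail = sum(max(y - mu, 0.0) for y in ys) / n
    return (mu - mean) - (2.0 * tau - 1.0) / (1.0 - tau) * upper_tail


ys = [1.0, 2.0, 3.0, 4.0, 10.0]  # illustrative data only
print(sample_expectile(ys, 0.9), identity_gap(ys, 0.9))
```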
Properties of expectile function
The expectile function summarizes the d.f. as the quantile function does. Let $I_F$ denote the set $\{y \mid 0 < F(y) < 1\}$.
Theorem: Suppose that $E(Y) = m$ exists. For each $\tau \in (0, 1)$, a unique solution $\mu(\tau)$ to the expectile equation exists and has the following properties:
(i) As a function $\mu(\tau): (0, 1) \to R$, $\mu(\tau)$ is strictly monotonic increasing.
(ii) The range of $\mu(\tau)$ is $I_F$, and $\mu(\tau)$ maps $(0,1)$ onto $I_F$.
(iii) For $\tilde Y = sY + t$, where $s > 0$, the $\tau$th expectile $\tilde\mu(\tau)$ of $\tilde Y$ satisfies $\tilde\mu(\tau) = s\mu(\tau) + t$.
(iv) If $F(y)$ is continuously differentiable, then $\mu(\tau)$ is continuously differentiable, and for $y \neq m$ in $I_F$ and $\tau_y$ such that $y = \mu(\tau_y)$,
$$F(y) = -\frac{y - m + \tau_y \mu'(\tau_y)(1 - 2\tau_y)}{\mu'(\tau_y)(1 - 2\tau_y)^2},$$
where this equation holds in the limit for $y = m$ (and $\tau_y = 1/2$).
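Property (iv) can be traced back to the equation that defines $\mu(\tau)$. The following is a sketch of that calculation, assuming $F$ is continuously differentiable. The shorthand $G(\mu) = \int_{[\mu, \infty)} (y - \mu)\,dF(y)$ is introduced here for convenience and does not appear in the slides.

```latex
\begin{align*}
&\text{Defining equation, rearranged:}\quad
  (1-\tau)(m-\mu) + (2\tau-1)\,G(\mu) = 0,
  \qquad G'(\mu) = -\bigl(1 - F(\mu)\bigr). \\
&\text{Differentiate with respect to } \tau:\quad
  (\mu - m) + 2G(\mu) = \mu'(\tau)\bigl[\tau - (2\tau-1)F(\mu)\bigr]. \\
&\text{Substitute } G(\mu) = \frac{(1-\tau)(\mu-m)}{2\tau-1}:\quad
  \frac{\mu - m}{2\tau-1} = \mu'(\tau)\bigl[\tau - (2\tau-1)F(\mu)\bigr]. \\
&\text{Solve for } F(\mu):\quad
  F(\mu) = -\,\frac{\mu - m + \tau\,\mu'(\tau)(1-2\tau)}{\mu'(\tau)\,(1-2\tau)^{2}}.
\end{align*}
```

Setting $y = \mu(\tau_y)$ gives the expression in (iv). The division by $2\tau - 1$ is why the case $y = m$, $\tau_y = 1/2$ has to be handled as a limit.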
Plot of the quantile function $\eta(\theta)$ and the expectile function $\mu(\tau)$ for the standard normal distribution
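Relationship between $\mu(\tau, x_i)$ and $x_i'\beta(\tau)$
$$\beta(\tau) = \arg\min_{\beta} R(\beta, \tau) = \arg\min_{\beta} E[\rho_\tau(y_i - x_i'\beta) - \rho_\tau(y_i)],$$
which is determined by the conditional distribution of $y_i$ given $x_i$. It satisfies
$$\beta(\tau) = \{E[\,|\tau - 1(y_i < x_i'\beta(\tau))|\, x_i x_i'\,]\}^{-1} E[\,|\tau - 1(y_i < x_i'\beta(\tau))|\, x_i y_i\,].$$
The conditional expectile is
$$\mu(\tau, x_i) = \arg\min_{m} E[\rho_\tau(y_i - m) - \rho_\tau(y_i) \mid x_i]$$
for almost all $x_i$, and $x_i'\beta(\tau)$ will be a linear approximation to $\mu(\tau, x_i)$.
Case: $u_i$ independent of $x_i$. Then $\mu(\tau, x_i) = x_i'\beta_0 + \mu(\tau)$, where $\mu(\tau)$ is the $\tau$th expectile of $u_i$, and $\beta(\tau) = \beta_0 + \mu(\tau) e_1$ with $e_1 = (1, 0, \dots, 0)'$.
Case: $u_i = (x_i'\gamma_0)\epsilon_i$ with $\epsilon_i$ independent of $x_i$. Then $\mu(\tau, x_i) = x_i'\beta_0 + \mu(\tau)\,x_i'\gamma_0 = x_i'[\beta_0 + \mu(\tau)\gamma_0]$, where $\mu(\tau)$ is the $\tau$th expectile of $\epsilon_i$, and $\beta(\tau) = \beta_0 + \mu(\tau)\gamma_0$.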
Symmetry of the conditional distribution
The following theorem can be used to detect asymmetry of the conditional distribution of $y_i$ given $x_i$.
Theorem: If the distribution of $y_i$ conditional on $x_i$ is symmetric around $x_i'\beta_0$ with probability one, then
$$[\beta(\tau) + \beta(1 - \tau)]/2 = \beta_0.$$
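One way to see why the symmetry result holds is sketched below. This is an informal argument, not necessarily the authors' proof. It assumes that $u_i = y_i - x_i'\beta_0$ is symmetric about zero given $x_i$, and it uses the shorthand $b = \beta - \beta_0$, which is introduced here.

```latex
\begin{align*}
\rho_\tau(-\lambda) &= |\tau - 1(\lambda > 0)|\,\lambda^{2} = \rho_{1-\tau}(\lambda), \\
E[\rho_\tau(u_i - x_i'b) \mid x_i]
  &= E[\rho_\tau(-u_i - x_i'b) \mid x_i]
  && \text{(symmetry of } u_i \text{ given } x_i) \\
  &= E[\rho_{1-\tau}(u_i - x_i'(-b)) \mid x_i]
  && \text{(first line with } \lambda = u_i + x_i'b).
\end{align*}
```

So $b$ minimizes the objective at $\tau$ exactly when $-b$ minimizes the objective at $1 - \tau$. Hence $\beta(1-\tau) - \beta_0 = -(\beta(\tau) - \beta_0)$, which rearranges to the statement of the theorem.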
Advantage of ALS estimators
Computation: $\hat\beta(\tau)$ can be computed by iterated weighted least squares, since it satisfies
$$\hat\beta(\tau) = \Bigl[\sum_{i=1}^{n} |\tau - 1(y_i < x_i'\hat\beta(\tau))|\, x_i x_i'\Bigr]^{-1} \sum_{i=1}^{n} |\tau - 1(y_i < x_i'\hat\beta(\tau))|\, x_i y_i.$$
Inference: it is unnecessary to estimate the density function of the error terms to obtain the joint asymptotic covariance matrix of several estimators, whereas regression quantiles require it.
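The fixed-point form of $\hat\beta(\tau)$ suggests a simple iteration: compute the weights from the current residuals, run weighted least squares, and repeat until the coefficients stop changing. The pure-Python sketch below illustrates this under stated assumptions. The function names `solve_linear`, `weighted_ls`, and `als_fit`, the OLS starting value, and the stopping rule are illustrative choices that do not come from the slides.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def weighted_ls(xs, ys, w):
    """Weighted least squares: solve (sum w x x') beta = sum w x y."""
    p = len(xs[0])
    xtwx = [[sum(wi * x[j] * x[k] for wi, x in zip(w, xs)) for k in range(p)]
            for j in range(p)]
    xtwy = [sum(wi * x[j] * y for wi, x, y in zip(w, xs, ys)) for j in range(p)]
    return solve_linear(xtwx, xtwy)


def als_fit(xs, ys, tau, tol=1e-10, max_iter=200):
    """Iterated weighted least squares for the ALS estimator."""
    beta = weighted_ls(xs, ys, [1.0] * len(ys))  # start from OLS
    for _ in range(max_iter):
        resid = [y - sum(xj * bj for xj, bj in zip(x, beta))
                 for x, y in zip(xs, ys)]
        # weight |tau - 1(residual < 0)|
        w = [abs(tau - (1.0 if r < 0 else 0.0)) for r in resid]
        new_beta = weighted_ls(xs, ys, w)
        if max(abs(a - b) for a, b in zip(new_beta, beta)) < tol:
            return new_beta
        beta = new_beta
    return beta


# Illustrative data only: intercept plus one regressor.
xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
ys = [1.0, 2.5, 2.0, 4.5, 4.0]
print(als_fit(xs, ys, 0.5), als_fit(xs, ys, 0.8))
```

With $\tau = 0.5$ all weights are equal, so the fit coincides with ordinary least squares. For other values of $\tau$ the weights change only when a residual changes sign.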
Assumptions
Let $l$ denote the Lebesgue measure on the real line and let $z = (y, x')'$, where $x$ is a $p \times 1$ vector.
1. For each sample size $n$, $z_i = (y_i, x_i')'$, $(i = 1, \dots, n)$, is i.i.d., and for $\gamma_n$ in $R^q$, $z_i$ has a probability density function $f(y_i \mid x_i, \gamma_n)\,g(x_i)$ with respect to a measure $\mu_z = l \times \mu_x$, where $\gamma_n = \gamma_0 + \delta/\sqrt{n}$. Also, the conditional density $f(y \mid x, \gamma_0)$ is continuous in $y$ for almost all $x$.
Let $E[\,\cdot \mid \gamma]$ denote the expectation taken at $f(y \mid x, \gamma)\,g(x)$, and let $E[\,\cdot\,] = E[\,\cdot \mid \gamma_0]$. Also, let $\psi_\tau(\lambda) = |\tau - 1(\lambda < 0)|\,\lambda$.
2. There is an open set $\Gamma$ containing $\gamma_0$ such that for almost all $z$, the conditional density $f(y \mid x, \gamma)$ is continuous in $\gamma$ on $\Gamma$. Also, $E[x_i \psi_\tau(y_i - x_i'\beta(\tau)) \mid \gamma]$ is continuously differentiable in $\gamma$ on $\Gamma$.
Assumptions
3. There is a constant $d > 0$ and a measurable function $\alpha(z)$ that satisfy $\sup_{\Gamma} f(y \mid x, \gamma) \le \alpha(z)$ and
$$\int |z|^{4+d}\,\alpha(z)\,g(x)\,d\mu_z < \infty, \qquad \int \alpha(z)\,g(x)\,d\mu_z < \infty.$$
4. $E[x_i x_i']$ is nonsingular.
5. The observations satisfy $y_i = x_i'\beta_0 + u_i$, where $u_i = \sigma_i \epsilon_i$ and $\sigma_i = 1 + x_i'\gamma_{nh} + 1(\epsilon_i > 0)\,x_i'\gamma_{ns}$, with $\gamma_{nh} = \delta_h/\sqrt{n}$ and $\gamma_{ns} = \delta_s/\sqrt{n}$, and $\epsilon_i$ is i.i.d., independent of $x_i$, and symmetrically distributed around zero. Also, $\epsilon_i$ has the c.d.f. $F(\epsilon)$, which has a continuous density $f(\epsilon)$.
6. $x_i$ has compact support. Also, there exist finite constants $D, d > 0$ such that $f(\epsilon) \le D/(1 + |\epsilon|^{5+d})$.
Asymptotic distribution of the vector of ALS estimators
For a vector of weights $(\tau_1, \dots, \tau_m)$, let $\hat\xi = \mathrm{vec}[\hat\beta(\tau_1), \dots, \hat\beta(\tau_m)]$, and let $\xi = \mathrm{vec}[\beta(\tau_1), \dots, \beta(\tau_m)]$ be the population counterpart.
For $u_i(\tau) = y_i - x_i'\beta(\tau)$ and $w_i(\tau) = |\tau - 1(u_i(\tau) < 0)|$, let
$$W_j = E[w_i(\tau_j)\,x_i x_i'], \qquad W = \mathrm{diag}[W_1, \dots, W_m],$$
$$V_{jk} = E[w_i(\tau_j)\,w_i(\tau_k)\,u_i(\tau_j)\,u_i(\tau_k)\,x_i x_i'], \qquad V = [V_{jk}], \quad (j, k = 1, \dots, m),$$
$$G_j = \partial E[w_i(\tau_j)\,u_i(\tau_j)\,x_i \mid \gamma_0]/\partial\gamma, \qquad G = [G_1', \dots, G_m']'.$$
Theorem: If Assumptions 1-4 are satisfied, then for each $\tau$ in $(0,1)$, a unique solution $\beta(\tau)$ exists. Also,
$$\sqrt{n}\,(\hat\xi - \xi) \xrightarrow{d} N(W^{-1}G\delta,\; W^{-1}VW^{-1}).$$
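Sample moment estimator of the asymptotic covariance matrix of the ALS estimators
Let
$$\hat W_j = \sum_{i=1}^{n} \hat w_i(\tau_j)\,x_i x_i'/n, \qquad \hat W = \mathrm{diag}[\hat W_1, \dots, \hat W_m],$$
$$\hat V_{jk} = \sum_{i=1}^{n} \hat w_i(\tau_j)\,\hat w_i(\tau_k)\,\hat u_i(\tau_j)\,\hat u_i(\tau_k)\,x_i x_i'/n, \qquad \hat V = [\hat V_{jk}], \quad (j, k = 1, \dots, m),$$
where $\hat u_i(\tau) = y_i - x_i'\hat\beta(\tau)$ and $\hat w_i(\tau) = |\tau - 1(\hat u_i(\tau) < 0)|$.
Theorem: If Assumptions 1-4 are satisfied, then
$$\hat W^{-1}\hat V\hat W^{-1} \xrightarrow{p} W^{-1}VW^{-1}.$$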
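To make the sample moment formulas concrete, the sketch below works through the intercept-only special case ($p = 1$, $x_i = 1$). In that case $\hat\beta(\tau)$ is the sample expectile, and $\hat W_j$ and $\hat V_{jk}$ are scalars. This simplification is an assumption made for illustration, and so are the function names `sample_expectile` and `expectile_covariance`. The general case replaces the scalar divisions with matrix inverses.

```python
def sample_expectile(ys, tau, tol=1e-12, max_iter=1000):
    """tau-th expectile of the empirical distribution of ys."""
    mu = sum(ys) / len(ys)
    for _ in range(max_iter):
        w = [tau if y >= mu else 1.0 - tau for y in ys]
        new_mu = sum(wi * y for wi, y in zip(w, ys)) / sum(w)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


def expectile_covariance(ys, taus):
    """Estimate W^{-1} V W^{-1} for several sample expectiles (x_i = 1)."""
    n = len(ys)
    mus = [sample_expectile(ys, t) for t in taus]
    # residuals u_i(tau_j) and weights w_i(tau_j) = |tau_j - 1(u_i(tau_j) < 0)|
    u = [[y - mu for y in ys] for mu in mus]
    w = [[abs(t - (1.0 if r < 0 else 0.0)) for r in uj]
         for t, uj in zip(taus, u)]
    w_hat = [sum(wj) / n for wj in w]  # scalar W_j
    m = len(taus)
    cov = [[0.0] * m for _ in range(m)]
    for j in range(m):
        for k in range(m):
            v_jk = sum(w[j][i] * w[k][i] * u[j][i] * u[k][i]
                       for i in range(n)) / n
            cov[j][k] = v_jk / (w_hat[j] * w_hat[k])
    return mus, cov


ys = [1.0, 2.0, 3.0, 4.0, 10.0]  # illustrative data only
print(expectile_covariance(ys, [0.25, 0.5, 0.75]))
```

At $\tau = 0.5$ the diagonal entry reduces to the sample variance of $y_i$ (with divisor $n$), matching the familiar limit distribution of the sample mean. No density estimate is needed at any $\tau$.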
Asymptotic distribution of test statistic for general linear hypothesis
Consider the general hypothesis $H_0: H\xi = h$. A test statistic $T$ for the hypothesis is
$$T = n\,(H\hat\xi - h)'\,[H\hat W^{-1}\hat V\hat W^{-1}H']^{-1}\,(H\hat\xi - h).$$
Definitions of $\Sigma$, $D$, $\mu$, and $\nu$ are omitted; please refer to p. 830.
Theorem: Suppose that Assumptions 1, 4, 5, and 6 are satisfied. Also suppose that $H_0$ is satisfied when $\gamma = \gamma_0$, $\Sigma$ is nonsingular, and $H$ has full row rank. Then $T$ converges in distribution to a noncentral chi-squared with $\mathrm{rank}(H)$ d.f. and noncentrality parameter
$$(\mu \otimes \delta_h + \nu \otimes \delta_s)'\,H'\,[H(\Sigma \otimes D^{-1})H']^{-1}\,H\,(\mu \otimes \delta_h + \nu \otimes \delta_s).$$
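When $H$ has a single row, the matrix inverse in $T$ is a scalar division, and the statistic is easy to compute by hand. The sketch below covers only that special case. The function name `wald_single_restriction`, its argument layout, and the numbers in the example are hypothetical. The p-value uses the central chi-squared distribution with 1 d.f., which is the limit when $\delta_h = \delta_s = 0$.

```python
import math


def wald_single_restriction(xi_hat, h_row, h_value, cov, n):
    """T = n (H xi_hat - h)^2 / (H C H') for a single restriction.

    cov is an estimate C of the asymptotic covariance W^{-1} V W^{-1}.
    """
    m = len(h_row)
    gap = sum(a * b for a, b in zip(h_row, xi_hat)) - h_value
    hch = sum(h_row[j] * cov[j][k] * h_row[k]
              for j in range(m) for k in range(m))
    t_stat = n * gap ** 2 / hch
    # P(chi2 with 1 d.f. > T) = P(|Z| > sqrt(T)) for standard normal Z
    p_value = math.erfc(math.sqrt(t_stat / 2.0))
    return t_stat, p_value


# Hypothetical inputs: test that two coefficients are equal.
xi_hat = [1.0, 2.0]
cov = [[2.0, 0.0], [0.0, 2.0]]
t_stat, p_value = wald_single_restriction(xi_hat, [1.0, -1.0], 0.0, cov, n=100)
print(t_stat, p_value)
```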
Test of homoskedasticity
Error distribution: contaminated Gaussian.
Tests compared:
ALS test with $\tau = .54$
RQ test with $\theta = .87$
Absolute residual test
Squared residual test
For the ALS and RQ tests, $\theta$ and $\tau$ were selected by calculating the weights that maximize the respective noncentrality parameters.
The absolute and squared residual tests are based on the sample correlation of $l(\hat u_i)$ with $x_i$, where $l(u) = u^2$ for the squared residual test and $l(u) = |u|$ for the absolute residual test.
Local efficiencies of tests for heteroskedasticity, relative to squared residual regression
Test of conditional symmetry
Error distribution: contaminated Gaussian.
Tests compared:
ALS test with $\tau = .54$
RQ test with $\theta = .87$
Squared residual test
Test based on a comparison of least squares and least absolute deviations regression coefficients
For the ALS and RQ tests, $\theta$ and $\tau$ were selected by calculating the weights that maximize the respective noncentrality parameters.
Local efficiencies of tests for asymmetry, relative to squared residual regression
Conclusion
ALS estimators are useful summary statistics of the conditional distribution of $y_i$ given $x_i$.
Tests of homoskedasticity and conditional symmetry based on ALS are reasonably efficient.
Thank you!