Inference for High Dimensional Robust Regression


Department of Statistics, UC Berkeley. Stanford-Berkeley Joint Colloquium, 2015.

Table of Contents 1 Background 2 Main Results 3 OLS: A Motivating Example

Setup

Consider a linear regression model: $Y_i = X_i^T \beta_0 + \epsilon_i$, $i = 1, 2, \dots, n$. Here $Y_i \in \mathbb{R}$, $X_i \in \mathbb{R}^p$, $\beta_0 \in \mathbb{R}^p$ and $\epsilon_i \in \mathbb{R}$.

OLS estimator ($p < n$): $\hat\beta^{OLS} = \arg\min_{\beta \in \mathbb{R}^p} \frac{1}{n}\sum_{i=1}^n (y_i - x_i^T\beta)^2$;

M estimator ($p < n$): $\hat\beta = \arg\min_{\beta \in \mathbb{R}^p} \frac{1}{n}\sum_{i=1}^n \rho(y_i - x_i^T\beta)$.
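As a concrete illustration (my own sketch, not part of the slides), the snippet below simulates data and computes both estimators by direct minimization of the empirical risk, using the Huber loss as a hypothetical choice of $\rho$.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.standard_normal((n, p))           # simulated design (illustration only)
beta0 = np.ones(p) / np.sqrt(p)           # illustrative true coefficient vector
eps = rng.standard_t(df=3, size=n)        # heavy-tailed errors
y = X @ beta0 + eps

# OLS: closed-form minimizer of the squared loss
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# M-estimator with a smooth convex loss rho (here: Huber, delta = 1.345)
def huber(r, delta=1.345):
    return np.where(np.abs(r) <= delta, 0.5 * r**2, delta * (np.abs(r) - 0.5 * delta))

def risk(beta):
    return huber(y - X @ beta).mean()

beta_m = minimize(risk, x0=beta_ols, method="BFGS").x

# compare estimation errors of the two estimators
print(np.linalg.norm(beta_ols - beta0), np.linalg.norm(beta_m - beta0))
```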

Background

- The limiting behavior of $\hat\beta$ when $p$ is fixed:
$$\mathcal{L}(\hat\beta) \approx N\!\left(\beta_0,\; (X^TX)^{-1}\,\frac{E(\psi^2(\epsilon))}{[E\psi'(\epsilon)]^2}\right);$$
- Huber (1973) raised the question of understanding the behavior of $\hat\beta$ when $p \to \infty$;
- Huber (1973) proved that $\hat\beta^{OLS}$ is jointly asymptotically normal iff
$$\kappa = \max_i \big(X(X^TX)^{-1}X^T\big)_{i,i} \to 0,$$
which requires $p/n \to 0$.
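The quantity $\kappa$ is simply the largest leverage score, i.e. the largest diagonal entry of the hat matrix. A minimal numerical check (my own, assuming an i.i.d. Gaussian design purely for illustration): since the leverages sum to $p$, their maximum is at least $p/n$, so $\kappa \to 0$ indeed forces $p/n \to 0$.

```python
import numpy as np

rng = np.random.default_rng(1)

def max_leverage(n, p):
    """Largest diagonal entry of the hat matrix X (X^T X)^{-1} X^T."""
    X = rng.standard_normal((n, p))
    G_inv = np.linalg.inv(X.T @ X)
    lev = np.einsum("ij,jk,ik->i", X, G_inv, X)   # leverage of each observation
    return lev.max()

for n, p in [(1000, 10), (1000, 100), (1000, 500)]:
    # leverages average to p/n, so kappa cannot vanish unless p/n does
    print(n, p, round(max_leverage(n, p), 3))
```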

Background

- Portnoy (1984, 1985, 1986, 1987) proved the joint asymptotic normality of $\hat\beta$ in the case $(p\log n)^{3/2}/n \to 0$;
- Mammen (1989) provided an expansion for $\hat\beta$ (which leads to joint asymptotic normality) under the assumption $\kappa\, n^{1/3}(\log n)^{2/3} \to 0$;
- All of these works are based on fixed designs, but they require $p/n \to 0$.

Background

- El Karoui et al. (2011, 2013) and Bean et al. (2013) established the joint asymptotic normality of $\hat\beta$ in the regime $p/n \to \kappa \in (0,1)$ by assuming a random design $X$ with i.i.d. Gaussian entries;
- Zhang and Zhang (2014) and Van de Geer et al. (2014) proved the partial asymptotic normality of the LASSO estimator by assuming a fixed design and imposing a sparsity condition on $\beta_0$: $\|\beta_0\|_0 \log p / n \to 0$.

Main Research Question and Our Contributions

Suppose $p/n \to \kappa \in (0,1)$ and the design matrix $X$ is fixed; can we make inference on a coordinate (or a low-dimensional linear contrast) of $\hat\beta$?

YES! In this work, we
- prove the coordinatewise asymptotic normality of $\hat\beta$ in the regime $p/n \to \kappa \in (0,1)$ for fixed designs;
- show that the conditions on the fixed design matrix are satisfied by a broad class of random designs.

Table of Contents 1 Background 2 Main Results 3 OLS: A Motivating Example

Ridge-Regularized M Estimator

Consider the ridge-regularized M estimator
$$\hat\beta = \arg\min_{\beta\in\mathbb{R}^p} \frac{1}{n}\sum_{i=1}^n \rho(y_i - x_i^T\beta) + \frac{\tau}{2}\|\beta\|^2.$$
Assume that $\rho \in C^2$ is a convex function with $\psi = \rho'$ and $\beta_0 = 0$; then the first order condition implies that
$$\sum_{i=1}^n x_i\,\psi(\epsilon_i - x_i^T\hat\beta) = n\tau\hat\beta.$$
In most cases, there is no closed form solution and $\hat\beta$ is an implicit function of $\epsilon$.
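A minimal computational sketch (mine, not from the slides) of how such an estimator can be obtained in practice: the smooth pseudo-Huber loss and the simulated data are illustrative assumptions, and the last lines check the first order condition above.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n, p, tau = 400, 100, 0.5
X = rng.standard_normal((n, p))
eps = rng.laplace(size=n)
y = eps                                            # beta_0 = 0, so y_i = eps_i

# smooth convex loss rho(x) = sqrt(1 + x^2) - 1 (pseudo-Huber), psi = rho'
rho = lambda r: np.sqrt(1 + r**2) - 1
psi = lambda r: r / np.sqrt(1 + r**2)

def objective(beta):
    r = y - X @ beta
    return rho(r).mean() + 0.5 * tau * beta @ beta

def gradient(beta):
    r = y - X @ beta
    return -X.T @ psi(r) / n + tau * beta

beta_hat = minimize(objective, np.zeros(p), jac=gradient, method="L-BFGS-B").x

# first order condition: sum_i x_i psi(eps_i - x_i^T beta_hat) = n * tau * beta_hat
foc_gap = np.linalg.norm(X.T @ psi(y - X @ beta_hat) - n * tau * beta_hat)
print("FOC residual:", foc_gap)                    # close to zero, up to optimizer tolerance
```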

Final Conclusions

Theorem 1. Under Assumptions A1-A4, $\hat\beta$ is coordinatewise asymptotically normal in the sense that
$$\max_j\, d_{TV}\!\left(\mathcal{L}\!\left(\frac{\hat\beta_j - E_\epsilon\hat\beta_j}{\sqrt{\mathrm{Var}_\epsilon(\hat\beta_j)}}\right),\; N(0,1)\right) = O\!\left(\frac{\mathrm{PolyLog}(n)}{\sqrt{n}}\right).$$
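For reference, the total variation distance between two probability laws $P$ and $Q$ is
$$d_{TV}(P, Q) = \sup_{A}\,|P(A) - Q(A)|,$$
so the theorem says that probabilities of arbitrary events involving a standardized coordinate of $\hat\beta$ can be computed from the standard normal law, uniformly over coordinates, up to the stated error.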

Assumptions: Loss Function

Assumption A1: let $\psi = \rho'$; for any $x$,
- $0 < D_0 \le \psi'(x) \le D_1(|x| \vee 1)^{m_1}$;
- $|\psi''(x)| \le D_2(|x| \vee 1)^{m_2}$;
- $|\psi'''(x)| \le D_3(|x| \vee 1)^{m_3}$;
- $\max\{D_0^{-1}, D_1, D_2, D_3\} = O(\mathrm{PolyLog}(n))$;
- $\max\{m_1, m_2, m_3\} = O(1)$.
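As a sanity check (my example, not from the slides), the squared loss satisfies these conditions trivially:
$$\rho(x) = \tfrac{1}{2}x^2, \qquad \psi(x) = x, \qquad \psi'(x) \equiv 1, \qquad \psi''(x) = \psi'''(x) \equiv 0,$$
so one may take $D_0 = D_1 = 1$, $D_2 = D_3 = 0$ and $m_1 = m_2 = m_3 = 0$, which are $O(\mathrm{PolyLog}(n))$ and $O(1)$ respectively.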

Assumptions: Error Distribution

Assumption A2: the errors $\epsilon$ are transformations of i.i.d. Gaussian random variables, i.e. $\epsilon_i = u_i(\nu_i)$, where $\nu_i \overset{i.i.d.}{\sim} N(0,1)$;
- $|u_i'| \le c_1$, $|u_i''| \le c_2$;
- $\max\{c_1, c_2\} = O(\mathrm{PolyLog}(n))$.
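A simple instance (my example, not from the slides): $u_i(\nu) = \sigma\nu$ gives $\epsilon_i \sim N(0,\sigma^2)$ with $c_1 = \sigma$ and $c_2 = 0$; a non-Gaussian one is $u_i(\nu) = \nu + \sin\nu$, for which $|u_i'| = |1 + \cos\nu| \le 2$ and $|u_i''| = |\sin\nu| \le 1$, so $c_1 = 2$ and $c_2 = 1$.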

Assumptions: Design Matrix

Assumption A3: for the design matrix $X$,
- $\max_{i,j} |X_{ij}| = O(\mathrm{PolyLog}(n))$;
- $\lambda_{\max}\!\left(\frac{X^TX}{n}\right) = O(\mathrm{PolyLog}(n))$;
- $\left\|\frac{1}{n}\sum_{i=1}^n x_i\right\| = O(\mathrm{PolyLog}(n))$, where $x_i$ is the $i$-th row.

Assumptions: Linear Concentration Property

Assumption A4: let $x_i$ be the $i$-th row of $X$ and $X_j$ the $j$-th column of $X$. Let $\{\alpha_{k,i} \in \mathbb{R}^p : k = 1,\dots,N_n^{(1)};\ i = 1,\dots,n\}$ and $\{\gamma_{r,j} \in \mathbb{R}^n : r = 1,\dots,N_n^{(2)};\ j = 1,\dots,p\}$ be two sequences of unit vectors (with explicit forms, omitted here for concision) with $\max\{N_n^{(1)}, N_n^{(2)}\} = O(n^2)$.
- $\alpha_{k,i}$ relies only on $\epsilon$ and on $x_{i'}$ for $i' \ne i$; $\gamma_{r,j}$ relies only on $\epsilon$ and on $X_{j'}$ for $j' \ne j$;
- $E_\epsilon \max_{k,i} |\alpha_{k,i}^T x_i| = O(\mathrm{PolyLog}(n))$;
- $E_\epsilon \max_{r,j} |\gamma_{r,j}^T X_j| = O(\mathrm{PolyLog}(n))$.

Illustration of Assumption A4

Consider i.i.d. standard Gaussian designs. For given $k$ and $i$, $\alpha_{k,i} \perp x_i$ and $X_{ij} \overset{i.i.d.}{\sim} N(0,1)$ with $X \perp \epsilon$, so $\alpha_{k,i}^T x_i \sim N(0,1)$. Then $E_{\epsilon,X}\max_{k,i} |\alpha_{k,i}^T x_i|$ is the expected maximum of $nN_n^{(1)}$ standard Gaussian random variables, and hence
$$E_{\epsilon,X}\max_{k,i} |\alpha_{k,i}^T x_i| \lesssim \sqrt{\log\!\big(nN_n^{(1)}\big)} = O(\mathrm{PolyLog}(n)).$$
By Markov's inequality,
$$E_\epsilon\max_{k,i} |\alpha_{k,i}^T x_i| = O_p\!\left(E_{\epsilon,X}\max_{k,i} |\alpha_{k,i}^T x_i|\right) = O_p(\mathrm{PolyLog}(n)).$$
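The Gaussian maximal bound used above, $E\max_{k \le N}|Z_k| \le \sqrt{2\log(2N)}$ for (possibly dependent) standard Gaussians $Z_k$, is easy to check numerically; the small simulation below is my own illustration, not part of the slides.

```python
import numpy as np

rng = np.random.default_rng(3)
for N in [10, 100, 1000, 10000]:
    Z = rng.standard_normal((500, N))            # 500 Monte Carlo replications
    emax = np.abs(Z).max(axis=1).mean()          # estimate of E max_k |Z_k|
    bound = np.sqrt(2 * np.log(2 * N))           # maximal inequality for 2N Gaussians
    print(N, round(emax, 2), "<=", round(bound, 2))
```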

Table of Contents 1 Background 2 Main Results 3 OLS: A Motivating Example

Lindeberg-Feller Condition

Assume that $p/n \to \kappa \in (0,1)$, $p < n$ and $\beta_0 = 0$, and denote
$$\hat\beta^{OLS}_p = \arg\min_{\beta\in\mathbb{R}^p} \frac{1}{n}\sum_{i=1}^n (y_i - x_i^T\beta)^2 = (X^TX)^{-1}X^T\epsilon;$$
then each coordinate of $\hat\beta^{OLS}_p$ is a linear contrast of $\epsilon$.

Proposition 1 (Lindeberg-Feller Condition). Suppose $\epsilon_n = (\epsilon_1,\dots,\epsilon_n)^T$ has i.i.d. zero-mean components with variance $\sigma^2$. If $\|c_n\|_\infty/\|c_n\|_2 \to 0$, where $c_n = (c_{n,1},\dots,c_{n,n})^T$, then
$$\frac{c_n^T\epsilon_n}{\|c_n\|_2} \overset{d}{\to} N(0,\sigma^2).$$
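Why the ratio $\|c_n\|_\infty/\|c_n\|_2$ controls the Lindeberg condition (a standard one-line argument, added here for completeness): normalizing $\|c_n\|_2 = 1$, for any $\delta > 0$,
$$\sum_{i} E\!\left[c_{n,i}^2\epsilon_i^2\,\mathbf{1}\{|c_{n,i}\epsilon_i| > \delta\}\right] \le \sum_{i} c_{n,i}^2\, E\!\left[\epsilon_1^2\,\mathbf{1}\{|\epsilon_1| > \delta/\|c_n\|_\infty\}\right] = E\!\left[\epsilon_1^2\,\mathbf{1}\{|\epsilon_1| > \delta/\|c_n\|_\infty\}\right] \to 0$$
by dominated convergence, since $\|c_n\|_\infty \to 0$ and $E\epsilon_1^2 = \sigma^2 < \infty$.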

Lindeberg-Feller Condition

Note that $\hat\beta^{OLS}_{p,j_p} = e_{j_p}^T(X^TX)^{-1}X^T\epsilon \triangleq c_{p,j_p}^T\epsilon$. For a given matrix $X \in \mathbb{R}^{n\times p}$, define
$$H(X) \triangleq \max_{j=1,\dots,p} \frac{\left\|e_j^T(X^TX)^{-1}X^T\right\|_\infty}{\left\|e_j^T(X^TX)^{-1}X^T\right\|_2};$$
then for any $j_p \in \{1,\dots,p\}$,
$$\frac{\|c_{p,j_p}\|_\infty}{\|c_{p,j_p}\|_2} \le H(X_p),$$
and this leads to Theorem 2: $\hat\beta^{OLS}_p$ is coordinatewise asymptotically normal if $H(X_p) \to 0$.

Random Designs

We prove that $H(X_p) \to 0$ for a broad class of random designs.

Theorem 3. Let $X \in \mathbb{R}^{n\times p}$ be a random matrix with independent zero-mean entries such that $\sup_{i,j} E|X_{ij}|^{8+\delta} \le M$ for some constants $M$ and $\delta > 0$. Assume that $X$ has full column rank almost surely and that $\mathrm{Var}(X_{ij}) > \tau^2$ for some $\tau > 0$. Then
$$H(X) = O_p\!\left(n^{-1/4}\right),$$
provided $\limsup p/n < 1$.
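$H(X)$ is directly computable for any given design. The sketch below (my own, using an i.i.d. Gaussian design as a stand-in for the conditions of Theorem 3) shows it shrinking as $n$ grows with $p/n$ held fixed.

```python
import numpy as np

rng = np.random.default_rng(4)

def H(X):
    # rows of (X^T X)^{-1} X^T are the contrast vectors c_{p,j}^T
    C = np.linalg.solve(X.T @ X, X.T)
    return (np.abs(C).max(axis=1) / np.linalg.norm(C, axis=1)).max()

for n in [200, 800, 3200]:
    p = n // 2                               # p/n = 0.5, inside the regime kappa in (0, 1)
    X = rng.standard_normal((n, p))
    print(n, p, round(H(X), 3))              # decreases with n, consistent with Theorem 3
```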

Future Works

- Extend to heavy-tailed errors, e.g. $\epsilon_i \sim$ Cauchy;
- Explore more general random designs that satisfy A3 and A4;
- Calculate the bias $E_\epsilon\hat\beta_j$ and the variance $\mathrm{Var}_\epsilon(\hat\beta_j)$;
- Prove the result for the unregularized M estimator, i.e. $\tau = 0$;
- Extend to low-dimensional linear contrasts of $\hat\beta$, i.e. $\alpha^T\hat\beta$ with $\|\alpha\|_0 = o(n)$.

Thank You!