Inference for High Dimensional Robust Regression

Size: px
Start display at page:

Download "Inference for High Dimensional Robust Regression"

Transcription

1 Department of Statistics UC Berkeley Stanford-Berkeley Joint Colloquium, 2015

2 Table of Contents 1 Background 2 Main Results 3 OLS: A Motivating Example

3 Table of Contents 1 Background 2 Main Results 3 OLS: A Motivating Example

4 Setup Consider a linear regression model: Y i = X T i β 0 + ɛ i, i = 1, 2,..., n. Here Y i R, X i R p, β 0 R p and ɛ i R. OLS Estimator (p < n): M Estimator (p < n): ˆβ OLS 1 = arg min β R p n n (y i xi T β) 2 ; i=1 1 ˆβ = arg min β R p n n ρ(y i xi T i=1 β).

5 Background The limiting behavior for ˆβ when p is fixed ( ) L( ˆβ) N β 0, (X T X ) 1 E(ψ2 (ɛ)) [Eψ (ɛ)] 2 ;

6 Background The limiting behavior for ˆβ when p is fixed ( ) L( ˆβ) N β 0, (X T X ) 1 E(ψ2 (ɛ)) [Eψ (ɛ)] 2 ; Huber (1973) raised the question of understanding the behavior of ˆβ when p ;

7 Background The limiting behavior for ˆβ when p is fixed ( ) L( ˆβ) N β 0, (X T X ) 1 E(ψ2 (ɛ)) [Eψ (ɛ)] 2 ; Huber (1973) raised the question of understanding the behavior of ˆβ when p ; Huber (1973) proved that ˆβ OLS is jointly asymptotically normal iff κ = max(x (X T X ) 1 X T ) i,i 0 i which requires p n 0.

8 Background Portnoy (1984, 1985, 1986, 1987) proved the joint asymptotic normality of ˆβ in the case (p log n) 3 2 n 0;

9 Background Portnoy (1984, 1985, 1986, 1987) proved the joint asymptotic normality of ˆβ in the case (p log n) 3 2 n 0; Mammen (1989) provided an expansion for ˆβ (which leads to joint asymptotic normality) by assuming κn 1 3 (log n)

10 Background Portnoy (1984, 1985, 1986, 1987) proved the joint asymptotic normality of ˆβ in the case (p log n) 3 2 n 0; Mammen (1989) provided an expansion for ˆβ (which leads to joint asymptotic normality) by assuming κn 1 3 (log n) All works are based on fixed designs but requires p n 0.

11 Background El Karoui et al. (2011, 2013), Bean et al. (2013) established the joint asymptotic normality of ˆβ in the regime p n κ (0, 1), by assuming a random design X, which has i.i.d. Gaussian entries;

12 Background El Karoui et al. (2011, 2013), Bean et al. (2013) established the joint asymptotic normality of ˆβ in the regime p n κ (0, 1), by assuming a random design X, which has i.i.d. Gaussian entries; Zhang and Zhang (2014), Van de Geer et al. (2014) proved the partial asymptotic normality of LASSO estimator by assuming a fixed design and imposing a sparsity condition on β 0 : β 0 0 log p n 0.

13 Main Research Question and Our Contributions Suppose p n κ (0, 1) and the design matrix X is fixed, can we make inference on the coordinate (or lower dimensional linear contrast) of ˆβ?

14 Main Research Question and Our Contributions Suppose p n κ (0, 1) and the design matrix X is fixed, can we make inference on the coordinate (or lower dimensional linear contrast) of ˆβ? YES! In this work, we prove the coordinatewise asymptotic normality of ˆβ in the regime p n κ (0, 1) for fixed designs; show that the conditions for fixed design matrix is satisfied by a broad class of random designs.

15 Table of Contents 1 Background 2 Main Results 3 OLS: A Motivating Example

16 Ridge-Regularized M Estimator Consider the ridge-regularized M estimator 1 ˆβ = arg min β R p n n ρ(y i xi T β) + τ 2 β 2. i=1 Assume that ρ C 2 is a convex function with ψ = ρ and β 0 = 0, then the first order condition implies that n x i ψ(ɛ i xi T i=1 ˆβ) = nτ ˆβ. In most cases, there is no closed form solution and ˆβ is an implicit function of ɛ.

17 Final Conclusions Theorem 1. Under Assumptions A1-A4, ˆβ is coordinatewise asymptotically normal in the sense that ( ), N(0, 1) PolyLog(n) = O. n max d TV j ˆβ j E ɛ ˆβ j Var ɛ ( ˆβ j )

18 Assumptions: Loss Function Assumption A1: Let ψ = ρ, for any x, 0 < D 0 ψ (x) D 1 ( x 1) m 1 ; ψ (x) D 2 ( x 1) m 2 ; ψ (x) D 3 ( x 1) m 3 ; max{d 1 0, D 1, D 2, D 3 } = O(PolyLog(n)); max{m 1, m 2, m 3 } = O(1).

19 Assumptions: Error Distribution Assumption A2: ɛ are transformations of i.i.d. Gaussian random variables, i.e. ɛ i = u i (ν i ), where ν i i.i.d. N(0, 1); u i c 1, u i c 2 ; max{c 1, c 2 } = O(PolyLog(n)).

20 Assumptions: Design Matrix Assumption A3: for design matrix X, max i,j X ij = O(PolyLog(n)); ( ) λ X T X max n = O(PolyLog(n)); 1 n n i=1 x i = O(PolyLog(n)), where x i is the i-th row.

21 Assumptions: Linear Concentration Property Assumption A4: Let x i be the i-th row of X and X j be the j-th column of X. {α k,i R p : k = 1,..., N n (1) ; i = 1,..., n} and {γ r,j R n : r = 1,..., N n (2) ; j = 1,..., p} are two sequences of unit vectors (with explicit forms but omitted here for concision) max{n (1) n, N (2) n } = O(n 2 ). α k,i only relies on ɛ and x i for i i; γ r,j only relies on ɛ and X j for j j; E ɛ max k,i α T k,i x i = O(PolyLog(n)); E ɛ max r,j γ T r,j X j = O(PolyLog(n));

22 Illustration of Assumptions A4 Consider i.i.d. standard gaussian designs For given k and i, α k,i x i and X ij i.i.d. N(0, 1), X ɛ. α T k,i x i N(0, 1). Then E ɛ,x max k,i αk,i T x i is the expectation of N n (1) gaussian random variables and hence E ɛ,x max αk,i T x i k,i By Markov Inequality, E ɛ max k,i αk,i T x i = O p (E ɛ,x max αk,i T x i k,i standard log nn n (1) = O(PolyLog(n)). ) = O p (PolyLog(n)).

23 Table of Contents 1 Background 2 Main Results 3 OLS: A Motivating Example

24 Lindeberg-Feller Condition Assume that p/n κ (0, 1), p < n and β 0 = 0, denote ˆβ OLS p 1 = arg min β R p n n (y i xi T β) 2 = (X T X ) 1 X T ɛ; i=1 then each coordinate is a linear constrast of ˆβ. Proposition 1 (Lindeberg-Feller Condition). Suppose ɛ n = (ɛ 1,..., ɛ n ) T has i.i.d. zero mean components with variance σ 2. If c n / c n 2 0 where c n = (c n,1,..., c n,kn ), then c T n ɛ n c n 2 d N(0, σ 2 ).

25 Lindeberg-Feller Condition Note that ˆβ OLS p,j p = e T j p (X T X ) 1 X T ɛ c T p,j p ɛ. For given matrix X R n p, define H(X ) then for any j p {1,..., p}, and this leads to Theorem 2. ˆβ OLS p is c.a.s.n. if H(X p ) 0. ej T (X T X ) 1 X T max j=1,...,p ej T (X T X ) 1 X T, 2 c T p,j p c T p,j p 2 H(X p )

26 Random Designs We prove that H(X p ) 0 for a broad class of random designs. Theorem 3. Let X R n p be a random matrix with independent zero mean entries, such that sup i,j X ij 8+δ M for some constant M and δ > 0. Assume that X has full column rank almost surely and Var(X ij ) > τ 2 for some τ > 0. Then provided lim sup p/n < 1. H(X ) = O p (n 1 4 )

27 Future Works Extend to heavy-tailed errors, e.g. ɛ i Cauchy; Explore more general random designs that satisfy A3 and A4; Calculate the bias E ɛ ˆβj and variance Var ɛ ( ˆβ j ); Prove the result for unregularized M estimator, i.e. τ = 0; Extend to low dimensional linear contrasts of ˆβ, i.e. α T ˆβ with α 0 = o(n).

28 Thank You!

Inference For High Dimensional M-estimates: Fixed Design Results

Inference For High Dimensional M-estimates: Fixed Design Results Inference For High Dimensional M-estimates: Fixed Design Results Lihua Lei, Peter Bickel and Noureddine El Karoui Department of Statistics, UC Berkeley Berkeley-Stanford Econometrics Jamboree, 2017 1/49

More information

Inference For High Dimensional M-estimates. Fixed Design Results

Inference For High Dimensional M-estimates. Fixed Design Results : Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and

More information

High-dimensional regression:

High-dimensional regression: High-dimensional regression: How to pick the objective function in high-dimension UC Berkeley March 11, 2013 Joint work with Noureddine El Karoui, Peter Bickel, Chingwhay Lim, and Bin Yu 1 / 12 Notation.

More information

Robust estimation, efficiency, and Lasso debiasing

Robust estimation, efficiency, and Lasso debiasing Robust estimation, efficiency, and Lasso debiasing Po-Ling Loh University of Wisconsin - Madison Departments of ECE & Statistics WHOA-PSI workshop Washington University in St. Louis Aug 12, 2017 Po-Ling

More information

Confidence Intervals for Low-dimensional Parameters with High-dimensional Data

Confidence Intervals for Low-dimensional Parameters with High-dimensional Data Confidence Intervals for Low-dimensional Parameters with High-dimensional Data Cun-Hui Zhang and Stephanie S. Zhang Rutgers University and Columbia University September 14, 2012 Outline Introduction Methodology

More information

The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA

The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA Presented by Dongjun Chung March 12, 2010 Introduction Definition Oracle Properties Computations Relationship: Nonnegative Garrote Extensions:

More information

Robust high-dimensional linear regression: A statistical perspective

Robust high-dimensional linear regression: A statistical perspective Robust high-dimensional linear regression: A statistical perspective Po-Ling Loh University of Wisconsin - Madison Departments of ECE & Statistics STOC workshop on robustness and nonconvexity Montreal,

More information

high-dimensional inference robust to the lack of model sparsity

high-dimensional inference robust to the lack of model sparsity high-dimensional inference robust to the lack of model sparsity Jelena Bradic (joint with a PhD student Yinchu Zhu) www.jelenabradic.net Assistant Professor Department of Mathematics University of California,

More information

STAT 200C: High-dimensional Statistics

STAT 200C: High-dimensional Statistics STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 59 Classical case: n d. Asymptotic assumption: d is fixed and n. Basic tools: LLN and CLT. High-dimensional setting: n d, e.g. n/d

More information

Economics 583: Econometric Theory I A Primer on Asymptotics

Economics 583: Econometric Theory I A Primer on Asymptotics Economics 583: Econometric Theory I A Primer on Asymptotics Eric Zivot January 14, 2013 The two main concepts in asymptotic theory that we will use are Consistency Asymptotic Normality Intuition consistency:

More information

Sparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28

Sparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28 Sparsity Models Tong Zhang Rutgers University T. Zhang (Rutgers) Sparsity Models 1 / 28 Topics Standard sparse regression model algorithms: convex relaxation and greedy algorithm sparse recovery analysis:

More information

IEOR 165 Lecture 7 1 Bias-Variance Tradeoff

IEOR 165 Lecture 7 1 Bias-Variance Tradeoff IEOR 165 Lecture 7 Bias-Variance Tradeoff 1 Bias-Variance Tradeoff Consider the case of parametric regression with β R, and suppose we would like to analyze the error of the estimate ˆβ in comparison to

More information

Likelihood Ratio Test in High-Dimensional Logistic Regression Is Asymptotically a Rescaled Chi-Square

Likelihood Ratio Test in High-Dimensional Logistic Regression Is Asymptotically a Rescaled Chi-Square Likelihood Ratio Test in High-Dimensional Logistic Regression Is Asymptotically a Rescaled Chi-Square Yuxin Chen Electrical Engineering, Princeton University Coauthors Pragya Sur Stanford Statistics Emmanuel

More information

ASYMPTOTIC PROPERTIES OF BRIDGE ESTIMATORS IN SPARSE HIGH-DIMENSIONAL REGRESSION MODELS

ASYMPTOTIC PROPERTIES OF BRIDGE ESTIMATORS IN SPARSE HIGH-DIMENSIONAL REGRESSION MODELS ASYMPTOTIC PROPERTIES OF BRIDGE ESTIMATORS IN SPARSE HIGH-DIMENSIONAL REGRESSION MODELS Jian Huang 1, Joel L. Horowitz 2, and Shuangge Ma 3 1 Department of Statistics and Actuarial Science, University

More information

High-dimensional covariance estimation based on Gaussian graphical models

High-dimensional covariance estimation based on Gaussian graphical models High-dimensional covariance estimation based on Gaussian graphical models Shuheng Zhou Department of Statistics, The University of Michigan, Ann Arbor IMA workshop on High Dimensional Phenomena Sept. 26,

More information

Asymptotic Nonequivalence of Nonparametric Experiments When the Smoothness Index is ½

Asymptotic Nonequivalence of Nonparametric Experiments When the Smoothness Index is ½ University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 1998 Asymptotic Nonequivalence of Nonparametric Experiments When the Smoothness Index is ½ Lawrence D. Brown University

More information

High-dimensional Ordinary Least-squares Projection for Screening Variables

High-dimensional Ordinary Least-squares Projection for Screening Variables 1 / 38 High-dimensional Ordinary Least-squares Projection for Screening Variables Chenlei Leng Joint with Xiangyu Wang (Duke) Conference on Nonparametric Statistics for Big Data and Celebration to Honor

More information

STAT 200C: High-dimensional Statistics

STAT 200C: High-dimensional Statistics STAT 200C: High-dimensional Statistics Arash A. Amini April 27, 2018 1 / 80 Classical case: n d. Asymptotic assumption: d is fixed and n. Basic tools: LLN and CLT. High-dimensional setting: n d, e.g. n/d

More information

Lecture 14 October 13

Lecture 14 October 13 STAT 383C: Statistical Modeling I Fall 2015 Lecture 14 October 13 Lecturer: Purnamrita Sarkar Scribe: Some one Disclaimer: These scribe notes have been slightly proofread and may have typos etc. Note:

More information

Can We Trust the Bootstrap in High-dimensions? The Case of Linear Models

Can We Trust the Bootstrap in High-dimensions? The Case of Linear Models Journal of Machine Learning Research 9 208-66 Submitted /7; Revised 2/7; Published 08/8 Can We Trust the Bootstrap in High-dimensions? The Case of Linear Models Noureddine El Karoui Criteo AI Lab 32 Rue

More information

Least squares under convex constraint

Least squares under convex constraint Stanford University Questions Let Z be an n-dimensional standard Gaussian random vector. Let µ be a point in R n and let Y = Z + µ. We are interested in estimating µ from the data vector Y, under the assumption

More information

Weak quenched limiting distributions of a one-dimensional random walk in a random environment

Weak quenched limiting distributions of a one-dimensional random walk in a random environment Weak quenched limiting distributions of a one-dimensional random walk in a random environment Jonathon Peterson Cornell University Department of Mathematics Joint work with Gennady Samorodnitsky September

More information

The Sparsity and Bias of The LASSO Selection In High-Dimensional Linear Regression

The Sparsity and Bias of The LASSO Selection In High-Dimensional Linear Regression The Sparsity and Bias of The LASSO Selection In High-Dimensional Linear Regression Cun-hui Zhang and Jian Huang Presenter: Quefeng Li Feb. 26, 2010 un-hui Zhang and Jian Huang Presenter: Quefeng The Sparsity

More information

Kriging models with Gaussian processes - covariance function estimation and impact of spatial sampling

Kriging models with Gaussian processes - covariance function estimation and impact of spatial sampling Kriging models with Gaussian processes - covariance function estimation and impact of spatial sampling François Bachoc former PhD advisor: Josselin Garnier former CEA advisor: Jean-Marc Martinez Department

More information

The Slow Convergence of OLS Estimators of α, β and Portfolio. β and Portfolio Weights under Long Memory Stochastic Volatility

The Slow Convergence of OLS Estimators of α, β and Portfolio. β and Portfolio Weights under Long Memory Stochastic Volatility The Slow Convergence of OLS Estimators of α, β and Portfolio Weights under Long Memory Stochastic Volatility New York University Stern School of Business June 21, 2018 Introduction Bivariate long memory

More information

Asymptotic Statistics-III. Changliang Zou

Asymptotic Statistics-III. Changliang Zou Asymptotic Statistics-III Changliang Zou The multivariate central limit theorem Theorem (Multivariate CLT for iid case) Let X i be iid random p-vectors with mean µ and and covariance matrix Σ. Then n (

More information

Nonconcave Penalized Likelihood with A Diverging Number of Parameters

Nonconcave Penalized Likelihood with A Diverging Number of Parameters Nonconcave Penalized Likelihood with A Diverging Number of Parameters Jianqing Fan and Heng Peng Presenter: Jiale Xu March 12, 2010 Jianqing Fan and Heng Peng Presenter: JialeNonconcave Xu () Penalized

More information

Data Mining Stat 588

Data Mining Stat 588 Data Mining Stat 588 Lecture 02: Linear Methods for Regression Department of Statistics & Biostatistics Rutgers University September 13 2011 Regression Problem Quantitative generic output variable Y. Generic

More information

Factor-Adjusted Robust Multiple Test. Jianqing Fan (Princeton University)

Factor-Adjusted Robust Multiple Test. Jianqing Fan (Princeton University) Factor-Adjusted Robust Multiple Test Jianqing Fan Princeton University with Koushiki Bose, Qiang Sun, Wenxin Zhou August 11, 2017 Outline 1 Introduction 2 A principle of robustification 3 Adaptive Huber

More information

High-dimensional graphical model selection: Practical and information-theoretic limits

High-dimensional graphical model selection: Practical and information-theoretic limits 1 High-dimensional graphical model selection: Practical and information-theoretic limits Martin Wainwright Departments of Statistics, and EECS UC Berkeley, California, USA Based on joint work with: John

More information

Can we trust the bootstrap in high-dimension?

Can we trust the bootstrap in high-dimension? Can we trust the bootstrap in high-dimension? Noureddine El Karoui and Elizabeth Purdom Department of Statistics, University of California, Berkeley First submitted: February 4, 205 This version: October

More information

Risk and Noise Estimation in High Dimensional Statistics via State Evolution

Risk and Noise Estimation in High Dimensional Statistics via State Evolution Risk and Noise Estimation in High Dimensional Statistics via State Evolution Mohsen Bayati Stanford University Joint work with Jose Bento, Murat Erdogdu, Marc Lelarge, and Andrea Montanari Statistical

More information

STAT 992 Paper Review: Sure Independence Screening in Generalized Linear Models with NP-Dimensionality J.Fan and R.Song

STAT 992 Paper Review: Sure Independence Screening in Generalized Linear Models with NP-Dimensionality J.Fan and R.Song STAT 992 Paper Review: Sure Independence Screening in Generalized Linear Models with NP-Dimensionality J.Fan and R.Song Presenter: Jiwei Zhao Department of Statistics University of Wisconsin Madison April

More information

Permutation-invariant regularization of large covariance matrices. Liza Levina

Permutation-invariant regularization of large covariance matrices. Liza Levina Liza Levina Permutation-invariant covariance regularization 1/42 Permutation-invariant regularization of large covariance matrices Liza Levina Department of Statistics University of Michigan Joint work

More information

Finite Time Analysis of Vector Autoregressive Models under Linear Restrictions

Finite Time Analysis of Vector Autoregressive Models under Linear Restrictions Finite Time Analysis of Vector Autoregressive Models under Linear Restrictions Yao Zheng collaboration with Guang Cheng Department of Statistics, Purdue University Talk at Math, IUPUI October 23, 2018

More information

Sparse regression. Optimization-Based Data Analysis. Carlos Fernandez-Granda

Sparse regression. Optimization-Based Data Analysis.   Carlos Fernandez-Granda Sparse regression Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 3/28/2016 Regression Least-squares regression Example: Global warming Logistic

More information

19.1 Problem setup: Sparse linear regression

19.1 Problem setup: Sparse linear regression ECE598: Information-theoretic methods in high-dimensional statistics Spring 2016 Lecture 19: Minimax rates for sparse linear regression Lecturer: Yihong Wu Scribe: Subhadeep Paul, April 13/14, 2016 In

More information

IEOR 265 Lecture 3 Sparse Linear Regression

IEOR 265 Lecture 3 Sparse Linear Regression IOR 65 Lecture 3 Sparse Linear Regression 1 M Bound Recall from last lecture that the reason we are interested in complexity measures of sets is because of the following result, which is known as the M

More information

ASYMPTOTIC PROPERTIES OF BRIDGE ESTIMATORS IN SPARSE HIGH-DIMENSIONAL REGRESSION MODELS

ASYMPTOTIC PROPERTIES OF BRIDGE ESTIMATORS IN SPARSE HIGH-DIMENSIONAL REGRESSION MODELS The Annals of Statistics 2008, Vol. 36, No. 2, 587 613 DOI: 10.1214/009053607000000875 Institute of Mathematical Statistics, 2008 ASYMPTOTIC PROPERTIES OF BRIDGE ESTIMATORS IN SPARSE HIGH-DIMENSIONAL REGRESSION

More information

Bayesian variable selection via. Penalized credible regions. Brian Reich, NCSU. Joint work with. Howard Bondell and Ander Wilson

Bayesian variable selection via. Penalized credible regions. Brian Reich, NCSU. Joint work with. Howard Bondell and Ander Wilson Bayesian variable selection via penalized credible regions Brian Reich, NC State Joint work with Howard Bondell and Ander Wilson Brian Reich, NCSU Penalized credible regions 1 Motivation big p, small n

More information

The adaptive and the thresholded Lasso for potentially misspecified models (and a lower bound for the Lasso)

The adaptive and the thresholded Lasso for potentially misspecified models (and a lower bound for the Lasso) Electronic Journal of Statistics Vol. 0 (2010) ISSN: 1935-7524 The adaptive the thresholded Lasso for potentially misspecified models ( a lower bound for the Lasso) Sara van de Geer Peter Bühlmann Seminar

More information

Exponential tail inequalities for eigenvalues of random matrices

Exponential tail inequalities for eigenvalues of random matrices Exponential tail inequalities for eigenvalues of random matrices M. Ledoux Institut de Mathématiques de Toulouse, France exponential tail inequalities classical theme in probability and statistics quantify

More information

Reconstruction from Anisotropic Random Measurements

Reconstruction from Anisotropic Random Measurements Reconstruction from Anisotropic Random Measurements Mark Rudelson and Shuheng Zhou The University of Michigan, Ann Arbor Coding, Complexity, and Sparsity Workshop, 013 Ann Arbor, Michigan August 7, 013

More information

l 1 -Regularized Linear Regression: Persistence and Oracle Inequalities

l 1 -Regularized Linear Regression: Persistence and Oracle Inequalities l -Regularized Linear Regression: Persistence and Oracle Inequalities Peter Bartlett EECS and Statistics UC Berkeley slides at http://www.stat.berkeley.edu/ bartlett Joint work with Shahar Mendelson and

More information

Asymptotic efficiency of simple decisions for the compound decision problem

Asymptotic efficiency of simple decisions for the compound decision problem Asymptotic efficiency of simple decisions for the compound decision problem Eitan Greenshtein and Ya acov Ritov Department of Statistical Sciences Duke University Durham, NC 27708-0251, USA e-mail: eitan.greenshtein@gmail.com

More information

(Part 1) High-dimensional statistics May / 41

(Part 1) High-dimensional statistics May / 41 Theory for the Lasso Recall the linear model Y i = p j=1 β j X (j) i + ɛ i, i = 1,..., n, or, in matrix notation, Y = Xβ + ɛ, To simplify, we assume that the design X is fixed, and that ɛ is N (0, σ 2

More information

A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables

A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider the problem of

More information

The largest eigenvalues of the sample covariance matrix. in the heavy-tail case

The largest eigenvalues of the sample covariance matrix. in the heavy-tail case The largest eigenvalues of the sample covariance matrix 1 in the heavy-tail case Thomas Mikosch University of Copenhagen Joint work with Richard A. Davis (Columbia NY), Johannes Heiny (Aarhus University)

More information

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring

More information

1 Regression with High Dimensional Data

1 Regression with High Dimensional Data 6.883 Learning with Combinatorial Structure ote for Lecture 11 Instructor: Prof. Stefanie Jegelka Scribe: Xuhong Zhang 1 Regression with High Dimensional Data Consider the following regression problem:

More information

Adaptive Piecewise Polynomial Estimation via Trend Filtering

Adaptive Piecewise Polynomial Estimation via Trend Filtering Adaptive Piecewise Polynomial Estimation via Trend Filtering Liubo Li, ShanShan Tu The Ohio State University li.2201@osu.edu, tu.162@osu.edu October 1, 2015 Liubo Li, ShanShan Tu (OSU) Trend Filtering

More information

Assessing the dependence of high-dimensional time series via sample autocovariances and correlations

Assessing the dependence of high-dimensional time series via sample autocovariances and correlations Assessing the dependence of high-dimensional time series via sample autocovariances and correlations Johannes Heiny University of Aarhus Joint work with Thomas Mikosch (Copenhagen), Richard Davis (Columbia),

More information

GARCH Models Estimation and Inference

GARCH Models Estimation and Inference GARCH Models Estimation and Inference Eduardo Rossi University of Pavia December 013 Rossi GARCH Financial Econometrics - 013 1 / 1 Likelihood function The procedure most often used in estimating θ 0 in

More information

An iterative hard thresholding estimator for low rank matrix recovery

An iterative hard thresholding estimator for low rank matrix recovery An iterative hard thresholding estimator for low rank matrix recovery Alexandra Carpentier - based on a joint work with Arlene K.Y. Kim Statistical Laboratory, Department of Pure Mathematics and Mathematical

More information

De-biasing the Lasso: Optimal Sample Size for Gaussian Designs

De-biasing the Lasso: Optimal Sample Size for Gaussian Designs De-biasing the Lasso: Optimal Sample Size for Gaussian Designs Adel Javanmard USC Marshall School of Business Data Science and Operations department Based on joint work with Andrea Montanari Oct 2015 Adel

More information

M-Estimation under High-Dimensional Asymptotics

M-Estimation under High-Dimensional Asymptotics M-Estimation under High-Dimensional Asymptotics 2014-05-01 Classical M-estimation Big Data M-estimation An out-of-the-park grand-slam home run Annals of Mathematical Statistics 1964 Richard Olshen Classical

More information

Can we trust the bootstrap in high-dimension?

Can we trust the bootstrap in high-dimension? Can we trust the bootstrap in high-dimension? Noureddine El Karoui and Elizabeth Purdom Department of Statistics, University of California, Berkeley February 4, 05 Abstract We consider the performance

More information

Master s Written Examination

Master s Written Examination Master s Written Examination Option: Statistics and Probability Spring 016 Full points may be obtained for correct answers to eight questions. Each numbered question which may have several parts is worth

More information

Ph.D. Qualifying Exam Monday Tuesday, January 4 5, 2016

Ph.D. Qualifying Exam Monday Tuesday, January 4 5, 2016 Ph.D. Qualifying Exam Monday Tuesday, January 4 5, 2016 Put your solution to each problem on a separate sheet of paper. Problem 1. (5106) Find the maximum likelihood estimate of θ where θ is a parameter

More information

Review of Econometrics

Review of Econometrics Review of Econometrics Zheng Tian June 5th, 2017 1 The Essence of the OLS Estimation Multiple regression model involves the models as follows Y i = β 0 + β 1 X 1i + β 2 X 2i + + β k X ki + u i, i = 1,...,

More information

Graduate Econometrics I: Asymptotic Theory

Graduate Econometrics I: Asymptotic Theory Graduate Econometrics I: Asymptotic Theory Yves Dominicy Université libre de Bruxelles Solvay Brussels School of Economics and Management ECARES Yves Dominicy Graduate Econometrics I: Asymptotic Theory

More information

The deterministic Lasso

The deterministic Lasso The deterministic Lasso Sara van de Geer Seminar für Statistik, ETH Zürich Abstract We study high-dimensional generalized linear models and empirical risk minimization using the Lasso An oracle inequality

More information

M-estimation in high-dimensional linear model

M-estimation in high-dimensional linear model Wang and Zhu Journal of Inequalities and Applications 208 208:225 https://doi.org/0.86/s3660-08-89-3 R E S E A R C H Open Access M-estimation in high-dimensional linear model Kai Wang and Yanling Zhu *

More information

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Put your solution to each problem on a separate sheet of paper. Problem 1. (5106) Let X 1, X 2,, X n be a sequence of i.i.d. observations from a

More information

Inverse Statistical Learning

Inverse Statistical Learning Inverse Statistical Learning Minimax theory, adaptation and algorithm avec (par ordre d apparition) C. Marteau, M. Chichignoud, C. Brunet and S. Souchet Dijon, le 15 janvier 2014 Inverse Statistical Learning

More information

Machine Learning and Computational Statistics, Spring 2017 Homework 2: Lasso Regression

Machine Learning and Computational Statistics, Spring 2017 Homework 2: Lasso Regression Machine Learning and Computational Statistics, Spring 2017 Homework 2: Lasso Regression Due: Monday, February 13, 2017, at 10pm (Submit via Gradescope) Instructions: Your answers to the questions below,

More information

A Primer on Asymptotics

A Primer on Asymptotics A Primer on Asymptotics Eric Zivot Department of Economics University of Washington September 30, 2003 Revised: October 7, 2009 Introduction The two main concepts in asymptotic theory covered in these

More information

A Resampling Method on Pivotal Estimating Functions

A Resampling Method on Pivotal Estimating Functions A Resampling Method on Pivotal Estimating Functions Kun Nie Biostat 277,Winter 2004 March 17, 2004 Outline Introduction A General Resampling Method Examples - Quantile Regression -Rank Regression -Simulation

More information

Generalization theory

Generalization theory Generalization theory Daniel Hsu Columbia TRIPODS Bootcamp 1 Motivation 2 Support vector machines X = R d, Y = { 1, +1}. Return solution ŵ R d to following optimization problem: λ min w R d 2 w 2 2 + 1

More information

STAT 200C: High-dimensional Statistics

STAT 200C: High-dimensional Statistics STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 57 Table of Contents 1 Sparse linear models Basis Pursuit and restricted null space property Sufficient conditions for RNS 2 / 57

More information

Phenomena in high dimensions in geometric analysis, random matrices, and computational geometry Roscoff, France, June 25-29, 2012

Phenomena in high dimensions in geometric analysis, random matrices, and computational geometry Roscoff, France, June 25-29, 2012 Phenomena in high dimensions in geometric analysis, random matrices, and computational geometry Roscoff, France, June 25-29, 202 BOUNDS AND ASYMPTOTICS FOR FISHER INFORMATION IN THE CENTRAL LIMIT THEOREM

More information

High-dimensional graphical model selection: Practical and information-theoretic limits

High-dimensional graphical model selection: Practical and information-theoretic limits 1 High-dimensional graphical model selection: Practical and information-theoretic limits Martin Wainwright Departments of Statistics, and EECS UC Berkeley, California, USA Based on joint work with: John

More information

Concentration Inequalities for Random Matrices

Concentration Inequalities for Random Matrices Concentration Inequalities for Random Matrices M. Ledoux Institut de Mathématiques de Toulouse, France exponential tail inequalities classical theme in probability and statistics quantify the asymptotic

More information

Linear Methods for Regression. Lijun Zhang

Linear Methods for Regression. Lijun Zhang Linear Methods for Regression Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Linear Regression Models and Least Squares Subset Selection Shrinkage Methods Methods Using Derived

More information

A Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models

A Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models A Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models Jingyi Jessica Li Department of Statistics University of California, Los

More information

sparse and low-rank tensor recovery Cubic-Sketching

sparse and low-rank tensor recovery Cubic-Sketching Sparse and Low-Ran Tensor Recovery via Cubic-Setching Guang Cheng Department of Statistics Purdue University www.science.purdue.edu/bigdata CCAM@Purdue Math Oct. 27, 2017 Joint wor with Botao Hao and Anru

More information

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Econometrics Week 8 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 25 Recommended Reading For the today Instrumental Variables Estimation and Two Stage

More information

Robust Testing and Variable Selection for High-Dimensional Time Series

Robust Testing and Variable Selection for High-Dimensional Time Series Robust Testing and Variable Selection for High-Dimensional Time Series Ruey S. Tsay Booth School of Business, University of Chicago May, 2017 Ruey S. Tsay HTS 1 / 36 Outline 1 Focus on high-dimensional

More information

Stein s method and zero bias transformation: Application to CDO pricing

Stein s method and zero bias transformation: Application to CDO pricing Stein s method and zero bias transformation: Application to CDO pricing ESILV and Ecole Polytechnique Joint work with N. El Karoui Introduction CDO a portfolio credit derivative containing 100 underlying

More information

Different types of regression: Linear, Lasso, Ridge, Elastic net, Ro

Different types of regression: Linear, Lasso, Ridge, Elastic net, Ro Different types of regression: Linear, Lasso, Ridge, Elastic net, Robust and K-neighbors Faculty of Mathematics, Informatics and Mechanics, University of Warsaw 04.10.2009 Introduction We are given a linear

More information

Lecture 13: Subsampling vs Bootstrap. Dimitris N. Politis, Joseph P. Romano, Michael Wolf

Lecture 13: Subsampling vs Bootstrap. Dimitris N. Politis, Joseph P. Romano, Michael Wolf Lecture 13: 2011 Bootstrap ) R n x n, θ P)) = τ n ˆθn θ P) Example: ˆθn = X n, τ n = n, θ = EX = µ P) ˆθ = min X n, τ n = n, θ P) = sup{x : F x) 0} ) Define: J n P), the distribution of τ n ˆθ n θ P) under

More information

Consistent high-dimensional Bayesian variable selection via penalized credible regions

Consistent high-dimensional Bayesian variable selection via penalized credible regions Consistent high-dimensional Bayesian variable selection via penalized credible regions Howard Bondell bondell@stat.ncsu.edu Joint work with Brian Reich Howard Bondell p. 1 Outline High-Dimensional Variable

More information

Combinatorial Types of Tropical Eigenvector

Combinatorial Types of Tropical Eigenvector Combinatorial Types of Tropical Eigenvector arxiv:1105.55504 Ngoc Mai Tran Department of Statistics, UC Berkeley Joint work with Bernd Sturmfels 2 / 13 Tropical eigenvalues and eigenvectors Max-plus: (R,,

More information

The Lindeberg central limit theorem

The Lindeberg central limit theorem The Lindeberg central limit theorem Jordan Bell jordan.bell@gmail.com Department of Mathematics, University of Toronto May 29, 205 Convergence in distribution We denote by P d the collection of Borel probability

More information

High Dimensional Inverse Covariate Matrix Estimation via Linear Programming

High Dimensional Inverse Covariate Matrix Estimation via Linear Programming High Dimensional Inverse Covariate Matrix Estimation via Linear Programming Ming Yuan October 24, 2011 Gaussian Graphical Model X = (X 1,..., X p ) indep. N(µ, Σ) Inverse covariance matrix Σ 1 = Ω = (ω

More information

Large Sample Theory. Consider a sequence of random variables Z 1, Z 2,..., Z n. Convergence in probability: Z n

Large Sample Theory. Consider a sequence of random variables Z 1, Z 2,..., Z n. Convergence in probability: Z n Large Sample Theory In statistics, we are interested in the properties of particular random variables (or estimators ), which are functions of our data. In ymptotic analysis, we focus on describing the

More information

Chp 4. Expectation and Variance

Chp 4. Expectation and Variance Chp 4. Expectation and Variance 1 Expectation In this chapter, we will introduce two objectives to directly reflect the properties of a random variable or vector, which are the Expectation and Variance.

More information

High-dimensional Joint Sparsity Random Effects Model for Multi-task Learning

High-dimensional Joint Sparsity Random Effects Model for Multi-task Learning High-dimensional Joint Sparsity Random Effects Model for Multi-task Learning Krishnakumar Balasubramanian Georgia Institute of Technology krishnakumar3@gatech.edu Kai Yu Baidu Inc. yukai@baidu.com Tong

More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information

Sub-Gaussian Estimators of the Mean of a Random Matrix with Entries Possessing Only Two Moments

Sub-Gaussian Estimators of the Mean of a Random Matrix with Entries Possessing Only Two Moments Sub-Gaussian Estimators of the Mean of a Random Matrix with Entries Possessing Only Two Moments Stas Minsker University of Southern California July 21, 2016 ICERM Workshop Simple question: how to estimate

More information

High dimensional Ising model selection

High dimensional Ising model selection High dimensional Ising model selection Pradeep Ravikumar UT Austin (based on work with John Lafferty, Martin Wainwright) Sparse Ising model US Senate 109th Congress Banerjee et al, 2008 Estimate a sparse

More information

Quasi-regression for heritability

Quasi-regression for heritability Quasi-regression for heritability Art B. Owen Stanford University March 01 Abstract We show in an idealized model that the narrow sense (linear heritability from d autosomal SNPs can be estimated without

More information

arxiv: v3 [stat.me] 10 Mar 2016

arxiv: v3 [stat.me] 10 Mar 2016 Submitted to the Annals of Statistics ESTIMATING THE EFFECT OF JOINT INTERVENTIONS FROM OBSERVATIONAL DATA IN SPARSE HIGH-DIMENSIONAL SETTINGS arxiv:1407.2451v3 [stat.me] 10 Mar 2016 By Preetam Nandy,,

More information

Efficient and Robust Scale Estimation

Efficient and Robust Scale Estimation Efficient and Robust Scale Estimation Garth Tarr, Samuel Müller and Neville Weber School of Mathematics and Statistics THE UNIVERSITY OF SYDNEY Outline Introduction and motivation The robust scale estimator

More information

Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed

Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed 18.466 Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed 1. MLEs in exponential families Let f(x,θ) for x X and θ Θ be a likelihood function, that is, for present purposes,

More information

Negative Association, Ordering and Convergence of Resampling Methods

Negative Association, Ordering and Convergence of Resampling Methods Negative Association, Ordering and Convergence of Resampling Methods Nicolas Chopin ENSAE, Paristech (Joint work with Mathieu Gerber and Nick Whiteley, University of Bristol) Resampling schemes: Informal

More information

CHAPTER 3: LARGE SAMPLE THEORY

CHAPTER 3: LARGE SAMPLE THEORY CHAPTER 3 LARGE SAMPLE THEORY 1 CHAPTER 3: LARGE SAMPLE THEORY CHAPTER 3 LARGE SAMPLE THEORY 2 Introduction CHAPTER 3 LARGE SAMPLE THEORY 3 Why large sample theory studying small sample property is usually

More information

IV Quantile Regression for Group-level Treatments, with an Application to the Distributional Effects of Trade

IV Quantile Regression for Group-level Treatments, with an Application to the Distributional Effects of Trade IV Quantile Regression for Group-level Treatments, with an Application to the Distributional Effects of Trade Denis Chetverikov Brad Larsen Christopher Palmer UCLA, Stanford and NBER, UC Berkeley September

More information

Can we do statistical inference in a non-asymptotic way? 1

Can we do statistical inference in a non-asymptotic way? 1 Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.

More information

Robust Principal Component Analysis

Robust Principal Component Analysis ELE 538B: Mathematics of High-Dimensional Data Robust Principal Component Analysis Yuxin Chen Princeton University, Fall 2018 Disentangling sparse and low-rank matrices Suppose we are given a matrix M

More information