Confidence Intervals for Low-dimensional Parameters with High-dimensional Data
1 Confidence Intervals for Low-dimensional Parameters with High-dimensional Data. Cun-Hui Zhang and Stephanie S. Zhang, Rutgers University and Columbia University, September 14, 2012
2 Outline: Introduction; Methodology: bias-corrected linear estimators, low-dimensional projections (LDP), implementation with the scaled Lasso, confidence intervals; Theoretical results; Simulation study
3 Statistical Inference of Low-dimensional Parameters with High-dimensional Data. Limitation of statistical inference based on selection consistency theory: it requires all nonzero coefficients to be greater than a noise level. Here no condition is imposed on the signal strength, or uniform signal strength, of individual variables. This paper includes: a proposal of the low-dimensional projection estimator (LDPE); construction of confidence intervals for individual coefficients; asymptotic normality of the proposed estimators; consistent estimators of their covariance matrices; a numerical study.
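4 Setting. Linear model $y = X\beta + \varepsilon$, $\varepsilon \sim N(0, \sigma^2 I_n)$, where $y \in \mathbb{R}^n$, $X = (x_1, \dots, x_p) \in \mathbb{R}^{n \times p}$ and $\beta = (\beta_1, \dots, \beta_p)^T$. Notation: for vectors $v = (v_1, \dots, v_m)$ of any dimension, $\mathrm{supp}(v) = \{j : v_j \neq 0\}$, $\|v\|_0 = |\mathrm{supp}(v)|$ and $\|v\|_q = (\sum_j |v_j|^q)^{1/q}$. For $A \subseteq \{1, \dots, p\}$, $v_A = (v_j, j \in A)^T$ and $X_A = (x_k, k \in A)$. Define $-j = \{1, \dots, p\} \setminus \{j\}$.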
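5 Least squares estimator (LSE). The LSE of an estimable regression coefficient is $\hat\beta_j^{(lse)} = y^T x_j^\perp / \|x_j^\perp\|_2^2$, where $x_j^\perp$ is the projection of $x_j$ onto the orthogonal complement of the column space of $X_{-j} = (x_k, k \neq j)$. For estimable $\beta_j$ and $\beta_k$, $\mathrm{Cov}(\hat\beta_j^{(lse)}, \hat\beta_k^{(lse)}) = \sigma^2 (x_j^\perp)^T x_k^\perp / (\|x_j^\perp\|_2^2 \|x_k^\perp\|_2^2)$. When $p > n$ and $\mathrm{rank}(X_{-j}) = n$, we have $x_j^\perp = 0$, so $\hat\beta_j^{(lse)}$ is not defined.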
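The LSE above can be computed by residualizing $x_j$ on $X_{-j}$. The minimal NumPy sketch below is our own illustration (the name `lse_coef` and the relative tolerance used to declare $x_j^\perp = 0$ are assumptions, not from the slides). It checks that the residualized LSE agrees with the full least squares fit when $p < n$, and that $x_j^\perp$ vanishes when $p > n$.

```python
import numpy as np

def lse_coef(X, y, j, rel_tol=1e-10):
    """LSE of beta_j via x_j^perp, the residual of x_j regressed on X_{-j}.

    Returns nan when x_j^perp is numerically zero (beta_j not estimable).
    """
    x_j = X[:, j]
    X_mj = np.delete(X, j, axis=1)
    coef = np.linalg.lstsq(X_mj, x_j, rcond=None)[0]
    x_perp = x_j - X_mj @ coef
    denom = x_perp @ x_perp
    if denom <= rel_tol * (x_j @ x_j):
        return np.nan
    return (y @ x_perp) / denom

rng = np.random.default_rng(0)

# p < n: the residualized LSE matches the full least squares fit
n, p = 50, 5
X = rng.standard_normal((n, p))
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 3.0]) + 0.1 * rng.standard_normal(n)
b0 = lse_coef(X, y, 0)
b_full = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.isclose(b0, b_full[0]))   # True

# p > n: X_{-j} has rank n, so x_j^perp = 0 and the LSE is undefined
X_wide = rng.standard_normal((20, 30))
print(np.isnan(lse_coef(X_wide, rng.standard_normal(20), 0)))   # True
```

The first check is the Frisch-Waugh-Lovell identity; the second shows why a different score vector is needed when $p > n$.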
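6 Bias-corrected linear estimators. Goal: preserve the properties of the LSE, in particular an explicit formula for the covariance structure. For any score vector $z_j$ not orthogonal to $x_j$, the corresponding linear estimator satisfies $\hat\beta_j^{(lin)} = \frac{z_j^T y}{z_j^T x_j} = \beta_j + \frac{z_j^T \varepsilon}{z_j^T x_j} + \sum_{k \neq j} \frac{z_j^T x_k \beta_k}{z_j^T x_j}$. Bias problem: $z_j^T x_k \neq 0$ for some $k \neq j$. Bias correction with an initial estimator $\hat\beta^{(init)}$: $\hat\beta_j = \hat\beta_j^{(lin)} - \sum_{k \neq j} \frac{z_j^T x_k \hat\beta_k^{(init)}}{z_j^T x_j} = \hat\beta_j^{(init)} + \frac{z_j^T (y - X\hat\beta^{(init)})}{z_j^T x_j}$, with a score vector $z_j$ depending on $X$ only.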
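7 Bias-corrected linear estimators. The estimation error is the sum of a noise term and an approximation error: $\hat\beta_j - \beta_j = \frac{z_j^T \varepsilon}{z_j^T x_j} + \sum_{k \neq j} \frac{z_j^T x_k (\beta_k - \hat\beta_k^{(init)})}{z_j^T x_j}$. It remains to specify the score vector $z_j$ and the initial estimator $\hat\beta^{(init)}$.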
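The two algebraic facts on these slides (the two forms of the bias-corrected estimator agree, and its error splits into a noise term plus an approximation error) can be checked numerically. In this sketch the score vector `z` and the initial estimator `beta_init` are arbitrary illustrative stand-ins, not the LDP choice; the identities hold for any $z_j$ with $z_j^T x_j \neq 0$.

```python
import numpy as np

def bias_corrected(X, y, z, j, beta_init):
    """beta_init_j + z^T (y - X beta_init) / z^T x_j."""
    return beta_init[j] + z @ (y - X @ beta_init) / (z @ X[:, j])

rng = np.random.default_rng(1)
n, p, j = 40, 60, 3
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = [2.0, -1.0, 0.5, 1.5, -0.8]
eps = 0.5 * rng.standard_normal(n)
y = X @ beta + eps

z = rng.standard_normal(n)                        # arbitrary score vector (illustrative)
beta_init = beta + 0.1 * rng.standard_normal(p)   # stand-in initial estimator (illustrative)

est = bias_corrected(X, y, z, j, beta_init)
zx = z @ X[:, j]
others = [k for k in range(p) if k != j]

# First form: linear estimator minus the estimated bias
lin = z @ y / zx
form1 = lin - sum((z @ X[:, k]) * beta_init[k] for k in others) / zx

# Error decomposition: noise term plus approximation error
noise = z @ eps / zx
approx = sum((z @ X[:, k]) * (beta[k] - beta_init[k]) for k in others) / zx

print(np.isclose(est, form1), np.isclose(est - beta[j], noise + approx))
```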
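8 Low-dimensional projections (LDP). Under suitable conditions on $\{X, \beta\}$ and the initial estimator, a proper choice of $z_j$ controls both the noise and the approximation error. If $x_j^\perp$ is not too small, take $z_j = x_j^\perp$; if $x_j^\perp$ is too small, take $z_j$ to be a relaxed projection of $x_j$. The bias-corrected estimator with this $z_j$ and $\hat\beta^{(init)}$ is the low-dimensional projection estimator (LDPE) $\hat\beta_j$. Relaxed projection: $x_j^\perp$ is the residual of the least squares fit of $x_j$ on $X_{-j}$, while $z_j$ is the residual of the Lasso, a relaxation of the least squares method: $z_j = x_j - X_{-j}\hat\gamma_j$, $\hat\gamma_j = \arg\min_b \{\|x_j - X_{-j} b\|_2^2/(2n) + \lambda_j \|b\|_1\}$. To use a common penalty level, the columns are normalized to $\|x_k\|_2^2 = n$.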
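9 Sketch of the LDPE theory. By the Karush-Kuhn-Tucker conditions, $|x_k^T z_j|/n \le \lambda_j$ for all $k \neq j$, so $|\sum_{k \neq j} z_j^T x_k (\beta_k - \hat\beta_k^{(init)})| \le n\lambda_j \|\hat\beta^{(init)} - \beta\|_1$. Let $\eta_j = n\lambda_j / \|z_j\|_2$ and $\tau_j = \|z_j\|_2 / |z_j^T x_j|$. Since $z_j^T \varepsilon \sim N(0, \sigma^2 \|z_j\|_2^2)$, if $\eta_j \|\hat\beta^{(init)} - \beta\|_1 / \sigma = o(1)$, then $\tau_j^{-1}(\hat\beta_j - \beta_j) \approx N(0, \sigma^2)$.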
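A minimal sketch of the relaxed projection and the two factors above, assuming plain cyclic coordinate descent as the Lasso solver (our choice; the slides do not prescribe a solver) and $\lambda_j = \lambda_{univ}$ purely for illustration. The helper names `lasso_cd` and `ldp_score` are hypothetical. The code checks the KKT bound $|x_k^T z_j|/n \le \lambda_j$ and the inequality $\tau_j \le 1/\|z_j\|_2$.

```python
import numpy as np

def lasso_cd(A, v, lam, n_sweeps=1000, tol=1e-10):
    """Coordinate descent for min_b ||v - A b||_2^2 / (2n) + lam * ||b||_1."""
    n, m = A.shape
    b = np.zeros(m)
    r = v.astype(float)                    # residual v - A b (b = 0 at start)
    col_sq = (A ** 2).sum(axis=0) / n      # ||a_k||_2^2 / n
    for _ in range(n_sweeps):
        max_step = 0.0
        for k in range(m):
            rho = A[:, k] @ r / n + col_sq[k] * b[k]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[k]   # soft threshold
            step = new - b[k]
            if step != 0.0:
                r -= step * A[:, k]
                b[k] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return b

def ldp_score(X, j, lam):
    """Relaxed projection z_j = x_j - X_{-j} gamma_j, with eta_j and tau_j."""
    X_mj = np.delete(X, j, axis=1)
    gamma = lasso_cd(X_mj, X[:, j], lam)
    z = X[:, j] - X_mj @ gamma
    z_norm = np.linalg.norm(z)
    eta = np.max(np.abs(X_mj.T @ z)) / z_norm     # max_{k != j} |x_k^T z_j| / ||z_j||_2
    tau = z_norm / abs(X[:, j] @ z)               # ||z_j||_2 / |z_j^T x_j|
    return z, eta, tau

rng = np.random.default_rng(2)
n, p = 100, 200
X = rng.standard_normal((n, p))
X *= np.sqrt(n) / np.linalg.norm(X, axis=0)       # normalize to ||x_k||_2^2 = n
lam = np.sqrt(2 * np.log(p) / n)                  # lambda_univ, for illustration only
z, eta, tau = ldp_score(X, 0, lam)
kkt = np.max(np.abs(np.delete(X, 0, axis=1).T @ z)) / n
print(kkt <= lam + 1e-6, eta, n * lam / np.linalg.norm(z), tau * np.linalg.norm(z))
```

Here `eta` is the attained value $\max_{k\neq j}|x_k^T z_j|/\|z_j\|_2$, which by the KKT conditions is at most $n\lambda_j/\|z_j\|_2$.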
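10 Scaled Lasso. Joint convex minimization: $\{\hat\beta^{(init)}, \hat\sigma\} = \arg\min_{b, \sigma} \{\|y - Xb\|_2^2/(2\sigma n) + \sigma/2 + \lambda_0 \|b\|_1\}$, with penalty level $\lambda_0 = A\sqrt{(2/n)\log(p/\epsilon)}$ for a certain $A > 1$ and $0 < \epsilon \le 1$ (Sun and Zhang, 2011). The scaled Lasso estimate of the noise level in the regression of $x_j$ on $X_{-j}$ at penalty level $\lambda$ is $\hat\sigma_j = \arg\min_\sigma \min_b \{\|x_j - X_{-j} b\|_2^2/(2\sigma n) + \sigma/2 + \lambda \|b\|_1\}$.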
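One simple way to compute the scaled Lasso is to alternate the two partial minimizations: for fixed $\sigma$, the objective times $\sigma$ is a Lasso with penalty $\sigma\lambda_0$; for fixed $b$, the minimizing $\sigma$ is $\|y - Xb\|_2/\sqrt{n}$. The sketch below assumes a coordinate-descent Lasso solver with warm starts, $\epsilon = 1$ and $A = 1.1$; the constant, helper names and data are illustrative assumptions, not values from the slides.

```python
import numpy as np

def lasso_cd(A, v, lam, b0=None, n_sweeps=1000, tol=1e-10):
    """Coordinate descent for min_b ||v - A b||_2^2 / (2n) + lam * ||b||_1 (warm start b0)."""
    n, m = A.shape
    b = np.zeros(m) if b0 is None else b0.copy()
    r = v - A @ b
    col_sq = (A ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        max_step = 0.0
        for k in range(m):
            rho = A[:, k] @ r / n + col_sq[k] * b[k]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[k]
            step = new - b[k]
            if step != 0.0:
                r -= step * A[:, k]
                b[k] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return b

def scaled_lasso(X, y, lam0, n_iter=100, tol=1e-8):
    """Alternate b <- Lasso(lam0 * sigma) and sigma <- ||y - X b||_2 / sqrt(n)."""
    n, p = X.shape
    b = np.zeros(p)
    sigma = np.linalg.norm(y) / np.sqrt(n)                  # sigma-step at b = 0
    for _ in range(n_iter):
        b = lasso_cd(X, y, lam0 * sigma, b0=b)              # minimize over b, sigma fixed
        sigma_new = np.linalg.norm(y - X @ b) / np.sqrt(n)  # minimize over sigma, b fixed
        converged = abs(sigma_new - sigma) < tol
        sigma = sigma_new
        if converged:
            break
    return b, sigma

rng = np.random.default_rng(3)
n, p, s = 100, 150, 5
X = rng.standard_normal((n, p))
X *= np.sqrt(n) / np.linalg.norm(X, axis=0)
beta = np.zeros(p)
beta[:s] = 1.0
y = X @ beta + rng.standard_normal(n)          # true sigma = 1
A_const = 1.1                                  # hypothetical choice of A > 1
lam0 = A_const * np.sqrt(2 * np.log(p) / n)    # epsilon = 1
b_hat, sigma_hat = scaled_lasso(X, y, lam0)
print(sigma_hat, np.flatnonzero(b_hat))
```

Because the joint objective is convex, this coordinate-wise alternation is a natural choice; any Lasso solver can replace `lasso_cd`.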
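11 Lasso path. Coefficient estimator, residual and factors along the Lasso path: $\hat\gamma_j(\lambda) = \arg\min_b \{\|x_j - X_{-j} b\|_2^2/(2n) + \lambda \|b\|_1\}$, $z_j(\lambda) = x_j - X_{-j}\hat\gamma_j(\lambda)$, $\eta_j(\lambda) = \max_{k \neq j} |x_k^T z_j(\lambda)| / \|z_j(\lambda)\|_2 = n\lambda / \|z_j(\lambda)\|_2$ (when $\hat\gamma_j(\lambda) \neq 0$), and $\tau_j(\lambda) = \|z_j(\lambda)\|_2 / |x_j^T z_j(\lambda)|$.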
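12 Penalty level choice. Aim: high accuracy of the coverage probability of the confidence interval for $\beta_j$, obtained by reducing the ratio of the bias to the standard error (SE) of the estimator at the cost of a controlled increase in its SE. Choose $z_j = z_j(\lambda_j)$ with $\lambda_j = \arg\min_\lambda \{\eta_j(\lambda) : \tau_j(\lambda) \le (1 + \kappa_0)\tau_j(\hat\sigma_j \lambda)\}$, where $\kappa_0 > 0$ is a predetermined constant.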
13 Properties of the procedure. Proposition: both $\|z_j(\lambda)\|_2$ and $\eta_j(\lambda)$ are nondecreasing in $\lambda$, and $\tau_j(\lambda) \le 1/\|z_j(\lambda)\|_2$ along the Lasso path. Moreover, $\eta_j(\lambda_j) \le \eta_j(\hat\sigma_j \lambda) = n^{1/2}\lambda$ and $\tau_j(\lambda_j) \le (1 + \kappa_0)/(\hat\sigma_j n^{1/2})$. Remark: since $\eta_j(\lambda)$ is nondecreasing in $\lambda$, the procedure can be carried out by minimizing $\lambda$ subject to the constraint on $\tau_j(\lambda)$. With $\kappa_0 = 1/2$ and $\lambda = \lambda_{univ} = \sqrt{(2/n)\log p}$, $z_j$ is determined by the design matrix $X$ alone.
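The penalty-level rule can be sketched on a finite grid rather than the exact Lasso path: compute $\hat\sigma_j$ by the scaled Lasso of $x_j$ on $X_{-j}$ at $\lambda_{univ}$, take $\tau_j(\hat\sigma_j\lambda_{univ})$ as the reference, then scan $\lambda$ downward (only smaller $\lambda$ can lower the nondecreasing $\eta_j$) and keep the smallest $\eta_j(\lambda)$ satisfying the $\tau_j$ constraint. The grid size, its lower end ($0.3\lambda_{univ}$), the AR(1) test design and all helper names are assumptions made for this illustration.

```python
import numpy as np

def lasso_cd(A, v, lam, b0=None, n_sweeps=500, tol=1e-10):
    """Coordinate descent for min_b ||v - A b||_2^2 / (2n) + lam * ||b||_1 (warm start b0)."""
    n, m = A.shape
    b = np.zeros(m) if b0 is None else b0.copy()
    r = v - A @ b
    col_sq = (A ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        max_step = 0.0
        for k in range(m):
            rho = A[:, k] @ r / n + col_sq[k] * b[k]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[k]
            step = new - b[k]
            if step != 0.0:
                r -= step * A[:, k]
                b[k] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return b

def path_factors(X, j, lam, g0=None):
    """gamma_j(lam), eta_j(lam), tau_j(lam) for the Lasso of x_j on X_{-j}."""
    X_mj = np.delete(X, j, axis=1)
    g = lasso_cd(X_mj, X[:, j], lam, b0=g0)
    z = X[:, j] - X_mj @ g
    zn = np.linalg.norm(z)
    return g, np.max(np.abs(X_mj.T @ z)) / zn, zn / abs(X[:, j] @ z)

def scaled_sigma(X, j, lam, n_iter=100, tol=1e-8):
    """Scaled-Lasso noise level sigma_hat_j of the regression of x_j on X_{-j}."""
    n = X.shape[0]
    X_mj = np.delete(X, j, axis=1)
    g = np.zeros(X_mj.shape[1])
    sigma = np.linalg.norm(X[:, j]) / np.sqrt(n)
    for _ in range(n_iter):
        g = lasso_cd(X_mj, X[:, j], lam * sigma, b0=g)
        new = np.linalg.norm(X[:, j] - X_mj @ g) / np.sqrt(n)
        done = abs(new - sigma) < tol
        sigma = new
        if done:
            break
    return sigma

def choose_lambda(X, j, kappa0=0.5, n_grid=15, lower=0.3):
    """Grid version of argmin{eta_j(lam): tau_j(lam) <= (1+kappa0) tau_j(sigma_j lam_univ)}."""
    n, p = X.shape
    lam_univ = np.sqrt(2 * np.log(p) / n)
    lam_ref = scaled_sigma(X, j, lam_univ) * lam_univ
    _, eta_ref, tau_ref = path_factors(X, j, lam_ref)
    best, g = (lam_ref, eta_ref, tau_ref), None
    for lam in np.geomspace(lam_ref, lower * lam_univ, n_grid):   # scan downward, warm starts
        g, eta, tau = path_factors(X, j, lam, g0=g)
        if tau <= (1 + kappa0) * tau_ref and eta < best[1]:
            best = (lam, eta, tau)
    return best, (lam_ref, eta_ref, tau_ref)

rng = np.random.default_rng(5)
n, p, rho = 100, 150, 0.5
E = rng.standard_normal((n, p))
Xt = np.empty((n, p))
Xt[:, 0] = E[:, 0]
for k in range(1, p):                       # AR(1) columns: corr(x_j, x_k) = rho^|j-k|
    Xt[:, k] = rho * Xt[:, k - 1] + np.sqrt(1 - rho**2) * E[:, k]
X = Xt * np.sqrt(n) / np.linalg.norm(Xt, axis=0)
(best_lam, best_eta, best_tau), (lam_ref, eta_ref, tau_ref) = choose_lambda(X, j=10)
print(lam_ref, best_lam, eta_ref, best_eta, best_tau / tau_ref)
```

The reference factor `eta_ref` should be close to (and by the KKT conditions at most) $\sqrt{n}\lambda_{univ} = \sqrt{2\log p}$, in line with the Proposition.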
14 Restricted LDPE. A weighted low-dimensional projection applies different relaxation levels to different variables $x_k$ according to their correlation with $x_j$. The larger $|x_j^T x_k|/n$, the greater the contribution of the initial estimation error to the bias, so a smaller $|z_j^T x_k|/n$ is desired for large $|x_j^T x_k|/n$. This is achieved with a weighted relaxation: $z_j = x_j - X_{-j}\hat\gamma_j$, $\hat\gamma_j = \arg\min_b \{\|x_j - X_{-j} b\|_2^2/(2n) + \lambda_j \sum_{k \neq j} w_k |b_k|\}$, with $w_k = 0$ for large $|x_j^T x_k|/n$ and $w_k = 1$ for other $k$.
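A minimal sketch of the weighted relaxation, assuming that "large $|x_j^T x_k|/n$" means the $m$ largest values (with $m = 20$, the value quoted for the simulation) and using a weighted coordinate-descent Lasso of our own; both choices and all names are assumptions. Since those $m$ columns are unpenalized, the KKT conditions force $z_j^T x_k = 0$ for them, which the code checks against the plain Lasso residual.

```python
import numpy as np

def weighted_lasso_cd(A, v, lam, w, n_sweeps=2000, tol=1e-10):
    """Coordinate descent for min_b ||v - A b||_2^2 / (2n) + lam * sum_k w_k |b_k|."""
    n, m = A.shape
    b = np.zeros(m)
    r = v.astype(float)
    col_sq = (A ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        max_step = 0.0
        for k in range(m):
            rho = A[:, k] @ r / n + col_sq[k] * b[k]
            new = np.sign(rho) * max(abs(rho) - lam * w[k], 0.0) / col_sq[k]
            step = new - b[k]
            if step != 0.0:
                r -= step * A[:, k]
                b[k] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return b

def restricted_score(X, j, lam, m=20):
    """z_j from the weighted Lasso with w_k = 0 for the m columns most correlated with x_j."""
    if m < 1:
        raise ValueError("m must be at least 1")
    n = X.shape[0]
    X_mj = np.delete(X, j, axis=1)
    corr = np.abs(X_mj.T @ X[:, j]) / n
    w = np.ones(X_mj.shape[1])
    unpen = np.argsort(corr)[-m:]          # indices within X_{-j} left unpenalized
    w[unpen] = 0.0
    gamma = weighted_lasso_cd(X_mj, X[:, j], lam, w)
    return X[:, j] - X_mj @ gamma, unpen

rng = np.random.default_rng(6)
n, p, rho, j = 100, 150, 0.5, 50
E = rng.standard_normal((n, p))
Xt = np.empty((n, p))
Xt[:, 0] = E[:, 0]
for k in range(1, p):                      # AR(1) columns, corr rho^|j-k|
    Xt[:, k] = rho * Xt[:, k - 1] + np.sqrt(1 - rho**2) * E[:, k]
X = Xt * np.sqrt(n) / np.linalg.norm(Xt, axis=0)
lam = np.sqrt(2 * np.log(p) / n)

z_r, unpen = restricted_score(X, j, lam)
X_mj = np.delete(X, j, axis=1)
z_plain = X[:, j] - X_mj @ weighted_lasso_cd(X_mj, X[:, j], lam, np.ones(p - 1))
print(np.max(np.abs(X_mj[:, unpen].T @ z_plain)) / n,   # plain LDP
      np.max(np.abs(X_mj[:, unpen].T @ z_r)) / n)       # restricted LDP, about 0
```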
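15 Confidence intervals. Conditions on $X$ and $\beta$ ensure that the approximation error is of smaller order than the standard deviation of the noise component. LDPE vector: $\hat\beta = (\hat\beta_1, \dots, \hat\beta_p)^T$. Covariance of the noise component: $V = (V_{jk})_{p \times p}$, $V_{jk} = \frac{z_j^T z_k}{(z_j^T x_j)(z_k^T x_k)} = \sigma^{-2}\mathrm{Cov}\left(\frac{z_j^T \varepsilon}{z_j^T x_j}, \frac{z_k^T \varepsilon}{z_k^T x_k}\right)$. Approximate $(1 - \alpha)100\%$ confidence interval for $a^T\beta$ with sparse vectors $a$: $|a^T\hat\beta - a^T\beta| \le \hat\sigma\Phi^{-1}(1 - \alpha/2)(a^T V a)^{1/2}$, where $\hat\sigma$ is the scaled Lasso noise level estimator and $\Phi$ is the standard normal distribution function.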
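The interval only needs the score vectors, the initial estimator and $\hat\sigma$. As a sanity check, the sketch below uses the low-dimensional case $p < n$ with $z_j = x_j^\perp$, where the LDPE must reduce to ordinary least squares (for any initial estimator) and $V$ to $(X^TX)^{-1}$. Treating $\hat\sigma = 1$ as known and the helper names are assumptions for this illustration; in the slides $\hat\sigma$ comes from the scaled Lasso.

```python
import numpy as np
from statistics import NormalDist

def ldpe_vector(X, y, Z, beta_init):
    """LDPE for all j; column j of Z is the score vector z_j."""
    zx = np.einsum("ij,ij->j", Z, X)                 # z_j^T x_j for each j
    return beta_init + Z.T @ (y - X @ beta_init) / zx, zx

def noise_cov(Z, zx):
    """V_jk = z_j^T z_k / ((z_j^T x_j)(z_k^T x_k))."""
    return (Z.T @ Z) / np.outer(zx, zx)

def confidence_interval(a, beta_hat, V, sigma_hat, alpha=0.05):
    q = NormalDist().inv_cdf(1 - alpha / 2)          # Phi^{-1}(1 - alpha/2)
    half = sigma_hat * q * np.sqrt(a @ V @ a)
    return a @ beta_hat - half, a @ beta_hat + half

rng = np.random.default_rng(4)
n, p = 60, 4
X = rng.standard_normal((n, p))
y = X @ np.array([1.0, 0.0, -1.0, 2.0]) + rng.standard_normal(n)

Z = np.empty_like(X)
for j in range(p):                                   # z_j = x_j^perp
    X_mj = np.delete(X, j, axis=1)
    Z[:, j] = X[:, j] - X_mj @ np.linalg.lstsq(X_mj, X[:, j], rcond=None)[0]

beta_hat, zx = ldpe_vector(X, y, Z, beta_init=np.zeros(p))
V = noise_cov(Z, zx)
ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(beta_hat, ols), np.allclose(V, np.linalg.inv(X.T @ X)))
a = np.array([1.0, -1.0, 0.0, 0.0])                  # contrast beta_1 - beta_2
print(confidence_interval(a, beta_hat, V, sigma_hat=1.0))
```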
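16 Conditions. Sparsity: $\sum_{j=1}^p \min\{|\beta_j|/(\sigma\lambda_{univ}), 1\} \le s$, with $\lambda_{univ} = \sqrt{(2/n)\log p}$. Examples: $\|\beta\|_0 \le s$, or $\ell_q$ sparsity with $\|\beta\|_q^q/(\sigma\lambda_{univ})^q \le s$, $0 < q \le 1$. Initial estimator: $P\{\|\hat\beta^{(init)} - \beta\|_1 \ge C_1 s\sigma\sqrt{(2/n)\log(p/\epsilon)}\} \le \epsilon$ for a fixed $C_1$ and all $\alpha_0/p^2 \le \epsilon \le 1$, $\alpha_0 \in (0, 1)$; existing oracle inequalities can be used to verify this. Noise level estimator: $P\{|\hat\sigma/\sigma - 1| \ge C_2 s(2/n)\log(p/\epsilon)\} \le \epsilon$ for a fixed $C_2$ and all $\alpha_0/p^2 \le \epsilon \le 1$.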
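The capped-$\ell_1$ sparsity measure can be evaluated directly. This sketch computes it for the coefficient vectors of the simulation study ($n = 200$, $p = 3000$, $\sigma = 1$) for $\alpha = 1$ and $\alpha = 2$, and checks that it reduces to $\|\beta\|_0$ when every nonzero $|\beta_j|$ exceeds $\sigma\lambda_{univ}$. The function name is a hypothetical helper.

```python
import numpy as np

def capped_l1_sparsity(beta, sigma, lam_univ):
    """sum_j min{|beta_j| / (sigma * lam_univ), 1}."""
    return float(np.minimum(np.abs(beta) / (sigma * lam_univ), 1.0).sum())

n, p, sigma = 200, 3000, 1.0
lam_univ = np.sqrt(2 * np.log(p) / n)
j = np.arange(1, p + 1)
big = np.isin(j, [1500, 1800, 2100, 2400, 2700, 3000])
s_vals = {}
for alpha in (1, 2):
    beta = np.where(big, 3 * lam_univ, 3 * lam_univ / j.astype(float) ** alpha)
    s_vals[alpha] = capped_l1_sparsity(beta, sigma, lam_univ)
print(s_vals)

# Exactly sparse beta whose nonzeros all exceed sigma * lam_univ: s equals ||beta||_0
print(capped_l1_sparsity(np.array([1.0, 0.0, -2.0, 0.0]), 1.0, 0.5))   # 2.0
```

The slower decay $\alpha = 1$ yields a noticeably larger $s$ than $\alpha = 2$, even though no coefficient is exactly zero in either case.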
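17 Main theorem. Let $\hat\beta$ be the LDPE with $z_j$ depending on $X$ only and an initial estimator $\hat\beta^{(init)}$. Let $\max(\epsilon_n, \epsilon_n') \to 0+$, $\tau_j = \|z_j\|_2/|x_j^T z_j|$ and $\eta_j = \max_{k \neq j} |x_k^T z_j|/\|z_j\|_2$. Suppose the initial estimator condition holds with $\eta_j C_1 s\sigma\sqrt{(2/n)\log(p/\epsilon)} \le \epsilon_n$ for all $j$. Then $\max_j P\{|\tau_j^{-1}(\hat\beta_j - \beta_j) - z_j^T\varepsilon/\|z_j\|_2| > \epsilon_n\} \le \epsilon$. If, in addition, the condition on $\hat\sigma$ holds with $C_2 s(2/n)\log(p/\epsilon) \le \epsilon_n'$, then for all $j \le p$ and $t \in \mathbb{R}$, $\Phi(t - \epsilon_n - \epsilon_n'|t|) - 2\epsilon \le P\{\tau_j^{-1}(\hat\beta_j - \beta_j) \le \hat\sigma t\} \le \Phi(t + \epsilon_n + \epsilon_n'|t|) + 2\epsilon$. Consequently, for the covariance matrix $V$ and all fixed $m$, $\lim_{n \to \infty} \inf_{\|a\|_0 \le m} P\{|a^T\hat\beta - a^T\beta| \le \hat\sigma\Phi^{-1}(1 - \alpha/2)(a^T V a)^{1/2}\} = 1 - \alpha$.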
19 Asymptotic normality of the LDPE and LDPE-based confidence intervals. Remark: in the implementation with $\lambda = \lambda_{univ}$, $z_j$ is the residual of the Lasso in the regression of $x_j$ against $X_{-j}$, with a penalty level $\lambda_j$ that guarantees $\eta_j \le \sqrt{2\log p}$. Thus the dimension constraint for asymptotic normality and proper coverage probability in the Theorem is $s(\log p)/\sqrt{n} \to 0$. Here $\tau_j \le 1/\|z_j\|_2 \approx n^{-1/2}$ is expected. Joint asymptotic normality of the LDPE holds for finitely many $\hat\beta_j$, since $(z_j^T\varepsilon/\|z_j\|_2, j \le p)$ has a multivariate normal distribution with identical marginal distributions $N(0, \sigma^2)$. The confidence intervals have approximately the nominal coverage probability under the additional condition on the noise level estimator $\hat\sigma$.
20 Oracle inequalities and random designs. Conditions for the scaled Lasso estimators: Theorem 2 and Sun and Zhang (2011). Random designs: $X$ is the column-normalized version of a Gaussian random matrix $\tilde X$. The linear regression of $x_j$ against $X_{-j}$ motivates the use of the Lasso path above (Theorem 3 and Remark 3).
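21 Simulation study. $n = 200$, $p = 3000$, $\lambda = \lambda_{univ} = \sqrt{(2/n)\log p}$. Coefficients: $\beta_j = 3\lambda_{univ}$ for $j = 1500, 1800, 2100, 2400, 2700, 3000$, and $\beta_j = 3\lambda_{univ}/j^\alpha$ for all other $j$. Simulated data $(\tilde X, X, y)$: $\tilde X = (\tilde x_{ij})_{n \times p}$ has iid $N(0, \Sigma)$ rows with $\Sigma = (\rho^{|j-k|})_{p \times p}$; $X = (x_1, \dots, x_p)$ with $x_j = \tilde x_j\sqrt{n}/\|\tilde x_j\|_2$; $y = X\beta + \varepsilon$, $\varepsilon \sim N(0, I_n)$. Settings: (A) $(\alpha, \rho) = (2, 1/5)$; (B) $(\alpha, \rho) = (1, 1/5)$; (C) $(\alpha, \rho) = (2, 4/5)$; (D) $(\alpha, \rho) = (1, 4/5)$. Estimators: LDPE, restricted LDPE, Lasso, scaled Lasso. $\hat\beta^{(init)}$ and $\hat\sigma$: scaled Lasso; $z_j$: Lasso path procedure with $\kappa_0 = 1/2$ and $\lambda = \lambda_{univ}$, and $m = 20$ for the restricted LDPE.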
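The data-generating design above can be sketched as follows. Instead of factorizing the $3000 \times 3000$ matrix $\Sigma$, each row is generated as a stationary AR(1) sequence across columns, which has exactly the covariance $\rho^{|j-k|}$; this shortcut, the seed and the function name are our own assumptions, not details from the slides.

```python
import numpy as np

def simulate(alpha, rho, n=200, p=3000, seed=0):
    """One data set (X, y, beta) following the simulation design (sigma = 1)."""
    rng = np.random.default_rng(seed)
    lam_univ = np.sqrt(2 * np.log(p) / n)
    j = np.arange(1, p + 1)
    big = np.isin(j, [1500, 1800, 2100, 2400, 2700, 3000])
    beta = np.where(big, 3 * lam_univ, 3 * lam_univ / j.astype(float) ** alpha)
    # Rows of X_tilde are iid N(0, Sigma) with Sigma_jk = rho^|j-k|, generated as a
    # stationary AR(1) sequence across columns instead of a p x p Cholesky factor.
    E = rng.standard_normal((n, p))
    X_tilde = np.empty((n, p))
    X_tilde[:, 0] = E[:, 0]
    for k in range(1, p):
        X_tilde[:, k] = rho * X_tilde[:, k - 1] + np.sqrt(1 - rho**2) * E[:, k]
    X = X_tilde * np.sqrt(n) / np.linalg.norm(X_tilde, axis=0)   # ||x_j||_2^2 = n
    y = X @ beta + rng.standard_normal(n)
    return X, y, beta, lam_univ

SETTINGS = {"A": (2, 1 / 5), "B": (1, 1 / 5), "C": (2, 4 / 5), "D": (1, 4 / 5)}
X, y, beta, lam_univ = simulate(*SETTINGS["D"])
print(X.shape, y.shape, lam_univ)
```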
22 Asymptotic normality of the LDPE for the largest $\beta_j$. Figure: histograms of the errors in estimating the maximal $\beta_j$ in settings (A), (B), (C) and (D)
23 Behavior of the LDPE for the largest $\beta_j$. Table: summary statistics of the errors in estimating the maximal $\beta_j$ in each setting
24 Coverage probability of LDPE-based confidence intervals. Table: mean coverage of the LDPE-based intervals for all $\beta_j$. Figure: top, relative coverage frequencies versus index; bottom, percentage of variables at each relative coverage frequency
25 Performance of the restricted LDPE. Figure: top, medians of $\hat\beta_j$ for the LDPE and the restricted LDPE versus index; bottom, relative coverage frequencies of the LDPE and the restricted LDPE versus index, in setting (D)
26 Point estimation performance of the LDPE and thresholded LDPE. Figure: median absolute errors of the scaled Lasso, LDPE and thresholded LDPE in settings (A) and (D)
27 $L_2$ loss of the Lasso, scaled Lasso and thresholded LDPE. Table: summary statistics of the $L_2$ loss for the Lasso, scaled Lasso and thresholded LDPE
28 Thank you!!!
More informationBayesian variable selection via. Penalized credible regions. Brian Reich, NCSU. Joint work with. Howard Bondell and Ander Wilson
Bayesian variable selection via penalized credible regions Brian Reich, NC State Joint work with Howard Bondell and Ander Wilson Brian Reich, NCSU Penalized credible regions 1 Motivation big p, small n
More informationA New Bayesian Variable Selection Method: The Bayesian Lasso with Pseudo Variables
A New Bayesian Variable Selection Method: The Bayesian Lasso with Pseudo Variables Qi Tang (Joint work with Kam-Wah Tsui and Sijian Wang) Department of Statistics University of Wisconsin-Madison Feb. 8,
More informationMa 3/103: Lecture 24 Linear Regression I: Estimation
Ma 3/103: Lecture 24 Linear Regression I: Estimation March 3, 2017 KC Border Linear Regression I March 3, 2017 1 / 32 Regression analysis Regression analysis Estimate and test E(Y X) = f (X). f is the
More informationRobust estimation, efficiency, and Lasso debiasing
Robust estimation, efficiency, and Lasso debiasing Po-Ling Loh University of Wisconsin - Madison Departments of ECE & Statistics WHOA-PSI workshop Washington University in St. Louis Aug 12, 2017 Po-Ling
More informationLearning discrete graphical models via generalized inverse covariance matrices
Learning discrete graphical models via generalized inverse covariance matrices Duzhe Wang, Yiming Lv, Yongjoon Kim, Young Lee Department of Statistics University of Wisconsin-Madison {dwang282, lv23, ykim676,
More information3. For a given dataset and linear model, what do you think is true about least squares estimates? Is Ŷ always unique? Yes. Is ˆβ always unique? No.
7. LEAST SQUARES ESTIMATION 1 EXERCISE: Least-Squares Estimation and Uniqueness of Estimates 1. For n real numbers a 1,...,a n, what value of a minimizes the sum of squared distances from a to each of
More informationCovariate-Assisted Variable Ranking
Covariate-Assisted Variable Ranking Tracy Ke Department of Statistics Harvard University WHOA-PSI@St. Louis, Sep. 8, 2018 1/18 Sparse linear regression Y = X β + z, X R n,p, z N(0, σ 2 I n ) Signals (nonzero
More informationSingle Index Quantile Regression for Heteroscedastic Data
Single Index Quantile Regression for Heteroscedastic Data E. Christou M. G. Akritas Department of Statistics The Pennsylvania State University JSM, 2015 E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015
More informationCross-Validation with Confidence
Cross-Validation with Confidence Jing Lei Department of Statistics, Carnegie Mellon University WHOA-PSI Workshop, St Louis, 2017 Quotes from Day 1 and Day 2 Good model or pure model? Occam s razor We really
More informationLinear Methods for Regression. Lijun Zhang
Linear Methods for Regression Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Linear Regression Models and Least Squares Subset Selection Shrinkage Methods Methods Using Derived
More informationBayesian Sparse Linear Regression with Unknown Symmetric Error
Bayesian Sparse Linear Regression with Unknown Symmetric Error Minwoo Chae 1 Joint work with Lizhen Lin 2 David B. Dunson 3 1 Department of Mathematics, The University of Texas at Austin 2 Department of
More informationNearest Neighbor Gaussian Processes for Large Spatial Data
Nearest Neighbor Gaussian Processes for Large Spatial Data Abhi Datta 1, Sudipto Banerjee 2 and Andrew O. Finley 3 July 31, 2017 1 Department of Biostatistics, Bloomberg School of Public Health, Johns
More information