The Annals of Statistics (2006)
1. High dimensional graphs and variable selection with the Lasso
Nicolai Meinshausen and Peter Bühlmann, The Annals of Statistics (2006)
Presented by Jee Young Moon, Feb. 19
2. Inverse covariance matrix
A zero entry in $\Sigma^{-1}$ = conditional independence ($X_a \perp X_b$ given all the remaining variables) = no edge between these two variables (nodes).
Traditionally: Dempster (1972) introduced covariance selection, i.e., discovering the conditional independence restrictions. Forward search: edges are added iteratively, with an MLE fit (Speed and Kiiveri, 1986) for $O(p^2)$ different models. But the existence of the MLE is not guaranteed in general if the number of observations is smaller than the number of nodes (Buhl, 1993).
Neighborhood selection with the Lasso (this paper): optimization of a convex function, applied consecutively to each node in the graph.
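The correspondence between zeros of $\Sigma^{-1}$ and missing edges can be checked numerically. A minimal numpy sketch, using a hypothetical 3-node chain graph (not an example from the paper): the precision matrix has $\Omega_{02} = 0$, so $X_0$ and $X_2$ are conditionally independent given $X_1$ and their partial correlation is exactly zero.

```python
import numpy as np

# Precision (inverse covariance) matrix with Omega[0, 2] = 0:
# X0 and X2 are conditionally independent given X1, so the
# Gaussian graphical model has no edge between nodes 0 and 2.
Omega = np.array([[2.0, 1.0, 0.0],
                  [1.0, 2.0, 1.0],
                  [0.0, 1.0, 2.0]])
Sigma = np.linalg.inv(Omega)

def partial_corr(Omega, a, b):
    """Partial correlation of (X_a, X_b) given the rest, read off
    the precision matrix: -Omega[a,b] / sqrt(Omega[a,a] * Omega[b,b])."""
    return -Omega[a, b] / np.sqrt(Omega[a, a] * Omega[b, b])

print(partial_corr(Omega, 0, 2))             # exactly zero: no edge 0-2
print(round(partial_corr(Omega, 0, 1), 3))   # -0.5: edge between 0 and 1
```

The matrix above is positive definite, so it is a valid Gaussian precision matrix.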
3. Neighborhood
The neighborhood $\mathrm{ne}_a$ of a node $a \in \Gamma$ is the smallest subset of $\Gamma \setminus \{a\}$ so that, given $X_{\mathrm{ne}_a}$, $X_a$ is conditionally independent of all remaining variables:
$$\mathrm{ne}_a = \{b \in \Gamma \setminus \{a\} : (a,b) \in E\}.$$
4. Notation
$p(n) = |\Gamma(n)|$: the number of nodes (the number of variables); $n$: the number of observations.
Optimal prediction of $X_a$ given all remaining variables:
$$\theta^a = \arg\min_{\theta : \theta_a = 0} E\Big(X_a - \sum_{k \in \Gamma(n)} \theta_k X_k\Big)^2.$$
Optimal prediction $\theta^{a,A}$ restricted to an active set $A \subseteq \Gamma(n) \setminus \{a\}$:
$$\theta^{a,A} = \arg\min_{\theta : \theta_k = 0,\ \forall k \notin A} E\Big(X_a - \sum_{k \in \Gamma(n)} \theta_k X_k\Big)^2.$$
Relation to conditional independence: $\theta^a_b = -\Sigma^{-1}_{ab} / \Sigma^{-1}_{aa}$, so that
$$\mathrm{ne}_a = \{b \in \Gamma(n) : \theta^a_b \neq 0\}.$$
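The identity $\theta^a_b = -\Sigma^{-1}_{ab}/\Sigma^{-1}_{aa}$ can be verified directly: the population best-linear-predictor coefficients $\Sigma_{A,A}^{-1}\Sigma_{A,a}$ (with $A = \Gamma\setminus\{a\}$) coincide with the rescaled row of the precision matrix. A small numpy check on a hypothetical 3-variable precision matrix:

```python
import numpy as np

Omega = np.array([[2.0, 1.0, 0.0],
                  [1.0, 2.0, 1.0],
                  [0.0, 1.0, 2.0]])
Sigma = np.linalg.inv(Omega)

a = 0
rest = [1, 2]
# Population coefficients of the best linear predictor of X_a from the
# remaining variables: Sigma_{rest,rest}^{-1} Sigma_{rest,a}.
theta = np.linalg.solve(Sigma[np.ix_(rest, rest)], Sigma[rest, a])

# The same coefficients read off the precision matrix: -Omega[a,b] / Omega[a,a].
theta_from_Omega = np.array([-Omega[a, b] / Omega[a, a] for b in rest])

print(np.allclose(theta, theta_from_Omega))  # True
```

Note that the coefficient on $X_2$ is zero, matching $\Omega_{02} = 0$: the neighborhood of node 0 is $\{1\}$.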
5. Neighborhood selection with the Lasso
Lasso estimate $\hat\theta^{a,\lambda}$ of $\theta^a$:
$$\hat\theta^{a,\lambda} = \arg\min_{\theta : \theta_a = 0} \big(n^{-1}\|X_a - X\theta\|^2 + \lambda\|\theta\|_1\big). \tag{3}$$
Neighborhood estimate:
$$\hat{\mathrm{ne}}^\lambda_a = \{b \in \Gamma(n) : \hat\theta^{a,\lambda}_b \neq 0\}.$$
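A minimal sketch of this per-node regression with scikit-learn's `Lasso` (assumed available), on data simulated from a hypothetical sparse chain graph; the fixed penalty `lam` is an illustrative choice, not the paper's rate. Note the objective scaling: sklearn minimizes $(2n)^{-1}\|y - Z\theta\|^2 + \alpha\|\theta\|_1$, while (3) uses $n^{-1}\|\cdot\|^2 + \lambda\|\cdot\|_1$, so `alpha = lam / 2`.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Sample from a Gaussian with a sparse precision matrix (chain graph 0-1-2).
Omega = np.array([[2.0, 1.0, 0.0],
                  [1.0, 2.0, 1.0],
                  [0.0, 1.0, 2.0]])
Sigma = np.linalg.inv(Omega)
n = 2000
X = rng.multivariate_normal(np.zeros(3), Sigma, size=n)

def neighborhood(X, a, lam):
    """Estimated Lasso neighborhood of node a: regress X_a on all other
    variables and keep the indices with nonzero coefficients."""
    others = [k for k in range(X.shape[1]) if k != a]
    fit = Lasso(alpha=lam / 2, fit_intercept=False).fit(X[:, others], X[:, a])
    return {others[j] for j, c in enumerate(fit.coef_) if c != 0.0}

print(neighborhood(X, 0, lam=0.1))  # with these settings, typically {1}
```

Repeating this for every node $a$ and combining the estimated neighborhoods (slide 20) yields the estimated graph.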
6. The (unavailable) prediction-oracle penalty
$$\lambda_{\text{oracle}} = \arg\min_{\lambda} E\Big(X_a - \sum_{k \in \Gamma(n)} \hat\theta^{a,\lambda}_k X_k\Big)^2.$$
Proposition 1. Let the number of variables grow to infinity, $p(n) \to \infty$ for $n \to \infty$, with $p(n) = o(n^\gamma)$ for some $\gamma > 0$. Assume that the covariance matrices $\Sigma(n)$ are identical to the identity matrix, except for some pair $(a,b) \in \Gamma(n) \times \Gamma(n)$ for which $\Sigma_{ab}(n) = \Sigma_{ba}(n) = s$, for some $0 < s < 1$ and all $n \in \mathbb{N}$. Then the probability of selecting the wrong neighborhood for node $a$ converges to 1 under the prediction-oracle penalty:
$$P(\hat{\mathrm{ne}}^{\lambda_{\text{oracle}}}_a \neq \mathrm{ne}_a) \to 1 \quad \text{for } n \to \infty.$$
7. Proof of Proposition 1
Taking $a = 1$, $b = 2$ and writing $K = \Sigma^{-1}$, here $\theta^a = (0, -K_{ab}/K_{aa}, 0, 0, \ldots) = (0, s, 0, 0, \ldots)$. For $\hat{\mathrm{ne}}^\lambda_a = \mathrm{ne}_a$, the oracle Lasso solution must be of the form $\hat\theta^{a,\lambda} = (0, \tau, 0, 0, \ldots)$ for some $\tau \neq 0$. It therefore suffices to show:
1. $P(\exists\, \lambda, \tau \geq s : \hat\theta^{a,\lambda} = (0, \tau, 0, 0, \ldots)) \to 0$ as $n \to \infty$; and
2. $(0, \tau, 0, 0, \ldots)$ cannot be the oracle Lasso solution as long as $\tau < s$.
Part 1. If $\hat\theta = (0, \tau, 0, \ldots)$ is a Lasso solution, then from Lemma 1 and positivity of $\tau$,
$$\langle X_1 - \tau X_2, X_2 \rangle \geq |\langle X_1 - \tau X_2, X_k \rangle| \quad \forall k \in \Gamma(n),\ k > 2.$$
Substituting $X_1 = sX_2 + W_1$ yields
$$\langle W_1, X_2 \rangle - (\tau - s)\langle X_2, X_2 \rangle \geq \langle W_1, X_k \rangle - (\tau - s)\langle X_2, X_k \rangle.$$
Let $U_k = \langle W_1, X_k \rangle$; the $U_k$, $k = 2, \ldots, p(n)$, are exchangeable. Let
$$D = \langle X_2, X_2 \rangle - \max_{k \in \Gamma(n),\, k > 2} \langle X_2, X_k \rangle.$$
It is sufficient to show
$$P\big(U_2 > \max_{k \in \Gamma(n),\, k > 2} U_k + (\tau - s)D\big) \to 0 \quad \text{for } n \to \infty.$$
8. Since $\tau - s > 0$,
$$P\big(U_2 > \max_{k > 2} U_k + (\tau - s)D\big) \leq P\big(U_2 > \max_{k > 2} U_k\big) \quad \text{when } D \geq 0,$$
while the left-hand side is trivially at most 1 when $D < 0$. Hence
$$P\big(U_2 > \max_{k > 2} U_k + (\tau - s)D\big) \leq P\big(U_2 > \max_{k > 2} U_k\big) + P(D < 0).$$
By Bernstein's inequality and $p(n) = o(n^\gamma)$, $P(D < 0) \to 0$ for $n \to \infty$. Since $U_2, \ldots, U_{p(n)}$ are exchangeable,
$$P\big(U_2 > \max_{k > 2} U_k\big) = P\big(U_3 > \max_{k \neq 3} U_k\big) = \cdots = P\big(U_{p(n)} > \max_{k < p(n)} U_k\big),$$
and these probabilities sum to 1. Therefore
$$P\big(U_2 > \max_{k > 2} U_k\big) = (p(n) - 1)^{-1} \to 0 \quad \text{for } n \to \infty.$$
9. Part 2: $(0, \tau, 0, \ldots)$ with $\tau < s$ cannot be the oracle Lasso solution.
Suppose $(0, \tau_{\max}, 0, 0, \ldots)$ is the Lasso solution $\hat\theta^{a,\lambda}$ for some $\lambda = \lambda^* > 0$ with $\tau_{\max} < s$. Since $\tau_{\max}$ is the maximal value such that $(0, \tau, 0, \ldots)$ is a Lasso solution, there exists some $k \in \Gamma(n)$, $k > 2$, such that
$$n^{-1}\langle X_1 - \tau_{\max} X_2, X_2 \rangle = n^{-1}|\langle X_1 - \tau_{\max} X_2, X_k \rangle| = \lambda^*.$$
For sufficiently small $\delta\lambda \geq 0$, a Lasso solution for the penalty $\lambda^* - \delta\lambda$ is given by $(0, \tau_{\max} + \delta\theta_2, \delta\theta_3, 0, \ldots)$, and from LARS, $|\delta\theta_2| = |\delta\theta_3| =: \delta\theta$. Comparing the squared prediction errors of these solutions,
$$L(\delta\theta) - L(0) = -2(s - \tau_{\max})\delta\theta + 2\delta\theta^2 < 0 \quad \text{for any } 0 < \delta\theta < \tfrac{1}{2}(s - \tau_{\max}),$$
so decreasing the penalty slightly below $\lambda^*$ strictly improves prediction while bringing the noise variable $X_k$ into the model; the prediction-oracle solution therefore cannot stop at $\tau < s$.
10. Lemma 1
The Lasso estimate $\hat\theta^{a,A,\lambda}$ of $\theta^{a,A}$ is given by
$$\hat\theta^{a,A,\lambda} = \arg\min_{\theta : \theta_k = 0,\ \forall k \notin A} \big(n^{-1}\|X_a - X\theta\|^2 + \lambda\|\theta\|_1\big). \tag{10}$$
Lemma 1. Given $\theta \in \mathbb{R}^{p(n)}$, let $G(\theta)$ be the $p(n)$-dimensional vector with elements
$$G_b(\theta) = -2n^{-1}\langle X_a - X\theta, X_b \rangle.$$
A vector $\hat\theta$ with $\hat\theta_k = 0$ for all $k \in \Gamma(n)\setminus A$ is a solution of the above iff, for all $b \in A$: $G_b(\hat\theta) = -\operatorname{sign}(\hat\theta_b)\lambda$ in case $\hat\theta_b \neq 0$, and $|G_b(\hat\theta)| \leq \lambda$ in case $\hat\theta_b = 0$. Moreover, if the solution is not unique and $|G_b(\hat\theta)| < \lambda$ for some solution $\hat\theta$, then $\hat\theta_b = 0$ for all solutions of the above.
11. Proof.
Let $D(\theta)$ be the subdifferential of $n^{-1}\|X_a - X\theta\|^2 + \lambda\|\theta\|_1$ with respect to $\theta$:
$$D(\theta) = \{G(\theta) + \lambda e,\ e \in S\},$$
where $S \subset \mathbb{R}^{p(n)}$ is given by
$$S = \{e \in \mathbb{R}^{p(n)} : e_b = \operatorname{sign}(\theta_b) \text{ if } \theta_b \neq 0,\ e_b \in [-1, 1] \text{ if } \theta_b = 0\}.$$
Then $\hat\theta$ is a solution of the above iff there exists $d \in D(\hat\theta)$ with $d_b = 0$ for all $b \in A$.
12. Assumptions
$X \sim N(0, \Sigma)$.
High-dimensionality. Assumption 1: there exists $\gamma > 0$ so that $p(n) = O(n^\gamma)$ for $n \to \infty$.
Non-singularity. Assumption 2: (a) for all $a \in \Gamma(n)$ and $n \in \mathbb{N}$, $\operatorname{Var}(X_a) = 1$; (b) there exists $v^2 > 0$ so that for all $n \in \mathbb{N}$ and $a \in \Gamma(n)$, $\operatorname{Var}(X_a \mid X_{\Gamma(n)\setminus\{a\}}) \geq v^2$. [This excludes singular or nearly singular covariance matrices.]
Sparsity. Assumption 3: there exists some $0 \leq \kappa < 1$ so that $\max_{a \in \Gamma(n)} |\mathrm{ne}_a| = O(n^\kappa)$ for $n \to \infty$. [A restriction on the size of the neighborhoods.]
Assumption 4: there exists some $\vartheta < \infty$ so that for all neighboring nodes $a, b \in \Gamma(n)$ and all $n \in \mathbb{N}$, $\|\theta^{a,\,\mathrm{ne}_b\setminus\{a\}}\|_1 \leq \vartheta$. [This is fulfilled if Assumption 2 holds and the size of the overlap of neighborhoods is bounded from above by an arbitrarily large number.]
13. Magnitude of partial correlations. Assumption 5: there exist a constant $\delta > 0$ and some $\xi > \kappa$ so that for every $(a,b) \in E$, $|\pi_{ab}| \geq \delta n^{-(1-\xi)/2}$.
Neighborhood stability. Define
$$S_a(b) := \sum_{k \in \mathrm{ne}_a} \operatorname{sign}(\theta^{a,\mathrm{ne}_a}_k)\,\theta^{b,\mathrm{ne}_a}_k.$$
Assumption 6: there exists some $\delta < 1$ so that for all $a, b \in \Gamma(n)$ with $b \notin \mathrm{ne}_a$, $|S_a(b)| < \delta$.
14. Theorem 1: controlling type I error
Theorem 1. Let Assumptions 1-6 be fulfilled. Let the penalty parameter satisfy $\lambda_n \sim d n^{-(1-\varepsilon)/2}$ with some $\kappa < \varepsilon < \xi$ and $d > 0$. There exists some $c > 0$ so that, for all $a \in \Gamma(n)$,
$$P(\hat{\mathrm{ne}}^\lambda_a \subseteq \mathrm{ne}_a) = 1 - O(\exp(-cn^\varepsilon)) \quad \text{for } n \to \infty.$$
That is, the probability of falsely including any of the non-neighboring variables vanishes exponentially fast. Proposition 3 says that Assumption 6 cannot be relaxed:
Proposition 3. If there exist some $a, b \in \Gamma(n)$ with $b \notin \mathrm{ne}_a$ and $|S_a(b)| > 1$, then $P(\hat{\mathrm{ne}}^\lambda_a \subseteq \mathrm{ne}_a) \to 0$ for $n \to \infty$.
15. Proof of Theorem 1
Writing $\mathrm{cl}_a = \mathrm{ne}_a \cup \{a\}$ for the closure of node $a$,
$$P(\hat{\mathrm{ne}}^\lambda_a \subseteq \mathrm{ne}_a) = 1 - P(\exists\, b \in \Gamma(n)\setminus\mathrm{cl}_a : \hat\theta^{a,\lambda}_b \neq 0).$$
Consider the Lasso estimate $\hat\theta^{a,\mathrm{ne}_a,\lambda}$, which is constrained to have nonzero components only in $\mathrm{ne}_a$. Let $\mathcal{E}$ be the event
$$\max_{k \in \Gamma(n)\setminus\mathrm{cl}_a} |G_k(\hat\theta^{a,\mathrm{ne}_a,\lambda})| < \lambda.$$
On this event, by Lemma 1, $\hat\theta^{a,\mathrm{ne}_a,\lambda}$ is a solution of (3) with $A = \Gamma(n)\setminus\{a\}$ as well as a solution of (10). Hence
$$P(\exists\, b \in \Gamma(n)\setminus\mathrm{cl}_a : \hat\theta^{a,\lambda}_b \neq 0) \leq 1 - P(\mathcal{E}) = P\big(\max_{k \in \Gamma(n)\setminus\mathrm{cl}_a} |G_k(\hat\theta^{a,\mathrm{ne}_a,\lambda})| \geq \lambda\big).$$
It is sufficient to show that there exists a constant $c > 0$ so that for all $b \in \Gamma(n)\setminus\mathrm{cl}_a$,
$$P(|G_b(\hat\theta^{a,\mathrm{ne}_a,\lambda})| \geq \lambda) = O(\exp(-cn^\varepsilon)).$$
16. Proof of Theorem 1, continued
One can write, for any $b \in \Gamma(n)\setminus\mathrm{cl}_a$,
$$X_b = \sum_{m \in \mathrm{ne}_a} \theta^{b,\mathrm{ne}_a}_m X_m + V_b,$$
where $V_b \sim N(0, \sigma_b^2)$ for some $\sigma_b^2 \leq 1$, and $V_b$ is independent of $\{X_m : m \in \mathrm{cl}_a\}$. Plugging this into the gradient,
$$G_b(\hat\theta^{a,\mathrm{ne}_a,\lambda}) = -2n^{-1}\sum_{m \in \mathrm{ne}_a} \theta^{b,\mathrm{ne}_a}_m \langle X_a - X\hat\theta^{a,\mathrm{ne}_a,\lambda}, X_m \rangle - 2n^{-1}\langle X_a - X\hat\theta^{a,\mathrm{ne}_a,\lambda}, V_b \rangle.$$
By Lemma 2, there exists some $c > 0$ so that, with probability $1 - O(\exp(-cn^\varepsilon))$,
$$\operatorname{sign}(\hat\theta^{a,\mathrm{ne}_a,\lambda}_k) = \operatorname{sign}(\theta^{a,\mathrm{ne}_a}_k) \quad \forall k \in \mathrm{ne}_a.$$
17. Proof of Theorem 1, continued
With Lemma 1 and Assumption 6 we get, with probability $1 - O(\exp(-cn^\varepsilon))$ and some $\delta < 1$,
$$|G_b(\hat\theta^{a,\mathrm{ne}_a,\lambda})| \leq \delta\lambda + 2n^{-1}|\langle X_a - X\hat\theta^{a,\mathrm{ne}_a,\lambda}, V_b \rangle|.$$
It then remains to be shown that
$$P\big(2n^{-1}|\langle X_a, V_b \rangle| \geq (1 - \delta)\lambda\big) = O(\exp(-cn^\varepsilon)).$$
18. Theorem 2
Theorem 2. Let the assumptions of Theorem 1 be fulfilled. For $\lambda = \lambda_n$ as in Theorem 1, it holds for some $c > 0$ that
$$P(\mathrm{ne}_a \subseteq \hat{\mathrm{ne}}^\lambda_a) = 1 - O(\exp(-cn^\varepsilon)) \quad \text{for } n \to \infty.$$
Proposition 4 says that Assumption 5 cannot be relaxed:
Proposition 4. Let the assumptions of Theorem 1 be fulfilled with $\vartheta < 1$ in Assumption 4. For $a \in \Gamma(n)$, let there be some $b \in \Gamma(n)\setminus\{a\}$ with $\pi_{ab} \neq 0$ and $\pi_{ab} = O(n^{-(1-\xi)/2})$ for $n \to \infty$, for some $\xi < \varepsilon$. Then $P(b \in \hat{\mathrm{ne}}^\lambda_a) \to 0$ for $n \to \infty$.
19. Proof of Theorem 2
$$P(\mathrm{ne}_a \subseteq \hat{\mathrm{ne}}^\lambda_a) = 1 - P(\exists\, b \in \mathrm{ne}_a : \hat\theta^{a,\lambda}_b = 0).$$
Let $\mathcal{E}$ be the event
$$\max_{k \in \Gamma(n)\setminus\mathrm{cl}_a} |G_k(\hat\theta^{a,\mathrm{ne}_a,\lambda})| < \lambda.$$
As in the proof of Theorem 1, on $\mathcal{E}$ the estimate $\hat\theta^{a,\mathrm{ne}_a,\lambda}$ is a solution of (3). Then
$$P(\exists\, b \in \mathrm{ne}_a : \hat\theta^{a,\lambda}_b = 0) \leq P(\exists\, b \in \mathrm{ne}_a : \hat\theta^{a,\mathrm{ne}_a,\lambda}_b = 0) + P(\mathcal{E}^c).$$
Here $P(\mathcal{E}^c) = O(\exp(-cn^\varepsilon))$ by the proof of Theorem 1, and $P(\exists\, b \in \mathrm{ne}_a : \hat\theta^{a,\mathrm{ne}_a,\lambda}_b = 0) = O(\exp(-cn^\varepsilon))$ by Lemma 2.
20. Edge set
Ideally, the edge set would be given by
$$E = \{(a,b) : a \in \mathrm{ne}_b \wedge b \in \mathrm{ne}_a\}.$$
An estimate of the edge set is
$$\hat{E}^{\lambda,\wedge} = \{(a,b) : a \in \hat{\mathrm{ne}}^\lambda_b \wedge b \in \hat{\mathrm{ne}}^\lambda_a\} \quad \text{(AND rule)},$$
or
$$\hat{E}^{\lambda,\vee} = \{(a,b) : a \in \hat{\mathrm{ne}}^\lambda_b \vee b \in \hat{\mathrm{ne}}^\lambda_a\} \quad \text{(OR rule)}.$$
Corollary 1. Under the conditions of Theorem 2, for some $c > 0$,
$$P(\hat{E}^\lambda = E) = 1 - O(\exp(-cn^\varepsilon)) \quad \text{for } n \to \infty.$$
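Combining per-node neighborhoods into an edge set is a simple set operation; a short sketch of the AND and OR rules, with hypothetical (asymmetric) estimated neighborhoods:

```python
def edges(neighborhoods, rule="and"):
    """Combine estimated neighborhoods into an undirected edge set.

    neighborhoods: dict mapping node -> set of selected neighbors.
    rule: "and" keeps (a, b) only if each node selected the other;
          "or" keeps (a, b) if either node selected the other.
    """
    E = set()
    for a in neighborhoods:
        for b in neighborhoods[a]:
            if rule == "or" or a in neighborhoods.get(b, set()):
                E.add((min(a, b), max(a, b)))  # undirected edge
    return E

# Selection is not symmetric for the pair (0, 2): 0 picked 2, but not vice versa.
ne = {0: {1, 2}, 1: {0}, 2: set()}
print(sorted(edges(ne, "and")))  # [(0, 1)]
print(sorted(edges(ne, "or")))   # [(0, 1), (0, 2)]
```

Both rules coincide asymptotically under the conditions of Corollary 1, since then every neighborhood is recovered exactly.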
21. How to choose the penalty $\lambda$?
For any level $0 < \alpha < 1$, take the penalty
$$\lambda(\alpha) = \frac{2\hat\sigma_a}{\sqrt{n}}\,\tilde\Phi^{-1}\Big(\frac{\alpha}{2p(n)^2}\Big),$$
where $\hat\sigma_a^2 = n^{-1}\langle X_a, X_a \rangle$ and $\tilde\Phi = 1 - \Phi$ is the upper tail of the standard Gaussian distribution.
Theorem 3. Let Assumptions 1-6 be fulfilled. Using the penalty $\lambda(\alpha)$, it holds for all $n \in \mathbb{N}$ that
$$P(\exists\, a \in \Gamma(n) : \hat{C}^\lambda_a \not\subseteq C_a) \leq \alpha,$$
where $C_a$ is the connectivity component of node $a$. This constrains the probability of (falsely) connecting two distinct connectivity components of the true graph.
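The penalty $\lambda(\alpha)$ is a one-line computation; a sketch using scipy's upper-tail Gaussian quantile `norm.isf` for $\tilde\Phi^{-1}$ (the data here are hypothetical standard-normal draws, used only to exercise the formula):

```python
import numpy as np
from scipy.stats import norm

def lambda_alpha(X, a, alpha):
    """Penalty lambda(alpha) = (2 * sigma_hat_a / sqrt(n))
    * tilde-Phi^{-1}(alpha / (2 p^2)), with sigma_hat_a^2 = n^{-1} <X_a, X_a>
    and tilde-Phi the upper Gaussian tail (norm.isf)."""
    n, p = X.shape
    sigma_a = np.sqrt(np.mean(X[:, a] ** 2))
    return 2 * sigma_a / np.sqrt(n) * norm.isf(alpha / (2 * p ** 2))

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 50))
print(lambda_alpha(X, 0, alpha=0.05))
```

Note how the penalty grows only logarithmically in $p(n)$ through the Gaussian quantile, which is what makes the level guarantee of Theorem 3 feasible in high dimensions.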
More informationMaster 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique
Master 2 MathBigData S. Gaïffas 1 3 novembre 2014 1 CMAP - Ecole Polytechnique 1 Supervised learning recap Introduction Loss functions, linearity 2 Penalization Introduction Ridge Sparsity Lasso 3 Some
More informationProperties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation
Properties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation Adam J. Rothman School of Statistics University of Minnesota October 8, 2014, joint work with Liliana
More informationComputing the MLE and the EM Algorithm
ECE 830 Fall 0 Statistical Signal Processing instructor: R. Nowak Computing the MLE and the EM Algorithm If X p(x θ), θ Θ, then the MLE is the solution to the equations logp(x θ) θ 0. Sometimes these equations
More informationOn the uncertainty of the minimal distance between two confocal Keplerian orbits
On the uncertainty of the minimal distance between two confocal Keplerian orbits Giovanni F. Gronchi and Giacomo Tommei Dipartimento di Matematica, Università di Pisa Largo B. Pontecorvo 5, 56127 Pisa,
More informationRATE-OPTIMAL GRAPHON ESTIMATION. By Chao Gao, Yu Lu and Harrison H. Zhou Yale University
Submitted to the Annals of Statistics arxiv: arxiv:0000.0000 RATE-OPTIMAL GRAPHON ESTIMATION By Chao Gao, Yu Lu and Harrison H. Zhou Yale University Network analysis is becoming one of the most active
More informationMATH 151, FINAL EXAM Winter Quarter, 21 March, 2014
Time: 3 hours, 8:3-11:3 Instructions: MATH 151, FINAL EXAM Winter Quarter, 21 March, 214 (1) Write your name in blue-book provided and sign that you agree to abide by the honor code. (2) The exam consists
More informationReconstruction from Anisotropic Random Measurements
Reconstruction from Anisotropic Random Measurements Mark Rudelson and Shuheng Zhou The University of Michigan, Ann Arbor Coding, Complexity, and Sparsity Workshop, 013 Ann Arbor, Michigan August 7, 013
More informationFrequentist-Bayesian Model Comparisons: A Simple Example
Frequentist-Bayesian Model Comparisons: A Simple Example Consider data that consist of a signal y with additive noise: Data vector (N elements): D = y + n The additive noise n has zero mean and diagonal
More informationArchitectures and Algorithms for Distributed Generation Control of Inertia-Less AC Microgrids
Architectures and Algorithms for Distributed Generation Control of Inertia-Less AC Microgrids Alejandro D. Domínguez-García Coordinated Science Laboratory Department of Electrical and Computer Engineering
More informationThe Iterated Lasso for High-Dimensional Logistic Regression
The Iterated Lasso for High-Dimensional Logistic Regression By JIAN HUANG Department of Statistics and Actuarial Science, 241 SH University of Iowa, Iowa City, Iowa 52242, U.S.A. SHUANGE MA Division of
More informationAn Adaptive LASSO-Penalized BIC
An Adaptive LASSO-Penalized BIC Sakyajit Bhattacharya and Paul D. McNicholas arxiv:1406.1332v1 [stat.me] 5 Jun 2014 Dept. of Mathematics and Statistics, University of uelph, Canada. Abstract Mixture models
More informationGRAPHICAL MODELS FOR HIGH DIMENSIONAL GENOMIC DATA. Min Jin Ha
GRAPHICAL MODELS FOR HIGH DIMENSIONAL GENOMIC DATA Min Jin Ha A dissertation submitted to the faculty of the University of North Carolina at Chapel Hill in partial fulfillment of the requirements for the
More informationSparse regression. Optimization-Based Data Analysis. Carlos Fernandez-Granda
Sparse regression Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 3/28/2016 Regression Least-squares regression Example: Global warming Logistic
More informationSUPPLEMENT TO SLOPE IS ADAPTIVE TO UNKNOWN SPARSITY AND ASYMPTOTICALLY MINIMAX. By Weijie Su and Emmanuel Candès Stanford University
arxiv: arxiv:503.08393 SUPPLEMENT TO SLOPE IS ADAPTIVE TO UNKNOWN SPARSITY AND ASYMPTOTICALLY MINIMAX By Weijie Su and Emmanuel Candès Stanford University APPENDIX A: PROOFS OF TECHNICAL RESULTS As is
More informationf(x θ)dx with respect to θ. Assuming certain smoothness conditions concern differentiating under the integral the integral sign, we first obtain
0.1. INTRODUCTION 1 0.1 Introduction R. A. Fisher, a pioneer in the development of mathematical statistics, introduced a measure of the amount of information contained in an observaton from f(x θ). Fisher
More informationPrimal-dual IPM with Asymmetric Barrier
Primal-dual IPM with Asymmetric Barrier Yurii Nesterov, CORE/INMA (UCL) September 29, 2008 (IFOR, ETHZ) Yu. Nesterov Primal-dual IPM with Asymmetric Barrier 1/28 Outline 1 Symmetric and asymmetric barriers
More informationNetwork Infusion to Infer Information Sources in Networks Soheil Feizi, Ken Duffy, Manolis Kellis, and Muriel Medard
Computer Science and Artificial Intelligence Laboratory Technical Report MIT-CSAIL-TR-214-28 December 2, 214 Network Infusion to Infer Information Sources in Networks Soheil Feizi, Ken Duffy, Manolis Kellis,
More informationSTAT 730 Chapter 4: Estimation
STAT 730 Chapter 4: Estimation Timothy Hanson Department of Statistics, University of South Carolina Stat 730: Multivariate Analysis 1 / 23 The likelihood We have iid data, at least initially. Each datum
More information