Composite Loss Functions and Multivariate Regression; Sparse PCA


1 Composite Loss Functions and Multivariate Regression; Sparse PCA G. Obozinski, B. Taskar, and M. I. Jordan (2009). Joint covariate selection and joint subspace selection for multiple classification problems. Statistics and Computing, to appear. G. Obozinski, M. J. Wainwright, and M. I. Jordan (2009). Union support recovery in multivariate regression. Annals of Statistics, under review. A. Amini and M. J. Wainwright (2009). High-dimensional analysis of semidefinite relaxations for sparse PCA. Annals of Statistics, to appear. 1

2 Introduction
- classical asymptotic theory of statistical inference: number of observations n → +∞, model dimension p stays fixed
- not suitable for many modern applications: { images, signals, systems, networks } frequently large (p → +∞)
- interesting consequences: might have p = Θ(n) or even p ≫ n
- curse of dimensionality: frequently impossible to obtain consistent procedures unless p/n → 0
- can be saved by a lower effective dimensionality, due to some form of complexity constraint

3 Example: Sparse linear regression
[figure: observation model y = Xβ + w, with β partitioned into support S and complement S^c]
- vector β ∈ R^p with at most k ≪ p non-zero entries
- observation model: noisy linear observations y = Xβ + w
- X ∈ R^{n×p}: design matrix; w ∈ R^{n×1}: noise vector
- various applications (database sketching, imaging, genetic testing, ...)
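As a concrete illustration, here is a minimal simulation of this observation model (the sizes n, p, k and the noise level below are illustrative, not taken from the slides):

import numpy as np

rng = np.random.default_rng(0)
n, p, k, sigma = 100, 400, 10, 0.5       # illustrative sizes: n samples, ambient dimension p, sparsity k

beta = np.zeros(p)                       # beta in R^p with at most k << p non-zero entries
S = rng.choice(p, size=k, replace=False)
beta[S] = rng.choice([-1.0, 1.0], size=k)

X = rng.standard_normal((n, p))          # design matrix X in R^{n x p}
w = sigma * rng.standard_normal(n)       # noise vector w in R^n
y = X @ beta + w                         # noisy linear observations y = X beta + w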

4 Example: Graphical model selection
- consider m-dimensional random vector Z = (Z_1, ..., Z_m):
  P(Z_1, ..., Z_m; β) ∝ exp( Σ_{(i,j) ∈ E} β_ij Z_i Z_j )
- given n independent and identically distributed (i.i.d.) samples of Z, identify underlying graph G = (V, E)
- lower effective dimensionality: graphs with k ≪ p := (m choose 2) edges
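For a toy graph the distribution above can be tabulated by brute force; a small sketch in which the graph and the edge weights β_ij are made up purely for illustration:

import itertools
import numpy as np

m = 4
edges = {(0, 1): 0.8, (1, 2): -0.5, (2, 3): 0.8}      # hypothetical edge set E with weights beta_ij

def weight(z):
    # unnormalized probability exp( sum_{(i,j) in E} beta_ij z_i z_j ) for z in {-1,+1}^m
    return np.exp(sum(b * z[i] * z[j] for (i, j), b in edges.items()))

states = list(itertools.product([-1, 1], repeat=m))
partition = sum(weight(z) for z in states)             # normalizing constant (feasible only for tiny m)
probs = {z: weight(z) / partition for z in states}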

5 Example: Sparse principal components analysis
[figure: decomposition Σ = ZZ^T + D]
- Set-up: covariance matrix Σ = ZZ^T + D, where the leading eigenspace Z has sparse columns.
- Goal: produce an estimate Ẑ based on samples X^(i) with covariance matrix Σ.
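A single-spike instance of this set-up can be simulated as follows (here D = σ²I, and all numerical values are illustrative):

import numpy as np

rng = np.random.default_rng(0)
p, k, n, alpha, sigma2 = 200, 10, 500, 2.0, 1.0        # illustrative values

z = np.zeros(p)                                        # sparse leading eigenvector with k non-zeros
z[rng.choice(p, size=k, replace=False)] = 1.0 / np.sqrt(k)

Sigma = alpha * np.outer(z, z) + sigma2 * np.eye(p)    # Sigma = alpha z z^T + sigma^2 I
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
Sigma_hat = X.T @ X / n                                # sample covariance from the samples X^(i)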

6 Some issues in high-dimensional inference
- Consider some fixed loss function, and a fixed level δ of error.
- Given particular (polynomial-time) algorithms, for what sample sizes n do they succeed/fail to achieve error δ?
- When does more computation reduce the minimum # samples needed?

7 Outline 1. Multivariate regression in high dimensions (a) Practical limitations: scaling laws for second-order cone programs (b) SOCP vs. Lasso: when does more computation reduce statistical error? 2. Sparse principal component analysis in high dimensions (a) Thresholding methods (b) Semidefinite programming 7

8 Optimization-based estimators in (sparse) regression
[figure: observation model y = Xβ + w, with β partitioned into S and S^c]
Regularized QP: β̂ ∈ arg min_{β ∈ R^p} { (1/(2n)) ||y − Xβ||_2^2 + ρ_n R(β) }, with data term (1/(2n)) ||y − Xβ||_2^2 and regularizer R(β).
- R(β) = ||β||_2^2: ridge regression (Tik43, HoeKen70)
- R(β) = ||β||_1: convex, l_1-constrained QP (CheDonSau96; Tibs96)
- R(β) = ||β||_0: subset selection, combinatorial, NP-hard (Nat95)
- R(β) = ||β||_a, a ∈ (0, 1): non-convex l_a regularization
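The l_1 and ridge cases can be fit with off-the-shelf solvers; a sketch using scikit-learn (its Lasso objective uses exactly the (1/(2n)) scaling of the data term above, while its Ridge does not; all sizes and regularization levels are illustrative):

import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(1)
n, p, k = 100, 400, 10
beta = np.zeros(p); beta[:k] = 1.0                      # illustrative k-sparse truth
X = rng.standard_normal((n, p))
y = X @ beta + 0.5 * rng.standard_normal(n)

# R(beta) = ||beta||_1: Lasso minimizes (1/(2n)) ||y - X beta||_2^2 + alpha ||beta||_1
lasso = Lasso(alpha=0.1).fit(X, y)

# R(beta) = ||beta||_2^2: Ridge minimizes ||y - X beta||_2^2 + alpha ||beta||_2^2 (no 1/(2n) factor)
ridge = Ridge(alpha=1.0).fit(X, y)

print(np.flatnonzero(lasso.coef_))                      # indices retained by the l_1 penalty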

9 Different loss functions
Given an estimate β̂, how to assess its performance?
1. Predictive loss: compute expected error E[ ||ỹ − Xβ̂||_2^2 ]
   - goal is to construct a model with good predictive power
   - β̂ itself of secondary interest (need not be uniquely determined)
2. l_2-loss: E[ ||β̂ − β*||_2^2 ]
   - appropriate when β* is of primary interest (signal recovery, compressed sensing, denoising etc.)
3. Support recovery criterion: define estimated support S(β̂) = { i = 1, ..., p : β̂_i ≠ 0 }, and measure the probability P[ S(β̂) ≠ S(β*) ].
   - useful for feature selection, dimensionality reduction, model selection
   - can be used as a pre-processing step for estimation in l_2-norm
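A sketch of the three criteria as functions of a single estimate β̂ (the predictive loss is approximated here on fresh noiseless design rows, and the tolerance is an illustrative choice):

import numpy as np

def support(beta, tol=1e-10):
    # S(beta) = { i : beta_i != 0 }, with a tolerance for numerically small entries
    return set(np.flatnonzero(np.abs(beta) > tol))

def evaluate(beta_hat, beta_star, X_new):
    pred = np.mean((X_new @ (beta_hat - beta_star)) ** 2)    # proxy for the predictive loss on fresh rows
    l2 = np.sum((beta_hat - beta_star) ** 2)                 # l_2-loss ||beta_hat - beta*||_2^2
    exact = support(beta_hat) == support(beta_star)          # support recovery: S(beta_hat) = S(beta*)?
    return pred, l2, exact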

10 1. Multivariate regression in high dimensions
[figure: observation model Y = XB* + W, with the p × r matrix B* partitioned into rows S and S^c]
- signal B* is a p × r matrix, partitioned into non-zero rows S and zero rows S^c
- observe matrix Y ∈ R^{n×r} of n noisy projections, defined via design matrix X ∈ R^{n×p} and noise matrix W ∈ R^{n×r}
- high-dimensional scaling: allow parameters (n, p, r, |S|) to scale

11 Block regularization and second-order cone programs (Obozinski, Taskar & Jordan, 2009)
For fixed parameter q ∈ [1, ∞], estimate B* via:
  B̂ ∈ arg min_{B ∈ R^{p×r}} { (1/(2n)) ||Y − XB||_F^2 + ρ_n ||B||_{1,q} },
where the data term is (1/(2n)) Σ_{j=1}^n Σ_{l=1}^r (Y_jl − (XB)_jl)^2, the regularizer is ||B||_{1,q} = Σ_{i=1}^p ||(B_i1, ..., B_ir)||_q, and the regularization constant ρ_n > 0 is to be chosen by the user.
Different cases:
- q = 1: elementwise l_1 norm (constrained QP)
- q = 2: second-order cone program (SOCP)
- q = ∞: block l_1/l_∞ max-norm (constrained QP)
In all cases, efficiently solvable (e.g., by interior point methods); generalization of the Lasso (Tibshirani, 1996; Chen et al., 1998), special case of the CAP family (Zhao, Rocha & Yu, 2006); see also (Turlach et al., 2005; Yuan & Lin, 2006; Nardi & Rinaldo, 2008).
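The block regularizer itself is one line of numpy; a small sketch covering the three cases q = 1, 2, ∞ (the example matrix is arbitrary):

import numpy as np

def block_norm(B, q):
    # ||B||_{1,q} = sum over rows i of ||(B_i1, ..., B_ir)||_q
    return np.sum(np.linalg.norm(B, ord=q, axis=1))

B = np.array([[1.0, -2.0], [0.0, 0.0], [3.0, 4.0]])
print(block_norm(B, 1), block_norm(B, 2), block_norm(B, np.inf))   # 10.0, 7.236..., 6.0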

12 Two strategies
Goal: model selection consistency: recover the union of supports S(B*) := { i ∈ {1, 2, ..., p} : ||(B*_i1, ..., B*_ir)||_2 ≠ 0 }.
Different methods:
Lasso-based recovery:
1. Solve a separate Lasso (l_1-constrained QP) for each column l = 1, ..., r, yielding column vector β̂^l ∈ R^p.
2. Estimate the row support Ŝ_Lasso = { i ∈ {1, 2, ..., p} : β̂_il ≠ 0 for some l }.
SOCP-based recovery:
1. Solve a single SOCP, obtaining matrix estimate B̂ ∈ R^{p×r}.
2. Estimate the support Ŝ_SOCP = { i ∈ {1, ..., p} : ||(B̂_i1, ..., B̂_ir)||_2 ≠ 0 }.
Trade-offs: Lasso (QP) cheap to solve, but the method ignores coupling among columns; SOCP more expensive, but the block regularizer is better tailored to the matrix structure.
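Both strategies are easy to prototype; a sketch using scikit-learn, where MultiTaskLasso stands in for the q = 2 block-regularized estimator (it solves the same l_1/l_2-penalized problem, by coordinate descent rather than as an SOCP; all sizes and regularization levels are illustrative):

import numpy as np
from sklearn.linear_model import Lasso, MultiTaskLasso

rng = np.random.default_rng(2)
n, p, r, k = 100, 200, 3, 8
B_star = np.zeros((p, r))
B_star[:k] = rng.standard_normal((k, r))                 # shared row support S = {0, ..., k-1}
X = rng.standard_normal((n, p))
Y = X @ B_star + 0.1 * rng.standard_normal((n, r))

# Lasso-based recovery: one l_1 problem per column, then the union of the column supports
B_lasso = np.column_stack([Lasso(alpha=0.05).fit(X, Y[:, l]).coef_ for l in range(r)])
S_lasso = np.flatnonzero(np.any(B_lasso != 0, axis=1))

# Block-regularized recovery: a single l_1/l_2-penalized problem over the whole matrix
B_block = MultiTaskLasso(alpha=0.05).fit(X, Y).coef_.T   # coef_ has shape (r, p)
S_block = np.flatnonzero(np.linalg.norm(B_block, axis=1) > 0)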

13 Scaling law for high-dimensional SOCP recovery (Obozinski, Wainwright & Jordan, 2009)
SOCP method: B̂ ∈ arg min_{B ∈ R^{p×r}} { (1/(2n)) ||Y − XB||_F^2 + ρ_n ||B||_{1,2} }.
Parameters: problem dimension p; number of non-zero rows k; design matrix X with i.i.d. rows from a sub-Gaussian distribution, with suitable covariance Σ.
Theorem: If the rescaled sample size
  θ_SOCP(n, p, k, B*) := n / ( Ψ(B*_S; Σ_SS) log(p − k) )
is greater than a critical threshold θ_l(Σ; σ^2), then for suitable ρ_n we have, with probability greater than 1 − 2 exp(−c_2 log k):
(a) the SOCP has a unique solution B̂ such that S(B̂) ⊆ S(B*), and
(b) it includes all rows i with ||B*_i||_2 ≥ c_3 sqrt( max{k, log(p − k)} / n ).

14 Assumptions on design covariance
[figure: block decomposition of Σ into Σ_SS, Σ_SS^c, Σ_S^cS, Σ_S^cS^c]
- support set S = { i : β_i ≠ 0 }, complement S^c := {1, ..., p} \ S
- random design matrix X ∈ R^{n×p}: rows drawn i.i.d. with covariance Σ, sub-Gaussian
1. Bounded eigenspectrum: λ(Σ_SS) ∈ [C_min, C_max].
2. Mutual incoherence/irrepresentability: there exists ν ∈ (0, 1] such that ||| Σ_S^cS (Σ_SS)^{-1} |||_{∞,∞} ≤ 1 − ν.
Example: if Σ_SS = I, then require max_{j ∈ S^c} Σ_{i ∈ S} |Σ_ji| ≤ 1 − ν.
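Both conditions are straightforward to check numerically for a given population covariance; a small sketch (the equicorrelated covariance and the support S used in the example are arbitrary choices for illustration):

import numpy as np

def check_design(Sigma, S):
    # Returns (lambda_min(Sigma_SS), lambda_max(Sigma_SS), |||Sigma_{S^c S} (Sigma_SS)^{-1}|||_{infty,infty})
    Sc = np.setdiff1d(np.arange(Sigma.shape[0]), S)
    Sigma_SS = Sigma[np.ix_(S, S)]
    Sigma_ScS = Sigma[np.ix_(Sc, S)]
    eigs = np.linalg.eigvalsh(Sigma_SS)
    A = Sigma_ScS @ np.linalg.inv(Sigma_SS)
    incoherence = np.max(np.sum(np.abs(A), axis=1))      # max over j in S^c of sum_i |A_ji|
    return eigs.min(), eigs.max(), incoherence

Sigma = 0.2 * np.ones((20, 20)) + 0.8 * np.eye(20)       # equicorrelated toy covariance
print(check_design(Sigma, np.arange(4)))                 # incoherence is 0.5 < 1 here, so the condition holds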

15 Order parameter captures threshold (Angle 0°)
[figure: probability of success versus rescaled sample size θ_SOCP(n, p, k, B*) = n / ( Ψ(B*_S; Σ_SS) log(p − k) )]

16 Order parameter captures threshold (Angle 60°)
[figure: probability of success versus rescaled sample size θ_SOCP(n, p, k, B*) = n / ( Ψ(B*_S; Σ_SS) log(p − k) )]

17 Sparsity overlap function Ψ
- form gradient matrix Z(B*_S) := ∇ ||B_S||_{1,2} |_{B_S = B*_S} ∈ R^{k×r}
  (equivalent to renormalizing B*_S to have unit l_2-norm rows)
- form r × r Gram matrix G = Z^T (Σ_SS)^{-1} Z, with G_ab = ⟨ Z_a, Z_b ⟩_{(Σ_SS)^{-1}}
- sparsity overlap function is the maximum eigenvalue of G: Ψ(B*_S; Σ_SS) = |||G|||_2
- measures relative alignments of the renormalized columns of B*
Special case: univariate regression (r = 1): Z(β*_S) is the sign vector of β*_S, so ||Z(β*_S)||_2^2 = k for any vector β*_S (and Ψ(β*_S; I) = k).
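The function Ψ is simple to compute; a sketch that reproduces the two concrete cases on the next slide, taking Σ_SS = I with k = 4 and r = 2:

import numpy as np

def sparsity_overlap(B_S, Sigma_SS):
    # Z(B*_S): rows of B*_S rescaled to unit l_2 norm (the gradient of ||.||_{1,2} at B*_S)
    Z = B_S / np.linalg.norm(B_S, axis=1, keepdims=True)
    G = Z.T @ np.linalg.inv(Sigma_SS) @ Z               # r x r Gram matrix in the (Sigma_SS)^{-1} inner product
    return np.linalg.norm(G, 2)                         # Psi = largest eigenvalue (spectral norm) of G

k, r = 4, 2
aligned = np.ones((k, r))                                               # parallel columns
orthogonal = np.array([[1, 1], [1, 1], [1, -1], [1, -1]], dtype=float)  # orthogonal columns
print(sparsity_overlap(aligned, np.eye(k)))      # 4.0: worst case, Psi = k
print(sparsity_overlap(orthogonal, np.eye(k)))   # 2.0: best case, Psi = k/r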

18 Concrete examples (k = 4, r = 2)
[figure: two example matrices B*_S and their gradient matrices Z(B*_S)]
- Aligned columns: |||G|||_2 = 4.
- Orthogonal columns: |||G|||_2 = 2.

19 Empirical illustration of sparsity-overlap Ψ
[figure panels: orthogonal regression (columns Z_1 ⊥ Z_2); intermediate angle (columns at 60°); aligned regression (columns parallel); ordinary Lasso (problems solved separately)]

20 SOCP versus ordinary QP
Corollary: If Σ_SS = I_{k×k}, the SOCP always dominates the ordinary QP, with relative statistical efficiency
  (QP sample size) / (SOCP sample size) = max_{l=1,...,r} k_l log(p − k_l) / ( Ψ(B*_S; I) log(p − k) ),
a ratio lying between 1 and r.
- increased statistical efficiency of SOCP: dependent on orthogonality properties of the rescaled columns of B*_S
- up to a factor 1/r reduction in the number of samples required
- most pessimistic case: no gain for disjoint supports
- SOCP can be worse in some cases (if Σ_SS ≠ I)

21 Proof sketch of sufficient conditions
Direct analysis: given n observations of β* ∈ R^p with |S(β*)| = k, the oracle decoder performs the following two steps:
1. For each subset S of size k, solve the quadratic program f(S) = min_{β_S ∈ R^k} ||Y − X_S β_S||_2^2.
2. Output the subset Ŝ = arg min_{|S|=k} f(S).
- by symmetry of the ensemble, may assume that a fixed subset S is chosen
- for sets U different from the true set S, consider the range of non-overlaps t := |U \ S| ∈ {1, ..., k}
- number of subsets with non-overlap t given by N(t) = (k choose k−t) (p−k choose t)

22 Error exponents for random projections
- union bound yields an upper bound on the error probability:
  P[error | S true] ≤ Σ_{t=1}^k (k choose t) (p−k choose t) P[error on subset with non-overlap t]
- orthogonal projection Π_U := I_{n×n} − X_U (X_U^T X_U)^{-1} X_U^T
- the optimal decoder chooses U incorrectly over S if and only if
  Δ(U) = ||Π_U (X_{S\U} β*_{S\U} + W)||_2^2 − ||Π_S W||_2^2 < 0,
  where the two terms are the effective noise in U and the effective noise in S
- use large deviations to establish that P[Δ(U) < 0] ≤ exp( −n F(||β*_{S\U}||_2; t) )

23 Proof sketch of necessary conditions
- Fano's inequality applied to a restricted ensemble, assuming a fixed choice of β*:
  β_i[U] = β_min if i ∈ U, and 0 otherwise.
- by Fano's inequality, the probability of success is upper bounded as
  1 − P[error] ≤ I(Y; β*) / log(M − 1) + o(1),
  where I(Y; β*) is the mutual information between β* and the observation vector Y, and M = (p choose k) is the number of competing models
- some work to establish that the upper bound holds w.h.p. for X:
  I(Y; β* | X) ≤ (n/2) log( 1 + (1 − k/p) k β_min^2 )

24 2. High-dimensional analysis of sparse PCA
- principal components analysis (PCA): classical method for dimensionality reduction
- high-dimensional version: eigenvectors from sample covariance Σ̂ based on n samples in p dimensions
- in general, high-dimensional PCA is inconsistent unless p/n → 0 (Joh01, JohLu04)
- natural to investigate more structured ensembles for which consistency is still possible even with p/n → +∞:
  - sparse eigenvector recovery (JolEtal03, JohLu04, ZouEtAl06)
  - sparse covariance matrices (LevBic06, ElKar07)

25 Spiked covariance ensembles
- sequences {Σ_p} of spiked population covariance matrices:
  Σ_p = Σ_{i=1}^M α_i β_i β_i^T + Γ_p,
  with leading eigenvectors (β_i, i = 1, ..., M)
- past work on identity spiked ensembles (Γ_p = I_p) (Joh01; JohLu04)
- different sparsity models:
  - hard sparsity model: β has exactly k non-zero coefficients
  - weak l_q-sparsity: β belongs to the l_q-ball B_q(R_q) = { z ∈ R^p : Σ_{i=1}^p |z_i|^q ≤ R_q }
- given n i.i.d. samples {X_i}_{i=1}^n with E[X_i] = 0 and cov(X_i) = Σ_p

26 SDP relaxation of sparse PCA (d'Aspremont, El Ghaoui, Jordan & Lanckriet, 2006)
- Courant-Fischer variational principle for the maximum eigenvalue/eigenvector (PCA):
  λ_max(Q) = max_{||z||_2 = 1} z^T Q z.
- equivalent/exact semidefinite program (SDP) for the maximum eigenvector:
  λ_max(Q) = max_{Z ⪰ 0, trace(Z) = 1} trace(Z Q).
- SDP relaxation of sparse PCA:
  Ẑ = arg max_{Z ⪰ 0, trace(Z) = 1} { trace(Z Q) − ρ_n Σ_{i,j} |Z_ij| },
  with regularization parameter ρ_n > 0 chosen by the user.
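A direct transcription of this relaxation in cvxpy (a sketch: it assumes an SDP-capable solver such as SCS is installed, and is practical only for small p; the sample-covariance call at the end is purely illustrative):

import cvxpy as cp
import numpy as np

def sparse_pca_sdp(Q_hat, rho):
    # max over Z >= 0 (PSD), trace(Z) = 1 of  trace(Z Q_hat) - rho * sum_ij |Z_ij|
    p = Q_hat.shape[0]
    Z = cp.Variable((p, p), PSD=True)
    objective = cp.Maximize(cp.trace(Z @ Q_hat) - rho * cp.sum(cp.abs(Z)))
    problem = cp.Problem(objective, [cp.trace(Z) == 1])
    problem.solve()
    return Z.value

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
Z_hat = sparse_pca_sdp(A.T @ A / 30, rho=0.1)            # tiny random sample covariance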

27 Rates in spectral norm
- given n samples from the spiked identity model Σ_p = α zz^T + σ^2 I_p
- eigenvector z in the weak l_q-ball B_q(R_q)
- SDP relaxation: Ẑ ∈ arg min_{Z ⪰ 0, trace(Z) = 1} { −trace(Z Σ̂) + ρ_n Σ_{i,j} |Z_ij| }.
Theorem (AmiWai08b): Suppose that we apply the SDP to the sample covariance Σ̂ with regularization parameter ρ_n = f(α, σ^2) sqrt( (log p)/n ). Then with probability greater than 1 − c_1 exp(−c_2 log p), we have:
  ||Ẑ − zz^T||_2 ≤ C ( R_q^2 (log p)/n )^{1/(2(1+q))}.
Example (hard sparsity): q = 0, and radius R_q = k (# non-zeros):
  ||Ẑ − zz^T||_2 ≤ C sqrt( k^2 (log p)/n ).

28 Comparison to some known results
Estimating sparse covariance matrices:
- thresholding estimator T_λn(Σ̂) achieves the rate (BicLev07): ||T_λn(Σ̂) − Σ||_2 ≤ C R_q ( (log p)/n )^{(1−q)/2}
- by matrix perturbation results, for well-separated eigenvalues, the same rate applies to the leading eigenvector
- agrees with the SDP result for q = 0, but a slower rate for q > 0
Minimax rates for q ∈ (0, 2), with sign ⟨ẑ, z⟩ = 1 (PauJoh08):
  min_ẑ max_{z ∈ B_q(R_q)} E[ ||ẑ − z||_2^2 ] ≍ C R_q ( (log p)/n )^{1 − q/2}.
- same rate as the normal sequence model (DonJoh94)
- SDP rate is slower, but approaches the minimax rate as q → 0

29 Model selection consistency for hard sparsity (q = 0)
Goal: given the spiked model with k-sparse eigenvector (z_i = ±1/√k), recover the support set S(z) = { i ∈ {1, 2, ..., p} : z_i ≠ 0 } exactly.
Methods:
1. Diagonal thresholding, complexity O(np + p log p) (JohLu04):
   (a) Form the sample covariance Σ̂ = (1/n) Σ_{i=1}^n X_i X_i^T.
   (b) Extract the top k diagonal order statistics Σ̂_(11), ..., Σ̂_(kk), and estimate the support Ŝ(D) by their rank indices.
2. SDP-based recovery, complexity O(np + p^4 log p) (AspLanGhaJor08):
   (a) Solve the SDP with ρ_n = α/(2σ^2 k).
   (b) Given the solution Ẑ, estimate the support Ŝ := { i ∈ {1, ..., p} : Ẑ_ij ≠ 0 for some j }.
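Diagonal thresholding needs only the diagonal of the sample covariance; a sketch (assuming the samples in X are centered, so the diagonal is just the mean of the squared entries):

import numpy as np

def diagonal_thresholding_support(X, k):
    # (a) diagonal of the sample covariance in O(np); (b) indices of its k largest entries
    diag = np.mean(X ** 2, axis=0)
    return np.sort(np.argsort(diag)[-k:])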

30 Sharp threshold for diagonal thresholding
Model: Σ_p = α zz^T + σ^2 I_p. Parameters: p = model dimension, k = number of non-zeroes in the spiked eigenvector.
Proposition (AmiWai08a): If k = O(p^{1−δ}) for any δ ∈ (0, 1), diagonal thresholding for support recovery is controlled by the rescaled sample size
  θ_thr(n, p, k) := n / ( k^2 log(p − k) ).
That is, there are constants 0 < τ_l(α, σ^2) ≤ τ_u(α, σ^2) < ∞ such that
(a) Success: if n > τ_u k^2 log(p − k), then P[Ŝ(D) = S(β)] ≥ 1 − c_1 exp( −c_2 k^2 log(p − k) ) → 1.
(b) Failure: if n ≤ τ_l k^2 log(p − k), then P[Ŝ(D) = S(β)] ≤ c_1 exp( −c_2 log(p − k) ) → 0.

31 Performance of diagonal thresholding
[figure: probability of success P[Ŝ(D) = S(β*)] versus rescaled sample size θ_thr(n, p, k) = n / ( k^2 log(p − k) ), for several values of p; panel (a) logarithmic sparsity k = O(log p), panel (b) square-root sparsity k = O(√p)]

32 Eigenvector support recovery via SDP relaxation
- spiked identity model Σ_p = α zz^T + σ^2 I_p with k-sparse eigenvector z
- SDP relaxation: Ẑ ∈ arg min_{Z ⪰ 0, trace(Z) = 1} { −trace(Z Σ̂) + ρ_n Σ_{i,j} |Z_ij| }.
Theorem (AmiWai08a): Suppose that we solve the SDP with ρ_n = α/(2σ^2 k). Then there are constants θ_wr and θ_crit such that
(a) for sample sizes such that θ_thr(n, p, k) = n / ( k^2 log(p − k) ) > θ_wr, the SDP has a rank-one solution w.h.p., and
(b) for problem sequences such that k = O(log p) and θ_sdp(n, p, k) := n / ( k log(p − k) ) > θ_crit, a rank-one solution (when it exists) specifies the correct support w.h.p.
Remarks: the technical condition k = O(log p) is likely an artifact.
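Given a rank-one SDP solution Ẑ (for instance from the cvxpy sketch above), the support estimate in the theorem is read off row-wise; a small sketch with an illustrative numerical tolerance:

import numpy as np

def sdp_support(Z_hat, tol=1e-6):
    # S_hat = { i : Z_hat[i, j] != 0 for some j }, using a small tolerance for numerical zeros
    return np.flatnonzero(np.max(np.abs(Z_hat), axis=1) > tol)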

33 Performance of SDP relaxation
[figure: probability of success P[Ŝ = S(β*)] versus rescaled sample size θ_sdp(n, p, k) = n / ( k log(p − k) ), for p ∈ {100, 200, 300}; panel (a) logarithmic sparsity k = O(log p), panel (b) linear sparsity k = 0.1 p]

34 Summary and open directions
1. When does more computation yield greater statistical accuracy?
   - Multivariate regression: second-order cone programming versus quadratic programming (Lasso)
   - Sparse PCA: diagonal thresholding versus SDP relaxation
2. When are polynomial-time algorithms as good as optimal algorithms?
   - Multivariate regression: Lasso/SOCP order-optimal for k = o(p)
   - Sparse PCA: SDP relaxation order-optimal for k = O(log p)
