Regularized PCA to denoise and visualise data


1 Regularized PCA to denoise and visualise data
Marie Verbanck, Julie Josse, François Husson
Laboratoire de statistique, Agrocampus Ouest, Rennes, France
CNAM, Paris, 16 January

2 Outline
1. PCA
2. Regularized PCA: MSE point of view; Bayesian point of view
3. Results

3 Geometrical point of view
Maximize the variance of the projected points = minimize the reconstruction error.
Low-rank approximation of X (rank S < p): minimize $\|X_{n\times p} - \hat X_{n\times p}\|^2$.
SVD solution: $\hat X^{PCA} = U_{n\times S}\,\Lambda_S^{1/2}\,V_{p\times S}'$, with $F = U\Lambda^{1/2}$ the principal components and $V$ the principal axes.
Alternating least squares algorithm: minimize $\|X - FV'\|^2$ by iterating $F = XV(V'V)^{-1}$ and $V = X'F(F'F)^{-1}$ (see the sketch below).
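For concreteness, here is a small numpy sketch (illustrative, not from the talk) of the two equivalent views: the rank-S truncated SVD and the alternating least squares iteration reach the same rank-S fit.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, S = 100, 20, 2
# toy centred data matrix with a clear rank-2 signal plus noise
X = rng.normal(size=(n, S)) @ rng.normal(size=(S, p)) * 3 + rng.normal(size=(n, p))
X = X - X.mean(axis=0)

# Rank-S approximation via the SVD: X_hat_PCA = U_S Lambda_S^{1/2} V_S'
U, sv, Vt = np.linalg.svd(X, full_matrices=False)
X_hat_svd = (U[:, :S] * sv[:S]) @ Vt[:S, :]
F = U[:, :S] * sv[:S]          # principal components
V = Vt[:S, :].T                # principal axes

# Alternating least squares: iterate the two regressions of the slide
V_als = rng.normal(size=(p, S))
for _ in range(500):
    F_als = X @ V_als @ np.linalg.inv(V_als.T @ V_als)
    V_als = X.T @ F_als @ np.linalg.inv(F_als.T @ F_als)

print(np.allclose(X_hat_svd, F_als @ V_als.T, atol=1e-6))  # same rank-S reconstruction
```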

4 Model in PCA
x = signal + noise
Random effect model: factor analysis, probabilistic PCA (Lawley, 1953; Roweis, 1998; Tipping & Bishop, 1999). Individuals are exchangeable; the focus is on the relationships between variables.
Fixed effect model (Lawley, 1941; Caussinus, 1986). Individuals are not i.i.d.; both the individuals and the variables are studied. Examples: sensory analysis (products - sensory descriptors), plant breeding (genotypes - environments).

5 Fixed effect model
$X_{n\times p} = \tilde X_{n\times p} + \varepsilon_{n\times p}$, i.e. $x_{ij} = \sum_{s=1}^{S} \sqrt{d_s}\, q_{is} r_{js} + \varepsilon_{ij}$ with $\varepsilon_{ij} \sim \mathcal N(0, \sigma^2)$.
Randomness comes from the noise only; individuals have different expectations (individual parameters).
Inferential framework: when $n \to \infty$, the number of parameters also tends to infinity. Asymptotic regime: $\sigma \to 0$.
Maximum likelihood estimates ($\hat\theta$) = least squares estimates; EM algorithm = ALS algorithm.
Not necessarily the best in MSE. Shrinkage estimates: minimize $E(\varphi\hat\theta - \theta)^2$ (see the computation below).
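To make the last point explicit (a standard computation, added here for completeness): for an unbiased estimator $\hat\theta$ of $\theta$ with variance $v$, the quadratic risk of the shrunken estimator $\varphi\hat\theta$ is minimized by a factor strictly smaller than 1,

```latex
\mathbb{E}\bigl[(\varphi\hat\theta-\theta)^2\bigr]
   = \varphi^{2}\,\mathbb{E}(\hat\theta^{2})
     - 2\varphi\,\theta\,\mathbb{E}(\hat\theta) + \theta^{2},
\qquad
\frac{\partial}{\partial\varphi}=0
\;\Longrightarrow\;
\varphi^{\star}
   = \frac{\theta\,\mathbb{E}(\hat\theta)}{\mathbb{E}(\hat\theta^{2})}
   = \frac{\theta^{2}}{\theta^{2}+v} \;<\; 1,
```

which is exactly the "signal variance over total variance" form that reappears as $\varphi_s$ on the next slides.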

6 Outline
1. PCA
2. Regularized PCA: MSE point of view; Bayesian point of view
3. Results

7 Minimizing the MSE
$x_{ij} = \tilde x_{ij} + \varepsilon_{ij} = \sum_{s=1}^{S}\tilde x_{ij}^{(s)} + \varepsilon_{ij}$
$MSE = E\Big[\sum_{i,j}\Big(\sum_{s=1}^{\min(n-1,p)}\varphi_s\,\hat x_{ij}^{(s)} - \tilde x_{ij}\Big)^2\Big]$ with $\hat x_{ij}^{(s)} = \sqrt{\lambda_s}\,u_{is}v_{js}$
$MSE = E\Big[\sum_{i,j}\Big(\sum_{s=1}^{S}\big(\varphi_s\,\hat x_{ij}^{(s)} - \tilde x_{ij}^{(s)}\big) + \sum_{s=S+1}^{\min(n-1,p)}\big(\varphi_s\,\hat x_{ij}^{(s)} - \tilde x_{ij}^{(s)}\big)\Big)^2\Big]$
For all $s \ge S+1$, $\tilde x_{ij}^{(s)} = 0$, therefore $\varphi_{S+1} = \dots = \varphi_{\min(n-1,p)} = 0$.

8 Minimizing the MSE
$MSE = E\Big[\sum_{i,j}\Big(\sum_{s=1}^{S}\varphi_s\,\hat x_{ij}^{(s)} - \tilde x_{ij}\Big)^2\Big]$ with $\hat x_{ij}^{(s)} = \sqrt{\lambda_s}\,u_{is}v_{js}$
$\varphi_s = \dfrac{\sum_{i,j} E\big(\hat x_{ij}^{(s)}\big)\,\tilde x_{ij}^{(s)}}{\sum_{i,j} E\big(\hat x_{ij}^{(s)2}\big)} = \dfrac{\sum_{i,j} E\big(\hat x_{ij}^{(s)}\big)\,\tilde x_{ij}^{(s)}}{\sum_{i,j}\Big(V\big(\hat x_{ij}^{(s)}\big) + \big(E(\hat x_{ij}^{(s)})\big)^2\Big)}$
When $\sigma \to 0$: $E\big(\hat x_{ij}^{(s)}\big) = \tilde x_{ij}^{(s)}$ (unbiased) and $V\big(\hat x_{ij}^{(s)}\big) = \dfrac{\sigma^2}{\min(n-1,p)}$ (Pazman & Denis, 1996; Denis & Gower, 1999), so
$\varphi_s = \dfrac{\sum_{i,j}\tilde x_{ij}^{(s)2}}{\dfrac{np\,\sigma^2}{\min(n-1,p)} + \sum_{i,j}\tilde x_{ij}^{(s)2}}$

9 Minimizing the MSE
$x_{ij} = \tilde x_{ij} + \varepsilon_{ij} = \sum_{s=1}^{S}\sqrt{d_s}\,q_{is}r_{js} + \varepsilon_{ij}$, $\varepsilon_{ij}\sim\mathcal N(0,\sigma^2)$
$MSE = E\Big[\sum_{i,j}\Big(\sum_{s=1}^{S}\varphi_s\,\hat x_{ij}^{(s)} - \tilde x_{ij}\Big)^2\Big]$ with $\hat x_{ij}^{(s)} = \sqrt{\lambda_s}\,u_{is}v_{js}$
$\varphi_s = \dfrac{\sum_{i,j}\tilde x_{ij}^{(s)2}}{\dfrac{np\,\sigma^2}{\min(n-1,p)} + \sum_{i,j}\tilde x_{ij}^{(s)2}} = \dfrac{d_s}{\dfrac{np\,\sigma^2}{\min(n-1,p)} + d_s} = \dfrac{\text{signal variance}}{\text{total variance}}$
Plug-in estimate: $\hat\varphi_s = \dfrac{\lambda_s - \hat\sigma^2}{\lambda_s}$ for $s = 1, \dots, S$.

10 Regularized PCA
$\hat x_{ij}^{rPCA} = \sum_{s=1}^{S}\Big(\dfrac{\lambda_s - \hat\sigma^2}{\lambda_s}\Big)\sqrt{\lambda_s}\,u_{is}v_{js} = \sum_{s=1}^{S}\Big(\dfrac{\lambda_s - \hat\sigma^2}{\sqrt{\lambda_s}}\Big)u_{is}v_{js}$, whereas $\hat x_{ij}^{PCA} = \sum_{s=1}^{S}\sqrt{\lambda_s}\,u_{is}v_{js}$.
A compromise between hard and soft thresholding.
$\sigma^2$ small: regularized PCA $\approx$ PCA; $\sigma^2$ large: the individuals' coordinates are shrunk toward the centre of gravity.
$\hat\sigma^2 = \dfrac{RSS}{ddl} = \dfrac{n\sum_{s=S+1}^{\min(n-1,p)}\lambda_s}{np - p - nS - pS + S^2 + S}$, the degrees of freedom counting the parameters of $(\tilde X_{n\times p}; U_{n\times S}; V_{p\times S})$ (only asymptotically unbiased).
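A minimal numpy sketch of this regularized reconstruction (illustrative, not the authors' code; here $\lambda_s$ is taken as the squared singular values of the centred data matrix, so the residual sum of squares is simply the sum of the trailing $\lambda_s$):

```python
import numpy as np

def regularized_pca(X, S):
    """Shrunken rank-S reconstruction: sum_s ((lam_s - sig2)/sqrt(lam_s)) u_s v_s'."""
    n, p = X.shape
    U, sv, Vt = np.linalg.svd(X, full_matrices=False)
    lam = sv ** 2                                   # lambda_s: squared singular values
    ddl = n * p - p - n * S - p * S + S ** 2 + S    # residual degrees of freedom
    sig2 = lam[S:min(n - 1, p)].sum() / ddl         # noise variance estimate (RSS / ddl)
    shrunk = (lam[:S] - sig2) / sv[:S]              # (lambda_s - sigma^2) / sqrt(lambda_s)
    X_rpca = (U[:, :S] * shrunk) @ Vt[:S, :]
    X_pca = (U[:, :S] * sv[:S]) @ Vt[:S, :]         # plain truncated SVD, for comparison
    return X_rpca, X_pca, sig2
```

If $\hat\sigma^2$ happens to exceed some $\lambda_s$, the corresponding shrunken term becomes negative; in practice such dimensions would simply be dropped.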

11 Illustration of the regularization
$X^{sim}_{4\times 10} = \tilde X_{4\times 10} + \varepsilon^{sim}_{4\times 10}$, $sim = 1, \dots, 1000$.
[Figure: configurations obtained by PCA and by rPCA over the 1000 simulations.] rPCA is more biased but less variable.

12 Outline
1. PCA
2. Regularized PCA: MSE point of view; Bayesian point of view
3. Results

13 Outline
1. PCA
2. Regularized PCA: MSE point of view; Bayesian point of view
Reminder, ridge regression as a Bayesian estimate: $Y = X\beta + \varepsilon$, $\varepsilon \sim \mathcal N(0,\sigma^2)$, prior $\beta \sim \mathcal N(0,\tau^2)$; $\hat\beta^{MAP} = (X'X + \kappa I)^{-1}X'Y$, with $\kappa = \sigma^2/\tau^2$.
3. Results
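As a reminder of what that formula computes, a tiny sketch (illustrative names):

```python
import numpy as np

def ridge_map(X, y, sigma2, tau2):
    """MAP estimate under the prior beta ~ N(0, tau2 I):
    (X'X + kappa I)^{-1} X'y with kappa = sigma2 / tau2."""
    kappa = sigma2 / tau2
    return np.linalg.solve(X.T @ X + kappa * np.eye(X.shape[1]), X.T @ y)
```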

14 Probabilistic PCA (Roweis, 1998; Tipping & Bishop, 1999)
A specific case of an exploratory factor analysis model: $X = Z_{n\times S}\,B_{p\times S}' + \varepsilon$, $z_i \sim \mathcal N(0, I_S)$, $\varepsilon_i \sim \mathcal N(0, \sigma^2 I_p)$.
Conditional independence: $x_i \mid z_i \sim \mathcal N(Bz_i, \sigma^2 I_p)$.
Distribution of the observations: $x_i \sim \mathcal N(0, \Sigma)$ with $\Sigma = BB' + \sigma^2 I_p$.
Explicit solution: $\hat\sigma^2 = \dfrac{1}{p-S}\sum_{s=S+1}^{p}\lambda_s$ and $\hat B = V(\Lambda - \hat\sigma^2 I_S)^{1/2}$.

15 Probabilistic PCA
$X = Z_{n\times S}\,B_{p\times S}' + \varepsilon$, $z_i \sim \mathcal N(0, I_S)$, $\varepsilon_i \sim \mathcal N(0, \sigma^2 I_p)$.
Random effect model: no individual parameters.
BLUP $E(z_i \mid x_i)$: $\hat Z = X\hat B(\hat B'\hat B + \hat\sigma^2 I_S)^{-1}$ and $\hat X^{PPCA} = \hat Z\hat B'$.
Using $\hat B = V(\Lambda - \hat\sigma^2 I_S)^{1/2}$ and the SVD $X = U\Lambda^{1/2}V'$ (so that $XV = U\Lambda^{1/2}$):
$\hat X^{PPCA} = U(\Lambda - \hat\sigma^2 I_S)\Lambda^{-1/2}V'$, i.e. $\hat x_{ij}^{PPCA} = \sum_{s=1}^{S}\Big(\dfrac{\lambda_s - \hat\sigma^2}{\sqrt{\lambda_s}}\Big)u_{is}v_{js}$.
Bayesian treatment of the fixed effect model with a prior on the $z_i$: regularized PCA.
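The identity above can be checked numerically: building $\hat B$ and $\hat Z$ from the eigendecomposition and forming $\hat Z\hat B'$ gives exactly the shrunken SVD reconstruction. A sketch (illustrative; $\Lambda$ holds the squared singular values of X, and the noise level used here is an arbitrary value on that same scale):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, S = 100, 20, 2
X = rng.normal(size=(n, S)) @ rng.normal(size=(S, p)) * 3 + rng.normal(size=(n, p))
X = X - X.mean(axis=0)

U, sv, Vt = np.linalg.svd(X, full_matrices=False)
lam = sv[:S] ** 2                       # Lambda: squared singular values of X
sig2 = (sv[S:] ** 2).mean()             # an arbitrary noise level on the Lambda scale

B = Vt[:S, :].T * np.sqrt(lam - sig2)                        # B_hat = V (Lambda - sig2 I)^{1/2}
Z = X @ B @ np.linalg.inv(B.T @ B + sig2 * np.eye(S))        # BLUP E(z_i | x_i)
X_ppca = Z @ B.T

X_shrunk = (U[:, :S] * ((lam - sig2) / sv[:S])) @ Vt[:S, :]  # U (Lambda - sig2 I) Lambda^{-1/2} V'
print(np.allclose(X_ppca, X_shrunk))                         # True
```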

16 Probabilistic PCA
EM algorithm in PPCA: two ridge regressions.
E step: $\hat Z = X\hat B(\hat B'\hat B + \hat\sigma^2 I_S)^{-1}$
M step: $\hat B = X'\hat Z(\hat Z'\hat Z + \hat\sigma^2\Lambda^{-1})^{-1}$
EM algorithm in PCA: two ordinary regressions, $F = XV(V'V)^{-1}$ and $V = X'F(F'F)^{-1}$, minimizing $\|X - FV'\|^2$.
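Written as code, the contrast between the two algorithms is just the presence of the ridge terms. A sketch (illustrative: the noise variance sig2 and the length-S vector lam playing the role of the diagonal of $\Lambda$ in the M step are taken as fixed inputs rather than re-estimated at each iteration):

```python
import numpy as np

def em_pca(X, S, n_iter=200, seed=0):
    """EM / ALS for PCA: two plain least-squares regressions."""
    V = np.random.default_rng(seed).normal(size=(X.shape[1], S))
    for _ in range(n_iter):
        F = X @ V @ np.linalg.inv(V.T @ V)
        V = X.T @ F @ np.linalg.inv(F.T @ F)
    return F, V

def em_ppca(X, S, sig2, lam, n_iter=200, seed=0):
    """EM for probabilistic PCA: the same two regressions, each with a ridge term."""
    B = np.random.default_rng(seed).normal(size=(X.shape[1], S))
    for _ in range(n_iter):
        Z = X @ B @ np.linalg.inv(B.T @ B + sig2 * np.eye(S))             # E step
        B = X.T @ Z @ np.linalg.inv(Z.T @ Z + sig2 * np.diag(1.0 / lam))  # M step
    return Z, B
```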

17 An empirical Bayesian approach
Model: $x_{ij} = \sum_{s=1}^{S}\tilde x_{ij}^{(s)} + \varepsilon_{ij}$, $\varepsilon_{ij}\sim\mathcal N(0,\sigma^2)$.
Prior: $\tilde x_{ij}^{(s)} \sim \mathcal N(0, \tau_s^2)$.
Posterior: $E\big(\tilde x_{ij}^{(s)} \mid x_{ij}^{(s)}\big) = \Phi_s\,x_{ij}^{(s)}$ with $\Phi_s = \dfrac{\tau_s^2}{\tau_s^2 + \frac{\sigma^2}{\min(n-1,p)}}$.
Maximizing the likelihood of $x_{ij}^{(s)} \sim \mathcal N\big(0, \tau_s^2 + \frac{\sigma^2}{\min(n-1,p)}\big)$ as a function of $\tau_s^2$ gives $\hat\tau_s^2 = \dfrac{\lambda_s}{np} - \dfrac{\hat\sigma^2}{\min(n-1,p)}$, hence $\hat\Phi_s = \dfrac{\frac{\lambda_s}{np} - \frac{\hat\sigma^2}{\min(n-1,p)}}{\frac{\lambda_s}{np}}$.
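The posterior expectation here is the standard normal-normal shrinkage identity; writing $v = \sigma^2/\min(n-1,p)$ for the variance attached to each $x_{ij}^{(s)}$, it reads

```latex
\tilde x_{ij}^{(s)} \sim \mathcal N(0,\tau_s^{2}),
\qquad
x_{ij}^{(s)} \mid \tilde x_{ij}^{(s)} \sim \mathcal N\!\bigl(\tilde x_{ij}^{(s)},\, v\bigr)
\;\Longrightarrow\;
\mathbb{E}\bigl[\tilde x_{ij}^{(s)} \mid x_{ij}^{(s)}\bigr]
  = \frac{\tau_s^{2}}{\tau_s^{2}+v}\; x_{ij}^{(s)}
  = \Phi_s\, x_{ij}^{(s)}.
```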

18 Outline
1. PCA
2. Regularized PCA: MSE point of view; Bayesian point of view
3. Results

19 Simulation design
The simulated data: $X = \tilde X + \varepsilon$. Several parameters are varied to construct $\tilde X$:
n/p: 100/20 = 5, 50/50 = 1 and 20/100 = 0.2
number of underlying dimensions S: 2, 4, 10
ratio of the first two singular values $d_1/d_2$: 4, 1
$\varepsilon$ is drawn from a Gaussian distribution with a standard deviation chosen to obtain a signal-to-noise ratio (SNR) of 4, 1 or 0.8.
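A sketch of how such a data set can be drawn (illustrative construction only; the exact design used in the talk may differ):

```python
import numpy as np

def simulate(n=100, p=20, S=2, snr=1.0, d_ratio=4.0, seed=0):
    """Draw X = X_tilde + eps with S underlying dimensions and a given SNR."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.normal(size=(n, S)))   # orthonormal left factors
    R, _ = np.linalg.qr(rng.normal(size=(p, S)))   # orthonormal right factors
    d = np.linspace(d_ratio, 1.0, S)               # singular values with d_1/d_S = d_ratio
    X_tilde = (Q * d) @ R.T
    sigma = X_tilde.std() / snr                    # noise sd set from the requested SNR
    X = X_tilde + rng.normal(scale=sigma, size=(n, p))
    return X, X_tilde, sigma
```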

20 Soft thresholding and SURE (Candès et al., 2012)
$\hat x_{ij}^{Soft} = \sum_{s=1}^{\min(n,p)}(\sqrt{\lambda_s} - \lambda)_+\,u_{is}v_{js}$, solution of $\min_{\tilde X}\ \|X - \tilde X\|^2 + \lambda\|\tilde X\|_*$.
How to choose the tuning parameter $\lambda$? Ideally minimize $MSE(\lambda) = E\|\tilde X - SVT_\lambda(X)\|^2$.
Stein's Unbiased Risk Estimate (SURE): $SURE(SVT_\lambda)(X) = -np\sigma^2 + \sum_{i=1}^{\min(n,p)}\min(\lambda^2, \lambda_i) + 2\sigma^2\,\mathrm{div}(SVT_\lambda)(X)$.
$\lambda$ is obtained by minimizing SURE. Remark: $\hat\sigma^2$ is required.
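The thresholding operator itself is a one-liner; a sketch (the SURE-based selection of $\lambda$ is not reproduced here):

```python
import numpy as np

def svt(X, thresh):
    """Soft-threshold the singular values: sum_s (sqrt(lambda_s) - thresh)_+ u_s v_s'."""
    U, sv, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(sv - thresh, 0.0)) @ Vt
```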

21 Simulation study
[Table: MSE($\hat X^{PCA},\tilde X$), MSE($\hat X^{rPCA},\tilde X$) and MSE($\hat X^{SURE},\tilde X$) for every combination of n/p, S, SNR and $d_1/d_2$; the numerical values are not legible in this transcription.]
MSE in favor of rPCA.

22 Simulation study
[Same table of MSE values.]
rPCA outperforms PCA when the SNR is small; rPCA $\approx$ PCA when the SNR is high.

23 Simulation study
[Same table of MSE values.]
Behaviour of rPCA relative to PCA depending on the ratio $d_1/d_2$ (the comparison symbols are not legible in this transcription).

24 Simulation study
[Same table of MSE values.]
Same order of errors for n/p = 0.2 or 5; smaller errors for n/p = 1!

25 Simulation study
[Same table of MSE values.]
SURE performs well when the data are noisy.

26 Simulation from Candès et al. (2012)
Same simulation model, with n = 500, p = 200, SNR in (4, 2, 1, 0.5) and S in (10, 100).
[Table: MSE($\hat X^{PCA},\tilde X$), MSE($\hat X^{rPCA},\tilde X$) and MSE($\hat X^{SURE},\tilde X$) for each (S, SNR); values not legible.]

27 Individuals representation
100 individuals, 20 variables, 2 dimensions, SNR = 0.8, $d_1/d_2$ = 4.
[Figure: true configuration vs. PCA configuration.]

28 Individuals representation
100 individuals, 20 variables, 2 dimensions, SNR = 0.8, $d_1/d_2$ = 4.
[Figure: PCA configuration vs. rPCA configuration.]

29 Individuals representation
100 individuals, 20 variables, 2 dimensions, SNR = 0.8, $d_1/d_2$ = 4.
[Figure: PCA configuration vs. SURE configuration.]

30 Individuals representation
100 individuals, 20 variables, 2 dimensions, SNR = 0.8, $d_1/d_2$ = 4.
[Figure: true configuration vs. rPCA configuration.]
rPCA improves the recovery of the inner-product matrix $XX'$.

31 Variables representation
[Figure: true configuration vs. PCA correlation circle; cor($X_7$, $X_9$) reported on the plot (value not legible).]

32 Variables representation
[Figure: PCA vs. rPCA correlation circles; cor($X_7$, $X_9$) reported on the plot (value not legible).]

33 Variables representation
[Figure: PCA vs. SURE correlation circles; cor($X_7$, $X_9$) reported on the plot (value not legible).]

34 Variables representation
[Figure: true configuration vs. rPCA correlation circle; cor($X_7$, $X_9$) reported on the plot (value not legible).]
rPCA improves the recovery of the covariance matrix $X'X$.

35 Real data set
27 chickens submitted to 4 nutritional statuses; gene expression data.
[Figure: PCA vs. rPCA individual maps.]

36 Real data set
27 chickens submitted to 4 nutritional statuses; gene expression data.
[Figure: SURE vs. rPCA individual maps.]

37 Real data set
27 chickens submitted to 4 nutritional statuses; gene expression data.
[Figure: SPCA vs. rPCA individual maps.]

38 Real data set
[Figure: bi-clustering obtained from PCA vs. from rPCA.]
Bi-clustering from rPCA better finds the 4 nutritional statuses!

39 Real data set
[Figure: bi-clustering obtained from SURE vs. from rPCA.]
Bi-clustering from rPCA better finds the 4 nutritional statuses!

40 Real data set
[Figure: bi-clustering obtained from SPCA vs. from rPCA.]
Bi-clustering from rPCA better finds the 4 nutritional statuses!

41 Conclusion
Recovering a low-rank matrix from noisy observations: singular value thresholding is a popular estimation strategy.
Here, a regularization term derived a priori, $\dfrac{\lambda_s - \hat\sigma^2}{\sqrt{\lambda_s}}$: it minimizes the MSE and corresponds to a Bayesian treatment of the fixed effect model.
Good results in terms of recovery of the structure, of the distances between individuals and of the covariance matrix, and in terms of visualization.
Asymptotic results.
Tuning parameter: the number of dimensions S (cross-validation?).
Regularized MCA (Takane, 2006).
