Regularized PCA to denoise and visualise data
1 Regularized PCA to denoise and visualise data. Marie Verbanck, Julie Josse, François Husson. Laboratoire de statistique, Agrocampus Ouest, Rennes, France. CNAM, Paris, 16 January.
2 Outline: 1. PCA; 2. Regularized PCA (MSE point of view, Bayesian point of view); 3. Results.
3 Geometrical point of view. Maximize the variance of the projected points = minimize the reconstruction error. Low-rank approximation of $X_{n\times p}$ (rank $S < p$): $X_{n\times p} \approx \hat{X}_{n\times p}$. SVD: $\hat{X}^{PCA} = U_{n\times S}\,\Lambda_S^{1/2}\,V'_{p\times S}$, with $F = U\Lambda^{1/2}$ the principal components and $V$ the principal axes. Alternating least squares algorithm on $\|X - FV'\|^2$: $F = XV(V'V)^{-1}$, $V = X'F(F'F)^{-1}$.
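Not part of the slides: a minimal numpy sketch of this slide's objects (truncated SVD, principal components $F$, principal axes $V$); the toy matrix and the choice of `S` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 5))
X -= X.mean(axis=0)              # centre the columns, as PCA assumes

S = 2                            # number of retained dimensions (S < p)
U, sv, Vt = np.linalg.svd(X, full_matrices=False)

# Rank-S least-squares approximation; sv[s] plays the role of sqrt(lambda_s)
X_hat = U[:, :S] @ np.diag(sv[:S]) @ Vt[:S, :]

F = U[:, :S] * sv[:S]            # principal components F = U Lambda^{1/2}
V = Vt[:S, :].T                  # principal axes
assert np.allclose(X_hat, F @ V.T)
```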
4 Model in PCA: $x_{ij}$ = signal + noise. Random effect model: Factor Analysis, Probabilistic PCA (Lawley, 1953; Roweis, 1998; Tipping & Bishop, 1999): individuals are exchangeable; the focus is on the relationships between variables. Fixed effect model (Lawley, 1941; Caussinus, 1986): individuals are not i.i.d.; both the individuals and the variables are studied. Examples: sensory analysis (products × sensory descriptors), plant breeding (genotypes × environments).
5 Fixed effect model. $X_{n\times p} = \tilde{X}_{n\times p} + \varepsilon_{n\times p}$, i.e. $x_{ij} = \sum_{s=1}^{S} \sqrt{d_s}\, q_{is} r_{js} + \varepsilon_{ij}$, $\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)$. Randomness is due to the noise only; individuals have different expectations (individual parameters). Inferential framework: when $n \to \infty$ the number of parameters $\to \infty$; the relevant asymptotics is $\sigma \to 0$. Maximum likelihood estimates $\hat{\theta}$ = least squares estimates; EM algorithm = ALS algorithm. Not necessarily the best in MSE; shrinkage estimates minimize $E\|\phi\hat{\theta} - \theta\|^2$.
6 Outline: 1. PCA; 2. Regularized PCA (MSE point of view, Bayesian point of view); 3. Results.
7 Minimizing the MSE. $x_{ij} = \tilde{x}_{ij} + \varepsilon_{ij} = \sum_{s=1}^{S} \tilde{x}_{ij}^{(s)} + \varepsilon_{ij}$. $\mathrm{MSE} = E\big[\sum_{i,j}\big(\sum_{s=1}^{\min(n-1,p)} \phi_s \hat{x}_{ij}^{(s)} - \tilde{x}_{ij}\big)^2\big]$ with $\hat{x}_{ij}^{(s)} = \sqrt{\lambda_s}\, u_{is} v_{js}$. Treating the dimensions separately, $\mathrm{MSE} = E\big[\sum_{i,j}\sum_{s=1}^{S}\big(\phi_s \hat{x}_{ij}^{(s)} - \tilde{x}_{ij}^{(s)}\big)^2 + \sum_{i,j}\sum_{s=S+1}^{\min(n-1,p)}\big(\phi_s \hat{x}_{ij}^{(s)} - \tilde{x}_{ij}^{(s)}\big)^2\big]$. For all $s \ge S+1$, $\tilde{x}_{ij}^{(s)} = 0$, therefore $\phi_{S+1} = \dots = \phi_{\min(n-1,p)} = 0$.
8 Minimizing the MSE. $\mathrm{MSE} = E\big[\sum_{i,j}\big(\sum_{s=1}^{S} \phi_s \hat{x}_{ij}^{(s)} - \tilde{x}_{ij}\big)^2\big]$ with $\hat{x}_{ij}^{(s)} = \sqrt{\lambda_s}\, u_{is} v_{js}$. $\phi_s = \dfrac{\sum_{i,j} E(\hat{x}_{ij}^{(s)})\, \tilde{x}_{ij}^{(s)}}{\sum_{i,j} E(\hat{x}_{ij}^{(s)2})} = \dfrac{\sum_{i,j} E(\hat{x}_{ij}^{(s)})\, \tilde{x}_{ij}^{(s)}}{\sum_{i,j}\big[V(\hat{x}_{ij}^{(s)}) + \big(E(\hat{x}_{ij}^{(s)})\big)^2\big]}$. When $\sigma \to 0$: $E(\hat{x}_{ij}^{(s)}) = \tilde{x}_{ij}^{(s)}$ (unbiased); $V(\hat{x}_{ij}^{(s)}) = \frac{\sigma^2}{\min(n-1,\,p)}$ (Pazman & Denis, 1996; Denis & Gower, 1999). Hence $\phi_s = \dfrac{\sum_{i,j} \tilde{x}_{ij}^{(s)2}}{\frac{np\,\sigma^2}{\min(n-1,\,p)} + \sum_{i,j} \tilde{x}_{ij}^{(s)2}}$.
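Not on the slides: the normal-equation step behind the $\phi_s$ formula above, written out (treating each dimension separately, as the slide does).

```latex
% Setting the derivative of the per-dimension MSE to zero gives the optimal shrinkage:
\frac{\partial}{\partial \phi_s}\,
  E\Big[\sum_{i,j}\big(\phi_s \hat{x}_{ij}^{(s)} - \tilde{x}_{ij}^{(s)}\big)^2\Big]
  = 2\sum_{i,j}\Big(\phi_s\, E\big[\hat{x}_{ij}^{(s)2}\big]
      - E\big[\hat{x}_{ij}^{(s)}\big]\,\tilde{x}_{ij}^{(s)}\Big) = 0
\;\Longrightarrow\;
\phi_s = \frac{\sum_{i,j} E\big[\hat{x}_{ij}^{(s)}\big]\,\tilde{x}_{ij}^{(s)}}
              {\sum_{i,j} E\big[\hat{x}_{ij}^{(s)2}\big]}.
```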
9 Minimizing the MSE. $x_{ij} = \tilde{x}_{ij} + \varepsilon_{ij} = \sum_{s=1}^{S} \sqrt{d_s}\, q_{is} r_{js} + \varepsilon_{ij}$, $\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)$. $\mathrm{MSE} = E\big[\sum_{i,j}\big(\sum_{s=1}^{S} \phi_s \hat{x}_{ij}^{(s)} - \tilde{x}_{ij}\big)^2\big]$ with $\hat{x}_{ij}^{(s)} = \sqrt{\lambda_s}\, u_{is} v_{js}$. $\phi_s = \dfrac{\sum_{i,j} \tilde{x}_{ij}^{(s)2}}{\frac{np\,\sigma^2}{\min(n-1,\,p)} + \sum_{i,j} \tilde{x}_{ij}^{(s)2}} = \dfrac{d_s}{\frac{np\,\sigma^2}{\min(n-1,\,p)} + d_s} = \dfrac{\text{signal variance}}{\text{total variance}}$. Plugging in estimates: $\hat{\phi}_s = \dfrac{\lambda_s - \hat{\sigma}^2}{\lambda_s}$ for $s = 1, \dots, S$.
10 Regularized PCA. $\hat{x}_{ij}^{\,rPCA} = \sum_{s=1}^{S} \Big(\dfrac{\lambda_s - \hat{\sigma}^2}{\lambda_s}\Big) \sqrt{\lambda_s}\, u_{is} v_{js} = \sum_{s=1}^{S} \Big(\sqrt{\lambda_s} - \dfrac{\hat{\sigma}^2}{\sqrt{\lambda_s}}\Big) u_{is} v_{js}$, versus $\hat{x}_{ij}^{\,PCA} = \sum_{s=1}^{S} \sqrt{\lambda_s}\, u_{is} v_{js}$. A compromise between hard and soft thresholding. $\sigma^2$ small: regularized PCA $\approx$ PCA; $\sigma^2$ large: the individuals' coordinates are pulled toward the centre of gravity. $\hat{\sigma}^2 = \dfrac{\mathrm{RSS}}{\mathrm{ddl}} = \dfrac{\sum_{s=S+1}^{\min(n-1,p)} \lambda_s}{np - p - nS - pS + S^2 + S}$, with the degrees of freedom counted from the dimensions of $X_{n\times p}$, $U_{n\times S}$ and $V_{p\times S}$ (only asymptotically unbiased).
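A hedged numpy sketch of the rPCA reconstruction above, assuming $X$ is already column-centred and taking $\lambda_s$ as the squared singular values; `rpca_fit` is a hypothetical helper, and the positive-part guard is an added safety not written on the slide.

```python
import numpy as np

def rpca_fit(X, S):
    """Hypothetical helper: regularized PCA reconstruction per the slide.

    X is assumed column-centred; lambda_s is taken as the squared singular value.
    """
    n, p = X.shape
    U, sv, Vt = np.linalg.svd(X, full_matrices=False)
    lam = sv ** 2

    # sigma^2_hat = RSS / ddl (asymptotically unbiased)
    rss = lam[S:min(n - 1, p)].sum()
    ddl = n * p - p - n * S - p * S + S ** 2 + S
    sigma2 = rss / ddl

    # Shrunken singular values sqrt(lam_s) - sigma2/sqrt(lam_s); the positive
    # part guards against sigma2 > lam_s (a guard the slide does not write)
    shrunk = np.maximum(sv[:S] - sigma2 / sv[:S], 0.0)
    return U[:, :S] @ np.diag(shrunk) @ Vt[:S, :]
```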
11 Illustration of the regularization. $X^{sim}_{4\times 10} = \tilde{X} + \varepsilon^{sim}_{4\times 10}$, $sim = 1, \dots, 1000$. [Figure: PCA vs. rPCA configurations over the 1000 simulations.] rPCA is more biased but less variable.
12 Outline: 1. PCA; 2. Regularized PCA (MSE point of view, Bayesian point of view); 3. Results.
13 Outline: 2. Regularized PCA, Bayesian point of view. Ridge reminder: $Y = X\beta + \varepsilon$, $\varepsilon \sim \mathcal{N}(0, \sigma^2)$; prior $\beta \sim \mathcal{N}(0, \tau^2)$; $\hat{\beta}^{MAP} = (X'X + \kappa I)^{-1} X'Y$, with $\kappa = \sigma^2/\tau^2$.
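A one-function illustration of the ridge/MAP reminder above (a sketch, not the authors' code):

```python
import numpy as np

def ridge_map(X, y, sigma2, tau2):
    """MAP estimate under beta ~ N(0, tau2 I): (X'X + kappa I)^{-1} X'y, kappa = sigma2/tau2."""
    p = X.shape[1]
    kappa = sigma2 / tau2
    return np.linalg.solve(X.T @ X + kappa * np.eye(p), X.T @ y)
```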
14 Probabilistic PCA (Roweis, 1998; Tipping & Bishop, 1999). A specific case of an exploratory factor analysis model: $x_{ij} = (Z_{n\times S} B'_{p\times S})_{ij} + \varepsilon_{ij}$, $z_i \sim \mathcal{N}(0, I_S)$, $\varepsilon_i \sim \mathcal{N}(0, \sigma^2 I_p)$. Conditional independence: $x_i \mid z_i \sim \mathcal{N}(B z_i, \sigma^2 I_p)$. Distribution of the observations: $x_i \sim \mathcal{N}(0, \Sigma)$ with $\Sigma = BB' + \sigma^2 I_p$. Explicit solution: $\hat{\sigma}^2 = \frac{1}{p-S}\sum_{s=S+1}^{p} \lambda_s$, $\hat{B} = V(\Lambda - \hat{\sigma}^2 I_S)^{1/2}$.
15 Probabilistic PCA. $x_{ij} = (Z_{n\times S} B'_{p\times S})_{ij} + \varepsilon_{ij}$, $z_i \sim \mathcal{N}(0, I_S)$, $\varepsilon_i \sim \mathcal{N}(0, \sigma^2 I_p)$. Random effect model: no individual parameters. BLUP $E(z_i \mid x_i)$: $\hat{Z} = X\hat{B}(\hat{B}'\hat{B} + \hat{\sigma}^2 I_S)^{-1}$, $\hat{X}^{pPCA} = \hat{Z}\hat{B}'$. Using $\hat{B} = V(\Lambda - \hat{\sigma}^2 I_S)^{1/2}$ and the SVD $X = U\Lambda^{1/2}V'$ (so $XV = U\Lambda^{1/2}$): $\hat{X}^{pPCA} = U(\Lambda - \hat{\sigma}^2 I_S)\Lambda^{-1/2}V'$, i.e. $\hat{x}_{ij}^{\,pPCA} = \sum_{s=1}^{S}\big(\sqrt{\lambda_s} - \frac{\hat{\sigma}^2}{\sqrt{\lambda_s}}\big) u_{is} v_{js}$. Bayesian treatment of the fixed effect model with a prior on $z_i$: Regularized PCA.
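A numerical check (not from the slides) that the BLUP route and the closed form above coincide; the conventions here, e.g. $\hat{\sigma}^2$ taken as the mean of the discarded eigenvalues of the centred $X$, are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((30, 10))
X -= X.mean(axis=0)
S = 3

U, sv, Vt = np.linalg.svd(X, full_matrices=False)
lam = sv ** 2
sigma2 = lam[S:].mean()                    # mean of the discarded eigenvalues

# Closed form: X_hat = U (Lambda - sigma^2 I) Lambda^{-1/2} V'
X_ppca = U[:, :S] @ np.diag((lam[:S] - sigma2) / sv[:S]) @ Vt[:S, :]

# Same reconstruction via B_hat and the BLUP of z_i
B = Vt[:S, :].T @ np.diag(np.sqrt(lam[:S] - sigma2))
Z = X @ B @ np.linalg.inv(B.T @ B + sigma2 * np.eye(S))
assert np.allclose(X_ppca, Z @ B.T)
```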
16 Probabilistic PCA. EM algorithm in pPCA: two ridge regressions. E step: $\hat{Z} = X\hat{B}(\hat{B}'\hat{B} + \hat{\sigma}^2 I_S)^{-1}$. M step: $\hat{B} = X'\hat{Z}(\hat{Z}'\hat{Z} + \hat{\sigma}^2 \Lambda^{-1})^{-1}$. EM algorithm in PCA: two ordinary regressions on $\|X - FV'\|^2$: $F = XV(V'V)^{-1}$, $V = X'F(F'F)^{-1}$.
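A sketch of one EM iteration taking the slide's two ridge regressions at face value; `ppca_em_step` and its `lam` argument (current eigenvalue estimates feeding the $\hat{\sigma}^2\Lambda^{-1}$ penalty) are this sketch's assumptions.

```python
import numpy as np

def ppca_em_step(X, B, sigma2, lam):
    """One pPCA EM iteration written as the slide's two ridge regressions.

    lam holds current eigenvalue estimates for the sigma2 * Lambda^{-1} penalty.
    """
    S = B.shape[1]
    Z = X @ B @ np.linalg.inv(B.T @ B + sigma2 * np.eye(S))                 # E step
    B_new = X.T @ Z @ np.linalg.inv(Z.T @ Z + sigma2 * np.diag(1.0 / lam))  # M step
    return Z, B_new
```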
17 An empirical Bayesian approach. Model: $x_{ij} = \sum_{s=1}^{S} \tilde{x}_{ij}^{(s)} + \varepsilon_{ij}$, $\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)$. Prior: $\tilde{x}_{ij}^{(s)} \sim \mathcal{N}(0, \tau_s^2)$. Posterior: $E\big(\tilde{x}_{ij}^{(s)} \mid x_{ij}^{(s)}\big) = \Phi_s\, x_{ij}^{(s)}$ with $\Phi_s = \dfrac{\tau_s^2}{\tau_s^2 + \frac{\sigma^2}{\min(n-1,\,p)}}$. Maximizing the likelihood of $x_{ij}^{(s)} \sim \mathcal{N}\big(0, \tau_s^2 + \frac{\sigma^2}{\min(n-1,\,p)}\big)$ as a function of $\tau_s^2$ gives $\hat{\tau}_s^2 = \frac{\lambda_s}{np} - \frac{\hat{\sigma}^2}{\min(n-1,\,p)}$, hence $\hat{\Phi}_s = 1 - \dfrac{np\,\hat{\sigma}^2}{\min(n-1,\,p)\,\lambda_s}$.
18 Outline: 1. PCA; 2. Regularized PCA; 3. Results.
19 Simulation design. The simulated data: $X = \tilde{X} + \varepsilon$. Several parameters vary in the construction of $\tilde{X}$:
- $n/p$: $100/20 = 5$, $50/50 = 1$ and $20/100 = 0.2$;
- $S$ underlying dimensions: 2, 4;
- the ratio $d_1/d_2$: 4, 1;
- $\varepsilon$ is drawn from a Gaussian distribution with a standard deviation chosen to reach a signal-to-noise ratio (SNR) of 4, 1 or 0.8.
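Not in the slides: a small generator matching this design; the exact SNR definition (signal norm over expected noise norm) and the linear spacing of the singular values are assumptions of the sketch.

```python
import numpy as np

def simulate(n=100, p=20, S=2, d_ratio=4.0, snr=1.0, seed=0):
    """Sketch of the slides' design: X = X_tilde + eps with a target SNR."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((n, S)))   # orthonormal scores
    R, _ = np.linalg.qr(rng.standard_normal((p, S)))   # orthonormal loadings
    d = np.linspace(d_ratio, 1.0, S)                   # singular values, d1/dS = d_ratio
    X_tilde = Q @ np.diag(d) @ R.T
    # noise sd chosen so that ||X_tilde|| / (sd * sqrt(n*p)) = snr
    sd = np.linalg.norm(X_tilde) / (snr * np.sqrt(n * p))
    return X_tilde, X_tilde + sd * rng.standard_normal((n, p))

X_tilde, X = simulate(n=100, p=20, S=2, d_ratio=4.0, snr=0.8)
```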
20 Soft thresholding and SURE (Candès et al., 2012). $\hat{x}_{ij}^{\,Soft} = \sum_{s=1}^{\min(n,p)} \big(\sqrt{\lambda_s} - \lambda\big)_+\, u_{is} v_{js}$, the solution of $\min_{\hat{X}} \frac{1}{2}\|X - \hat{X}\|^2 + \lambda \|\hat{X}\|_*$. How to choose the tuning parameter $\lambda$? $\mathrm{MSE}(\lambda) = E\,\|\tilde{X} - \mathrm{SVT}_\lambda(X)\|^2$. Stein's Unbiased Risk Estimate (SURE): $\mathrm{SURE}(\mathrm{SVT}_\lambda)(X) = -np\sigma^2 + \sum_{i=1}^{\min(n,p)} \min(\lambda^2, \lambda_i) + 2\sigma^2\, \mathrm{div}(\mathrm{SVT}_\lambda)(X)$. $\lambda$ is obtained by minimizing SURE. Remark: $\hat{\sigma}^2$ is required.
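Not from the slides: a minimal sketch of the soft-thresholded SVD. The $\lambda$ selection below is an oracle grid search for illustration only (it uses the true $\tilde{X}$, available in simulation); implementing SURE itself would additionally require $\hat{\sigma}^2$ and the closed-form divergence of $\mathrm{SVT}_\lambda$ from Candès et al.

```python
import numpy as np

def svt(X, lam):
    """Singular value soft-thresholding: each singular value becomes (sv - lam)_+."""
    U, sv, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(sv - lam, 0.0)) @ Vt

# Oracle check on simulated data (true X_tilde known)
rng = np.random.default_rng(2)
X_tilde = rng.standard_normal((50, 5)) @ rng.standard_normal((5, 40))
X = X_tilde + rng.standard_normal((50, 40))
grid = np.linspace(0.0, 30.0, 61)
mse = [np.mean((svt(X, l) - X_tilde) ** 2) for l in grid]
best_lam = grid[int(np.argmin(mse))]
print(best_lam)
```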
21 Simulation study. [Table: MSE$(\hat{X}^{PCA}, \tilde{X})$, MSE$(\hat{X}^{rPCA}, \tilde{X})$ and MSE$(\hat{X}^{SURE}, \tilde{X})$ for every combination of $n/p$, $S$, SNR and $d_1/d_2$; the numeric entries were lost in transcription.] The MSE is in favor of rPCA.
22 Simulation study. [Same MSE table as above.] rPCA clearly improves on PCA when the SNR is small; rPCA $\approx$ PCA when the SNR is high.
23 Simulation study. [Same MSE table as above.] The gain of rPCA also depends on the ratio $d_1/d_2$ (the comparison symbols were lost in transcription).
24 Simulation study. [Same MSE table as above.] Errors are of the same order for $n/p = 0.2$ or $5$, and smaller for $n/p = 1$!
25 Simulation study. [Same MSE table as above.] SURE does well when the data are noisy.
26 Simulation from Candès et al. (2012). Same simulation model: $n = 500$, $p = 200$, SNR $\in \{4, 2, 1, 0.5\}$, $S \in \{10, 100\}$. [Table: MSE$(\hat{X}^{PCA}, \tilde{X})$, MSE$(\hat{X}^{rPCA}, \tilde{X})$ and MSE$(\hat{X}^{SURE}, \tilde{X})$ for each $(S, \mathrm{SNR})$; numeric entries lost in transcription.]
27 Individuals representation. 100 individuals, 20 variables, 2 dimensions, SNR = 0.8, $d_1/d_2 = 4$. [Figure: true configuration vs. PCA.]
28 Individuals representation. Same setting. [Figure: PCA vs. rPCA.]
29 Individuals representation. Same setting. [Figure: PCA vs. SURE.]
30 Individuals representation. Same setting. [Figure: true configuration vs. rPCA.] rPCA improves the recovery of the inner-product matrix $\tilde{X}\tilde{X}'$.
31 Variables representation. [Figure: true configuration vs. PCA; the displayed cor$(X_7, X_9)$ values were lost in transcription.]
32 Variables representation. [Figure: PCA vs. rPCA.]
33 Variables representation. [Figure: PCA vs. SURE.]
34 Variables representation. [Figure: true configuration vs. rPCA.] rPCA improves the recovery of the covariance matrix $\tilde{X}'\tilde{X}$.
35 Real data set. 27 chickens submitted to 4 nutritional statuses; gene expressions. [Figure: PCA vs. rPCA.]
36 Real data set. 27 chickens submitted to 4 nutritional statuses; gene expressions. [Figure: SURE vs. rPCA.]
37 Real data set. 27 chickens submitted to 4 nutritional statuses; gene expressions. [Figure: SPCA vs. rPCA.]
38 Real data set. [Figure: bi-clustering heatmaps, PCA vs. rPCA.] Bi-clustering from rPCA better finds the 4 nutritional statuses!
39 Real data set. [Figure: bi-clustering heatmaps, SURE vs. rPCA.] Bi-clustering from rPCA better finds the 4 nutritional statuses!
40 Real data set. [Figure: bi-clustering heatmaps, SPCA vs. rPCA.] Bi-clustering from rPCA better finds the 4 nutritional statuses!
41 Conclusion. Recovering a low-rank matrix from noisy observations: singular value thresholding is a popular estimation strategy. A regularization term derived a priori, $\big(\sqrt{\lambda_s} - \frac{\hat{\sigma}^2}{\sqrt{\lambda_s}}\big)$: it minimizes the MSE and corresponds to a Bayesian treatment of the fixed effect model. Good results in terms of recovery of the structure, of the distances between individuals, of the covariance matrix, and in terms of visualization. Asymptotic results. Tuning parameter: the number of dimensions (CV?). Regularized MCA (Takane, 2006).