Principal Component Analysis for a Spiked Covariance Model with Largest Eigenvalues of the Same Asymptotic Order of Magnitude
|
|
- Irene Barker
- 5 years ago
- Views:
Transcription
1 Principal Component Analysis for a Spiked Covariance Model with Largest Eigenvalues of the Same Asymptotic Order of Magnitude Addy M. Boĺıvar Cimé Centro de Investigación en Matemáticas A.C. May 1, 2010 Addy Boĺıvar (CIMAT) PCA for the Spiked Covariance Model May 1, / 18
2 Contenido 1 Introduction Spiked Covariance Model Recent results Addy Boĺıvar (CIMAT) PCA for the Spiked Covariance Model May 1, / 18
3 Contenido 1 Introduction Spiked Covariance Model Recent results 2 Random Matrix Theory Result derived with RMT Addy Boĺıvar (CIMAT) PCA for the Spiked Covariance Model May 1, / 18
4 Contenido 1 Introduction Spiked Covariance Model Recent results 2 Random Matrix Theory Result derived with RMT 3 Asymptotic results Consistency of eigenvalues Consistency of eigenvectors Addy Boĺıvar (CIMAT) PCA for the Spiked Covariance Model May 1, / 18
5 An important problem in multivariate statistical analysis is the estimation of the population covariance matrix. In principal component analysis (PCA) one often fails to estimate the population eigenvalues and eigenvectors, since the sample covariance matrix is not a good approximation to the population covariance matrix when the data dimension is larger than the sample size. Addy Boĺıvar (CIMAT) PCA for the Spiked Covariance Model May 1, / 18
6 Spiked Covariance Model As pointed out in Johnstone (2001), one often observes one or a small number of large sample eigenvalues well separated from the rest. In this case, of special interest is the so-called spiked covariance model. Addy Boĺıvar (CIMAT) PCA for the Spiked Covariance Model May 1, / 18
7 More specifically, suppose we have a d n data matrix X = [x 1, x 2,..., x n ] with d n, in the sense that d n, where x j = (x 1j,..., x dj ), j = 1, 2,..., n. Assume the columns of X are independent and identically distributed random vectors from a multivariate Gaussian distribution with mean zero and unknown covariance matrix Σ. The spiked covariance model considers a covariance matrix of the type Σ = diag(τ 1, τ 2,..., τ p, 1,..., 1), (1) with τ 1 τ 2 τ p > 1, for some 1 p < d. Addy Boĺıvar (CIMAT) PCA for the Spiked Covariance Model May 1, / 18
8 Some results Ahn, Marron, Muller and Chi (2007) showed that for p = 1 and Σ = diag(d α, 1,..., 1), with α > 1, it is possible to estimate very well the first population eigenvalue and eigenvector using PCA when d and n are sufficiently large and d n. For the case p 2, Jung and Marron (2009) showed the consistency of the first p largest sample eigenvalues under the assumption that each of the p largest population eigenvalues, τ 1 τ 2 τ p, has a different asymptotic order of magnitude when dimension increases, that is τ i d α i c i, as d, where α 1 > α 2 > > α p > 1 and c i > 0, i = 1, 2,..., p. Addy Boĺıvar (CIMAT) PCA for the Spiked Covariance Model May 1, / 18
9 They showed that if τ 1 τ 2 τ p are the p largest sample eigenvalues τ i w χ 2 n, as d, (2) n τ i for i = 1, 2,..., p, where χ 2 n is a Chi-square random variable with n degrees of freedom. Therefore, since χ 2 n/n w 1 as n the first p sample eigenvalues are consistent. They also show that in this case the corresponding sample eigenvectors are consistent. Addy Boĺıvar (CIMAT) PCA for the Spiked Covariance Model May 1, / 18
10 Purpose of this work To study the asymptotic behaviour of the sample eigenvalues and eigenvectors, when d, n are sufficiently large and d n, in the case of the spiked covariance model where the p largest population eigenvalues τ 1,..., τ p, p > 1, have the same asymptotic order of magnitude when dimension increases, i.e. α 1 = α 2 = = α p = α > 1 and c 1 = c 2 = = c p = c > 0. Addy Boĺıvar (CIMAT) PCA for the Spiked Covariance Model May 1, / 18
11 Under this last assumption, we first prove a multivariate extension of (2), namely ( τ 1 τ 1, τ 2 τ 2,..., τ p τ p ) w (cn) 1 (l 1, l 2,..., l p ) as d where l 1 l 2 l p are the nonzero eigenvalues of a Wishart random matrix with p degrees of freedom and n n covariance matrix ci n. Addy Boĺıvar (CIMAT) PCA for the Spiked Covariance Model May 1, / 18
12 Wishart Distribution Addy Boĺıvar (CIMAT) PCA for the Spiked Covariance Model May 1, / 18
13 Lemma 2.1 If l 1 l 2 l p are the nonzero eigenvalues of U W n (p, ci n ) with n > p and c > 0, then the joint density of (l 1, l 2,..., l p ) is given by f l1,l 2,...,l p (l 1, l 2,..., l p ) = α S p sign(α) exp( 1 2c with l 1 > l 2 > > l p, where p k=1 p l k ) k=1 l α k+(n p 1)/2 1 p+1 k, (3) = π p2 /2 (2c) pn/2 Γ p ( p 2 )Γ p( n (4) 2 ). Addy Boĺıvar (CIMAT) PCA for the Spiked Covariance Model May 1, / 18
14 Proposition 2.1 If l 1 l 2 l p are the nonzero eigenvalues of U W n (p, ci n ) with n > p and c > 0, then the characteristic function of (l 1, l 2,..., l p ) is given by ϕ l1,l 2,...,l p (t 1, t 2,..., t p ) = ( ) pn/2 2c p Ĝ( t j ; pn p 2, 2c p ) j=1 p 1 C n,p (k 1,..., k p 1 ) r=1 ( r p k 1=0 k p 1=0 Γ( pn p k j ) ) kr Ĝ( p j=1 t j; k r, 2c p ) Ĝ( p j=1 j=p+1 r t (5) j; k r, 2c ), where Ĝ(t; a, b) is the characteristic function of the gamma distribution Gamma(a, b) and C n,p (k 1,..., k p 1 ) = p 1 sign(α) α S p r=1 Γ( r j=1 α j + r(n p 1) 2 + r 1 j=1 k j) Γ( r j=1 α j + r(n p 1) 2 + r j=1 k j + 1). (6) r Addy Boĺıvar (CIMAT) PCA for the Spiked Covariance Model May 1, / 18
15 Theorem 3.1 Assume X as above with unknown covariance matrix given by the spiked covariance model (1) with p < n < d and where τ 1 τ 2 τ p > 1 have the same asymptotic order of magnitude, that is τ i = τ i (d), d α τ i (d) c as d for some c > 0 and α > 1, i = 1, 2,..., p. Let τ 1 τ 2 τ p be the p largest sample eigenvalues of the sample covariance matrix S = n 1 XX. Then ( τ 1 τ 1, τ 2 τ 2,..., τ p τ p ) w (cn) 1 (l 1, l 2,..., l p ), when d, where l 1 l 2 l p are the nonzero eigenvalues of a Wishart random matrix with distribution W n (p, ci n ). Addy Boĺıvar (CIMAT) PCA for the Spiked Covariance Model May 1, / 18
16 Proposition 3.1 Let c > 0 and l 1 l 2 l p be the nonzero eigenvalues of U W n (p, ci n ) with n > p. Then (cn) 1 (l 1, l 2,..., l p ) w (1, 1,..., 1), as n. Addy Boĺıvar (CIMAT) PCA for the Spiked Covariance Model May 1, / 18
17 Theorem 3.2 Under the assumptions of Theorem 3.1, we have that for all ɛ > 0 ( ĺım ĺım P ( τ 1, τ 2,..., τ ) p ) (1, 1,..., 1) > ɛ = 0. (7) n d τ 1 τ 2 τ p Addy Boĺıvar (CIMAT) PCA for the Spiked Covariance Model May 1, / 18
18 Theorem 3.3 Under the same conditions of Theorem 3.1 we have d α n( τ 1, τ 2,..., τ n ) w (l1, l 2,..., l p, 0,..., 0) as d, where l 1 l 2 l p are the nonzero eigenvalues of a Wishart random matrix with distribution W n (p, ci n ). Addy Boĺıvar (CIMAT) PCA for the Spiked Covariance Model May 1, / 18
19 We define vj [Proj span{ui :i J}v j ] Angle(v j, span{u i : i J}) = arc cos( v j Proj span{ui :i J}v j ) vj ( i J = arc cos( (u i v j )u i ) v j i J (u i v j )u i ). Following the definition given in Jung and Marron (2009), we say that the sample eigenvector v j, with j J, is subspace consistent if Angle(v j, span{u i : i J}) P 0. Addy Boĺıvar (CIMAT) PCA for the Spiked Covariance Model May 1, / 18
20 Theorem 3.4 Under the same conditions of Theorem 3.1, let v 1, v 2,..., v p be the sample eigenvectors corresponding to the first p sample eigenvalues τ 1 τ 2 τ p. Then if ɛ > 0 ĺım ĺım P(Angle(v i, span{e j : j = 1, 2,... p}) > ɛ) = 0, (8) n d for i = 1, 2,..., p. Addy Boĺıvar (CIMAT) PCA for the Spiked Covariance Model May 1, / 18
Principal component analysis and the asymptotic distribution of high-dimensional sample eigenvectors
Principal component analysis and the asymptotic distribution of high-dimensional sample eigenvectors Kristoffer Hellton Department of Mathematics, University of Oslo May 12, 2015 K. Hellton (UiO) Distribution
More informationAsymptotic Distribution of the Largest Eigenvalue via Geometric Representations of High-Dimension, Low-Sample-Size Data
Sri Lankan Journal of Applied Statistics (Special Issue) Modern Statistical Methodologies in the Cutting Edge of Science Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations
More informationSTT 843 Key to Homework 1 Spring 2018
STT 843 Key to Homework Spring 208 Due date: Feb 4, 208 42 (a Because σ = 2, σ 22 = and ρ 2 = 05, we have σ 2 = ρ 2 σ σ22 = 2/2 Then, the mean and covariance of the bivariate normal is µ = ( 0 2 and Σ
More informationThe Third International Workshop in Sequential Methodologies
Area C.6.1: Wednesday, June 15, 4:00pm Kazuyoshi Yata Institute of Mathematics, University of Tsukuba, Japan Effective PCA for large p, small n context with sample size determination In recent years, substantial
More information1 Data Arrays and Decompositions
1 Data Arrays and Decompositions 1.1 Variance Matrices and Eigenstructure Consider a p p positive definite and symmetric matrix V - a model parameter or a sample variance matrix. The eigenstructure is
More informationPrincipal Component Analysis
Principal Component Analysis Yuanzhen Shao MA 26500 Yuanzhen Shao PCA 1 / 13 Data as points in R n Assume that we have a collection of data in R n. x 11 x 21 x 12 S = {X 1 =., X x 22 2 =.,, X x m2 m =.
More informationLarge sample covariance matrices and the T 2 statistic
Large sample covariance matrices and the T 2 statistic EURANDOM, the Netherlands Joint work with W. Zhou Outline 1 2 Basic setting Let {X ij }, i, j =, be i.i.d. r.v. Write n s j = (X 1j,, X pj ) T and
More informationSliced Inverse Regression
Sliced Inverse Regression Ge Zhao gzz13@psu.edu Department of Statistics The Pennsylvania State University Outline Background of Sliced Inverse Regression (SIR) Dimension Reduction Definition of SIR Inversed
More informationAnnouncements (repeat) Principal Components Analysis
4/7/7 Announcements repeat Principal Components Analysis CS 5 Lecture #9 April 4 th, 7 PA4 is due Monday, April 7 th Test # will be Wednesday, April 9 th Test #3 is Monday, May 8 th at 8AM Just hour long
More informationMultivariate Analysis and Likelihood Inference
Multivariate Analysis and Likelihood Inference Outline 1 Joint Distribution of Random Variables 2 Principal Component Analysis (PCA) 3 Multivariate Normal Distribution 4 Likelihood Inference Joint density
More informationRandom Matrices and Multivariate Statistical Analysis
Random Matrices and Multivariate Statistical Analysis Iain Johnstone, Statistics, Stanford imj@stanford.edu SEA 06@MIT p.1 Agenda Classical multivariate techniques Principal Component Analysis Canonical
More informationABOUT PRINCIPAL COMPONENTS UNDER SINGULARITY
ABOUT PRINCIPAL COMPONENTS UNDER SINGULARITY José A. Díaz-García and Raúl Alberto Pérez-Agamez Comunicación Técnica No I-05-11/08-09-005 (PE/CIMAT) About principal components under singularity José A.
More informationNumerical Solutions to the General Marcenko Pastur Equation
Numerical Solutions to the General Marcenko Pastur Equation Miriam Huntley May 2, 23 Motivation: Data Analysis A frequent goal in real world data analysis is estimation of the covariance matrix of some
More informationConcentration Inequalities for Random Matrices
Concentration Inequalities for Random Matrices M. Ledoux Institut de Mathématiques de Toulouse, France exponential tail inequalities classical theme in probability and statistics quantify the asymptotic
More informationAnnouncements Monday, November 20
Announcements Monday, November 20 You already have your midterms! Course grades will be curved at the end of the semester. The percentage of A s, B s, and C s to be awarded depends on many factors, and
More informationMultivariate Linear Models
Multivariate Linear Models Stanley Sawyer Washington University November 7, 2001 1. Introduction. Suppose that we have n observations, each of which has d components. For example, we may have d measurements
More informationAsymptotic Statistics-VI. Changliang Zou
Asymptotic Statistics-VI Changliang Zou Kolmogorov-Smirnov distance Example (Kolmogorov-Smirnov confidence intervals) We know given α (0, 1), there is a well-defined d = d α,n such that, for any continuous
More informationSmall sample size in high dimensional space - minimum distance based classification.
Small sample size in high dimensional space - minimum distance based classification. Ewa Skubalska-Rafaj lowicz Institute of Computer Engineering, Automatics and Robotics, Department of Electronics, Wroc
More information1. Density and properties Brief outline 2. Sampling from multivariate normal and MLE 3. Sampling distribution and large sample behavior of X and S 4.
Multivariate normal distribution Reading: AMSA: pages 149-200 Multivariate Analysis, Spring 2016 Institute of Statistics, National Chiao Tung University March 1, 2016 1. Density and properties Brief outline
More informationConvergence of Eigenspaces in Kernel Principal Component Analysis
Convergence of Eigenspaces in Kernel Principal Component Analysis Shixin Wang Advanced machine learning April 19, 2016 Shixin Wang Convergence of Eigenspaces April 19, 2016 1 / 18 Outline 1 Motivation
More informationEstimation of large dimensional sparse covariance matrices
Estimation of large dimensional sparse covariance matrices Department of Statistics UC, Berkeley May 5, 2009 Sample covariance matrix and its eigenvalues Data: n p matrix X n (independent identically distributed)
More informationThe largest eigenvalues of the sample covariance matrix. in the heavy-tail case
The largest eigenvalues of the sample covariance matrix 1 in the heavy-tail case Thomas Mikosch University of Copenhagen Joint work with Richard A. Davis (Columbia NY), Johannes Heiny (Aarhus University)
More informationMultivariate Statistical Analysis
Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 9 for Applied Multivariate Analysis Outline Two sample T 2 test 1 Two sample T 2 test 2 Analogous to the univariate context, we
More informationPrincipal Component Analysis
CSci 5525: Machine Learning Dec 3, 2008 The Main Idea Given a dataset X = {x 1,..., x N } The Main Idea Given a dataset X = {x 1,..., x N } Find a low-dimensional linear projection The Main Idea Given
More informationJournal of Multivariate Analysis. Consistency of sparse PCA in High Dimension, Low Sample Size contexts
Journal of Multivariate Analysis 5 (03) 37 333 Contents lists available at SciVerse ScienceDirect Journal of Multivariate Analysis journal homepage: www.elsevier.com/locate/jmva Consistency of sparse PCA
More informationOn the principal components of sample covariance matrices
On the principal components of sample covariance matrices Alex Bloemendal Antti Knowles Horng-Tzer Yau Jun Yin February 4, 205 We introduce a class of M M sample covariance matrices Q which subsumes and
More informationNon white sample covariance matrices.
Non white sample covariance matrices. S. Péché, Université Grenoble 1, joint work with O. Ledoit, Uni. Zurich 17-21/05/2010, Université Marne la Vallée Workshop Probability and Geometry in High Dimensions
More informationRandom Matrix Theory Lecture 1 Introduction, Ensembles and Basic Laws. Symeon Chatzinotas February 11, 2013 Luxembourg
Random Matrix Theory Lecture 1 Introduction, Ensembles and Basic Laws Symeon Chatzinotas February 11, 2013 Luxembourg Outline 1. Random Matrix Theory 1. Definition 2. Applications 3. Asymptotics 2. Ensembles
More informationMultivariate Analysis Homework 1
Multivariate Analysis Homework A490970 Yi-Chen Zhang March 6, 08 4.. Consider a bivariate normal population with µ = 0, µ =, σ =, σ =, and ρ = 0.5. a Write out the bivariate normal density. b Write out
More informationStatistics and data analyses
Statistics and data analyses Designing experiments Measuring time Instrumental quality Precision Standard deviation depends on Number of measurements Detection quality Systematics and methology σ tot =
More informationElliptically Contoured Distributions
Elliptically Contoured Distributions Recall: if X N p µ, Σ), then { 1 f X x) = exp 1 } det πσ x µ) Σ 1 x µ) So f X x) depends on x only through x µ) Σ 1 x µ), and is therefore constant on the ellipsoidal
More informationLecture 3. Inference about multivariate normal distribution
Lecture 3. Inference about multivariate normal distribution 3.1 Point and Interval Estimation Let X 1,..., X n be i.i.d. N p (µ, Σ). We are interested in evaluation of the maximum likelihood estimates
More informationFundamentals of Unconstrained Optimization
dalmau@cimat.mx Centro de Investigación en Matemáticas CIMAT A.C. Mexico Enero 2016 Outline Introduction 1 Introduction 2 3 4 Optimization Problem min f (x) x Ω where f (x) is a real-valued function The
More informationVariational Principal Components
Variational Principal Components Christopher M. Bishop Microsoft Research 7 J. J. Thomson Avenue, Cambridge, CB3 0FB, U.K. cmbishop@microsoft.com http://research.microsoft.com/ cmbishop In Proceedings
More informationMS-E2112 Multivariate Statistical Analysis (5cr) Lecture 6: Bivariate Correspondence Analysis - part II
MS-E2112 Multivariate Statistical Analysis (5cr) Lecture 6: Bivariate Correspondence Analysis - part II the Contents the the the Independence The independence between variables x and y can be tested using.
More informationThe purpose of this section is to derive the asymptotic distribution of the Pearson chi-square statistic. k (n j np j ) 2. np j.
Chapter 9 Pearson s chi-square test 9. Null hypothesis asymptotics Let X, X 2, be independent from a multinomial(, p) distribution, where p is a k-vector with nonnegative entries that sum to one. That
More information04. Random Variables: Concepts
University of Rhode Island DigitalCommons@URI Nonequilibrium Statistical Physics Physics Course Materials 215 4. Random Variables: Concepts Gerhard Müller University of Rhode Island, gmuller@uri.edu Creative
More informationAsymptotic distribution of the largest eigenvalue with application to genetic data
Asymptotic distribution of the largest eigenvalue with application to genetic data Chong Wu University of Minnesota September 30, 2016 T32 Journal Club Chong Wu 1 / 25 Table of Contents 1 Background Gene-gene
More informationLecture 11. Multivariate Normal theory
10. Lecture 11. Multivariate Normal theory Lecture 11. Multivariate Normal theory 1 (1 1) 11. Multivariate Normal theory 11.1. Properties of means and covariances of vectors Properties of means and covariances
More informationApplications of random matrix theory to principal component analysis(pca)
Applications of random matrix theory to principal component analysis(pca) Jun Yin IAS, UW-Madison IAS, April-2014 Joint work with A. Knowles and H. T Yau. 1 Basic picture: Let H be a Wigner (symmetric)
More informationPart IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015
Part IB Statistics Theorems with proof Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly)
More informationAppendix 2. The Multivariate Normal. Thus surfaces of equal probability for MVN distributed vectors satisfy
Appendix 2 The Multivariate Normal Draft Version 1 December 2000, c Dec. 2000, B. Walsh and M. Lynch Please email any comments/corrections to: jbwalsh@u.arizona.edu THE MULTIVARIATE NORMAL DISTRIBUTION
More informationUser Guide for Hermir version 0.9: Toolbox for Synthesis of Multivariate Stationary Gaussian and non-gaussian Series
User Guide for Hermir version 0.9: Toolbox for Synthesis of Multivariate Stationary Gaussian and non-gaussian Series Hannes Helgason, Vladas Pipiras, and Patrice Abry June 2, 2011 Contents 1 Organization
More informationUnsupervised dimensionality reduction
Unsupervised dimensionality reduction Guillaume Obozinski Ecole des Ponts - ParisTech SOCN course 2014 Guillaume Obozinski Unsupervised dimensionality reduction 1/30 Outline 1 PCA 2 Kernel PCA 3 Multidimensional
More informationz = β βσβ Statistical Analysis of MV Data Example : µ=0 (Σ known) consider Y = β X~ N 1 (β µ, β Σβ) test statistic for H 0β is
Example X~N p (µ,σ); H 0 : µ=0 (Σ known) consider Y = β X~ N 1 (β µ, β Σβ) H 0β : β µ = 0 test statistic for H 0β is y z = β βσβ /n And reject H 0β if z β > c [suitable critical value] 301 Reject H 0 if
More informationUpper Bounds for Partitions into k-th Powers Elementary Methods
Int. J. Contemp. Math. Sciences, Vol. 4, 2009, no. 9, 433-438 Upper Bounds for Partitions into -th Powers Elementary Methods Rafael Jaimczu División Matemática, Universidad Nacional de Luján Buenos Aires,
More informationNonlinear Methods. Data often lies on or near a nonlinear low-dimensional curve aka manifold.
Nonlinear Methods Data often lies on or near a nonlinear low-dimensional curve aka manifold. 27 Laplacian Eigenmaps Linear methods Lower-dimensional linear projection that preserves distances between all
More informationSTAT 512 sp 2018 Summary Sheet
STAT 5 sp 08 Summary Sheet Karl B. Gregory Spring 08. Transformations of a random variable Let X be a rv with support X and let g be a function mapping X to Y with inverse mapping g (A = {x X : g(x A}
More informationOperator norm convergence for sequence of matrices and application to QIT
Operator norm convergence for sequence of matrices and application to QIT Benoît Collins University of Ottawa & AIMR, Tohoku University Cambridge, INI, October 15, 2013 Overview Overview Plan: 1. Norm
More informationTHE UNIVERSITY OF CHICAGO Graduate School of Business Business 41912, Spring Quarter 2008, Mr. Ruey S. Tsay. Solutions to Final Exam
THE UNIVERSITY OF CHICAGO Graduate School of Business Business 41912, Spring Quarter 2008, Mr. Ruey S. Tsay Solutions to Final Exam 1. (13 pts) Consider the monthly log returns, in percentages, of five
More informationHigh-dimensional two-sample tests under strongly spiked eigenvalue models
1 High-dimensional two-sample tests under strongly spiked eigenvalue models Makoto Aoshima and Kazuyoshi Yata University of Tsukuba Abstract: We consider a new two-sample test for high-dimensional data
More information1 Principal Components Analysis
Lecture 3 and 4 Sept. 18 and Sept.20-2006 Data Visualization STAT 442 / 890, CM 462 Lecture: Ali Ghodsi 1 Principal Components Analysis Principal components analysis (PCA) is a very popular technique for
More informationBinary Discrimination Methods for High Dimensional Data with a. Geometric Representation
Binary Discrimination Methos for High Dimensional Data with a Geometric Representation Ay Bolivar-Cime, Luis Miguel Corova-Roriguez Universia Juárez Autónoma e Tabasco, División Acaémica e Ciencias Básicas
More informationMS-E2112 Multivariate Statistical Analysis (5cr) Lecture 8: Canonical Correlation Analysis
MS-E2112 Multivariate Statistical (5cr) Lecture 8: Contents Canonical correlation analysis involves partition of variables into two vectors x and y. The aim is to find linear combinations α T x and β
More informationLecture 32: Asymptotic confidence sets and likelihoods
Lecture 32: Asymptotic confidence sets and likelihoods Asymptotic criterion In some problems, especially in nonparametric problems, it is difficult to find a reasonable confidence set with a given confidence
More informationOptimality and Sub-optimality of Principal Component Analysis for Spiked Random Matrices
Optimality and Sub-optimality of Principal Component Analysis for Spiked Random Matrices Alex Wein MIT Joint work with: Amelia Perry (MIT), Afonso Bandeira (Courant NYU), Ankur Moitra (MIT) 1 / 22 Random
More informationDependence. MFM Practitioner Module: Risk & Asset Allocation. John Dodson. September 11, Dependence. John Dodson. Outline.
MFM Practitioner Module: Risk & Asset Allocation September 11, 2013 Before we define dependence, it is useful to define Random variables X and Y are independent iff For all x, y. In particular, F (X,Y
More informationFace Recognition and Biometric Systems
The Eigenfaces method Plan of the lecture Principal Components Analysis main idea Feature extraction by PCA face recognition Eigenfaces training feature extraction Literature M.A.Turk, A.P.Pentland Face
More informationHigh-dimensional asymptotic expansions for the distributions of canonical correlations
Journal of Multivariate Analysis 100 2009) 231 242 Contents lists available at ScienceDirect Journal of Multivariate Analysis journal homepage: www.elsevier.com/locate/jmva High-dimensional asymptotic
More informationSecond-Order Inference for Gaussian Random Curves
Second-Order Inference for Gaussian Random Curves With Application to DNA Minicircles Victor Panaretos David Kraus John Maddocks Ecole Polytechnique Fédérale de Lausanne Panaretos, Kraus, Maddocks (EPFL)
More informationUniversität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen PCA. Tobias Scheffer
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen PCA Tobias Scheffer Overview Principal Component Analysis (PCA) Kernel-PCA Fisher Linear Discriminant Analysis t-sne 2 PCA: Motivation
More informationComputational functional genomics
Computational functional genomics (Spring 2005: Lecture 8) David K. Gifford (Adapted from a lecture by Tommi S. Jaakkola) MIT CSAIL Basic clustering methods hierarchical k means mixture models Multi variate
More information8.1 Concentration inequality for Gaussian random matrix (cont d)
MGMT 69: Topics in High-dimensional Data Analysis Falll 26 Lecture 8: Spectral clustering and Laplacian matrices Lecturer: Jiaming Xu Scribe: Hyun-Ju Oh and Taotao He, October 4, 26 Outline Concentration
More informationStatistical Machine Learning
Statistical Machine Learning Christoph Lampert Spring Semester 2015/2016 // Lecture 12 1 / 36 Unsupervised Learning Dimensionality Reduction 2 / 36 Dimensionality Reduction Given: data X = {x 1,..., x
More informationLecture 5: Moment Generating Functions
Lecture 5: Moment Generating Functions IB Paper 7: Probability and Statistics Carl Edward Rasmussen Department of Engineering, University of Cambridge February 28th, 2018 Rasmussen (CUED) Lecture 5: Moment
More informationSTA 2101/442 Assignment 3 1
STA 2101/442 Assignment 3 1 These questions are practice for the midterm and final exam, and are not to be handed in. 1. Suppose X 1,..., X n are a random sample from a distribution with mean µ and variance
More informationLearning representations
Learning representations Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 4/11/2016 General problem For a dataset of n signals X := [ x 1 x
More informationUniversity of Luxembourg. Master in Mathematics. Student project. Compressed sensing. Supervisor: Prof. I. Nourdin. Author: Lucien May
University of Luxembourg Master in Mathematics Student project Compressed sensing Author: Lucien May Supervisor: Prof. I. Nourdin Winter semester 2014 1 Introduction Let us consider an s-sparse vector
More informationStatistica Sinica Preprint No: SS R2
Statistica Sinica Preprint No: SS-2016-0063R2 Title Two-sample tests for high-dimension, strongly spiked eigenvalue models Manuscript ID SS-2016-0063R2 URL http://www.stat.sinica.edu.tw/statistica/ DOI
More informationSome New Properties of Wishart Distribution
Applied Mathematical Sciences, Vol., 008, no. 54, 673-68 Some New Properties of Wishart Distribution Evelina Veleva Rousse University A. Kanchev Department of Numerical Methods and Statistics 8 Studentska
More informationSingle Index Quantile Regression for Heteroscedastic Data
Single Index Quantile Regression for Heteroscedastic Data E. Christou M. G. Akritas Department of Statistics The Pennsylvania State University JSM, 2015 E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015
More informationDISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING. By T. Tony Cai and Linjun Zhang University of Pennsylvania
Submitted to the Annals of Statistics DISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING By T. Tony Cai and Linjun Zhang University of Pennsylvania We would like to congratulate the
More informationA note on profile likelihood for exponential tilt mixture models
Biometrika (2009), 96, 1,pp. 229 236 C 2009 Biometrika Trust Printed in Great Britain doi: 10.1093/biomet/asn059 Advance Access publication 22 January 2009 A note on profile likelihood for exponential
More informationSolutions to Review Problems for Chapter 6 ( ), 7.1
Solutions to Review Problems for Chapter (-, 7 The Final Exam is on Thursday, June,, : AM : AM at NESBITT Final Exam Breakdown Sections % -,7-9,- - % -9,-,7,-,-7 - % -, 7 - % Let u u and v Let x x x x,
More informationNonparametric Inference In Functional Data
Nonparametric Inference In Functional Data Zuofeng Shang Purdue University Joint work with Guang Cheng from Purdue Univ. An Example Consider the functional linear model: Y = α + where 1 0 X(t)β(t)dt +
More informationThe Gaussian distribution
The Gaussian distribution Probability density function: A continuous probability density function, px), satisfies the following properties:. The probability that x is between two points a and b b P a
More information1 Intro to RMT (Gene)
M705 Spring 2013 Summary for Week 2 1 Intro to RMT (Gene) (Also see the Anderson - Guionnet - Zeitouni book, pp.6-11(?) ) We start with two independent families of R.V.s, {Z i,j } 1 i
More informationHypothesis Testing. Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA
Hypothesis Testing Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA An Example Mardia et al. (979, p. ) reprint data from Frets (9) giving the length and breadth (in
More informationGaussian vectors and central limit theorem
Gaussian vectors and central limit theorem Samy Tindel Purdue University Probability Theory 2 - MA 539 Samy T. Gaussian vectors & CLT Probability Theory 1 / 86 Outline 1 Real Gaussian random variables
More informationAdvanced Calculus II Unit 7.3: 7.3.1a, 7.3.3a, 7.3.6b, 7.3.6f, 7.3.6h Unit 7.4: 7.4.1b, 7.4.1c, 7.4.2b, 7.4.3, 7.4.6, 7.4.7
Advanced Calculus II Unit 73: 73a, 733a, 736b, 736f, 736h Unit 74: 74b, 74c, 74b, 743, 746, 747 Megan Bryant October 9, 03 73a Prove the following: If lim p a = A, for some p >, then a converges absolutely
More informationChapter 4: Factor Analysis
Chapter 4: Factor Analysis In many studies, we may not be able to measure directly the variables of interest. We can merely collect data on other variables which may be related to the variables of interest.
More informationx. Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ 2 ).
.8.6 µ =, σ = 1 µ = 1, σ = 1 / µ =, σ =.. 3 1 1 3 x Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ ). The Gaussian distribution Probably the most-important distribution in all of statistics
More information5.1 Consistency of least squares estimates. We begin with a few consistency results that stand on their own and do not depend on normality.
88 Chapter 5 Distribution Theory In this chapter, we summarize the distributions related to the normal distribution that occur in linear models. Before turning to this general problem that assumes normal
More informationCovariance estimation using random matrix theory
Covariance estimation using random matrix theory Randolf Altmeyer joint work with Mathias Trabs Mathematical Statistics, Humboldt-Universität zu Berlin Statistics Mathematics and Applications, Fréjus 03
More informationRandom Variables and Their Distributions
Chapter 3 Random Variables and Their Distributions A random variable (r.v.) is a function that assigns one and only one numerical value to each simple event in an experiment. We will denote r.vs by capital
More informationPhysics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester
Physics 403 Parameter Estimation, Correlations, and Error Bars Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Best Estimates and Reliability
More informationLIST OF FORMULAS FOR STK1100 AND STK1110
LIST OF FORMULAS FOR STK1100 AND STK1110 (Version of 11. November 2015) 1. Probability Let A, B, A 1, A 2,..., B 1, B 2,... be events, that is, subsets of a sample space Ω. a) Axioms: A probability function
More informationSingle Index Quantile Regression for Heteroscedastic Data
Single Index Quantile Regression for Heteroscedastic Data E. Christou M. G. Akritas Department of Statistics The Pennsylvania State University SMAC, November 6, 2015 E. Christou, M. G. Akritas (PSU) SIQR
More informationSTA 2201/442 Assignment 2
STA 2201/442 Assignment 2 1. This is about how to simulate from a continuous univariate distribution. Let the random variable X have a continuous distribution with density f X (x) and cumulative distribution
More informationTopic 2: Probability & Distributions. Road Map Probability & Distributions. ECO220Y5Y: Quantitative Methods in Economics. Dr.
Topic 2: Probability & Distributions ECO220Y5Y: Quantitative Methods in Economics Dr. Nick Zammit University of Toronto Department of Economics Room KN3272 n.zammit utoronto.ca November 21, 2017 Dr. Nick
More informationI L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
Comparisons of Two Means Edps/Soc 584 and Psych 594 Applied Multivariate Statistics Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN c
More informationSolving Corrupted Quadratic Equations, Provably
Solving Corrupted Quadratic Equations, Provably Yuejie Chi London Workshop on Sparse Signal Processing September 206 Acknowledgement Joint work with Yuanxin Li (OSU), Huishuai Zhuang (Syracuse) and Yingbin
More informationJoint p.d.f. and Independent Random Variables
1 Joint p.d.f. and Independent Random Variables Let X and Y be two discrete r.v. s and let R be the corresponding space of X and Y. The joint p.d.f. of X = x and Y = y, denoted by f(x, y) = P(X = x, Y
More informationDependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline.
Practitioner Course: Portfolio Optimization September 10, 2008 Before we define dependence, it is useful to define Random variables X and Y are independent iff For all x, y. In particular, F (X,Y ) (x,
More informationOn prediction and density estimation Peter McCullagh University of Chicago December 2004
On prediction and density estimation Peter McCullagh University of Chicago December 2004 Summary Having observed the initial segment of a random sequence, subsequent values may be predicted by calculating
More informationRandom Vectors and Multivariate Normal Distributions
Chapter 3 Random Vectors and Multivariate Normal Distributions 3.1 Random vectors Definition 3.1.1. Random vector. Random vectors are vectors of random 75 variables. For instance, X = X 1 X 2., where each
More informationEigenvalues, Eigenvectors, and an Intro to PCA
Eigenvalues, Eigenvectors, and an Intro to PCA Eigenvalues, Eigenvectors, and an Intro to PCA Changing Basis We ve talked so far about re-writing our data using a new set of variables, or a new basis.
More informationInvertibility of random matrices
University of Michigan February 2011, Princeton University Origins of Random Matrix Theory Statistics (Wishart matrices) PCA of a multivariate Gaussian distribution. [Gaël Varoquaux s blog gael-varoquaux.info]
More informationEigenvalues, Eigenvectors, and an Intro to PCA
Eigenvalues, Eigenvectors, and an Intro to PCA Eigenvalues, Eigenvectors, and an Intro to PCA Changing Basis We ve talked so far about re-writing our data using a new set of variables, or a new basis.
More informationPROCESS MONITORING OF THREE TANK SYSTEM. Outline Introduction Automation system PCA method Process monitoring with T 2 and Q statistics Conclusions
PROCESS MONITORING OF THREE TANK SYSTEM Outline Introduction Automation system PCA method Process monitoring with T 2 and Q statistics Conclusions Introduction Monitoring system for the level and temperature
More information