Principal component analysis and the asymptotic distribution of high-dimensional sample eigenvectors
1 Principal component analysis and the asymptotic distribution of high-dimensional sample eigenvectors. Kristoffer Hellton, Department of Mathematics, University of Oslo. May 12, 2015.
2–3 ...and related topics. How are confidence distributions relevant for principal component analysis? Principal component analysis (PCA) deals with the distributions of sample eigenvalues, sample eigenvectors, and projections onto the sample eigenvector space.
4–5 Motivation. Principal component analysis (PCA) has become a widely used dimension-reduction technique in areas with high-dimensional data: genetics, signal processing, chemometrics, climate, finance, imaging... The method constructs a low-dimensional representation of each observation, which can be used to visualize high-dimensional data or as input in regression, clustering, or classification.
6 Example: Genetic markers, Novembre et al. (2008). [Figure.]
7 Population principal components. For a $p$-dimensional random variable $X = (X_1, \ldots, X_p)^T$ with expectation $E[X] = 0$ and covariance matrix $\mathrm{Var}(X) = \Sigma$, the principal components of $X$ are defined as the linear projections given by the eigenvectors $v_1, \ldots, v_p$ of $\Sigma$:
$$s_i^T = v_i^T X.$$
The variance of each projection will be the eigenvalue, $\mathrm{Var}(s_i) = \lambda_i$.
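As a minimal sketch (not from the slides), this definition can be checked numerically for a hand-picked $2 \times 2$ covariance matrix; the matrix `Sigma`, the sample size, and all variable names below are illustrative assumptions.

```python
import numpy as np

# Sketch: population principal components of a hand-picked covariance
# matrix Sigma, with Var(s_i) = lambda_i checked by simulation.
rng = np.random.default_rng(0)

Sigma = np.array([[4.0, 1.5],
                  [1.5, 1.0]])            # example covariance, p = 2

lam, V = np.linalg.eigh(Sigma)            # eigh: ascending eigenvalues
lam, V = lam[::-1], V[:, ::-1]            # reorder so lambda_1 >= lambda_2

# Draw X ~ N(0, Sigma) and project onto the eigenvectors: s_i = v_i^T X.
X = rng.multivariate_normal(np.zeros(2), Sigma, size=100_000).T   # p x n
S = V.T @ X                               # row i is the i-th principal component

print(lam)                                # population eigenvalues lambda_i
print(S.var(axis=1))                      # empirical Var(s_i), close to lambda_i
```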
8 [Figure: distribution ellipse in the $(X_1, X_2)$-plane with the eigenvector $v_1$ along its major axis.] For the normal distribution $X \sim N(0, \Sigma)$, the first eigenvector $v_1$ of $\Sigma$ corresponds to the major axis of the distribution ellipse.
9 Sample principal components. For a $p \times n$ data matrix $X = [x_1, \ldots, x_n]$, the principal components are given by the eigenvectors and eigenvalues of the sample covariance matrix
$$\hat{\Sigma} = \frac{1}{n-1} \sum_{j=1}^{n} (x_j - \bar{x})(x_j - \bar{x})^T,$$
denoted by $\hat{v}_1, \ldots, \hat{v}_p$ and $\hat{\lambda}_1, \ldots, \hat{\lambda}_p$, such that the $i$th sample principal component is given by $\hat{s}_i^T = \hat{v}_i^T X$, for $i = 1, \ldots, p$.
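A short sketch of this computation, following the slide's $p \times n$ convention; the toy data and names are my own.

```python
import numpy as np

# Sketch: sample principal components from the sample covariance matrix.
# Columns of X are observations, as on the slide.
rng = np.random.default_rng(1)
p, n = 5, 200
X = rng.standard_normal((p, n))           # toy data; any p x n matrix works

xbar = X.mean(axis=1, keepdims=True)
Sigma_hat = (X - xbar) @ (X - xbar).T / (n - 1)    # sample covariance

lam_hat, V_hat = np.linalg.eigh(Sigma_hat)
lam_hat, V_hat = lam_hat[::-1], V_hat[:, ::-1]     # descending eigenvalues

S_hat = V_hat.T @ X        # row i is the i-th sample principal component
```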
10–12 [Figure: data points in the $(X_1, X_2)$-plane with the fitted line $\hat{v}_1$.] The first sample eigenvector $\hat{v}_1$ will then be the line minimizing the sum of squares of the distances from the line to each data point.
13–14 Distribution of eigenvalues and eigenvectors. The exact and asymptotic distributions of the sample eigenvalues and eigenvectors were explored and established by James (1960, 1964) and Anderson (1963, 1965). If $X \sim N(0, \Sigma)$ and $\lambda_1 > \cdots > \lambda_p$, both the sample eigenvalues and eigenvectors are asymptotically normally distributed for fixed $p$ as $n \to \infty$:
$$\sqrt{n}\,(\hat{\lambda}_i - \lambda_i) \xrightarrow{d} N\big(0, 2\lambda_i^2\big), \qquad \sqrt{n}\,(\hat{v}_i - v_i) \xrightarrow{d} N\Big(0, \sum_{k=1, k \neq i}^{p} \frac{\lambda_i \lambda_k}{(\lambda_i - \lambda_k)^2}\, v_k v_k^T\Big).$$
Problems in the high-dimensional setting ($p \gg n$): the sample covariance matrix is rank deficient, and there is no obvious asymptotic framework.
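Before turning to these high-dimensional problems, the classical fixed-$p$ limit for the largest eigenvalue can be checked by simulation; this sketch, with parameters of my own choosing, compares the Monte Carlo variance of $\sqrt{n}(\hat{\lambda}_1 - \lambda_1)$ with the asymptotic value $2\lambda_1^2$.

```python
import numpy as np

# Sketch: Monte Carlo check of the fixed-p asymptotics on the slide,
# sqrt(n)(lambda1_hat - lambda_1) -> N(0, 2*lambda_1^2); names are mine.
rng = np.random.default_rng(2)
lam = np.array([4.0, 2.0, 1.0])           # distinct population eigenvalues
n, reps = 2000, 500

stats = []
for _ in range(reps):
    X = rng.standard_normal((n, 3)) * np.sqrt(lam)   # Sigma = diag(lam)
    Sigma_hat = X.T @ X / (n - 1)          # mean is known to be zero here
    lam1_hat = np.linalg.eigvalsh(Sigma_hat)[-1]     # largest eigenvalue
    stats.append(np.sqrt(n) * (lam1_hat - lam[0]))

print(np.var(stats), 2 * lam[0] ** 2)     # empirical vs asymptotic variance
```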
15–18 High-dimensional setting. When $p > n$, one assumes the $m$ first eigenvalues to be substantially larger than the remaining eigenvalues (the spiked covariance model):
$$\underbrace{\lambda_1 > \cdots > \lambda_m}_{\text{Signal}} \gg \underbrace{\lambda_{m+1} > \cdots > \lambda_p}_{\text{Noise}}.$$
Two possible asymptotic frameworks to work within:
1. Random matrix theory (Bai, Silverstein): for fixed population eigenvalues $\lambda_i$, let $n, p \to \infty$ such that $p/n \to \gamma > 0$.
2. High dimension low sample size (Marron, Jung): for fixed sample size $n$, let $p \to \infty$ as the $m$ eigenvalues scale as $\lambda_i = \sigma_i^2 p^\alpha$, $\alpha > 0$ (see the sketch after this list).
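A sketch of generating data from the spiked model with a single spike ($m = 1$) under the HDLSS scaling; the function name, defaults, and the choice $\Sigma = \mathrm{diag}(\lambda)$ are my assumptions.

```python
import numpy as np

# Sketch of HDLSS spiked-model data with m = 1:
# lambda_1 = sigma^2 * p^alpha, lambda_2 = ... = lambda_p = tau^2.
def spiked_sample(n, p, sigma2=1.0, tau2=1.0, alpha=1.0, rng=None):
    rng = rng or np.random.default_rng()
    lam = np.full(p, tau2)
    lam[0] = sigma2 * p**alpha            # the spike; v_1 is the first axis
    return rng.standard_normal((p, n)) * np.sqrt(lam)[:, None]

X = spiked_sample(n=20, p=1000)           # p >> n, as in the HDLSS framework
```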
19–20 HDLSS framework. The value $\alpha = 1$ in the high dimension low sample size framework is a special case (Jung, Sen & Marron, 2012). Simplest setting: for normally distributed data and $m = 1$,
$$\lambda_1 = \sigma^2 p, \qquad \lambda_2 = \cdots = \lambda_p = \tau^2,$$
the distribution of the first sample eigenvalue converges as $p \to \infty$:
$$\hat{\sigma}^2 = p^{-1} \hat{\lambda}_1 \xrightarrow{d} \sigma^2 \frac{\chi^2_n}{n} + \frac{\tau^2}{n}.$$
Here $(n\hat{\sigma}^2 - \tau^2)/\sigma^2$ will be a pivot, giving a confidence distribution for $\sigma^2$:
$$C(\sigma^2) = 1 - \Gamma_n\Big(\frac{n\hat{\sigma}^2 - \tau^2}{\sigma^2}\Big),$$
where $\Gamma_n(\cdot)$ is the cdf of $\chi^2_n$.
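A sketch of this confidence distribution in code, assuming $\tau^2$ is known; the function names, the grid, and the use of $|1 - 2C(\sigma^2)|$ as the plotted confidence curve are my assumptions, not taken from the slides.

```python
import numpy as np
from scipy.stats import chi2

# Sketch of the confidence distribution above:
# C(sigma^2) = 1 - Gamma_n((n*sigma2_hat - tau^2) / sigma^2), tau^2 known.
def confidence_distribution(sigma2_grid, lam1_hat, n, p, tau2):
    sigma2_hat = lam1_hat / p             # sigma2_hat = p^{-1} * lambda1_hat
    return 1.0 - chi2.cdf((n * sigma2_hat - tau2) / sigma2_grid, df=n)

# A confidence curve cc(sigma^2) = |1 - 2*C(sigma^2)| over a grid,
# as plotted on the next slide.
sigma2_grid = np.linspace(0.1, 5.0, 200)
C = confidence_distribution(sigma2_grid, lam1_hat=550.0, n=10, p=500, tau2=1.0)
cc = np.abs(1.0 - 2.0 * C)
```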
21–22 Confidence curve for the first eigenvalue. [Figures: confidence curves for $\sigma^2$ with $n = 100$, $p = 500$, and with $n = 10$, $p = 500$.]
23 Distribution of eigenvectors (Jung, Sen & Marron, 2012). For normally distributed data, $X \sim N(0, \Sigma)$, and the population eigenvalues $\lambda_1 = \sigma^2 p^\alpha$, $\lambda_2 = \cdots = \lambda_p = \tau^2$, the asymptotic distribution of the inner product between the first population eigenvector and the first sample eigenvector depends on $\alpha$ as $p \to \infty$:
$$\hat{v}_1^T v_1 \xrightarrow{d} \begin{cases} 1, & \alpha > 1, \\[2pt] \Big(1 + \dfrac{\tau^2}{\sigma^2 \chi^2_n}\Big)^{-1/2}, & \alpha = 1, \\[2pt] 0, & \alpha < 1. \end{cases}$$
Possible confidence distribution for the boundary case $\alpha = 1$?
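The boundary case $\alpha = 1$ can be illustrated by simulation; the Gram-matrix shortcut for $p \gg n$, the uncentered covariance (the mean is zero by construction), and all parameter choices below are mine.

```python
import numpy as np

# Sketch: simulate v1_hat^T v_1 at alpha = 1. One run gives one draw from
# the random limit (1 + tau^2/(sigma^2 * chi2_n))^(-1/2), strictly below 1.
rng = np.random.default_rng(3)
n, p, sigma2, tau2 = 10, 20_000, 1.0, 1.0

lam = np.full(p, tau2)
lam[0] = sigma2 * p                        # alpha = 1 spike, so v_1 = e_1
X = rng.standard_normal((p, n)) * np.sqrt(lam)[:, None]

# For p >> n, get v1_hat from the n x n Gram matrix instead of the
# p x p covariance matrix.
G = X.T @ X
_, W = np.linalg.eigh(G)
v1_hat = X @ W[:, -1]                      # eigenvector of the largest eigenvalue
v1_hat /= np.linalg.norm(v1_hat)

print(abs(v1_hat[0]))                      # |v1_hat^T v_1|, a draw from the limit
```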
24 Asymptotic inconsistency. [Figure: $\hat{v}_1$ and $v_1$ pointing in different directions in the $(X_1, X_2)$-plane.] In the high-dimensional setting, the sample eigenvectors do not, in general, converge to the population eigenvectors (Johnstone and Lu, 2009; Paul, 2007), one motivation behind sparse PCA. But if the sample eigenvectors are not consistent, how can classical PCA be a successful method in high-dimensional applications?
25 Distribution of sample projections. A possible answer can be given by the distribution of projections onto the sample eigenvector space. For the observations $X = [x_1, \ldots, x_n]$, the population and sample normalized principal component projections are defined as
$$z_j^T = \frac{v_j^T X}{\sqrt{\lambda_j}}, \qquad \hat{z}_j^T = \frac{\hat{v}_j^T X}{\sqrt{\hat{\lambda}_j}}.$$
These sample projections are also not consistent, just as the eigenvectors, BUT...
26 Theorem (Hellton and Thoresen, 2015). For $X \sim N(0, \Sigma)$ and the $m$ first eigenvalues $\lambda_1 = \sigma_1^2 p, \ldots, \lambda_m = \sigma_m^2 p$, the joint distribution of the $m$ first sample projections of the $j$th observation converges, as $p \to \infty$, to
$$\begin{pmatrix} \hat{z}_{j1} \\ \vdots \\ \hat{z}_{jm} \end{pmatrix} \xrightarrow{d} \underbrace{\begin{pmatrix} \sqrt{n/d_1} & & 0 \\ & \ddots & \\ 0 & & \sqrt{n/d_m} \end{pmatrix}}_{\text{Scaling}} \underbrace{\begin{pmatrix} u_1 & \cdots & u_m \end{pmatrix}^T}_{\text{Rotation}} \begin{pmatrix} \sigma_1 z_{j1} \\ \vdots \\ \sigma_m z_{jm} \end{pmatrix}$$
for $j = 1, \ldots, n$, where $d_i$ and $u_i$ are the $i$th eigenvalue and eigenvector of an $m \times m$ Wishart distributed matrix, $W \sim \text{Wishart}\big(n, \text{diag}(\sigma_1^2, \ldots, \sigma_m^2)\big)$.
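A sketch of drawing from this limit; reading the rotation as the transposed matrix of Wishart eigenvectors is my interpretation of the reconstructed display, and the parameter values are mine.

```python
import numpy as np
from scipy.stats import wishart

# Sketch of the scaling-rotation limit in the theorem: draw
# W ~ Wishart(n, diag(sigma_1^2, ..., sigma_m^2)), take its eigenpairs
# (d_i, u_i), and map population projections through scaling and rotation.
rng = np.random.default_rng(4)
n, m = 30, 2
sigma2 = np.array([3.0, 1.0])              # sigma_i^2 for the m spikes

W = wishart.rvs(df=n, scale=np.diag(sigma2), random_state=rng)
d, U = np.linalg.eigh(W)
d, U = d[::-1], U[:, ::-1]                 # descending: d_1 >= ... >= d_m

z = rng.standard_normal(m)                 # population projections, N(0, I_m)
scaling = np.diag(np.sqrt(n / d))          # diag(sqrt(n/d_i))
z_hat = scaling @ U.T @ (np.sqrt(sigma2) * z)   # limiting sample projections
```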
27–29 The vector of the first $m$ sample projections will be a rotated and scaled version, in $m$ dimensions, of the corresponding population projections. However, for the purpose of visualizing data one would plot pairs of sample projections in two dimensions. Simulations show that for moderate sample sizes, a two-dimensional representation of the sample projections will also be a scaled and rotated version of the population projections.
Implications for visualization: as the difference between the sample and population projections is only a scaling and rotation, the relative positions and the visual content will remain the same.
30 [Figure: different behavior seen in two samples (black dots) compared to their populations (circles); panel captions: "Scaling: 0.91 in x, 0.92 in y, Rotation: 3.7 degrees" and "Scaling: 0.99 in x, 0.93 in y, Rotation: 27.1 degrees".] Even though eigenvectors and projections are not consistent, the visual information of the population projections is conserved in the estimated projections.
31 Summing up. In connection with principal component analysis in high dimensions, there is a growing number of results on the distributions of sample eigenvalues, sample eigenvectors, and sample projections. Here, confidence distributions can play a role.
32 Thank you!