Independent component analysis for functional data


Hannu Oja
Department of Mathematics and Statistics, University of Turku
Version: August 2016

Outline

1. Probability space, random variables, random functions
2. Affine (linear) transformations and linear operators
3. Descriptive statistics for random vectors and functions
4. Independent component analysis (ICA)
5. Real data analysis
6. References

Acknowledgement

The presentation is very much based on the paper

Li, G., Van Bever, G., Oja, H., Sabolova, R. and Critchley, F. (2016). Functional independent component analysis: an extension of fourth-order blind identification. Submitted.

See also

Peña, D., Prieto, J. and Rendón, C. (2014). Independent components techniques based on kurtosis for functional data analysis. Working Paper 14-1, Statistics and Econometrics Series (6).

Prologue: PCA vs. ICA in R^p

Let X ∈ R^p be a random vector and assume that E(X) = 0.

In principal component analysis (PCA), one finds a transformation matrix A ∈ R^{p×p} such that
AA' = I_p and A E(XX') A' = D,
where D is a diagonal matrix with diagonal elements d_1 ≥ ... ≥ d_p.

In independent component analysis (ICA), fourth-order blind identification (FOBI) finds a transformation matrix A ∈ R^{p×p} such that
A E(XX') A' = I_p and A E(XX' E(XX')^{-1} XX') A' = D,
where the diagonal elements of D are ordered according to marginal kurtosis: d_1 ≥ ... ≥ d_p.

Idea: Transform X → AX. Some of the components present signal, some present noise!
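To make the FOBI transform above concrete, here is a minimal NumPy sketch (an illustration, not the authors' implementation; the function name fobi and the toy sources are made up). It uses the standard equivalent formulation: whiten the centred data with Cov(X)^{-1/2}, compute the fourth-moment matrix of the whitened data, and rotate with its eigenvectors.

```python
import numpy as np

def fobi(X):
    """FOBI sketch: rows of X are observations, columns are variables.
    Returns the unmixing matrix A (components ordered by decreasing kurtosis)
    and the component scores X @ A.T."""
    X = X - X.mean(axis=0)                     # centre, so that E(X) = 0
    d, U = np.linalg.eigh(np.cov(X, rowvar=False))
    cov_isqrt = U @ np.diag(d ** -0.5) @ U.T   # symmetric inverse square root of Cov(X)
    Y = X @ cov_isqrt                          # whitened data, Cov(Y) = I_p
    r2 = np.sum(Y ** 2, axis=1)                # squared Mahalanobis distances
    cov4 = (r2[:, None] * Y).T @ Y / len(Y)    # fourth-moment matrix E(R^2 YY')
    lam, V = np.linalg.eigh(cov4)
    A = V[:, np.argsort(lam)[::-1]].T @ cov_isqrt   # A Cov A' = I_p, A Cov4 A' diagonal
    return A, X @ A.T

# toy use: two independent non-Gaussian sources, linearly mixed
rng = np.random.default_rng(0)
S = np.column_stack([rng.exponential(size=1000), rng.uniform(-1, 1, size=1000)])
X = S @ np.array([[2.0, 1.0], [1.0, 3.0]])
A, Z = fobi(X)                                 # Z recovers the sources up to order, sign and scale
```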

Iris data - PCA

Figure: Classical Iris data with three species: left panel the original variables (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width), right panel the principal components.

Iris data - ICA

Figure: Classical Iris data with three species: left panel the original variables, right panel the independent components.

Toy data I - PCA

Figure: Left panel the original variables (V1-V6), right panel the principal components.

Toy data I - ICA

Figure: Left panel the original variables, right panel the independent components.

Toy data II - PCA

Figure: Left panel the original variables (V1-V6), right panel the principal components.

Toy data II - ICA

Figure: Left panel the original variables, right panel the independent components.

Toy data III - PCA

Figure: Left panel the original variables (var 1-var 5), right panel the principal components (Comp.1-Comp.5).

Toy data III - ICA

Figure: Left panel the original variables, right panel the independent components (IC.1-IC.5).

Probability space, random variables, random functions

Probability space

Let (Ω, F, P) be a probability space for a random experiment that is planned to collect functional data. Functional observations are assumed to belong to H = L²(0, 1), a separable Hilbert space with the scalar product
⟨f, g⟩ = ∫₀¹ f(t) g(t) dt
and norm ‖f‖ = ⟨f, f⟩^{1/2}.

Let B be the Borel σ-field generated by the open sets in H. A random function on H is then a mapping X: Ω → H that is F/B-measurable.

Basis functions

In the separable Hilbert space H = L²(0, 1) there exists an orthonormal basis (f_j), that is, ⟨f_j, f_k⟩ = δ_jk, such that
H = {x : x = Σ_{j=1}^∞ x_j f_j} ⊃ H_K = {x : x = Σ_{j=1}^K x_j f_j}.

In practice, one chooses orthonormal basis functions (g_j) such as (i) Fourier basis functions, (ii) spline functions, (iii) wavelets, (iv) basis functions based on kernel smoothing of the data functions, etc. The data analysis is then made in the space
H = {x : x = Σ_{j=1}^∞ x_j g_j}
or in the truncated space
H_K = {x : x = Σ_{j=1}^K x_j g_j}.
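As a concrete illustration of the basis expansion, the NumPy sketch below projects curves observed on a regular grid onto the first K orthonormal Fourier basis functions on (0, 1). The helper names (fourier_basis, project) and the simple Riemann-sum approximation of the inner products are choices of this example, not part of the slides.

```python
import numpy as np

def fourier_basis(t, K):
    """First K orthonormal Fourier basis functions on (0, 1), evaluated at the grid t."""
    B = [np.ones_like(t)]                      # constant function g_1 = 1
    j = 1
    while len(B) < K:
        B.append(np.sqrt(2) * np.sin(2 * np.pi * j * t))
        if len(B) < K:
            B.append(np.sqrt(2) * np.cos(2 * np.pi * j * t))
        j += 1
    return np.column_stack(B)                  # shape (len(t), K)

def project(curves, t, K):
    """Coefficients x_ij ~ <X_i, g_j>, one row per curve, via a Riemann sum."""
    G = fourier_basis(t, K)
    dt = t[1] - t[0]                           # equally spaced grid assumed
    return curves @ G * dt                     # shape (n, K)

# toy use: five "observed" curves on a grid of 200 points, truncated at K = 7
t = np.linspace(0, 1, 200)
curves = np.sin(2 * np.pi * np.outer(np.arange(1, 6), t))
coef = project(curves, t, K=7)
```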

Random vectors and random functions

We consider three types of random elements, namely random variables, that is, mappings X: Ω → R, random vectors, that is, mappings X: Ω → R^p, and random functions, that is, mappings X: Ω → H, all F/B-measurable with respect to the relevant Borel σ-fields.

In our experiment (Ω, F, P), to estimate population quantities, we collect random samples, that is, iid observations X_1, ..., X_n: measurable mappings X_i: Ω → R, i = 1, ..., n, measurable mappings X_i: Ω → R^p, i = 1, ..., n, or measurable mappings X_i: Ω → H, i = 1, ..., n.

Affine (linear) transformations and linear operators

Affine transformations for random variables

For a univariate random variable, linear transformations are given by X → Y = aX + b, a, b ∈ R. For a p-variate random vector X ∈ R^p, affine (or linear) transformations are given by X → Y = AX + b, A ∈ R^{k×p}, b ∈ R^k.

Some important square matrices often used for affine transformations:
- Centering matrix A = I_p − (1/p) 1_p 1_p'.
- Rescaling matrix: a diagonal matrix with positive diagonal elements.
- Orthogonal matrix A satisfying A'A = AA' = I_p. Sign-change and permutation matrices are orthogonal.
- Projection matrix A satisfying A = A' and A² = A. Then A = UU' where U ∈ R^{p×k} has orthonormal columns.
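A small NumPy check of two of the special matrices listed above; the random orthonormal U is only an example.

```python
import numpy as np

p, k = 5, 2
C = np.eye(p) - np.ones((p, p)) / p            # centering matrix I_p - (1/p) 1_p 1_p'

# projection matrix built from k orthonormal columns: A = U U'
U, _ = np.linalg.qr(np.random.default_rng(1).standard_normal((p, k)))
A = U @ U.T

assert np.allclose(C @ np.ones(p), 0)                  # centering removes the constant vector
assert np.allclose(A, A.T) and np.allclose(A @ A, A)   # A is symmetric and idempotent
```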

Eigenvector-eigenvalue decomposition

Let A ∈ R^{p×p} be symmetric, e.g. a covariance matrix. Then one can write the eigenvector-eigenvalue decomposition
A = UDU' = Σ_{i=1}^p d_i u_i u_i',
where U = (u_1 ... u_p) is an orthogonal matrix and D is a diagonal matrix with ordered diagonal elements d_1 ≥ d_2 ≥ ... ≥ d_p. The columns of U = (u_1 ... u_p) are called the eigenvectors and the diagonal elements d_1, ..., d_p the corresponding eigenvalues of A.

Note that det(A) = Π_{i=1}^p d_i, tr(A) = Σ_{i=1}^p d_i and ‖A‖² = Σ_{i=1}^p d_i².

A symmetric matrix A ∈ R^{p×p} is positive (non-negative) definite if the smallest eigenvalue d_p > 0 (≥ 0); then b'Ab > 0 (≥ 0) for all nonzero b ∈ R^p. For a symmetric matrix A = Σ_{i=1}^p d_i u_i u_i' with non-zero eigenvalues, the inverse is A^{-1} = Σ_{i=1}^p d_i^{-1} u_i u_i'. For a p×p symmetric matrix A, a square root matrix A^{1/2} is any matrix that satisfies A^{1/2}(A^{1/2})' = A. Often a symmetric version of A^{1/2} is chosen, for example A^{1/2} = Σ_{i=1}^p d_i^{1/2} u_i u_i'.
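A hedged NumPy sketch of how the inverse and a symmetric square root follow from the eigendecomposition; the helper sym_power is illustrative and assumes a symmetric positive definite input.

```python
import numpy as np

def sym_power(A, alpha):
    """A^alpha for a symmetric positive definite matrix, via A = U D U'."""
    d, U = np.linalg.eigh(A)
    return U @ np.diag(d ** alpha) @ U.T

rng = np.random.default_rng(2)
S = np.cov(rng.standard_normal((100, 4)), rowvar=False)   # a symmetric positive definite matrix

S_inv = sym_power(S, -1.0)                    # A^{-1} = sum_i d_i^{-1} u_i u_i'
S_half = sym_power(S, 0.5)                    # symmetric square root
assert np.allclose(S_half @ S_half, S)
assert np.allclose(S_inv, np.linalg.inv(S))
assert np.isclose(np.linalg.det(S), np.prod(np.linalg.eigvalsh(S)))   # det(A) = prod d_i
```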

Singular value decomposition (SVD)

Next let A be any k×p matrix and assume that k ≤ p. Then it can be written (SVD) as A = UDV', where U ∈ R^{k×k} is orthogonal, V ∈ R^{p×k} has orthonormal columns (V'V = I_k) and D ∈ R^{k×k} is a diagonal matrix with diagonal elements d_1 ≥ ... ≥ d_k ≥ 0.

Then AA' = UD²U', A'A = VD²V' and ‖A‖² = tr(AA') = Σ_{i=1}^k d_i².

If U = (u_1, ..., u_k) and V = (v_1, ..., v_k) then A = Σ_{i=1}^k d_i u_i v_i' and
x → y = Ax = Σ_{i=1}^k d_i ⟨v_i, x⟩ u_i
may be seen as a change of the coordinate system with componentwise rescalings. For a projection matrix A ∈ R^{p×p} with rank k (p − k zero eigenvalues), we get A = UU' = Σ_{i=1}^k u_i u_i' with some U ∈ R^{p×k} with orthonormal columns.
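A quick NumPy illustration of the SVD identities above, using the thin decomposition of a k×p matrix with k ≤ p:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 6))                            # k = 3, p = 6

U, d, Vt = np.linalg.svd(A, full_matrices=False)           # A = U diag(d) V', with V' of size k x p
assert np.allclose(A, U @ np.diag(d) @ Vt)
assert np.allclose(A @ A.T, U @ np.diag(d ** 2) @ U.T)     # AA' = U D^2 U'
assert np.isclose(np.sum(A ** 2), np.sum(d ** 2))          # ||A||^2 = tr(AA') = sum d_i^2
```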

Linear operators for functional data I

In functional data analysis, we speak about linear operators rather than matrices. Let L be the set of bounded linear operators A: H → H with norm
‖A‖_L = sup_{‖x‖=1} ‖Ax‖, A ∈ L.
(Linearity means that A(ax + by) = aAx + bAy for a, b ∈ R and x, y ∈ H.)

An integral operator L with real kernel K: I × I → R, given by
(Lf)(t) = ∫₀¹ K(t, s) f(s) ds, f ∈ H,
is a linear operator.

The adjoint operator A* of A ∈ L is uniquely defined through ⟨Af, g⟩ = ⟨f, A*g⟩, f, g ∈ H. An operator A ∈ L is symmetric if A* = A, that is, if ⟨Ax, y⟩ = ⟨x, Ay⟩, x, y ∈ H, and positive definite if ⟨Ax, x⟩ ≥ 0, x ∈ H. An operator U is unitary if UU* = U*U = I, the identity operator.

Linear operators for functional data II

The class of Hilbert-Schmidt operators is particularly interesting for our purpose: a linear operator A: H → H is Hilbert-Schmidt if it allows a singular value decomposition (SVD)
Ax = Σ_{j=1}^∞ λ_j ⟨g_j, x⟩ f_j
with two orthonormal bases (g_j) and (f_j) and a sequence (λ_j) of non-negative real numbers such that Σ_{j=1}^∞ λ_j² < ∞.

Operator A is thus an infinite weighted sum of elemental tensor product operators f ⊗ e: H → H given by (f ⊗ e)x = ⟨e, x⟩ f. For min(λ_1, ..., λ_k) > 0, the operators A = Σ_{j=1}^k λ_j (f_j ⊗ g_j): H → H give linear transformations onto the finite-dimensional subspace spanned by the orthonormal functions f_1, ..., f_k. Note also that the projection onto the subspace spanned by orthonormal f_1, ..., f_k is A = Σ_{j=1}^k (f_j ⊗ f_j).

For f, g, h ∈ H and A, B ∈ L,
(g ⊗ f)* = f ⊗ g, (Ag) ⊗ (Bf) = A(g ⊗ f)B* and (h ⊗ g)(g ⊗ f) = ‖g‖² (h ⊗ f).
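In the coordinates of a fixed orthonormal basis, these tensor product operators act on coefficient vectors as outer products. The short sketch below (with made-up coefficient vectors) checks the identity (h ⊗ g)(g ⊗ f) = ‖g‖²(h ⊗ f) in that finite-dimensional representation.

```python
import numpy as np

def tensor(f, e):
    """Matrix of the tensor product operator f (tensor) e, i.e. x -> <e, x> f,
    when functions are represented by their basis coefficient vectors."""
    return np.outer(f, e)

rng = np.random.default_rng(4)
f, g, h = rng.standard_normal((3, 6))          # coefficient vectors in a 6-dimensional truncation

lhs = tensor(h, g) @ tensor(g, f)              # (h tensor g)(g tensor f)
rhs = np.dot(g, g) * tensor(h, f)              # ||g||^2 (h tensor f)
assert np.allclose(lhs, rhs)
assert np.allclose(tensor(g, f).T, tensor(f, g))   # adjoint: (g tensor f)* = f tensor g
```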

Descriptive statistics for random vectors and functions

Moments of random variables

Let X ∈ R be a univariate random variable with expected value μ = E(X) and variance σ² = Var(X). Write R = (X − μ)/σ for the standardized variable, that is, the standardized (signed) distance between X and μ. Assume also that E(X⁴) < ∞. Then the classical measures of (univariate) location, scale, skewness and kurtosis are E(X), Var(X), E(R³) and E(R⁴). Kurtosis is then the ratio of the two scale measures Var(RX) and Var(X), and may be seen through the comparison of E((X − E(X))⁴) and E((X − E(X))²).

Let then X_1, ..., X_n be a random sample from the distribution of X. Natural estimates of μ and σ² are μ̂ = (1/n) Σ_{i=1}^n X_i and σ̂² = (1/n) Σ_{i=1}^n (X_i − μ̂)². If R̂_i = (X_i − μ̂)/σ̂, i = 1, ..., n, then (1/n) Σ_{i=1}^n R̂_i³ and (1/n) Σ_{i=1}^n R̂_i⁴ can be used to test for normality, for example.
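A minimal sketch of the sample versions in NumPy; the exponential sample is only meant to show the skewness and kurtosis moving away from the Gaussian reference values 0 and 3.

```python
import numpy as np

def skew_kurt(x):
    """Sample skewness E(R^3) and kurtosis E(R^4), with R = (X - mu)/sigma (1/n versions)."""
    r = (x - x.mean()) / x.std()
    return (r ** 3).mean(), (r ** 4).mean()

rng = np.random.default_rng(5)
print(skew_kurt(rng.standard_normal(10_000)))   # close to (0, 3) for a normal sample
print(skew_kurt(rng.exponential(size=10_000)))  # skewed and heavier tailed: close to (2, 9)
```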

Moments of random vectors...

Let next X ∈ R^p be a random vector with bounded fourth moments, mean vector μ = E(X) ∈ R^p and positive definite covariance matrix Σ = Cov(X) ∈ R^{p×p}. Write now R² = (X − μ)'Σ^{-1}(X − μ) for the standardized (squared Mahalanobis) distance between X and μ.

More generally, location and scatter functionals T(X) ∈ R^p and C(X) ∈ R^{p×p} are functionals that are affine equivariant in the sense that
T(AX + b) = A T(X) + b and C(AX + b) = A C(X) A'
for all X, full-rank A ∈ R^{p×p} and b ∈ R^p. One-step M-estimates T_w(X) = E(w(R²)X)/E(w(R²)) and C_w(X) = Cov(w(R²)X) are then often used to find robust location and scatter functionals.

Kurtosis studies are often made using the functionals C(X) := E(((X − E(X))(X − E(X))')²) and Cov(X) = E((X − E(X))(X − E(X))'). Note that C(X) is not affine equivariant, but an affine equivariant version is obtained through
Cov₄(X) := Cov(R²X) = Cov(X)^{1/2} C(Cov(X)^{-1/2} X) Cov(X)^{1/2}.
These two matrices are also used for the solution of the independent component analysis (ICA) problem, as the eigenvectors and eigenvalues of Cov(X)^{-1} Cov₄(X) give both directional and quantitative information about the kurtosis.

... and their estimates when n > p + 1

Let then X_1, ..., X_n be a random sample from the distribution of X. Natural estimates of μ and Σ are μ̂ = (1/n) Σ_{i=1}^n X_i and Σ̂ = (1/n) Σ_{i=1}^n (X_i − μ̂)(X_i − μ̂)'. If R̂_i² = (X_i − μ̂)'Σ̂^{-1}(X_i − μ̂), i = 1, ..., n, then one-step M-estimates are obtained as
Σ_{i=1}^n w_T(R̂_i²) X_i / Σ_{i=1}^n w_T(R̂_i²) and (1/n) Σ_{i=1}^n w_S(R̂_i²)(X_i − μ̂)(X_i − μ̂)'.
Then
Ĉov₄(X) = (1/n) Σ_{i=1}^n R̂_i² (X_i − μ̂)(X_i − μ̂)'
or, equivalently,
Ĉov₄(X) = Σ̂^{1/2} [ (1/n) Σ_{i=1}^n (Σ̂^{-1/2}(X_i − μ̂)(X_i − μ̂)'Σ̂^{-1/2})² ] Σ̂^{1/2}.
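A numerical check, in NumPy, of the claimed equivalence of the two sample Ĉov₄ formulas; the 1/n convention and the toy data are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.standard_normal((300, 4)) ** 3          # some non-Gaussian data
n, p = X.shape

mu = X.mean(axis=0)
Xc = X - mu
S = Xc.T @ Xc / n                               # Sigma-hat (1/n version)
d, U = np.linalg.eigh(S)
S_sqrt = U @ np.diag(d ** 0.5) @ U.T
S_isqrt = U @ np.diag(d ** -0.5) @ U.T

# first form: (1/n) sum_i R_i^2 (X_i - mu)(X_i - mu)'
r2 = np.einsum('ij,jk,ik->i', Xc, np.linalg.inv(S), Xc)
cov4_a = (r2[:, None] * Xc).T @ Xc / n

# second form: Sigma^{1/2} [ (1/n) sum_i (Sigma^{-1/2}(X_i - mu)(X_i - mu)' Sigma^{-1/2})^2 ] Sigma^{1/2}
M = sum(np.linalg.matrix_power(S_isqrt @ np.outer(x, x) @ S_isqrt, 2) for x in Xc) / n
cov4_b = S_sqrt @ M @ S_sqrt

assert np.allclose(cov4_a, cov4_b)
```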

... and their estimates when n ≤ p + 1

Tyler (2010) has shown that, if n ≤ p + 1, all location statistics (with the affine equivariance property) are equal to the sample mean vector, and all scatter statistics (with the affine equivariance property) are proportional to the sample covariance matrix. The rank of the covariance matrix is n − 1. For n = p + 1, Σ̂ is still invertible and a maximal invariant statistic under affine transformations, that is, the matrix ((X_i − μ̂)'Σ̂^{-1}(X_j − μ̂)) is a constant. Therefore the requirement of affine equivariance/invariance is not sensible for n ≤ p + 1!

If the matrix of eigenvectors of Cov(X) is decomposed as U = (U_1, U_2), where U_1 ∈ R^{p×k}, k < n, and C(X) is a robust scatter matrix, then a robust, orthogonally equivariant functional can be defined as
U_1 C(U_1'X) U_1' + U_2 Cov(U_2'X) U_2',
and this can be applied to the data when k < n ≤ p + 1. The idea is then that the outliers are hiding in the subspace given by the k first eigenvectors of Cov(X).

Kurtosis studies for X are then made using, for example, Cov(X) and
Cov₄,k(X) = U_1 Cov₄(U_1'X) U_1' + U_2 Cov(U_2'X) U_2'.
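A hedged NumPy sketch of the orthogonally equivariant functional U_1 C(U_1'X) U_1' + U_2 Cov(U_2'X) U_2' in the n ≤ p + 1 setting. The names hybrid_scatter and weighted_cov are made up, and the reweighted covariance with weights 1/(1 + r²) is only a stand-in for a robust scatter C, not the choice made on the slides.

```python
import numpy as np

def weighted_cov(Y):
    """Hypothetical robust scatter: covariance reweighted by w(r^2) = 1/(1 + r^2)."""
    Yc = Y - Y.mean(axis=0)
    r2 = np.einsum('ij,jk,ik->i', Yc, np.linalg.inv(np.cov(Y, rowvar=False)), Yc)
    w = 1.0 / (1.0 + r2)
    return (w[:, None] * Yc).T @ Yc / w.sum()

def hybrid_scatter(X, k, robust_scatter=weighted_cov):
    """U1 C(U1'X) U1' + U2 Cov(U2'X) U2' with U from the eigenvectors of Cov(X)."""
    Xc = X - X.mean(axis=0)
    d, U = np.linalg.eigh(np.cov(Xc, rowvar=False))
    U = U[:, np.argsort(d)[::-1]]              # eigenvectors, leading ones first
    U1, U2 = U[:, :k], U[:, k:]
    C1 = robust_scatter(Xc @ U1)               # robust scatter in the leading k-dimensional subspace
    C2 = np.cov(Xc @ U2, rowvar=False)         # ordinary covariance in the rest
    return U1 @ C1 @ U1.T + U2 @ C2 @ U2.T

rng = np.random.default_rng(7)
X = rng.standard_normal((40, 100))             # n = 40 is much smaller than p = 100
V = hybrid_scatter(X, k=5)
```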

Moment operators of random functions...

Let B be the Borel σ-field generated by the open sets in H, and let X be a random element in H such that E‖X‖⁴ < ∞. We now define the mean function and the covariance operator of X, again denoted by E(X) and Cov(X).

By the Riesz representation theorem, there is a function E(X) ∈ H such that ⟨f, E(X)⟩ = E⟨f, X⟩, f ∈ H. Naturally, E(X − E(X)) = 0, the constant zero function. Assume next that E(X) = 0.

The covariance operator Cov(X): H → H is defined by Cov(X)(f) = E(⟨f, X⟩X), f ∈ H. Again using the Riesz representation theorem, the expected value of a random operator A: H → H can be defined via ⟨f, E(A)g⟩ = E⟨f, Ag⟩, f, g ∈ H, and we can then also write Cov(X) = E((X − E(X)) ⊗ (X − E(X))). As in the vector case, we also define a functional based on fourth moments, namely
C(X) = E(((X − E(X)) ⊗ (X − E(X)))²) = E(‖X − E(X)‖² (X − E(X)) ⊗ (X − E(X))).

... and their properties I

It is easy to see that E(X) and Cov(X) are affine equivariant in the sense that E(AX + b) = A E(X) + b and Cov(AX + b) = A Cov(X) A* for X ∈ H, A ∈ L and b ∈ H. Again, the functional C(X) is equivariant only under transformations x → Ux + b with unitary U, and then C(UX + b) = U C(X) U*.

Cov(X) is an integral operator:
(Cov(X)f)(t) = ∫₀¹ K(t, s) f(s) ds, where K(t, s) = Cov(X(t), X(s)).
Cov(X) is symmetric and positive definite and, if λ_j, e_j, j = 1, 2, ..., are the eigenvalues and eigenfunctions of Cov(X), with λ_1 ≥ λ_2 ≥ ... and Σ_j λ_j < ∞, the eigenvalue-eigenfunction decomposition of Cov(X) is
Cov(X) = Σ_{j=1}^∞ λ_j (e_j ⊗ e_j).
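A sketch of estimating the kernel K(t, s) and the leading eigenvalue/eigenfunction pairs from curves observed on a common grid; the grid discretization and the toy curves are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(8)
t = np.linspace(0, 1, 101)
dt = t[1] - t[0]
n = 200

# toy random functions: X(t) = Z_1 sqrt(2) sin(2 pi t) + Z_2 sqrt(2) cos(2 pi t) + noise
Z = rng.standard_normal((n, 2)) * np.array([2.0, 1.0])
B = np.column_stack([np.sqrt(2) * np.sin(2 * np.pi * t), np.sqrt(2) * np.cos(2 * np.pi * t)])
X = Z @ B.T + 0.1 * rng.standard_normal((n, len(t)))

Xc = X - X.mean(axis=0)
K = Xc.T @ Xc / n                              # K(t_j, t_l) = empirical Cov(X(t_j), X(t_l))

# discretized covariance operator: (Cov(X) f)(t) ~ sum_l K(t, t_l) f(t_l) dt
lam, E = np.linalg.eigh(K * dt)
lam, E = lam[::-1], E[:, ::-1] / np.sqrt(dt)   # eigenvalues (decreasing) and L2-normalized eigenfunctions
# lam[:2] is close to (4, 1), the variances of the two score variables above
```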

... and their properties II

For a chosen k, write
P_k = Σ_{i=1}^k (e_i ⊗ e_i) and Q_k = Σ_{i=k+1}^∞ (e_i ⊗ e_i)
and consider the projected (finite-dimensional) random variable X_k := P_k X = (Σ_{i=1}^k e_i ⊗ e_i) X. Further write
Σ_k := Cov(X_k) = Σ_{i=1}^k λ_i (e_i ⊗ e_i) and Σ_k^{±1/2} := Σ_{i=1}^k λ_i^{±1/2} (e_i ⊗ e_i).
Finally,
Cov₄,k(X) = Σ_k^{1/2} C(Σ_k^{-1/2} X) Σ_k^{1/2} + Cov(Q_k X).

... and their estimates

Let then X_1, ..., X_n be a random sample from the distribution of X, and let X̂ ∈ H be a random function with possible values X_1, ..., X_n, each with probability 1/n. Natural estimates of μ = E(X) ∈ H and Σ = Cov(X) ∈ L are
μ̂ = E(X̂) = (1/n) Σ_{i=1}^n X_i and Σ̂ = Cov(X̂) = (1/n) Σ_{i=1}^n (X_i − μ̂) ⊗ (X_i − μ̂).

Note that μ̂ and Σ̂ are both located in finite-dimensional subspaces, and Σ̂ has at most n − 1 positive eigenvalues λ̂_1 ≥ ... ≥ λ̂_{n−1} with corresponding eigenfunctions, so that
Σ̂ = Σ_{i=1}^{n−1} λ̂_i (ê_i ⊗ ê_i).
For n > k, one can again use estimates such as
Ĉov₄,k(X) = Σ̂_k^{1/2} C(Σ̂_k^{-1/2} X̂) Σ̂_k^{1/2} + Cov(Q̂_k X̂).

Independent component analysis (ICA)

ICA for a random vector

Definition I. We say that X ∈ R^p follows the independent component model if there exists a full-rank A ∈ R^{p×p} such that the components of AX are independent.

For X with high dimension p and covariance matrix Σ = Σ_{i=1}^p d_i e_i e_i', d_1 ≥ ... ≥ d_p, principal component analysis is often used as a preliminary step. Using the principal components Z_i = ⟨e_i, X⟩, one can write
X = X_1 + X_2 := Σ_{i=1}^k Z_i e_i + Σ_{i=k+1}^p Z_i e_i.
One then has the following model in mind.

Definition II. We say that X ∈ R^p follows the independent component model with at most k non-Gaussian components if
1. X_1 and X_2 are independent,
2. X_2 is Gaussian, and
3. there exists a full-rank A ∈ R^{k×k} such that the components of A(Z_1, ..., Z_k)' are independent.

ICA for functional data

Let X be a random function with covariance operator Cov(X) = Σ_{i=1}^∞ λ_i (e_i ⊗ e_i). The variables Z_i = ⟨e_i, X⟩, i = 1, 2, ..., are then uncorrelated and, by the Karhunen-Loève expansion,
X = X_1 + X_2 := Σ_{i=1}^k Z_i e_i + Σ_{i=k+1}^∞ Z_i e_i.
As in the vector-valued case, we then give the following.

Definition III. We say that X ∈ H follows the independent component model with at most k non-Gaussian components if
1. X_1 and X_2 are independent,
2. X_2 is Gaussian, and
3. there exists a full-rank A ∈ R^{k×k} such that the components of A(Z_1, ..., Z_k)' are independent.

Then X_1 = Σ_{l=1}^k Z̃_l (Σ_{j=1}^k A^{-1}_{jl} e_j), where (Z̃_1, ..., Z̃_k)' = A(Z_1, ..., Z_k)' is the vector of independent components.

Real data analysis

Steps in data analysis

1. One typically observes X_i = (X_i(t_1), ..., X_i(t_{m_i}))', i = 1, ..., n.
2. Choose orthonormal basis functions f_1, ..., f_K, K > n, and transform/project X_i → Σ_{j=1}^K x_{ij} f_j, i = 1, ..., n.
3. Find the sample covariance operator and its eigendecomposition Cov(X̂) = Σ_{j=1}^{min(n−1,K)} λ̂_j (û_j ⊗ û_j).
4. Write Z_ij = ⟨û_j, X_i⟩, i = 1, ..., n, j = 1, ..., min(n − 1, K), and X_i = X_i1 + X_i2 := Σ_{j=1}^k Z_ij û_j + Σ_{j=k+1}^{min(n−1,K)} Z_ij û_j, i = 1, ..., n.
5. Find Ĉov and Ĉov₄ for (Z_i1, ..., Z_ik)', i = 1, ..., n.
6. FOBI: Find A ∈ R^{k×k} such that A Ĉov A' = I_k and A Ĉov₄ A' is diagonal.
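A minimal NumPy sketch of steps 3–6, working directly with curves discretized on a common grid (so steps 1–2 are simplified to a grid representation rather than a separate basis expansion). The function functional_fobi and the toy data are illustrative, not the authors' implementation.

```python
import numpy as np

def functional_fobi(X, t, k):
    """Steps 3-6 above for curves given as rows of X on the grid t: principal
    component scores of the curves, then FOBI on the first k score vectors."""
    dt = t[1] - t[0]
    Xc = X - X.mean(axis=0)
    lam, E = np.linalg.eigh((Xc.T @ Xc / len(X)) * dt)   # discretized covariance operator
    E = E[:, np.argsort(lam)[::-1]] / np.sqrt(dt)        # eigenfunctions, leading ones first
    Z = Xc @ E[:, :k] * dt                               # scores Z_ij = <u_j, X_i>
    covZ = np.cov(Z, rowvar=False)
    d, U = np.linalg.eigh(covZ)
    covZ_isqrt = U @ np.diag(d ** -0.5) @ U.T
    Y = Z @ covZ_isqrt                                   # whitened scores
    cov4 = (np.sum(Y ** 2, axis=1)[:, None] * Y).T @ Y / len(Y)
    _, V = np.linalg.eigh(cov4)
    A = V[:, ::-1].T @ covZ_isqrt                        # A Cov A' = I_k, A Cov4 A' diagonal
    return Z @ A.T, A, E[:, :k]

# toy data: curves driven by two independent non-Gaussian scores plus noise
rng = np.random.default_rng(9)
t = np.linspace(0, 1, 101)
src = np.column_stack([rng.exponential(size=150), rng.uniform(-2, 2, size=150)])
basis = np.column_stack([np.sqrt(2) * np.sin(2 * np.pi * t),
                         np.sqrt(2) * np.cos(2 * np.pi * t)])
X = src @ np.array([[1.0, 0.5], [0.3, 1.0]]) @ basis.T + 0.05 * rng.standard_normal((150, len(t)))
IC, A, eigenfunctions = functional_fobi(X, t, k=2)       # IC: estimated independent component scores
```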

Australian weather data (analysis by Germain Van Bever)

Figure: Australian weather data (analysed in Delaigle and Hall, 2010). Daily precipitation (mm) against day of the year, measured at 19 weather stations from 1840 to 1990 (the lengths of the records vary from one station to the other).

Australian weather data

One outlier was deleted from the dataset. PC explained variances: (0.73, 0.21, 0.03, 0.01). PC1: summer/winter behaviour.

First principal scores

Figure: Left: histogram of the first principal scores. Right: the k = 4 first principal functions.

Fourth FOBI scores

Figure: Left: histogram of the fourth FOBI scores. Right: the k = 4 FFOBI functions.

Locations

Figure: Locations of the 19 Australian weather stations together with their classification based on 2-means clustering on the first principal score (left) and on the fourth FOBI score (right).

Concluding remarks

To be given...

References

Key references:

Bosq, D. (2000). Linear Processes in Function Spaces. Springer, New York.
Conway, J.B. (1990). A Course in Functional Analysis. Springer, New York.
Horváth, L. and Kokoszka, P. (2012). Inference for Functional Data with Applications. Springer, New York.
Li, G., Van Bever, G., Oja, H., Sabolova, R. and Critchley, F. (2016). Functional independent component analysis: an extension of fourth-order blind identification. Submitted.
Miettinen, J., Nordhausen, K., Oja, H. and Taskinen, S. (2014). Fourth moments and independent components. Statistical Science.
Nordhausen, K., Oja, H. and Tyler, D.E. (2008). Tools for exploring multivariate data: The package ICS. Journal of Statistical Software, 28.
Peña, D., Prieto, J. and Rendón, C. (2014). Independent components techniques based on kurtosis for functional data analysis. Working Paper 14-1, Statistics and Econometrics Series (6).
Ramsay, J.O. and Silverman, B.W. (2005). Functional Data Analysis. Springer, New York.
Tyler, D.E. (2010). A note on multivariate location and scatter statistics for sparse data sets. Statistics and Probability Letters, 80.
Tyler, D.E., Critchley, F., Dümbgen, L. and Oja, H. (2009). Invariant co-ordinate selection. Journal of the Royal Statistical Society, Series B, 71.
Wang, J.-L., Chiou, J.-M. and Müller, H.-G. (2016). Functional Data Analysis. Annual Review of Statistics and Its Application, 3.


More information

3.1. The probabilistic view of the principal component analysis.

3.1. The probabilistic view of the principal component analysis. 301 Chapter 3 Principal Components and Statistical Factor Models This chapter of introduces the principal component analysis (PCA), briefly reviews statistical factor models PCA is among the most popular

More information

Linear Algebra (Review) Volker Tresp 2018

Linear Algebra (Review) Volker Tresp 2018 Linear Algebra (Review) Volker Tresp 2018 1 Vectors k, M, N are scalars A one-dimensional array c is a column vector. Thus in two dimensions, ( ) c1 c = c 2 c i is the i-th component of c c T = (c 1, c

More information

Principal Component Analysis CS498

Principal Component Analysis CS498 Principal Component Analysis CS498 Today s lecture Adaptive Feature Extraction Principal Component Analysis How, why, when, which A dual goal Find a good representation The features part Reduce redundancy

More information

COMP6237 Data Mining Covariance, EVD, PCA & SVD. Jonathon Hare

COMP6237 Data Mining Covariance, EVD, PCA & SVD. Jonathon Hare COMP6237 Data Mining Covariance, EVD, PCA & SVD Jonathon Hare jsh2@ecs.soton.ac.uk Variance and Covariance Random Variables and Expected Values Mathematicians talk variance (and covariance) in terms of

More information

PRINCIPAL COMPONENTS ANALYSIS

PRINCIPAL COMPONENTS ANALYSIS PRINCIPAL COMPONENTS ANALYSIS Iris Data Let s find Principal Components using the iris dataset. This is a well known dataset, often used to demonstrate the effect of clustering algorithms. It contains

More information

MULTICHANNEL SIGNAL PROCESSING USING SPATIAL RANK COVARIANCE MATRICES

MULTICHANNEL SIGNAL PROCESSING USING SPATIAL RANK COVARIANCE MATRICES MULTICHANNEL SIGNAL PROCESSING USING SPATIAL RANK COVARIANCE MATRICES S. Visuri 1 H. Oja V. Koivunen 1 1 Signal Processing Lab. Dept. of Statistics Tampere Univ. of Technology University of Jyväskylä P.O.

More information

Principal Components Analysis (PCA)

Principal Components Analysis (PCA) Principal Components Analysis (PCA) Principal Components Analysis (PCA) a technique for finding patterns in data of high dimension Outline:. Eigenvectors and eigenvalues. PCA: a) Getting the data b) Centering

More information

Linear Methods in Data Mining

Linear Methods in Data Mining Why Methods? linear methods are well understood, simple and elegant; algorithms based on linear methods are widespread: data mining, computer vision, graphics, pattern recognition; excellent general software

More information

Last time: PCA. Statistical Data Mining and Machine Learning Hilary Term Singular Value Decomposition (SVD) Eigendecomposition and PCA

Last time: PCA. Statistical Data Mining and Machine Learning Hilary Term Singular Value Decomposition (SVD) Eigendecomposition and PCA Last time: PCA Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml

More information

Matrix Factorizations

Matrix Factorizations 1 Stat 540, Matrix Factorizations Matrix Factorizations LU Factorization Definition... Given a square k k matrix S, the LU factorization (or decomposition) represents S as the product of two triangular

More information

Principal Components Analysis (PCA) and Singular Value Decomposition (SVD) with applications to Microarrays

Principal Components Analysis (PCA) and Singular Value Decomposition (SVD) with applications to Microarrays Principal Components Analysis (PCA) and Singular Value Decomposition (SVD) with applications to Microarrays Prof. Tesler Math 283 Fall 2015 Prof. Tesler Principal Components Analysis Math 283 / Fall 2015

More information

Machine Learning - MT & 14. PCA and MDS

Machine Learning - MT & 14. PCA and MDS Machine Learning - MT 2016 13 & 14. PCA and MDS Varun Kanade University of Oxford November 21 & 23, 2016 Announcements Sheet 4 due this Friday by noon Practical 3 this week (continue next week if necessary)

More information

Principal Component Analysis

Principal Component Analysis Principal Component Analysis Yingyu Liang yliang@cs.wisc.edu Computer Sciences Department University of Wisconsin, Madison [based on slides from Nina Balcan] slide 1 Goals for the lecture you should understand

More information

12.2 Dimensionality Reduction

12.2 Dimensionality Reduction 510 Chapter 12 of this dimensionality problem, regularization techniques such as SVD are almost always needed to perform the covariance matrix inversion. Because it appears to be a fundamental property

More information

Math 102, Winter Final Exam Review. Chapter 1. Matrices and Gaussian Elimination

Math 102, Winter Final Exam Review. Chapter 1. Matrices and Gaussian Elimination Math 0, Winter 07 Final Exam Review Chapter. Matrices and Gaussian Elimination { x + x =,. Different forms of a system of linear equations. Example: The x + 4x = 4. [ ] [ ] [ ] vector form (or the column

More information

Fundamentals of Matrices

Fundamentals of Matrices Maschinelles Lernen II Fundamentals of Matrices Christoph Sawade/Niels Landwehr/Blaine Nelson Tobias Scheffer Matrix Examples Recap: Data Linear Model: f i x = w i T x Let X = x x n be the data matrix

More information

Linear Algebra - Part II

Linear Algebra - Part II Linear Algebra - Part II Projection, Eigendecomposition, SVD (Adapted from Sargur Srihari s slides) Brief Review from Part 1 Symmetric Matrix: A = A T Orthogonal Matrix: A T A = AA T = I and A 1 = A T

More information

Lecture 4: Principal Component Analysis and Linear Dimension Reduction

Lecture 4: Principal Component Analysis and Linear Dimension Reduction Lecture 4: Principal Component Analysis and Linear Dimension Reduction Advanced Applied Multivariate Analysis STAT 2221, Fall 2013 Sungkyu Jung Department of Statistics University of Pittsburgh E-mail:

More information

Hands-on Matrix Algebra Using R

Hands-on Matrix Algebra Using R Preface vii 1. R Preliminaries 1 1.1 Matrix Defined, Deeper Understanding Using Software.. 1 1.2 Introduction, Why R?.................... 2 1.3 Obtaining R.......................... 4 1.4 Reference Manuals

More information

linearly indepedent eigenvectors as the multiplicity of the root, but in general there may be no more than one. For further discussion, assume matrice

linearly indepedent eigenvectors as the multiplicity of the root, but in general there may be no more than one. For further discussion, assume matrice 3. Eigenvalues and Eigenvectors, Spectral Representation 3.. Eigenvalues and Eigenvectors A vector ' is eigenvector of a matrix K, if K' is parallel to ' and ' 6, i.e., K' k' k is the eigenvalue. If is

More information

CS 340 Lec. 6: Linear Dimensionality Reduction

CS 340 Lec. 6: Linear Dimensionality Reduction CS 340 Lec. 6: Linear Dimensionality Reduction AD January 2011 AD () January 2011 1 / 46 Linear Dimensionality Reduction Introduction & Motivation Brief Review of Linear Algebra Principal Component Analysis

More information

Linear Algebra Formulas. Ben Lee

Linear Algebra Formulas. Ben Lee Linear Algebra Formulas Ben Lee January 27, 2016 Definitions and Terms Diagonal: Diagonal of matrix A is a collection of entries A ij where i = j. Diagonal Matrix: A matrix (usually square), where entries

More information

Kernel Methods. Machine Learning A W VO

Kernel Methods. Machine Learning A W VO Kernel Methods Machine Learning A 708.063 07W VO Outline 1. Dual representation 2. The kernel concept 3. Properties of kernels 4. Examples of kernel machines Kernel PCA Support vector regression (Relevance

More information

HST.582J / 6.555J / J Biomedical Signal and Image Processing Spring 2007

HST.582J / 6.555J / J Biomedical Signal and Image Processing Spring 2007 MIT OpenCourseWare http://ocw.mit.edu HST.582J / 6.555J / 16.456J Biomedical Signal and Image Processing Spring 2007 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

Review (Probability & Linear Algebra)

Review (Probability & Linear Algebra) Review (Probability & Linear Algebra) CE-725 : Statistical Pattern Recognition Sharif University of Technology Spring 2013 M. Soleymani Outline Axioms of probability theory Conditional probability, Joint

More information

Lecture notes: Applied linear algebra Part 1. Version 2

Lecture notes: Applied linear algebra Part 1. Version 2 Lecture notes: Applied linear algebra Part 1. Version 2 Michael Karow Berlin University of Technology karow@math.tu-berlin.de October 2, 2008 1 Notation, basic notions and facts 1.1 Subspaces, range and

More information