Independent component analysis for functional data


Hannu Oja
Department of Mathematics and Statistics, University of Turku
Version: August 2016

Outline

1. Probability space, random variables, random functions
2. Affine (linear) transformations and linear operators
3. Descriptive statistics for random vectors and functions
4. Independent component analysis (ICA)
5. Real data analysis
6. References

Acknowledgement

The presentation is very much based on the paper

Li, G., Van Bever, G., Oja, H., Sabolova, R. and Critchley, F. (2016). Functional independent component analysis: an extension of fourth-order blind identification. Submitted.

See also

Peña, D., Prieto, J. and Rendón, C. (2014). Independent components techniques based on kurtosis for functional data analysis. Working Paper 14-1, Statistics and Econometrics Series (6).

Prologue: PCA vs. ICA in R^p

Let X ∈ R^p be a random vector and assume that E(X) = 0.

In principal component analysis (PCA), one finds a transformation matrix A ∈ R^{p×p} such that
AA' = I_p and A E(XX') A' = D,
where D is a diagonal matrix with diagonal elements d_1 ≥ ... ≥ d_p.

In independent component analysis (ICA), fourth-order blind identification (FOBI) finds a transformation matrix A ∈ R^{p×p} such that
A E(XX') A' = I_p and A E(XX' E(XX')^{-1} XX') A' = D,
where the diagonal elements of D are ordered according to marginal kurtosis: d_1 ≥ ... ≥ d_p.

Idea: Transform X → AX. Some of the components present signal, some present noise!
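To make the FOBI transform above concrete, here is a minimal NumPy sketch (an illustration, not the authors' implementation; the function name fobi and the toy sources are made up). It uses the standard equivalent formulation: whiten the centred data with Cov(X)^{-1/2}, compute the fourth-moment matrix of the whitened data, and rotate with its eigenvectors.

```python
import numpy as np

def fobi(X):
    """FOBI sketch: rows of X are observations, columns are variables.
    Returns the unmixing matrix A (components ordered by decreasing kurtosis)
    and the component scores X @ A.T."""
    X = X - X.mean(axis=0)                     # centre, so that E(X) = 0
    d, U = np.linalg.eigh(np.cov(X, rowvar=False))
    cov_isqrt = U @ np.diag(d ** -0.5) @ U.T   # symmetric inverse square root of Cov(X)
    Y = X @ cov_isqrt                          # whitened data, Cov(Y) = I_p
    r2 = np.sum(Y ** 2, axis=1)                # squared Mahalanobis distances
    cov4 = (r2[:, None] * Y).T @ Y / len(Y)    # fourth-moment matrix E(R^2 YY')
    lam, V = np.linalg.eigh(cov4)
    A = V[:, np.argsort(lam)[::-1]].T @ cov_isqrt   # A Cov A' = I_p, A Cov4 A' diagonal
    return A, X @ A.T

# toy use: two independent non-Gaussian sources, linearly mixed
rng = np.random.default_rng(0)
S = np.column_stack([rng.exponential(size=1000), rng.uniform(-1, 1, size=1000)])
X = S @ np.array([[2.0, 1.0], [1.0, 3.0]])
A, Z = fobi(X)                                 # Z recovers the sources up to order, sign and scale
```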

Iris data - PCA

Figure: Classical Iris data with three species: left panel the original variables (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width), right panel the principal components.

Iris data - ICA

Figure: Classical Iris data with three species: left panel the original variables, right panel the independent components.

Toy data I - PCA

Figure: Left panel the original variables (V1-V6), right panel the principal components.

Toy data I - ICA

Figure: Left panel the original variables, right panel the independent components.

Toy data II - PCA

Figure: Left panel the original variables (V1-V6), right panel the principal components.

Toy data II - ICA

Figure: Left panel the original variables, right panel the independent components.

Toy data III - PCA

Figure: Left panel the original variables (var 1-var 5), right panel the principal components (Comp.1-Comp.5).

Toy data III - ICA

Figure: Left panel the original variables, right panel the independent components (IC.1-IC.5).

Probability space, random variables, random functions

Probability space

Let (Ω, F, P) be a probability space for a random experiment that is planned to collect functional data. Functional observations are assumed to belong to H = L²(0, 1), a separable Hilbert space with the scalar product
⟨f, g⟩ = ∫₀¹ f(t) g(t) dt
and norm ‖f‖ = ⟨f, f⟩^{1/2}.

Let B be the Borel σ-field generated by the open sets in H. A random function on H is then a mapping X: Ω → H that is F/B-measurable.

Basis functions

In the separable Hilbert space H = L²(0, 1) there exists an orthonormal basis (f_j), that is, ⟨f_j, f_k⟩ = δ_jk, such that
H = {x : x = Σ_{j=1}^∞ x_j f_j} ⊃ H_K = {x : x = Σ_{j=1}^K x_j f_j}.

In practice, one chooses orthonormal basis functions (g_j) such as (i) Fourier basis functions, (ii) spline functions, (iii) wavelets, (iv) basis functions based on kernel smoothing of the data functions, etc. The data analysis is then made in the space
H = {x : x = Σ_{j=1}^∞ x_j g_j}
or in the truncated space
H_K = {x : x = Σ_{j=1}^K x_j g_j}.
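As a concrete illustration of the basis expansion, the NumPy sketch below projects curves observed on a regular grid onto the first K orthonormal Fourier basis functions on (0, 1). The helper names (fourier_basis, project) and the simple Riemann-sum approximation of the inner products are choices of this example, not part of the slides.

```python
import numpy as np

def fourier_basis(t, K):
    """First K orthonormal Fourier basis functions on (0, 1), evaluated at the grid t."""
    B = [np.ones_like(t)]                      # constant function g_1 = 1
    j = 1
    while len(B) < K:
        B.append(np.sqrt(2) * np.sin(2 * np.pi * j * t))
        if len(B) < K:
            B.append(np.sqrt(2) * np.cos(2 * np.pi * j * t))
        j += 1
    return np.column_stack(B)                  # shape (len(t), K)

def project(curves, t, K):
    """Coefficients x_ij ~ <X_i, g_j>, one row per curve, via a Riemann sum."""
    G = fourier_basis(t, K)
    dt = t[1] - t[0]                           # equally spaced grid assumed
    return curves @ G * dt                     # shape (n, K)

# toy use: five "observed" curves on a grid of 200 points, truncated at K = 7
t = np.linspace(0, 1, 200)
curves = np.sin(2 * np.pi * np.outer(np.arange(1, 6), t))
coef = project(curves, t, K=7)
```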

Random vectors and random functions

We consider three types of random elements, namely random variables, that is, mappings X: Ω → R, random vectors, that is, mappings X: Ω → R^p, and random functions, that is, mappings X: Ω → H, all F/B-measurable with respect to the relevant Borel σ-fields.

In our experiment (Ω, F, P), to estimate population quantities, we collect random samples, that is, iid observations X_1, ..., X_n: measurable mappings X_i: Ω → R, i = 1, ..., n, measurable mappings X_i: Ω → R^p, i = 1, ..., n, or measurable mappings X_i: Ω → H, i = 1, ..., n.

Affine (linear) transformations and linear operators

Affine transformations for random variables

For a univariate random variable, linear transformations are given by X → Y = aX + b, a, b ∈ R. For a p-variate random vector X ∈ R^p, affine (or linear) transformations are given by X → Y = AX + b, A ∈ R^{k×p}, b ∈ R^k.

Some important square matrices often used for affine transformations:
- Centering matrix A = I_p − (1/p) 1_p 1_p'.
- Rescaling matrix: a diagonal matrix with positive diagonal elements.
- Orthogonal matrix A satisfying A'A = AA' = I_p. Sign-change and permutation matrices are orthogonal.
- Projection matrix A satisfying A = A' and A² = A. Then A = UU' where U ∈ R^{p×k} has orthonormal columns.
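A small NumPy check of two of the special matrices listed above; the random orthonormal U is only an example.

```python
import numpy as np

p, k = 5, 2
C = np.eye(p) - np.ones((p, p)) / p            # centering matrix I_p - (1/p) 1_p 1_p'

# projection matrix built from k orthonormal columns: A = U U'
U, _ = np.linalg.qr(np.random.default_rng(1).standard_normal((p, k)))
A = U @ U.T

assert np.allclose(C @ np.ones(p), 0)                  # centering removes the constant vector
assert np.allclose(A, A.T) and np.allclose(A @ A, A)   # A is symmetric and idempotent
```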

Eigenvector-eigenvalue decomposition

Let A ∈ R^{p×p} be symmetric, e.g. a covariance matrix. Then one can write the eigenvector-eigenvalue decomposition
A = UDU' = Σ_{i=1}^p d_i u_i u_i',
where U = (u_1 ... u_p) is an orthogonal matrix and D is a diagonal matrix with ordered diagonal elements d_1 ≥ d_2 ≥ ... ≥ d_p. The columns of U = (u_1 ... u_p) are called the eigenvectors and the diagonal elements d_1, ..., d_p the corresponding eigenvalues of A.

Note that det(A) = Π_{i=1}^p d_i, tr(A) = Σ_{i=1}^p d_i and ‖A‖² = Σ_{i=1}^p d_i².

A symmetric matrix A ∈ R^{p×p} is positive (non-negative) definite if the smallest eigenvalue d_p > 0 (≥ 0); then b'Ab > 0 (≥ 0) for all nonzero b ∈ R^p. For a symmetric matrix A = Σ_{i=1}^p d_i u_i u_i' with non-zero eigenvalues, the inverse is A^{-1} = Σ_{i=1}^p d_i^{-1} u_i u_i'. For a p×p symmetric matrix A, a square root matrix A^{1/2} is any matrix that satisfies A^{1/2}(A^{1/2})' = A. Often a symmetric version of A^{1/2} is chosen, for example A^{1/2} = Σ_{i=1}^p d_i^{1/2} u_i u_i'.
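A hedged NumPy sketch of how the inverse and a symmetric square root follow from the eigendecomposition; the helper sym_power is illustrative and assumes a symmetric positive definite input.

```python
import numpy as np

def sym_power(A, alpha):
    """A^alpha for a symmetric positive definite matrix, via A = U D U'."""
    d, U = np.linalg.eigh(A)
    return U @ np.diag(d ** alpha) @ U.T

rng = np.random.default_rng(2)
S = np.cov(rng.standard_normal((100, 4)), rowvar=False)   # a symmetric positive definite matrix

S_inv = sym_power(S, -1.0)                    # A^{-1} = sum_i d_i^{-1} u_i u_i'
S_half = sym_power(S, 0.5)                    # symmetric square root
assert np.allclose(S_half @ S_half, S)
assert np.allclose(S_inv, np.linalg.inv(S))
assert np.isclose(np.linalg.det(S), np.prod(np.linalg.eigvalsh(S)))   # det(A) = prod d_i
```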

Singular value decomposition (SVD)

Next let A be any k×p matrix and assume that k ≤ p. Then it can be written (SVD) as A = UDV', where U ∈ R^{k×k} is orthogonal, V ∈ R^{p×k} has orthonormal columns (V'V = I_k) and D ∈ R^{k×k} is a diagonal matrix with diagonal elements d_1 ≥ ... ≥ d_k ≥ 0.

Then AA' = UD²U', A'A = VD²V' and ‖A‖² = tr(AA') = Σ_{i=1}^k d_i².

If U = (u_1, ..., u_k) and V = (v_1, ..., v_k) then A = Σ_{i=1}^k d_i u_i v_i' and
x → y = Ax = Σ_{i=1}^k d_i ⟨v_i, x⟩ u_i
may be seen as a change of the coordinate system with componentwise rescalings. For a projection matrix A ∈ R^{p×p} with rank k (p − k zero eigenvalues), we get A = UU' = Σ_{i=1}^k u_i u_i' with some U ∈ R^{p×k} with orthonormal columns.
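A quick NumPy illustration of the SVD identities above, using the thin decomposition of a k×p matrix with k ≤ p:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 6))                            # k = 3, p = 6

U, d, Vt = np.linalg.svd(A, full_matrices=False)           # A = U diag(d) V', with V' of size k x p
assert np.allclose(A, U @ np.diag(d) @ Vt)
assert np.allclose(A @ A.T, U @ np.diag(d ** 2) @ U.T)     # AA' = U D^2 U'
assert np.isclose(np.sum(A ** 2), np.sum(d ** 2))          # ||A||^2 = tr(AA') = sum d_i^2
```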

Linear operators for functional data I

In functional data analysis, we speak about linear operators rather than matrices. Let L be the set of bounded linear operators A: H → H with norm
‖A‖_L = sup_{‖x‖=1} ‖Ax‖, A ∈ L.
(Linearity means that A(ax + by) = aAx + bAy for a, b ∈ R and x, y ∈ H.)

An integral operator L with real kernel K: I × I → R, given by
(Lf)(t) = ∫₀¹ K(t, s) f(s) ds, f ∈ H,
is a linear operator.

The adjoint operator A* of A ∈ L is uniquely defined through ⟨Af, g⟩ = ⟨f, A*g⟩, f, g ∈ H. An operator A ∈ L is symmetric if A* = A, that is, if ⟨Ax, y⟩ = ⟨x, Ay⟩, x, y ∈ H, and positive definite if ⟨Ax, x⟩ ≥ 0, x ∈ H. An operator U is unitary if UU* = U*U = I, the identity operator.

Linear operators for functional data II

The class of Hilbert-Schmidt operators is particularly interesting for our purpose: a linear operator A: H → H is Hilbert-Schmidt if it allows a singular value decomposition (SVD)
Ax = Σ_{j=1}^∞ λ_j ⟨g_j, x⟩ f_j
with two orthonormal bases (g_j) and (f_j) and a sequence (λ_j) of non-negative real numbers such that Σ_{j=1}^∞ λ_j² < ∞.

Operator A is thus an infinite weighted sum of elemental tensor product operators f ⊗ e: H → H given by (f ⊗ e)x = ⟨e, x⟩ f. For min(λ_1, ..., λ_k) > 0, the operators A = Σ_{j=1}^k λ_j (f_j ⊗ g_j): H → H give linear transformations onto the finite-dimensional subspace spanned by the orthonormal functions f_1, ..., f_k. Note also that the projection onto the subspace spanned by orthonormal f_1, ..., f_k is A = Σ_{j=1}^k (f_j ⊗ f_j).

For f, g, h ∈ H and A, B ∈ L,
(g ⊗ f)* = f ⊗ g, (Ag) ⊗ (Bf) = A(g ⊗ f)B* and (h ⊗ g)(g ⊗ f) = ‖g‖² (h ⊗ f).
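In the coordinates of a fixed orthonormal basis, these tensor product operators act on coefficient vectors as outer products. The short sketch below (with made-up coefficient vectors) checks the identity (h ⊗ g)(g ⊗ f) = ‖g‖²(h ⊗ f) in that finite-dimensional representation.

```python
import numpy as np

def tensor(f, e):
    """Matrix of the tensor product operator f (tensor) e, i.e. x -> <e, x> f,
    when functions are represented by their basis coefficient vectors."""
    return np.outer(f, e)

rng = np.random.default_rng(4)
f, g, h = rng.standard_normal((3, 6))          # coefficient vectors in a 6-dimensional truncation

lhs = tensor(h, g) @ tensor(g, f)              # (h tensor g)(g tensor f)
rhs = np.dot(g, g) * tensor(h, f)              # ||g||^2 (h tensor f)
assert np.allclose(lhs, rhs)
assert np.allclose(tensor(g, f).T, tensor(f, g))   # adjoint: (g tensor f)* = f tensor g
```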

Descriptive statistics for random vectors and functions

Moments of random variables

Let X ∈ R be a univariate random variable with expected value μ = E(X) and variance σ² = Var(X). Write R = (X − μ)/σ for the standardized variable, that is, the standardized (signed) distance between X and μ. Assume also that E(X⁴) < ∞. Then the classical measures of (univariate) location, scale, skewness and kurtosis are E(X), Var(X), E(R³) and E(R⁴). Kurtosis is then the ratio of the two scale measures Var(RX) and Var(X), and may be seen through the comparison of E((X − E(X))⁴) and E((X − E(X))²).

Let then X_1, ..., X_n be a random sample from the distribution of X. Natural estimates of μ and σ² are μ̂ = (1/n) Σ_{i=1}^n X_i and σ̂² = (1/n) Σ_{i=1}^n (X_i − μ̂)². If R̂_i = (X_i − μ̂)/σ̂, i = 1, ..., n, then (1/n) Σ_{i=1}^n R̂_i³ and (1/n) Σ_{i=1}^n R̂_i⁴ can be used to test for normality, for example.
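A minimal sketch of the sample versions in NumPy; the exponential sample is only meant to show the skewness and kurtosis moving away from the Gaussian reference values 0 and 3.

```python
import numpy as np

def skew_kurt(x):
    """Sample skewness E(R^3) and kurtosis E(R^4), with R = (X - mu)/sigma (1/n versions)."""
    r = (x - x.mean()) / x.std()
    return (r ** 3).mean(), (r ** 4).mean()

rng = np.random.default_rng(5)
print(skew_kurt(rng.standard_normal(10_000)))   # close to (0, 3) for a normal sample
print(skew_kurt(rng.exponential(size=10_000)))  # skewed and heavier tailed: close to (2, 9)
```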

Moments of random vectors...

Let next X ∈ R^p be a random vector with bounded fourth moments, mean vector μ = E(X) ∈ R^p and positive definite covariance matrix Σ = Cov(X) ∈ R^{p×p}. Write now R² = (X − μ)'Σ^{-1}(X − μ) for the standardized (squared Mahalanobis) distance between X and μ.

More generally, location and scatter functionals T(X) ∈ R^p and C(X) ∈ R^{p×p} are functionals that are affine equivariant in the sense that
T(AX + b) = A T(X) + b and C(AX + b) = A C(X) A'
for all X, full-rank A ∈ R^{p×p} and b ∈ R^p. One-step M-estimates T_w(X) = E(w(R²)X)/E(w(R²)) and C_w(X) = Cov(w(R²)X) are then often used to find robust location and scatter functionals.

Kurtosis studies are often made using the functionals C(X) := E(((X − E(X))(X − E(X))')²) and Cov(X) = E((X − E(X))(X − E(X))'). Note that C(X) is not affine equivariant, but an affine equivariant version is obtained through
Cov₄(X) := Cov(R²X) = Cov(X)^{1/2} C(Cov(X)^{-1/2} X) Cov(X)^{1/2}.
These two matrices are also used for the solution of the independent component analysis (ICA) problem, as the eigenvectors and eigenvalues of Cov(X)^{-1} Cov₄(X) give both directional and quantitative information about the kurtosis.

... and their estimates when n > p + 1

Let then X_1, ..., X_n be a random sample from the distribution of X. Natural estimates of μ and Σ are μ̂ = (1/n) Σ_{i=1}^n X_i and Σ̂ = (1/n) Σ_{i=1}^n (X_i − μ̂)(X_i − μ̂)'. If R̂_i² = (X_i − μ̂)'Σ̂^{-1}(X_i − μ̂), i = 1, ..., n, then one-step M-estimates are obtained as
Σ_{i=1}^n w_T(R̂_i²) X_i / Σ_{i=1}^n w_T(R̂_i²) and (1/n) Σ_{i=1}^n w_S(R̂_i²)(X_i − μ̂)(X_i − μ̂)'.
Then
Ĉov₄(X) = (1/n) Σ_{i=1}^n R̂_i² (X_i − μ̂)(X_i − μ̂)'
or, equivalently,
Ĉov₄(X) = Σ̂^{1/2} [ (1/n) Σ_{i=1}^n (Σ̂^{-1/2}(X_i − μ̂)(X_i − μ̂)'Σ̂^{-1/2})² ] Σ̂^{1/2}.
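A numerical check, in NumPy, of the claimed equivalence of the two sample Ĉov₄ formulas; the 1/n convention and the toy data are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.standard_normal((300, 4)) ** 3          # some non-Gaussian data
n, p = X.shape

mu = X.mean(axis=0)
Xc = X - mu
S = Xc.T @ Xc / n                               # Sigma-hat (1/n version)
d, U = np.linalg.eigh(S)
S_sqrt = U @ np.diag(d ** 0.5) @ U.T
S_isqrt = U @ np.diag(d ** -0.5) @ U.T

# first form: (1/n) sum_i R_i^2 (X_i - mu)(X_i - mu)'
r2 = np.einsum('ij,jk,ik->i', Xc, np.linalg.inv(S), Xc)
cov4_a = (r2[:, None] * Xc).T @ Xc / n

# second form: Sigma^{1/2} [ (1/n) sum_i (Sigma^{-1/2}(X_i - mu)(X_i - mu)' Sigma^{-1/2})^2 ] Sigma^{1/2}
M = sum(np.linalg.matrix_power(S_isqrt @ np.outer(x, x) @ S_isqrt, 2) for x in Xc) / n
cov4_b = S_sqrt @ M @ S_sqrt

assert np.allclose(cov4_a, cov4_b)
```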

... and their estimates when n ≤ p + 1

Tyler (2010) has shown that, if n ≤ p + 1, all location statistics (with the affine equivariance property) are equal to the sample mean vector, and all scatter statistics (with the affine equivariance property) are proportional to the sample covariance matrix. The rank of the covariance matrix is n − 1. For n = p + 1, Σ̂ is still invertible and a maximal invariant statistic under affine transformations, that is, the matrix ((X_i − μ̂)'Σ̂^{-1}(X_j − μ̂)) is a constant. Therefore the requirement of affine equivariance/invariance is not sensible for n ≤ p + 1!

If the matrix of eigenvectors of Cov(X) is decomposed as U = (U_1, U_2), where U_1 ∈ R^{p×k}, k < n, and C(X) is a robust scatter matrix, then a robust, orthogonally equivariant functional can be defined as
U_1 C(U_1'X) U_1' + U_2 Cov(U_2'X) U_2',
and this can be applied to the data when k < n ≤ p + 1. The idea is then that the outliers are hiding in the subspace given by the k first eigenvectors of Cov(X).

Kurtosis studies for X are then made using, for example, Cov(X) and
Cov₄,k(X) = U_1 Cov₄(U_1'X) U_1' + U_2 Cov(U_2'X) U_2'.
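A hedged NumPy sketch of the orthogonally equivariant functional U_1 C(U_1'X) U_1' + U_2 Cov(U_2'X) U_2' in the n ≤ p + 1 setting. The names hybrid_scatter and weighted_cov are made up, and the reweighted covariance with weights 1/(1 + r²) is only a stand-in for a robust scatter C, not the choice made on the slides.

```python
import numpy as np

def weighted_cov(Y):
    """Hypothetical robust scatter: covariance reweighted by w(r^2) = 1/(1 + r^2)."""
    Yc = Y - Y.mean(axis=0)
    r2 = np.einsum('ij,jk,ik->i', Yc, np.linalg.inv(np.cov(Y, rowvar=False)), Yc)
    w = 1.0 / (1.0 + r2)
    return (w[:, None] * Yc).T @ Yc / w.sum()

def hybrid_scatter(X, k, robust_scatter=weighted_cov):
    """U1 C(U1'X) U1' + U2 Cov(U2'X) U2' with U from the eigenvectors of Cov(X)."""
    Xc = X - X.mean(axis=0)
    d, U = np.linalg.eigh(np.cov(Xc, rowvar=False))
    U = U[:, np.argsort(d)[::-1]]              # eigenvectors, leading ones first
    U1, U2 = U[:, :k], U[:, k:]
    C1 = robust_scatter(Xc @ U1)               # robust scatter in the leading k-dimensional subspace
    C2 = np.cov(Xc @ U2, rowvar=False)         # ordinary covariance in the rest
    return U1 @ C1 @ U1.T + U2 @ C2 @ U2.T

rng = np.random.default_rng(7)
X = rng.standard_normal((40, 100))             # n = 40 is much smaller than p = 100
V = hybrid_scatter(X, k=5)
```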

Moment operators of random functions...

Let B be the Borel σ-field generated by the open sets in H, and let X be a random element in H such that E‖X‖⁴ < ∞. We now define the mean function and the covariance operator of X, again denoted by E(X) and Cov(X).

By the Riesz representation theorem, there is a function E(X) ∈ H such that ⟨f, E(X)⟩ = E⟨f, X⟩, f ∈ H. Naturally, E(X − E(X)) = 0, the constant zero function. Assume next that E(X) = 0.

The covariance operator Cov(X): H → H is defined by Cov(X)(f) = E(⟨f, X⟩X), f ∈ H. Again using the Riesz representation theorem, the expected value of a random operator A: H → H can be defined via ⟨f, E(A)g⟩ = E⟨f, Ag⟩, f, g ∈ H, and we can then also write Cov(X) = E((X − E(X)) ⊗ (X − E(X))). As in the vector case, we also define a functional based on fourth moments, namely
C(X) = E(((X − E(X)) ⊗ (X − E(X)))²) = E(‖X − E(X)‖² (X − E(X)) ⊗ (X − E(X))).

... and their properties I

It is easy to see that E(X) and Cov(X) are affine equivariant in the sense that E(AX + b) = A E(X) + b and Cov(AX + b) = A Cov(X) A* for X ∈ H, A ∈ L and b ∈ H. Again, the functional C(X) is equivariant only under transformations x → Ux + b with unitary U, and then C(UX + b) = U C(X) U*.

Cov(X) is an integral operator:
(Cov(X)f)(t) = ∫₀¹ K(t, s) f(s) ds, where K(t, s) = Cov(X(t), X(s)).
Cov(X) is symmetric and positive definite and, if λ_j, e_j, j = 1, 2, ..., are the eigenvalues and eigenfunctions of Cov(X), with λ_1 ≥ λ_2 ≥ ... and Σ_j λ_j < ∞, the eigenvalue-eigenfunction decomposition of Cov(X) is
Cov(X) = Σ_{j=1}^∞ λ_j (e_j ⊗ e_j).
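A sketch of estimating the kernel K(t, s) and the leading eigenvalue/eigenfunction pairs from curves observed on a common grid; the grid discretization and the toy curves are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(8)
t = np.linspace(0, 1, 101)
dt = t[1] - t[0]
n = 200

# toy random functions: X(t) = Z_1 sqrt(2) sin(2 pi t) + Z_2 sqrt(2) cos(2 pi t) + noise
Z = rng.standard_normal((n, 2)) * np.array([2.0, 1.0])
B = np.column_stack([np.sqrt(2) * np.sin(2 * np.pi * t), np.sqrt(2) * np.cos(2 * np.pi * t)])
X = Z @ B.T + 0.1 * rng.standard_normal((n, len(t)))

Xc = X - X.mean(axis=0)
K = Xc.T @ Xc / n                              # K(t_j, t_l) = empirical Cov(X(t_j), X(t_l))

# discretized covariance operator: (Cov(X) f)(t) ~ sum_l K(t, t_l) f(t_l) dt
lam, E = np.linalg.eigh(K * dt)
lam, E = lam[::-1], E[:, ::-1] / np.sqrt(dt)   # eigenvalues (decreasing) and L2-normalized eigenfunctions
# lam[:2] is close to (4, 1), the variances of the two score variables above
```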

... and their properties II

For a chosen k, write
P_k = Σ_{i=1}^k (e_i ⊗ e_i) and Q_k = Σ_{i=k+1}^∞ (e_i ⊗ e_i)
and consider the projected (finite-dimensional) random variable X_k := P_k X = (Σ_{i=1}^k e_i ⊗ e_i) X. Further write
Σ_k := Cov(X_k) = Σ_{i=1}^k λ_i (e_i ⊗ e_i) and Σ_k^{±1/2} := Σ_{i=1}^k λ_i^{±1/2} (e_i ⊗ e_i).
Finally,
Cov₄,k(X) = Σ_k^{1/2} C(Σ_k^{-1/2} X) Σ_k^{1/2} + Cov(Q_k X).

... and their estimates

Let then X_1, ..., X_n be a random sample from the distribution of X, and let X̂ ∈ H be a random function with possible values X_1, ..., X_n, each with probability 1/n. Natural estimates of μ = E(X) ∈ H and Σ = Cov(X) ∈ L are
μ̂ = E(X̂) = (1/n) Σ_{i=1}^n X_i and Σ̂ = Cov(X̂) = (1/n) Σ_{i=1}^n (X_i − μ̂) ⊗ (X_i − μ̂).

Note that μ̂ and Σ̂ are both located in finite-dimensional subspaces, and Σ̂ has at most n − 1 positive eigenvalues λ̂_1 ≥ ... ≥ λ̂_{n−1} with corresponding eigenfunctions, so that
Σ̂ = Σ_{i=1}^{n−1} λ̂_i (ê_i ⊗ ê_i).
For n > k, one can again use estimates such as
Ĉov₄,k(X) = Σ̂_k^{1/2} C(Σ̂_k^{-1/2} X̂) Σ̂_k^{1/2} + Cov(Q̂_k X̂).

Independent component analysis (ICA)

ICA for a random vector

Definition I. We say that X ∈ R^p follows the independent component model if there exists a full-rank A ∈ R^{p×p} such that the components of AX are independent.

For X with high dimension p and covariance matrix Σ = Σ_{i=1}^p d_i e_i e_i', d_1 ≥ ... ≥ d_p, principal component analysis is often used as a preliminary step. Using the principal components Z_i = ⟨e_i, X⟩, one can write
X = X_1 + X_2 := Σ_{i=1}^k Z_i e_i + Σ_{i=k+1}^p Z_i e_i.
One then has the following model in mind.

Definition II. We say that X ∈ R^p follows the independent component model with at most k non-Gaussian components if
1. X_1 and X_2 are independent,
2. X_2 is Gaussian, and
3. there exists a full-rank A ∈ R^{k×k} such that the components of A(Z_1, ..., Z_k)' are independent.

ICA for functional data

Let X be a random function with covariance operator Cov(X) = Σ_{i=1}^∞ λ_i (e_i ⊗ e_i). The variables Z_i = ⟨e_i, X⟩, i = 1, 2, ..., are then uncorrelated and, by the Karhunen-Loève expansion,
X = X_1 + X_2 := Σ_{i=1}^k Z_i e_i + Σ_{i=k+1}^∞ Z_i e_i.
As in the vector-valued case, we then give the following.

Definition III. We say that X ∈ H follows the independent component model with at most k non-Gaussian components if
1. X_1 and X_2 are independent,
2. X_2 is Gaussian, and
3. there exists a full-rank A ∈ R^{k×k} such that the components of A(Z_1, ..., Z_k)' are independent.

Then X_1 = Σ_{l=1}^k Z̃_l (Σ_{j=1}^k A^{-1}_{jl} e_j), where (Z̃_1, ..., Z̃_k)' = A(Z_1, ..., Z_k)' is the vector of independent components.

Real data analysis

Steps in data analysis

1. One typically observes X_i = (X_i(t_1), ..., X_i(t_{m_i}))', i = 1, ..., n.
2. Choose orthonormal basis functions f_1, ..., f_K, K > n, and transform/project X_i → Σ_{j=1}^K x_{ij} f_j, i = 1, ..., n.
3. Find the sample covariance operator and its eigendecomposition Cov(X̂) = Σ_{j=1}^{min(n−1,K)} λ̂_j (û_j ⊗ û_j).
4. Write Z_ij = ⟨û_j, X_i⟩, i = 1, ..., n, j = 1, ..., min(n − 1, K), and X_i = X_i1 + X_i2 := Σ_{j=1}^k Z_ij û_j + Σ_{j=k+1}^{min(n−1,K)} Z_ij û_j, i = 1, ..., n.
5. Find Ĉov and Ĉov₄ for (Z_i1, ..., Z_ik)', i = 1, ..., n.
6. FOBI: Find A ∈ R^{k×k} such that A Ĉov A' = I_k and A Ĉov₄ A' is diagonal.
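A minimal NumPy sketch of steps 3–6, working directly with curves discretized on a common grid (so steps 1–2 are simplified to a grid representation rather than a separate basis expansion). The function functional_fobi and the toy data are illustrative, not the authors' implementation.

```python
import numpy as np

def functional_fobi(X, t, k):
    """Steps 3-6 above for curves given as rows of X on the grid t: principal
    component scores of the curves, then FOBI on the first k score vectors."""
    dt = t[1] - t[0]
    Xc = X - X.mean(axis=0)
    lam, E = np.linalg.eigh((Xc.T @ Xc / len(X)) * dt)   # discretized covariance operator
    E = E[:, np.argsort(lam)[::-1]] / np.sqrt(dt)        # eigenfunctions, leading ones first
    Z = Xc @ E[:, :k] * dt                               # scores Z_ij = <u_j, X_i>
    covZ = np.cov(Z, rowvar=False)
    d, U = np.linalg.eigh(covZ)
    covZ_isqrt = U @ np.diag(d ** -0.5) @ U.T
    Y = Z @ covZ_isqrt                                   # whitened scores
    cov4 = (np.sum(Y ** 2, axis=1)[:, None] * Y).T @ Y / len(Y)
    _, V = np.linalg.eigh(cov4)
    A = V[:, ::-1].T @ covZ_isqrt                        # A Cov A' = I_k, A Cov4 A' diagonal
    return Z @ A.T, A, E[:, :k]

# toy data: curves driven by two independent non-Gaussian scores plus noise
rng = np.random.default_rng(9)
t = np.linspace(0, 1, 101)
src = np.column_stack([rng.exponential(size=150), rng.uniform(-2, 2, size=150)])
basis = np.column_stack([np.sqrt(2) * np.sin(2 * np.pi * t),
                         np.sqrt(2) * np.cos(2 * np.pi * t)])
X = src @ np.array([[1.0, 0.5], [0.3, 1.0]]) @ basis.T + 0.05 * rng.standard_normal((150, len(t)))
IC, A, eigenfunctions = functional_fobi(X, t, k=2)       # IC: estimated independent component scores
```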

Australian weather data (analysis by Germain Van Bever)

Figure: Australian weather data (analysed in Delaigle and Hall, 2010). Daily precipitation (mm) against day of the year, measured at 19 weather stations from 1840 to 1990 (the lengths of the records vary from one station to the other).

Australian weather data

One outlier was deleted from the dataset. PC explained variances: (0.73, 0.21, 0.03, 0.01). PC1: summer/winter behaviour.

First principal scores

Figure: Left: histogram of the first principal scores. Right: the k = 4 first principal functions.

Fourth FOBI scores

Figure: Left: histogram of the fourth FOBI scores. Right: the k = 4 FFOBI functions.

Locations

Figure: Locations of the 19 Australian weather stations together with their classification based on 2-means clustering on the first principal score (left) and on the fourth FOBI score (right).

Concluding remarks

To be given...

References

Key references:

Bosq, D. (2000). Linear Processes in Function Spaces. Springer, New York.
Conway, J.B. (1990). A Course in Functional Analysis. Springer, New York.
Horváth, L. and Kokoszka, P. (2012). Inference for Functional Data with Applications. Springer, New York.
Li, G., Van Bever, G., Oja, H., Sabolova, R. and Critchley, F. (2016). Functional independent component analysis: an extension of fourth-order blind identification. Submitted.
Miettinen, J., Nordhausen, K., Oja, H. and Taskinen, S. (2014). Fourth moments and independent components. Statistical Science.
Nordhausen, K., Oja, H. and Tyler, D.E. (2008). Tools for exploring multivariate data: The package ICS. Journal of Statistical Software, 28.
Peña, D., Prieto, J. and Rendón, C. (2014). Independent components techniques based on kurtosis for functional data analysis. Working Paper 14-1, Statistics and Econometrics Series (6).
Ramsay, J.O. and Silverman, B.W. (2005). Functional Data Analysis. Springer, New York.
Tyler, D.E. (2010). A note on multivariate location and scatter statistics for sparse data sets. Statistics and Probability Letters, 80.
Tyler, D.E., Critchley, F., Dümbgen, L. and Oja, H. (2009). Invariant co-ordinate selection. Journal of the Royal Statistical Society, Series B, 71.
Wang, J.-L., Chiou, J.-M. and Müller, H.-G. (2016). Functional Data Analysis. Annual Review of Statistics and Its Application, 3.


More information

3.1. The probabilistic view of the principal component analysis.

3.1. The probabilistic view of the principal component analysis. 301 Chapter 3 Principal Components and Statistical Factor Models This chapter of introduces the principal component analysis (PCA), briefly reviews statistical factor models PCA is among the most popular

More information

Linear Algebra (Review) Volker Tresp 2018

Linear Algebra (Review) Volker Tresp 2018 Linear Algebra (Review) Volker Tresp 2018 1 Vectors k, M, N are scalars A one-dimensional array c is a column vector. Thus in two dimensions, ( ) c1 c = c 2 c i is the i-th component of c c T = (c 1, c

More information

Principal Component Analysis CS498

Principal Component Analysis CS498 Principal Component Analysis CS498 Today s lecture Adaptive Feature Extraction Principal Component Analysis How, why, when, which A dual goal Find a good representation The features part Reduce redundancy

More information

COMP6237 Data Mining Covariance, EVD, PCA & SVD. Jonathon Hare

COMP6237 Data Mining Covariance, EVD, PCA & SVD. Jonathon Hare COMP6237 Data Mining Covariance, EVD, PCA & SVD Jonathon Hare jsh2@ecs.soton.ac.uk Variance and Covariance Random Variables and Expected Values Mathematicians talk variance (and covariance) in terms of

More information

PRINCIPAL COMPONENTS ANALYSIS

PRINCIPAL COMPONENTS ANALYSIS PRINCIPAL COMPONENTS ANALYSIS Iris Data Let s find Principal Components using the iris dataset. This is a well known dataset, often used to demonstrate the effect of clustering algorithms. It contains

More information

MULTICHANNEL SIGNAL PROCESSING USING SPATIAL RANK COVARIANCE MATRICES

MULTICHANNEL SIGNAL PROCESSING USING SPATIAL RANK COVARIANCE MATRICES MULTICHANNEL SIGNAL PROCESSING USING SPATIAL RANK COVARIANCE MATRICES S. Visuri 1 H. Oja V. Koivunen 1 1 Signal Processing Lab. Dept. of Statistics Tampere Univ. of Technology University of Jyväskylä P.O.

More information

Principal Components Analysis (PCA)

Principal Components Analysis (PCA) Principal Components Analysis (PCA) Principal Components Analysis (PCA) a technique for finding patterns in data of high dimension Outline:. Eigenvectors and eigenvalues. PCA: a) Getting the data b) Centering

More information

Linear Methods in Data Mining

Linear Methods in Data Mining Why Methods? linear methods are well understood, simple and elegant; algorithms based on linear methods are widespread: data mining, computer vision, graphics, pattern recognition; excellent general software

More information

Last time: PCA. Statistical Data Mining and Machine Learning Hilary Term Singular Value Decomposition (SVD) Eigendecomposition and PCA

Last time: PCA. Statistical Data Mining and Machine Learning Hilary Term Singular Value Decomposition (SVD) Eigendecomposition and PCA Last time: PCA Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml

More information

Matrix Factorizations

Matrix Factorizations 1 Stat 540, Matrix Factorizations Matrix Factorizations LU Factorization Definition... Given a square k k matrix S, the LU factorization (or decomposition) represents S as the product of two triangular

More information

Principal Components Analysis (PCA) and Singular Value Decomposition (SVD) with applications to Microarrays

Principal Components Analysis (PCA) and Singular Value Decomposition (SVD) with applications to Microarrays Principal Components Analysis (PCA) and Singular Value Decomposition (SVD) with applications to Microarrays Prof. Tesler Math 283 Fall 2015 Prof. Tesler Principal Components Analysis Math 283 / Fall 2015

More information

Machine Learning - MT & 14. PCA and MDS

Machine Learning - MT & 14. PCA and MDS Machine Learning - MT 2016 13 & 14. PCA and MDS Varun Kanade University of Oxford November 21 & 23, 2016 Announcements Sheet 4 due this Friday by noon Practical 3 this week (continue next week if necessary)

More information

Principal Component Analysis

Principal Component Analysis Principal Component Analysis Yingyu Liang yliang@cs.wisc.edu Computer Sciences Department University of Wisconsin, Madison [based on slides from Nina Balcan] slide 1 Goals for the lecture you should understand

More information

12.2 Dimensionality Reduction

12.2 Dimensionality Reduction 510 Chapter 12 of this dimensionality problem, regularization techniques such as SVD are almost always needed to perform the covariance matrix inversion. Because it appears to be a fundamental property

More information

Math 102, Winter Final Exam Review. Chapter 1. Matrices and Gaussian Elimination

Math 102, Winter Final Exam Review. Chapter 1. Matrices and Gaussian Elimination Math 0, Winter 07 Final Exam Review Chapter. Matrices and Gaussian Elimination { x + x =,. Different forms of a system of linear equations. Example: The x + 4x = 4. [ ] [ ] [ ] vector form (or the column

More information

Fundamentals of Matrices

Fundamentals of Matrices Maschinelles Lernen II Fundamentals of Matrices Christoph Sawade/Niels Landwehr/Blaine Nelson Tobias Scheffer Matrix Examples Recap: Data Linear Model: f i x = w i T x Let X = x x n be the data matrix

More information

Linear Algebra - Part II

Linear Algebra - Part II Linear Algebra - Part II Projection, Eigendecomposition, SVD (Adapted from Sargur Srihari s slides) Brief Review from Part 1 Symmetric Matrix: A = A T Orthogonal Matrix: A T A = AA T = I and A 1 = A T

More information

Lecture 4: Principal Component Analysis and Linear Dimension Reduction

Lecture 4: Principal Component Analysis and Linear Dimension Reduction Lecture 4: Principal Component Analysis and Linear Dimension Reduction Advanced Applied Multivariate Analysis STAT 2221, Fall 2013 Sungkyu Jung Department of Statistics University of Pittsburgh E-mail:

More information

Hands-on Matrix Algebra Using R

Hands-on Matrix Algebra Using R Preface vii 1. R Preliminaries 1 1.1 Matrix Defined, Deeper Understanding Using Software.. 1 1.2 Introduction, Why R?.................... 2 1.3 Obtaining R.......................... 4 1.4 Reference Manuals

More information

linearly indepedent eigenvectors as the multiplicity of the root, but in general there may be no more than one. For further discussion, assume matrice

linearly indepedent eigenvectors as the multiplicity of the root, but in general there may be no more than one. For further discussion, assume matrice 3. Eigenvalues and Eigenvectors, Spectral Representation 3.. Eigenvalues and Eigenvectors A vector ' is eigenvector of a matrix K, if K' is parallel to ' and ' 6, i.e., K' k' k is the eigenvalue. If is

More information

CS 340 Lec. 6: Linear Dimensionality Reduction

CS 340 Lec. 6: Linear Dimensionality Reduction CS 340 Lec. 6: Linear Dimensionality Reduction AD January 2011 AD () January 2011 1 / 46 Linear Dimensionality Reduction Introduction & Motivation Brief Review of Linear Algebra Principal Component Analysis

More information

Linear Algebra Formulas. Ben Lee

Linear Algebra Formulas. Ben Lee Linear Algebra Formulas Ben Lee January 27, 2016 Definitions and Terms Diagonal: Diagonal of matrix A is a collection of entries A ij where i = j. Diagonal Matrix: A matrix (usually square), where entries

More information

Kernel Methods. Machine Learning A W VO

Kernel Methods. Machine Learning A W VO Kernel Methods Machine Learning A 708.063 07W VO Outline 1. Dual representation 2. The kernel concept 3. Properties of kernels 4. Examples of kernel machines Kernel PCA Support vector regression (Relevance

More information

HST.582J / 6.555J / J Biomedical Signal and Image Processing Spring 2007

HST.582J / 6.555J / J Biomedical Signal and Image Processing Spring 2007 MIT OpenCourseWare http://ocw.mit.edu HST.582J / 6.555J / 16.456J Biomedical Signal and Image Processing Spring 2007 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

Review (Probability & Linear Algebra)

Review (Probability & Linear Algebra) Review (Probability & Linear Algebra) CE-725 : Statistical Pattern Recognition Sharif University of Technology Spring 2013 M. Soleymani Outline Axioms of probability theory Conditional probability, Joint

More information

Lecture notes: Applied linear algebra Part 1. Version 2

Lecture notes: Applied linear algebra Part 1. Version 2 Lecture notes: Applied linear algebra Part 1. Version 2 Michael Karow Berlin University of Technology karow@math.tu-berlin.de October 2, 2008 1 Notation, basic notions and facts 1.1 Subspaces, range and

More information