Canonical Correlation Analysis
- Sarah Preston
- 5 years ago
Chapter 7 Canonical Correlation Analysis

7.1 Introduction

Let x be the $p \times 1$ vector of predictors, and partition $x = (w^T, y^T)^T$ where w is $m \times 1$ and y is $q \times 1$, where $m = p - q \geq q$ and $m, q \geq 2$. Canonical correlation analysis (CCA) seeks m pairs of linear combinations $(a_1^T w, b_1^T y), \ldots, (a_m^T w, b_m^T y)$ such that $\mathrm{corr}(a_i^T w, b_i^T y)$ is large under some constraints on the $a_i$ and $b_i$, where $i = 1, \ldots, m$. The first pair $(a_1^T w, b_1^T y)$ has the largest correlation. The next pair $(a_2^T w, b_2^T y)$ has the largest correlation among all pairs uncorrelated with the first pair, and the process continues so that $(a_m^T w, b_m^T y)$ is the pair with the largest correlation that is uncorrelated with the first $m - 1$ pairs. The correlations are called canonical correlations while the pairs of linear combinations are called canonical variables.

Some notation is needed to explain CCA. Let the $p \times p$ positive definite symmetric dispersion matrix
$$\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}.$$
Let $J = \Sigma_{11}^{-1/2} \Sigma_{12} \Sigma_{22}^{-1/2}$. Let $\Sigma_a = \Sigma_{11}^{-1} \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}$, $\Sigma_b = \Sigma_{22}^{-1} \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}$, $\Sigma_A = J J^T = \Sigma_{11}^{-1/2} \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \Sigma_{11}^{-1/2}$, and $\Sigma_B = J^T J = \Sigma_{22}^{-1/2} \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12} \Sigma_{22}^{-1/2}$. Let the $e_i$ and $g_i$ be sets of orthonormal eigenvectors, so $e_i^T e_i = 1$, $e_i^T e_j = 0$ for $i \neq j$, $g_i^T g_i = 1$, and $g_i^T g_j = 0$ for $i \neq j$. Let the $e_i$ be $m \times 1$ while the $g_i$ are $q \times 1$. Let $\Sigma_a$ have eigenvalue-eigenvector pairs $(\lambda_1, a_1), \ldots, (\lambda_m, a_m)$ where $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_m$. Let $\Sigma_A$ have eigenvalue-eigenvector pairs $(\lambda_i, e_i)$ for $i = 1, \ldots, m$. Let $\Sigma_b$ have eigenvalue-eigenvector pairs $(\lambda_1, b_1), \ldots, (\lambda_q, b_q)$. Let $\Sigma_B$ have eigenvalue-eigenvector pairs $(\lambda_i, g_i)$ for $i = 1, \ldots, q$. It can be shown that the m largest eigenvalues of the four matrices are the same. Hence $\lambda_i(\Sigma_a) = \lambda_i(\Sigma_A) = \lambda_i(\Sigma_b) = \lambda_i(\Sigma_B) \equiv \lambda_i$ for $i = 1, \ldots, m$. It can be shown that $a_i = \Sigma_{11}^{-1/2} e_i$ and $b_i = \Sigma_{22}^{-1/2} g_i$. The eigenvectors $a_i$ are not necessarily orthonormal, and the eigenvectors $b_i$ are not necessarily orthonormal.

Theorem 7.1. Assume the $p \times p$ dispersion matrix $\Sigma$ is positive definite. Assume $\Sigma_{11}, \Sigma_{22}, \Sigma_A, \Sigma_a, \Sigma_B$ and $\Sigma_b$ are positive definite, and that $\hat{\Sigma} \xrightarrow{P} c\Sigma$ for some constant $c > 0$. Let $d_i$ be an eigenvector of the corresponding matrix, so $d_i = a_i$, $b_i$, $e_i$ or $g_i$. Let $(\hat{\lambda}_i, \hat{d}_i)$ be the ith eigenvalue-eigenvector pair of $\hat{\Sigma}_\gamma$.
a) $\hat{\Sigma}_\gamma \xrightarrow{P} \Sigma_\gamma$ and $\hat{\lambda}_i(\hat{\Sigma}_\gamma) \xrightarrow{P} \lambda_i(\Sigma_\gamma) = \lambda_i$ where $\gamma = A, a, B$ or $b$.
b) $\Sigma_\gamma \hat{d}_i - \lambda_i \hat{d}_i \xrightarrow{P} 0$ and $\hat{\Sigma}_\gamma d_i - \hat{\lambda}_i d_i \xrightarrow{P} 0$.
c) If the jth eigenvalue $\lambda_j$ is unique, where $j \leq m$, then the absolute value of the correlation of $\hat{d}_j$ with $d_j$ converges to 1 in probability: $|\mathrm{corr}(\hat{d}_j, d_j)| \xrightarrow{P} 1$.

Proof. a) $\hat{\Sigma}_\gamma \xrightarrow{P} \Sigma_\gamma$ since matrix multiplication is a continuous function of the relevant matrices, matrix inversion is a continuous function of a positive definite matrix, and the constant c cancels when $\hat{\Sigma}_\gamma$ is formed from $\hat{\Sigma}$. Then $\hat{\lambda}_i(\hat{\Sigma}_\gamma) \xrightarrow{P} \lambda_i$ since an eigenvalue is a continuous function of its associated matrix.
b) Note that $(\Sigma_\gamma - \lambda_i I)\hat{d}_i = [(\Sigma_\gamma - \lambda_i I) - (\hat{\Sigma}_\gamma - \hat{\lambda}_i I)]\hat{d}_i = o_P(1) O_P(1) \xrightarrow{P} 0$, and $\hat{\Sigma}_\gamma d_i - \hat{\lambda}_i d_i \xrightarrow{P} \Sigma_\gamma d_i - \lambda_i d_i = 0$.
c) If n is large, then $\hat{d}_j \equiv \hat{d}_{j,n}$ is arbitrarily close to either $d_j$ or $-d_j$, and the result follows.

Rule of thumb 7.1. To use CCA, assume the DD plot and the subplots of the scatterplot matrix are linear. Want $n > 10p$ for classical CCA and $n > 20p$ for robust CCA that uses FCH, RFCH or RMVN. Also make the DD plot for the w variables and the DD plot for the y variables.

Definition 7.1. Let the dispersion matrix be $\mathrm{Cov}(x) = \Sigma_x$.
Let $(\lambda_i, e_i)$ and $(\lambda_i, g_i)$ be the eigenvalue-eigenvector pairs of $\Sigma_A$ and $\Sigma_B$. The kth pair of population canonical variables is $U_k = a_k^T w = e_k^T \Sigma_{11}^{-1/2} w$ and $V_k = b_k^T y = g_k^T \Sigma_{22}^{-1/2} y$ for $k = 1, \ldots, m$. Then the population canonical correlations are $\rho_k = \mathrm{corr}(U_k, V_k) = \sqrt{\lambda_k}$ for $k = 1, \ldots, m$. The vectors $a_k = \Sigma_{11}^{-1/2} e_k$ and $b_k = \Sigma_{22}^{-1/2} g_k$ are the kth canonical correlation coefficient vectors for w and y.

Theorem 7.2. Johnson and Wichern (1988): Let the dispersion matrix be $\mathrm{Cov}(x) = \Sigma_x$. Then $V(U_k) = V(V_k) = 1$, and $\mathrm{Cov}(C_k, D_j) = \mathrm{corr}(C_k, D_j) = 0$ for $k \neq j$, where $C_k = U_k$ or $C_k = V_k$, $D_j = U_j$ or $D_j = V_j$, and $j, k = 1, \ldots, m$. That is, $U_k$ is uncorrelated with $V_j$ and $U_j$ for $j \neq k$, and $V_k$ is uncorrelated with $V_j$ and $U_j$ for $j \neq k$. The first pair of canonical variables is the pair of linear combinations $(U, V)$ having unit variances that maximizes $\mathrm{corr}(U, V)$, and this maximum is $\mathrm{corr}(U_1, V_1) = \rho_1$. The kth pair of canonical variables is the pair of linear combinations $(U, V)$ with unit variances that maximizes $\mathrm{corr}(U, V)$ among all choices uncorrelated with the previous $k - 1$ canonical variable pairs.

Definition 7.2. Suppose standardized data $z = (w^T, y^T)^T$ is used, so the dispersion matrix is the correlation matrix $\Sigma = \rho$. Hence $\Sigma_{ii} = \rho_{ii}$ for $i = 1, 2$. Let $(\lambda_i, e_i)$ and $(\lambda_i, g_i)$ be the eigenvalue-eigenvector pairs of $\Sigma_A$ and $\Sigma_B$. The kth pair of population canonical variables is $U_k = a_k^T w = e_k^T \Sigma_{11}^{-1/2} w$ and $V_k = b_k^T y = g_k^T \Sigma_{22}^{-1/2} y$ for $k = 1, \ldots, m$. Then the population canonical correlations are $\rho_k = \mathrm{corr}(U_k, V_k) = \sqrt{\lambda_k}$ for $k = 1, \ldots, m$. Theorem 7.2 holds for the standardized data, and the canonical correlations are unchanged by the standardization.

Let
$$\hat{\Sigma} = \begin{pmatrix} \hat{\Sigma}_{11} & \hat{\Sigma}_{12} \\ \hat{\Sigma}_{21} & \hat{\Sigma}_{22} \end{pmatrix}.$$
Define the estimators $\hat{\Sigma}_a$, $\hat{\Sigma}_A$, $\hat{\Sigma}_b$ and $\hat{\Sigma}_B$ in the same manner as their population analogs but using $\hat{\Sigma}$ instead of $\Sigma$. For example, $\hat{\Sigma}_a = \hat{\Sigma}_{11}^{-1} \hat{\Sigma}_{12} \hat{\Sigma}_{22}^{-1} \hat{\Sigma}_{21}$. Let $\hat{\Sigma}_a$ have eigenvalue-eigenvector pairs $(\hat{\lambda}_i, \hat{a}_i)$, and let $\hat{\Sigma}_A$ have eigenvalue-eigenvector pairs $(\hat{\lambda}_i, \hat{e}_i)$ for $i = 1, \ldots, m$. Let $\hat{\Sigma}_b$ have eigenvalue-eigenvector pairs $(\hat{\lambda}_i, \hat{b}_i)$, and let $\hat{\Sigma}_B$ have eigenvalue-eigenvector pairs $(\hat{\lambda}_i, \hat{g}_i)$ for $i = 1, \ldots, q$. For these four matrices, $\hat{\lambda}_1 \geq \hat{\lambda}_2 \geq \cdots \geq \hat{\lambda}_m$.

Definition 7.3.
Let $\hat{\Sigma} = S$ if data $x = (w^T, y^T)^T$ is used, and let $\hat{\Sigma} = R$ if standardized data $z = (w^T, y^T)^T$ is used. The kth pair of sample canonical variables is $\hat{U}_k = \hat{a}_k^T w = \hat{e}_k^T \hat{\Sigma}_{11}^{-1/2} w$ and $\hat{V}_k = \hat{b}_k^T y = \hat{g}_k^T \hat{\Sigma}_{22}^{-1/2} y$ for $k = 1, \ldots, m$. Then the sample canonical correlations are $\hat{\rho}_k = \mathrm{corr}(\hat{U}_k, \hat{V}_k) = \sqrt{\hat{\lambda}_k}$ for $k = 1, \ldots, m$. The vectors $\hat{a}_k = \hat{\Sigma}_{11}^{-1/2} \hat{e}_k$ and $\hat{b}_k = \hat{\Sigma}_{22}^{-1/2} \hat{g}_k$ are the kth sample canonical correlation vectors for w and y.

Theorem 7.3. Under the conditions of Definition 7.3, the first pair of sample canonical variables $(\hat{U}_1, \hat{V}_1)$ is the pair of linear combinations $(\hat{U}, \hat{V})$ having unit sample variances that maximizes the sample correlation $\mathrm{corr}(\hat{U}, \hat{V})$, and this maximum is $\mathrm{corr}(\hat{U}_1, \hat{V}_1) = \hat{\rho}_1$. The kth pair of canonical variables is the pair of linear combinations $(\hat{U}, \hat{V})$ with unit sample variances that maximizes the sample correlation $\mathrm{corr}(\hat{U}, \hat{V})$ among all choices uncorrelated with the previous $k - 1$ canonical variable pairs.

7.2 Robust CCA

The R function cancor does classical CCA, and the mpack function rcancor does robust CCA by applying cancor to the RMVN set: the subset of the data used to compute RMVN. Some theory is simple: the FCH, RFCH and RMVN methods of RCCA produce consistent estimators of the kth canonical correlation $\rho_k$ on a large class of elliptically contoured distributions. To see this, suppose $\mathrm{Cov}(x) = c_x \Sigma$ and $C \equiv C(X) \xrightarrow{P} c\Sigma$, where $c_x > 0$ and $c > 0$ are some constants. Then $C_{XX}^{-1} C_{XY} C_{YY}^{-1} C_{YX} \xrightarrow{P} \Sigma_A = \Sigma_{XX}^{-1} \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{YX}$, and $C_{YY}^{-1} C_{YX} C_{XX}^{-1} C_{XY} \xrightarrow{P} \Sigma_B = \Sigma_{YY}^{-1} \Sigma_{YX} \Sigma_{XX}^{-1} \Sigma_{XY}$. Note that $\Sigma_A$ and $\Sigma_B$ only depend on $\Sigma$ and do not depend on the constants $c$ or $c_x$. (If C is also the classical covariance matrix applied to some subset of the data, then the correlation matrix $G \equiv R_C$ applied to the same subset satisfies $G_{XX}^{-1} G_{XY} G_{YY}^{-1} G_{YX} \xrightarrow{P} R_A = R_{XX}^{-1} R_{XY} R_{YY}^{-1} R_{YX}$, and $G_{YY}^{-1} G_{YX} G_{XX}^{-1} G_{XY} \xrightarrow{P} R_B = R_{YY}^{-1} R_{YX} R_{XX}^{-1} R_{XY}$.) Since eigenvalues are continuous functions of the associated matrix, and the FCH, RFCH and RMVN estimators are consistent estimators of $c_1 \Sigma$, $c_2 \Sigma$ and $c_3 \Sigma$ on a large class of elliptically contoured distributions, Theorem
7.1 holds, so these three RCCA methods and rcancor produce consistent estimators of the kth canonical correlation $\rho_k$ on that class of distributions.

Example 7.1. Example 2.2 describes the mussel data. Log transformations were taken of the muscle mass M, the shell width W and the shell mass S. Then x contained the two log mass measurements while y contained L, H and log(W). The robust and classical CCAs were similar, but the canonical coefficients were difficult to interpret since log(W) has different units than L and H. Hence log transformations were taken of all five variables, and output is shown below. The data set zm contains x and y, and the DD plot showed that case 48 was separated from the bulk of the data, but near the identity line. The DD plot for x showed that two cases, 8 and 48, were separated from the bulk of the data. Also, the plotted points did not cluster tightly about the identity line. The DD plot for y looked fine.

The classical CCA produces the output $cor, $xcoef and $ycoef: the canonical correlations, the $a_i$ and the $b_i$. The labels for the RCCA are $out$cor, $out$xcoef and $out$ycoef. Note that the first correlation was about 0.98 while the second correlation was small. The RCCA is the CCA on the RMVN data set, which is contained in a compact ellipsoidal region. The variability of the truncated data set is less than that of the entire data set; hence expect the robust $a_i$ and $b_i$ to be larger in magnitude, ignoring sign, than the classical $a_i$ and $b_i$, since the variance of each canonical variate is equal to one and RCCA uses the truncated data. Note that $a_1$ was roughly proportional to log(S) while $b_1$ gave slightly higher weight to log(H) than to log(W) and then log(L). Note that the five variables have high pairwise correlations, so log(M) was not important given that log(S) was in x. The second pair $(a_2, b_2)$ might be ignored since the second canonical correlation was very low.

    > cancor(x,y)
    $cor
    [1]

    $xcoef
        [,1] [,2]
    S
    M

    $ycoef
        [,1] [,2] [,3]
    L
    W
    H

    $xcenter
    S M

    $ycenter
    L W H

    > rcancor(x,y)
    $out
    $out$cor
    [1]

    $out$xcoef
        [,1] [,2]
    S
    M

    $out$ycoef
        [,1] [,2] [,3]
    L
    W
    H

    $out$xcenter
    S M

    $out$ycenter
    L W H
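The eigenvalue machinery behind output like the above is easy to check numerically. The sketch below (Python with NumPy, on simulated data rather than the mussel data; all names are invented for illustration) forms $\hat{J} = \hat{\Sigma}_{11}^{-1/2} \hat{\Sigma}_{12} \hat{\Sigma}_{22}^{-1/2}$ and $\hat{\Sigma}_A = \hat{J}\hat{J}^T$ from a sample dispersion matrix and recovers the sample canonical correlations $\hat{\rho}_k = \sqrt{\hat{\lambda}_k}$, which also equal the singular values of $\hat{J}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate n observations of x = (w^T, y^T)^T with m = 2, q = 3
# (illustrative only; not the mussel data of Example 7.1).
n, m, q = 200, 2, 3
A = rng.standard_normal((m + q, m + q))
Sigma = A @ A.T + (m + q) * np.eye(m + q)    # positive definite dispersion matrix
X = rng.multivariate_normal(np.zeros(m + q), Sigma, size=n)

S = np.cov(X, rowvar=False)                  # sample dispersion matrix Sigma-hat
S11, S12 = S[:m, :m], S[:m, m:]
S22 = S[m:, m:]

def inv_sqrt(M):
    """Symmetric inverse square root M^{-1/2} via the spectral decomposition."""
    vals, vecs = np.linalg.eigh(M)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

J = inv_sqrt(S11) @ S12 @ inv_sqrt(S22)      # J-hat = S11^{-1/2} S12 S22^{-1/2}
SigA = J @ J.T                               # Sigma_A-hat = J J^T, symmetric m x m

# Sample canonical correlations: square roots of the m largest eigenvalues.
lam = np.sort(np.linalg.eigvalsh(SigA))[::-1]
rho = np.sqrt(np.clip(lam, 0.0, None))

# Equivalently, the canonical correlations are the singular values of J-hat.
sv = np.linalg.svd(J, compute_uv=False)
```

Since $\hat{\Sigma}_A$, $\hat{\Sigma}_a$, $\hat{\Sigma}_B$ and $\hat{\Sigma}_b$ share the same m largest eigenvalues, any of the four matrices could be used here; $\hat{\Sigma}_A$ is convenient because it is symmetric.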
7.3 Summary

1) Let x be the $p \times 1$ vector of predictors, and partition $x = (w^T, y^T)^T$ where w is $m \times 1$ and y is $q \times 1$, where $m = p - q \geq q$ and $m, q \geq 2$. Canonical correlation analysis (CCA) seeks m pairs of linear combinations $(a_1^T w, b_1^T y), \ldots, (a_m^T w, b_m^T y)$ such that $\mathrm{corr}(a_i^T w, b_i^T y)$ is large under some constraints on the $a_i$ and $b_i$, where $i = 1, \ldots, m$. The first pair $(a_1^T w, b_1^T y)$ has the largest correlation. The next pair $(a_2^T w, b_2^T y)$ has the largest correlation among all pairs uncorrelated with the first pair, and the process continues so that $(a_m^T w, b_m^T y)$ is the pair with the largest correlation that is uncorrelated with the first $m - 1$ pairs. The correlations are called canonical correlations while the pairs of linear combinations are called canonical variables.

2) R output is shown in symbols in the following table.

    $cor      $\hat{\rho}_1, \ldots, \hat{\rho}_m$
    $xcoef    columns $\hat{a}_1, \ldots, \hat{a}_m$ (coefficients for w)
    $ycoef    columns $\hat{b}_1, \ldots, \hat{b}_q$ (coefficients for y)

    $out$cor
    [1]

    $out$xcoef
        [,1] [,2]
    S
    M

    $out$ycoef
        [,1] [,2] [,3]
    L
    W
    H

3) Some notation is needed to explain CCA. Let the $p \times p$ positive definite symmetric dispersion matrix
$$\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}.$$
Let $J = \Sigma_{11}^{-1/2} \Sigma_{12} \Sigma_{22}^{-1/2}$. Let $\Sigma_a = \Sigma_{11}^{-1} \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}$, $\Sigma_b = \Sigma_{22}^{-1} \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}$, $\Sigma_A = J J^T = \Sigma_{11}^{-1/2} \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \Sigma_{11}^{-1/2}$, and $\Sigma_B = J^T J = \Sigma_{22}^{-1/2} \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12} \Sigma_{22}^{-1/2}$. Let the $e_i$ and $g_i$ be sets of orthonormal eigenvectors, so $e_i^T e_i = 1$, $e_i^T e_j = 0$ for $i \neq j$, $g_i^T g_i = 1$, and $g_i^T g_j = 0$ for $i \neq j$. Let the $e_i$ be $m \times 1$ while the $g_i$ are $q \times 1$. Let $\Sigma_a$ have eigenvalue-eigenvector pairs $(\lambda_1, a_1), \ldots, (\lambda_m, a_m)$ where $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_m$. Let $\Sigma_A$ have eigenvalue-eigenvector pairs $(\lambda_i, e_i)$ for $i = 1, \ldots, m$. Let $\Sigma_b$ have eigenvalue-eigenvector pairs $(\lambda_1, b_1), \ldots, (\lambda_q, b_q)$. Let $\Sigma_B$ have eigenvalue-eigenvector pairs $(\lambda_i, g_i)$ for $i = 1, \ldots, q$. It can be shown that the m largest eigenvalues of the four matrices are the same. Hence $\lambda_i(\Sigma_a) = \lambda_i(\Sigma_A) = \lambda_i(\Sigma_b) = \lambda_i(\Sigma_B) \equiv \lambda_i$ for $i = 1, \ldots, m$. It can be shown that $a_i = \Sigma_{11}^{-1/2} e_i$ and $b_i = \Sigma_{22}^{-1/2} g_i$. The eigenvectors $a_i$ are not necessarily orthonormal, and the eigenvectors $b_i$ are not necessarily orthonormal.

Theorem 7.1. Assume the $p \times p$ dispersion matrix $\Sigma$ is positive definite. Assume $\Sigma_{11}, \Sigma_{22}, \Sigma_A, \Sigma_a, \Sigma_B$ and $\Sigma_b$ are positive definite, and that $\hat{\Sigma} \xrightarrow{P} c\Sigma$ for some constant $c > 0$. Let $d_i$ be an eigenvector of the corresponding matrix, so $d_i = a_i$, $b_i$, $e_i$ or $g_i$. Let $(\hat{\lambda}_i, \hat{d}_i)$ be the ith eigenvalue-eigenvector pair of $\hat{\Sigma}_\gamma$.
a) $\hat{\Sigma}_\gamma \xrightarrow{P} \Sigma_\gamma$ and $\hat{\lambda}_i(\hat{\Sigma}_\gamma) \xrightarrow{P} \lambda_i(\Sigma_\gamma) = \lambda_i$ where $\gamma = A, a, B$ or $b$.
b) $\Sigma_\gamma \hat{d}_i - \lambda_i \hat{d}_i \xrightarrow{P} 0$ and $\hat{\Sigma}_\gamma d_i - \hat{\lambda}_i d_i \xrightarrow{P} 0$.
c) If the jth eigenvalue $\lambda_j$ is unique, where $j \leq m$, then the absolute value of the correlation of $\hat{d}_j$ with $d_j$ converges to 1 in probability: $|\mathrm{corr}(\hat{d}_j, d_j)| \xrightarrow{P} 1$.

7.4 Complements

Muirhead and Waternaux (1980) showed that if the population canonical correlations $\rho_k$ are distinct and if the underlying population distribution has finite fourth moments, then the limiting joint distribution of $\sqrt{n}(\hat{\rho}_k^2 - \rho_k^2)$ is multivariate normal, where the $\hat{\rho}_k$ are the classical sample canonical correlations and $k = 1, \ldots, p$. If the data are iid from an elliptically contoured distribution with kurtosis $3\kappa$, then the limiting joint distribution of
$$\sqrt{n}\, \frac{\hat{\rho}_k^2 - \rho_k^2}{2 \rho_k (1 - \rho_k^2)}, \quad k = 1, \ldots, p,$$
is $N_p(0, (\kappa + 1) I_p)$. Note that $\kappa = 0$ for multivariate normal data.

Alkenani and Yu (2012), Zhang (2011) and Zhang, Olive and Ye (2012) develop robust CCA based on FCH, RFCH and RMVN.

7.5 Problems

PROBLEMS WITH AN ASTERISK * ARE ESPECIALLY USEFUL.

7.1. Examine the R output in Example 7.1.
a) What is the first canonical correlation $\hat{\rho}_1$?
b) What is $\hat{a}_1$?
c) What is $\hat{b}_1$?

7.2. The R output below is for a canonical correlation analysis on the Venables and Ripley (2003) CPU data. The variables were syct = log(cycle time + 1), mmin = log(minimum main memory + 1), chmin = log(minimum number of channels + 1), chmax = log(maximum number of channels + 1), perf = log(published performance + 1) and estperf = log(estimated performance + 1). These six variables had a linear scatterplot matrix and DD plot and similar variances. Want to compare the two performance variables with the four remaining variables.
a) What is the first canonical correlation $\hat{\rho}_1$?
b) What is $\hat{a}_1$?
c) What is $\hat{b}_1$?
d) Interpret the second canonical variable $\hat{U}_2 = \hat{a}_2^T w$.

    > cancor(w,y)
    $cor
    [1]

    $xcoef
        [,1] [,2]
    perf
    estperf

    $ycoef
         [,1] [,2] [,3] [,4]
    syct
    mmin
    chmin
    chmax

7.3. Edited SAS output for the SAS Institute (1985, p. 146) Fitness Club Data is given below for CCA. Three physiological and three exercise variables were measured on 20 middle-aged men at a fitness club.
a) What is the first canonical correlation $\hat{\rho}_1$?
b) What is $\hat{a}_1$?
c) What is $\hat{b}_1$?

    Canonical Correlation

    Raw Canonical Coefficients for the Physiological Variables
            PHYS1 PHYS2 PHYS3
    weight
    waist
    pulse

    Raw Canonical Coefficients for the Exercise Variables
            Exer1 Exer2 Exer3
    chinups
    situps
    jumps

7.4. The output below is for a canonical correlation analysis on the R Seatbelts data set, where $y_1$ = drivers = number of drivers killed or seriously injured, $y_2$ = front = number of front seat passengers killed or seriously injured, and $y_3$ = rear = number of back seat passengers killed or seriously injured, while $x_1$ = kms = distance driven, $x_2$ = PetrolPrice = petrol price, and $x_3$ = VanKilled = number of van drivers killed. The data consist of 192 monthly totals in Great Britain from January 1969 to December 1984.
a) What is the first canonical correlation $\hat{\rho}_1$?
b) What is $\hat{a}_1$?
c) What is $\hat{b}_1$?
d) Let $z = (x^T, y^T)^T$. From the DD plot, the $z_i$ appeared to follow a multivariate normal distribution. Sketch the DD plot.

    > rcancor(x,y)
    $out
    $out$cor
    [1]

    $out$xcoef
                   [,1] [,2] [,3]
    x.kms
    x.PetrolPrice
    x.VanKilled

    $out$ycoef
               [,1] [,2] [,3]
    y.drivers
    y.front
    y.rear

7.5. The R output below is for a canonical correlation analysis on some iris data. An iris is a flower, and there were 50 observations on 4 variables: sepal length, sepal width, petal length and petal width.
a) What is the first canonical correlation $\hat{\rho}_1$?
b) What is $\hat{a}_1$?
c) What is $\hat{b}_1$?

    w <- iris3[,,3]
    x <- w[,1:2]
    y <- w[,3:4]
    cancor(x,y)
    $cor
    [1]

    $xcoef
             [,1] [,2]
    Sepal L
    Sepal W

    $ycoef
             [,1] [,2]
    Petal L
    Petal W
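The unit-variance and uncorrelatedness claims of Theorems 7.2 and 7.3 can also be verified numerically. The sketch below (Python with NumPy, on simulated data rather than any of the problem data sets above; all names are invented for illustration) computes the sample coefficient vectors $\hat{a}_k = \hat{\Sigma}_{11}^{-1/2}\hat{e}_k$ and $\hat{b}_k = \hat{\Sigma}_{22}^{-1/2}\hat{g}_k$ of Definition 7.3, using the fact that the $\hat{e}_k$ and $\hat{g}_k$ are the left and right singular vectors of $\hat{J}$ (since $\hat{\Sigma}_A = \hat{J}\hat{J}^T$ and $\hat{\Sigma}_B = \hat{J}^T\hat{J}$), and checks that each $\hat{U}_k$ has unit sample variance and that $\mathrm{corr}(\hat{U}_1, \hat{V}_1) = \hat{\rho}_1$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, q = 500, 2, 3
B = rng.standard_normal((m + q, m + q))
X = rng.standard_normal((n, m + q)) @ B.T    # correlated data, x = (w^T, y^T)^T
W, Y = X[:, :m], X[:, m:]

S = np.cov(X, rowvar=False)
S11, S12, S22 = S[:m, :m], S[:m, m:], S[m:, m:]

def inv_sqrt(M):
    """Symmetric inverse square root M^{-1/2} via the spectral decomposition."""
    vals, vecs = np.linalg.eigh(M)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

R11, R22 = inv_sqrt(S11), inv_sqrt(S22)
J = R11 @ S12 @ R22
# Left/right singular vectors of J-hat are the e-hat_k and g-hat_k.
E, sv, Gt = np.linalg.svd(J, full_matrices=False)
Acoef = R11 @ E                   # columns a-hat_k = S11^{-1/2} e-hat_k
Bcoef = R22 @ Gt.T                # columns b-hat_k = S22^{-1/2} g-hat_k

U = (W - W.mean(axis=0)) @ Acoef  # sample canonical variables U-hat_k
V = (Y - Y.mean(axis=0)) @ Bcoef  # sample canonical variables V-hat_k

var_U = U.var(axis=0, ddof=1)     # each should equal 1 (unit sample variance)
rho1 = np.corrcoef(U[:, 0], V[:, 0])[0, 1]   # should equal sv[0] = rho-hat_1
```

Here $\mathrm{Var}(\hat{U}_k) = \hat{a}_k^T \hat{\Sigma}_{11} \hat{a}_k = \hat{e}_k^T \hat{e}_k = 1$, and the cross-correlations such as $\mathrm{corr}(\hat{U}_1, \hat{V}_2)$ vanish, matching Theorem 7.2.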
More informationExercise Sheet 1.
Exercise Sheet 1 You can download my lecture and exercise sheets at the address http://sami.hust.edu.vn/giang-vien/?name=huynt 1) Let A, B be sets. What does the statement "A is not a subset of B " mean?
More information3d scatterplots. You can also make 3d scatterplots, although these are less common than scatterplot matrices.
3d scatterplots You can also make 3d scatterplots, although these are less common than scatterplot matrices. > library(scatterplot3d) > y par(mfrow=c(2,2)) > scatterplot3d(y,highlight.3d=t,angle=20)
More informationPrinciple Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA
Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA Principle Components Analysis: Uses one group of variables (we will call this X) In
More informationI L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
Introduction Edps/Psych/Stat/ 584 Applied Multivariate Statistics Carolyn J Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN c Board of Trustees,
More informationSummary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016
8. For any two events E and F, P (E) = P (E F ) + P (E F c ). Summary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016 Sample space. A sample space consists of a underlying
More informationRobust Maximum Association Between Data Sets: The R Package ccapp
Robust Maximum Association Between Data Sets: The R Package ccapp Andreas Alfons Erasmus Universiteit Rotterdam Christophe Croux KU Leuven Peter Filzmoser Vienna University of Technology Abstract This
More informationGaussian random variables inr n
Gaussian vectors Lecture 5 Gaussian random variables inr n One-dimensional case One-dimensional Gaussian density with mean and standard deviation (called N, ): fx x exp. Proposition If X N,, then ax b
More informationP = A(A T A) 1 A T. A Om (m n)
Chapter 4: Orthogonality 4.. Projections Proposition. Let A be a matrix. Then N(A T A) N(A). Proof. If Ax, then of course A T Ax. Conversely, if A T Ax, then so Ax also. x (A T Ax) x T A T Ax (Ax) T Ax
More informationCh.3 Canonical correlation analysis (CCA) [Book, Sect. 2.4]
Ch.3 Canonical correlation analysis (CCA) [Book, Sect. 2.4] With 2 sets of variables {x i } and {y j }, canonical correlation analysis (CCA), first introduced by Hotelling (1936), finds the linear modes
More informationRobust Multivariate Analysis
Robust Multivariate Analysis David J. Olive Robust Multivariate Analysis 123 David J. Olive Department of Mathematics Southern Illinois University Carbondale, IL USA ISBN 978-3-319-68251-8 ISBN 978-3-319-68253-2
More informationIntroduction to Probability and Stocastic Processes - Part I
Introduction to Probability and Stocastic Processes - Part I Lecture 2 Henrik Vie Christensen vie@control.auc.dk Department of Control Engineering Institute of Electronic Systems Aalborg University Denmark
More informationSTAT 100C: Linear models
STAT 100C: Linear models Arash A. Amini April 27, 2018 1 / 1 Table of Contents 2 / 1 Linear Algebra Review Read 3.1 and 3.2 from text. 1. Fundamental subspace (rank-nullity, etc.) Im(X ) = ker(x T ) R
More informationBIOS 2083 Linear Models Abdus S. Wahed. Chapter 2 84
Chapter 2 84 Chapter 3 Random Vectors and Multivariate Normal Distributions 3.1 Random vectors Definition 3.1.1. Random vector. Random vectors are vectors of random variables. For instance, X = X 1 X 2.
More informationTable of Contents. Multivariate methods. Introduction II. Introduction I
Table of Contents Introduction Antti Penttilä Department of Physics University of Helsinki Exactum summer school, 04 Construction of multinormal distribution Test of multinormality with 3 Interpretation
More informationMultivariate probability distributions and linear regression
Multivariate probability distributions and linear regression Patrik Hoyer 1 Contents: Random variable, probability distribution Joint distribution Marginal distribution Conditional distribution Independence,
More informationIndependent Component (IC) Models: New Extensions of the Multinormal Model
Independent Component (IC) Models: New Extensions of the Multinormal Model Davy Paindaveine (joint with Klaus Nordhausen, Hannu Oja, and Sara Taskinen) School of Public Health, ULB, April 2008 My research
More informationLecture Notes 1: Vector spaces
Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector
More informationSecond-Order Inference for Gaussian Random Curves
Second-Order Inference for Gaussian Random Curves With Application to DNA Minicircles Victor Panaretos David Kraus John Maddocks Ecole Polytechnique Fédérale de Lausanne Panaretos, Kraus, Maddocks (EPFL)
More informationFall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.
1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n
More informationUniversity of Michigan School of Public Health
University of Michigan School of Public Health The University of Michigan Department of Biostatistics Working Paper Series Year 003 Paper Weighting Adustments for Unit Nonresponse with Multiple Outcome
More informationClassification Methods II: Linear and Quadratic Discrimminant Analysis
Classification Methods II: Linear and Quadratic Discrimminant Analysis Rebecca C. Steorts, Duke University STA 325, Chapter 4 ISL Agenda Linear Discrimminant Analysis (LDA) Classification Recall that linear
More informationEconomics 573 Problem Set 5 Fall 2002 Due: 4 October b. The sample mean converges in probability to the population mean.
Economics 573 Problem Set 5 Fall 00 Due: 4 October 00 1. In random sampling from any population with E(X) = and Var(X) =, show (using Chebyshev's inequality) that sample mean converges in probability to..
More informationHighest Density Region Prediction
Highest Density Region Prediction David J. Olive Southern Illinois University November 3, 2015 Abstract Practical large sample prediction regions for an m 1 future response vector y f, given x f and past
More informationPreprocessing & dimensionality reduction
Introduction to Data Mining Preprocessing & dimensionality reduction CPSC/AMTH 445a/545a Guy Wolf guy.wolf@yale.edu Yale University Fall 2016 CPSC 445 (Guy Wolf) Dimensionality reduction Yale - Fall 2016
More informationFactor Analysis. Summary. Sample StatFolio: factor analysis.sgp
Factor Analysis Summary... 1 Data Input... 3 Statistical Model... 4 Analysis Summary... 5 Analysis Options... 7 Scree Plot... 9 Extraction Statistics... 10 Rotation Statistics... 11 D and 3D Scatterplots...
More informationComputer exercise 3: PCA, CCA and factors. Principal component analysis. Eigenvalues and eigenvectors
UPPSALA UNIVERSITY Department of Mathematics Måns Thulin Multivariate Methods Spring 2011 thulin@math.uu.se Computer exercise 3: PCA, CCA and factors In this computer exercise the following topics are
More informationProblem Solving. Correlation and Covariance. Yi Lu. Problem Solving. Yi Lu ECE 313 2/51
Yi Lu Correlation and Covariance Yi Lu ECE 313 2/51 Definition Let X and Y be random variables with finite second moments. the correlation: E[XY ] Yi Lu ECE 313 3/51 Definition Let X and Y be random variables
More informationKey Algebraic Results in Linear Regression
Key Algebraic Results in Linear Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) 1 / 30 Key Algebraic Results in
More information7 Principal Components and Factor Analysis
7 Principal Components and actor nalysis 7.1 Principal Components a oal. Relationships between two variables can be graphically well captured in a meaningful way. or three variables this is also possible,
More informationCS 195-5: Machine Learning Problem Set 1
CS 95-5: Machine Learning Problem Set Douglas Lanman dlanman@brown.edu 7 September Regression Problem Show that the prediction errors y f(x; ŵ) are necessarily uncorrelated with any linear function of
More informationLinear Dimensionality Reduction
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Principal Component Analysis 3 Factor Analysis
More informationOutline Week 1 PCA Challenge. Introduction. Multivariate Statistical Analysis. Hung Chen
Introduction Multivariate Statistical Analysis Hung Chen Department of Mathematics https://ceiba.ntu.edu.tw/972multistat hchen@math.ntu.edu.tw, Old Math 106 2009.02.16 1 Outline 2 Week 1 3 PCA multivariate
More informationMultiple Linear Regression
Multiple Linear Regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html 1 / 42 Passenger car mileage Consider the carmpg dataset taken from
More informationPhysics 403. Segev BenZvi. Propagation of Uncertainties. Department of Physics and Astronomy University of Rochester
Physics 403 Propagation of Uncertainties Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Maximum Likelihood and Minimum Least Squares Uncertainty Intervals
More informationA Peak to the World of Multivariate Statistical Analysis
A Peak to the World of Multivariate Statistical Analysis Real Contents Real Real Real Why is it important to know a bit about the theory behind the methods? Real 5 10 15 20 Real 10 15 20 Figure: Multivariate
More informationMultivariate Distributions
IEOR E4602: Quantitative Risk Management Spring 2016 c 2016 by Martin Haugh Multivariate Distributions We will study multivariate distributions in these notes, focusing 1 in particular on multivariate
More informationStatistics 351 Probability I Fall 2006 (200630) Final Exam Solutions. θ α β Γ(α)Γ(β) (uv)α 1 (v uv) β 1 exp v }
Statistics 35 Probability I Fall 6 (63 Final Exam Solutions Instructor: Michael Kozdron (a Solving for X and Y gives X UV and Y V UV, so that the Jacobian of this transformation is x x u v J y y v u v
More information