UCLA STAT 233 Statistical Methods in Biomedical Imaging


Instructor: Ivo Dinov, Asst. Prof. in Statistics and Neurology
University of California, Los Angeles, Spring 2004
http://www.stat.ucla.edu/~dinov/

Slide 1: Principal Component Analysis (PCA) - Introduction to PCA

Slide 2: PCA - purpose
The main applications of PCA and factor-analytic techniques are: to reduce the number of variables, and to detect structure in the relationships between variables, that is, to classify variables. Therefore, PCA is applied as a data reduction or structure detection method (the term factor analysis was first introduced by Thurstone, 1931). The topics below describe the principles of factor analysis and how it can be applied to address these two goals.

Slide 3: PCA - Basics
Suppose we conducted a (rather "silly") study in which we measure 100 people's height in inches and centimeters. Thus, we would have two variables that measure height. If in future studies we want to research, for example, the effect of different nutritional food supplements on height, would we continue to use both measures? Probably not; height is one characteristic of a person, regardless of how it is measured, since there are measure-conversion rules!

Slide 4: PCA - Basics
Let us now extrapolate from this trivial study to something that one might actually do as a researcher. Suppose we want to measure people's satisfaction with their lives. We design a satisfaction questionnaire with various items; among other things, we ask our subjects how satisfied they are with their hobbies (item 1) and how intensely they are pursuing a hobby (item 2). Most likely, the responses to the two items are highly correlated with each other. Given a high correlation between the two items, we can conclude that they may be quite redundant.

Slide 5: PCA - Basics
Combining Two Variables into a Single Factor. The correlation between two variables can be summarized in a scatterplot. A regression line can then be fitted that represents the "best" summary of the linear relationship between the variables. If we could define a variable that approximates the regression line in such a plot, then that variable would capture most of the "essence" of the two items. Subjects' single scores on that new factor, represented by the regression line, could then be used in future data analyses to represent that essence of the two items. In a sense, we have reduced the two variables to one factor. Note that the new factor is actually a linear combination of the two variables.

Slide 6: [Scatterplot of the two items (X vs. Y) with the fitted regression line.]
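The following is a minimal Python sketch (not part of the original slides) of this two-items-to-one-factor idea; the item names and simulated data are made up, and the leading eigenvector of the covariance matrix plays the role of the fitted line:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical, highly correlated satisfaction items (item2 ~ item1 + noise).
item1 = rng.normal(5.0, 1.0, size=100)
item2 = 0.9 * item1 + rng.normal(0.0, 0.4, size=100)
X = np.column_stack([item1, item2])

# Center the data and extract principal components from the covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)            # ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

# The first component is a single "factor": a linear combination of the two
# items that captures most of their shared variability.
scores = Xc @ eigvecs[:, 0]
print("variance explained by factor 1:", eigvals[0] / eigvals.sum())
```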

Slide 7: PCA - Basics
Principal Components Analysis. The example described above, combining two correlated variables into one factor, illustrates the basic idea of factor analysis, or of principal components analysis to be precise. If we extend the two-variable example to multiple variables, then the computations become more involved, but the basic principle of expressing two or more variables by a single factor remains the same.

Slide 8: PCA - Basics
Computational aspects of principal components analysis: the extraction of principal components amounts to a variance-maximizing (varimax) rotation of the original variable space. For example, in a scatterplot we can think of the regression line as the original X axis, rotated so that it approximates the regression line. This type of rotation is called variance maximizing because the criterion for (goal of) the rotation is to maximize the variance (variability) of the "new" variable (factor), while minimizing the variance around the new variable.

Slide 9: PCA - Basics
Generalizing to the Case of Multiple Variables. When there are more than two variables, we can think of them as defining a "space," just as two variables defined a plane. Thus, when we have three variables, we could plot a three-dimensional scatterplot, and, again, we could fit a plane through the data. With more than 3 variables it becomes impossible to illustrate the points in a scatterplot; however, the logic of rotating the axes so as to maximize the variance of the new factor remains the same. But up to 3 variables we can use a scatterplot:

Slide 10: [Scatterplot example.]

Slide 11: PCA - Basics
Multiple orthogonal factors. After we have found the line on which the variance is maximal, there remains some variability around this line. In PCA, after the first factor has been extracted, that is, after the first line has been drawn through the data, we iteratively continue to define other lines that maximize the remaining variability. In this manner, consecutive factors are extracted. Because each consecutive factor is defined to maximize the variability that is not captured by the preceding factor, consecutive factors are independent of each other. Put another way, consecutive factors are uncorrelated, or orthogonal to each other.

Slide 12: PCA - Basics
How many Factors to Extract? Remember that, so far, we are considering PCA as a data reduction method, that is, as a method for reducing the number of variables. The question then is, how many factors do we want to extract? Note that as we extract consecutive factors, they account for less and less variability. The decision of when to stop extracting factors basically depends on when there is only very little random variability left. The nature of this decision is arbitrary; however, various guidelines have been developed.
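As a quick numerical check of the orthogonality claim above (a sketch with simulated data, not from the slides), the scores on consecutive principal components have a diagonal covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 200 observations of 4 correlated variables.
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))
Xc = X - X.mean(axis=0)

# Extract all principal components (eigenvectors of the covariance matrix).
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
eigvecs = eigvecs[:, ::-1]                        # largest variance first

# Scores on consecutive factors are mutually uncorrelated (orthogonal):
scores = Xc @ eigvecs
print(np.round(np.cov(scores, rowvar=False), 6))  # (nearly) diagonal matrix
```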

Slide 13: PCA - Basics
Standard results from a principal components analysis: we are extracting factors that account for less and less variance. To simplify matters, one usually starts with the correlation matrix, where the variances of all variables are equal to 1.0. Therefore, the total variance in that matrix is equal to the number of variables. For example, if we have 10 variables each with a variance of 1, then the total variability that can potentially be extracted is equal to 10 times 1. Suppose that in the life-satisfaction study introduced earlier we included 10 items to measure different aspects of satisfaction at home and at work. The variance accounted for by successive factors would be summarized as follows:

Slide 14: PCA - Example
Extraction: Principal components

  Value   Eigenvalue   % Total Variance   Cumulative Eigenvalue   Cumulative %
    1      6.118369        61.18369             6.11837              61.1837
    2      1.800682        18.00682             7.91905              79.1905
    3      0.472888         4.72888             8.39194              83.9194
    4      0.407996         4.07996             8.79993              87.9993
    5      0.317222         3.17222             9.11716              91.1716
    6      0.293300         2.93300             9.41046              94.1046
    7      0.195808         1.95808             9.60626              96.0626
    8      0.170431         1.70431             9.77670              97.7670
    9      0.137970         1.37970             9.91467              99.1467
   10      0.085334         0.85334            10.00000             100.0000

Slide 15: PCA - Eigenvalues
In the second column (Eigenvalue) above, we find the variance of the new factors that were successively extracted. In the third column, these values are expressed as a percent of the total variance (in this example, 10). As we can see, factor 1 accounts for 61 percent of the variance, factor 2 for 18 percent, and so on. As expected, the sum of the eigenvalues is equal to the number of variables. The last two columns contain the cumulative eigenvalues and the cumulative percent of variance extracted. The variances extracted by the factors are called the eigenvalues. This name derives from the computational issues involved.

Slide 16: PCA - Eigenvalues
Eigenvalues and the Number-of-Factors Problem: now that we have a measure of how much variance each successive factor extracts, we can return to the question of how many factors to retain. By its nature this is an arbitrary decision. However, there are some guidelines that are commonly used and that, in practice, seem to yield the best results.

Slide 17: How many Factors?!?
The Kaiser criterion. First, we can retain only factors with eigenvalues greater than 1. In essence this is like saying that, unless a factor extracts at least as much as the equivalent of one original variable, we drop it. This criterion was proposed by Kaiser (1960), and is probably the one most widely used. In our example above, using this criterion, we would retain 2 factors (principal components).

  Value   Eigenvalue   % Total Variance   Cumulative Eigenvalue   Cumulative %
    1      6.118369        61.18369             6.11837              61.1837
    2      1.800682        18.00682             7.91905              79.1905
    3      0.472888         4.72888             8.39194              83.9194

Slide 18: How many Factors?!?
The scree test. The scree test is a graphical method first proposed by Cattell (1966). We can plot the eigenvalues shown above in a simple line plot. Cattell's idea: find a place where the smooth decrease of eigenvalues appears to level off to the right of the plot. To the right of this point, one expects to find only "factorial scree" ("scree" is the geological term for the debris that collects on the lower part of a rocky slope). According to this criterion, we would probably retain 2 or 3 factors here.
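A hedged Python sketch (with simulated data standing in for the questionnaire items, not the values in the table above) of how such an eigenvalue summary, the Kaiser criterion, and a scree plot could be produced from a correlation matrix:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical standardized data: 100 subjects x 10 satisfaction items.
rng = np.random.default_rng(2)
Z = rng.normal(size=(100, 10))
R = np.corrcoef(Z, rowvar=False)                  # 10 x 10 correlation matrix

eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]    # eigenvalues, largest first
pct = 100 * eigvals / eigvals.sum()               # % of total variance (total = 10)
cum = np.cumsum(pct)

for k, (ev, p, c) in enumerate(zip(eigvals, pct, cum), start=1):
    print(f"{k:2d}  eigenvalue={ev:8.4f}  %var={p:7.3f}  cum%={c:8.3f}")

# Kaiser criterion: keep factors with eigenvalue > 1.
print("Kaiser criterion retains", int(np.sum(eigvals > 1)), "factors")

# Scree test: plot the eigenvalues and look for the "elbow".
plt.plot(range(1, 11), eigvals, "o-")
plt.xlabel("Factor number"); plt.ylabel("Eigenvalue"); plt.title("Scree plot")
plt.show()
```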

Slide 19: Which criterion to use?
Both criteria have been studied in detail (Browne, 1968; Cattell & Jaspers, 1967; Hakstian, Rogers, & Cattell, 1982; Linn, 1968; Tucker, Koopman & Linn, 1969). It appears as if the first method (Kaiser criterion) sometimes retains too many factors, while the second technique (scree test) sometimes retains too few; both do quite well under normal conditions, that is, when there are relatively few factors and many cases. In practice, an additional important aspect is the extent to which a solution is interpretable. Therefore, one usually examines several solutions with more or fewer factors, and chooses the one that makes the best "sense."

Slide 20: PFA - Principal Factors Analysis
Recall our satisfaction questionnaire example. Let's slightly perturb this model for factor analysis. We can think of subjects' responses as being dependent on two components. First, there are some underlying common factors, such as the "satisfaction-with-hobbies" factor we looked at before. Each item measures some part of this common aspect of satisfaction. Second, each item also captures a unique aspect of satisfaction that is not addressed by any other item.

Slide 21: PCA vs. PFA
Principal factors vs. principal components. The distinction between the two factor analytic models is that PCA assumes that all the variability in an item should be used in the analysis, while in PFA (principal factors analysis) we use only the variability in an item that it has in common with the other items. In most cases, these two methods yield very similar results. However, principal components analysis is often preferred as a method for data reduction, while principal factors analysis is often preferred when the goal of the analysis is to detect structure.

Slide 22: Factor Analysis as a Classification Method
We'll use the term factor analysis generically to encompass both PCA and PFA. Assume now that we are at the point in our analysis where we basically know how many factors to extract. We may now want to know the meaning of the (composite) factors. How can we interpret them in a meaningful manner? Let's try to work backwards, that is, begin with a meaningful structure and then see how it is reflected in the results of a factor analysis. In our satisfaction example, here is the correlation matrix for items pertaining to satisfaction at work and items pertaining to satisfaction at home.

Slide 23: Factor Analysis as a Classification Method
Symmetry of the covariance matrix? Why?
[Correlation matrix of the six items Work1, Work2, Work3, Home1, Home2, Home3; the individual entries were garbled in transcription, but the within-group correlations are high and the cross-group correlations are small.]

Slide 24: Factor Analysis as a Classification Method
The work satisfaction items are highly correlated amongst themselves, and the home satisfaction items are highly inter-correlated amongst themselves, too. The correlations across these two types of items (work vs. home satisfaction items) are comparatively small. Are there only two relatively independent factors reflected in the correlation matrix, one related to satisfaction at work and the other related to satisfaction at home?
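For a concrete contrast between the two models, here is a sketch using scikit-learn (a library choice not mentioned in the slides; the simulated two-factor data are made up) that fits both PCA and a common-factor model to data generated from a work factor and a home factor:

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(3)
n = 300

# Simulate the structure discussed above: three "work" items driven by one
# latent factor, three "home" items driven by another, plus unique noise.
work_f = rng.normal(size=n)
home_f = rng.normal(size=n)
X = np.column_stack(
    [work_f + rng.normal(scale=0.5, size=n) for _ in range(3)]
    + [home_f + rng.normal(scale=0.5, size=n) for _ in range(3)]
)

# PCA uses all the variance in each item; FA models only the shared (common) variance.
pca = PCA(n_components=2).fit(X)
fa = FactorAnalysis(n_components=2).fit(X)

print("PCA components:\n", np.round(pca.components_, 2))
print("FA loadings:\n", np.round(fa.components_, 2))
```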

Slide 25: Factor Analysis as a Classification Method
PCA analysis of the two factors. Specifically, look at the correlations between the variables and the two factors (or derived variables), as they are extracted:

  Variable         Factor 1    Factor 2
  Work1            0.654384    0.564143
  Work2            0.715256    0.541444
  Work3            0.741688    0.508212
  Home1            0.634120   -0.563123
  Home2            0.706267   -0.572658
  Home3            0.707446   -0.525602
  Explained Var    2.891313    1.791000
  Proport. Total   0.481885    0.298500

Slide 26: Factor Analysis as a Classification Method
Apparently, factor 1 is generally more highly correlated with the original variables than factor 2. This is to be expected because, as previously described, these factors are extracted successively and will account for less and less variance overall.

Slide 27: Factor Analysis as a Classification Method
Interpreting the Factor Structure. As expected, the first factor is marked by high loadings on the work satisfaction items. Factor 2 is marked by high loadings on the home satisfaction items. We would thus conclude that satisfaction, as measured by our questionnaire, is composed of those two aspects; and there is our classification of the variables. Consider another example, this time with 4 additional Hobby/Misc variables added to our earlier example.

Slide 28: Factor Analysis as a Classification Method
E.g., the initial 10 variables were reduced to 3 specific factors: a work factor, a home factor, and a hobby/misc factor. Note that the loadings of each group of variables are spread out over the other two factors but are high on their own factor. For example, the factor loadings for the hobby/misc variables (in green) have both high and low "work" and "home" values, but all 3 of these variables have high factor loadings on the "hobby/misc" factor.

Slide 29: 3 specific factors: a work factor, a home factor, and a hobby/misc factor.

Slide 30: If time permits, do another example in SYSTAT.
C:\Ivo.dir\Research\Data.dir\WM_GM_CSF_tissueMaps.dir\ATLAS_IVO_all.xls : Data Reduction, Factor Analysis; Vars: Hemi, Tissue, Method, Value
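A sketch (with simulated data standing in for the questionnaire, not the course data) of how such factor loadings, i.e., the correlations between the original variables and the extracted factors, could be computed directly:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300

# Hypothetical questionnaire: 3 work items and 3 home items (as in the example).
work_f, home_f = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack(
    [work_f + rng.normal(scale=0.6, size=n) for _ in range(3)]
    + [home_f + rng.normal(scale=0.6, size=n) for _ in range(3)]
)
names = ["Work1", "Work2", "Work3", "Home1", "Home2", "Home3"]

# Standardize, take the two leading eigenvectors of the correlation matrix,
# and compute component scores.
Z = (X - X.mean(0)) / X.std(0)
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
scores = Z @ eigvecs[:, :2]

# Factor loadings = correlations between each original variable and each factor.
for i, name in enumerate(names):
    l1 = np.corrcoef(Z[:, i], scores[:, 0])[0, 1]
    l2 = np.corrcoef(Z[:, i], scores[:, 1])[0, 1]
    print(f"{name}: Factor1={l1:6.3f}  Factor2={l2:6.3f}")
```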

Slide 31: PCA - Practical Notes on computation
Data: X = {x_1, x_2, ..., x_n}.
Compute the standardized matrix Z = {z_1, z_2, ..., z_n}, where z_k = (x_k - µ_k)/σ_k.
Compute the correlation matrix R = Z^T Z.
Compute the eigenvalues of R: solve det(R - λI) = 0.
Compute the eigenvectors v_k of R: solve R v_k = λ_k v_k; set V = {v_1, v_2, ..., v_n}.

Slide 32: PCA - Practical Notes on computation
Test the orthogonality of the matrix V = {v_1, v_2, ..., v_n}: V^T V = I.
Let L = diag(λ_1, λ_2, ..., λ_n); compute L^(1/2).
Compute the factor structure matrix S = V L^(1/2).
Decide on the number of essential eigenvalues (scree test): w = max{ k : λ_(k) >= 1 }, where λ_(k) are the ordered eigenvalues (order statistics).
Compute the communality matrix, using only the first w eigenvectors from S: C = S S^T.

Slide 33: PCA - Practical Notes on computation
The communality diagonal entries report how much of the variability of the data is explained by the 1st (c_1,1), 1st & 2nd (c_2,2), 1st & 2nd & 3rd (c_3,3), ... principal components.
In the factor structure matrix S = V L^(1/2), the entry s_i,j shows the correlation between the i-th variable and the j-th principal component.

Slide 34: PCA - Practical Notes on computation
Compute the coefficient matrix B = V L^(-1/2).
Compute the factor scores F = Z B; the columns of F are the principal factors. These represent the loadings of the variables on each principal component, i.e., f_i,j shows the loading of the i-th variable on the j-th principal component.

Slide 35: Principal Component Analysis for Face Recognition

Slide 36: PCA for Face Recognition - Optimal representation with fewer basis functions
We want to design a set of basis functions such that we can reconstruct the original image with the smallest possible error for a given number of basis functions.
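A Python sketch of this recipe (the function name pca_factor_recipe is made up; X is assumed to be an observations-by-variables array, and the 1/n scaling of Z^T Z is made explicit here so that R has unit diagonal, whereas the slide absorbs that detail into the standardization):

```python
import numpy as np

def pca_factor_recipe(X):
    """Sketch of the computational recipe from the slides (assumptions noted inline)."""
    n, p = X.shape

    # Standardize each variable: z_k = (x_k - mu_k) / sigma_k
    Z = (X - X.mean(axis=0)) / X.std(axis=0)

    # Correlation matrix R (Z^T Z scaled by n so that diagonal entries are 1).
    R = (Z.T @ Z) / n

    # Eigenvalues/eigenvectors of R:  det(R - lambda*I) = 0,  R v_k = lambda_k v_k
    lam, V = np.linalg.eigh(R)
    lam, V = lam[::-1], V[:, ::-1]          # order from largest to smallest

    # Factor structure matrix S = V L^(1/2); s_ij = corr(variable i, component j).
    S = V @ np.diag(np.sqrt(lam))

    # Number of essential eigenvalues: w = #{k : lambda_(k) >= 1}.
    w = int(np.sum(lam >= 1))

    # Communalities from the first w columns of S: C = S_w S_w^T.
    C = S[:, :w] @ S[:, :w].T

    # Coefficient matrix B = V L^(-1/2) and factor scores F = Z B.
    B = V @ np.diag(1.0 / np.sqrt(lam))
    F = Z @ B

    return lam, S, w, np.diag(C), F

# Hypothetical usage with random data standing in for the questionnaire items.
X = np.random.default_rng(5).normal(size=(100, 6))
lam, S, w, communalities, F = pca_factor_recipe(X)
print("eigenvalues:", np.round(lam, 3), " retained:", w)
print("communalities:", np.round(communalities, 3))
```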

Slide 37: PCA for Face Recognition
[Image: the first 20 (largest) principal components.]

Slide 38: PCA for Face Recognition
[Image: components with low eigenvalues.]

Slide 39: PCA for Face Recognition
[Images: Original (right) vs. Approximations (below).]
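A minimal sketch of the reconstruction idea behind these slides (random arrays stand in for real face images, and the function name eigenface_reconstruction is hypothetical): keep only the k largest principal components as basis "images" and project each face onto them.

```python
import numpy as np

def eigenface_reconstruction(images, k):
    """Approximate each image using only the k largest principal components.

    `images` is an (n_images, n_pixels) array of flattened face images;
    this is a sketch of the idea on the slides, not the original course code.
    """
    mean_face = images.mean(axis=0)
    centered = images - mean_face

    # Principal components (eigenfaces) via SVD of the centered image matrix.
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    eigenfaces = Vt[:k]                           # k basis "images"

    # Project onto the k-dimensional basis and map back to pixel space.
    coeffs = centered @ eigenfaces.T
    reconstructed = mean_face + coeffs @ eigenfaces
    return reconstructed

# Hypothetical usage: 50 random "images" of 32x32 pixels, kept to 20 components.
faces = np.random.default_rng(6).normal(size=(50, 32 * 32))
approx = eigenface_reconstruction(faces, k=20)
print("relative reconstruction error:",
      np.linalg.norm(faces - approx) / np.linalg.norm(faces))
```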