Principal component analysis


Principal component analysis. Motivation for PCA came from major-axis regression. Strong assumption: a single homogeneous sample. Free of assumptions when used for exploration. Classical tests of significance of eigenvectors and eigenvalues assume multivariate normality; bootstrap tests assume only that the sample is representative of the population. PCA can be used with multiple samples for exploration (searching for structure: e.g., how many groups?), but it is not optimized for discovering group structure, classical significance tests can't be used, and if structure is discovered by exploring the data, it can't then be tested for significance.
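
As a concrete sketch of the exploratory use described above, PCA can be computed as an eigendecomposition of the sample covariance matrix. This minimal NumPy illustration uses invented data and hypothetical variable names; a real analysis would typically use a packaged routine.

import numpy as np

rng = np.random.default_rng(0)
# hypothetical single homogeneous sample: 100 observations on 2 correlated variables
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=[[4.0, 1.5], [1.5, 1.0]], size=100)

Xc = X - X.mean(axis=0)                  # center the variables
S = np.cov(Xc, rowvar=False)             # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)     # eigendecomposition of the symmetric matrix S
order = np.argsort(eigvals)[::-1]        # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = Xc @ eigvecs                    # projections of the observations onto the PCs
explained = eigvals / eigvals.sum()      # proportion of total variance accounted for by each PC
print(explained)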

Principal component analysis. [Figure: scatterplots of scores on the PCs, with the percentage of variance on each PC axis.] MANOVA: p < ... But: the data were sampled randomly from a single multivariate-normal population.

Multiple groups and multiple variables Suppose that: We have two or more groups (treatments, etc.) defined on extrinsic criteria. We wish to know whether and how we can discriminate groups on the basis of two or more measured variables. Things we might want to know: Can we discriminate the groups? If so, how well? How different are the groups? Are the groups significantly different? How do we assess significance in the presence of correlations among variables? Which variables are most important in discriminating the groups? Can group membership be predicted for unknown individuals? How good is the prediction?

Multiple groups and multiple variables. These questions are answered using three related methods: (1) Discriminant function analysis (DFA), also called discriminant analysis (DA) or canonical variate analysis (CVA): determines the linear combinations of variables that best discriminate groups. (2) Multivariate analysis of variance (MANOVA): determines whether multivariate samples differ non-randomly (significantly). (3) Mahalanobis distance (D²): measures distances in multivariate character space in the presence of correlations among variables. The three were developed independently by three mathematicians: Fisher (DFA) in England, Hotelling (MANOVA) in the United States, Mahalanobis (D²) in India. Due to differences in notation, the underlying similarities went unnoticed for years; the methods now have a common matrix formulation.

Discriminant analysis. Principal component analysis is inherently a single-group procedure: it assumes that the data represent a single homogeneous sample from a population. It can be used for multiple groups, but it cannot take group structure into consideration. It is often used to determine whether groups differ in terms of the variables used, but: it can't use grouping information even if it exists; it maximizes variance, regardless of its source; and it is not guaranteed to discriminate groups. Discriminant analysis is explicitly a multiple-group procedure. It assumes that groups are known (correctly) before the analysis, on the basis of extrinsic criteria, and it optimizes discrimination between the groups by one or more linear combinations of the variables (discriminant functions).

Discriminant analysis. Q: How are the groups different, and which variables contribute most to the differences? A: For k groups, find the k − 1 linear discriminant functions (axes, vectors, functions) that maximally separate the k groups. Discriminant functions (DFs) are eigenvectors of the among-group variance (rather than the total variance). Like PCs, discriminant functions: are linear combinations of the original variables; are specified by sets of eigenvector coefficients (weights); can be rescaled as vector correlations; allow interpretation of the contributions of individual variables; have corresponding eigenvalues, which specify the proportion of among-group variance (rather than total variance) accounted for by each DF; and can be estimated from either the covariance matrices (one per group) or the correlation matrices. Groups are assumed to have multivariate normal distributions with identical covariance matrices.

Discriminant analysis. Example: [Two figure slides: the original data with data ellipses; the ANOVA F of projection scores as a function of the angle of the projection line, for lines A, B, and the DF; histograms of projection scores by group.]

Discriminant analysis. The discriminant functions are eigenvectors. For PCA, the eigenvectors are estimated from S, the covariance matrix, which accounts for the total variance of the sample. For DFA, the eigenvectors are estimated from a matrix that accounts for the among-group variance. For a single variable, a measure of among-group variation, scaled by within-group variation, is the ratio s_a² / s_w². Discriminant functions are eigenvectors of the matrix W⁻¹B, where W is the pooled within-group covariance matrix and B is the among-group covariance matrix; this is analogous to the univariate measure.
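
As a sketch of the computation just described, the following NumPy snippet forms W and B as sums-of-squares-and-cross-products (SSCP) matrices and takes the eigenvectors of W⁻¹B. The function name is hypothetical, and using SSCP rather than covariance scaling is an assumption; it rescales the eigenvalues by a constant factor but leaves the eigenvectors unchanged.

import numpy as np

def discriminant_functions(X, groups):
    # W = pooled within-group SSCP matrix, B = among-group SSCP matrix
    labels = np.unique(groups)
    grand_mean = X.mean(axis=0)
    p = X.shape[1]
    W = np.zeros((p, p))
    B = np.zeros((p, p))
    for g in labels:
        Xg = X[groups == g]
        Xg_centered = Xg - Xg.mean(axis=0)
        W += Xg_centered.T @ Xg_centered
        d = (Xg.mean(axis=0) - grand_mean).reshape(-1, 1)
        B += Xg.shape[0] * (d @ d.T)
    # W^-1 B is not symmetric, so use the general eigensolver
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(W, B))
    order = np.argsort(eigvals.real)[::-1]
    return eigvals.real[order], eigvecs.real[:, order]

Projecting the (centered) observations onto these eigenvectors gives the discriminant scores discussed below.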

Discriminant analysis. Thus the DFA eigenvectors maximize the ratio of among-group variation to within-group variation and optimize discrimination among all groups simultaneously. For any set of data there exists one axis (the discriminant function, DF) for which the projections of the groups of individuals are maximally separated, as measured by ANOVA of the projections onto that axis. For two groups, this DF completely accounts for group discrimination. For three or more groups there is a series of orthogonal DFs: DF 1 accounts for the largest proportion of among-group variance, DF 2 for the largest proportion of the residual among-group variance, and so on. The DFs can be used as the bases of a new coordinate system for plotting the scores of observations and the loadings of the original variables.
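
In practice the scores and the proportion of among-group variance per DF are usually obtained from a packaged routine. The following sketch uses scikit-learn's LinearDiscriminantAnalysis on invented three-group data; the data and labels are assumptions for illustration only.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
# hypothetical data: three groups of 30 observations on 3 variables, shifted along the first variable
X = np.vstack([rng.normal(loc=[m, 0.0, 0.0], scale=1.0, size=(30, 3)) for m in (0.0, 2.0, 4.0)])
y = np.repeat([0, 1, 2], 30)

lda = LinearDiscriminantAnalysis()
scores = lda.fit(X, y).transform(X)     # scores on the (at most k - 1 = 2) discriminant functions
print(lda.explained_variance_ratio_)    # proportion of among-group variance accounted for by each DF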

Discriminant analysis. Example: [Two figure slides: the original data with data ellipses; scores of the observations on the discriminant functions, with the percentage of among-group variance on each axis; loadings of the original variables on the DFs.]

Discriminant analysis. Discriminant functions have no necessary relationship to principal components. [Figure: PC axes and DF axes superimposed on the same data sets, showing that the two sets of axes need not coincide.]

MANOVA. Q: Are the groups significantly heterogeneous? A: Multivariate analysis of variance: the general case of testing for significant differences among a set of predefined groups (treatments), with multiple correlated variables. ANOVA is the special case for one variable (univariate). Hotelling's T²-test is the special case of MANOVA for two groups. The t-test is the special univariate case for two groups.

MANOVA. Discriminant functions are eigenvectors of the matrix W⁻¹B; the eigenvalues of W⁻¹B are λ1, λ2, ..., λp. A general multivariate test statistic is Wilks' lambda, Λ = |W| / |W + B| = Π_j 1 / (1 + λ_j). It is commonly reported by statistical packages, but the expression for determining its significance is complicated; Wilks' lambda can be transformed to an F-statistic, but the expression for that is complicated, too. Several other test statistics are also commonly reported by statistical packages, with varying terminology and varying assumptions, all reported with corresponding p-values.
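
A minimal sketch of Wilks' lambda computed directly from W and B (equivalently, from the eigenvalues of W⁻¹B). The function name is hypothetical; a packaged MANOVA routine would also supply the F approximation and p-value mentioned above.

import numpy as np

def wilks_lambda(X, groups):
    # Wilks' lambda = |W| / |W + B| = product over j of 1 / (1 + lambda_j)
    labels = np.unique(groups)
    grand_mean = X.mean(axis=0)
    p = X.shape[1]
    W = np.zeros((p, p))
    B = np.zeros((p, p))
    for g in labels:
        Xg = X[groups == g]
        Xg_centered = Xg - Xg.mean(axis=0)
        W += Xg_centered.T @ Xg_centered
        d = (Xg.mean(axis=0) - grand_mean).reshape(-1, 1)
        B += Xg.shape[0] * (d @ d.T)
    return np.linalg.det(W) / np.linalg.det(W + B)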

Mahalanobis distance. Q: How do we measure the distance between two groups? A: It depends on whether we want to take correlations among variables into consideration. If not, just measure the Euclidean distance between the centroids. If so, we must measure the Mahalanobis distance between the centroids along the covariance structure: D² = (m1 - m2)' S⁻¹ (m1 - m2), where m1 and m2 are the group centroids (mean vectors) and S is the covariance matrix. Mahalanobis distances between individual points can also be measured. [Figure: the same data shown with Euclidean and with Mahalanobis distances between centroids.]
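
A sketch of the centroid-to-centroid D² in NumPy. The slide leaves the choice of S open, so the pooled within-group covariance used here is an assumption, and the function name is hypothetical.

import numpy as np

def mahalanobis_d2(X1, X2):
    # squared Mahalanobis distance between the centroids of two groups
    d = X1.mean(axis=0) - X2.mean(axis=0)
    n1, n2 = X1.shape[0], X2.shape[0]
    S1 = np.cov(X1, rowvar=False)
    S2 = np.cov(X2, rowvar=False)
    S_pooled = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)   # pooled within-group covariance
    return float(d @ np.linalg.solve(S_pooled, d))               # d' S^-1 d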

Classifying unknowns into predetermined groups. Context: we have k known groups of observations, and also one or more unknown observations, each assumed to be a member of one of the known groups. Task: assign each unknown observation to one of the k groups. Procedure: find the Mahalanobis distance from the unknown observation to each of the centroids of the k groups, and assign the unknown to the closest group. The procedure can be randomized: bootstrap the known observations by sampling within groups, with replacement; assign the unknown observation to the closest group, based on the distances from the observation to the group centroids; repeat many times. This gives the proportion of times the observation is assigned to each of the groups.
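
A sketch of the randomized assignment described above: resample each known group with replacement, assign the unknown to the group whose resampled centroid is nearest in Mahalanobis distance, and tally the assignments. The use of a pooled covariance is an assumption, and all names are hypothetical.

import numpy as np

def classify_unknown(unknown, group_data, n_boot=1000, seed=0):
    # group_data: dict mapping group label -> (n_g x p) array of known observations
    rng = np.random.default_rng(seed)
    labels = list(group_data)
    counts = dict.fromkeys(labels, 0)
    for _ in range(n_boot):
        # resample each group with replacement
        boot = {g: Xg[rng.integers(0, len(Xg), size=len(Xg))] for g, Xg in group_data.items()}
        # pooled within-group covariance of the resampled data
        W = sum((Xb - Xb.mean(axis=0)).T @ (Xb - Xb.mean(axis=0)) for Xb in boot.values())
        S = W / (sum(len(Xb) for Xb in boot.values()) - len(labels))
        # squared Mahalanobis distance from the unknown to each resampled centroid
        d2 = {g: (unknown - Xb.mean(axis=0)) @ np.linalg.solve(S, unknown - Xb.mean(axis=0))
              for g, Xb in boot.items()}
        counts[min(d2, key=d2.get)] += 1
    return {g: counts[g] / n_boot for g in labels}   # proportion of assignments to each group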

Classifying unknowns into predetermined groups. Example: [Figure: an unknown observation plotted with the known groups, and the classification probabilities for each group obtained from the bootstrap procedure.]

Assessing misclassification rates (probabilities). We would like to know how good the discriminant functions are. DFA involves finding the axes of maximum discrimination for the data included in the analysis, so we would also like to know how well the procedure will generalize. We can't trust misclassification rates based on the observations used in the analysis. Ideally, we would like to have new, known data to assign to the known groups based on the discriminant functions.

Assessing misclassification rates (probabilities). Alternatively, we can cross-validate. Divide all the data into: (1) a calibration data set, used to find the discriminant functions; (2) a test data set, used to test the discriminant functions. Determine how well the DFs assign the test observations to their correct groups; the proportions of incorrect assignments are estimates of the true misclassification rates. Problem: we need all the data to get the best estimates of the discriminant functions. Solution: cross-validate one observation at a time via the jackknife procedure.

Assessing misclassification probabilities. Cross-validation via the jackknife ("leave-one-out") procedure: set one observation aside; estimate the discriminant functions from the remaining observations; classify the withheld (known) observation using those discriminant functions; repeat for all observations, leaving one out at a time. Example: [Figure: the original data and the corresponding scores on the discriminant functions.]
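
A sketch of the leave-one-out estimate, under the simplifying assumption that each withheld observation is classified to the nearest group centroid in Mahalanobis distance (equivalent to linear discriminant classification with equal priors and a common covariance matrix). The function name is hypothetical.

import numpy as np

def loo_misclassification(X, groups):
    n, p = X.shape
    errors = 0
    for i in range(n):
        keep = np.arange(n) != i                 # set observation i aside
        Xtr, gtr = X[keep], groups[keep]
        labels = np.unique(gtr)
        W = np.zeros((p, p))
        means = {}
        for g in labels:
            Xg = Xtr[gtr == g]
            means[g] = Xg.mean(axis=0)
            W += (Xg - means[g]).T @ (Xg - means[g])
        S = W / (len(Xtr) - len(labels))         # pooled within-group covariance
        d2 = {g: (X[i] - means[g]) @ np.linalg.solve(S, X[i] - means[g]) for g in labels}
        if min(d2, key=d2.get) != groups[i]:     # classify to the nearest centroid; count errors
            errors += 1
    return errors / n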

Assessing misclassification probabilities. Assign each individual, in turn, to one of the known groups using the jackknife procedure, bootstrapping many times. Misclassification rate: the number of misclassified observations divided by the total. [Table: for each observation, its true group, the group it was assigned to, and the percentage of bootstrap replicates assigning it to each group.]