Lecture 10, Principal Component Analysis

Lecture 10, Principal Component Analysis
Hao Helen Zhang
Fall 2017

Motivations

Principal component analysis (PCA) is concerned with explaining the variance-covariance structure of $X = (X_1, \ldots, X_p)'$ through a few linear combinations of these variables.

Main purposes:
- data (dimension) reduction
- interpretation
- easy to visualize

Variance-Covariance Matrix of a Random Vector

Define the random vector and its mean vector
$$X = (X_1, \ldots, X_p)', \qquad \mu = E(X) = (\mu_1, \ldots, \mu_p)'.$$
The variance-covariance matrix of $X$ is
$$\Sigma = \mathrm{Cov}(X) = E(X - \mu)(X - \mu)',$$
whose $ij$-th entry is $\sigma_{ij} = E(X_i - \mu_i)(X_j - \mu_j)$ for any $1 \le i, j \le p$.

- $\mu$ is the population mean; $\Sigma$ is the population variance-covariance matrix.
- In practice, $\mu$ and $\Sigma$ are unknown and estimated from the data.

Sample Variance-Covariance Matrix

Sample mean:
$$\bar{X} = \frac{1}{n} X' 1_n,$$
where $X$ is the $n \times p$ design matrix and $1_n$ is the vector of ones of length $n$.

(Unbiased) sample variance-covariance matrix:
$$S_n = \frac{1}{n-1} X_c' X_c = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})',$$
where $X_c$ is the centered design matrix and $X_i = (X_{i1}, \ldots, X_{ip})'$ for $i = 1, \ldots, n$.

It is easy to show that
$$S_n = \frac{1}{n-1} X' \Big( I_n - \frac{1}{n} 1_n 1_n' \Big) X.$$
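A minimal numerical check, not part of the original slides (NumPy; data and variable names are illustrative), that the three expressions for $S_n$ above agree:

import numpy as np

rng = np.random.default_rng(42)
n, p = 50, 3
X = rng.standard_normal((n, p))                  # n x p design matrix

xbar = X.T @ np.ones(n) / n                      # sample mean (1/n) X' 1_n
Xc = X - xbar                                    # centered design matrix
S1 = Xc.T @ Xc / (n - 1)                         # (1/(n-1)) X_c' X_c
H = np.eye(n) - np.ones((n, n)) / n              # centering matrix I_n - (1/n) 1_n 1_n'
S2 = X.T @ H @ X / (n - 1)

assert np.allclose(S1, S2)
assert np.allclose(S1, np.cov(X, rowvar=False))  # matches NumPy's unbiased estimator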

Linear Combinations of Inputs

Consider the linear combinations
$$Z_1 = v_1'X = v_{11} X_1 + v_{12} X_2 + \cdots + v_{1p} X_p,$$
$$Z_2 = v_2'X = v_{21} X_1 + v_{22} X_2 + \cdots + v_{2p} X_p,$$
$$\vdots$$
$$Z_p = v_p'X = v_{p1} X_1 + v_{p2} X_2 + \cdots + v_{pp} X_p.$$
Then
- $\mathrm{Var}(Z_j) = v_j' \Sigma v_j$, for $j = 1, \ldots, p$;
- $\mathrm{Cov}(Z_j, Z_k) = v_j' \Sigma v_k$, for $j \ne k$.
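These identities hold exactly for the sample analogues as well; a quick check (illustrative, not from the slides), with $S_n$ in place of $\Sigma$:

import numpy as np

rng = np.random.default_rng(7)
X = rng.standard_normal((500, 3))
S = np.cov(X, rowvar=False)                      # sample covariance S_n
v1 = np.array([1.0, 2.0, -1.0])
v2 = np.array([0.5, 0.0, 1.0])

Z1, Z2 = X @ v1, X @ v2                          # Z_j = v_j' X, one value per row
assert np.isclose(Z1.var(ddof=1), v1 @ S @ v1)        # Var(Z_1) = v_1' S v_1
assert np.isclose(np.cov(Z1, Z2)[0, 1], v1 @ S @ v2)  # Cov(Z_1, Z_2) = v_1' S v_2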

What is PCA

Principal component analysis (PCA, Pearson 1901) is a statistical procedure that
- uses an orthogonal transformation to convert a set of observations of correlated variables into a set of linearly uncorrelated variables (called principal components);
- finds directions with maximum variability.

Principal components (PCs):
- PCs are uncorrelated, orthogonal linear combinations $Z_1, \ldots, Z_p$ whose variances are as large as possible.
- PCs form a new coordinate system obtained by rotating the original system constructed by $X_1, \ldots, X_p$.

[Figure (from The Elements of Statistical Learning, Hastie, Tibshirani & Friedman, 2001): principal components of some input data points. The largest principal component is the direction that maximizes the variance of the projected data, and the smallest principal component minimizes that variance. Ridge regression projects y onto these components, and then shrinks the coefficients of the low-variance components more than the high-variance components.]

Mathematical Formulation

The procedure seeks the directions of highest variance:
- The first PC is the linear combination $Z_1 = v_1'X$ that maximizes $\mathrm{Var}(v_1'X)$ subject to $\|v_1\| = 1$.
- The second PC is the linear combination $Z_2 = v_2'X$ that maximizes $\mathrm{Var}(v_2'X)$ subject to $\|v_2\| = 1$ and $\mathrm{Cov}(v_1'X, v_2'X) = 0$.
- In general, for $j = 2, \ldots, p$, the $j$th PC solves
$$\max_{v_j} \mathrm{Var}(v_j'X) \quad \text{subject to} \quad \|v_j\| = 1, \quad \mathrm{Cov}(v_l'X, v_j'X) = v_l' \Sigma v_j = 0 \;\text{ for } l = 1, \ldots, j-1.$$
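A short justification, not spelled out on the slide, of why these maximizers are eigenvectors of $\Sigma$. For the first PC, introduce a Lagrange multiplier $\lambda$ for the unit-norm constraint:

$$\max_v \; v'\Sigma v \;\text{ s.t. }\; v'v = 1; \qquad L(v, \lambda) = v'\Sigma v - \lambda(v'v - 1); \qquad \frac{\partial L}{\partial v} = 2\Sigma v - 2\lambda v = 0 \;\Longrightarrow\; \Sigma v = \lambda v.$$

So any stationary $v$ is a unit eigenvector of $\Sigma$, with objective value $\mathrm{Var}(v'X) = v'\Sigma v = \lambda$; the maximum is attained at the eigenvector with the largest eigenvalue. This links the formulation above to the eigen-decomposition on the following slides.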

Interpretation of PCA

- $Z_1 = v_1'X$ has the largest sample variance among all normalized linear combinations of the columns of $X$.
- $Z_2 = v_2'X$ has the highest variance among all normalized linear combinations of the columns of $X$ with $v_2$ orthogonal to $v_1$.
- ...
- The last PC $Z_p = v_p'X$ has the minimum variance among all normalized linear combinations of the columns of $X$, subject to $v_p$ being orthogonal to the earlier ones.

If $\Sigma$ is unknown, we use $S_n$ as its estimator.

How to Solve for the PCs

There are two ways:
- eigen-decomposition of $\Sigma$
- singular value decomposition (SVD) of $X_c$

Comments:
- Efficient algorithms exist to calculate the SVD of $X$ without computing $X'X$.
- Computing the SVD is now the standard way to calculate PCA from a data matrix.
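A sketch (illustrative data and names, not from the slides) showing that the two routes agree: the right singular vectors of $X_c$ are the eigenvectors of $S_n$, and the squared singular values divided by $n-1$ are its eigenvalues.

import numpy as np

rng = np.random.default_rng(0)
n = 100
X = rng.standard_normal((n, 4)) @ rng.standard_normal((4, 4))
Xc = X - X.mean(axis=0)                          # centered design matrix

# Route 1: eigen-decomposition of S_n
S = Xc.T @ Xc / (n - 1)
lam, E = np.linalg.eigh(S)                       # eigh returns ascending eigenvalues
lam, E = lam[::-1], E[:, ::-1]                   # reorder to descending

# Route 2: SVD of X_c (no need to form X'X)
U, d, Vt = np.linalg.svd(Xc, full_matrices=False)

assert np.allclose(d**2 / (n - 1), lam)          # same variances
assert np.allclose(np.abs(Vt), np.abs(E.T))      # same directions, up to sign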

Eigen-Decomposition of $\Sigma$

Assume $\Sigma$ has $p$ eigenvalue-eigenvector pairs $(\lambda_j, e_j)$ satisfying
$$\Sigma e_j = \lambda_j e_j, \quad j = 1, \ldots, p,$$
where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p > 0$ and $\|e_j\| = 1$ for all $j$. This gives the following spectral decomposition:
$$\Sigma = \sum_{j=1}^{p} \lambda_j e_j e_j'.$$

- The $j$th PC is given by $Z_j = e_j'X$, and its variance is $\mathrm{Var}(Z_j) = e_j' \Sigma e_j = \lambda_j$.
- The magnitude of $e_{jk}$ measures the importance of the $k$th variable to the $j$th PC, irrespective of the other variables.
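A self-contained numerical check (illustrative, not from the slides) of the spectral decomposition and of $\mathrm{Var}(Z_j) = \lambda_j$, with $S_n$ standing in for $\Sigma$:

import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 3))
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / (len(X) - 1)                     # S_n, our estimate of Sigma

lam, E = np.linalg.eigh(S)                       # columns of E are the e_j
S_rebuilt = sum(l * np.outer(e, e) for l, e in zip(lam, E.T))
assert np.allclose(S_rebuilt, S)                 # Sigma = sum_j lambda_j e_j e_j'

Z = Xc @ E                                       # PC scores Z_j = e_j' X
assert np.allclose(Z.var(axis=0, ddof=1), lam)   # Var(Z_j) = lambda_j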

Number of PCs

The total (population) variance of the inputs satisfies
$$\sum_{j=1}^{p} \mathrm{Var}(X_j) = \sum_{j=1}^{p} \sigma_{jj} = \sum_{j=1}^{p} \lambda_j = \sum_{j=1}^{p} \mathrm{Var}(Z_j).$$
The proportion of total variance due to the $j$th PC is
$$\frac{\lambda_j}{\sum_{k=1}^{p} \lambda_k}.$$

The number of PCs is decided based on:
- the amount of total sample variance explained;
- the variances of the sample PCs and their subject-matter interpretations;
- the scree plot: plot the ordered eigenvalues $\lambda_1, \ldots, \lambda_p$ and look for the elbow (bend) in the plot. The number of PCs is the point where the remaining eigenvalues are relatively small and all about the same size.
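A short sketch (illustrative, not from the slides; assumes matplotlib is available) computing the proportion of variance explained and drawing a scree plot:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
X = rng.standard_normal((150, 6)) @ rng.standard_normal((6, 6))
Xc = X - X.mean(axis=0)

d = np.linalg.svd(Xc, compute_uv=False)          # singular values, descending
lam = d**2 / (len(X) - 1)                        # eigenvalues of S_n
prop = lam / lam.sum()                           # proportion of variance per PC
print(np.cumsum(prop))                           # cumulative variance explained

plt.plot(range(1, len(lam) + 1), lam, "o-")      # scree plot: look for the elbow
plt.xlabel("component j")
plt.ylabel("eigenvalue")
plt.show()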

Wide Applications

PCA is very useful in exploratory data analysis:
- it provides a simpler and more parsimonious description of the covariance structure;
- dimension reduction;
- visualization of high-dimensional data.

Applications under other names:
- in signal processing, it is called the discrete Karhunen-Loève transform (KLT);
- in linear algebra, it is called the eigenvalue decomposition (EVD) of $X'X$, or, following Golub and Van Loan (1983), the singular value decomposition (SVD) of $X$;
- in noise and vibration analysis, it is called spectral decomposition.

Further Remarks

Remarks:
- PCs are solely determined by the covariance matrix $\Sigma$.
- PCA does not require a multivariate normal distribution.

Concerns:
- unsupervised learning: it ignores the response.