Lecture 3: Principal Components Analysis (PCA)


Reading: Sections 6.3.1, 10.1, 10.2, 10.4

STATS 202: Data mining and analysis
Jonathan Taylor, 9/28
Slide credits: Sergio Bacallado

The bias-variance decomposition

The inputs x_1, ..., x_n are fixed, and a test point x_0 is also fixed. We assume

y_i = f(x_i) + ε_i, with the ε_i i.i.d. with mean 0.

A regression method fit to (x_1, y_1), ..., (x_n, y_n) produces the estimate f̂. Then the mean squared error at x_0 satisfies:

$$\mathrm{MSE}(x_0) = E\big(y_0 - \hat f(x_0)\big)^2 = \mathrm{Var}\big(\hat f(x_0)\big) + \big[\mathrm{Bias}\big(\hat f(x_0)\big)\big]^2 + \mathrm{Var}(\varepsilon).$$

Both the variance and the squared bias are always positive, so to minimize the MSE, you must reach a tradeoff between bias and variance.

[Figure 2.12: MSE, squared bias, and variance as a function of model flexibility, for three scenarios: squiggly f with high noise, linear f with high noise, and squiggly f with low noise.]
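To make the decomposition concrete, here is a minimal simulation sketch, assuming numpy; the true function f, the noise level, and the polynomial fit used as the regression method are all invented for illustration. It estimates the variance, squared bias, and noise terms at a test point x_0 and checks that they add up to the MSE.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * x)          # hypothetical true regression function
sigma = 0.3                          # noise standard deviation
x = np.linspace(0, 3, 30)            # fixed inputs x_1, ..., x_n
x0 = 1.5                             # fixed test point

def fit_and_predict(y, degree=3):
    """Fit a degree-3 polynomial to (x, y) and return f_hat(x0)."""
    coefs = np.polyfit(x, y, degree)
    return np.polyval(coefs, x0)

# Repeat the experiment many times: fresh noise, fresh fit, prediction at x0.
preds = np.array([fit_and_predict(f(x) + sigma * rng.standard_normal(x.size))
                  for _ in range(5000)])
y0 = f(x0) + sigma * rng.standard_normal(preds.size)   # fresh test responses

mse = np.mean((y0 - preds) ** 2)
var_fhat = np.var(preds)
bias_sq = (np.mean(preds) - f(x0)) ** 2
print(mse, var_fhat + bias_sq + sigma ** 2)   # the two numbers should agree closely
```

Increasing the polynomial degree (more flexibility) shrinks the bias term and inflates the variance term, which is the tradeoff traced out in Figure 2.12.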

Classification problems

In a classification setting, the output takes values in a discrete set. For example, if we are predicting the brand of a car based on a number of variables, the function f takes values in the set {Ford, Toyota, Mercedes-Benz, ...}.

The model Y = f(X) + ε becomes insufficient, as the output is not necessarily real-valued.

We will use slightly different notation:
- P(X, Y): joint distribution of (X, Y),
- P(Y | X): conditional distribution of Y given X,
- ŷ_i: prediction for x_i.

Loss function for classification

There are many ways to measure the error of a classification prediction. One of the most common is the 0-1 loss:

E[1(y_0 ≠ ŷ_0)]

Like the MSE, this quantity can be estimated from training and test data by taking a sample average:

(1/n) Σ_{i=1}^{n} 1(y_i ≠ ŷ_i)

Bayes classifier

[Figure 2.13: simulated two-class data in the (X1, X2) plane with the Bayes decision boundary.]

In practice, we never know the joint probability P. However, we can assume that it exists. The Bayes classifier assigns:

ŷ_i = argmax_j P(Y = j | X = x_i)

It can be shown that this is the best classifier under the 0-1 loss.
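Although the joint distribution is never known in practice, on a toy problem where we write it down ourselves we can apply the Bayes rule directly and estimate its 0-1 loss with the sample average from the previous slide. A small sketch, assuming numpy; the joint distribution below is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical joint distribution: X takes 3 values, Y takes 2 classes.
# p_joint[x, y] = P(X = x, Y = y); entries sum to 1.
p_joint = np.array([[0.30, 0.10],
                    [0.05, 0.25],
                    [0.20, 0.10]])

# Conditional P(Y = j | X = x): normalize each row.
p_y_given_x = p_joint / p_joint.sum(axis=1, keepdims=True)

# Bayes classifier: y_hat(x) = argmax_j P(Y = j | X = x).
bayes_rule = p_y_given_x.argmax(axis=1)

# Draw a test sample from the joint distribution and apply the rule.
n = 10_000
flat = rng.choice(p_joint.size, size=n, p=p_joint.ravel())
x_test, y_test = np.unravel_index(flat, p_joint.shape)
y_hat = bayes_rule[x_test]

# Estimated 0-1 loss: (1/n) * sum_i 1(y_i != y_hat_i).
print(np.mean(y_test != y_hat))
# With two classes, the exact Bayes error is sum_x min_j P(X = x, Y = j):
print(p_joint.min(axis=1).sum())   # 0.25
```

No classifier can beat the second number on average, which is what "best under the 0-1 loss" means here.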

Principal Components Analysis

This is the most popular unsupervised procedure ever. Invented by Karl Pearson (1901); developed by Harold Hotelling (1933). Stanford pride!

What does it do? It provides a way to visualize high-dimensional data, summarizing the most important information.

What is PCA good for?

[Scatterplot matrix of the USArrests data: pairwise scatterplots of Murder, Assault, UrbanPop, and Rape.]

[Figure 10.1: biplot of the first two principal components of the USArrests data, showing the scores for each state and the loading vectors for Murder, Assault, Rape, and UrbanPop.]

What is the first principal component?

It is the vector which passes closest to a cloud of samples, in terms of squared Euclidean distance.

[Scatterplot of Ad Spending against Population, with the first principal component line drawn through the point cloud.]

That is, the green direction minimizes the average squared length of the dotted lines.

[Figure 6.15: left, Ad Spending vs. Population with the first principal component line; right, the same data replotted in the coordinates of the first and second principal components.]

What does this look like with 3 variables?

The first two principal components span a plane which is closest to the data.

[Figure 10.2: a three-dimensional point cloud together with the plane spanned by the first two principal components, and the data projected onto that plane.]

A second interpretation

The projection onto the first principal component is the one with the highest variance.

[Figure 6.15, repeated: the scores along the first principal component spread out more than those along the second.]

How do we say this in math?

Let X be a data matrix with n samples and p variables. From each variable, we subtract the mean of the column; i.e. we center the variables.

To find the first principal component φ_1 = (φ_11, ..., φ_p1), we solve the following optimization:

$$\max_{\phi_{11},\dots,\phi_{p1}} \; \frac{1}{n}\sum_{i=1}^{n}\Bigg(\sum_{j=1}^{p}\phi_{j1}x_{ij}\Bigg)^{2} \quad \text{subject to} \quad \sum_{j=1}^{p}\phi_{j1}^{2}=1.$$

The inner sum, Σ_j φ_j1 x_ij, is the projection of the ith sample onto φ_1, also known as the score z_i1. The objective is therefore the variance of the n samples projected onto φ_1.
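As a quick numerical check of this characterization, the sketch below (numpy; synthetic data) computes φ_1 from the SVD, anticipating the "Solving the optimization" slide, and verifies that its projected variance is at least as large as that of any of a batch of random unit vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data matrix with n samples and p variables, column-centered.
n, p = 200, 5
X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))
X = X - X.mean(axis=0)

# First principal component phi_1 (a unit vector of length p) via the SVD.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
phi1 = Vt[0]

def projected_variance(phi):
    """(1/n) * sum_i (sum_j phi_j * x_ij)^2 -- the objective above."""
    z = X @ phi                       # scores z_1, ..., z_n
    return np.mean(z ** 2)

# phi_1 should attain the maximum over all unit vectors.
random_dirs = rng.standard_normal((1000, p))
random_dirs /= np.linalg.norm(random_dirs, axis=1, keepdims=True)
print(projected_variance(phi1) >= max(projected_variance(v) for v in random_dirs))  # True
```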

How do we say this in math?

To find the second principal component φ_2 = (φ_12, ..., φ_p2), we solve the following optimization:

$$\max_{\phi_{12},\dots,\phi_{p2}} \; \frac{1}{n}\sum_{i=1}^{n}\Bigg(\sum_{j=1}^{p}\phi_{j2}x_{ij}\Bigg)^{2} \quad \text{subject to} \quad \sum_{j=1}^{p}\phi_{j2}^{2}=1 \ \text{ and } \ \sum_{j=1}^{p}\phi_{j1}\phi_{j2}=0.$$

The first and second principal components must be orthogonal. This is equivalent to saying that the scores (z_11, ..., z_n1) and (z_12, ..., z_n2) are uncorrelated.

Solving the optimization

This optimization is fundamental in linear algebra. It is solved by either:

- The singular value decomposition (SVD) of X: X = UΣΦ^T, where the ith column of Φ is the ith principal component φ_i, and the ith column of UΣ is the ith vector of scores (z_1i, ..., z_ni).
- The eigendecomposition of X^T X: X^T X = ΦΣ²Φ^T.
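A sketch, assuming numpy and a synthetic centered data matrix, that checks both routes numerically: the right singular vectors of X agree (up to sign) with the eigenvectors of X^T X, and the resulting score vectors are mutually orthogonal, i.e. uncorrelated as required on the previous slide.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 4))
X = X - X.mean(axis=0)               # center the variables

# Route 1: SVD of X.  Columns of Phi are the principal components;
# columns of U @ diag(S) are the score vectors.
U, S, Phit = np.linalg.svd(X, full_matrices=False)
Phi = Phit.T
scores = U * S                       # same as U @ np.diag(S)

# Route 2: eigendecomposition of X^T X = Phi Sigma^2 Phi^T.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)
order = np.argsort(eigvals)[::-1]    # eigh returns ascending order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print(np.allclose(np.abs(Phi), np.abs(eigvecs)))         # True (signs may flip)
print(np.allclose(S ** 2, eigvals))                      # True
# Score vectors for different components are orthogonal / uncorrelated.
print(np.allclose(scores.T @ scores, np.diag(S ** 2)))   # True
```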

PCA in practice: The biplot

[Figure 10.1, repeated: biplot of the USArrests data, showing the state scores and the variable loadings on the first two principal components.]

Scaling the variables

Most of the time, we don't care about the absolute numerical value of a variable; we care about its value relative to the spread observed in the sample.

Before PCA, in addition to centering each variable, we also multiply it by a constant to make its variance equal to 1.

Example: scaled vs. unscaled PCA

[Figure 10.3: biplots of the USArrests data, with the variables scaled to unit variance (left) and unscaled (right).]

Scaling the variables

In special cases, we have variables measured in the same units; e.g. gene expression levels for different genes. Therefore, we care about the absolute value of the variables, and we can perform PCA without scaling.
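A small sketch (numpy; the data and the 100x unit mismatch are fabricated for illustration) of why scaling matters: without it, the variable with the largest variance dominates the first loading vector, while after scaling each column to unit variance the loadings are more balanced.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
# Three equally informative, correlated variables, but the third one is
# recorded in units 100 times larger than the others (made-up example).
z = rng.standard_normal(n)
X = np.column_stack([z + 0.5 * rng.standard_normal(n),
                     z + 0.5 * rng.standard_normal(n),
                     100 * (z + 0.5 * rng.standard_normal(n))])

def first_pc(M):
    """First principal component of M after centering its columns."""
    M = M - M.mean(axis=0)
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    return Vt[0]

print("unscaled:", np.round(first_pc(X), 3))                  # dominated by column 3
print("scaled:  ", np.round(first_pc(X / X.std(axis=0)), 3))  # roughly equal loadings
```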

How many principal components are enough?

[Scatterplot matrix and biplot of the USArrests data, repeated from earlier slides.]

We said 2 principal components capture most of the relevant information. But how can we tell?

The proportion of variance explained

We can think of the top principal components as the directions in space in which the data vary the most.

The ith score vector (z_1i, ..., z_ni) can be interpreted as a new variable. The variance of this variable decreases as we take i from 1 to p. However, the total variance of the score vectors is the same as the total variance of the original variables:

$$\sum_{i=1}^{p}\frac{1}{n}\sum_{j=1}^{n} z_{ji}^{2} \;=\; \sum_{k=1}^{p}\operatorname{Var}(x_k).$$

We can quantify how much of the variance is captured by the first m principal components / score variables.

The proportion of variance explained

The variance of the mth score variable is:

$$\frac{1}{n}\sum_{i=1}^{n} z_{im}^{2} \;=\; \frac{1}{n}\sum_{i=1}^{n}\Bigg(\sum_{j=1}^{p}\phi_{jm}x_{ij}\Bigg)^{2} \;=\; \frac{1}{n}\Sigma_{mm}^{2}.$$

[Scree plot for the USArrests data: the proportion of variance explained by each principal component (left) and the cumulative proportion of variance explained (right).]
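Since the variance of the mth score variable is Σ²_mm / n and the total variance is preserved, the proportion of variance explained can be read directly off the singular values. A sketch, assuming numpy and synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 4)) @ rng.standard_normal((4, 4))
X = X - X.mean(axis=0)               # center (and optionally scale) first

_, S, _ = np.linalg.svd(X, full_matrices=False)

var_per_component = S ** 2 / X.shape[0]        # (1/n) * Sigma_mm^2
pve = var_per_component / var_per_component.sum()
print("PVE:       ", np.round(pve, 3))
print("cumulative:", np.round(np.cumsum(pve), 3))
# Sanity check: the score variances add up to the total variance.
print(np.isclose(var_per_component.sum(), X.var(axis=0).sum()))   # True
```

A scree plot is simply the first printout drawn against the component index; a common heuristic is to keep components up to the "elbow" where the curve flattens out.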

Generalizations of PCA

PCA works under a Euclidean geometry in the space of variables. Often, the natural geometry is different:

- We expect some variables to be closer to each other than to other variables.
- Some correlations between variables would be more surprising than others.

Examples:

- Variables are pixel values, and samples are different images of the brain. We expect neighboring pixels to have stronger correlations.
- Variables are rainfall measurements at different regions. We expect neighboring regions to have higher correlations.

Generalizations of PCA

There are ways to include this knowledge in a PCA. See:

1. Susan Holmes. Multivariate Analysis: The French Way (2006).
2. Omar de la Cruz and Susan Holmes. An Introduction to the Duality Diagram (2011).
3. Stéphane Dray and Thibaut Jombart. Revisiting Guerry's Data: Introducing Spatial Constraints in Multivariate Analysis (2011).
4. Genevera Allen, Logan Grosenick, and Jonathan Taylor. A Generalized Least Squares Matrix Decomposition (2011).