Principal Components


Principal Components

Suppose we have N measurements on each of p variables X_j, j = 1, ..., p. There are several equivalent approaches to principal components:

- Given X = (X_1, ..., X_p), produce a derived (and smaller) set of uncorrelated variables Z_k = Xα_k, k = 1, ..., q < p, that are linear combinations of the original variables and that explain most of the variation in the original set.
- Approximate the original set of N points in R^p by a least-squares optimal linear manifold of dimension q < p.
- Approximate the N x p data matrix X by the best rank-q matrix ˆX^(q). This is the usual motivation for the SVD.

SL&DM (c) Hastie & Tibshirani, January 25, 2010.
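The first of these views (derived, uncorrelated variables) can be checked numerically. A minimal NumPy sketch on synthetic data; all names and the toy data are illustrative, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, q = 200, 5, 2
X = rng.standard_normal((N, p)) @ rng.standard_normal((p, p))  # correlated data
Xc = X - X.mean(axis=0)                                        # center the columns

# Directions alpha_k are eigenvectors of the sample covariance matrix
Sigma_hat = Xc.T @ Xc / (N - 1)
eigvals, alphas = np.linalg.eigh(Sigma_hat)   # eigh returns ascending order
alphas = alphas[:, ::-1]                      # sort descending by eigenvalue

# Derived variables Z_k = X alpha_k for the top q directions
Z = Xc @ alphas[:, :q]

# The derived variables are uncorrelated: off-diagonal covariance is ~0
C = np.cov(Z, rowvar=False)
assert abs(C[0, 1]) < 1e-8
```

The diagonal of `C` recovers the top eigenvalues of the sample covariance, i.e. the variances explained by each component.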

PC: Derived Variables

[Figure: scatterplot of X_1 versus X_2, with the largest and smallest principal component directions drawn through the data.]

Z_1 = Xα_1 is the projection of the data onto the longest direction, and has the largest variance among all such normalized projections. α_1 is the eigenvector corresponding to the largest eigenvalue of ˆΣ, the sample covariance matrix of X. Z_2 and α_2 correspond to the second-largest eigenvalue.

PC: Least Squares Approximation

Find the linear manifold f(λ) = µ + V_q λ that best approximates the data in a least-squares sense:

    min_{µ, {λ_i}, V_q} Σ_{i=1}^N ||x_i − µ − V_q λ_i||^2

Solution: ˆµ = x̄, v_k = α_k, λ_i = V_q^T (x_i − x̄).
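This closed-form solution is easy to verify numerically: the mean and the top principal directions beat any other q-dimensional affine fit. A hedged NumPy sketch on synthetic data (names illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
N, p, q = 100, 4, 2
X = rng.standard_normal((N, p)) @ rng.standard_normal((p, p))

mu = X.mean(axis=0)                      # optimal mu is the sample mean
Xc = X - mu
# V_q: top-q right singular vectors (= top eigenvectors of sample covariance)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Vq = Vt[:q].T                            # p x q
lam = Xc @ Vq                            # lambda_i = V_q^T (x_i - mu), one row each
fit = mu + lam @ Vq.T                    # reconstruction mu + V_q lambda_i

# The optimal fit beats a random q-dimensional orthonormal basis
err_opt = np.sum((X - fit) ** 2)
Vrand, _ = np.linalg.qr(rng.standard_normal((p, q)))
err_rand = np.sum((X - (mu + Xc @ Vrand @ Vrand.T)) ** 2)
assert err_opt <= err_rand
```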

PC: Singular Value Decomposition

Let X be the N x p data matrix with centered columns (assume N > p). Then

    X = UDV^T

is the SVD of X, where

- U is N x p orthogonal, the left singular vectors;
- V is p x p orthogonal, the right singular vectors;
- D is diagonal, with d_1 ≥ d_2 ≥ ... ≥ d_p ≥ 0, the singular values.

The SVD always exists, and is unique up to signs. The columns of V are the principal components, and Z_j = U_j d_j. Let D_q be D with all but the first q diagonal elements set to zero. Then ˆX_q = UD_q V^T solves

    min_{rank(ˆX_q) = q} ||X − ˆX_q||.
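A short NumPy check of these claims, the factorization, the ordered singular values, the identity Z_j = U_j d_j, and the rank-q truncation, on a synthetic matrix (names illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
N, p, q = 50, 6, 2
X = rng.standard_normal((N, p))
X = X - X.mean(axis=0)                         # centered columns

U, d, Vt = np.linalg.svd(X, full_matrices=False)
assert np.allclose(U @ np.diag(d) @ Vt, X)     # X = U D V^T
assert np.all(d[:-1] >= d[1:]) and d[-1] >= 0  # d_1 >= ... >= d_p >= 0

Z = U * d                                      # principal components Z_j = U_j d_j
assert np.allclose(Z, X @ Vt.T)                # same as projecting X onto V

Dq = np.diag(np.where(np.arange(p) < q, d, 0.0))
Xq = U @ Dq @ Vt                               # best rank-q approximation of X
assert np.linalg.matrix_rank(Xq) == q
```

The last line is the Eckart-Young result stated on the slide: truncating D to its first q singular values gives the rank-q matrix closest to X.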

PC: Example, Digit Data

130 threes, a subset of 638 such threes that form part of the handwritten digit dataset. Each three is a 16 x 16 greyscale image, and the variables X_j, j = 1, ..., 256, are the greyscale values for each pixel.

Rank-2 Model for Threes

[Figure: scatterplot of the threes' scores on the first and second principal components.]

The two-component model has the form

    ˆf(λ) = x̄ + λ_1 v_1 + λ_2 v_2,

where x̄ is the mean image. [In the slide, each term of the model is shown as a 16 x 16 image.] Here we have displayed the first two principal component directions, v_1 and v_2, as images.
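The rank-2 model can be sketched as follows. Since the actual digit data is not included here, the sketch uses a random stand-in for the 130 x 256 matrix of flattened images; everything else follows the formula above:

```python
import numpy as np

# Synthetic stand-in for the 130 x 256 matrix of flattened 16x16 digit images
rng = np.random.default_rng(3)
X = rng.random((130, 256))

xbar = X.mean(axis=0)                      # mean image, a 256-vector
U, d, Vt = np.linalg.svd(X - xbar, full_matrices=False)
v1, v2 = Vt[0], Vt[1]                      # first two PC directions as 256-vectors
lam = (X - xbar) @ np.stack([v1, v2]).T    # scores (lambda_1, lambda_2) per image

# Rank-2 reconstruction of image i: f_hat = xbar + lambda_1 v1 + lambda_2 v2
i = 0
f_hat = xbar + lam[i, 0] * v1 + lam[i, 1] * v2
image = f_hat.reshape(16, 16)              # back to a 16 x 16 image
```

Reshaping `v1` and `v2` the same way produces the direction images displayed on the slide.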

SVD: Expression Arrays

The rows are genes (variables) and the columns are observations (samples, DNA arrays). Typically 6-10K genes, 50 samples.

Eigengenes

The first principal component, or eigengene, is the linear combination of the genes showing the most variation over the samples. The individual gene loadings for each eigengene (or eigenarray) can have biological meaning. The sample values for the eigengenes show useful low-dimensional projections.
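With genes as rows and samples as columns, the first eigengene falls out of the same SVD machinery: the gene loadings are the first left singular vector, and applying them to the (row-centered) matrix gives the variation pattern over samples. A hedged sketch on a synthetic expression matrix (names and sizes illustrative):

```python
import numpy as np

# Synthetic stand-in for an expression matrix: rows = genes, columns = samples
rng = np.random.default_rng(4)
n_genes, n_samples = 6000, 50
X = rng.standard_normal((n_genes, n_samples))
Xc = X - X.mean(axis=1, keepdims=True)     # center each gene across samples

U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
loadings = U[:, 0]                         # gene loadings for the first eigengene
eigengene1 = loadings @ Xc                 # the combination of genes with the
                                           # most variation over the samples
# Equivalently, d_1 times the first right singular vector
assert np.allclose(eigengene1, d[0] * Vt[0])
```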

Example: NCI Cancer Data

[Figure: left, the samples plotted on the first two eigengenes (Principal Component 1 vs Principal Component 2), with points labeled according to NCI cancer class; right, the loadings for PC-1 and PC-2 (the first two eigenarrays) plotted against gene index, 0 to 8000.]