Machine Learning for Data Science (CS4786) Lecture 4


Machine Learning for Data Science (CS4786) Lecture 4: Canonical Correlation Analysis (CCA). Course Webpage: http://www.cs.cornell.edu/courses/cs4786/2016fa/

Announcement: We are grading HW0 and you will be added to CMS by Monday. HW1 will be posted tonight on the webpage (homework tab). HW1 is on CCA and PCA (due in a week).

QUIZ: Let $X$ be the $n \times d$ data matrix with rows $x_1^\top, \ldots, x_n^\top$. Assume the points are centered. Which of the following are equal to the covariance matrix $\Sigma$?
A. $\Sigma = \frac{1}{n}\sum_{t=1}^{n} x_t^\top x_t$
B. $\Sigma = \frac{1}{n}\sum_{t=1}^{n} x_t x_t^\top$
C. $\Sigma = \frac{1}{n} X^\top X$
D. $\Sigma = \frac{1}{n} X X^\top$

Example: Students in a classroom. (Figure: 3-D scatter plot with axes x, y, z.)

Maximize Spread / Minimize Reconstruction Error

PRINCIPAL COMPONENT ANALYSIS
Eigenvectors of the covariance matrix are the principal components; the top $K$ principal components are the eigenvectors with the $K$ largest eigenvalues. (Projection = Data $\times$ top $K$ eigenvectors; Reconstruction = Projection $\times$ transpose of top $K$ eigenvectors.) Independently discovered by Pearson in 1901 and Hotelling in 1933.
1. $\Sigma = \mathrm{cov}(X)$
2. $W = \mathrm{eigs}(\Sigma, K)$
3. $Y = (X - \mu)\, W$

RECONSTRUCTION
4. $\hat{X} = Y W^\top + \mu$
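The four numbered steps above can be written as a short NumPy sketch (my own illustration, not code from the lecture; X is assumed to be an n x d array with one data point per row):

import numpy as np

def pca(X, K):
    # Step 1: covariance matrix of the data (1/n normalization).
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False, bias=True)
    # Step 2: top-K eigenvectors of the (symmetric) covariance matrix.
    eigvals, eigvecs = np.linalg.eigh(Sigma)
    W = eigvecs[:, np.argsort(eigvals)[::-1][:K]]
    # Step 3: project the centered data onto the top-K eigenvectors.
    Y = (X - mu) @ W
    # Step 4: reconstruct from the projection.
    X_hat = Y @ W.T + mu
    return Y, X_hat, W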

WHEN $d \gg n$
If $d \gg n$ then $\Sigma$ is a large $d \times d$ matrix, but we only need the top $K$ eigenvectors. Idea: use the SVD.
$X - \mu = U D V^\top$, with $V^\top V = I$ and $U^\top U = I$.
Then note that $(X - \mu)^\top (X - \mu) = V D^2 V^\top$. Hence the matrix $V$ is the same as the matrix $W$ obtained from the eigendecomposition of $\Sigma$, and the eigenvalues of $\Sigma$ are the diagonal elements of $D^2$ (up to the $1/n$ factor).
Alternative algorithm: $[U, D, V] = \mathrm{SVD}(X - \mu, K)$, $W = V$.
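A matching sketch of the SVD-based alternative (again my own illustration; np.linalg.svd returns the full decomposition rather than only the top K, so the top-K right singular vectors are sliced off afterwards):

import numpy as np

def pca_svd(X, K):
    # Avoids forming the d x d covariance matrix when d >> n.
    mu = X.mean(axis=0)
    # X - mu = U D V^T; singular values are returned in decreasing order.
    U, D, Vt = np.linalg.svd(X - mu, full_matrices=False)
    W = Vt[:K].T        # top-K right singular vectors = top-K eigenvectors of Sigma
    Y = (X - mu) @ W    # projection, as in step 3 above
    return Y, W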

WHEN TO USE PCA?
When the data naturally lies in a low-dimensional linear subspace. To minimize reconstruction error. To find directions where the data is maximally spread.

Canonical Correlation Analysis. (Figure: 3-D point cloud with axes x, y, z; directions labeled "Age + Gender" and "Angle".)


TWO-VIEW DIMENSIONALITY REDUCTION
Data comes in pairs $(x_1, x'_1), \ldots, (x_n, x'_n)$, where the $x_t$'s are $d$-dimensional and the $x'_t$'s are $d'$-dimensional.
Goal: compress, say, view one into $y_1, \ldots, y_n$, which are $K$-dimensional vectors.
Retain information redundant between the two views; eliminate noise specific to only one of the views.

EXAMPLE I: SPEECH RECOGNITION (audio + video)
Audio might have background sounds uncorrelated with the video. Video might have lighting changes uncorrelated with the audio. Redundant information between the two views: the speech.

EXAMPLE II: COMBINING FEATURE EXTRACTIONS
Method A and Method B are both equally good feature-extraction techniques. Concatenating the two features blindly yields a large-dimensional feature vector with redundancy. Applying techniques like CCA extracts the key information shared between the two methods and removes the extra unwanted information.

How do we get the right direction? (say $K = 1$) (Figure: directions labeled "Age + Gender" and "Angle".)

WHICH DIRECTION TO PICK? (Figure: View I and View II.)

WHICH DIRECTION TO PICK? (Figure: the PCA direction in each view.)

WHICH DIRECTION TO PICK? (Figure: the direction shown has large covariance across the two views.)

How do we pick the right direction to project to?

MAXIMIZING CORRELATION COEFFICIENT
Say $w_1$ and $v_1$ are the directions we choose to project onto in views 1 and 2 respectively. We want these directions to maximize
$$\frac{1}{n}\sum_{t=1}^{n}\Big(y_t[1] - \tfrac{1}{n}\sum_{s=1}^{n} y_s[1]\Big)\Big(y'_t[1] - \tfrac{1}{n}\sum_{s=1}^{n} y'_s[1]\Big)$$
subject to
$$\frac{1}{n}\sum_{t=1}^{n}\Big(y_t[1] - \tfrac{1}{n}\sum_{s=1}^{n} y_s[1]\Big)^2 = \frac{1}{n}\sum_{t=1}^{n}\Big(y'_t[1] - \tfrac{1}{n}\sum_{s=1}^{n} y'_s[1]\Big)^2 = 1$$
where $y_t[1] = w_1^\top x_t$ and $y'_t[1] = v_1^\top x'_t$.


What is the problem with the above?

WHY NOT MAXIMIZE COVARIANCE?
Say $\frac{1}{n}\sum_{t=1}^{n} x_t[2]\, x'_t[2] > 0$. By scaling up this coordinate we can blow up the covariance without adding any relevant information.

MAXIMIZING CORRELATION COEFFICIENT
Say $w_1$ and $v_1$ are the directions we choose to project onto in views 1 and 2 respectively. We want these directions to maximize
$$\frac{\frac{1}{n}\sum_{t=1}^{n}\big(y_t[1] - \tfrac{1}{n}\sum_{s=1}^{n} y_s[1]\big)\big(y'_t[1] - \tfrac{1}{n}\sum_{s=1}^{n} y'_s[1]\big)}{\sqrt{\frac{1}{n}\sum_{t=1}^{n}\big(y_t[1] - \tfrac{1}{n}\sum_{s=1}^{n} y_s[1]\big)^2}\;\sqrt{\frac{1}{n}\sum_{t=1}^{n}\big(y'_t[1] - \tfrac{1}{n}\sum_{s=1}^{n} y'_s[1]\big)^2}}$$

BASIC IDEA OF CCA
Normalize the variance in the chosen direction to be constant (say 1), then maximize covariance. This is the same as maximizing the correlation coefficient (recall from last class).

COVARIANCE VS. CORRELATION
$\mathrm{Covariance}(A, B) = \mathbb{E}[(A - \mathbb{E}[A])(B - \mathbb{E}[B])]$. Depends on the scale of $A$ and $B$: if $B$ is rescaled, the covariance changes with it.
$\mathrm{Correlation}(A, B) = \dfrac{\mathbb{E}[(A - \mathbb{E}[A])(B - \mathbb{E}[B])]}{\sqrt{\mathrm{Var}(A)\,\mathrm{Var}(B)}}$. Scale free.
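A quick numeric illustration of this scale dependence (synthetic data of my own, not from the lecture): rescaling B inflates the empirical covariance but leaves the correlation unchanged.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal(1000)
B = 0.5 * A + rng.standard_normal(1000)

for scale in (1.0, 100.0):
    Bs = scale * B
    cov = np.mean((A - A.mean()) * (Bs - Bs.mean()))
    corr = cov / np.sqrt(A.var() * Bs.var())
    print(f"scale={scale:6.1f}  covariance={cov:9.3f}  correlation={corr:.3f}")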

MAXIMIZING CORRELATION COEFFICIENT
Say $w_1$ and $v_1$ are the directions we choose to project onto in views 1 and 2 respectively. We want these directions to maximize
$$\frac{1}{n}\sum_{t=1}^{n}\Big(y_t[1] - \tfrac{1}{n}\sum_{s=1}^{n} y_s[1]\Big)\Big(y'_t[1] - \tfrac{1}{n}\sum_{s=1}^{n} y'_s[1]\Big)$$
subject to
$$\frac{1}{n}\sum_{t=1}^{n}\Big(y_t[1] - \tfrac{1}{n}\sum_{s=1}^{n} y_s[1]\Big)^2 = \frac{1}{n}\sum_{t=1}^{n}\Big(y'_t[1] - \tfrac{1}{n}\sum_{s=1}^{n} y'_s[1]\Big)^2 = 1$$
where $y_t[1] = w_1^\top x_t$ and $y'_t[1] = v_1^\top x'_t$.

CANONICAL CORRELATION ANALYSIS
Hence we want to solve for projection vectors $w_1$ and $v_1$ that maximize
$$\frac{1}{n}\sum_{t=1}^{n} w_1^\top (x_t - \mu)\; v_1^\top (x'_t - \mu')$$
subject to
$$\frac{1}{n}\sum_{t=1}^{n}\big(w_1^\top (x_t - \mu)\big)^2 = \frac{1}{n}\sum_{t=1}^{n}\big(v_1^\top (x'_t - \mu')\big)^2 = 1$$
where $\mu = \frac{1}{n}\sum_{t=1}^{n} x_t$ and $\mu' = \frac{1}{n}\sum_{t=1}^{n} x'_t$.
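For given candidate directions, this objective and its constraints can be evaluated directly; a small sketch under the same assumptions as the earlier snippets (X1 is n x d, X2 is n x d', w1 and v1 are length-d and length-d' vectors; the names are mine):

import numpy as np

def cca_objective(X1, X2, w1, v1):
    # Projections of the centered views onto the candidate directions.
    y = (X1 - X1.mean(axis=0)) @ w1      # y_t[1] = w1^T (x_t - mu)
    yp = (X2 - X2.mean(axis=0)) @ v1     # y'_t[1] = v1^T (x'_t - mu')
    objective = np.mean(y * yp)          # quantity CCA maximizes
    constraints = (np.mean(y ** 2), np.mean(yp ** 2))  # both should equal 1
    return objective, constraints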


CANONICAL CORRELATION ANALYSIS
Hence we want to solve for projection vectors $w_1$ and $v_1$ that maximize $w_1^\top \Sigma_{1,2}\, v_1$ subject to $w_1^\top \Sigma_{1,1} w_1 = v_1^\top \Sigma_{2,2} v_1 = 1$, where $\Sigma = \mathrm{cov}\big(\begin{bmatrix} X_1 & X_2 \end{bmatrix}\big) = \begin{bmatrix} \Sigma_{1,1} & \Sigma_{1,2} \\ \Sigma_{2,1} & \Sigma_{2,2} \end{bmatrix}$.
Writing the Lagrangian, taking derivatives, and equating to 0, we get
$$\Sigma_{1,2}\,\Sigma_{2,2}^{-1}\Sigma_{2,1}\, w_1 = \lambda^2\, \Sigma_{1,1} w_1 \quad\text{and}\quad \Sigma_{2,1}\,\Sigma_{1,1}^{-1}\Sigma_{1,2}\, v_1 = \lambda^2\, \Sigma_{2,2} v_1,$$
or equivalently
$$\Sigma_{1,1}^{-1}\Sigma_{1,2}\,\Sigma_{2,2}^{-1}\Sigma_{2,1}\, w_1 = \lambda^2 w_1 \quad\text{and}\quad \Sigma_{2,2}^{-1}\Sigma_{2,1}\,\Sigma_{1,1}^{-1}\Sigma_{1,2}\, v_1 = \lambda^2 v_1.$$

SOLUTION
$W_1 = \mathrm{eigs}\big(\Sigma_{1,1}^{-1}\Sigma_{1,2}\,\Sigma_{2,2}^{-1}\Sigma_{2,1},\ K\big)$
$W_2 = \mathrm{eigs}\big(\Sigma_{2,2}^{-1}\Sigma_{2,1}\,\Sigma_{1,1}^{-1}\Sigma_{1,2},\ K\big)$

CCA ALGORITHM
1. Write $\tilde{x}_t = \begin{bmatrix} x_t \\ x'_t \end{bmatrix}$, the $(d + d')$-dimensional concatenated vectors, i.e. $\tilde{X} = \begin{bmatrix} X_1 & X_2 \end{bmatrix}$.
2. Calculate the covariance matrix of the joint data points: $\Sigma = \mathrm{cov}(\tilde{X}) = \begin{bmatrix} \Sigma_{1,1} & \Sigma_{1,2} \\ \Sigma_{2,1} & \Sigma_{2,2} \end{bmatrix}$.
3. Calculate $\Sigma_{1,1}^{-1}\Sigma_{1,2}\,\Sigma_{2,2}^{-1}\Sigma_{2,1}$; its top $K$ eigenvectors give the projection matrix for view I: $W_1 = \mathrm{eigs}\big(\Sigma_{1,1}^{-1}\Sigma_{1,2}\,\Sigma_{2,2}^{-1}\Sigma_{2,1},\ K\big)$. Similarly, the top $K$ eigenvectors of $\Sigma_{2,2}^{-1}\Sigma_{2,1}\,\Sigma_{1,1}^{-1}\Sigma_{1,2}$ give the projection matrix $W_2$ for view II.
4. $Y_1 = (X_1 - \mu)\, W_1$ (and likewise $Y_2 = (X_2 - \mu')\, W_2$).
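Putting the four steps together as a NumPy sketch (my own illustration, not the course code; a small ridge term `reg` is added before inverting the within-view covariances, which the slides do not discuss but which helps numerically):

import numpy as np

def cca(X1, X2, K, reg=1e-6):
    n, d1 = X1.shape
    _, d2 = X2.shape
    # Steps 1-2: concatenate the views and form the joint covariance matrix.
    Z = np.hstack([X1, X2])
    S = np.cov(Z, rowvar=False, bias=True)
    S11 = S[:d1, :d1] + reg * np.eye(d1)
    S12 = S[:d1, d1:]
    S21 = S[d1:, :d1]
    S22 = S[d1:, d1:] + reg * np.eye(d2)

    def top_k_eigvecs(M, k):
        # Eigenvectors corresponding to the k largest eigenvalues.
        vals, vecs = np.linalg.eig(M)
        order = np.argsort(vals.real)[::-1][:k]
        return vecs[:, order].real

    # Step 3: projection matrices for view I and view II.
    W1 = top_k_eigvecs(np.linalg.solve(S11, S12 @ np.linalg.solve(S22, S21)), K)
    W2 = top_k_eigvecs(np.linalg.solve(S22, S21 @ np.linalg.solve(S11, S12)), K)
    # Step 4: project each centered view.
    Y1 = (X1 - X1.mean(axis=0)) @ W1
    Y2 = (X2 - X2.mean(axis=0)) @ W2
    return Y1, Y2, W1, W2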