Machine Learning for Data Science (CS4786) Lecture 4: Canonical Correlation Analysis (CCA). Course webpage: http://www.cs.cornell.edu/courses/cs4786/2016fa/
Announcements: We are grading HW0 and you will be added to CMS by Monday. HW1 will be posted tonight on the webpage (homework tab). HW1 is on CCA and PCA (due in a week).
QUIZ
Let X be the n × d matrix whose rows are x_1^T, ..., x_n^T. Assume the points are centered. Which of the following are equal to the covariance matrix?
A. Σ = (1/n) sum_{t=1}^n x_t^T x_t
B. Σ = (1/n) sum_{t=1}^n x_t x_t^T
C. Σ = (1/n) X X^T
D. Σ = (1/n) X^T X
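The four options can be checked numerically (a sketch, assuming the rows of X are the centered points x_t^T; the sizes n = 100, d = 5 are made up for the check — which options match is the quiz answer):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 5                        # made-up sizes for the check
X = rng.standard_normal((n, d))
X = X - X.mean(axis=0)               # assume the points are centered

# Reference covariance matrix (bias=True divides by n, not n-1)
Sigma = np.cov(X, rowvar=False, bias=True)

A = sum(x @ x for x in X) / n        # x_t^T x_t is a scalar
B = sum(np.outer(x, x) for x in X) / n   # x_t x_t^T is d x d
C = (X @ X.T) / n                    # n x n
D = (X.T @ X) / n                    # d x d

for name, M in [("A", A), ("B", B), ("C", C), ("D", D)]:
    M = np.atleast_2d(M)
    print(name, "matches:", M.shape == Sigma.shape and np.allclose(M, Sigma))
```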
Example: Students in a classroom (figure: data plotted on x, y, z axes)
Maximize Spread / Minimize Reconstruction Error
PRINCIPAL COMPONENT ANALYSIS
Eigenvectors of the covariance matrix are the principal components; the top K principal components are the eigenvectors with the K largest eigenvalues. Independently discovered by Pearson in 1901 and by Hotelling in 1933.
1. Σ = cov(X)
2. W = eigs(Σ, K)
3. Y = (X − μ) W   (Projection = centered data × top K eigenvectors)
RECONSTRUCTION
4. X̂ = Y W^T + μ   (Reconstruction = projection × transpose of top K eigenvectors, plus the mean)
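The PCA steps above (covariance, top-K eigenvectors, projection, reconstruction) can be sketched in NumPy (a sketch; the variable names are mine, and eigs is realized with `numpy.linalg.eigh` since the covariance matrix is symmetric):

```python
import numpy as np

def pca(X, K):
    """PCA as on the slides: Sigma = cov(X), W = eigs(Sigma, K),
    Y = (X - mu) W, reconstruction Xhat = Y W^T + mu."""
    mu = X.mean(axis=0)
    Sigma = np.cov(X - mu, rowvar=False, bias=True)    # 1. Sigma = cov(X)
    evals, evecs = np.linalg.eigh(Sigma)               # ascending eigenvalues
    W = evecs[:, np.argsort(evals)[::-1][:K]]          # 2. top-K eigenvectors
    Y = (X - mu) @ W                                   # 3. projection
    Xhat = Y @ W.T + mu                                # 4. reconstruction
    return W, Y, Xhat

# Synthetic data that truly lies in a 2-D linear subspace of R^10
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2)) @ rng.standard_normal((2, 10))
W, Y, Xhat = pca(X, K=2)
print(np.abs(X - Xhat).max())   # near zero: the data lies in a 2-D subspace
```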
WHEN d ≫ n
If d ≫ n then Σ is a large d × d matrix, but we only need its top K eigenvectors. Idea: use the SVD.
X − μ = U D V^T, with V^T V = I and U^T U = I.
Then note that
n Σ = (X − μ)^T (X − μ) = V D^2 V^T.
Hence the matrix V is the same as the matrix W obtained from the eigendecomposition of Σ, and the eigenvalues of Σ are the diagonal elements of D^2 (up to the factor 1/n).
Alternative algorithm: [U, D, V] = SVD(X − μ, K), W = V.
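The SVD shortcut can be checked numerically (a sketch; I compare the projectors onto the two K-dimensional subspaces rather than the vectors themselves, since eigenvectors are only defined up to sign):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, K = 20, 500, 3                 # the d >> n regime
X = rng.standard_normal((n, d))
Xc = X - X.mean(axis=0)

# Direct route: eigenvectors of the d x d covariance matrix (costly for large d)
Sigma = (Xc.T @ Xc) / n
evals, evecs = np.linalg.eigh(Sigma)
W_eig = evecs[:, np.argsort(evals)[::-1][:K]]

# SVD route: X - mu = U D V^T, so n * Sigma = V D^2 V^T and W = top columns of V
U, Dvals, Vt = np.linalg.svd(Xc, full_matrices=False)
W_svd = Vt[:K].T

# Same subspace: the orthogonal projectors onto the two spans agree
print(np.allclose(W_eig @ W_eig.T, W_svd @ W_svd.T))
```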
WHEN TO USE PCA?
When data naturally lies in a low dimensional linear subspace.
To minimize reconstruction error.
To find directions where the data is maximally spread.
Canonical Correlation Analysis (figure: data on x, y, z axes, with an "Age + Gender" direction and an "Angle" direction)
TWO VIEW DIMENSIONALITY REDUCTION
Data comes in pairs (x_1, x′_1), ..., (x_n, x′_n), where the x_t's are d dimensional and the x′_t's are d′ dimensional.
Goal: Compress, say, view one into y_1, ..., y_n, which are K dimensional vectors.
Retain information redundant between the two views.
Eliminate noise specific to only one of the views.
EXAMPLE I: SPEECH RECOGNITION (audio + video)
Audio might have background sounds uncorrelated with the video.
Video might have lighting changes uncorrelated with the audio.
Redundant information between the two views: the speech.
EXAMPLE II: COMBINING FEATURE EXTRACTIONS
Method A and Method B are both equally good feature extraction techniques.
Concatenating the two features blindly yields a large dimensional feature vector with redundancy.
Applying techniques like CCA extracts the key information shared between the two methods and removes extra unwanted information.
How do we get the right direction? (say K = 1) (figure: "Age + Gender" vs. "Angle" directions)
WHICH DIRECTION TO PICK? (figure: View I and View II)
WHICH DIRECTION TO PICK? (figure: the PCA direction in each view)
WHICH DIRECTION TO PICK? (figure: a direction that has large covariance across the two views)
How do we pick the right direction to project to?
MAXIMIZING CORRELATION COEFFICIENT
Say w_1 and v_1 are the directions we choose to project onto in views 1 and 2 respectively. We want these directions to maximize

(1/n) sum_{t=1}^n (y_t[1] − ȳ[1]) (y′_t[1] − ȳ′[1])

subject to

(1/n) sum_{t=1}^n (y_t[1] − ȳ[1])^2 = (1/n) sum_{t=1}^n (y′_t[1] − ȳ′[1])^2 = 1,

where y_t[1] = w_1^T x_t, y′_t[1] = v_1^T x′_t, and ȳ[1] = (1/n) sum_{t=1}^n y_t[1], ȳ′[1] = (1/n) sum_{t=1}^n y′_t[1].
What is the problem with the above?
WHY NOT MAXIMIZE COVARIANCE
Say (1/n) sum_{t=1}^n x_t[2] x′_t[2] > 0 (the relevant information). Scaling up this coordinate, we can blow up the covariance.
MAXIMIZING CORRELATION COEFFICIENT
Say w_1 and v_1 are the directions we choose to project onto in views 1 and 2 respectively. We want these directions to maximize the correlation coefficient

[ (1/n) sum_{t=1}^n (y_t[1] − ȳ[1]) (y′_t[1] − ȳ′[1]) ] / [ sqrt( (1/n) sum_{t=1}^n (y_t[1] − ȳ[1])^2 ) · sqrt( (1/n) sum_{t=1}^n (y′_t[1] − ȳ′[1])^2 ) ]

where ȳ[1] = (1/n) sum_{t=1}^n y_t[1] and ȳ′[1] = (1/n) sum_{t=1}^n y′_t[1].
BASIC IDEA OF CCA
Normalize the variance in the chosen direction to be constant (say 1), then maximize the covariance. This is the same as maximizing the correlation coefficient (recall from last class).
COVARIANCE VS CORRELATION
Covariance(A, B) = E[(A − E[A]) (B − E[B])]
Depends on the scale of A and B: if B is rescaled, the covariance rescales with it.
Correlation(A, B) = E[(A − E[A]) (B − E[B])] / (sqrt(Var(A)) · sqrt(Var(B)))
Scale free.
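A tiny numerical check of the scale (in)sensitivity claimed above (a sketch; the rescaling factor 100 and the synthetic data are arbitrary):

```python
import numpy as np

def covariance(a, b):
    return np.mean((a - a.mean()) * (b - b.mean()))

def correlation(a, b):
    return covariance(a, b) / (a.std() * b.std())

rng = np.random.default_rng(3)
a = rng.standard_normal(1000)
b = a + 0.5 * rng.standard_normal(1000)   # correlated with a

print(covariance(a, b), covariance(a, 100 * b))    # covariance scales by 100
print(correlation(a, b), correlation(a, 100 * b))  # correlation is unchanged
```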
MAXIMIZING CORRELATION COEFFICIENT
Say w_1 and v_1 are the directions we choose to project onto in views 1 and 2 respectively. We want these directions to maximize

(1/n) sum_{t=1}^n (y_t[1] − ȳ[1]) (y′_t[1] − ȳ′[1])

subject to

(1/n) sum_{t=1}^n (y_t[1] − ȳ[1])^2 = (1/n) sum_{t=1}^n (y′_t[1] − ȳ′[1])^2 = 1,

where y_t[1] = w_1^T x_t, y′_t[1] = v_1^T x′_t, and ȳ[1] = (1/n) sum_{t=1}^n y_t[1], ȳ′[1] = (1/n) sum_{t=1}^n y′_t[1].
CANONICAL CORRELATION ANALYSIS
Hence we want to solve for projection vectors w_1 and v_1 that maximize

(1/n) sum_{t=1}^n (w_1^T (x_t − μ)) (v_1^T (x′_t − μ′))

subject to

(1/n) sum_{t=1}^n (w_1^T (x_t − μ))^2 = (1/n) sum_{t=1}^n (v_1^T (x′_t − μ′))^2 = 1,

where μ = (1/n) sum_{t=1}^n x_t and μ′ = (1/n) sum_{t=1}^n x′_t.
CANONICAL CORRELATION ANALYSIS
Hence we want to solve for projection vectors w_1 and v_1 that maximize w_1^T Σ_{1,2} v_1 subject to w_1^T Σ_{1,1} w_1 = v_1^T Σ_{2,2} v_1 = 1.
Writing the Lagrangian, taking derivatives, and equating them to 0, we get

Σ_{1,2} Σ_{2,2}^{-1} Σ_{2,1} w_1 = λ^2 Σ_{1,1} w_1   and   Σ_{2,1} Σ_{1,1}^{-1} Σ_{1,2} v_1 = λ^2 Σ_{2,2} v_1,

or equivalently

Σ_{1,1}^{-1} Σ_{1,2} Σ_{2,2}^{-1} Σ_{2,1} w_1 = λ^2 w_1   and   Σ_{2,2}^{-1} Σ_{2,1} Σ_{1,1}^{-1} Σ_{1,2} v_1 = λ^2 v_1.
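The Lagrangian step on this slide can be filled in as follows (a sketch of the derivation; the multipliers λ_1, λ_2 are my notation and turn out to be equal):

```latex
\mathcal{L}(w_1, v_1, \lambda_1, \lambda_2)
  = w_1^\top \Sigma_{1,2}\, v_1
  - \tfrac{\lambda_1}{2}\bigl(w_1^\top \Sigma_{1,1}\, w_1 - 1\bigr)
  - \tfrac{\lambda_2}{2}\bigl(v_1^\top \Sigma_{2,2}\, v_1 - 1\bigr)
```

Setting the gradients with respect to w_1 and v_1 to zero gives Σ_{1,2} v_1 = λ_1 Σ_{1,1} w_1 and Σ_{2,1} w_1 = λ_2 Σ_{2,2} v_1. Multiplying the first by w_1^T and the second by v_1^T and using the constraints shows λ_1 = λ_2 = λ. Solving the second equation for v_1 = (1/λ) Σ_{2,2}^{-1} Σ_{2,1} w_1 and substituting into the first yields Σ_{1,2} Σ_{2,2}^{-1} Σ_{2,1} w_1 = λ^2 Σ_{1,1} w_1, the eigenvalue problem on this slide.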
SOLUTION
W_1 = eigs(Σ_{1,1}^{-1} Σ_{1,2} Σ_{2,2}^{-1} Σ_{2,1}, K)
W_2 = eigs(Σ_{2,2}^{-1} Σ_{2,1} Σ_{1,1}^{-1} Σ_{1,2}, K)
CCA ALGORITHM
1. Write x̃_t = (x_t, x′_t), the (d + d′) dimensional concatenated vectors, and let X̃ be the matrix with rows x̃_t^T. Calculate the covariance matrix of the joint data points:
   Σ = cov(X̃) = [ Σ_{1,1}  Σ_{1,2} ; Σ_{2,1}  Σ_{2,2} ]
2. Calculate Σ_{1,1}^{-1} Σ_{1,2} Σ_{2,2}^{-1} Σ_{2,1}. The top K eigenvectors of this matrix give us the projection matrix for view I: W_1 = eigs(Σ_{1,1}^{-1} Σ_{1,2} Σ_{2,2}^{-1} Σ_{2,1}, K)
3. Calculate Σ_{2,2}^{-1} Σ_{2,1} Σ_{1,1}^{-1} Σ_{1,2}. The top K eigenvectors of this matrix give us the projection matrix for view II: W_2 = eigs(Σ_{2,2}^{-1} Σ_{2,1} Σ_{1,1}^{-1} Σ_{1,2}, K)
4. Y = (X − μ) W_1
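The algorithm above can be sketched in NumPy (a sketch; variable names are mine, eigs is realized with `numpy.linalg.eig` since the products are not symmetric in general, and the small ridge term is an assumption I add for numerical stability, not part of the slides):

```python
import numpy as np

def cca(X1, X2, K, ridge=1e-8):
    """CCA as on the slides, for views X1 (n x d) and X2 (n x d')."""
    n, d = X1.shape
    Xc1 = X1 - X1.mean(axis=0)
    Xc2 = X2 - X2.mean(axis=0)

    # 1. Covariance of the concatenated vectors, read off in blocks
    Z = np.hstack([Xc1, Xc2])
    Sigma = (Z.T @ Z) / n
    S11, S12 = Sigma[:d, :d].copy(), Sigma[:d, d:]
    S21, S22 = Sigma[d:, :d], Sigma[d:, d:].copy()
    S11 += ridge * np.eye(d)                 # ridge: my addition for stability
    S22 += ridge * np.eye(S22.shape[0])

    def top_k_eigvecs(M, K):
        vals, vecs = np.linalg.eig(M)        # M is not symmetric in general
        order = np.argsort(-vals.real)[:K]
        return vecs[:, order].real

    # 2. W1 = eigs(S11^-1 S12 S22^-1 S21, K)
    W1 = top_k_eigvecs(np.linalg.solve(S11, S12) @ np.linalg.solve(S22, S21), K)
    # 3. W2 = eigs(S22^-1 S21 S11^-1 S12, K)
    W2 = top_k_eigvecs(np.linalg.solve(S22, S21) @ np.linalg.solve(S11, S12), K)
    # 4. Project view I
    return W1, W2, Xc1 @ W1

# Two noisy 2-D views sharing a 1-D signal s in their first coordinate
rng = np.random.default_rng(4)
s = rng.standard_normal(500)
X1 = np.column_stack([s, rng.standard_normal(500)]) + 0.1 * rng.standard_normal((500, 2))
X2 = np.column_stack([-s, rng.standard_normal(500)]) + 0.1 * rng.standard_normal((500, 2))
W1, W2, Y = cca(X1, X2, K=1)
r = np.corrcoef(Y[:, 0], (X2 - X2.mean(0)) @ W2[:, 0])[0, 1]
print(abs(r))   # close to 1: CCA recovers the shared signal, not the noise
```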