SEMI-SUPERVISED LEARNING

1 SEMI-SUPERVISED LEARNING
Matt Stokes
November 3

Topics
- Background
- Label Propagation
  - Definitions
  - Transition matrix (random walk) method
  - Harmonic solution
  - Graph Laplacian method
- Kernel Methods
  - Smoothness
  - Kernel alignment

2 Types of Learning

Unsupervised
- Class labels are unknown
- No feedback/error signal
- Essentially density estimation

Supervised
- Given labeled training examples
- Can evaluate performance directly
- Learn mapping of X to Y

Semi-supervised
- Only some samples are labeled
- Saves time/cost of labeling large datasets

Assumptions
- Data exist in some kind of clusters
- Local assumption: points near one another are likely to have the same label
- Global assumption: points on the same structure (i.e. manifold) are likely to have the same label
- Simple clustering methods (k-NN) rely only on local structure and can lead to suboptimal results

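3 Label Propagation

Problem setup (Zhu)
- Data $(x_1, y_1), \dots, (x_N, y_N)$, with $N = L + U$, consist of:
  - $L$ labeled samples $(x_1, y_1), \dots, (x_L, y_L)$
  - $U$ unlabeled samples $(x_{L+1}, y_{L+1}), \dots, (x_{L+U}, y_{L+U})$, where the class labels $\{y_{L+1}, \dots, y_{L+U}\}$ are unknown
- Usually $L \ll U$
- Number of classes ($C$) is known
- Create a fully connected graph with samples as nodes and connection weights proportional to sample proximity:

$$w_{ij} = \exp\left(-\frac{d_{ij}^2}{\sigma^2}\right) = \exp\left(-\frac{\sum_{d=1}^{D} (x_i^d - x_j^d)^2}{\sigma^2}\right)$$

- The asymmetric transition matrix $T$ has dimensions $N \times N$:

$$T_{ij} = P(j \to i) = \frac{w_{ij}}{\sum_{k=1}^{N} w_{kj}}$$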
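The graph construction above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming Euclidean distances and the column normalization $T_{ij} = w_{ij} / \sum_k w_{kj}$; the function names and the three toy points are hypothetical and not taken from the slides.

```python
import math

def gaussian_weights(points, sigma):
    """w_ij = exp(-d_ij^2 / sigma^2), with d_ij the Euclidean distance."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            w[i][j] = math.exp(-d2 / sigma ** 2)
    return w

def transition_matrix(w):
    """T_ij = w_ij / sum_k w_kj, so every column of T sums to 1."""
    n = len(w)
    col_sums = [sum(w[k][j] for k in range(n)) for j in range(n)]
    return [[w[i][j] / col_sums[j] for j in range(n)] for i in range(n)]

# Toy data (an assumption for illustration): two close points and one far away.
points = [(0.0, 0.0), (0.1, 0.0), (3.0, 3.0)]
W = gaussian_weights(points, sigma=1.0)
T = transition_matrix(W)
```

W is symmetric, but T generally is not, because each column is divided by its own sum.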

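4 Label Propagation

- Node labels are represented as a distribution over classes in the label matrix $Y$ ($N$ rows, $C$ columns)
- Begin with an arbitrary assignment of class distributions to unlabeled points, and the known class for labeled points
- Repeat:
  1. Propagate $Y \leftarrow TY$: labels spread information along local structure
  2. Row-normalize $Y$: keeps a proper distribution over classes
  3. Clamp labeled data to the original value: keeps originally labeled points at $Y_{ic} = \delta(y_i, c)$

Convergence
- Represent $T$ as row-normalized block matrices:

$$\bar{T} = \begin{bmatrix} \bar{T}_{ll} & \bar{T}_{lu} \\ \bar{T}_{ul} & \bar{T}_{uu} \end{bmatrix}, \qquad Y = \begin{bmatrix} Y_L \\ Y_U \end{bmatrix}$$

- Iterative update for $Y_U$ ($Y_L$ is clamped at its original values):

$$Y_U \leftarrow \bar{T}_{uu} Y_U + \bar{T}_{ul} Y_L$$

- Result of iteration:

$$Y_U = \lim_{n \to \infty} \left( \bar{T}_{uu}^{\,n} Y^0 + \left[ \sum_{i=1}^{n} \bar{T}_{uu}^{\,(i-1)} \right] \bar{T}_{ul} Y_L \right)$$

- Because $\bar{T}$ is row-normalized and $\bar{T}_{uu}$ is a submatrix, we have:

$$\lim_{n \to \infty} \bar{T}_{uu}^{\,n} Y^0 = 0$$

- Converges regardless of the initial $Y^0$:

$$Y_U = (I - \bar{T}_{uu})^{-1} \bar{T}_{ul} Y_L$$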
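The propagate / row-normalize / clamp loop can be sketched as follows. This is an illustrative toy, assuming one-dimensional points, a fixed number of iterations in place of a convergence test, and a uniform initial guess for the unlabeled rows; all names and data are hypothetical.

```python
import math

def transition_matrix(xs, sigma):
    """Column-normalized Gaussian weights: T_ij = w_ij / sum_k w_kj."""
    n = len(xs)
    w = [[math.exp(-((xs[i] - xs[j]) ** 2) / sigma ** 2) for j in range(n)]
         for i in range(n)]
    col = [sum(w[k][j] for k in range(n)) for j in range(n)]
    return [[w[i][j] / col[j] for j in range(n)] for i in range(n)]

def propagate_labels(T, Y, labeled, n_iter=200):
    """Repeat: Y <- T Y, row-normalize Y, clamp the labeled rows."""
    n, c = len(Y), len(Y[0])
    clamp = {i: list(Y[i]) for i in labeled}
    for _ in range(n_iter):
        # 1. Propagate
        Y = [[sum(T[i][k] * Y[k][j] for k in range(n)) for j in range(c)]
             for i in range(n)]
        # 2. Row-normalize so each row stays a distribution over classes
        Y = [[v / sum(row) for v in row] for row in Y]
        # 3. Clamp labeled rows back to their original values
        for i, row in clamp.items():
            Y[i] = list(row)
    return Y

xs = [0.0, 1.0, 5.0, 6.0]      # toy 1-D samples
known = {0: 0, 3: 1}           # sample index -> class for the labeled points
Y0 = [[0.5, 0.5] for _ in xs]  # arbitrary start for unlabeled rows
for i, c in known.items():
    Y0[i] = [1.0 if k == c else 0.0 for k in range(2)]

Y = propagate_labels(transition_matrix(xs, sigma=1.5), Y0, labeled=known)
predicted = [row.index(max(row)) for row in Y]
```

In this toy setup the unlabeled point at 1.0 ends up with the label of its neighbor at 0.0, and the point at 5.0 with the label of its neighbor at 6.0.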

5 Class Assignment

How should we assign classes to unlabeled points?
- Could choose the most likely class
  - The ML method does not explicitly control class proportions
- Suppose we want labels to fit a known or estimated distribution over classes
  - Normalize class mass: scale the columns of $Y_U$ to fit the class distribution, then pick the ML class
    - Does not guarantee strict label proportions
  - Perform label bidding: each entry $Y_U(i, c)$ is a bid of sample $i$ for class $c$
    - Handle bids from largest to smallest
    - A bid is taken if class $c$ is not full; otherwise it is discarded

Parameterization
- A single parameter $\sigma$ controls the spread of labels
  - For $\sigma \to 0$, classification of unlabeled points is dominated by the nearest labeled point
  - For $\sigma \to \infty$, class probabilities just become class frequencies (no information from label proximity)
- Build a minimum spanning tree, adding the shortest edges first
  - Set $\sigma = d^*/3$, where $d^*$ is the length of the first edge connecting subgraphs that contain differently labeled points
- Can minimize the entropy of class labels
  - Leads to confident classifications
  - However, the minimum entropy is at $\sigma = 0$
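Label bidding as described above can be sketched like this. The sketch assumes each class has an integer capacity derived from the desired class proportions, and that a sample stops bidding once it has been assigned; both details are assumptions, as are the function name and the toy bid matrix.

```python
def label_bidding(Y_u, capacity):
    """Assign classes to unlabeled samples under per-class capacities.

    Y_u[i][c] is the bid of sample i for class c; capacity[c] is the
    maximum number of samples class c may receive.
    """
    bids = sorted(
        ((Y_u[i][c], i, c) for i in range(len(Y_u)) for c in range(len(capacity))),
        reverse=True,  # handle bids from largest to smallest
    )
    remaining = list(capacity)
    assigned = {}
    for _, i, c in bids:
        # Take the bid only if the sample is still free and the class is not full.
        if i not in assigned and remaining[c] > 0:
            assigned[i] = c
            remaining[c] -= 1
    return [assigned.get(i) for i in range(len(Y_u))]

Y_u = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.3, 0.7]]  # toy bids
ml_labels = [row.index(max(row)) for row in Y_u]        # plain most-likely class
bid_labels = label_bidding(Y_u, capacity=[2, 2])        # enforce a 2/2 split
```

Here the most-likely rule puts three samples in class 0, while bidding moves the weakest of those three bids to class 1 so that the 2/2 split holds.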

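6 Optimizing σ

- Add a uniform transition component $U$ ($U_{ij} = 1/N$) to $T$:

$$\tilde{T} = \varepsilon U + (1 - \varepsilon) T$$

  - For small $\sigma$, the uniform component dominates
  - Minimum entropy is no longer at $\sigma = 0$
- Use $\sigma_1, \dots, \sigma_D$ to scale each dimension independently
- Perform gradient descent with respect to the $\sigma_d$ in order to minimize the entropy $H$:

$$\frac{\partial H}{\partial \sigma_d} = \sum_{i=L+1}^{L+U} \sum_{c=1}^{C} \frac{\partial H}{\partial Y_{ic}} \frac{\partial Y_{ic}}{\partial \sigma_d}$$

What is going on?
- The transition matrix $T$ holds probabilities of moving from one node to another
- Very similar to a Markov random walker
  - However, insensitive to the timescale of the walk
  - Constant source labels lead to equilibrium as iterations increase
- Mean field approximation interpretation for a pairwise Markov random field F
  - Label propagation finds the most likely labels for the approximate mean field solution of F
  - Not just the most likely state (MinCut)
  - Can split clusters equidistant from labeled points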
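The two ingredients of this slide, the smoothed transition matrix and the label entropy, are simple to write down. The sketch below assumes the entropy is $H = -\sum_i \sum_c Y_{ic} \log Y_{ic}$ over the unlabeled rows and does not attempt the gradient descent itself; function names and numbers are illustrative only.

```python
import math

def smooth_transition(T, eps):
    """T_tilde = eps * U + (1 - eps) * T, with U_ij = 1/N."""
    n = len(T)
    return [[eps / n + (1.0 - eps) * T[i][j] for j in range(n)] for i in range(n)]

def label_entropy(Y_u):
    """H = -sum_i sum_c Y_ic * log(Y_ic) over the unlabeled rows."""
    return -sum(p * math.log(p) for row in Y_u for p in row if p > 0.0)

T = [[0.9, 0.2], [0.1, 0.8]]               # toy matrix whose columns sum to 1
T_smooth = smooth_transition(T, eps=0.1)   # columns still sum to 1

H_confident = label_entropy([[1.0, 0.0]])  # a fully confident label
H_uncertain = label_entropy([[0.5, 0.5]])  # a maximally uncertain label
```

A confident labeling has zero entropy and a 50/50 labeling has entropy log 2, which is why minimizing $H$ pushes toward confident classifications.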

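7 Harmonic Functions (Zhu, 2003)

- Now define the class labeling $f$ in terms of a Gaussian over continuous space, instead of a random field over a discrete label set
- The distribution on $f$ is a Gaussian field:

$$p_\beta(f) = \frac{e^{-\beta E(f)}}{Z_\beta}, \qquad Z_\beta = \int_{f|_L = f_l} \exp(-\beta E(f)) \, df$$

- Useful for multi-label problems (NP-hard for discrete random fields)
- The ML configuration is now unique, attainable by matrix methods, and characterized by harmonic functions

Harmonic Energy
- The energy of a solution labeling $f$ is defined as:

$$E(f) = \frac{1}{2} \sum_{i,j} w_{ij} \left( f(i) - f(j) \right)^2$$

- Nearby points should have similar labels
- The solution which minimizes $E(f)$ is harmonic:
  - $\Delta f = 0$ for unlabeled points, where $\Delta = D - W$ (combinatorial Laplacian)
  - $f = f_l$ for labeled points
- The value of $f$ at an unlabeled point is the average of $f$ at neighboring points:

$$f(j) = \frac{1}{d_j} \sum_{i \sim j} w_{ij} f(i), \quad \text{for } j = L+1, \dots, L+U$$

$$f = D^{-1} W f$$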

8 Harmonic Solution

- As before, split the problem into blocks:

$$W = \begin{bmatrix} W_{ll} & W_{lu} \\ W_{ul} & W_{uu} \end{bmatrix}$$

- Solve using $\Delta f = 0$, $f_L = f_l$:

$$f_u = (D_{uu} - W_{uu})^{-1} W_{ul} f_l = (I - P_{uu})^{-1} P_{ul} f_l, \qquad P = D^{-1} W$$

- Can be viewed as heat kernel classification, but independent of the time parameter

Other interpretations
- Consider a random walker on the data graph with the given transition probabilities, starting from unlabeled node $i$
  - $f(i)$ is the probability that the first labeled node encountered is of class 1
  - The solution is an equilibrium state, not depending on time $t$
- Can also be viewed as an electrical network
  - Class 1 labels connected to source, class 0 labels to ground
  - Weights represent conductance
  - $f_u$ is the resulting voltage on an unlabeled node
  - Minimizes energy dissipation in the network
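The averaging property suggests a simple way to approximate the harmonic solution without a matrix inverse: repeatedly replace each unlabeled value with the weighted mean of its neighbors. The sketch below uses that iterative scheme (a Gauss-Seidel style sweep, swapped in here for the closed-form solve) on a hypothetical four-node chain with unit weights.

```python
def harmonic_labels(W, f_labeled, n_sweeps=500):
    """Approximate the harmonic function by repeated neighbor averaging.

    W is a symmetric weight matrix with zero diagonal; f_labeled maps
    labeled node indices to their fixed values (0 or 1).
    """
    n = len(W)
    f = [f_labeled.get(i, 0.5) for i in range(n)]
    for _ in range(n_sweeps):
        for j in range(n):
            if j in f_labeled:
                continue  # labeled values stay fixed
            d_j = sum(W[i][j] for i in range(n))
            f[j] = sum(W[i][j] * f[i] for i in range(n)) / d_j
    return f

# Chain 0 - 1 - 2 - 3 with unit weights; node 0 is class 0, node 3 is class 1.
W = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
f = harmonic_labels(W, {0: 0.0, 3: 1.0})
```

On this chain the values interpolate linearly between the two labeled ends, matching the voltage picture of a series of equal resistors between ground and source.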

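9 Reformulation (Zhou)

- Explicitly model self-reinforcement of labeled nodes
  - No clamping of $Y$ values
  - Original labels are stored in $Y$
  - The distribution of labels is now stored in $F(t)$
- Information spreads symmetrically
- $S$ is the symmetrically normalized weight matrix ($I - S$ is the normalized graph Laplacian)
  - Identical to spectral clustering
  - Similar to the transition matrix
- Note that $(I - \alpha S)^{-1}$ is a diffusion kernel

$$w_{ij} = \exp\left(-\frac{d_{ij}^2}{2\sigma^2}\right) \text{ for } i \neq j, \qquad w_{ii} = 0$$

$$D = \operatorname{diag}\left( \sum_{j=1}^{N} w_{1j}, \dots, \sum_{j=1}^{N} w_{Nj} \right), \qquad S = D^{-1/2} W D^{-1/2}$$

$$F(t+1) = \alpha S F(t) + (1 - \alpha) Y$$

$$F^* = \lim_{t \to \infty} F(t) = (1 - \alpha)(I - \alpha S)^{-1} Y$$

Regularization
- Define a cost function $Q$ associated with an assignment of class labels $F$:

$$Q(F) = \frac{1}{2} \left( \sum_{i,j=1}^{N} W_{ij} \left\| \frac{F_i}{\sqrt{D_{ii}}} - \frac{F_j}{\sqrt{D_{jj}}} \right\|^2 + \mu \sum_{i=1}^{N} \left\| F_i - Y_i \right\|^2 \right)$$

- The first term is the smoothness constraint: classification should not change much between nearby points
- The second term is the fitting constraint: classification should not deviate much from the initial assignment
- $F^*$ is the optimal solution of the regularized framework:

$$\left. \frac{\partial Q}{\partial F} \right|_{F = F^*} = F^* - S F^* + \mu (F^* - Y) = 0$$

$$F^* - \frac{1}{1 + \mu} S F^* - \frac{\mu}{1 + \mu} Y = 0$$

$$\alpha = \frac{1}{1 + \mu}, \qquad F^* = (1 - \alpha)(I - \alpha S)^{-1} Y$$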
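The unclamped update can be run directly as an iteration. The sketch below assumes one-dimensional toy points, Gaussian weights with a zero diagonal, and a fixed iteration count instead of a convergence check; names, data, and the choice $\alpha = 0.9$ are illustrative assumptions.

```python
import math

def spread_labels(xs, Y, sigma, alpha, n_iter=500):
    """Iterate F(t+1) = alpha * S * F(t) + (1 - alpha) * Y and return (S, F)."""
    n, c = len(xs), len(Y[0])
    # w_ij = exp(-d_ij^2 / (2 sigma^2)) for i != j, and w_ii = 0
    W = [[0.0 if i == j else math.exp(-((xs[i] - xs[j]) ** 2) / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    d = [sum(row) for row in W]
    # S = D^{-1/2} W D^{-1/2}
    S = [[W[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)] for i in range(n)]
    F = [list(row) for row in Y]
    for _ in range(n_iter):
        F = [[alpha * sum(S[i][k] * F[k][j] for k in range(n))
              + (1.0 - alpha) * Y[i][j]
              for j in range(c)] for i in range(n)]
    return S, F

xs = [0.0, 1.0, 5.0, 6.0]
Y = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 1.0]]  # unlabeled rows are zero
alpha = 0.9
S, F = spread_labels(xs, Y, sigma=1.0, alpha=alpha)
predicted = [row.index(max(row)) for row in F]
```

At convergence $F$ satisfies the fixed-point equation $F = \alpha S F + (1 - \alpha) Y$, which rearranges to the closed form $(1 - \alpha)(I - \alpha S)^{-1} Y$.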

10 Kernel Methods

Review
- The graph Laplacian has eigenvectors $\phi_1, \dots, \phi_n$ and eigenvalues $\lambda_1 \le \dots \le \lambda_n$
- The smallest eigenvalues correspond to the smoothest eigenvectors
- These eigenvectors are the most useful for classification

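11 Kernels by Spectral Transform

- Semi-supervised learning creates a smooth function $f$ over unlabeled points
- Generally, $f$ is smooth if $f(i) \approx f(j)$ for pairs with large $W_{ij}$. Writing $f = \sum_i \alpha_i \phi_i$:

$$f^\top L f = \frac{1}{2} \sum_{i,j} W_{ij} \left( f(i) - f(j) \right)^2 = \sum_{i=1}^{n} \alpha_i^2 \lambda_i$$

- Different weightings (i.e. spectral transforms) of the Laplacian eigenvalues lead to different smoothness measures
- We want a kernel $K$ that respects smoothness
  - Define it using the eigenvectors of the Laplacian ($\phi$) and the eigenvalues of $K$ ($\mu$):

$$K = \sum_{i=1}^{n} \mu_i \phi_i \phi_i^\top$$

  - Can also define it in terms of a spectral transform of the Laplacian eigenvalues:

$$K = \sum_{i=1}^{n} r(\lambda_i) \phi_i \phi_i^\top$$

Types of Transforms
- $r(\lambda)$ is a non-negative and decreasing transform

  Regularized Laplacian:   $r(\lambda) = \dfrac{1}{\lambda + \varepsilon}$
  Diffusion Kernel:        $r(\lambda) = \exp\left(-\dfrac{\sigma^2}{2} \lambda\right)$
  1-step Random Walk:      $r(\lambda) = (\alpha - \lambda)$, $\alpha \ge 2$
  p-step Random Walk:      $r(\lambda) = (\alpha - \lambda)^p$, $\alpha \ge 2$
  Inverse Cosine:          $r(\lambda) = \cos(\lambda \pi / 4)$
  Step Function:           $r(\lambda) = 1$ if $\lambda \le \lambda_{cut}$

- Reverses the order of the eigenvalues, so smooth eigenvectors have larger eigenvalues in $K$
- Is there an optimal transform?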
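To make the spectral-transform construction concrete, the sketch below writes the listed transforms as plain functions and builds $K = \sum_i r(\lambda_i) \phi_i \phi_i^\top$ for the smallest possible graph: two nodes joined by a unit-weight edge, whose Laplacian has eigenvalues 0 and 2 with eigenvectors $(1, 1)/\sqrt{2}$ and $(1, -1)/\sqrt{2}$. The default parameter values ($\varepsilon = 0.5$, $\sigma = 1$, $\alpha = 2$, $p = 3$, cut at 1) are arbitrary choices for illustration.

```python
import math

# Spectral transforms r(lambda); parameter defaults are illustrative assumptions.
transforms = {
    "regularized_laplacian": lambda lam, eps=0.5: 1.0 / (lam + eps),
    "diffusion": lambda lam, sigma=1.0: math.exp(-(sigma ** 2) * lam / 2.0),
    "one_step_random_walk": lambda lam, alpha=2.0: alpha - lam,
    "p_step_random_walk": lambda lam, alpha=2.0, p=3: (alpha - lam) ** p,
    "inverse_cosine": lambda lam: math.cos(lam * math.pi / 4.0),
    "step": lambda lam, cut=1.0: 1.0 if lam <= cut else 0.0,
}

def spectral_kernel(eigvals, eigvecs, r):
    """K = sum_i r(lambda_i) * phi_i * phi_i^T."""
    n = len(eigvecs[0])
    K = [[0.0] * n for _ in range(n)]
    for lam, phi in zip(eigvals, eigvecs):
        for a in range(n):
            for b in range(n):
                K[a][b] += r(lam) * phi[a] * phi[b]
    return K

# Two-node graph with one unit-weight edge: L = [[1, -1], [-1, 1]].
s = 1.0 / math.sqrt(2.0)
eigvals = [0.0, 2.0]
eigvecs = [[s, s], [s, -s]]  # constant (smooth) and alternating (rough)
K = spectral_kernel(eigvals, eigvecs, transforms["regularized_laplacian"])
```

With the regularized Laplacian transform, the constant eigenvector receives weight $1/0.5 = 2$ and the alternating one $1/2.5 = 0.4$, so the smooth direction dominates the resulting kernel.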

12 Kernel Alignment

- Assess the fitness of a kernel to the training labels
- Empirical kernel alignment compares the kernel matrix $K_{tr}$ for the training data to a target matrix $T$ for the training data
  - $T_{ij} = 1$ if $y_i = y_j$, otherwise $T_{ij} = -1$

$$\hat{A}(K_{tr}, T) = \frac{\langle K_{tr}, T \rangle_F}{\sqrt{\langle K_{tr}, K_{tr} \rangle_F \, \langle T, T \rangle_F}}, \qquad \text{Frobenius product: } \langle M, N \rangle_F = \operatorname{Tr}(MN)$$

- The alignment measure computes the cosine between $K_{tr}$ and $T$
- Find the optimal spectral transformation $r(\lambda)$ using the kernel alignment notion

QCQP
- Kernel alignment between $K_{tr}$ and $T$ is a convex function of the kernel eigenvalues $\mu$
  - No assumption on the parametric form of the transform $r(\lambda)$
- Need $K$ to be positive semi-definite
  - Restrict the eigenvalues of $K$ to be $\ge 0$
- Leads to a computationally efficient Quadratically Constrained Quadratic Program
  - Minimize a convex quadratic function over a smaller feasible region
  - Both the objective function and the constraints are quadratic
  - Complexity comparable to linear programs
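The alignment score is a short computation. The sketch below assumes symmetric matrices, for which the Frobenius product $\operatorname{Tr}(MN)$ equals the sum of elementwise products; the function names and the three-sample label vector are hypothetical.

```python
import math

def frobenius(M, N):
    """<M, N>_F as a sum of elementwise products (equals Tr(MN) for symmetric M, N)."""
    return sum(m * n for row_m, row_n in zip(M, N) for m, n in zip(row_m, row_n))

def target_matrix(y):
    """T_ij = 1 if y_i == y_j, otherwise -1."""
    return [[1.0 if a == b else -1.0 for b in y] for a in y]

def alignment(K_tr, T):
    """Cosine between K_tr and T under the Frobenius product."""
    return frobenius(K_tr, T) / math.sqrt(frobenius(K_tr, K_tr) * frobenius(T, T))

y = [0, 0, 1]  # toy training labels
T = target_matrix(y)
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

a_perfect = alignment(T, T)          # a kernel equal to the target
a_identity = alignment(identity, T)  # a kernel that ignores the labels
```

A kernel identical to the target reaches the maximum alignment of 1, while the identity kernel, which carries no information about which samples share a label, scores $1/\sqrt{3}$ here.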

13 Constraints

- We would like to keep a decreasing order on the spectral transformation
  - Smooth functions are preferred: bigger eigenvalues for smoother eigenvectors
- Constant eigenvectors act as a bias term in the graph kernel
  - $\lambda_1 = 0$, and the corresponding eigenvector $\phi_1$ is constant
  - Need not constrain bias terms

$$\max_{K} \; \hat{A}(K_{tr}, T)$$

$$\text{subject to} \quad K = \sum_{i=1}^{n} \mu_i K_i, \quad K_i = \phi_i \phi_i^\top, \quad \mu_i \ge 0, \quad \mu_i \ge \mu_{i+1}, \; i = 1, \dots, n-1, \; \phi_i \text{ not constant}$$

Summary
- Semi-supervised learning involves spreading information from labeled nodes to unlabeled nodes
- Multiple formulations with different interpretations
  - Clamped version equivalent to a Markov random walk
  - Harmonic solution equivalent to an electrical network
  - Unclamped version equivalent to a diffusion kernel
- Kernel methods use optimally smoothing spectral transforms of the data
  - Align the kernel to labeled training data for optimal performance
