Predicting Graph Labels using Perceptron. Shuang Song


1 Predicting Graph Labels using Perceptron Shuang Song

2 Online learning over graphs. M. Herbster, M. Pontil, and L. Wainer. Proc. 22nd Int. Conf. Machine Learning (ICML'05), 2005.
Prediction on a graph with a perceptron. M. Herbster and M. Pontil. NIPS 20, 2006.

3 Outline
1. Problem Setting
2. Perceptron
3. Properties of Graphs
4. Bound # of mistakes

4 Problem Setting
Known graph, unknown labels on vertices.
e.g. Advertisement service on web pages
e.g. Digit recognition task on USPS (graph is built using nearest neighbours)

6 Problem Setting
Given a graph $G = (V, E)$ where $V = \{1, \dots, n\}$
for $t = 1, \dots, l$
  nature selects $v_t \in V$
  learner predicts $\hat{y}_t \in \{-1, 1\}$
  nature reveals $y_t \in \{-1, 1\}$
  if $\hat{y}_t \ne y_t$, mistakes = mistakes + 1
Goal: minimize mistakes

7 [Example table; columns: Node, Predict, Nature, Mistakes]

8 [Figure: an easy labelling and a hard labelling]

9 Problem Setting
Implicit assumption: adjacent nodes have similar labels.
Nature can be adversarial, in which case the learner can be made to err on every trial; yet if nature's labelling is regular and simple, the learner can make only a few mistakes.
Bound the number of mistakes using the complexity of nature's labelling.
Assume the graph is connected and unweighted.

10 What algorithm are we going to use?

11 Perceptron
A simple linear classifier.
Assume the data are linearly separable with margin 1.

12 Perceptron: algorithm
data: $(x_1, y_1), \dots, (x_l, y_l) \in (H \times \{-1, 1\})^l$
Initialize $w_1 = 0 \in H$
For $t = 1, \dots, l$
  receive $x_t \in H$
  predict $\hat{y}_t = \mathrm{sign}(\langle w_t, x_t \rangle)$
  receive $y_t \in \{-1, 1\}$
  if $\hat{y}_t \ne y_t$
    mistake = mistake + 1
    $w_{t+1} = w_t + y_t x_t$
  else
    $w_{t+1} = w_t$
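The loop above can be sketched in a few lines of Python. This is a minimal illustration that takes $H = \mathbb{R}^d$ with the ordinary dot product; the toy data and the tie-break sign(0) = -1 are assumptions of the sketch, not part of the slides.

```python
import numpy as np

def perceptron(xs, ys):
    """Run the online perceptron once over (xs, ys); return final weights and mistake count."""
    w = np.zeros(xs.shape[1])            # w_1 = 0
    mistakes = 0
    for x, y in zip(xs, ys):
        # sign(<w_t, x_t>); sign(0) is taken to be -1 here (assumed tie-break)
        y_hat = 1 if w @ x > 0 else -1
        if y_hat != y:
            mistakes += 1
            w = w + y * x                # update only when a mistake is made
    return w, mistakes

# Hypothetical toy data: the label is the sign of the first coordinate.
xs = np.array([[1.0, 0.0], [-1.0, 0.5], [2.0, -1.0], [-2.0, -0.5]])
ys = np.array([1, -1, 1, -1])
w, mistakes = perceptron(xs, ys)
```

On this data the separator $u = (1, 0)$ satisfies $y_t \langle u, x_t \rangle \ge 1$ for every trial, so it is a margin-1 separator in the sense of the previous slide.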

13 Perceptron: mistake bound
Theorem: given a sequence $\{(x_t, y_t)\}_{t=1}^{l} \subset H \times \{-1, 1\}$, and $M$ the set of trials in which the perceptron predicted incorrectly, then
$|M| \le \|w\|^2 \max_{t \in M} \|x_t\|^2$
for all $w \in \{-1, 1\}^n$ s.t. $w_t = y_t$, $t = 1, \dots, l$.
The norm is taken w.r.t. the inner product of $H$.

14 Perceptron: how to use
For us, what is the inner product? What is $x_t$?
We want an $x_t$ that captures the structure of the whole graph.


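16 Properties of Graphs
Graph Laplacian: $L = D - A$, where $A$ is the adjacency matrix and $D = \mathrm{diag}(d_1, \dots, d_n)$
Inner product: $\langle f, g \rangle = f^T L g$, $f, g \in \mathbb{R}^n$
Semi-norm: $\|f\|^2 = \langle f, f \rangle = \sum_{(i,j) \in E} (f_i - f_j)^2$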
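As a quick numerical check of the identity $f^T L f = \sum_{(i,j) \in E} (f_i - f_j)^2$, the sketch below builds the Laplacian of a small path graph. The graph, the vector `f`, and the helper name `laplacian` are illustrative choices; vertices are 0-indexed here, whereas the slides use $1, \dots, n$.

```python
import numpy as np

def laplacian(n, edges):
    """Return L = D - A for an unweighted, undirected graph on vertices 0..n-1."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    D = np.diag(A.sum(axis=1))           # degree matrix
    return D - A

edges = [(0, 1), (1, 2), (2, 3)]         # path graph on 4 vertices
L = laplacian(4, edges)
f = np.array([1.0, 1.0, -1.0, -1.0])     # labels change across one edge

quad = f @ L @ f                                         # f^T L f
edge_sum = sum((f[i] - f[j]) ** 2 for i, j in edges)     # sum over edges
```

Both quantities agree, and only the single edge whose endpoints carry different labels contributes to the sum.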

17 Properties of Graphs
Norm measures smoothness or complexity of a labelling $g$:
[Figure: two example labellings, with $\|g\|^2 = 3 \times 4$ and $\|g\|^2 = 12 \times 4$]

18 Properties of Graphs
[Figure: two example labellings, with $\|g\|^2 = 1 \times 4$ and $\|g\|^2 = 9 \times 4$]

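19 Properties of Graphs
Eigenvalues $\lambda_i$ and eigenvectors $u_i$ of $L$:
Connected $\Rightarrow$ $0 = \lambda_1 < \lambda_2 \le \lambda_3 \le \dots \le \lambda_n$, with $u_1$ the constant vector
$H = \mathrm{span}\{u_2, \dots, u_n\} = \{g : g^T u_1 = 0\} = \{g : \sum_{i=1}^{n} g_i = 0\}$
On $H$ the semi-norm becomes a norm.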

20 Properties of Graphs
Pseudoinverse: $K = L^{+} = \sum_{i=2}^{n} \lambda_i^{-1} u_i u_i^T$
It is the reproducing kernel of $H$: for all $g \in H$, with $K_i = K(:, i)$, $\langle K_i, g \rangle = g_i$
$\|K_t\|^2 = K_{tt}$ measures the remoteness of vertex $t$, and it decreases with connectivity.
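The kernel can be computed directly from the eigendecomposition formula above. The sketch below does this for a 4-vertex path graph (an illustrative choice) and checks the reproducing property $\langle K_i, g \rangle = K_i^T L g = g_i$ on a vector `g` that sums to zero.

```python
import numpy as np

# Path graph on 4 vertices (0-indexed), chosen only for illustration.
edges = [(0, 1), (1, 2), (2, 3)]
n = 4
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

# eigh returns eigenvalues in ascending order; for a connected graph
# lam[0] is (numerically) zero and U[:, 0] is the constant vector.
lam, U = np.linalg.eigh(L)
K = (U[:, 1:] / lam[1:]) @ U[:, 1:].T    # sum_{i>=2} u_i u_i^T / lambda_i

g = np.array([3.0, -1.0, -1.0, -1.0])    # entries sum to zero, so g lies in H
reproduced = K @ L @ g                   # entry i is <K_i, g>
remoteness = np.diag(K)                  # K_tt for each vertex t
```

In this example the two end vertices of the path have a larger $K_{tt}$ than the two interior vertices, in line with the remark that $K_{tt}$ measures remoteness.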

22 This will be our feature of a vertex, i.e., $x_t = K_t$

23 Properties of Graphs
For $p, q \in V$, define
Distance (of two vertices): $d(p, q) = \min |P(p, q)|$, where $P$ is a path from $p$ to $q$
Eccentricity (of a vertex): $\rho_p = \max_{q \in V} d(p, q)$
Diameter (of a graph): $D_G = \max_{p} \rho_p$
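On an unweighted graph these three quantities can be computed with breadth-first search. The following is a small sketch; the adjacency-list representation and the function names are assumptions made for the example.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def eccentricity(adj, p):
    """rho_p: the largest distance from p to any vertex."""
    return max(bfs_distances(adj, p).values())

def diameter(adj):
    """D_G: the largest eccentricity over all vertices."""
    return max(eccentricity(adj, p) for p in adj)

# Path graph 0 - 1 - 2 - 3 as an adjacency list (illustrative).
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
```

The sketch assumes a connected graph, as the slides do; on a disconnected graph `bfs_distances` would simply omit unreachable vertices.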

24 Bound
$|M| \le \|w\|^2 \max_{t \in M} \|x_t\|^2$
We want to express $\|w\|^2$ and $\max_{t \in M} \|x_t\|^2$ in terms of properties of the graph.

25 Bound # of mistakes: $\|w\|^2$
First we look at $w$: $w \in \{-1, +1\}^n$
$\|w\|^2 = \sum_{(i,j) \in E} (w_i - w_j)^2$ (smoothness or complexity)
$\|w\|^2 = 4 \times$ (# edges spanning different labels)

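26 Bound # of mistakes: $\|x_t\|^2$
Then we look at $x_t = K_t$: $\|K_t\|^2 = K_{tt}$, so we want to bound $K_{tt}$.
Theorem: for a connected graph $G$ with Laplacian kernel $K$,
$K_{tt} \le \min\left(\frac{1}{\lambda_2}, \rho_t\right)$, $\forall t \in V$
where $\lambda_2$ is the 2nd smallest eigenvalue and $\rho_t$ is the eccentricity: $\rho_t = \max_{q \in V} \min_{P} |P(t, q)|$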

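27 Bound # of mistakes: $\|x_t\|^2$
Proof:
$K_{tt} \le \frac{1}{\lambda_2}$: we have $g^T L g \ge \lambda_2\, g^T g$ for all $g \in H$. Taking $g = K_t$, the left side is $\|K_t\|^2 = K_{tt}$ and $g^T g = \sum_p g_p^2 \ge g_t^2 = K_{tt}^2$, so $K_{tt} \ge \lambda_2 \sum_p g_p^2 \ge \lambda_2 K_{tt}^2$.
$K_{tt} \le \rho_t$: if $g_t > 0$, then (since the entries of $g \in H$ sum to zero) $\exists s$ s.t. $g_s < 0$, and $\exists$ a path $P$ from $t$ to $s$ s.t. $|E(P)| \le \rho_t$.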

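28 Bound # of mistakes: $\|x_t\|^2$
$\sum_{(i,j) \in E(P)} |g_i - g_j| \ge g_t - g_s > g_t$
By $\sum_{i=1}^{n} a_i^2 \ge \frac{1}{n} \left( \sum_{i=1}^{n} a_i \right)^2$ for non-negative $\{a_i\}$, we have
$\sum_{(i,j) \in E(P)} (g_i - g_j)^2 \ge \frac{1}{|E(P)|} \left( \sum_{(i,j) \in E(P)} |g_i - g_j| \right)^2 \ge \frac{1}{\rho_t} \left( \sum_{(i,j) \in E(P)} |g_i - g_j| \right)^2 \ge \frac{g_t^2}{\rho_t}$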

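29 Bound # of mistakes: $\|x_t\|^2$
Taking $g = K_t$, we have $\|K_t\|^2 = \sum_{(i,j) \in E(G)} (K_{ti} - K_{tj})^2$ and $\|K_t\|^2 = \langle K_t, K_t \rangle = K_{tt}$, so
$K_{tt} = \sum_{(i,j) \in E(G)} (K_{ti} - K_{tj})^2 \ge \frac{K_{tt}^2}{\rho_t}$
Hence $K_{tt} \le \frac{1}{\lambda_2}$ and $K_{tt} \le \rho_t$.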
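The two bounds on $K_{tt}$ can be checked numerically on a small graph. The sketch below uses a 4-cycle with one pendant vertex, an arbitrary illustrative choice, and compares the diagonal of $K$ with $\min(1/\lambda_2, \rho_t)$ for every vertex; the helper `eccentricities` is a name introduced for this example.

```python
import numpy as np
from collections import deque

def eccentricities(A):
    """Eccentricity of every vertex, by breadth-first search on adjacency matrix A."""
    n = A.shape[0]
    ecc = []
    for s in range(n):
        dist = [-1] * n                  # -1 marks "not yet visited"
        dist[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in np.flatnonzero(A[u]):
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(int(v))
        ecc.append(max(dist))
    return np.array(ecc)

# 4-cycle 0-1-2-3-0 with a pendant vertex 4 attached to vertex 3.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (3, 4)]
n = 5
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

lam, U = np.linalg.eigh(L)               # ascending eigenvalues, lam[0] ~ 0
K = (U[:, 1:] / lam[1:]) @ U[:, 1:].T    # Laplacian kernel K = L^+

upper = np.minimum(1.0 / lam[1], eccentricities(A))   # min(1/lambda_2, rho_t)
holds = bool(np.all(np.diag(K) <= upper + 1e-12))     # small tolerance for rounding
```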

30 Bound # of mistakes
$|M| \le \|w\|^2 \max_{t \in M} \|x_t\|^2 \le 4 \times (\text{\# edges spanning different labels}) \times \max_{t \in M} \min\left(\frac{1}{\lambda_2}, \rho_t\right)$
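Putting the pieces together, a perceptron on a graph with features $x_t = K_{v_t}$ can be sketched as follows. Because $w$ is always a combination of columns of $K$, the reproducing property gives $\langle w, K_v \rangle = w_v$, so the prediction only needs the entry `w[v]`. The test graph (two 5-cliques joined by one edge), the query order, and the function names are assumptions of this sketch. The labelling is deliberately balanced so that the label vector sums to zero and therefore lies in $H$.

```python
import numpy as np
from itertools import combinations

def clique_pair_adjacency(k):
    """Adjacency matrix of two k-cliques joined by a single bridge edge (illustrative graph)."""
    n = 2 * k
    A = np.zeros((n, n))
    for block in (range(0, k), range(k, n)):
        for i, j in combinations(block, 2):
            A[i, j] = A[j, i] = 1.0
    A[k - 1, k] = A[k, k - 1] = 1.0      # the bridge
    return A

def graph_perceptron(K, order, labels):
    """Online perceptron with x_t = K[:, v_t]; returns the list of mistaken vertices."""
    w = np.zeros(K.shape[0])
    mistaken = []
    for v in order:
        y_hat = 1 if w[v] > 0 else -1    # <w, K_v> = w_v; sign(0) := -1 (assumed tie-break)
        if y_hat != labels[v]:
            mistaken.append(v)
            w = w + labels[v] * K[:, v]
    return mistaken

k = 5
A = clique_pair_adjacency(k)
L = np.diag(A.sum(axis=1)) - A
lam, U = np.linalg.eigh(L)
K = (U[:, 1:] / lam[1:]) @ U[:, 1:].T    # Laplacian kernel

labels = np.array([1] * k + [-1] * k)    # one label per clique (balanced)
mistaken = graph_perceptron(K, range(2 * k), labels)

# Count edges whose endpoints carry different labels, then evaluate the bound.
rows, cols = np.nonzero(np.triu(A))
cut_edges = int(np.sum(labels[rows] != labels[cols]))
bound = 4 * cut_edges * max(K[v, v] for v in mistaken)
```

Only the bridge edge spans different labels, so the labelling has low complexity and the number of mistakes stays small regardless of how many vertices are queried.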

31 Further improvement
Noisy samples: $|M| \le 2|M \cap M_w| + \frac{\|w\|^2 X^2}{2} + \sqrt{2|M \cap M_w|\,\|w\|^2 X^2 + \frac{\|w\|^4 X^4}{4}}$
Bound $K_{pp}$ using resistance: $K_{pp} \le \max_{p,q \in V} r(p, q)$
