Multilayer Perceptron (MLP)

Belinda Oliver
6 years ago

Seungjin Choi
Department of Computer Science and Engineering
Pohang University of Science and Technology
77 Cheongam-ro, Nam-gu, Pohang 37673, Korea

Outline

- Perceptron: a single-layer neural network.
- Multilayer perceptron (MLP): a multilayer extension of the perceptron.
- Universal approximation.
- Error back-propagation (BP) algorithm.

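Linear Classification

Let a real-valued function $f : \mathcal{X} \subseteq \mathbb{R}^m \to \mathbb{R}$ be a discriminant function. In binary classification, the input $\mathbf{x} \in \mathcal{X}$ is assigned to the positive class if $f(\mathbf{x}) \geq 0$, and otherwise to the negative class.

Linear classification considers a linear discriminant function, which has the form
$$f(\mathbf{x}) = \mathbf{w}^\top \mathbf{x} + b,$$
where $(\mathbf{w}, b) \in \mathbb{R}^m \times \mathbb{R}$ (weight vector, bias) are the parameters that control the function. The decision rule is given by $\operatorname{sgn}(f(\mathbf{x}))$:
$$\operatorname{sgn}(f(\mathbf{x})) = \begin{cases} 1 & \text{if } f(\mathbf{x}) \geq 0, \\ -1 & \text{otherwise.} \end{cases}$$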
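The decision rule above can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides; the function names `discriminant`, `sgn`, and `classify` and the example weights are hypothetical.

```python
from typing import Sequence


def discriminant(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear discriminant f(x) = w^T x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def sgn(v: float) -> int:
    """Decision rule: +1 if v >= 0, otherwise -1 (ties go to the positive class)."""
    return 1 if v >= 0 else -1


def classify(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Assign x to the positive (+1) or negative (-1) class."""
    return sgn(discriminant(w, b, x))


if __name__ == "__main__":
    w, b = [2.0, -1.0], -1.0          # hypothetical parameters
    print(classify(w, b, [1.0, 0.0]))  # f = 2 - 0 - 1 = 1
    print(classify(w, b, [0.0, 1.0]))  # f = 0 - 1 - 1 = -2
```

A point exactly on the boundary ($f(\mathbf{x}) = 0$) is sent to the positive class, matching the $\geq$ in the rule.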

Separating Hyperplane

A separating hyperplane is defined by $\mathbf{w}^\top \mathbf{x} + b = 0$. The hyperplane splits the input space $\mathcal{X}$ into two parts: it is an affine subspace of dimension $m - 1$ that divides the space into two half-spaces, which correspond to the inputs of the two distinct classes.
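To make the two half-spaces concrete, the sketch below (hypothetical helper names `side` and `signed_distance`) reports which side of $\mathbf{w}^\top \mathbf{x} + b = 0$ a point lies on, together with its signed Euclidean distance $(\mathbf{w}^\top \mathbf{x} + b)/\lVert \mathbf{w} \rVert$. In $\mathbb{R}^2$ ($m = 2$) the hyperplane is a line, of dimension $m - 1 = 1$.

```python
import math


def side(w, b, x):
    """+1 or -1 for the half-space containing x, 0 if x lies on w^T x + b = 0."""
    v = sum(wi * xi for wi, xi in zip(w, x)) + b
    return (v > 0) - (v < 0)


def signed_distance(w, b, x):
    """Signed Euclidean distance from x to the hyperplane w^T x + b = 0."""
    v = sum(wi * xi for wi, xi in zip(w, x)) + b
    return v / math.sqrt(sum(wi * wi for wi in w))


if __name__ == "__main__":
    w, b = [3.0, 4.0], -5.0  # the line 3*x1 + 4*x2 = 5, with ||w|| = 5
    for x in ([3.0, 4.0], [0.0, 0.0], [1.0, 0.5]):
        print(x, side(w, b, x), signed_distance(w, b, x))
```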

Separating Hyperplane: Geometric View

[Figure: geometric view of a separating hyperplane.]

Perceptron

- Proposed by Rosenblatt in the late 1950s.
- The first iterative algorithm for learning linear classification.
- A single-layer neural network with a threshold activation function: $y = \operatorname{sgn}(\mathbf{w}^\top \mathbf{x} + b)$.
- An on-line, mistake-driven procedure: the weight vector is updated each time a training point is misclassified.
- Perceptron convergence: the algorithm is guaranteed to converge when the data are linearly separable.

Linearly Separable

Two classes of patterns are linearly separable if they can be separated by a linear hyperplane; in other words, there exists a hyperplane that separates the two classes.

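Perceptron Criterion

Suppose that the target values $\{y_t\}$ take either $1$ or $-1$:
$$y_t = \begin{cases} 1 & \text{if } \mathbf{x}_t \in \mathcal{C}_1, \\ -1 & \text{if } \mathbf{x}_t \in \mathcal{C}_2. \end{cases}$$
What we want here is to find a $\mathbf{w}$ such that
$$\begin{cases} \mathbf{w}^\top \mathbf{x}_t > 0 & \text{for } \mathbf{x}_t \in \mathcal{C}_1, \\ \mathbf{w}^\top \mathbf{x}_t < 0 & \text{for } \mathbf{x}_t \in \mathcal{C}_2, \end{cases}$$
which is identical to $\mathbf{w}^\top \mathbf{x}_t y_t > 0$ for all $\mathbf{x}_t$. (Here the bias $b$ is assumed to be absorbed into $\mathbf{w}$ by appending a constant $1$ to each $\mathbf{x}_t$.)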
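The condition $\mathbf{w}^\top \mathbf{x}_t y_t > 0$ for all $t$ can be checked directly. This sketch assumes the augmented-vector convention (a trailing $1$ on each input so the last weight acts as the bias); the helper names `augment` and `separates` and the toy data set `DATA` are hypothetical.

```python
def augment(x):
    """Append a constant 1 so the last weight plays the role of the bias b."""
    return list(x) + [1.0]


def separates(w, data):
    """True if w^T x_t * y_t > 0 for every (x_t, y_t); x_t must already be augmented."""
    return all(sum(wi * xi for wi, xi in zip(w, x)) * y > 0 for x, y in data)


# Hypothetical toy data: C1 (y = +1) on the right, C2 (y = -1) on the left.
DATA = [
    (augment([2.0, 1.0]), 1),
    (augment([1.5, 2.0]), 1),
    (augment([-1.0, -0.5]), -1),
    (augment([-2.0, 0.5]), -1),
]

if __name__ == "__main__":
    print(separates([1.0, 0.0, 0.0], DATA))  # the vertical line x1 = 0
    print(separates([0.0, 1.0, 0.0], DATA))  # the horizontal line x2 = 0
```

Folding $b$ into $\mathbf{w}$ is what lets a single inequality cover both classes without a separate bias term.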

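Perceptron Criterion (Cont'd)

The perceptron criterion leads to the following objective function:
$$E(\mathbf{w}) = -\sum_{\mathbf{x}_t \in \mathcal{M}} \mathbf{w}^\top \mathbf{x}_t y_t,$$
where $\mathcal{M}$ is the set of vectors $\mathbf{x}_t$ that are misclassified by the current weight vector. Since a misclassified $\mathbf{x}_t$ has $\mathbf{w}^\top \mathbf{x}_t y_t \leq 0$, every term is nonnegative and $E(\mathbf{w}) \geq 0$. The gradient of $E(\mathbf{w})$ is
$$\frac{\partial E}{\partial \mathbf{w}} = -\sum_{\mathbf{x}_t \in \mathcal{M}} \mathbf{x}_t y_t.$$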
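A direct transcription of $E(\mathbf{w})$ and its gradient is sketched below. As an assumption, $\mathcal{M}$ is taken to contain every point with $\mathbf{w}^\top \mathbf{x}_t y_t \leq 0$ (boundary points count as misclassified, so that $\mathbf{w} = \mathbf{0}$ is not a fixed point). The function name `perceptron_criterion` and the toy data are hypothetical.

```python
def perceptron_criterion(w, data):
    """Return E(w) = -sum_{x_t in M} w^T x_t y_t and dE/dw = -sum_{x_t in M} x_t y_t.

    data holds (x_t, y_t) pairs with x_t augmented by a trailing 1 and y_t in {+1, -1}.
    M is assumed to be the set of points with w^T x_t y_t <= 0.
    """
    E = 0.0
    grad = [0.0] * len(w)
    for x, y in data:
        s = sum(wi * xi for wi, xi in zip(w, x)) * y
        if s <= 0:  # x_t belongs to M
            E -= s
            grad = [g - xi * y for g, xi in zip(grad, x)]
    return E, grad


# Hypothetical toy data (inputs already augmented with a trailing 1).
DATA = [([2.0, 1.0, 1.0], 1), ([1.5, 2.0, 1.0], 1),
        ([-1.0, -0.5, 1.0], -1), ([-2.0, 0.5, 1.0], -1)]

if __name__ == "__main__":
    print(perceptron_criterion([0.0, 1.0, 0.0], DATA))
```

With $\mathbf{w} = (0, 1, 0)$ only the last point is misclassified, so $E$ and the gradient come from that single term.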

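Perceptron Learning: A Basic Idea

If correct, do not move. If not correct, move the hyperplane so that the misclassified point ends up on its correct side (in the slide's figure, to the left).

Perceptron Learning

If the pattern is correctly classified, do nothing. If not, apply gradient descent to $E(\mathbf{w})$:
$$\Delta \mathbf{w} = -\eta \frac{\partial E}{\partial \mathbf{w}} = \eta \sum_{\mathbf{x}_t \in \mathcal{M}} \mathbf{x}_t y_t,$$
where $\eta > 0$ is the learning rate.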
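The batch form of this update can be sketched as follows. The helper name `batch_perceptron_step`, the choice $\eta = 1$, and the toy data are hypothetical; as before, points with $\mathbf{w}^\top \mathbf{x}_t y_t \leq 0$ are assumed to form $\mathcal{M}$.

```python
def batch_perceptron_step(w, data, eta=1.0):
    """One batch update w <- w + eta * sum_{x_t in M} x_t y_t.

    Returns the new weights and |M|, the number of misclassified points.
    """
    M = [(x, y) for x, y in data
         if sum(wi * xi for wi, xi in zip(w, x)) * y <= 0]
    delta = [0.0] * len(w)
    for x, y in M:
        delta = [d + eta * xi * y for d, xi in zip(delta, x)]
    return [wi + di for wi, di in zip(w, delta)], len(M)


# Hypothetical toy data (inputs already augmented with a trailing 1).
DATA = [([2.0, 1.0, 1.0], 1), ([1.5, 2.0, 1.0], 1),
        ([-1.0, -0.5, 1.0], -1), ([-2.0, 0.5, 1.0], -1)]

if __name__ == "__main__":
    w, n = batch_perceptron_step([0.0, 1.0, 0.0], DATA)
    print(w, n)                             # one misclassified point drives the update
    print(batch_perceptron_step(w, DATA))   # no mistakes left, so w is unchanged
```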

Perceptron Learning: Algorithm Outline

1. Get a training sample $(\mathbf{x}_t, y_t)$.
2. Check whether it is misclassified.
   2.1 If it is classified correctly, do nothing.
   2.2 If it is classified incorrectly, update $\mathbf{w}$ by $\mathbf{w}^{(k+1)} = \mathbf{w}^{(k)} + \eta \mathbf{x}_t y_t$.
3. Repeat steps 1 and 2 until convergence.
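The outline above maps onto a short on-line training loop. This is a sketch under stated assumptions: inputs are augmented with a trailing $1$, $\mathbf{w}$ starts at zero, "convergence" means a full pass with no mistakes, and `max_epochs` is a hypothetical safety cap for non-separable data.

```python
def train_perceptron(data, eta=1.0, max_epochs=100):
    """On-line, mistake-driven perceptron following the algorithm outline.

    data: list of (x_t, y_t), x_t augmented with a trailing 1, y_t in {+1, -1}.
    Returns the final weights and the total number of updates made.
    """
    w = [0.0] * len(data[0][0])
    updates = 0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in data:                                     # step 1: get a sample
            if sum(wi * xi for wi, xi in zip(w, x)) * y <= 0:  # step 2: misclassified?
                w = [wi + eta * xi * y for wi, xi in zip(w, x)]  # step 2.2: update
                mistakes += 1
                updates += 1
        if mistakes == 0:  # step 3: a clean pass means convergence
            break
    return w, updates


# Hypothetical linearly separable toy data.
DATA = [([2.0, 1.0, 1.0], 1), ([1.5, 2.0, 1.0], 1),
        ([-1.0, -0.5, 1.0], -1), ([-2.0, 0.5, 1.0], -1)]

if __name__ == "__main__":
    print(train_perceptron(DATA))
```

Each update touches only one sample, which is what makes the procedure on-line and mistake-driven.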

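Perceptron Convergence Theorem

The perceptron classifier minimizes the error probability, while the MMSE classifier does not. One can easily see that perceptron learning reduces the error: after the update $\mathbf{w}^{(k+1)} = \mathbf{w}^{(k)} + \eta \mathbf{x}_t y_t$ on a misclassified $\mathbf{x}_t$,
$$-\mathbf{w}^{(k+1)\top} \mathbf{x}_t y_t = -\mathbf{w}^{(k)\top} \mathbf{x}_t y_t - \eta \lVert \mathbf{x}_t y_t \rVert^2 \leq -\mathbf{w}^{(k)\top} \mathbf{x}_t y_t,$$
so the contribution of $\mathbf{x}_t$ to $E(\mathbf{w})$ decreases.

Theorem: If classes $\mathcal{C}_1$ and $\mathcal{C}_2$ are linearly separable, then the perceptron rule converges in a finite number of steps to a separating hyperplane.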
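The inequality above can be checked numerically. The sketch below (hypothetical helper `margin_after_update`, arbitrary example numbers) applies one perceptron update to a misclassified point and compares $\mathbf{w}^\top \mathbf{x}_t y_t$ before and after; because $y_t^2 = 1$, the increase equals $\eta \lVert \mathbf{x}_t \rVert^2$.

```python
def margin_after_update(w, x, y, eta):
    """Return w^T x y before and after the update w <- w + eta * x * y."""
    w_new = [wi + eta * xi * y for wi, xi in zip(w, x)]
    before = sum(wi * xi for wi, xi in zip(w, x)) * y
    after = sum(wi * xi for wi, xi in zip(w_new, x)) * y
    return before, after


if __name__ == "__main__":
    w, x, y, eta = [1.0, -2.0, 0.5], [1.0, 1.0, 1.0], 1, 0.5
    before, after = margin_after_update(w, x, y, eta)
    print(before, after)                             # misclassified before, correct after
    print(after - before, eta * sum(xi * xi for xi in x))  # both equal eta * ||x||^2
```

A single step need not fix the point in general; here it does because $\eta \lVert \mathbf{x}_t \rVert^2 = 1.5$ exceeds the initial violation of $0.5$.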

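McCulloch-Pitts Model

[Figure: a neuron with inputs $x_1, \ldots, x_m$, weights $w_1, \ldots, w_m$, a summing junction $\Sigma$ with threshold $\theta$, and activation $\varphi(\cdot)$ producing the output $y$.]

$$v = w_1 x_1 + \cdots + w_m x_m - \theta, \qquad y = \varphi(v),$$
where $\varphi(\cdot)$ is a squashing function (hard-limiter, logistic, or tanh).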
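A single neuron with the three squashing functions listed above can be sketched as follows. The names `hard_limiter`, `logistic`, and `neuron` are hypothetical, and the hard-limiter is assumed to output $\{0, 1\}$ (the perceptron slides use the $\pm 1$ sign function instead).

```python
import math


def hard_limiter(v):
    """Step function; outputs in {0, 1} (an assumed convention)."""
    return 1.0 if v >= 0 else 0.0


def logistic(v):
    """Logistic sigmoid, with outputs in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def neuron(w, x, theta, phi):
    """McCulloch-Pitts style unit: v = sum_i w_i x_i - theta, y = phi(v)."""
    v = sum(wi * xi for wi, xi in zip(w, x)) - theta
    return phi(v)


if __name__ == "__main__":
    # With w = (1, 1) and theta = 1.5 the hard-limited unit computes logical AND.
    for x in ([0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]):
        print(x, neuron([1.0, 1.0], x, 1.5, hard_limiter))
    print(neuron([1.0, 1.0], [1.0, 0.5], 1.5, logistic))   # v = 0
    print(neuron([1.0, 1.0], [1.0, 0.5], 1.5, math.tanh))  # v = 0
```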

MLP: Structure

[Figure: input layer, hidden layer, and output layer, with arrows labeled "Information" and "Error".]

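Train MLP: Error Back-propagation

The error function (sum of squared errors) is given by
$$E = \frac{1}{2} \sum_j e_j^2,$$
where $e_j = d_j - y_j^{(L)}$ is the difference between the desired response $d_j$ and the output $y_j^{(L)}$ of neuron $j$ in the final layer $L$.

We can use the gradient method to derive an updating algorithm for training MLPs. Any problem? The weights connecting the hidden layer and the output layer can be easily updated using the gradient method, because the output errors $e_j$ are directly available. What about the rest of the weights? Hidden neurons have no desired response, so there is no error signal for them directly. Use the idea of error back-propagation!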
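For concreteness, the error function can be written as below; `errors` and `sse` are hypothetical helper names, and `d` and `y` hold the desired responses and the final-layer outputs.

```python
def errors(d, y):
    """e_j = d_j - y_j^(L) for each output neuron j."""
    return [dj - yj for dj, yj in zip(d, y)]


def sse(d, y):
    """E = 1/2 * sum_j e_j^2 (sum of squared errors)."""
    return 0.5 * sum(e * e for e in errors(d, y))


if __name__ == "__main__":
    print(errors([1.0, 0.0], [0.5, 0.5]))  # one error per output neuron
    print(sse([1.0, 0.0], [0.5, 0.5]))
```

The factor $\tfrac{1}{2}$ cancels the $2$ from differentiating $e_j^2$, which keeps the gradients that follow free of stray constants.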

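MLP Output Layer

[Figure: a single output neuron. Inputs $y_1^{(L-1)}, \ldots, y_m^{(L-1)}$ are weighted by $w_j^{(L)}$ and summed into $v^{(L)}$, which passes through $\varphi(\cdot)$ to give $y^{(L)}$; the error is $e = d - y^{(L)}$.]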

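Updating Final Layer Weights

The input-output relation is given by
$$v^{(L)} = \sum_j w_j^{(L)} y_j^{(L-1)}, \qquad y^{(L)} = \varphi\big(v^{(L)}\big).$$
Compute the gradient
$$\frac{\partial E}{\partial w_j^{(L)}} = \frac{\partial E}{\partial v^{(L)}} \frac{\partial v^{(L)}}{\partial w_j^{(L)}},$$
where
$$\frac{\partial E}{\partial v^{(L)}} = \frac{\partial E}{\partial e} \frac{\partial e}{\partial y^{(L)}} \frac{\partial y^{(L)}}{\partial v^{(L)}} = -e\, \varphi'\big(v^{(L)}\big), \qquad \frac{\partial v^{(L)}}{\partial w_j^{(L)}} = y_j^{(L-1)}.$$
With $E = \frac{1}{2} e^2$ and $e = d - y^{(L)}$, the three factors are $e$, $-1$, and $\varphi'\big(v^{(L)}\big)$.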
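The chain-rule result $\partial E / \partial w_j^{(L)} = -e\, \varphi'(v^{(L)})\, y_j^{(L-1)}$ can be checked against central finite differences. This sketch assumes a single logistic output neuron without a bias, for which $\varphi'(v) = \varphi(v)(1 - \varphi(v))$; all function names are hypothetical.

```python
import math


def phi(v):
    return 1.0 / (1.0 + math.exp(-v))  # logistic squashing function (assumed choice)


def dphi(v):
    s = phi(v)
    return s * (1.0 - s)  # derivative of the logistic function


def loss(w, y_prev, d):
    """E = 1/2 * e^2 for one output neuron fed by y_prev = y^(L-1)."""
    v = sum(wj * yj for wj, yj in zip(w, y_prev))
    e = d - phi(v)
    return 0.5 * e * e


def analytic_grad(w, y_prev, d):
    """dE/dw_j = (dE/dv) * (dv/dw_j) = -e * phi'(v) * y_j^(L-1)."""
    v = sum(wj * yj for wj, yj in zip(w, y_prev))
    e = d - phi(v)
    return [-e * dphi(v) * yj for yj in y_prev]


def numeric_grad(w, y_prev, d, h=1e-6):
    """Central-difference estimate of dE/dw_j, used only as a check."""
    g = []
    for j in range(len(w)):
        wp = list(w); wp[j] += h
        wm = list(w); wm[j] -= h
        g.append((loss(wp, y_prev, d) - loss(wm, y_prev, d)) / (2.0 * h))
    return g


if __name__ == "__main__":
    w, y_prev, d = [0.3, -0.2, 0.1], [1.0, 0.5, -1.5], 1.0  # arbitrary values
    print(analytic_grad(w, y_prev, d))
    print(numeric_grad(w, y_prev, d))
```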

Updating Final Layer Weights (Cont'd)

Define the local gradient
$$\delta^{(L)} = -\frac{\partial E}{\partial v^{(L)}} = e\, \varphi'\big(v^{(L)}\big).$$
Then the updating algorithm for $w_j^{(L)}$ is given by
$$w_j^{(L)}(t+1) = w_j^{(L)}(t) - \eta \frac{\partial E}{\partial w_j^{(L)}} = w_j^{(L)}(t) + \eta\, \delta^{(L)}(t)\, y_j^{(L-1)}(t).$$
This updating rule holds for every layer, but the problem lies in computing the local gradient for every layer except the final one. We need a recursion for the local gradient; this is the key idea of BP.
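One step of this final-layer rule is sketched below, again assuming a single logistic output neuron without a bias; `final_layer_step` and the example values are hypothetical. The update moves $v^{(L)}$ toward the target, so the error shrinks.

```python
import math


def phi(v):
    return 1.0 / (1.0 + math.exp(-v))  # logistic squashing function (assumed choice)


def dphi(v):
    s = phi(v)
    return s * (1.0 - s)


def loss(w, y_prev, d):
    """E = 1/2 * (d - y^(L))^2."""
    v = sum(wj * yj for wj, yj in zip(w, y_prev))
    return 0.5 * (d - phi(v)) ** 2


def final_layer_step(w, y_prev, d, eta):
    """w_j(t+1) = w_j(t) + eta * delta^(L) * y_j^(L-1), delta^(L) = e * phi'(v^(L))."""
    v = sum(wj * yj for wj, yj in zip(w, y_prev))
    e = d - phi(v)
    delta = e * dphi(v)  # local gradient delta^(L) = -dE/dv^(L)
    return [wj + eta * delta * yj for wj, yj in zip(w, y_prev)]


if __name__ == "__main__":
    w, y_prev, d = [0.3, -0.2, 0.1], [1.0, 0.5, -1.5], 1.0  # arbitrary values
    w_new = final_layer_step(w, y_prev, d, eta=0.5)
    print(loss(w, y_prev, d), loss(w_new, y_prev, d))
```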

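Recursion for the Local Gradient

For neuron $i$ in layer $L-1$,
$$\delta_i^{(L-1)} = -\frac{\partial E}{\partial v_i^{(L-1)}} = -\frac{\partial E}{\partial y_i^{(L-1)}} \frac{\partial y_i^{(L-1)}}{\partial v_i^{(L-1)}} = -\frac{\partial E}{\partial y_i^{(L-1)}}\, \varphi'\big(v_i^{(L-1)}\big).$$
Summing over the output neurons $k$,
$$\frac{\partial E}{\partial y_i^{(L-1)}} = \sum_k e_k \frac{\partial e_k}{\partial y_i^{(L-1)}} = \sum_k e_k \frac{\partial e_k}{\partial v_k^{(L)}} \frac{\partial v_k^{(L)}}{\partial y_i^{(L-1)}} = -\sum_k e_k\, \varphi'\big(v_k^{(L)}\big)\, w_{ki}^{(L)} = -\sum_k \delta_k^{(L)} w_{ki}^{(L)},$$
where $w_{ki}^{(L)}$ connects neuron $i$ in layer $L-1$ to output neuron $k$. Then the recursion for the local gradient is described by
$$\delta_i^{(L-1)} = \Big( \sum_k \delta_k^{(L)} w_{ki}^{(L)} \Big)\, \varphi'\big(v_i^{(L-1)}\big).$$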
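The recursion can be verified by treating the hidden pre-activations $v_i^{(L-1)}$ as free inputs and comparing $\delta_i^{(L-1)}$ with a finite-difference estimate of $-\partial E / \partial v_i^{(L-1)}$. This sketch assumes logistic activations, no biases, and `W[k][i]` $= w_{ki}^{(L)}$; the function names are hypothetical.

```python
import math


def phi(v):
    return 1.0 / (1.0 + math.exp(-v))  # logistic squashing function (assumed choice)


def dphi(v):
    s = phi(v)
    return s * (1.0 - s)


def forward_from_hidden(v_hidden, W, d):
    """Given v_i^(L-1), return E, the output pre-activations v_k^(L), and errors e_k."""
    y_hidden = [phi(v) for v in v_hidden]
    v_out = [sum(W[k][i] * y_hidden[i] for i in range(len(y_hidden)))
             for k in range(len(W))]
    e = [d[k] - phi(v_out[k]) for k in range(len(W))]
    return 0.5 * sum(ek * ek for ek in e), v_out, e


def hidden_deltas(v_hidden, W, d):
    """delta_i^(L-1) = phi'(v_i^(L-1)) * sum_k delta_k^(L) * w_ki^(L)."""
    _, v_out, e = forward_from_hidden(v_hidden, W, d)
    delta_out = [e[k] * dphi(v_out[k]) for k in range(len(W))]  # delta_k^(L)
    return [dphi(v_hidden[i]) * sum(delta_out[k] * W[k][i] for k in range(len(W)))
            for i in range(len(v_hidden))]


def numeric_hidden_deltas(v_hidden, W, d, h=1e-6):
    """Central-difference estimate of -dE/dv_i^(L-1), used only as a check."""
    out = []
    for i in range(len(v_hidden)):
        vp = list(v_hidden); vp[i] += h
        vm = list(v_hidden); vm[i] -= h
        diff = forward_from_hidden(vp, W, d)[0] - forward_from_hidden(vm, W, d)[0]
        out.append(-diff / (2.0 * h))
    return out


if __name__ == "__main__":
    v_h = [0.2, -0.4, 0.7]                      # arbitrary hidden pre-activations
    W = [[0.5, -0.3, 0.8], [-0.6, 0.1, 0.4]]    # 2 output neurons, 3 hidden neurons
    d = [1.0, 0.0]
    print(hidden_deltas(v_h, W, d))
    print(numeric_hidden_deltas(v_h, W, d))
```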

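Algorithm Outline: BP

$$w_{ij}^{(l)}(t+1) = w_{ij}^{(l)}(t) + \eta\, \delta_i^{(l)}(t)\, y_j^{(l-1)}(t),$$
$$\delta_i^{(l)}(t) = \begin{cases} e_i^{(L)}(t)\, \varphi'\big(v_i^{(L)}(t)\big) & \text{output neuron}, \\ \varphi'\big(v_i^{(l)}(t)\big) \sum_k \delta_k^{(l+1)}(t)\, w_{ki}^{(l+1)}(t) & \text{hidden neuron}. \end{cases}$$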
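The outline can be sketched as one on-line BP step for a fully connected network of any depth. Assumptions: logistic activations everywhere, no biases, weights stored as `W[l][i][j]` (0-based layer index `l`, so `W[0]` plays the role of $w^{(1)}$), and per-pattern updates. All names are hypothetical; `numeric_grad` exists only to check the analytic update.

```python
import math
import random


def phi(v):
    return 1.0 / (1.0 + math.exp(-v))  # logistic squashing function (assumed choice)


def dphi(v):
    s = phi(v)
    return s * (1.0 - s)


def init_weights(sizes, seed=0):
    """W[l][i][j]: weight from neuron j of layer l to neuron i of layer l+1 (no biases)."""
    rng = random.Random(seed)
    return [[[rng.uniform(-1.0, 1.0) for _ in range(sizes[l])]
             for _ in range(sizes[l + 1])]
            for l in range(len(sizes) - 1)]


def forward(W, x):
    """Return per-layer outputs ys (ys[0] = x) and pre-activations vs."""
    ys, vs = [list(x)], []
    for Wl in W:
        v = [sum(w * y for w, y in zip(row, ys[-1])) for row in Wl]
        vs.append(v)
        ys.append([phi(vi) for vi in v])
    return ys, vs


def sse(W, x, d):
    """E = 1/2 * sum_j (d_j - y_j^(L))^2."""
    y_out = forward(W, x)[0][-1]
    return 0.5 * sum((dj - yj) ** 2 for dj, yj in zip(d, y_out))


def bp_step(W, x, d, eta):
    """One on-line BP update for the pattern (x, d); returns new weights, W is untouched."""
    ys, vs = forward(W, x)
    L = len(W)
    deltas = [None] * L
    # Output neurons: delta_i = e_i * phi'(v_i).
    deltas[L - 1] = [(d[i] - ys[L][i]) * dphi(vs[L - 1][i])
                     for i in range(len(vs[L - 1]))]
    # Hidden neurons, backwards: delta_i = phi'(v_i) * sum_k delta_k(next) * w_ki(next).
    for l in range(L - 2, -1, -1):
        deltas[l] = [dphi(vs[l][i]) * sum(deltas[l + 1][k] * W[l + 1][k][i]
                                          for k in range(len(W[l + 1])))
                     for i in range(len(vs[l]))]
    # Weight update: w_ij <- w_ij + eta * delta_i * y_j, y_j being this layer's input.
    return [[[W[l][i][j] + eta * deltas[l][i] * ys[l][j] for j in range(len(W[l][i]))]
             for i in range(len(W[l]))]
            for l in range(L)]


def numeric_grad(W, x, d, l, i, j, h=1e-6):
    """Central-difference estimate of dE/dW[l][i][j], used only to check bp_step."""
    def shifted(s):
        Wc = [[row[:] for row in Wl] for Wl in W]
        Wc[l][i][j] += s
        return sse(Wc, x, d)
    return (shifted(h) - shifted(-h)) / (2.0 * h)


if __name__ == "__main__":
    W = init_weights([2, 3, 2], seed=1)  # 2 inputs, 3 hidden neurons, 2 outputs
    x, d = [0.5, -1.0], [1.0, 0.0]
    W_new = bp_step(W, x, d, eta=0.5)
    print(sse(W, x, d), sse(W_new, x, d))
```

With `eta=1.0` the change `W_new[l][i][j] - W[l][i][j]` equals $-\partial E / \partial w_{ij}$, which is how the test below compares it with `numeric_grad`.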
