
1 Logistic Regression

2 Step 1: Function Set. We want to find P_{w,b}(C_1 | x). If P_{w,b}(C_1 | x) ≥ 0.5, output C_1; otherwise, output C_2. Here P_{w,b}(C_1 | x) = σ(z), where σ(z) = 1 / (1 + exp(−z)) and z = w · x + b = Σ_i w_i x_i + b. Function set: f_{w,b}(x) = P_{w,b}(C_1 | x), including all different w and b.
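A minimal Python sketch of this function set (the names `sigmoid`, `f_wb`, and `classify` are illustrative, not from the slides):

```python
import numpy as np

def sigmoid(z):
    # Logistic function: squashes any real z into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def f_wb(x, w, b):
    # One member of the function set: P_{w,b}(C1 | x) = sigma(w . x + b).
    return sigmoid(np.dot(w, x) + b)

def classify(x, w, b):
    # Output C1 if the posterior is at least 0.5, otherwise C2.
    return "C1" if f_wb(x, w, b) >= 0.5 else "C2"
```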

3 Step 1: Function Set. (Figure: each input component x_i is multiplied by a weight w_i and summed with the bias b to give z = Σ_i w_i x_i + b; z is then passed through the sigmoid function σ(z) = 1 / (1 + e^{−z}) to produce the output f_{w,b}(x) = P_{w,b}(C_1 | x).)

4 Logistic Regression vs. Linear Regression, Step 1. Logistic regression: f_{w,b}(x) = σ(Σ_i w_i x_i + b), output between 0 and 1. Linear regression: f_{w,b}(x) = Σ_i w_i x_i + b, output can be any value. (Steps 2 and 3 are compared on later slides.)

5 Step 2: Goodness of a Function. Training data: (x^1, C_1), (x^2, C_1), (x^3, C_2), …, (x^N, C_1). Assume the data is generated based on f_{w,b}(x) = P_{w,b}(C_1 | x). Given a set of w and b, what is its probability of generating the data? L(w, b) = f_{w,b}(x^1) f_{w,b}(x^2) (1 − f_{w,b}(x^3)) ⋯ f_{w,b}(x^N). The most likely w* and b* are the ones with the largest L(w, b): (w*, b*) = arg max_{w,b} L(w, b).

6 Encode the labels as ŷ^n: ŷ^n = 1 for class 1, ŷ^n = 0 for class 2, so the examples x^1 (C_1), x^2 (C_1), x^3 (C_2) become ŷ^1 = 1, ŷ^2 = 1, ŷ^3 = 0. With L(w, b) = f_{w,b}(x^1) f_{w,b}(x^2) (1 − f_{w,b}(x^3)) ⋯, maximizing L is the same as minimizing its negative log: (w*, b*) = arg max_{w,b} L(w, b) = arg min_{w,b} −ln L(w, b), where −ln L(w, b) = −ln f_{w,b}(x^1) − ln f_{w,b}(x^2) − ln(1 − f_{w,b}(x^3)) − ⋯ = −[ŷ^1 ln f(x^1) + (1 − ŷ^1) ln(1 − f(x^1))] − [ŷ^2 ln f(x^2) + (1 − ŷ^2) ln(1 − f(x^2))] − [ŷ^3 ln f(x^3) + (1 − ŷ^3) ln(1 − f(x^3))] − ⋯

7 Step 2: Goodness of a Function. L(w, b) = f_{w,b}(x^1) f_{w,b}(x^2) (1 − f_{w,b}(x^3)) ⋯ f_{w,b}(x^N), so −ln L(w, b) = Σ_n −[ŷ^n ln f_{w,b}(x^n) + (1 − ŷ^n) ln(1 − f_{w,b}(x^n))], where ŷ^n is 1 for class 1 and 0 for class 2. Each term is the cross entropy between two Bernoulli distributions: distribution p with p(x = 1) = ŷ^n, p(x = 0) = 1 − ŷ^n, and distribution q with q(x = 1) = f(x^n), q(x = 0) = 1 − f(x^n). In general, H(p, q) = −Σ_x p(x) ln q(x).
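As a sketch, the negative log-likelihood above can be computed as a sum of per-example Bernoulli cross entropies (variable names are illustrative):

```python
import numpy as np

def neg_log_likelihood(X, y_hat, w, b):
    # X: (N, d) inputs; y_hat: (N,) targets, 1 for class 1 and 0 for class 2.
    f = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # f_{w,b}(x^n) for every n
    f = np.clip(f, 1e-12, 1 - 1e-12)         # avoid ln(0)
    # -ln L(w, b) = sum_n -[y^n ln f(x^n) + (1 - y^n) ln(1 - f(x^n))]
    return -np.sum(y_hat * np.log(f) + (1 - y_hat) * np.log(1 - f))
```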

8 Logistic Regression vs. Linear Regression, Steps 1 and 2. Step 1: logistic regression f_{w,b}(x) = σ(Σ_i w_i x_i + b), output between 0 and 1; linear regression f_{w,b}(x) = Σ_i w_i x_i + b, output any value. Step 2: logistic regression, training data (x^n, ŷ^n) with ŷ^n = 1 for class 1 and 0 for class 2, loss L(f) = Σ_n C(f(x^n), ŷ^n) with cross entropy C(f(x^n), ŷ^n) = −[ŷ^n ln f(x^n) + (1 − ŷ^n) ln(1 − f(x^n))]; linear regression, training data (x^n, ŷ^n) with ŷ^n a real number, loss L(f) = (1/2) Σ_n (f(x^n) − ŷ^n)^2. Question: why don't we simply use square error, as in linear regression?

9 Step 3: Find the best function. Differentiate −ln L(w, b) = Σ_n −[ŷ^n ln f_{w,b}(x^n) + (1 − ŷ^n) ln(1 − f_{w,b}(x^n))] with respect to w_i. For the first term: ∂ ln f_{w,b}(x)/∂w_i = (∂ ln σ(z)/∂z)(∂z/∂w_i), where ∂z/∂w_i = x_i and ∂ ln σ(z)/∂z = (1/σ(z)) ∂σ(z)/∂z = σ(z)(1 − σ(z))/σ(z) = 1 − σ(z). Here f_{w,b}(x) = σ(z) = 1 / (1 + exp(−z)) and z = w · x + b = Σ_i w_i x_i + b.

10 Step 3: Find the best function. For the second term: ∂ ln(1 − f_{w,b}(x))/∂w_i = (∂ ln(1 − σ(z))/∂z)(∂z/∂w_i), where ∂z/∂w_i = x_i and ∂ ln(1 − σ(z))/∂z = −(1/(1 − σ(z))) σ(z)(1 − σ(z)) = −σ(z). Again, f_{w,b}(x) = σ(z) = 1 / (1 + exp(−z)) and z = w · x + b = Σ_i w_i x_i + b.

11 Step 3: Find the best function. Combining the two terms: ∂(−ln L(w, b))/∂w_i = Σ_n −[ŷ^n (1 − f_{w,b}(x^n)) x_i^n − (1 − ŷ^n) f_{w,b}(x^n) x_i^n] = Σ_n −[ŷ^n − ŷ^n f_{w,b}(x^n) − f_{w,b}(x^n) + ŷ^n f_{w,b}(x^n)] x_i^n = Σ_n −(ŷ^n − f_{w,b}(x^n)) x_i^n. The larger the difference between the target ŷ^n and the output f_{w,b}(x^n), the larger the update: w_i ← w_i − η Σ_n −(ŷ^n − f_{w,b}(x^n)) x_i^n.
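A sketch of the resulting batch gradient-descent loop (`eta` is the learning rate; all names are illustrative):

```python
import numpy as np

def train_logistic(X, y_hat, eta=0.1, steps=1000):
    # X: (N, d) inputs; y_hat: (N,) targets (1 for class 1, 0 for class 2).
    N, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        f = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # f_{w,b}(x^n)
        err = y_hat - f                          # y^n - f_{w,b}(x^n)
        # w_i <- w_i - eta * sum_n -(y^n - f(x^n)) x_i^n
        w -= eta * -(X.T @ err)
        b -= eta * -np.sum(err)
    return w, b
```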

12 Logistic Regression vs. Linear Regression, all three steps. Step 1: logistic regression f_{w,b}(x) = σ(Σ_i w_i x_i + b), output between 0 and 1; linear regression f_{w,b}(x) = Σ_i w_i x_i + b, output any value. Step 2: logistic regression uses cross entropy, L(f) = Σ_n C(f(x^n), ŷ^n); linear regression uses square error, L(f) = (1/2) Σ_n (f(x^n) − ŷ^n)^2. Step 3: logistic regression: w_i ← w_i − η Σ_n −(ŷ^n − f_{w,b}(x^n)) x_i^n; linear regression: w_i ← w_i − η Σ_n −(ŷ^n − f_{w,b}(x^n)) x_i^n. The update rules are exactly the same.

13 Logistic Regression + Square Error. Step 1: f_{w,b}(x) = σ(Σ_i w_i x_i + b). Step 2: training data (x^n, ŷ^n) with ŷ^n = 1 for class 1 and 0 for class 2; L(f) = (1/2) Σ_n (f_{w,b}(x^n) − ŷ^n)^2. Step 3: ∂(f_{w,b}(x) − ŷ)^2/∂w_i = 2(f_{w,b}(x) − ŷ) (∂f_{w,b}(x)/∂z)(∂z/∂w_i) = 2(f_{w,b}(x) − ŷ) f_{w,b}(x)(1 − f_{w,b}(x)) x_i. Suppose ŷ^n = 1: if f_{w,b}(x^n) = 1 (close to target), ∂L/∂w_i = 0; but if f_{w,b}(x^n) = 0 (far from target), ∂L/∂w_i = 0 as well.

14 Logistic Regression + Square Error. With the same setup and the same derivative ∂L/∂w_i = 2(f_{w,b}(x) − ŷ) f_{w,b}(x)(1 − f_{w,b}(x)) x_i, suppose ŷ^n = 0: if f_{w,b}(x^n) = 1 (far from target), ∂L/∂w_i = 0; and if f_{w,b}(x^n) = 0 (close to target), ∂L/∂w_i = 0. Either way the gradient vanishes.
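A quick numeric check of why square error stalls, plugging values into the gradient formula above (a hypothetical scalar example with x_i = 1):

```python
def square_error_grad(f, y_hat, x_i):
    # d(f - y)^2 / dw_i = 2 (f - y) * f * (1 - f) * x_i
    return 2 * (f - y_hat) * f * (1 - f) * x_i

for f in [0.9999, 0.0001]:                    # close to / far from the target y = 1
    print(f, square_error_grad(f, 1.0, 1.0))  # both gradients are ~0
```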

15 Cross Entropy vs. Square Error. (Figure: total loss plotted over parameters w_1 and w_2. The cross-entropy surface is steep when the parameters are far from the minimum, while the square-error surface is flat both near and far from the minimum, so gradient descent makes little progress.) Source: proceedings/papers/v9/glorot10a/glorot10a.pdf

16 Discriminative vs. Generative. Both use the same model: P(C_1 | x) = σ(w · x + b). Discriminative: directly find w and b (logistic regression). Generative: find μ^1, μ^2, Σ^{−1}, then w^T = (μ^1 − μ^2)^T Σ^{−1} and b = −(1/2)(μ^1)^T Σ^{−1} μ^1 + (1/2)(μ^2)^T Σ^{−1} μ^2 + ln(N_1/N_2). Will we obtain the same set of w and b? It is the same model (function set), but a different function is selected by the same training data.
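A minimal sketch of the generative route to the same model, assuming Gaussian class-conditionals with a shared covariance (as in the preceding generative-model lecture); the function name and estimation details are illustrative:

```python
import numpy as np

def generative_w_b(X1, X2):
    # X1: (N1, d) class-1 examples; X2: (N2, d) class-2 examples.
    N1, N2 = len(X1), len(X2)
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # Shared covariance: weighted average of the per-class covariances.
    S = (N1 * np.cov(X1.T, bias=True) + N2 * np.cov(X2.T, bias=True)) / (N1 + N2)
    Si = np.linalg.inv(S)
    w = Si @ (mu1 - mu2)                       # w^T = (mu1 - mu2)^T Sigma^{-1}
    b = (-0.5 * mu1 @ Si @ mu1 + 0.5 * mu2 @ Si @ mu2
         + np.log(N1 / N2))
    return w, b                                # then P(C1 | x) = sigma(w . x + b)
```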

17 Generative vs. Discriminative. On the Pokémon classification task with all features (total, hp, att, sp att, de, sp de, speed), the generative model reaches 73% accuracy and the discriminative model 79% accuracy.

18 Generative vs. Discriminative Example. Training data: one example with x_1 = 1, x_2 = 1 in Class 1; four examples with x_1 = 1, x_2 = 0, four with x_1 = 0, x_2 = 1, and four with x_1 = 0, x_2 = 0, all in Class 2. Testing data: x_1 = 1, x_2 = 1 — Class 1? Class 2? How about Naïve Bayes, where P(x | C_i) = P(x_1 | C_i) P(x_2 | C_i)?

19 Generative vs. Discriminative Example. From the training data: P(C_1) = 1/13, P(x_1 = 1 | C_1) = 1, P(x_2 = 1 | C_1) = 1; P(C_2) = 12/13, P(x_1 = 1 | C_2) = 1/3, P(x_2 = 1 | C_2) = 1/3.

20 For the testing point x = (1, 1): P(C_1 | x) = P(x | C_1) P(C_1) / (P(x | C_1) P(C_1) + P(x | C_2) P(C_2)) = (1 · 1 · 1/13) / (1 · 1 · 1/13 + (1/3)(1/3)(12/13)) = 3/7 ≈ 0.43 < 0.5, so Naïve Bayes predicts Class 2.
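The whole toy computation fits in a few lines (a direct transcription of the numbers above):

```python
# Naive Bayes posterior for the test point x = (1, 1).
p_c1, p_c2 = 1 / 13, 12 / 13
p_x_c1 = 1 * 1                # P(x1=1|C1) * P(x2=1|C1)
p_x_c2 = (1 / 3) * (1 / 3)    # P(x1=1|C2) * P(x2=1|C2)
posterior = p_x_c1 * p_c1 / (p_x_c1 * p_c1 + p_x_c2 * p_c2)
print(posterior)              # ~0.43 < 0.5, so Naive Bayes predicts Class 2
```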

21 Generative vs. Discriminative. Benefits of the generative model: with the assumption of a probability distribution, less training data is needed; with the assumption of a probability distribution, it is more robust to noise; and priors and class-dependent probabilities can be estimated from different sources.

22 Multi-class Classification (3 classes as example) [Bishop, P209-210]. Each class has its own parameters: C_1: w^1, b_1 with z_1 = w^1 · x + b_1; C_2: w^2, b_2 with z_2 = w^2 · x + b_2; C_3: w^3, b_3 with z_3 = w^3 · x + b_3. Softmax: y_i = e^{z_i} / Σ_{j=1}^3 e^{z_j}, so 1 > y_i > 0 and Σ_i y_i = 1; interpret y_i = P(C_i | x). Example: z = (3, 1, −3) gives e^{z} ≈ (20, 2.7, 0.05) and y ≈ (0.88, 0.12, ≈0).
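A sketch of softmax, reproducing the slide's example z = (3, 1, −3):

```python
import numpy as np

def softmax(z):
    # y_i = e^{z_i} / sum_j e^{z_j}; subtracting max(z) avoids overflow.
    e = np.exp(z - np.max(z))
    return e / e.sum()

print(softmax(np.array([3.0, 1.0, -3.0])))   # ~[0.88, 0.12, 0.00]
```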

23 Multi-class Classification (3 classes as example). The input x produces z_1 = w^1 · x + b_1, z_2 = w^2 · x + b_2, z_3 = w^3 · x + b_3, and softmax turns these into y = (y_1, y_2, y_3). The loss is the cross entropy −Σ_{i=1}^3 ŷ_i ln y_i between y and the one-hot target ŷ: if x ∈ class 1, ŷ = [1, 0, 0]^T; if x ∈ class 2, ŷ = [0, 1, 0]^T; if x ∈ class 3, ŷ = [0, 0, 1]^T.
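The corresponding multi-class cross entropy against a one-hot target, as a short sketch (continuing the softmax example; names are illustrative):

```python
import numpy as np

def multiclass_cross_entropy(y, y_hat):
    # -sum_i y_hat_i ln y_i, with y from softmax and y_hat one-hot.
    return -np.sum(y_hat * np.log(y))

y = np.array([0.88, 0.12, 0.002])          # softmax output for z = (3, 1, -3)
y_hat = np.array([1.0, 0.0, 0.0])          # x belongs to class 1
print(multiclass_cross_entropy(y, y_hat))  # = -ln 0.88 ~ 0.13
```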

24 Limitation of Logistic Regression. With inputs x_1, x_2, weights w_1, w_2, and bias b, we have z = w_1 x_1 + w_2 x_2 + b and y = σ(z); output y ≥ 0.5 means Class 1 and y < 0.5 means Class 2. Input features and labels: x = (0, 0) → Class 2; x = (0, 1) → Class 1; x = (1, 0) → Class 1; x = (1, 1) → Class 2. Can a single logistic regression separate these classes?

25 Limitation of Logistic Regression. No, we can't: the decision boundary y = 0.5 is the straight line w_1 x_1 + w_2 x_2 + b = 0, and no straight line puts (0, 1) and (1, 0) on one side and (0, 0) and (1, 1) on the other.

26 Limitation of Logistic Regression. Feature transformation: let x_1' be the distance of x to [0, 0]^T and x_2' the distance of x to [1, 1]^T. In the transformed space (x_1', x_2') the two classes become linearly separable. However, it is not always easy to find a good transformation by hand.

27 Limitation of Logistic Regression. Cascading logistic regression models: a first layer of logistic regression units maps x_1, x_2 to new features x_1', x_2' (feature transformation), and a final logistic regression unit classifies based on x_1', x_2' (bias ignored in this figure).

28 With suitable weights, the first layer transforms the four inputs as follows: (0, 0) → (x_1' = 0.73, x_2' = 0.05), (0, 1) → (0.27, 0.27), (1, 0) → (0.27, 0.27), (1, 1) → (0.05, 0.73).

29 After the transformation, the Class-2 points (0.73, 0.05) and (0.05, 0.73) and the Class-1 point (0.27, 0.27) are linearly separable, so a single logistic regression unit with weights w_1, w_2 on (x_1', x_2'), computing y = σ(w_1 x_1' + w_2 x_2' + b), classifies them correctly.
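A sketch of the full cascade on this XOR-style data. The first-layer weights here are an assumption, chosen so that σ(1) = 0.73, σ(−1) = 0.27, σ(−3) = 0.05 reproduce the slide's transformed points; the final unit's weights are also illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cascade(x1, x2):
    # Layer 1: feature transformation.
    x1p = sigmoid(-2 * x1 - 2 * x2 + 1)
    x2p = sigmoid(2 * x1 + 2 * x2 - 3)
    # Layer 2: classify in the transformed space; the line x1' + x2' = 0.6
    # separates (0.27, 0.27) from (0.73, 0.05) and (0.05, 0.73).
    return sigmoid(-10 * (x1p + x2p) + 6)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), "Class 1" if cascade(x1, x2) >= 0.5 else "Class 2")
# (0,1) and (1,0) come out Class 1; (0,0) and (1,1) come out Class 2.
```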

30 Deep Learning! Each of these cascaded logistic regression units is called a neuron, and a network of them is a neural network.

31 Reference: Bishop, Chapter 4.3.
