Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 5

Size: px
Start display at page:

Download "Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 5"

Transcription

1 Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 5 Slides adapted from Jordan Boyd-Graber, Tom Mitchell, Ziv Bar-Joseph Machine Learning: Chenhao Tan Boulder 1 of 27

2 Quiz question For a test instance (x, y) and a naïve Bayes classifier ˆP, which of the following statements is true? (A) c ˆP(y = c x) = 1 (B) c ˆP(x c) = 1 (C) c ˆP(x c)ˆp(c) = 1 Machine Learning: Chenhao Tan Boulder 2 of 27

3 Overview Objective function Gradient Descent Stochastic Gradient Descent Machine Learning: Chenhao Tan Boulder 3 of 27

4 Reminder: Logistic Regression 1 P(Y = 0 X) = 1 + exp [β 0 + i β ix i ] P(Y = 1 X) = exp [β 0 + i β ix i ] 1 + exp [β 0 + i β ix i ] (1) (2) Discriminative prediction: P(y x) Classification uses: sentiment analysis, spam detection What we didn t talk about is how to learn β from data Machine Learning: Chenhao Tan Boulder 4 of 27

5 Objective function Outline Objective function Gradient Descent Stochastic Gradient Descent Machine Learning: Chenhao Tan Boulder 5 of 27

6 Objective function Logistic Regression: Objective Function Maximize likelihood Obj log P(Y X, β) = j = j log P(y (j) x (j), β)) y (j) ( β 0 + i β i x (j) i ) log [ 1 + exp ( β 0 + i β i x (j) i )] Machine Learning: Chenhao Tan Boulder 6 of 27

7 Objective function Logistic Regression: Objective Function Minimize negative log likelihood (loss) L log P(Y X, β) = j = j log P(y (j) x (j), β)) y (j) ( β 0 + i β i x (j) i ) + log [ 1 + exp ( β 0 + i β i x (j) i )] Machine Learning: Chenhao Tan Boulder 7 of 27

8 Objective function Logistic Regression: Objective Function Minimize negative log likelihood (loss) L log P(Y X, β) = j = j log P(y (j) x (j), β)) y (j) ( β 0 + i β i x (j) i ) + log [ 1 + exp ( β 0 + i β i x (j) i )] Training data {(x, y)} are fixed. Objective function is a function of β... what values of β give a good value? Machine Learning: Chenhao Tan Boulder 7 of 27

9 Objective function Logistic Regression: Objective Function Minimize negative log likelihood (loss) L log P(Y X, β) = j = j log P(y (j) x (j), β)) y (j) ( β 0 + i β i x (j) i ) + log [ 1 + exp ( β 0 + i β i x (j) i )] Training data {(x, y)} are fixed. Objective function is a function of β... what values of β give a good value? β = arg min L (β) β Machine Learning: Chenhao Tan Boulder 7 of 27

10 Objective function Convexity L (β) is convex for logistic regression. Proof. Logistic loss yv + log(1 + exp(v)) is convex. Composition with linear function maintains convexity. Sum of convex functions is convex. Machine Learning: Chenhao Tan Boulder 8 of 27

11 Gradient Descent Outline Objective function Gradient Descent Stochastic Gradient Descent Machine Learning: Chenhao Tan Boulder 9 of 27

12 Gradient Descent Convexity Convex function Doesn t matter where you start, if you go down along the gradient Machine Learning: Chenhao Tan Boulder 10 of 27

13 Gradient Descent Convexity Convex function Doesn t matter where you start, if you go down along the gradient Gradient! Machine Learning: Chenhao Tan Boulder 10 of 27

14 Gradient Descent Convexity It would have been much harder if this is not convex. Machine Learning: Chenhao Tan Boulder 11 of 27

15 Gradient Descent Gradient Descent (non-convex) Goal Optimize loss function with respect to variables β Objective Parameter Machine Learning: Chenhao Tan Boulder 12 of 27

16 Gradient Descent Gradient Descent (non-convex) Goal Optimize loss function with respect to variables β Objective Parameter Undiscovered Country Machine Learning: Chenhao Tan Boulder 12 of 27

17 Gradient Descent Gradient Descent (non-convex) Goal Optimize loss function with respect to variables β Objective 0 Parameter Undiscovered Country Machine Learning: Chenhao Tan Boulder 12 of 27

18 Gradient Descent Gradient Descent (non-convex) Goal Optimize loss function with respect to variables β Objective 0 Parameter Undiscovered Country Machine Learning: Chenhao Tan Boulder 12 of 27

19 Gradient Descent Gradient Descent (non-convex) Goal Optimize loss function with respect to variables β Objective 0 1 Parameter Undiscovered Country Machine Learning: Chenhao Tan Boulder 12 of 27

20 Gradient Descent Gradient Descent (non-convex) Goal Optimize loss function with respect to variables β Objective 0 1 Parameter Undiscovered Country Machine Learning: Chenhao Tan Boulder 12 of 27

21 Gradient Descent Gradient Descent (non-convex) Goal Optimize loss function with respect to variables β Objective Parameter Undiscovered Country Machine Learning: Chenhao Tan Boulder 12 of 27

22 Gradient Descent Gradient Descent (non-convex) Goal Optimize loss function with respect to variables β Objective Parameter Undiscovered Country Machine Learning: Chenhao Tan Boulder 12 of 27

23 Gradient Descent Gradient Descent (non-convex) Goal Optimize loss function with respect to variables β Objective Parameter Undiscovered Country Machine Learning: Chenhao Tan Boulder 12 of 27

24 Gradient Descent Gradient Descent (non-convex) Goal Optimize loss function with respect to variables β β l+1 j = β l j η L β j Machine Learning: Chenhao Tan Boulder 12 of 27

25 Gradient Descent Gradient Descent (non-convex) Goal Optimize loss function with respect to variables β β l+1 j = β l j η L β j Luckily, (vanilla) logistic regression is convex Machine Learning: Chenhao Tan Boulder 12 of 27

26 Gradient Descent Gradient for Logistic Regression To ease notation, let s define π i = exp βt x i 1 + exp β T x i (3) Our objective function is L = i log p(y i x i ) = i L i = i { log π i if y i = 1 log(1 π i ) if y i = 0 (4) Machine Learning: Chenhao Tan Boulder 13 of 27

27 Gradient Descent Taking the Derivative Apply chain rule: L β j = i L i ( β) β j = i 1 π i π i ( ) 1 1 π i π i β j β j if y i = 1 if y i = 0 (5) If we plug in the derivative, we can merge these two cases π i β j = π i (1 π i )x j, (6) L i β j = (y i π i )x j. (7) Machine Learning: Chenhao Tan Boulder 14 of 27

28 Gradient Descent Gradient for Logistic Regression Gradient Update β L ( β) = [ L ( β) ],..., L ( β) β 0 β n (8) β η β L ( β) (9) β i β i η L ( β) β i (10) Machine Learning: Chenhao Tan Boulder 15 of 27

29 Gradient Descent Gradient for Logistic Regression Gradient Update β L ( β) = [ L ( β) ],..., L ( β) β 0 β n (8) β η β L ( β) (9) β i β i η L ( β) β i (10) Why are we subtracting? What would we do if we wanted to do ascent? Machine Learning: Chenhao Tan Boulder 15 of 27

30 Gradient Descent Gradient for Logistic Regression Gradient Update β L ( β) = [ L ( β) ],..., L ( β) β 0 β n (8) η: step size, must be greater than zero β η β L ( β) (9) β i β i η L ( β) β i (10) Machine Learning: Chenhao Tan Boulder 15 of 27

31 Gradient Descent Choosing Step Size Objective Parameter Machine Learning: Chenhao Tan Boulder 16 of 27

32 Gradient Descent Choosing Step Size Objective Parameter Machine Learning: Chenhao Tan Boulder 16 of 27

33 Gradient Descent Choosing Step Size Objective Parameter Machine Learning: Chenhao Tan Boulder 16 of 27

34 Gradient Descent Choosing Step Size Objective Parameter Machine Learning: Chenhao Tan Boulder 16 of 27

35 Gradient Descent Remaining issues When to stop? What if β keeps getting bigger? Machine Learning: Chenhao Tan Boulder 17 of 27

36 Gradient Descent Regularized Conditional Log Likelihood Unregularized Regularized [ ] β = arg min ln p(y (j) x (j), β) β [ ] β = arg min ln p(y (j) x (j), β) + 1 β 2 µ i (11) β 2 i (12) Machine Learning: Chenhao Tan Boulder 18 of 27

37 Gradient Descent Regularized Conditional Log Likelihood Unregularized Regularized [ ] β = arg min ln p(y (j) x (j), β) β [ ] β = arg min ln p(y (j) x (j), β) + 1 β 2 µ i (11) β 2 i (12) µ is regularization parameter that trades off between likelihood and having small parameters Machine Learning: Chenhao Tan Boulder 18 of 27

38 Outline Objective function Gradient Descent Stochastic Gradient Descent Machine Learning: Chenhao Tan Boulder 19 of 27

39 Approximating the Gradient Our datasets are big (to fit into memory)... or data are changing / streaming Machine Learning: Chenhao Tan Boulder 20 of 27

40 Approximating the Gradient Our datasets are big (to fit into memory)... or data are changing / streaming Hard to compute true gradient L (β) E x [ L (β, x)] (13) Average over all observations Machine Learning: Chenhao Tan Boulder 20 of 27

41 Approximating the Gradient Our datasets are big (to fit into memory)... or data are changing / streaming Hard to compute true gradient L (β) E x [ L (β, x)] (13) Average over all observations What if we compute an update just from one observation? Machine Learning: Chenhao Tan Boulder 20 of 27

42 Stochastic Gradient Descent Getting to Union Station Pretend it s a pre-smartphone world and you want to get to Union Station Machine Learning: Chenhao Tan Boulder 21 of 27

43 Stochastic Gradient for Regularized Regression L = log p(y x; β) µ j β 2 j (14) Machine Learning: Chenhao Tan Boulder 22 of 27

44 Stochastic Gradient for Regularized Regression L = log p(y x; β) µ j β 2 j (14) Taking the derivative (with respect to example x i ) L β j = (y i π i )x j + µβ j (15) Machine Learning: Chenhao Tan Boulder 22 of 27

45 Stochastic Gradient for Logistic Regression Given a single observation x i chosen at random from the dataset, β j β j η ( µβ j x ij [y i π i ] ) (16) Machine Learning: Chenhao Tan Boulder 23 of 27

46 Example Documents β[j] =β[j] + η(y i π i )x i β = β bias = 0, β A = 0, β B = 0, β C = 0, β D = 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D You first see the positive example. First, compute π 1 Machine Learning: Chenhao Tan Boulder 24 of 27

47 Example Documents β[j] =β[j] + η(y i π i )x i β = 0, 0, 0, 0, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D You first see the positive example. First, compute π 1 π 1 = Pr(y 1 = 1 x 1 ) = exp βt x i 1+exp β T x i = Machine Learning: Chenhao Tan Boulder 24 of 27

48 Example Documents β[j] =β[j] + η(y i π i )x i β = 0, 0, 0, 0, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D You first see the positive example. First, compute π 1 π 1 = Pr(y 1 = 1 x 1 ) = exp βt x i 1+exp β T x i = exp 0 exp 0+1 = 0.5 Machine Learning: Chenhao Tan Boulder 24 of 27

49 Example Documents β[j] =β[j] + η(y i π i )x i β = 0, 0, 0, 0, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D π 1 = 0.5 What s the update for β bias? Machine Learning: Chenhao Tan Boulder 24 of 27

50 Example Documents β[j] =β[j] + η(y i π i )x i β = 0, 0, 0, 0, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β bias? β bias = β bias + η (y 1 π 1 ) x 1,bias = ( ) 1.0 Machine Learning: Chenhao Tan Boulder 24 of 27

51 Example Documents β[j] =β[j] + η(y i π i )x i β = 0, 0, 0, 0, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β bias? β bias = β bias + η (y 1 π 1 ) x 1,bias = ( ) 1.0 =0.5 Machine Learning: Chenhao Tan Boulder 24 of 27

52 Example Documents β[j] =β[j] + η(y i π i )x i β = 0, 0, 0, 0, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β A? Machine Learning: Chenhao Tan Boulder 24 of 27

53 Example Documents β[j] =β[j] + η(y i π i )x i β = 0, 0, 0, 0, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β A? β A = β A + η (y 1 π 1 ) x 1,A = ( ) 4.0 Machine Learning: Chenhao Tan Boulder 24 of 27

54 Example Documents β[j] =β[j] + η(y i π i )x i β = 0, 0, 0, 0, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β A? β A = β A + η (y 1 π 1 ) x 1,A = ( ) 4.0 =2.0 Machine Learning: Chenhao Tan Boulder 24 of 27

55 Example Documents β[j] =β[j] + η(y i π i )x i β = 0, 0, 0, 0, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β B? Machine Learning: Chenhao Tan Boulder 24 of 27

56 Example Documents β[j] =β[j] + η(y i π i )x i β = 0, 0, 0, 0, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β B? β B = β B + η (y 1 π 1 ) x 1,B = ( ) 3.0 Machine Learning: Chenhao Tan Boulder 24 of 27

57 Example Documents β[j] =β[j] + η(y i π i )x i β = 0, 0, 0, 0, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β B? β B = β B + η (y 1 π 1 ) x 1,B = ( ) 3.0 =1.5 Machine Learning: Chenhao Tan Boulder 24 of 27

58 Example Documents β[j] =β[j] + η(y i π i )x i β = 0, 0, 0, 0, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β C? Machine Learning: Chenhao Tan Boulder 24 of 27

59 Example Documents β[j] =β[j] + η(y i π i )x i β = 0, 0, 0, 0, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β C? β C = β C + η (y 1 π 1 ) x 1,C = ( ) 1.0 Machine Learning: Chenhao Tan Boulder 24 of 27

60 Example Documents β[j] =β[j] + η(y i π i )x i β = 0, 0, 0, 0, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β C? β C = β C + η (y 1 π 1 ) x 1,C = ( ) 1.0 =0.5 Machine Learning: Chenhao Tan Boulder 24 of 27

61 Example Documents β[j] =β[j] + η(y i π i )x i β = 0, 0, 0, 0, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β D? Machine Learning: Chenhao Tan Boulder 24 of 27

62 Example Documents β[j] =β[j] + η(y i π i )x i β = 0, 0, 0, 0, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β D? β D = β D + η (y 1 π 1 ) x 1,D = ( ) 0.0 Machine Learning: Chenhao Tan Boulder 24 of 27

63 Example Documents β[j] =β[j] + η(y i π i )x i β = 0, 0, 0, 0, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β D? β D = β D + η (y 1 π 1 ) x 1,D = ( ) 0.0 =0.0 Machine Learning: Chenhao Tan Boulder 24 of 27

64 Example Documents β[j] =β[j] + η(y i π i )x i β =.5, 2, 1.5, 0.5, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D Now you see the negative example. What s π 2? Machine Learning: Chenhao Tan Boulder 24 of 27

65 Example Documents β[j] =β[j] + η(y i π i )x i β =.5, 2, 1.5, 0.5, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D Now you see the negative example. What s π 2? π 2 = Pr(y 2 = 1 x 2 ) = exp βt x i 1+exp β T x i = exp{ } exp{ }+1 = Machine Learning: Chenhao Tan Boulder 24 of 27

66 Example Documents β[j] =β[j] + η(y i π i )x i β =.5, 2, 1.5, 0.5, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D Now you see the negative example. What s π 2? π 2 = Pr(y 2 = 1 x 2 ) = exp βt x i 1+exp β T x i = exp{ } exp{ }+1 = 0.97 Machine Learning: Chenhao Tan Boulder 24 of 27

67 Example Documents β[j] =β[j] + η(y i π i )x i β =.5, 2, 1.5, 0.5, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D Now you see the negative example. What s π 2? π 2 = 0.97 What s the update for β bias? Machine Learning: Chenhao Tan Boulder 24 of 27

68 Example Documents β[j] =β[j] + η(y i π i )x i β =.5, 2, 1.5, 0.5, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β bias? β bias = β bias + η (y 2 π 2 ) x 2,bias = ( ) 1.0 Machine Learning: Chenhao Tan Boulder 24 of 27

69 Example Documents β[j] =β[j] + η(y i π i )x i β =.5, 2, 1.5, 0.5, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β bias? β bias = β bias + η (y 2 π 2 ) x 2,bias = ( ) 1.0 =-0.47 Machine Learning: Chenhao Tan Boulder 24 of 27

70 Example Documents β[j] =β[j] + η(y i π i )x i β =.5, 2, 1.5, 0.5, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β A? Machine Learning: Chenhao Tan Boulder 24 of 27

71 Example Documents β[j] =β[j] + η(y i π i )x i β =.5, 2, 1.5, 0.5, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β A? β A = β A + η (y 2 π 2 ) x 2,A = ( ) 0.0 Machine Learning: Chenhao Tan Boulder 24 of 27

72 Example Documents β[j] =β[j] + η(y i π i )x i β =.5, 2, 1.5, 0.5, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β A? β A = β A + η (y 2 π 2 ) x 2,A = ( ) 0.0 =2.0 Machine Learning: Chenhao Tan Boulder 24 of 27

73 Example Documents β[j] =β[j] + η(y i π i )x i β =.5, 2, 1.5, 0.5, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β B? Machine Learning: Chenhao Tan Boulder 24 of 27

74 Example Documents β[j] =β[j] + η(y i π i )x i β =.5, 2, 1.5, 0.5, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β B? β B = β B + η (y 2 π 2 ) x 2,B = ( ) 1.0 Machine Learning: Chenhao Tan Boulder 24 of 27

75 Example Documents β[j] =β[j] + η(y i π i )x i β =.5, 2, 1.5, 0.5, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β B? β B = β B + η (y 2 π 2 ) x 2,B = ( ) 1.0 =0.53 Machine Learning: Chenhao Tan Boulder 24 of 27

76 Example Documents β[j] =β[j] + η(y i π i )x i β =.5, 2, 1.5, 0.5, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β C? Machine Learning: Chenhao Tan Boulder 24 of 27

77 Example Documents β[j] =β[j] + η(y i π i )x i β =.5, 2, 1.5, 0.5, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β C? β C = β C + η (y 2 π 2 ) x 2,C = ( ) 3.0 Machine Learning: Chenhao Tan Boulder 24 of 27

78 Example Documents β[j] =β[j] + η(y i π i )x i β =.5, 2, 1.5, 0.5, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β C? β C = β C + η (y 2 π 2 ) x 2,C = ( ) 3.0 =-2.41 Machine Learning: Chenhao Tan Boulder 24 of 27

79 Example Documents β[j] =β[j] + η(y i π i )x i β =.5, 2, 1.5, 0.5, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β D? Machine Learning: Chenhao Tan Boulder 24 of 27

80 Example Documents β[j] =β[j] + η(y i π i )x i β =.5, 2, 1.5, 0.5, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β D? β D = β D + η (y 2 π 2 ) x 2,D = ( ) 4.0 Machine Learning: Chenhao Tan Boulder 24 of 27

81 Example Documents β[j] =β[j] + η(y i π i )x i β =.5, 2, 1.5, 0.5, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β D? β D = β D + η (y 2 π 2 ) x 2,D = ( ) 4.0 =-3.88 Machine Learning: Chenhao Tan Boulder 24 of 27

82 Algorithm 1. Initialize a vector β to be all zeros 2. For t = 1,..., T For each example x i, y i and feature j: Compute π i Pr(y i = 1 x i) Set β j = β j η(µβ j (y i π i)x i) 3. Output the parameters β 1,..., β d. Machine Learning: Chenhao Tan Boulder 25 of 27

83 Algorithm 1. Initialize a vector β to be all zeros 2. For t = 1,..., T For each example x i, y i and feature j: Compute π i Pr(y i = 1 x i) Set β j = β j η(µβ j (y i π i)x i) 3. Output the parameters β 1,..., β d. Any issues? Machine Learning: Chenhao Tan Boulder 25 of 27

84 Algorithm 1. Initialize a vector β to be all zeros 2. For t = 1,..., T For each example x i, y i and feature j: Compute π i Pr(y i = 1 x i) Set β j = β j η(µβ j (y i π i)x i) 3. Output the parameters β 1,..., β d. Any issues? Inefficiency due to updates on every j even if x ij = 0. Lazy sparse updates in the readings. Machine Learning: Chenhao Tan Boulder 25 of 27

85 Proofs about Stochastic Gradient Depends on convexity of objective and how close ɛ you want to get to actual answer Best bounds depend on changing η over time and per dimension (not all features created equal) Machine Learning: Chenhao Tan Boulder 26 of 27

86 Running time Number of iterations to get to accuracy: L (β) L (β ) ɛ Gradient descent: If L is strongly convex: O(log(1/ɛ)) iterations Stochastic gradient descent: If L is strongly convex: O(1/ɛ) iterations Machine Learning: Chenhao Tan Boulder 27 of 27

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Machine Learning: Jordan Boyd-Graber University of Maryland LOGISTIC REGRESSION FROM TEXT Slides adapted from Emily Fox Machine Learning: Jordan Boyd-Graber UMD Introduction

More information

Classification: Logistic Regression from Data

Classification: Logistic Regression from Data Classification: Logistic Regression from Data Machine Learning: Jordan Boyd-Graber University of Colorado Boulder LECTURE 3 Slides adapted from Emily Fox Machine Learning: Jordan Boyd-Graber Boulder Classification:

More information

Classification: Logistic Regression from Data

Classification: Logistic Regression from Data Classification: Logistic Regression from Data Machine Learning: Alvin Grissom II University of Colorado Boulder Slides adapted from Emily Fox Machine Learning: Alvin Grissom II Boulder Classification:

More information

Logistic Regression. INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber SLIDES ADAPTED FROM HINRICH SCHÜTZE

Logistic Regression. INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber SLIDES ADAPTED FROM HINRICH SCHÜTZE Logistic Regression INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber SLIDES ADAPTED FROM HINRICH SCHÜTZE INFO-2301: Quantitative Reasoning 2 Paul and Boyd-Graber Logistic Regression

More information

Logistic Regression Review Fall 2012 Recitation. September 25, 2012 TA: Selen Uguroglu

Logistic Regression Review Fall 2012 Recitation. September 25, 2012 TA: Selen Uguroglu Logistic Regression Review 10-601 Fall 2012 Recitation September 25, 2012 TA: Selen Uguroglu!1 Outline Decision Theory Logistic regression Goal Loss function Inference Gradient Descent!2 Training Data

More information

Generative v. Discriminative classifiers Intuition

Generative v. Discriminative classifiers Intuition Logistic Regression Machine Learning 070/578 Carlos Guestrin Carnegie Mellon University September 24 th, 2007 Generative v. Discriminative classifiers Intuition Want to Learn: h:x a Y X features Y target

More information

Machine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io

Machine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem

More information

Logistic Regression. Introduction to Data Science Algorithms Jordan Boyd-Graber and Michael Paul SLIDES ADAPTED FROM HINRICH SCHÜTZE

Logistic Regression. Introduction to Data Science Algorithms Jordan Boyd-Graber and Michael Paul SLIDES ADAPTED FROM HINRICH SCHÜTZE Logistic Regression Introduction to Data Science Algorithms Jordan Boyd-Graber and Michael Paul SLIDES ADAPTED FROM HINRICH SCHÜTZE Introduction to Data Science Algorithms Boyd-Graber and Paul Logistic

More information

Machine Learning, Fall 2012 Homework 2

Machine Learning, Fall 2012 Homework 2 0-60 Machine Learning, Fall 202 Homework 2 Instructors: Tom Mitchell, Ziv Bar-Joseph TA in charge: Selen Uguroglu email: sugurogl@cs.cmu.edu SOLUTIONS Naive Bayes, 20 points Problem. Basic concepts, 0

More information

Machine Learning Tom M. Mitchell Machine Learning Department Carnegie Mellon University. September 20, 2012

Machine Learning Tom M. Mitchell Machine Learning Department Carnegie Mellon University. September 20, 2012 Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University September 20, 2012 Today: Logistic regression Generative/Discriminative classifiers Readings: (see class website)

More information

Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 9

Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 9 Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 9 Slides adapted from Jordan Boyd-Graber Machine Learning: Chenhao Tan Boulder 1 of 39 Recap Supervised learning Previously: KNN, naïve

More information

Machine Learning

Machine Learning Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 2, 2015 Today: Logistic regression Generative/Discriminative classifiers Readings: (see class website)

More information

Case Study 1: Estimating Click Probabilities. Kakade Announcements: Project Proposals: due this Friday!

Case Study 1: Estimating Click Probabilities. Kakade Announcements: Project Proposals: due this Friday! Case Study 1: Estimating Click Probabilities Intro Logistic Regression Gradient Descent + SGD Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade April 4, 017 1 Announcements:

More information

Classification Logistic Regression

Classification Logistic Regression Announcements: Classification Logistic Regression Machine Learning CSE546 Sham Kakade University of Washington HW due on Friday. Today: Review: sub-gradients,lasso Logistic Regression October 3, 26 Sham

More information

Classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012

Classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012 Classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Topics Discriminant functions Logistic regression Perceptron Generative models Generative vs. discriminative

More information

Logistic Regression. Stochastic Gradient Descent

Logistic Regression. Stochastic Gradient Descent Tutorial 8 CPSC 340 Logistic Regression Stochastic Gradient Descent Logistic Regression Model A discriminative probabilistic model for classification e.g. spam filtering Let x R d be input and y { 1, 1}

More information

Machine Learning for NLP

Machine Learning for NLP Machine Learning for NLP Linear Models Joakim Nivre Uppsala University Department of Linguistics and Philology Slides adapted from Ryan McDonald, Google Research Machine Learning for NLP 1(26) Outline

More information

Machine Learning Gaussian Naïve Bayes Big Picture

Machine Learning Gaussian Naïve Bayes Big Picture Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University January 27, 2011 Today: Naïve Bayes Big Picture Logistic regression Gradient ascent Generative discriminative

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized

More information

Stochastic Gradient Descent

Stochastic Gradient Descent Stochastic Gradient Descent Machine Learning CSE546 Carlos Guestrin University of Washington October 9, 2013 1 Logistic Regression Logistic function (or Sigmoid): Learn P(Y X) directly Assume a particular

More information

Gaussian and Linear Discriminant Analysis; Multiclass Classification

Gaussian and Linear Discriminant Analysis; Multiclass Classification Gaussian and Linear Discriminant Analysis; Multiclass Classification Professor Ameet Talwalkar Slide Credit: Professor Fei Sha Professor Ameet Talwalkar CS260 Machine Learning Algorithms October 13, 2015

More information

Solving Regression. Jordan Boyd-Graber. University of Colorado Boulder LECTURE 12. Slides adapted from Matt Nedrich and Trevor Hastie

Solving Regression. Jordan Boyd-Graber. University of Colorado Boulder LECTURE 12. Slides adapted from Matt Nedrich and Trevor Hastie Solving Regression Jordan Boyd-Graber University of Colorado Boulder LECTURE 12 Slides adapted from Matt Nedrich and Trevor Hastie Jordan Boyd-Graber Boulder Solving Regression 1 of 17 Roadmap We talked

More information

Generative v. Discriminative classifiers Intuition

Generative v. Discriminative classifiers Intuition Logistic Regression Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University September 24 th, 2007 1 Generative v. Discriminative classifiers Intuition Want to Learn: h:x a Y X features

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Jordan Boyd-Graber University of Colorado Boulder LECTURE 7 Slides adapted from Tom Mitchell, Eric Xing, and Lauren Hannah Jordan Boyd-Graber Boulder Support Vector Machines 1 of

More information

Logistic Regression & Neural Networks

Logistic Regression & Neural Networks Logistic Regression & Neural Networks CMSC 723 / LING 723 / INST 725 Marine Carpuat Slides credit: Graham Neubig, Jacob Eisenstein Logistic Regression Perceptron & Probabilities What if we want a probability

More information

Machine Learning. Regression-Based Classification & Gaussian Discriminant Analysis. Manfred Huber

Machine Learning. Regression-Based Classification & Gaussian Discriminant Analysis. Manfred Huber Machine Learning Regression-Based Classification & Gaussian Discriminant Analysis Manfred Huber 2015 1 Logistic Regression Linear regression provides a nice representation and an efficient solution to

More information

Optimization and Gradient Descent

Optimization and Gradient Descent Optimization and Gradient Descent INFO-4604, Applied Machine Learning University of Colorado Boulder September 12, 2017 Prof. Michael Paul Prediction Functions Remember: a prediction function is the function

More information

Logistic Regression: Online, Lazy, Kernelized, Sequential, etc.

Logistic Regression: Online, Lazy, Kernelized, Sequential, etc. Logistic Regression: Online, Lazy, Kernelized, Sequential, etc. Harsha Veeramachaneni Thomson Reuter Research and Development April 1, 2010 Harsha Veeramachaneni (TR R&D) Logistic Regression April 1, 2010

More information

Big Data Analytics. Lucas Rego Drumond

Big Data Analytics. Lucas Rego Drumond Big Data Analytics Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Predictive Models Predictive Models 1 / 34 Outline

More information

Optimization for Machine Learning

Optimization for Machine Learning Optimization for Machine Learning Elman Mansimov 1 September 24, 2015 1 Modified based on Shenlong Wang s and Jake Snell s tutorials, with additional contents borrowed from Kevin Swersky and Jasper Snoek

More information

Logistic Regression. COMP 527 Danushka Bollegala

Logistic Regression. COMP 527 Danushka Bollegala Logistic Regression COMP 527 Danushka Bollegala Binary Classification Given an instance x we must classify it to either positive (1) or negative (0) class We can use {1,-1} instead of {1,0} but we will

More information

Variations of Logistic Regression with Stochastic Gradient Descent

Variations of Logistic Regression with Stochastic Gradient Descent Variations of Logistic Regression with Stochastic Gradient Descent Panqu Wang(pawang@ucsd.edu) Phuc Xuan Nguyen(pxn002@ucsd.edu) January 26, 2012 Abstract In this paper, we extend the traditional logistic

More information

Least Mean Squares Regression

Least Mean Squares Regression Least Mean Squares Regression Machine Learning Spring 2018 The slides are mainly from Vivek Srikumar 1 Lecture Overview Linear classifiers What functions do linear classifiers express? Least Squares Method

More information

ECS171: Machine Learning

ECS171: Machine Learning ECS171: Machine Learning Lecture 4: Optimization (LFD 3.3, SGD) Cho-Jui Hsieh UC Davis Jan 22, 2018 Gradient descent Optimization Goal: find the minimizer of a function min f (w) w For now we assume f

More information

Ad Placement Strategies

Ad Placement Strategies Case Study : Estimating Click Probabilities Intro Logistic Regression Gradient Descent + SGD AdaGrad Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox January 7 th, 04 Ad

More information

Selected Topics in Optimization. Some slides borrowed from

Selected Topics in Optimization. Some slides borrowed from Selected Topics in Optimization Some slides borrowed from http://www.stat.cmu.edu/~ryantibs/convexopt/ Overview Optimization problems are almost everywhere in statistics and machine learning. Input Model

More information

Logistic Regression. Jia-Bin Huang. Virginia Tech Spring 2019 ECE-5424G / CS-5824

Logistic Regression. Jia-Bin Huang. Virginia Tech Spring 2019 ECE-5424G / CS-5824 Logistic Regression Jia-Bin Huang ECE-5424G / CS-5824 Virginia Tech Spring 2019 Administrative Please start HW 1 early! Questions are welcome! Two principles for estimating parameters Maximum Likelihood

More information

ECS171: Machine Learning

ECS171: Machine Learning ECS171: Machine Learning Lecture 3: Linear Models I (LFD 3.2, 3.3) Cho-Jui Hsieh UC Davis Jan 17, 2018 Linear Regression (LFD 3.2) Regression Classification: Customer record Yes/No Regression: predicting

More information

Outline. Supervised Learning. Hong Chang. Institute of Computing Technology, Chinese Academy of Sciences. Machine Learning Methods (Fall 2012)

Outline. Supervised Learning. Hong Chang. Institute of Computing Technology, Chinese Academy of Sciences. Machine Learning Methods (Fall 2012) Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Linear Models for Regression Linear Regression Probabilistic Interpretation

More information

Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 6

Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 6 Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 6 Slides adapted from Jordan Boyd-Graber, Chris Ketelsen Machine Learning: Chenhao Tan Boulder 1 of 39 HW1 turned in HW2 released Office

More information

Logistic Regression Introduction to Machine Learning. Matt Gormley Lecture 8 Feb. 12, 2018

Logistic Regression Introduction to Machine Learning. Matt Gormley Lecture 8 Feb. 12, 2018 10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Logistic Regression Matt Gormley Lecture 8 Feb. 12, 2018 1 10-601 Introduction

More information

Logistic Regression. William Cohen

Logistic Regression. William Cohen Logistic Regression William Cohen 1 Outline Quick review classi5ication, naïve Bayes, perceptrons new result for naïve Bayes Learning as optimization Logistic regression via gradient ascent Over5itting

More information

Linear Classification. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington

Linear Classification. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington Linear Classification CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Example of Linear Classification Red points: patterns belonging

More information

Computational statistics

Computational statistics Computational statistics Lecture 3: Neural networks Thierry Denœux 5 March, 2016 Neural networks A class of learning methods that was developed separately in different fields statistics and artificial

More information

Bias-Variance Tradeoff

Bias-Variance Tradeoff What s learning, revisited Overfitting Generative versus Discriminative Logistic Regression Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University September 19 th, 2007 Bias-Variance Tradeoff

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Logistic Regression Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574

More information

Introduction to Logistic Regression and Support Vector Machine

Introduction to Logistic Regression and Support Vector Machine Introduction to Logistic Regression and Support Vector Machine guest lecturer: Ming-Wei Chang CS 446 Fall, 2009 () / 25 Fall, 2009 / 25 Before we start () 2 / 25 Fall, 2009 2 / 25 Before we start Feel

More information

Sequence Modelling with Features: Linear-Chain Conditional Random Fields. COMP-599 Oct 6, 2015

Sequence Modelling with Features: Linear-Chain Conditional Random Fields. COMP-599 Oct 6, 2015 Sequence Modelling with Features: Linear-Chain Conditional Random Fields COMP-599 Oct 6, 2015 Announcement A2 is out. Due Oct 20 at 1pm. 2 Outline Hidden Markov models: shortcomings Generative vs. discriminative

More information

Generative Learning. INFO-4604, Applied Machine Learning University of Colorado Boulder. November 29, 2018 Prof. Michael Paul

Generative Learning. INFO-4604, Applied Machine Learning University of Colorado Boulder. November 29, 2018 Prof. Michael Paul Generative Learning INFO-4604, Applied Machine Learning University of Colorado Boulder November 29, 2018 Prof. Michael Paul Generative vs Discriminative The classification algorithms we have seen so far

More information

Optimization in the Big Data Regime 2: SVRG & Tradeoffs in Large Scale Learning. Sham M. Kakade

Optimization in the Big Data Regime 2: SVRG & Tradeoffs in Large Scale Learning. Sham M. Kakade Optimization in the Big Data Regime 2: SVRG & Tradeoffs in Large Scale Learning. Sham M. Kakade Machine Learning for Big Data CSE547/STAT548 University of Washington S. M. Kakade (UW) Optimization for

More information

Logistic Regression Introduction to Machine Learning. Matt Gormley Lecture 9 Sep. 26, 2018

Logistic Regression Introduction to Machine Learning. Matt Gormley Lecture 9 Sep. 26, 2018 10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Logistic Regression Matt Gormley Lecture 9 Sep. 26, 2018 1 Reminders Homework 3:

More information

Machine Learning for NLP

Machine Learning for NLP Machine Learning for NLP Uppsala University Department of Linguistics and Philology Slides borrowed from Ryan McDonald, Google Research Machine Learning for NLP 1(50) Introduction Linear Classifiers Classifiers

More information

Probabilistic Machine Learning. Industrial AI Lab.

Probabilistic Machine Learning. Industrial AI Lab. Probabilistic Machine Learning Industrial AI Lab. Probabilistic Linear Regression Outline Probabilistic Classification Probabilistic Clustering Probabilistic Dimension Reduction 2 Probabilistic Linear

More information

Logistic Regression Trained with Different Loss Functions. Discussion

Logistic Regression Trained with Different Loss Functions. Discussion Logistic Regression Trained with Different Loss Functions Discussion CS640 Notations We restrict our discussions to the binary case. g(z) = g (z) = g(z) z h w (x) = g(wx) = + e z = g(z)( g(z)) + e wx =

More information

Contents Lecture 4. Lecture 4 Linear Discriminant Analysis. Summary of Lecture 3 (II/II) Summary of Lecture 3 (I/II)

Contents Lecture 4. Lecture 4 Linear Discriminant Analysis. Summary of Lecture 3 (II/II) Summary of Lecture 3 (I/II) Contents Lecture Lecture Linear Discriminant Analysis Fredrik Lindsten Division of Systems and Control Department of Information Technology Uppsala University Email: fredriklindsten@ituuse Summary of lecture

More information

Lecture 2: Logistic Regression and Neural Networks

Lecture 2: Logistic Regression and Neural Networks 1/23 Lecture 2: and Neural Networks Pedro Savarese TTI 2018 2/23 Table of Contents 1 2 3 4 3/23 Naive Bayes Learn p(x, y) = p(y)p(x y) Training: Maximum Likelihood Estimation Issues? Why learn p(x, y)

More information

CSE 417T: Introduction to Machine Learning. Lecture 11: Review. Henry Chai 10/02/18

CSE 417T: Introduction to Machine Learning. Lecture 11: Review. Henry Chai 10/02/18 CSE 417T: Introduction to Machine Learning Lecture 11: Review Henry Chai 10/02/18 Unknown Target Function!: # % Training data Formal Setup & = ( ), + ),, ( -, + - Learning Algorithm 2 Hypothesis Set H

More information

ECE 5984: Introduction to Machine Learning

ECE 5984: Introduction to Machine Learning ECE 5984: Introduction to Machine Learning Topics: Classification: Logistic Regression NB & LR connections Readings: Barber 17.4 Dhruv Batra Virginia Tech Administrativia HW2 Due: Friday 3/6, 3/15, 11:55pm

More information

Generative Clustering, Topic Modeling, & Bayesian Inference

Generative Clustering, Topic Modeling, & Bayesian Inference Generative Clustering, Topic Modeling, & Bayesian Inference INFO-4604, Applied Machine Learning University of Colorado Boulder December 12-14, 2017 Prof. Michael Paul Unsupervised Naïve Bayes Last week

More information

Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 16

Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 16 Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 16 Slides adapted from Jordan Boyd-Graber, Justin Johnson, Andrej Karpathy, Chris Ketelsen, Fei-Fei Li, Mike Mozer, Michael Nielson

More information

Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training

Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Charles Elkan elkan@cs.ucsd.edu January 17, 2013 1 Principle of maximum likelihood Consider a family of probability distributions

More information

Naïve Bayes classification

Naïve Bayes classification Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss

More information

Logistic Regression Logistic

Logistic Regression Logistic Case Study 1: Estimating Click Probabilities L2 Regularization for Logistic Regression Machine Learning/Statistics for Big Data CSE599C1/STAT592, University of Washington Carlos Guestrin January 10 th,

More information

CS6220: DATA MINING TECHNIQUES

CS6220: DATA MINING TECHNIQUES CS6220: DATA MINING TECHNIQUES Matrix Data: Prediction Instructor: Yizhou Sun yzsun@ccs.neu.edu September 14, 2014 Today s Schedule Course Project Introduction Linear Regression Model Decision Tree 2 Methods

More information

Linear Models for Classification

Linear Models for Classification Linear Models for Classification Oliver Schulte - CMPT 726 Bishop PRML Ch. 4 Classification: Hand-written Digit Recognition CHINE INTELLIGENCE, VOL. 24, NO. 24, APRIL 2002 x i = t i = (0, 0, 0, 1, 0, 0,

More information

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Bayesian Learning. Tobias Scheffer, Niels Landwehr

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Bayesian Learning. Tobias Scheffer, Niels Landwehr Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning Tobias Scheffer, Niels Landwehr Remember: Normal Distribution Distribution over x. Density function with parameters

More information

Ad Placement Strategies

Ad Placement Strategies Case Study 1: Estimating Click Probabilities Tackling an Unknown Number of Features with Sketching Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox 2014 Emily Fox January

More information

CSC2515 Winter 2015 Introduction to Machine Learning. Lecture 2: Linear regression

CSC2515 Winter 2015 Introduction to Machine Learning. Lecture 2: Linear regression CSC2515 Winter 2015 Introduction to Machine Learning Lecture 2: Linear regression All lecture slides will be available as.pdf on the course website: http://www.cs.toronto.edu/~urtasun/courses/csc2515/csc2515_winter15.html

More information

Statistical Data Mining and Machine Learning Hilary Term 2016

Statistical Data Mining and Machine Learning Hilary Term 2016 Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes

More information

Deconstructing Data Science

Deconstructing Data Science Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture 9: Logistic regression Feb 22, 2016 Generative vs. Discriminative models Generative models specify a joint distribution over the labels

More information

Logistic Regression. Robot Image Credit: Viktoriya Sukhanova 123RF.com

Logistic Regression. Robot Image Credit: Viktoriya Sukhanova 123RF.com Logistic Regression These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these

More information

Logistic Regression. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms January 25, / 48

Logistic Regression. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms January 25, / 48 Logistic Regression Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms January 25, 2017 1 / 48 Outline 1 Administration 2 Review of last lecture 3 Logistic regression

More information

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016 Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier

More information

Introduction to Logistic Regression

Introduction to Logistic Regression Introduction to Logistic Regression Guy Lebanon Binary Classification Binary classification is the most basic task in machine learning, and yet the most frequent. Binary classifiers often serve as the

More information

EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING

EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: August 30, 2018, 14.00 19.00 RESPONSIBLE TEACHER: Niklas Wahlström NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical

More information

Statistical Machine Learning Hilary Term 2018

Statistical Machine Learning Hilary Term 2018 Statistical Machine Learning Hilary Term 2018 Pier Francesco Palamara Department of Statistics University of Oxford Slide credits and other course material can be found at: http://www.stats.ox.ac.uk/~palamara/sml18.html

More information

Neural Networks: Backpropagation

Neural Networks: Backpropagation Neural Networks: Backpropagation Machine Learning Fall 2017 Based on slides and material from Geoffrey Hinton, Richard Socher, Dan Roth, Yoav Goldberg, Shai Shalev-Shwartz and Shai Ben-David, and others

More information

Linear Discrimination Functions

Linear Discrimination Functions Laurea Magistrale in Informatica Nicola Fanizzi Dipartimento di Informatica Università degli Studi di Bari November 4, 2009 Outline Linear models Gradient descent Perceptron Minimum square error approach

More information

Naïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability

Naïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish

More information

Introduction to Machine Learning

Introduction to Machine Learning 1, DATA11002 Introduction to Machine Learning Lecturer: Teemu Roos TAs: Ville Hyvönen and Janne Leppä-aho Department of Computer Science University of Helsinki (based in part on material by Patrik Hoyer

More information

The classifier. Theorem. where the min is over all possible classifiers. To calculate the Bayes classifier/bayes risk, we need to know

The classifier. Theorem. where the min is over all possible classifiers. To calculate the Bayes classifier/bayes risk, we need to know The Bayes classifier Theorem The classifier satisfies where the min is over all possible classifiers. To calculate the Bayes classifier/bayes risk, we need to know Alternatively, since the maximum it is

More information

The classifier. Linear discriminant analysis (LDA) Example. Challenges for LDA

The classifier. Linear discriminant analysis (LDA) Example. Challenges for LDA The Bayes classifier Linear discriminant analysis (LDA) Theorem The classifier satisfies In linear discriminant analysis (LDA), we make the (strong) assumption that where the min is over all possible classifiers.

More information

Logistic Regression. Machine Learning Fall 2018

Logistic Regression. Machine Learning Fall 2018 Logistic Regression Machine Learning Fall 2018 1 Where are e? We have seen the folloing ideas Linear models Learning as loss minimization Bayesian learning criteria (MAP and MLE estimation) The Naïve Bayes

More information

Logistic Regression. Some slides adapted from Dan Jurfasky and Brendan O Connor

Logistic Regression. Some slides adapted from Dan Jurfasky and Brendan O Connor Logistic Regression Some slides adapted from Dan Jurfasky and Brendan O Connor Naïve Bayes Recap Bag of words (order independent) Features are assumed independent given class P (x 1,...,x n c) =P (x 1

More information

Regression with Numerical Optimization. Logistic

Regression with Numerical Optimization. Logistic CSG220 Machine Learning Fall 2008 Regression with Numerical Optimization. Logistic regression Regression with Numerical Optimization. Logistic regression based on a document by Andrew Ng October 3, 204

More information

Overfitting, Bias / Variance Analysis

Overfitting, Bias / Variance Analysis Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic

More information

Linear and logistic regression

Linear and logistic regression Linear and logistic regression Guillaume Obozinski Ecole des Ponts - ParisTech Master MVA Linear and logistic regression 1/22 Outline 1 Linear regression 2 Logistic regression 3 Fisher discriminant analysis

More information

Introduction to Machine Learning. Regression. Computer Science, Tel-Aviv University,

Introduction to Machine Learning. Regression. Computer Science, Tel-Aviv University, 1 Introduction to Machine Learning Regression Computer Science, Tel-Aviv University, 2013-14 Classification Input: X Real valued, vectors over real. Discrete values (0,1,2,...) Other structures (e.g.,

More information

Classification Based on Probability

Classification Based on Probability Logistic Regression These slides were assembled by Byron Boots, with only minor modifications from Eric Eaton s slides and grateful acknowledgement to the many others who made their course materials freely

More information

Applied Machine Learning Lecture 5: Linear classifiers, continued. Richard Johansson

Applied Machine Learning Lecture 5: Linear classifiers, continued. Richard Johansson Applied Machine Learning Lecture 5: Linear classifiers, continued Richard Johansson overview preliminaries logistic regression training a logistic regression classifier side note: multiclass linear classifiers

More information

Machine Learning

Machine Learning Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 4, 2015 Today: Generative discriminative classifiers Linear regression Decomposition of error into

More information

Perceptron (Theory) + Linear Regression

Perceptron (Theory) + Linear Regression 10601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Perceptron (Theory) Linear Regression Matt Gormley Lecture 6 Feb. 5, 2018 1 Q&A

More information

Lecture 3 - Linear and Logistic Regression

Lecture 3 - Linear and Logistic Regression 3 - Linear and Logistic Regression-1 Machine Learning Course Lecture 3 - Linear and Logistic Regression Lecturer: Haim Permuter Scribe: Ziv Aharoni Throughout this lecture we talk about how to use regression

More information

Warm up: risk prediction with logistic regression

Warm up: risk prediction with logistic regression Warm up: risk prediction with logistic regression Boss gives you a bunch of data on loans defaulting or not: {(x i,y i )} n i= x i 2 R d, y i 2 {, } You model the data as: P (Y = y x, w) = + exp( yw T

More information

Linear & nonlinear classifiers

Linear & nonlinear classifiers Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table

More information

Machine Learning (CSE 446): Multi-Class Classification; Kernel Methods

Machine Learning (CSE 446): Multi-Class Classification; Kernel Methods Machine Learning (CSE 446): Multi-Class Classification; Kernel Methods Sham M Kakade c 2018 University of Washington cse446-staff@cs.washington.edu 1 / 12 Announcements HW3 due date as posted. make sure

More information

Machine Learning

Machine Learning Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 1, 2011 Today: Generative discriminative classifiers Linear regression Decomposition of error into

More information

CSCI-567: Machine Learning (Spring 2019)

CSCI-567: Machine Learning (Spring 2019) CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 March 19, 2019 1 / 43 Administration March 19, 2019 2 / 43 Administration TA3 is due this week March

More information

Machine Learning. Lecture 3: Logistic Regression. Feng Li.

Machine Learning. Lecture 3: Logistic Regression. Feng Li. Machine Learning Lecture 3: Logistic Regression Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2016 Logistic Regression Classification

More information

BANA 7046 Data Mining I Lecture 6. Other Data Mining Algorithms 1

BANA 7046 Data Mining I Lecture 6. Other Data Mining Algorithms 1 BANA 7046 Data Mining I Lecture 6. Other Data Mining Algorithms 1 Shaobo Li University of Cincinnati 1 Partially based on Hastie, et al. (2009) ESL, and James, et al. (2013) ISLR Data Mining I Lecture

More information