Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 5
|
|
- Grace Penelope Taylor
- 6 years ago
- Views:
Transcription
1 Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 5 Slides adapted from Jordan Boyd-Graber, Tom Mitchell, Ziv Bar-Joseph Machine Learning: Chenhao Tan Boulder 1 of 27
2 Quiz question For a test instance (x, y) and a naïve Bayes classifier ˆP, which of the following statements is true? (A) c ˆP(y = c x) = 1 (B) c ˆP(x c) = 1 (C) c ˆP(x c)ˆp(c) = 1 Machine Learning: Chenhao Tan Boulder 2 of 27
3 Overview Objective function Gradient Descent Stochastic Gradient Descent Machine Learning: Chenhao Tan Boulder 3 of 27
4 Reminder: Logistic Regression 1 P(Y = 0 X) = 1 + exp [β 0 + i β ix i ] P(Y = 1 X) = exp [β 0 + i β ix i ] 1 + exp [β 0 + i β ix i ] (1) (2) Discriminative prediction: P(y x) Classification uses: sentiment analysis, spam detection What we didn t talk about is how to learn β from data Machine Learning: Chenhao Tan Boulder 4 of 27
5 Objective function Outline Objective function Gradient Descent Stochastic Gradient Descent Machine Learning: Chenhao Tan Boulder 5 of 27
6 Objective function Logistic Regression: Objective Function Maximize likelihood Obj log P(Y X, β) = j = j log P(y (j) x (j), β)) y (j) ( β 0 + i β i x (j) i ) log [ 1 + exp ( β 0 + i β i x (j) i )] Machine Learning: Chenhao Tan Boulder 6 of 27
7 Objective function Logistic Regression: Objective Function Minimize negative log likelihood (loss) L log P(Y X, β) = j = j log P(y (j) x (j), β)) y (j) ( β 0 + i β i x (j) i ) + log [ 1 + exp ( β 0 + i β i x (j) i )] Machine Learning: Chenhao Tan Boulder 7 of 27
8 Objective function Logistic Regression: Objective Function Minimize negative log likelihood (loss) L log P(Y X, β) = j = j log P(y (j) x (j), β)) y (j) ( β 0 + i β i x (j) i ) + log [ 1 + exp ( β 0 + i β i x (j) i )] Training data {(x, y)} are fixed. Objective function is a function of β... what values of β give a good value? Machine Learning: Chenhao Tan Boulder 7 of 27
9 Objective function Logistic Regression: Objective Function Minimize negative log likelihood (loss) L log P(Y X, β) = j = j log P(y (j) x (j), β)) y (j) ( β 0 + i β i x (j) i ) + log [ 1 + exp ( β 0 + i β i x (j) i )] Training data {(x, y)} are fixed. Objective function is a function of β... what values of β give a good value? β = arg min L (β) β Machine Learning: Chenhao Tan Boulder 7 of 27
10 Objective function Convexity L (β) is convex for logistic regression. Proof. Logistic loss yv + log(1 + exp(v)) is convex. Composition with linear function maintains convexity. Sum of convex functions is convex. Machine Learning: Chenhao Tan Boulder 8 of 27
11 Gradient Descent Outline Objective function Gradient Descent Stochastic Gradient Descent Machine Learning: Chenhao Tan Boulder 9 of 27
12 Gradient Descent Convexity Convex function Doesn t matter where you start, if you go down along the gradient Machine Learning: Chenhao Tan Boulder 10 of 27
13 Gradient Descent Convexity Convex function Doesn t matter where you start, if you go down along the gradient Gradient! Machine Learning: Chenhao Tan Boulder 10 of 27
14 Gradient Descent Convexity It would have been much harder if this is not convex. Machine Learning: Chenhao Tan Boulder 11 of 27
15 Gradient Descent Gradient Descent (non-convex) Goal Optimize loss function with respect to variables β Objective Parameter Machine Learning: Chenhao Tan Boulder 12 of 27
16 Gradient Descent Gradient Descent (non-convex) Goal Optimize loss function with respect to variables β Objective Parameter Undiscovered Country Machine Learning: Chenhao Tan Boulder 12 of 27
17 Gradient Descent Gradient Descent (non-convex) Goal Optimize loss function with respect to variables β Objective 0 Parameter Undiscovered Country Machine Learning: Chenhao Tan Boulder 12 of 27
18 Gradient Descent Gradient Descent (non-convex) Goal Optimize loss function with respect to variables β Objective 0 Parameter Undiscovered Country Machine Learning: Chenhao Tan Boulder 12 of 27
19 Gradient Descent Gradient Descent (non-convex) Goal Optimize loss function with respect to variables β Objective 0 1 Parameter Undiscovered Country Machine Learning: Chenhao Tan Boulder 12 of 27
20 Gradient Descent Gradient Descent (non-convex) Goal Optimize loss function with respect to variables β Objective 0 1 Parameter Undiscovered Country Machine Learning: Chenhao Tan Boulder 12 of 27
21 Gradient Descent Gradient Descent (non-convex) Goal Optimize loss function with respect to variables β Objective Parameter Undiscovered Country Machine Learning: Chenhao Tan Boulder 12 of 27
22 Gradient Descent Gradient Descent (non-convex) Goal Optimize loss function with respect to variables β Objective Parameter Undiscovered Country Machine Learning: Chenhao Tan Boulder 12 of 27
23 Gradient Descent Gradient Descent (non-convex) Goal Optimize loss function with respect to variables β Objective Parameter Undiscovered Country Machine Learning: Chenhao Tan Boulder 12 of 27
24 Gradient Descent Gradient Descent (non-convex) Goal Optimize loss function with respect to variables β β l+1 j = β l j η L β j Machine Learning: Chenhao Tan Boulder 12 of 27
25 Gradient Descent Gradient Descent (non-convex) Goal Optimize loss function with respect to variables β β l+1 j = β l j η L β j Luckily, (vanilla) logistic regression is convex Machine Learning: Chenhao Tan Boulder 12 of 27
26 Gradient Descent Gradient for Logistic Regression To ease notation, let s define π i = exp βt x i 1 + exp β T x i (3) Our objective function is L = i log p(y i x i ) = i L i = i { log π i if y i = 1 log(1 π i ) if y i = 0 (4) Machine Learning: Chenhao Tan Boulder 13 of 27
27 Gradient Descent Taking the Derivative Apply chain rule: L β j = i L i ( β) β j = i 1 π i π i ( ) 1 1 π i π i β j β j if y i = 1 if y i = 0 (5) If we plug in the derivative, we can merge these two cases π i β j = π i (1 π i )x j, (6) L i β j = (y i π i )x j. (7) Machine Learning: Chenhao Tan Boulder 14 of 27
28 Gradient Descent Gradient for Logistic Regression Gradient Update β L ( β) = [ L ( β) ],..., L ( β) β 0 β n (8) β η β L ( β) (9) β i β i η L ( β) β i (10) Machine Learning: Chenhao Tan Boulder 15 of 27
29 Gradient Descent Gradient for Logistic Regression Gradient Update β L ( β) = [ L ( β) ],..., L ( β) β 0 β n (8) β η β L ( β) (9) β i β i η L ( β) β i (10) Why are we subtracting? What would we do if we wanted to do ascent? Machine Learning: Chenhao Tan Boulder 15 of 27
30 Gradient Descent Gradient for Logistic Regression Gradient Update β L ( β) = [ L ( β) ],..., L ( β) β 0 β n (8) η: step size, must be greater than zero β η β L ( β) (9) β i β i η L ( β) β i (10) Machine Learning: Chenhao Tan Boulder 15 of 27
31 Gradient Descent Choosing Step Size Objective Parameter Machine Learning: Chenhao Tan Boulder 16 of 27
32 Gradient Descent Choosing Step Size Objective Parameter Machine Learning: Chenhao Tan Boulder 16 of 27
33 Gradient Descent Choosing Step Size Objective Parameter Machine Learning: Chenhao Tan Boulder 16 of 27
34 Gradient Descent Choosing Step Size Objective Parameter Machine Learning: Chenhao Tan Boulder 16 of 27
35 Gradient Descent Remaining issues When to stop? What if β keeps getting bigger? Machine Learning: Chenhao Tan Boulder 17 of 27
36 Gradient Descent Regularized Conditional Log Likelihood Unregularized Regularized [ ] β = arg min ln p(y (j) x (j), β) β [ ] β = arg min ln p(y (j) x (j), β) + 1 β 2 µ i (11) β 2 i (12) Machine Learning: Chenhao Tan Boulder 18 of 27
37 Gradient Descent Regularized Conditional Log Likelihood Unregularized Regularized [ ] β = arg min ln p(y (j) x (j), β) β [ ] β = arg min ln p(y (j) x (j), β) + 1 β 2 µ i (11) β 2 i (12) µ is regularization parameter that trades off between likelihood and having small parameters Machine Learning: Chenhao Tan Boulder 18 of 27
38 Outline Objective function Gradient Descent Stochastic Gradient Descent Machine Learning: Chenhao Tan Boulder 19 of 27
39 Approximating the Gradient Our datasets are big (to fit into memory)... or data are changing / streaming Machine Learning: Chenhao Tan Boulder 20 of 27
40 Approximating the Gradient Our datasets are big (to fit into memory)... or data are changing / streaming Hard to compute true gradient L (β) E x [ L (β, x)] (13) Average over all observations Machine Learning: Chenhao Tan Boulder 20 of 27
41 Approximating the Gradient Our datasets are big (to fit into memory)... or data are changing / streaming Hard to compute true gradient L (β) E x [ L (β, x)] (13) Average over all observations What if we compute an update just from one observation? Machine Learning: Chenhao Tan Boulder 20 of 27
42 Stochastic Gradient Descent Getting to Union Station Pretend it s a pre-smartphone world and you want to get to Union Station Machine Learning: Chenhao Tan Boulder 21 of 27
43 Stochastic Gradient for Regularized Regression L = log p(y x; β) µ j β 2 j (14) Machine Learning: Chenhao Tan Boulder 22 of 27
44 Stochastic Gradient for Regularized Regression L = log p(y x; β) µ j β 2 j (14) Taking the derivative (with respect to example x i ) L β j = (y i π i )x j + µβ j (15) Machine Learning: Chenhao Tan Boulder 22 of 27
45 Stochastic Gradient for Logistic Regression Given a single observation x i chosen at random from the dataset, β j β j η ( µβ j x ij [y i π i ] ) (16) Machine Learning: Chenhao Tan Boulder 23 of 27
46 Example Documents β[j] =β[j] + η(y i π i )x i β = β bias = 0, β A = 0, β B = 0, β C = 0, β D = 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D You first see the positive example. First, compute π 1 Machine Learning: Chenhao Tan Boulder 24 of 27
47 Example Documents β[j] =β[j] + η(y i π i )x i β = 0, 0, 0, 0, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D You first see the positive example. First, compute π 1 π 1 = Pr(y 1 = 1 x 1 ) = exp βt x i 1+exp β T x i = Machine Learning: Chenhao Tan Boulder 24 of 27
48 Example Documents β[j] =β[j] + η(y i π i )x i β = 0, 0, 0, 0, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D You first see the positive example. First, compute π 1 π 1 = Pr(y 1 = 1 x 1 ) = exp βt x i 1+exp β T x i = exp 0 exp 0+1 = 0.5 Machine Learning: Chenhao Tan Boulder 24 of 27
49 Example Documents β[j] =β[j] + η(y i π i )x i β = 0, 0, 0, 0, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D π 1 = 0.5 What s the update for β bias? Machine Learning: Chenhao Tan Boulder 24 of 27
50 Example Documents β[j] =β[j] + η(y i π i )x i β = 0, 0, 0, 0, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β bias? β bias = β bias + η (y 1 π 1 ) x 1,bias = ( ) 1.0 Machine Learning: Chenhao Tan Boulder 24 of 27
51 Example Documents β[j] =β[j] + η(y i π i )x i β = 0, 0, 0, 0, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β bias? β bias = β bias + η (y 1 π 1 ) x 1,bias = ( ) 1.0 =0.5 Machine Learning: Chenhao Tan Boulder 24 of 27
52 Example Documents β[j] =β[j] + η(y i π i )x i β = 0, 0, 0, 0, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β A? Machine Learning: Chenhao Tan Boulder 24 of 27
53 Example Documents β[j] =β[j] + η(y i π i )x i β = 0, 0, 0, 0, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β A? β A = β A + η (y 1 π 1 ) x 1,A = ( ) 4.0 Machine Learning: Chenhao Tan Boulder 24 of 27
54 Example Documents β[j] =β[j] + η(y i π i )x i β = 0, 0, 0, 0, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β A? β A = β A + η (y 1 π 1 ) x 1,A = ( ) 4.0 =2.0 Machine Learning: Chenhao Tan Boulder 24 of 27
55 Example Documents β[j] =β[j] + η(y i π i )x i β = 0, 0, 0, 0, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β B? Machine Learning: Chenhao Tan Boulder 24 of 27
56 Example Documents β[j] =β[j] + η(y i π i )x i β = 0, 0, 0, 0, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β B? β B = β B + η (y 1 π 1 ) x 1,B = ( ) 3.0 Machine Learning: Chenhao Tan Boulder 24 of 27
57 Example Documents β[j] =β[j] + η(y i π i )x i β = 0, 0, 0, 0, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β B? β B = β B + η (y 1 π 1 ) x 1,B = ( ) 3.0 =1.5 Machine Learning: Chenhao Tan Boulder 24 of 27
58 Example Documents β[j] =β[j] + η(y i π i )x i β = 0, 0, 0, 0, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β C? Machine Learning: Chenhao Tan Boulder 24 of 27
59 Example Documents β[j] =β[j] + η(y i π i )x i β = 0, 0, 0, 0, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β C? β C = β C + η (y 1 π 1 ) x 1,C = ( ) 1.0 Machine Learning: Chenhao Tan Boulder 24 of 27
60 Example Documents β[j] =β[j] + η(y i π i )x i β = 0, 0, 0, 0, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β C? β C = β C + η (y 1 π 1 ) x 1,C = ( ) 1.0 =0.5 Machine Learning: Chenhao Tan Boulder 24 of 27
61 Example Documents β[j] =β[j] + η(y i π i )x i β = 0, 0, 0, 0, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β D? Machine Learning: Chenhao Tan Boulder 24 of 27
62 Example Documents β[j] =β[j] + η(y i π i )x i β = 0, 0, 0, 0, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β D? β D = β D + η (y 1 π 1 ) x 1,D = ( ) 0.0 Machine Learning: Chenhao Tan Boulder 24 of 27
63 Example Documents β[j] =β[j] + η(y i π i )x i β = 0, 0, 0, 0, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β D? β D = β D + η (y 1 π 1 ) x 1,D = ( ) 0.0 =0.0 Machine Learning: Chenhao Tan Boulder 24 of 27
64 Example Documents β[j] =β[j] + η(y i π i )x i β =.5, 2, 1.5, 0.5, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D Now you see the negative example. What s π 2? Machine Learning: Chenhao Tan Boulder 24 of 27
65 Example Documents β[j] =β[j] + η(y i π i )x i β =.5, 2, 1.5, 0.5, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D Now you see the negative example. What s π 2? π 2 = Pr(y 2 = 1 x 2 ) = exp βt x i 1+exp β T x i = exp{ } exp{ }+1 = Machine Learning: Chenhao Tan Boulder 24 of 27
66 Example Documents β[j] =β[j] + η(y i π i )x i β =.5, 2, 1.5, 0.5, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D Now you see the negative example. What s π 2? π 2 = Pr(y 2 = 1 x 2 ) = exp βt x i 1+exp β T x i = exp{ } exp{ }+1 = 0.97 Machine Learning: Chenhao Tan Boulder 24 of 27
67 Example Documents β[j] =β[j] + η(y i π i )x i β =.5, 2, 1.5, 0.5, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D Now you see the negative example. What s π 2? π 2 = 0.97 What s the update for β bias? Machine Learning: Chenhao Tan Boulder 24 of 27
68 Example Documents β[j] =β[j] + η(y i π i )x i β =.5, 2, 1.5, 0.5, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β bias? β bias = β bias + η (y 2 π 2 ) x 2,bias = ( ) 1.0 Machine Learning: Chenhao Tan Boulder 24 of 27
69 Example Documents β[j] =β[j] + η(y i π i )x i β =.5, 2, 1.5, 0.5, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β bias? β bias = β bias + η (y 2 π 2 ) x 2,bias = ( ) 1.0 =-0.47 Machine Learning: Chenhao Tan Boulder 24 of 27
70 Example Documents β[j] =β[j] + η(y i π i )x i β =.5, 2, 1.5, 0.5, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β A? Machine Learning: Chenhao Tan Boulder 24 of 27
71 Example Documents β[j] =β[j] + η(y i π i )x i β =.5, 2, 1.5, 0.5, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β A? β A = β A + η (y 2 π 2 ) x 2,A = ( ) 0.0 Machine Learning: Chenhao Tan Boulder 24 of 27
72 Example Documents β[j] =β[j] + η(y i π i )x i β =.5, 2, 1.5, 0.5, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β A? β A = β A + η (y 2 π 2 ) x 2,A = ( ) 0.0 =2.0 Machine Learning: Chenhao Tan Boulder 24 of 27
73 Example Documents β[j] =β[j] + η(y i π i )x i β =.5, 2, 1.5, 0.5, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β B? Machine Learning: Chenhao Tan Boulder 24 of 27
74 Example Documents β[j] =β[j] + η(y i π i )x i β =.5, 2, 1.5, 0.5, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β B? β B = β B + η (y 2 π 2 ) x 2,B = ( ) 1.0 Machine Learning: Chenhao Tan Boulder 24 of 27
75 Example Documents β[j] =β[j] + η(y i π i )x i β =.5, 2, 1.5, 0.5, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β B? β B = β B + η (y 2 π 2 ) x 2,B = ( ) 1.0 =0.53 Machine Learning: Chenhao Tan Boulder 24 of 27
76 Example Documents β[j] =β[j] + η(y i π i )x i β =.5, 2, 1.5, 0.5, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β C? Machine Learning: Chenhao Tan Boulder 24 of 27
77 Example Documents β[j] =β[j] + η(y i π i )x i β =.5, 2, 1.5, 0.5, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β C? β C = β C + η (y 2 π 2 ) x 2,C = ( ) 3.0 Machine Learning: Chenhao Tan Boulder 24 of 27
78 Example Documents β[j] =β[j] + η(y i π i )x i β =.5, 2, 1.5, 0.5, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β C? β C = β C + η (y 2 π 2 ) x 2,C = ( ) 3.0 =-2.41 Machine Learning: Chenhao Tan Boulder 24 of 27
79 Example Documents β[j] =β[j] + η(y i π i )x i β =.5, 2, 1.5, 0.5, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β D? Machine Learning: Chenhao Tan Boulder 24 of 27
80 Example Documents β[j] =β[j] + η(y i π i )x i β =.5, 2, 1.5, 0.5, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β D? β D = β D + η (y 2 π 2 ) x 2,D = ( ) 4.0 Machine Learning: Chenhao Tan Boulder 24 of 27
81 Example Documents β[j] =β[j] + η(y i π i )x i β =.5, 2, 1.5, 0.5, 0 y 1 =1 A A A A B B B C (Assume step size η = 1.0.) y 2 =0 B C C C D D D D What s the update for β D? β D = β D + η (y 2 π 2 ) x 2,D = ( ) 4.0 =-3.88 Machine Learning: Chenhao Tan Boulder 24 of 27
82 Algorithm 1. Initialize a vector β to be all zeros 2. For t = 1,..., T For each example x i, y i and feature j: Compute π i Pr(y i = 1 x i) Set β j = β j η(µβ j (y i π i)x i) 3. Output the parameters β 1,..., β d. Machine Learning: Chenhao Tan Boulder 25 of 27
83 Algorithm 1. Initialize a vector β to be all zeros 2. For t = 1,..., T For each example x i, y i and feature j: Compute π i Pr(y i = 1 x i) Set β j = β j η(µβ j (y i π i)x i) 3. Output the parameters β 1,..., β d. Any issues? Machine Learning: Chenhao Tan Boulder 25 of 27
84 Algorithm 1. Initialize a vector β to be all zeros 2. For t = 1,..., T For each example x i, y i and feature j: Compute π i Pr(y i = 1 x i) Set β j = β j η(µβ j (y i π i)x i) 3. Output the parameters β 1,..., β d. Any issues? Inefficiency due to updates on every j even if x ij = 0. Lazy sparse updates in the readings. Machine Learning: Chenhao Tan Boulder 25 of 27
85 Proofs about Stochastic Gradient Depends on convexity of objective and how close ɛ you want to get to actual answer Best bounds depend on changing η over time and per dimension (not all features created equal) Machine Learning: Chenhao Tan Boulder 26 of 27
86 Running time Number of iterations to get to accuracy: L (β) L (β ) ɛ Gradient descent: If L is strongly convex: O(log(1/ɛ)) iterations Stochastic gradient descent: If L is strongly convex: O(1/ɛ) iterations Machine Learning: Chenhao Tan Boulder 27 of 27
Introduction to Machine Learning
Introduction to Machine Learning Machine Learning: Jordan Boyd-Graber University of Maryland LOGISTIC REGRESSION FROM TEXT Slides adapted from Emily Fox Machine Learning: Jordan Boyd-Graber UMD Introduction
More informationClassification: Logistic Regression from Data
Classification: Logistic Regression from Data Machine Learning: Jordan Boyd-Graber University of Colorado Boulder LECTURE 3 Slides adapted from Emily Fox Machine Learning: Jordan Boyd-Graber Boulder Classification:
More informationClassification: Logistic Regression from Data
Classification: Logistic Regression from Data Machine Learning: Alvin Grissom II University of Colorado Boulder Slides adapted from Emily Fox Machine Learning: Alvin Grissom II Boulder Classification:
More informationLogistic Regression. INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber SLIDES ADAPTED FROM HINRICH SCHÜTZE
Logistic Regression INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber SLIDES ADAPTED FROM HINRICH SCHÜTZE INFO-2301: Quantitative Reasoning 2 Paul and Boyd-Graber Logistic Regression
More informationLogistic Regression Review Fall 2012 Recitation. September 25, 2012 TA: Selen Uguroglu
Logistic Regression Review 10-601 Fall 2012 Recitation September 25, 2012 TA: Selen Uguroglu!1 Outline Decision Theory Logistic regression Goal Loss function Inference Gradient Descent!2 Training Data
More informationGenerative v. Discriminative classifiers Intuition
Logistic Regression Machine Learning 070/578 Carlos Guestrin Carnegie Mellon University September 24 th, 2007 Generative v. Discriminative classifiers Intuition Want to Learn: h:x a Y X features Y target
More informationMachine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io
Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem
More informationLogistic Regression. Introduction to Data Science Algorithms Jordan Boyd-Graber and Michael Paul SLIDES ADAPTED FROM HINRICH SCHÜTZE
Logistic Regression Introduction to Data Science Algorithms Jordan Boyd-Graber and Michael Paul SLIDES ADAPTED FROM HINRICH SCHÜTZE Introduction to Data Science Algorithms Boyd-Graber and Paul Logistic
More informationMachine Learning, Fall 2012 Homework 2
0-60 Machine Learning, Fall 202 Homework 2 Instructors: Tom Mitchell, Ziv Bar-Joseph TA in charge: Selen Uguroglu email: sugurogl@cs.cmu.edu SOLUTIONS Naive Bayes, 20 points Problem. Basic concepts, 0
More informationMachine Learning Tom M. Mitchell Machine Learning Department Carnegie Mellon University. September 20, 2012
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University September 20, 2012 Today: Logistic regression Generative/Discriminative classifiers Readings: (see class website)
More informationMachine Learning: Chenhao Tan University of Colorado Boulder LECTURE 9
Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 9 Slides adapted from Jordan Boyd-Graber Machine Learning: Chenhao Tan Boulder 1 of 39 Recap Supervised learning Previously: KNN, naïve
More informationMachine Learning
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 2, 2015 Today: Logistic regression Generative/Discriminative classifiers Readings: (see class website)
More informationCase Study 1: Estimating Click Probabilities. Kakade Announcements: Project Proposals: due this Friday!
Case Study 1: Estimating Click Probabilities Intro Logistic Regression Gradient Descent + SGD Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade April 4, 017 1 Announcements:
More informationClassification Logistic Regression
Announcements: Classification Logistic Regression Machine Learning CSE546 Sham Kakade University of Washington HW due on Friday. Today: Review: sub-gradients,lasso Logistic Regression October 3, 26 Sham
More informationClassification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012
Classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Topics Discriminant functions Logistic regression Perceptron Generative models Generative vs. discriminative
More informationLogistic Regression. Stochastic Gradient Descent
Tutorial 8 CPSC 340 Logistic Regression Stochastic Gradient Descent Logistic Regression Model A discriminative probabilistic model for classification e.g. spam filtering Let x R d be input and y { 1, 1}
More informationMachine Learning for NLP
Machine Learning for NLP Linear Models Joakim Nivre Uppsala University Department of Linguistics and Philology Slides adapted from Ryan McDonald, Google Research Machine Learning for NLP 1(26) Outline
More informationMachine Learning Gaussian Naïve Bayes Big Picture
Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University January 27, 2011 Today: Naïve Bayes Big Picture Logistic regression Gradient ascent Generative discriminative
More informationSupport Vector Machines
Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized
More informationStochastic Gradient Descent
Stochastic Gradient Descent Machine Learning CSE546 Carlos Guestrin University of Washington October 9, 2013 1 Logistic Regression Logistic function (or Sigmoid): Learn P(Y X) directly Assume a particular
More informationGaussian and Linear Discriminant Analysis; Multiclass Classification
Gaussian and Linear Discriminant Analysis; Multiclass Classification Professor Ameet Talwalkar Slide Credit: Professor Fei Sha Professor Ameet Talwalkar CS260 Machine Learning Algorithms October 13, 2015
More informationSolving Regression. Jordan Boyd-Graber. University of Colorado Boulder LECTURE 12. Slides adapted from Matt Nedrich and Trevor Hastie
Solving Regression Jordan Boyd-Graber University of Colorado Boulder LECTURE 12 Slides adapted from Matt Nedrich and Trevor Hastie Jordan Boyd-Graber Boulder Solving Regression 1 of 17 Roadmap We talked
More informationGenerative v. Discriminative classifiers Intuition
Logistic Regression Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University September 24 th, 2007 1 Generative v. Discriminative classifiers Intuition Want to Learn: h:x a Y X features
More informationSupport Vector Machines
Support Vector Machines Jordan Boyd-Graber University of Colorado Boulder LECTURE 7 Slides adapted from Tom Mitchell, Eric Xing, and Lauren Hannah Jordan Boyd-Graber Boulder Support Vector Machines 1 of
More informationLogistic Regression & Neural Networks
Logistic Regression & Neural Networks CMSC 723 / LING 723 / INST 725 Marine Carpuat Slides credit: Graham Neubig, Jacob Eisenstein Logistic Regression Perceptron & Probabilities What if we want a probability
More informationMachine Learning. Regression-Based Classification & Gaussian Discriminant Analysis. Manfred Huber
Machine Learning Regression-Based Classification & Gaussian Discriminant Analysis Manfred Huber 2015 1 Logistic Regression Linear regression provides a nice representation and an efficient solution to
More informationOptimization and Gradient Descent
Optimization and Gradient Descent INFO-4604, Applied Machine Learning University of Colorado Boulder September 12, 2017 Prof. Michael Paul Prediction Functions Remember: a prediction function is the function
More informationLogistic Regression: Online, Lazy, Kernelized, Sequential, etc.
Logistic Regression: Online, Lazy, Kernelized, Sequential, etc. Harsha Veeramachaneni Thomson Reuter Research and Development April 1, 2010 Harsha Veeramachaneni (TR R&D) Logistic Regression April 1, 2010
More informationBig Data Analytics. Lucas Rego Drumond
Big Data Analytics Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Predictive Models Predictive Models 1 / 34 Outline
More informationOptimization for Machine Learning
Optimization for Machine Learning Elman Mansimov 1 September 24, 2015 1 Modified based on Shenlong Wang s and Jake Snell s tutorials, with additional contents borrowed from Kevin Swersky and Jasper Snoek
More informationLogistic Regression. COMP 527 Danushka Bollegala
Logistic Regression COMP 527 Danushka Bollegala Binary Classification Given an instance x we must classify it to either positive (1) or negative (0) class We can use {1,-1} instead of {1,0} but we will
More informationVariations of Logistic Regression with Stochastic Gradient Descent
Variations of Logistic Regression with Stochastic Gradient Descent Panqu Wang(pawang@ucsd.edu) Phuc Xuan Nguyen(pxn002@ucsd.edu) January 26, 2012 Abstract In this paper, we extend the traditional logistic
More informationLeast Mean Squares Regression
Least Mean Squares Regression Machine Learning Spring 2018 The slides are mainly from Vivek Srikumar 1 Lecture Overview Linear classifiers What functions do linear classifiers express? Least Squares Method
More informationECS171: Machine Learning
ECS171: Machine Learning Lecture 4: Optimization (LFD 3.3, SGD) Cho-Jui Hsieh UC Davis Jan 22, 2018 Gradient descent Optimization Goal: find the minimizer of a function min f (w) w For now we assume f
More informationAd Placement Strategies
Case Study : Estimating Click Probabilities Intro Logistic Regression Gradient Descent + SGD AdaGrad Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox January 7 th, 04 Ad
More informationSelected Topics in Optimization. Some slides borrowed from
Selected Topics in Optimization Some slides borrowed from http://www.stat.cmu.edu/~ryantibs/convexopt/ Overview Optimization problems are almost everywhere in statistics and machine learning. Input Model
More informationLogistic Regression. Jia-Bin Huang. Virginia Tech Spring 2019 ECE-5424G / CS-5824
Logistic Regression Jia-Bin Huang ECE-5424G / CS-5824 Virginia Tech Spring 2019 Administrative Please start HW 1 early! Questions are welcome! Two principles for estimating parameters Maximum Likelihood
More informationECS171: Machine Learning
ECS171: Machine Learning Lecture 3: Linear Models I (LFD 3.2, 3.3) Cho-Jui Hsieh UC Davis Jan 17, 2018 Linear Regression (LFD 3.2) Regression Classification: Customer record Yes/No Regression: predicting
More informationOutline. Supervised Learning. Hong Chang. Institute of Computing Technology, Chinese Academy of Sciences. Machine Learning Methods (Fall 2012)
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Linear Models for Regression Linear Regression Probabilistic Interpretation
More informationMachine Learning: Chenhao Tan University of Colorado Boulder LECTURE 6
Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 6 Slides adapted from Jordan Boyd-Graber, Chris Ketelsen Machine Learning: Chenhao Tan Boulder 1 of 39 HW1 turned in HW2 released Office
More informationLogistic Regression Introduction to Machine Learning. Matt Gormley Lecture 8 Feb. 12, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Logistic Regression Matt Gormley Lecture 8 Feb. 12, 2018 1 10-601 Introduction
More informationLogistic Regression. William Cohen
Logistic Regression William Cohen 1 Outline Quick review classi5ication, naïve Bayes, perceptrons new result for naïve Bayes Learning as optimization Logistic regression via gradient ascent Over5itting
More informationLinear Classification. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington
Linear Classification CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Example of Linear Classification Red points: patterns belonging
More informationComputational statistics
Computational statistics Lecture 3: Neural networks Thierry Denœux 5 March, 2016 Neural networks A class of learning methods that was developed separately in different fields statistics and artificial
More informationBias-Variance Tradeoff
What s learning, revisited Overfitting Generative versus Discriminative Logistic Regression Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University September 19 th, 2007 Bias-Variance Tradeoff
More informationIntroduction to Machine Learning
Introduction to Machine Learning Logistic Regression Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574
More informationIntroduction to Logistic Regression and Support Vector Machine
Introduction to Logistic Regression and Support Vector Machine guest lecturer: Ming-Wei Chang CS 446 Fall, 2009 () / 25 Fall, 2009 / 25 Before we start () 2 / 25 Fall, 2009 2 / 25 Before we start Feel
More informationSequence Modelling with Features: Linear-Chain Conditional Random Fields. COMP-599 Oct 6, 2015
Sequence Modelling with Features: Linear-Chain Conditional Random Fields COMP-599 Oct 6, 2015 Announcement A2 is out. Due Oct 20 at 1pm. 2 Outline Hidden Markov models: shortcomings Generative vs. discriminative
More informationGenerative Learning. INFO-4604, Applied Machine Learning University of Colorado Boulder. November 29, 2018 Prof. Michael Paul
Generative Learning INFO-4604, Applied Machine Learning University of Colorado Boulder November 29, 2018 Prof. Michael Paul Generative vs Discriminative The classification algorithms we have seen so far
More informationOptimization in the Big Data Regime 2: SVRG & Tradeoffs in Large Scale Learning. Sham M. Kakade
Optimization in the Big Data Regime 2: SVRG & Tradeoffs in Large Scale Learning. Sham M. Kakade Machine Learning for Big Data CSE547/STAT548 University of Washington S. M. Kakade (UW) Optimization for
More informationLogistic Regression Introduction to Machine Learning. Matt Gormley Lecture 9 Sep. 26, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Logistic Regression Matt Gormley Lecture 9 Sep. 26, 2018 1 Reminders Homework 3:
More informationMachine Learning for NLP
Machine Learning for NLP Uppsala University Department of Linguistics and Philology Slides borrowed from Ryan McDonald, Google Research Machine Learning for NLP 1(50) Introduction Linear Classifiers Classifiers
More informationProbabilistic Machine Learning. Industrial AI Lab.
Probabilistic Machine Learning Industrial AI Lab. Probabilistic Linear Regression Outline Probabilistic Classification Probabilistic Clustering Probabilistic Dimension Reduction 2 Probabilistic Linear
More informationLogistic Regression Trained with Different Loss Functions. Discussion
Logistic Regression Trained with Different Loss Functions Discussion CS640 Notations We restrict our discussions to the binary case. g(z) = g (z) = g(z) z h w (x) = g(wx) = + e z = g(z)( g(z)) + e wx =
More informationContents Lecture 4. Lecture 4 Linear Discriminant Analysis. Summary of Lecture 3 (II/II) Summary of Lecture 3 (I/II)
Contents Lecture Lecture Linear Discriminant Analysis Fredrik Lindsten Division of Systems and Control Department of Information Technology Uppsala University Email: fredriklindsten@ituuse Summary of lecture
More informationLecture 2: Logistic Regression and Neural Networks
1/23 Lecture 2: and Neural Networks Pedro Savarese TTI 2018 2/23 Table of Contents 1 2 3 4 3/23 Naive Bayes Learn p(x, y) = p(y)p(x y) Training: Maximum Likelihood Estimation Issues? Why learn p(x, y)
More informationCSE 417T: Introduction to Machine Learning. Lecture 11: Review. Henry Chai 10/02/18
CSE 417T: Introduction to Machine Learning Lecture 11: Review Henry Chai 10/02/18 Unknown Target Function!: # % Training data Formal Setup & = ( ), + ),, ( -, + - Learning Algorithm 2 Hypothesis Set H
More informationECE 5984: Introduction to Machine Learning
ECE 5984: Introduction to Machine Learning Topics: Classification: Logistic Regression NB & LR connections Readings: Barber 17.4 Dhruv Batra Virginia Tech Administrativia HW2 Due: Friday 3/6, 3/15, 11:55pm
More informationGenerative Clustering, Topic Modeling, & Bayesian Inference
Generative Clustering, Topic Modeling, & Bayesian Inference INFO-4604, Applied Machine Learning University of Colorado Boulder December 12-14, 2017 Prof. Michael Paul Unsupervised Naïve Bayes Last week
More informationMachine Learning: Chenhao Tan University of Colorado Boulder LECTURE 16
Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 16 Slides adapted from Jordan Boyd-Graber, Justin Johnson, Andrej Karpathy, Chris Ketelsen, Fei-Fei Li, Mike Mozer, Michael Nielson
More informationMaximum Likelihood, Logistic Regression, and Stochastic Gradient Training
Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Charles Elkan elkan@cs.ucsd.edu January 17, 2013 1 Principle of maximum likelihood Consider a family of probability distributions
More informationNaïve Bayes classification
Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss
More informationLogistic Regression Logistic
Case Study 1: Estimating Click Probabilities L2 Regularization for Logistic Regression Machine Learning/Statistics for Big Data CSE599C1/STAT592, University of Washington Carlos Guestrin January 10 th,
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Matrix Data: Prediction Instructor: Yizhou Sun yzsun@ccs.neu.edu September 14, 2014 Today s Schedule Course Project Introduction Linear Regression Model Decision Tree 2 Methods
More informationLinear Models for Classification
Linear Models for Classification Oliver Schulte - CMPT 726 Bishop PRML Ch. 4 Classification: Hand-written Digit Recognition CHINE INTELLIGENCE, VOL. 24, NO. 24, APRIL 2002 x i = t i = (0, 0, 0, 1, 0, 0,
More informationUniversität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Bayesian Learning. Tobias Scheffer, Niels Landwehr
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning Tobias Scheffer, Niels Landwehr Remember: Normal Distribution Distribution over x. Density function with parameters
More informationAd Placement Strategies
Case Study 1: Estimating Click Probabilities Tackling an Unknown Number of Features with Sketching Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox 2014 Emily Fox January
More informationCSC2515 Winter 2015 Introduction to Machine Learning. Lecture 2: Linear regression
CSC2515 Winter 2015 Introduction to Machine Learning Lecture 2: Linear regression All lecture slides will be available as.pdf on the course website: http://www.cs.toronto.edu/~urtasun/courses/csc2515/csc2515_winter15.html
More informationStatistical Data Mining and Machine Learning Hilary Term 2016
Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes
More informationDeconstructing Data Science
Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture 9: Logistic regression Feb 22, 2016 Generative vs. Discriminative models Generative models specify a joint distribution over the labels
More informationLogistic Regression. Robot Image Credit: Viktoriya Sukhanova 123RF.com
Logistic Regression These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these
More informationLogistic Regression. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms January 25, / 48
Logistic Regression Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms January 25, 2017 1 / 48 Outline 1 Administration 2 Review of last lecture 3 Logistic regression
More informationProbabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016
Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier
More informationIntroduction to Logistic Regression
Introduction to Logistic Regression Guy Lebanon Binary Classification Binary classification is the most basic task in machine learning, and yet the most frequent. Binary classifiers often serve as the
More informationEXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING
EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: August 30, 2018, 14.00 19.00 RESPONSIBLE TEACHER: Niklas Wahlström NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical
More informationStatistical Machine Learning Hilary Term 2018
Statistical Machine Learning Hilary Term 2018 Pier Francesco Palamara Department of Statistics University of Oxford Slide credits and other course material can be found at: http://www.stats.ox.ac.uk/~palamara/sml18.html
More informationNeural Networks: Backpropagation
Neural Networks: Backpropagation Machine Learning Fall 2017 Based on slides and material from Geoffrey Hinton, Richard Socher, Dan Roth, Yoav Goldberg, Shai Shalev-Shwartz and Shai Ben-David, and others
More informationLinear Discrimination Functions
Laurea Magistrale in Informatica Nicola Fanizzi Dipartimento di Informatica Università degli Studi di Bari November 4, 2009 Outline Linear models Gradient descent Perceptron Minimum square error approach
More informationNaïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability
Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish
More informationIntroduction to Machine Learning
1, DATA11002 Introduction to Machine Learning Lecturer: Teemu Roos TAs: Ville Hyvönen and Janne Leppä-aho Department of Computer Science University of Helsinki (based in part on material by Patrik Hoyer
More informationThe classifier. Theorem. where the min is over all possible classifiers. To calculate the Bayes classifier/bayes risk, we need to know
The Bayes classifier Theorem The classifier satisfies where the min is over all possible classifiers. To calculate the Bayes classifier/bayes risk, we need to know Alternatively, since the maximum it is
More informationThe classifier. Linear discriminant analysis (LDA) Example. Challenges for LDA
The Bayes classifier Linear discriminant analysis (LDA) Theorem The classifier satisfies In linear discriminant analysis (LDA), we make the (strong) assumption that where the min is over all possible classifiers.
More informationLogistic Regression. Machine Learning Fall 2018
Logistic Regression Machine Learning Fall 2018 1 Where are e? We have seen the folloing ideas Linear models Learning as loss minimization Bayesian learning criteria (MAP and MLE estimation) The Naïve Bayes
More informationLogistic Regression. Some slides adapted from Dan Jurfasky and Brendan O Connor
Logistic Regression Some slides adapted from Dan Jurfasky and Brendan O Connor Naïve Bayes Recap Bag of words (order independent) Features are assumed independent given class P (x 1,...,x n c) =P (x 1
More informationRegression with Numerical Optimization. Logistic
CSG220 Machine Learning Fall 2008 Regression with Numerical Optimization. Logistic regression Regression with Numerical Optimization. Logistic regression based on a document by Andrew Ng October 3, 204
More informationOverfitting, Bias / Variance Analysis
Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic
More informationLinear and logistic regression
Linear and logistic regression Guillaume Obozinski Ecole des Ponts - ParisTech Master MVA Linear and logistic regression 1/22 Outline 1 Linear regression 2 Logistic regression 3 Fisher discriminant analysis
More informationIntroduction to Machine Learning. Regression. Computer Science, Tel-Aviv University,
1 Introduction to Machine Learning Regression Computer Science, Tel-Aviv University, 2013-14 Classification Input: X Real valued, vectors over real. Discrete values (0,1,2,...) Other structures (e.g.,
More informationClassification Based on Probability
Logistic Regression These slides were assembled by Byron Boots, with only minor modifications from Eric Eaton s slides and grateful acknowledgement to the many others who made their course materials freely
More informationApplied Machine Learning Lecture 5: Linear classifiers, continued. Richard Johansson
Applied Machine Learning Lecture 5: Linear classifiers, continued Richard Johansson overview preliminaries logistic regression training a logistic regression classifier side note: multiclass linear classifiers
More informationMachine Learning
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 4, 2015 Today: Generative discriminative classifiers Linear regression Decomposition of error into
More informationPerceptron (Theory) + Linear Regression
10601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Perceptron (Theory) Linear Regression Matt Gormley Lecture 6 Feb. 5, 2018 1 Q&A
More informationLecture 3 - Linear and Logistic Regression
3 - Linear and Logistic Regression-1 Machine Learning Course Lecture 3 - Linear and Logistic Regression Lecturer: Haim Permuter Scribe: Ziv Aharoni Throughout this lecture we talk about how to use regression
More informationWarm up: risk prediction with logistic regression
Warm up: risk prediction with logistic regression Boss gives you a bunch of data on loans defaulting or not: {(x i,y i )} n i= x i 2 R d, y i 2 {, } You model the data as: P (Y = y x, w) = + exp( yw T
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table
More informationMachine Learning (CSE 446): Multi-Class Classification; Kernel Methods
Machine Learning (CSE 446): Multi-Class Classification; Kernel Methods Sham M Kakade c 2018 University of Washington cse446-staff@cs.washington.edu 1 / 12 Announcements HW3 due date as posted. make sure
More informationMachine Learning
Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 1, 2011 Today: Generative discriminative classifiers Linear regression Decomposition of error into
More informationCSCI-567: Machine Learning (Spring 2019)
CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 March 19, 2019 1 / 43 Administration March 19, 2019 2 / 43 Administration TA3 is due this week March
More informationMachine Learning. Lecture 3: Logistic Regression. Feng Li.
Machine Learning Lecture 3: Logistic Regression Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2016 Logistic Regression Classification
More informationBANA 7046 Data Mining I Lecture 6. Other Data Mining Algorithms 1
BANA 7046 Data Mining I Lecture 6. Other Data Mining Algorithms 1 Shaobo Li University of Cincinnati 1 Partially based on Hastie, et al. (2009) ESL, and James, et al. (2013) ISLR Data Mining I Lecture
More information