Logistic Regression. INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber SLIDES ADAPTED FROM HINRICH SCHÜTZE


1 Logistic Regression. INFO-2301: Quantitative Reasoning 2. Michael Paul and Jordan Boyd-Graber. Slides adapted from Hinrich Schütze.

2 What are we talking about? Statistical classification: p(y|x). y is typically a Bernoulli or multinomial outcome. Classification uses: ad placement, spam detection. Classification is a building block of other machine learning methods.

3 Logistic Regression: Definition
Weight vector β_i; observations X_i; bias β_0 (like the intercept in linear regression).

P(Y = 0 | X) = 1 / (1 + exp[β_0 + Σ_i β_i X_i])    (1)
P(Y = 1 | X) = exp[β_0 + Σ_i β_i X_i] / (1 + exp[β_0 + Σ_i β_i X_i])    (2)

For shorthand, we'll say that

P(Y = 0 | X) = σ(−(β_0 + Σ_i β_i X_i))    (3)
P(Y = 1 | X) = 1 − σ(−(β_0 + Σ_i β_i X_i))    (4)

where σ(z) = 1 / (1 + exp[−z]).
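
Written out as code, the definition is just a dot product pushed through the logistic function. A minimal sketch (not part of the original slides), assuming NumPy arrays and hypothetical names beta0 and beta for the bias and weight vector:

```python
import numpy as np

def sigmoid(z):
    """Logistic function: sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(beta0, beta, x):
    """Return (P(Y=0|X), P(Y=1|X)) for one feature vector x."""
    z = beta0 + np.dot(beta, x)   # beta_0 + sum_i beta_i X_i
    p1 = sigmoid(z)               # equation (2): P(Y = 1 | X)
    return 1.0 - p1, p1           # equation (1) follows, since the two probabilities sum to 1
```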

4-5 What's this exp doing?
Exponential: exp[x] is shorthand for e^x. e is a special number, about 2.71828. e^x is the limit of the compound interest formula as the compounding periods become infinitely small, and it is the function whose derivative is itself.
Logistic: the logistic function is σ(z) = 1 / (1 + e^(−z)). It looks like an "S" and is always between 0 and 1, which lets us model probabilities. This is different from linear regression.

6-8 Logistic Regression Example

feature    coefficient    weight
bias       β_0             0.1
viagra     β_1             2.0
mother     β_2            -1.0
work       β_3            -0.5
nigeria    β_4             3.0

What does Y = 1 mean?

Example 1: Empty document, X = {}
P(Y = 0) = 1 / (1 + exp[0.1]) = 0.48
P(Y = 1) = exp[0.1] / (1 + exp[0.1]) = 0.52
The bias β_0 encodes the prior probability of a class.

9-11 Logistic Regression Example (continued)

Example 2: X = {Mother, Nigeria}
P(Y = 0) = 1 / (1 + exp[0.1 − 1.0 + 3.0]) = 1 / (1 + exp[2.1]) = 0.11
P(Y = 1) = exp[2.1] / (1 + exp[2.1]) = 0.89
Include the bias, and sum the weights of the features that appear.

12-14 Logistic Regression Example (continued)

Example 3: X = {Mother, Work, Viagra, Mother}
P(Y = 0) = 1 / (1 + exp[0.1 − 1.0 − 0.5 + 2.0 − 1.0]) = 1 / (1 + exp[−0.4]) = 0.60
P(Y = 1) = exp[−0.4] / (1 + exp[−0.4]) = 0.40
Multiply each feature's count by its weight (mother appears twice).
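
A short sketch that reproduces the three worked examples, using the weights from the table above; treat the code and names as illustrative rather than part of the slides:

```python
import math

# Weights from the example table above.
BIAS = 0.1
WEIGHTS = {"viagra": 2.0, "mother": -1.0, "work": -0.5, "nigeria": 3.0}

def p_spam(words):
    """P(Y = 1 | X) for a document given as a list of words; repeated words count once per occurrence."""
    z = BIAS + sum(WEIGHTS.get(w.lower(), 0.0) for w in words)
    return 1.0 / (1.0 + math.exp(-z))

print(round(p_spam([]), 2))                                      # 0.52 (empty document)
print(round(p_spam(["Mother", "Nigeria"]), 2))                   # 0.89
print(round(p_spam(["Mother", "Work", "Viagra", "Mother"]), 2))  # 0.40
```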

15 Logistic Regression. INFO-2301: Quantitative Reasoning 2. Michael Paul and Jordan Boyd-Graber.

16-17 Logistic Regression: Objective Function

ℓ ≡ ln p(Y | X, β) = Σ_j ln p(y^(j) | x^(j), β)    (1)
  = Σ_j [ y^(j) (β_0 + Σ_i β_i x_i^(j)) − ln(1 + exp[β_0 + Σ_i β_i x_i^(j)]) ]    (2)

The training data (y, x) are fixed. The objective function is a function of β: what values of β give a good value?
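
The objective is easy to evaluate once the data are in arrays. A minimal sketch (an assumption, not from the slides) where X is an n-by-d NumPy array of feature values and y is a 0/1 label vector:

```python
import numpy as np

def log_likelihood(beta0, beta, X, y):
    """Equation (2): sum_j [ y_j * z_j - log(1 + exp(z_j)) ] with z_j = beta0 + beta . x_j."""
    z = beta0 + X @ beta
    return np.sum(y * z - np.logaddexp(0.0, z))  # logaddexp(0, z) = log(1 + exp(z)), computed stably
```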

18-19 Convexity
Convex function: it doesn't matter where you start; if you keep walking up the objective, you reach the best value. How do we know which way is up? The gradient!

20-29 Gradient Ascent (non-convex)
Goal: optimize the log likelihood with respect to the variables β.
[Figure: an objective plotted against a parameter. Starting from an initial point, numbered gradient steps climb to a nearby maximum, while a better region of the objective (labeled "Undiscovered Country") is never reached.]
Luckily, (vanilla) logistic regression is convex.

30 Logistic Regression. INFO-2301: Quantitative Reasoning 2. Michael Paul and Jordan Boyd-Graber. Slides adapted from William Cohen.

31 Gradient for Logistic Regression
To ease notation, let's define

π_i = exp[β^T x_i] / (1 + exp[β^T x_i])    (1)

Our objective function is

ℓ = Σ_i log p(y_i | x_i) = Σ_i ℓ_i, where ℓ_i = log π_i if y_i = 1, and ℓ_i = log(1 − π_i) if y_i = 0.    (2)

32 Taking the Derivative
Apply the chain rule:

∂ℓ/∂β_j = Σ_i ∂ℓ_i/∂β_j = Σ_i { (1/π_i) ∂π_i/∂β_j if y_i = 1;  −(1/(1 − π_i)) ∂π_i/∂β_j if y_i = 0 }    (3)

The derivative of π_i is

∂π_i/∂β_j = π_i (1 − π_i) x_{i,j}.    (4)

If we plug this in, the two cases merge:

∂ℓ_i/∂β_j = (y_i − π_i) x_{i,j}.    (5)
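
Equation (5) means the whole gradient is just a residual vector multiplied through the feature matrix. A sketch under the same array assumptions as the earlier snippets:

```python
import numpy as np

def gradient(beta0, beta, X, y):
    """dl/dbeta_j = sum_i (y_i - pi_i) x_{i,j}; the bias gets an implicit feature value of 1."""
    pi = 1.0 / (1.0 + np.exp(-(beta0 + X @ beta)))  # pi_i = P(y_i = 1 | x_i)
    residual = y - pi                               # (y_i - pi_i) for every example
    return residual.sum(), X.T @ residual           # (gradient for beta0, gradient for each beta_j)
```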

33-36 Gradient for Logistic Regression
Gradient:

∇_β ℓ(β) = [ ∂ℓ(β)/∂β_0, ..., ∂ℓ(β)/∂β_n ]    (6)

Update:

Δβ = η ∇_β ℓ(β)    (7)
β_i ← β_i + η ∂ℓ(β)/∂β_i    (8)

Why are we adding? What would we do if we wanted to do descent instead?
η is the step size; it must be greater than zero.
NB: conjugate gradient is usually better, but harder to implement.
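
Combining the gradient with update (8) gives plain (batch) gradient ascent. A sketch that reuses the gradient function from the previous block; the step size eta and iteration count are illustrative choices, not values from the slides:

```python
import numpy as np

def gradient_ascent(X, y, eta=0.1, iterations=1000):
    """Repeatedly step uphill along the gradient of the log likelihood."""
    beta0, beta = 0.0, np.zeros(X.shape[1])
    for _ in range(iterations):
        g0, g = gradient(beta0, beta, X, y)  # direction of steepest ascent
        beta0 += eta * g0                    # ascent: add the gradient (descent would subtract)
        beta += eta * g
    return beta0, beta
```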

37-41 Choosing Step Size
[Figure: the same objective-versus-parameter curve, showing successive gradient steps under different choices of step size.]

42-44 Approximating the Gradient
Our datasets are big (too big to fit into memory)... or the data are changing / streaming.
That makes it hard to compute the true gradient,

∇ℓ(β) = E_x[ ∇ℓ(β, x) ],    (9)

an average over all observations. What if we compute an update just from one observation?

45 Getting to Union Station
Pretend it's a pre-smartphone world and you want to get to Union Station.

46-47 Stochastic Gradient for Logistic Regression
Given a single observation x_i chosen at random from the dataset,

β_j ← β_j + η [y_i − π_i] x_{i,j}    (10)

Examples in class.

48 Algorithm
1. Initialize a vector β to be all zeros.
2. For t = 1, ..., T:
   For each example (x_i, y_i) and feature j:
     Compute π_i ≡ Pr(y_i = 1 | x_i)
     Set β[j] = β[j] + λ (y_i − π_i) x_{i,j}   (λ is the step size, written η above)
3. Output the parameters β_1, ..., β_d.
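
A runnable sketch of the algorithm above; eta plays the role of λ, and the randomized example order is an added detail not specified on the slide:

```python
import numpy as np

def sgd_logistic(X, y, eta=0.1, epochs=5, seed=0):
    """Stochastic gradient ascent for logistic regression, following the algorithm above."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    beta = np.zeros(d + 1)                        # beta[0] is the bias, initialized to zeros
    for _ in range(epochs):                       # t = 1, ..., T
        for i in rng.permutation(n):              # one example at a time
            z = beta[0] + X[i] @ beta[1:]
            pi = 1.0 / (1.0 + np.exp(-z))         # pi_i = Pr(y_i = 1 | x_i)
            beta[0] += eta * (y[i] - pi)          # bias update (implicit feature value 1)
            beta[1:] += eta * (y[i] - pi) * X[i]  # beta[j] += eta (y_i - pi_i) x_{i,j}
    return beta
```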

49 Wrapup
Logistic regression: regression for outputting probabilities. The intuitions are similar to linear regression. We'll talk about feature engineering for both next time.
