Maximum Likelihood Estimation


INFO-2301: Quantitative Reasoning 2
Michael Paul and Jordan Boyd-Graber
March 7, 2017

Why MLE?

Before: Distribution + Parameter → x
Now: x + Distribution → Parameter (much more realistic)
But: says nothing about how good a fit the distribution is

Likelihood

The likelihood is $p(x; \theta)$.
We want the estimate of $\theta$ that best explains the data we have seen,
i.e., the Maximum Likelihood Estimate (MLE).

Likelihood

The likelihood function refers to the PMF (discrete) or PDF (continuous).
For discrete distributions, the likelihood of $x$ is $P(X = x)$.
For continuous distributions, the likelihood of $x$ is the density $f(x)$.
We will often say "likelihood" rather than probability/mass/density so that the term applies to either scenario.
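To make the discrete/continuous distinction concrete, here is a minimal sketch (not from the slides; it assumes SciPy and uses a Binomial and a standard normal as stand-in distributions):

```python
from scipy import stats

# Discrete case: the likelihood of x = 3 is P(X = 3), here under a
# Binomial(10, 0.5) distribution.
lik_discrete = stats.binom.pmf(3, n=10, p=0.5)

# Continuous case: the likelihood of x = 1.2 is the density f(1.2),
# here under a standard normal.
lik_continuous = stats.norm.pdf(1.2, loc=0.0, scale=1.0)

print(lik_discrete, lik_continuous)
```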

Optimizing Unconstrained Functions

Suppose we wanted to optimize

$l = x^2 - 2x + 2$ (1)

$\frac{\partial l}{\partial x} = 2x - 2$ (2)

Setting the derivative to zero:

$\frac{\partial l}{\partial x} = 0$ (3)

$2x - 2 = 0$ (4)

$x = 1$ (5)

(We should also check the second derivative: it must be negative at a maximum. Here $\partial^2 l / \partial x^2 = 2 > 0$, so $x = 1$ is a minimum.)
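As a sanity check on the calculus (my addition, assuming SciPy is available), a numerical optimizer finds the same critical point:

```python
from scipy.optimize import minimize_scalar

# l(x) = x^2 - 2x + 2; setting dl/dx = 2x - 2 to zero gave x = 1.
l = lambda x: x**2 - 2*x + 2

result = minimize_scalar(l)
print(result.x)  # ~1.0, matching the analytic solution
```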

Optimizing Constrained Functions

Theorem (Lagrange Multiplier Method): Given functions $f(x_1, \ldots, x_n)$ and $g(x_1, \ldots, x_n)$, the critical points of $f$ restricted to the set $g = 0$ are solutions to the equations:

$\frac{\partial f}{\partial x_i}(x_1, \ldots, x_n) = \lambda \frac{\partial g}{\partial x_i}(x_1, \ldots, x_n)$ for each $i$

$g(x_1, \ldots, x_n) = 0$

This is $n + 1$ equations in the $n + 1$ variables $x_1, \ldots, x_n, \lambda$.

Lagrange Example

Maximize $l(x, y) = \sqrt{xy}$ subject to the constraint $20x + 10y = 200$.

Compute derivatives:

$\frac{\partial l}{\partial x} = \frac{1}{2}\sqrt{\frac{y}{x}} \qquad \frac{\partial l}{\partial y} = \frac{1}{2}\sqrt{\frac{x}{y}} \qquad \frac{\partial g}{\partial x} = 20 \qquad \frac{\partial g}{\partial y} = 10$

Create the new system of equations:

$\frac{1}{2}\sqrt{\frac{y}{x}} = 20\lambda$

$\frac{1}{2}\sqrt{\frac{x}{y}} = 10\lambda$

$20x + 10y = 200$

Dividing the first equation by the second gives us

$\frac{y}{x} = 2$ (6)

which means $y = 2x$; plugging this into the constraint equation gives:

$20x + 10(2x) = 200$

$x = 5, \quad y = 10$
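The same answer falls out of a numerical constrained optimizer; this is a sketch (my addition, assuming SciPy), minimizing the negative objective under the equality constraint:

```python
import numpy as np
from scipy.optimize import minimize

# Maximize l(x, y) = sqrt(x * y) subject to 20x + 10y = 200 by
# minimizing -l under an equality constraint (SLSQP handles this).
neg_l = lambda v: -np.sqrt(v[0] * v[1])
budget = {"type": "eq", "fun": lambda v: 20 * v[0] + 10 * v[1] - 200}

result = minimize(neg_l, x0=[1.0, 1.0], constraints=[budget],
                  bounds=[(1e-9, None), (1e-9, None)])
print(result.x)  # ~[5, 10], matching the Lagrange solution
```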


Continuous Distribution: Gaussian

Recall the density function

$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$ (1)

Taking the log makes the math easier and doesn't change the answer (log is monotonic).

If we observe $x_1, \ldots, x_N$, then the log-likelihood is

$l(\mu, \sigma) = -N \log\sigma - \frac{N}{2}\log(2\pi) - \frac{1}{2\sigma^2}\sum_i (x_i - \mu)^2$ (2)
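Equation (2) translates line for line into code; a minimal sketch (my addition, assuming NumPy and made-up data):

```python
import numpy as np

def gaussian_log_likelihood(x, mu, sigma):
    """l(mu, sigma) = -N log sigma - (N/2) log(2 pi)
    - (1 / (2 sigma^2)) sum_i (x_i - mu)^2, as in equation (2)."""
    x = np.asarray(x)
    N = len(x)
    return (-N * np.log(sigma)
            - N / 2 * np.log(2 * np.pi)
            - np.sum((x - mu) ** 2) / (2 * sigma ** 2))

x = np.array([2.1, 1.9, 2.4, 2.0])       # hypothetical observations
print(gaussian_log_likelihood(x, mu=2.1, sigma=0.2))
```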

MLE of Gaussian $\mu$

$l(\mu, \sigma) = -N \log\sigma - \frac{N}{2}\log(2\pi) - \frac{1}{2\sigma^2}\sum_i (x_i - \mu)^2$ (3)

$\frac{\partial l}{\partial \mu} = 0 + \frac{1}{\sigma^2}\sum_i (x_i - \mu)$ (4)

Solve for $\mu$:

$0 = \frac{1}{\sigma^2}\sum_i (x_i - \mu)$ (5)

$0 = \sum_i x_i - N\mu$ (6)

$\mu = \frac{\sum_i x_i}{N}$ (7)

Consistent with what we said before.
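A quick empirical check of equation (7) (my addition; the data are made up): the closed-form estimate is the sample mean, and a grid search over the log-likelihood peaks at the same place.

```python
import numpy as np

x = np.array([2.1, 1.9, 2.4, 2.0])   # hypothetical observations

# Closed form from the derivation: mu_hat = (sum_i x_i) / N
mu_hat = x.sum() / len(x)            # same as x.mean()

# Grid search over mu; terms of l that are constant in mu are dropped.
grid = np.linspace(1.0, 3.0, 2001)
ll = [-np.sum((x - m) ** 2) for m in grid]
print(mu_hat, grid[np.argmax(ll)])   # both ~2.1
```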

MLE of Gaussian $\sigma$

$l(\mu, \sigma) = -N \log\sigma - \frac{N}{2}\log(2\pi) - \frac{1}{2\sigma^2}\sum_i (x_i - \mu)^2$ (8)

$\frac{\partial l}{\partial \sigma} = -\frac{N}{\sigma} + \frac{1}{\sigma^3}\sum_i (x_i - \mu)^2$ (9)

Solve for $\sigma$:

$0 = -\frac{N}{\sigma} + \frac{1}{\sigma^3}\sum_i (x_i - \mu)^2$ (10)

$\frac{N}{\sigma} = \frac{1}{\sigma^3}\sum_i (x_i - \mu)^2$ (11)

$\sigma^2 = \frac{\sum_i (x_i - \mu)^2}{N}$ (12)

Consistent with what we said before.
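Equation (12) is exactly what NumPy's default variance computes (dividing by $N$, not $N - 1$); a small check (my addition, made-up data):

```python
import numpy as np

x = np.array([2.1, 1.9, 2.4, 2.0])   # hypothetical observations
mu_hat = x.mean()

# Closed form from the derivation: sigma^2 = sum_i (x_i - mu)^2 / N
sigma2_mle = np.sum((x - mu_hat) ** 2) / len(x)

print(sigma2_mle, np.var(x))         # identical: np.var divides by N
print(np.var(x, ddof=1))             # the unbiased N - 1 version
```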



Discrete Distribution: Multinomial

Recall the mass function ($N$ is the total number of observations, $x_i$ is the count for cell $i$, and $\theta_i$ is the probability of cell $i$):

$p(\vec{x} \mid \vec{\theta}) = \frac{N!}{\prod_i x_i!} \prod_i \theta_i^{x_i}$ (2)

Taking the log makes the math easier and doesn't change the answer (monotonic).

If we observe counts $x_1, \ldots, x_K$, then the log-likelihood is

$l(\vec{\theta}) = \log(N!) - \sum_i \log(x_i!) + \sum_i x_i \log\theta_i$ (3)
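Equation (3) can be evaluated directly (using log-Gamma for the factorials) and cross-checked against SciPy's built-in log PMF; a sketch with made-up counts (my addition):

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import multinomial

x = np.array([3, 5, 2])              # hypothetical counts, N = 10
theta = np.array([0.3, 0.5, 0.2])

# Equation (3), with log(n!) computed as gammaln(n + 1)
ll = (gammaln(x.sum() + 1) - gammaln(x + 1).sum()
      + np.sum(x * np.log(theta)))

print(ll, multinomial.logpmf(x, n=x.sum(), p=theta))  # same value
```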

MLE of Multinomial $\theta$

$l(\vec{\theta}) = \log(N!) - \sum_i \log(x_i!) + \sum_i x_i \log\theta_i$ (4)

$\quad + \lambda\left(1 - \sum_i \theta_i\right)$ (5)

Where did this last term come from? The constraint that $\theta$ must be a distribution: $\sum_i \theta_i = 1$.

$\frac{\partial l}{\partial \theta_i} = \frac{x_i}{\theta_i} - \lambda$

$\frac{\partial l}{\partial \lambda} = 1 - \sum_i \theta_i$

MLE of Multinomial $\theta$

We have a system of equations:

$\theta_1 = \frac{x_1}{\lambda}$ (6)

$\vdots$ (7)

$\theta_K = \frac{x_K}{\lambda}$ (8)

$\sum_i \theta_i = 1$ (9)

So let's substitute the first $K$ equations into the last:

$\sum_i \frac{x_i}{\lambda} = 1$ (10)

So $\lambda = \sum_i x_i = N$, and $\theta_i = \frac{x_i}{N}$.
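The closed form $\theta_i = x_i / N$ can be cross-checked by maximizing the log-likelihood numerically on the simplex, reusing the constrained-optimization recipe from earlier (my addition, assuming SciPy, made-up counts):

```python
import numpy as np
from scipy.optimize import minimize

x = np.array([3, 5, 2])          # hypothetical counts
theta_hat = x / x.sum()          # MLE from the derivation: x_i / N

# Numerical cross-check: maximize sum_i x_i log(theta_i) subject to
# the constraint that theta sums to one.
neg_ll = lambda t: -np.sum(x * np.log(t))
simplex = {"type": "eq", "fun": lambda t: t.sum() - 1}
result = minimize(neg_ll, x0=np.ones(3) / 3, constraints=[simplex],
                  bounds=[(1e-9, 1.0)] * 3)

print(theta_hat, result.x)       # both ~[0.3, 0.5, 0.2]
```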


Big Picture

Ran through several common examples.
For existing distributions you can (and should) look up the MLE.
For new models, you can't (foreshadowing of later in the class):
Classification models
Unsupervised models (Expectation-Maximization)
Not always so easy

Classification

Classification can be viewed as $p(y \mid x, \theta)$.
Have $x, y$; need $\theta$.
Discovering $\theta$ is also a problem of MLE.

Clustering

Clustering can be viewed as $p(x \mid z, \theta)$.
Have $x$; need $z, \theta$.
$z$ is guessed at iteratively (Expectation); $\theta$ is estimated to maximize the likelihood (Maximization).

Not Always So Easy: Bias

An estimator is biased if $E[\hat{\theta}] \neq \theta$.
We won't prove it, but the MLE for the variance is biased.
The bias comes from estimating $\mu$ from the same data, which shrinks the apparent variance, so the unbiased estimator divides by $N - 1$:

$\hat{\sigma}^2 = \frac{1}{N - 1}\sum_i (x_i - \hat{\mu})^2$ (1)
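The bias is easy to see empirically; a simulation sketch (my addition, assuming NumPy): for $N = 5$, the MLE underestimates $\sigma^2$ by a factor of $(N - 1)/N = 0.8$ on average.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2_true = 4.0
N, trials = 5, 100_000

samples = rng.normal(0.0, np.sqrt(sigma2_true), size=(trials, N))
mle = samples.var(axis=1)                 # divides by N (the MLE)
unbiased = samples.var(axis=1, ddof=1)    # divides by N - 1

print(mle.mean(), unbiased.mean())        # ~3.2 vs ~4.0
```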

Not Always So Easy: Intractable Likelihoods

It is not always possible to solve for the optimal estimator in closed form.
Use gradient optimization (we'll see this for logistic regression).
Use other approximations (e.g., Monte Carlo sampling).
This is a whole subfield of statistics / information science.
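As a sketch of the gradient recipe (my addition; the Gaussian mean actually has a closed form, so it serves only to illustrate the loop you would run when none exists):

```python
import numpy as np

x = np.array([2.1, 1.9, 2.4, 2.0])    # hypothetical observations

def grad_ll(mu):
    # d/d mu of the Gaussian log-likelihood with sigma fixed at 1
    return np.sum(x - mu)

mu, step = 0.0, 0.05
for _ in range(200):
    mu += step * grad_ll(mu)          # step uphill on the likelihood
print(mu)                             # converges to the sample mean, 2.1
```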
