Maximum Likelihood (ML), Expectation Maximization (EM)


1 Maximum Likelihood (ML), Expectation Maximization (EM) Pieter Abbeel UC Berkeley EECS Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics

2 Outline Maximum likelihood (ML) Priors, and maximum a posteriori (MAP) Cross-validation Expectation Maximization (EM)

3 Thumbtack Let θ = P(up), 1-θ = P(down). How to determine θ? Empirical estimate: 8 up, 2 down → θ = 8/10 = 0.8

4 http://web.me.com/todd6ton/site/classroom_blog/entries/2009/10/7_A_Thumbtack_Experiment.html

5 Maximum Likelihood θ = P(up), 1-θ = P(down). Observe 8 up, 2 down. The likelihood of the observation sequence depends on θ: L(θ) = θ^8 (1-θ)^2. Maximum likelihood finds the extrema: θ = 0, θ = 1, θ = 0.8. Inspection of each extremum yields θ_ML = 0.8.

6 Maximum Likelihood More generally, consider a binary-valued random variable with θ = P(1), 1-θ = P(0), and assume we observe n_1 ones and n_0 zeros. Likelihood: L(θ) = θ^{n_1} (1-θ)^{n_0}. Derivative: dL/dθ = n_1 θ^{n_1-1} (1-θ)^{n_0} - n_0 θ^{n_1} (1-θ)^{n_0-1}. Hence for the extrema: θ = n_1/(n_0+n_1) is the maximum, i.e. the empirical fraction of ones.
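As a quick check of this closed form, here is a minimal Python/NumPy sketch (not from the slides; the counts are the 8-up/2-down thumbtack data): it computes θ_ML = n_1/(n_0 + n_1) and verifies it against a grid search over the log-likelihood.

import numpy as np

def bernoulli_ml(n1, n0):
    # Closed-form ML estimate: empirical fraction of ones.
    return n1 / (n1 + n0)

def log_likelihood(theta, n1, n0):
    # Log-likelihood of n1 ones and n0 zeros under parameter theta.
    return n1 * np.log(theta) + n0 * np.log(1 - theta)

n1, n0 = 8, 2                               # thumbtack data: 8 up, 2 down
theta_ml = bernoulli_ml(n1, n0)             # 0.8
grid = np.linspace(0.01, 0.99, 99)
assert log_likelihood(theta_ml, n1, n0) >= log_likelihood(grid, n1, n0).max() - 1e-9
print(theta_ml)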

7 Log-likelihood The function log(x) is a monotonically increasing function of x. Hence for any (positive-valued) function f: argmax_x f(x) = argmax_x log f(x). It is often more convenient to optimize the log-likelihood rather than the likelihood. Example: log(θ^{n_1} (1-θ)^{n_0}) = n_1 log θ + n_0 log(1-θ).

8 Log-likelihood ↔ Likelihood Reconsider thumbtacks: 8 up, 2 down. Likelihood θ^8 (1-θ)^2: not concave. Log-likelihood 8 log θ + 2 log(1-θ): concave. Definition: a function f is concave if and only if f(λx_1 + (1-λ)x_2) ≥ λ f(x_1) + (1-λ) f(x_2) for all x_1, x_2 and λ ∈ [0,1]. Concave functions are generally easier to maximize than non-concave functions.

9 Concavity and Convexity f is concave if and only if f(λx_1 + (1-λ)x_2) ≥ λ f(x_1) + (1-λ) f(x_2) for all x_1, x_2 and λ ∈ [0,1]: easy to maximize. f is convex if and only if f(λx_1 + (1-λ)x_2) ≤ λ f(x_1) + (1-λ) f(x_2): easy to minimize.

10 ML for Multinomial Consider having received samples
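In the multinomial case the ML estimate works out to the empirical frequencies; a minimal NumPy sketch with hypothetical integer-coded samples (not from the slides):

import numpy as np

def multinomial_ml(samples, num_categories):
    # ML estimate theta_k = (count of category k) / (total count).
    counts = np.bincount(samples, minlength=num_categories)
    return counts / counts.sum()

print(multinomial_ml(np.array([0, 2, 2, 1, 2]), 3))   # [0.2 0.2 0.6]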

11 ML for Fully Observed HMM Given samples. Dynamics model: Observation model: → Independent ML problems for each dynamics distribution and each observation distribution.

12 ML for Exponential Distribution Source: Wikipedia Consider having received samples 3.1, 8.2, 1.7. Log-likelihood:

13 ML for Exponential Distribution Source: Wikipedia Consider having received samples
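For the exponential density p(x; λ) = λ exp(-λx), setting the derivative of the log-likelihood to zero gives λ_ML = n / Σ_i x_i, i.e. one over the sample mean. A minimal sketch using the samples 3.1, 8.2, 1.7 from the earlier slide:

import numpy as np

samples = np.array([3.1, 8.2, 1.7])
lam_ml = 1.0 / samples.mean()    # lambda_ML = n / sum(x) = 1 / sample mean
print(lam_ml)                    # ~0.23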

14 Uniform Consider having received samples

15 ML for Gaussian Consider having received samples
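For the Gaussian, the ML estimates are the sample mean and the divide-by-n (biased) sample variance; a minimal sketch with hypothetical samples:

import numpy as np

def gaussian_ml(x):
    mu = x.mean()
    sigma2 = ((x - mu) ** 2).mean()   # ML variance divides by n, not n - 1
    return mu, sigma2

print(gaussian_ml(np.array([1.2, 0.7, 2.3, 1.9, 1.4])))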

16 ML for Conditional Gaussian Equivalently: More generally:

17 ML for Conditional Gaussian

18 ML for Conditional Multivariate Gaussian
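A minimal sketch of the conditional multivariate Gaussian case, under the assumption that the model is y = A x + b + w with w ~ N(0, Σ): the ML estimates of A and b come from least squares on the paired samples, and Σ is the empirical covariance of the residuals (the data below are synthetic).

import numpy as np

def conditional_gaussian_ml(X, Y):
    # X: (m, dx) inputs, Y: (m, dy) outputs. Returns A, b, Sigma for y = A x + b + w.
    m = X.shape[0]
    X1 = np.hstack([X, np.ones((m, 1))])        # append a column of ones for the offset b
    W, *_ = np.linalg.lstsq(X1, Y, rcond=None)  # least squares: min ||X1 W - Y||^2
    A, b = W[:-1].T, W[-1]
    resid = Y - X1 @ W
    Sigma = resid.T @ resid / m                 # ML (divide-by-m) residual covariance
    return A, b, Sigma

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
A_true, b_true = np.array([[1.0, -2.0], [0.5, 3.0]]), np.array([0.3, -1.0])
Y = X @ A_true.T + b_true + 0.1 * rng.normal(size=(500, 2))
A, b, Sigma = conditional_gaussian_ml(X, Y)
print(A, b)    # close to A_true, b_true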

19 Aside: Key Identities for Derivation on Previous Slide

20 ML Estimation in Fully Observed Linear Gaussian Bayes Filter Setting Consider the linear Gaussian setting: Fully observed, i.e., given → Two separate ML estimation problems for conditional multivariate Gaussians: 1: 2:

21 Priors Thumbtack Let θ = P(up), 1-θ = P(down). How to determine θ? ML estimate: 5 up, 0 down → θ_ML = 1. Laplace estimate: add a fake count of 1 for each outcome → θ = (5+1)/(5+0+2) = 6/7.
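A one-line sketch of the Laplace estimate: adding one fake count per outcome turns 5 up / 0 down into (5 + 1)/(5 + 0 + 2) ≈ 0.86 instead of the degenerate ML estimate 1.0.

def laplace_estimate(n1, n0):
    # One fake count per outcome, then take the empirical fraction.
    return (n1 + 1) / (n1 + n0 + 2)

print(laplace_estimate(5, 0))   # ~0.857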

22 Priors Thumbtack Alternatively, consider θ to be a random variable. Prior: P(θ) = C θ(1-θ). Measurements: P(x | θ). Posterior: P(θ | x) ∝ P(x | θ) P(θ). Maximum a posteriori (MAP) estimation → find the θ that maximizes the posterior.
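With the prior P(θ) = C θ(1-θ), the posterior is proportional to θ^{n_1+1} (1-θ)^{n_0+1}, so the MAP estimate is (n_1 + 1)/(n_1 + n_0 + 2). A minimal sketch that checks this closed form against a grid search over the (unnormalized) log-posterior:

import numpy as np

n1, n0 = 8, 2
theta = np.linspace(1e-3, 1 - 1e-3, 9999)
log_post = (n1 + 1) * np.log(theta) + (n0 + 1) * np.log(1 - theta)   # up to a constant
print(theta[log_post.argmax()], (n1 + 1) / (n1 + n0 + 2))            # both ~0.75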

23 Priors Beta Distribution Figure source: Wikipedia

24 Priors Dirichlet Distribution Generalizes the Beta distribution. The MAP estimate corresponds to adding fake counts n_1, ..., n_K.

25 MAP for Mean of Univariate Gaussian Assume variance known. (Can be extended to also find MAP for variance.) Prior:
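A minimal sketch under the assumption that the prior is μ ~ N(μ_0, τ²) and the noise variance σ² is known: the posterior over the mean is again Gaussian, and the MAP estimate is its mean, a precision-weighted average of the prior mean and the data (the sample values below are hypothetical).

import numpy as np

def map_gaussian_mean(x, sigma2, mu0, tau2):
    # Posterior precision = prior precision + n * likelihood precision.
    n = len(x)
    precision = 1.0 / tau2 + n / sigma2
    return (mu0 / tau2 + x.sum() / sigma2) / precision

x = np.array([2.1, 1.8, 2.6, 2.2])
print(map_gaussian_mean(x, sigma2=1.0, mu0=0.0, tau2=0.5))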

26 MAP for Univariate Conditional Linear Gaussian Assume variance known. (Can be extended to also find MAP for variance.) Prior: [Interpret!]

27 MAP for Univariate Conditional Linear Gaussian: Example [Figure: true function, samples, ML fit, and MAP fit]

28 Cross Validation The choice of prior will heavily influence the quality of the result. Fine-tune the choice of prior through cross-validation: 1. Split the data into a training set and a validation set. 2. For a range of priors: Train: compute θ_MAP on the training set. Cross-validate: evaluate performance on the validation set by evaluating the likelihood of the validation data under the θ_MAP just found. 3. Choose the prior with the highest validation score; for this prior, compute θ_MAP on the combined (training + validation) set. Typical training/validation splits: 1-fold: 70/30, random split; 10-fold: partition into 10 sets, average performance with each set in turn serving as the validation set and the other 9 as the training set.
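A minimal sketch of this recipe for the thumbtack model; the synthetic data, the symmetric Beta(α, α) pseudocount prior as the family of candidate priors, and the 70/30 split are all illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
data = rng.random(100) < 0.7          # synthetic Bernoulli(0.7) samples, True = "up"
train, valid = data[:70], data[70:]   # 1-fold 70/30 split

def theta_map(x, alpha):
    # MAP with a symmetric Beta(alpha, alpha) prior = add alpha - 1 fake counts per outcome.
    n1, n0 = x.sum(), (~x).sum()
    return (n1 + alpha - 1) / (n1 + n0 + 2 * (alpha - 1))

def validation_ll(theta, x):
    n1, n0 = x.sum(), (~x).sum()
    return n1 * np.log(theta) + n0 * np.log(1 - theta)

scores = {a: validation_ll(theta_map(train, a), valid) for a in [1, 2, 5, 10, 50]}
best = max(scores, key=scores.get)     # prior with the highest validation score
print(best, theta_map(data, best))     # refit on training + validation data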

29 Outline Maximum likelihood (ML) Priors, and maximum a posteriori (MAP) Cross-validation Expectation Maximization (EM)

30 Mixture of Gaussians Generally: Example: ML objective: given data z(1), ..., z(m). Setting derivatives w.r.t. θ, μ, Σ equal to zero does not allow solving for their ML estimates in closed form. We can evaluate the objective function → we can in principle perform local optimization. In this lecture: the EM algorithm, which is typically used to efficiently optimize the objective (locally).
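A minimal sketch of the point about evaluating the objective: the mixture log-likelihood Σ_i log Σ_k θ_k N(z_i; μ_k, σ_k²) is easy to compute for any parameter setting (1-D, synthetic data below), which is what makes generic local optimization possible even without a closed-form maximizer.

import numpy as np

def gauss_pdf(z, mu, sigma):
    return np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def mog_log_likelihood(z, weights, means, sigmas):
    # sum_i log sum_k theta_k N(z_i; mu_k, sigma_k^2)
    density = sum(w * gauss_pdf(z, m, s) for w, m, s in zip(weights, means, sigmas))
    return np.log(density).sum()

rng = np.random.default_rng(0)
z = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 300)])
print(mog_log_likelihood(z, [0.4, 0.6], [-2.0, 3.0], [1.0, 1.0]))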

31 Expectation Maximization (EM) Example: Model: Goal: given data z(1), ..., z(m) (but no x(i) observed), find maximum likelihood estimates of μ_1, μ_2. EM basic idea: if the x(i) were known → two easy-to-solve separate ML problems. EM iterates over: E-step: for i = 1, ..., m, fill in the missing data x(i) according to what is most likely given the current model. M-step: run ML for the completed data, which gives a new model.

32 EM Derivation EM solves a maximum likelihood problem of the form: max_μ log p(z ; μ) = max_μ log Σ_x p(x, z ; μ), where μ: parameters of the probabilistic model we try to find, x: unobserved variables, z: observed variables. Jensen's inequality:

33 Jensen's inequality Illustration: P(X=x_1) = 1-λ, P(X=x_2) = λ, so E[X] = (1-λ) x_1 + λ x_2, and for a concave f, f(E[X]) ≥ E[f(X)].

34 EM Derivation (ctd) Jensen's inequality: equality holds when the function is affine. This is achieved for q(x) = p(x | z ; μ). EM Algorithm: iterate 1. E-step: compute 2. M-step: compute The M-step optimization can be done efficiently in most cases; the E-step is usually the more expensive step. It does not fill in the missing data x with hard values, but finds a distribution q(x).

35 EM Derivation (ctd) The M-step objective is upper-bounded by the true objective. The M-step objective is equal to the true objective at the current parameter estimate. → Improvement in the true objective is at least as large as the improvement in the M-step objective.

36 EM 1-D Example [Figure: estimates over several iterations.] Estimate a 1-D mixture of two Gaussians with unit variance using a single parameter μ; μ_1 = μ - 7.5, μ_2 = μ + 7.5.
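A minimal sketch of this 1-D example, assuming equal mixing weights: with unit variances and both means tied to the single parameter μ as μ - 7.5 and μ + 7.5, the E-step computes responsibilities and the M-step has the closed form μ = mean(z) + 7.5 (2 mean(r) - 1).

import numpy as np

def em_1d(z, mu_init, num_iters=20):
    mu = mu_init
    for _ in range(num_iters):
        # E-step: responsibility of component 1 (mean mu - 7.5) for each point.
        d1 = -0.5 * (z - (mu - 7.5)) ** 2
        d2 = -0.5 * (z - (mu + 7.5)) ** 2
        r = 1.0 / (1.0 + np.exp(d2 - d1))
        # M-step: closed-form update for the single parameter mu.
        mu = z.mean() + 7.5 * (2 * r.mean() - 1)
    return mu

rng = np.random.default_rng(0)
z = np.concatenate([rng.normal(-7.5, 1, 300), rng.normal(7.5, 1, 300)])  # true mu = 0
print(em_1d(z, mu_init=3.0))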

37 EM for Mixture of Gaussians X ~ Multinomial distribution, P(X = k ; θ) = θ_k; Z | X = k ~ N(μ_k, Σ_k). Observed: z(1), z(2), ..., z(m).

38 EM for Mixture of Gaussians E-step: M-step:
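A minimal sketch of these E-step/M-step updates for a 1-D mixture of Gaussians (synthetic data; the multivariate case with full covariances follows the same soft-count pattern).

import numpy as np

def gauss_pdf(z, mu, var):
    return np.exp(-0.5 * (z - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def em_gmm_1d(z, K, num_iters=50, seed=0):
    rng = np.random.default_rng(seed)
    w = np.full(K, 1.0 / K)                   # mixing weights theta_k
    mu = rng.choice(z, K, replace=False)      # initialize means at random data points
    var = np.full(K, z.var())
    for _ in range(num_iters):
        # E-step: responsibilities r[i, k] = P(x_i = k | z_i; current parameters).
        r = np.stack([w[k] * gauss_pdf(z, mu[k], var[k]) for k in range(K)], axis=1)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: soft-count versions of the fully observed ML formulas.
        Nk = r.sum(axis=0)
        w = Nk / len(z)
        mu = (r * z[:, None]).sum(axis=0) / Nk
        var = (r * (z[:, None] - mu) ** 2).sum(axis=0) / Nk
    return w, mu, var

rng = np.random.default_rng(1)
z = np.concatenate([rng.normal(-2, 0.5, 400), rng.normal(3, 1.0, 600)])
print(em_gmm_1d(z, K=2))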

39 ML Objective HMM Given samples. Dynamics model: Observation model: ML objective: → No simple decomposition into independent ML problems for the dynamics and observation distributions. → No closed-form solution found by setting derivatives equal to zero.

40 EM for HMM M-step → θ and γ computed from soft counts

41 EM for HMM E-step No need to find the conditional full joint; run the smoother to find:
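A minimal sketch of one EM iteration for a fully discrete HMM (an illustrative special case, not the slides' exact notation): the E-step runs a normalized forward-backward smoother to get state and transition marginals, and the M-step renormalizes those soft counts.

import numpy as np

def em_step_hmm(obs, T, O, pi):
    # obs: int array of length n; T[i, j] = P(x'=j | x=i); O[i, k] = P(z=k | x=i); pi: prior.
    n, S = len(obs), len(pi)
    # E-step: forward-backward with per-step normalization for numerical stability.
    alpha, beta = np.zeros((n, S)), np.ones((n, S))
    alpha[0] = pi * O[:, obs[0]]
    alpha[0] /= alpha[0].sum()
    for t in range(1, n):
        alpha[t] = (alpha[t - 1] @ T) * O[:, obs[t]]
        alpha[t] /= alpha[t].sum()
    for t in range(n - 2, -1, -1):
        beta[t] = T @ (O[:, obs[t + 1]] * beta[t + 1])
        beta[t] /= beta[t].sum()
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)     # smoothed marginals P(x_t | z_{1:n})
    xi = np.zeros((S, S))                         # soft transition counts
    for t in range(n - 1):
        m = alpha[t][:, None] * T * (O[:, obs[t + 1]] * beta[t + 1])[None, :]
        xi += m / m.sum()
    # M-step: re-estimate the dynamics and observation models from the soft counts.
    T_new = xi / xi.sum(axis=1, keepdims=True)
    O_new = np.zeros_like(O)
    for k in range(O.shape[1]):
        O_new[:, k] = gamma[obs == k].sum(axis=0)
    O_new /= O_new.sum(axis=1, keepdims=True)
    return T_new, O_new, gamma[0]

rng = np.random.default_rng(0)
obs = rng.integers(0, 2, size=200)
T0 = np.array([[0.9, 0.1], [0.2, 0.8]])
O0 = np.array([[0.7, 0.3], [0.4, 0.6]])
print(em_step_hmm(obs, T0, O0, np.array([0.5, 0.5]))[0])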

42 ML Objective for Linear Gaussians Linear Gaussian setting: Given ML objective: EM derivation: same as HMM

43 EM for Linear Gaussians E-step Forward: Backward:

44 EM for Linear Gaussians M-step [Updates for A, B, C, d. TODO: Fill in once found/derived.]

45 EM for Linear Gaussians: The Log-likelihood When running EM, it can be good to keep track of the log-likelihood score; it is supposed to increase every iteration.

46 EM for Extended Kalman Filter Setting As the linearization is only an approximation, when performing the updates we might end up with parameters that result in a lower (rather than higher) log-likelihood score. → Solution: instead of updating the parameters to the newly estimated ones, interpolate between the previous parameters and the newly estimated ones. Perform a line search to find the setting that achieves the highest log-likelihood score.
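A minimal sketch of this interpolation / line-search idea (the parameter vectors and log-likelihood function here are placeholders): evaluate a range of interpolation factors between the previous and newly estimated parameters and keep the best-scoring setting.

import numpy as np

def interpolated_update(theta_old, theta_new, log_likelihood_fn, num_steps=11):
    # Line search over convex combinations of the old and new parameter vectors.
    best_theta, best_ll = theta_old, log_likelihood_fn(theta_old)
    for t in np.linspace(0.0, 1.0, num_steps):
        theta = (1 - t) * theta_old + t * theta_new
        ll = log_likelihood_fn(theta)
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta

# Toy usage with a placeholder log-likelihood peaked at theta = 0.5.
print(interpolated_update(np.array([0.0]), np.array([2.0]),
                          lambda th: -((th - 0.5) ** 2).sum()))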
