Lectures 1 and 2: Basic estimation theory
Spring 202 - EE 94: Networked estimation and control
Prof. Khan
March 2, 202

I. MAXIMUM-LIKELIHOOD ESTIMATORS

"The maximum likelihood principle is deceptively simple." (Louis L. Scharf)

Let $x$ denote a random variable whose pdf $p_\theta(x)$ is parameterized by the unknown parameter $\theta$. For example, consider Fig. 1, which shows two densities, one with $\theta = \theta_1$ and the other with $\theta = \theta_2$.

[Figure omitted.] Fig. 1. Typical density functions.

Suppose that $x$ is observed. Based on the prior model $p_\theta(x)$ (Fig. 1), we can say that $x$ is more probably observed when $\theta = \theta_2$ than when $\theta = \theta_1$. More generally, there may be a unique value of $\theta$ for which $x$ is more probably observed than for any other. We call this value of $\theta$ the maximum-likelihood estimate and denote it by $\hat\theta$:

$$\hat\theta = \arg\max_\theta \, p_\theta(x). \qquad (1)$$

The function $l(\theta, x) = p_\theta(x)$ is called the likelihood function, and its logarithm

$$L(\theta, x) = \ln p_\theta(x) \qquad (2)$$

is called the log-likelihood function. When $L(\theta, x)$ is continuously differentiable in $\theta$, the ML estimate may be determined by differentiating the log-likelihood function. The ML estimate is then the root of the ML equation:

$$\frac{\partial}{\partial\theta} L(\theta, x) = \frac{\partial}{\partial\theta} \ln p_\theta(x) = 0. \qquad (3)$$
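The comparison behind Fig. 1 can be sketched numerically. Below is a minimal illustration (not from the notes), assuming two unit-variance Gaussian candidate densities: for an observed $x$, the likelihood under $\theta_2$ exceeds the likelihood under $\theta_1$, so the ML principle favors $\theta_2$.

```python
import math

# Illustrative sketch (assumed Gaussian densities, unit variance):
# evaluate p_theta(x) at the observed x for two candidate parameters.
def density(x, theta):
    """Gaussian pdf N(theta, 1) evaluated at x."""
    return math.exp(-(x - theta) ** 2 / 2) / math.sqrt(2 * math.pi)

x_observed = 1.8          # the observed sample (assumed value)
theta1, theta2 = 0.0, 2.0  # two candidate parameter values (assumed)

lik1 = density(x_observed, theta1)
lik2 = density(x_observed, theta2)
print(lik1 < lik2)  # x is more probably observed when theta = theta2
```

Maximizing over a continuum of candidate $\theta$ values, rather than two, gives the ML estimate of equation (1).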
We will assume that there is only one value of $\theta$ for which the derivative is $0$.

Example 1: What is the ML estimate of $\theta$ when we observe $y = \theta + r$, where $r \sim \mathcal{N}(\mu, \sigma^2)$?

First we compute $L(\theta, y) = \ln p_\theta(y)$. We note that when $y = \theta + r$, $y$ is a normal random variable, $\mathcal{N}(\theta + \mu, \sigma^2)$. We have

$$\frac{\partial}{\partial\theta} \ln p_\theta(y) = \frac{\partial}{\partial\theta}\left\{ \ln\frac{1}{(2\pi\sigma^2)^{1/2}} - \frac{(y - \mu - \theta)^2}{2\sigma^2} \right\} = \frac{2(y - \mu - \theta)}{2\sigma^2} = \frac{y - \mu - \theta}{\sigma^2} = 0,$$

which implies that

$$\hat\theta = y - \mu.$$

Example 2: What is the ML estimate of $\theta$ when we observe $y = \theta\mathbf{1} + r$, where $r \sim \mathcal{N}(\mu\mathbf{1}, \sigma^2 I)$? In other words, the noise vector $r$ consists of $n$ i.i.d. (independent, identically distributed) random variables, each distributed as $\mathcal{N}(\mu, \sigma^2)$.

From our observation model, we note that $y \sim \mathcal{N}((\theta + \mu)\mathbf{1}, \sigma^2 I)$, and $y$ is also a collection of independent rvs. Hence the joint density of $y$ is given by the product of the marginals. We have

$$\ln p_\theta(y) = \ln \prod_{i=1}^{n} \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left( -\frac{(y_i - \mu - \theta)^2}{2\sigma^2} \right) = \sum_{i=1}^{n} \ln\left[ \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left( -\frac{(y_i - \mu - \theta)^2}{2\sigma^2} \right) \right]$$

$$= n \ln\frac{1}{(2\pi\sigma^2)^{1/2}} - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \mu - \theta)^2.$$

This leads to

$$\frac{\partial}{\partial\theta}\ln p_\theta(y) = \frac{1}{\sigma^2}\sum_{i=1}^{n}(y_i - \mu - \theta) = 0.$$

Finally, we obtain

$$\hat\theta = \frac{1}{n}\sum_{i=1}^{n}(y_i - \mu).$$
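The closed form derived in Example 2 can be checked by simulation. The following sketch (not from the notes; the parameter values are assumed for illustration) draws $n$ noisy observations and verifies that $\frac{1}{n}\sum_i (y_i - \mu)$ recovers the true $\theta$.

```python
import random

# Sketch of Example 2: simulate y_i = theta + r_i, r_i ~ N(mu, sigma^2),
# and compute the ML estimate (1/n) * sum(y_i - mu).
random.seed(0)
theta_true = 3.0          # unknown deterministic parameter (assumed value)
mu, sigma = 1.0, 2.0      # known noise mean and std (assumed values)
n = 100_000

y = [theta_true + random.gauss(mu, sigma) for _ in range(n)]
theta_hat = sum(yi - mu for yi in y) / n
print(theta_hat)  # close to theta_true for large n
```

With $n = 10^5$ samples the estimate is within a few thousandths of the true value, consistent with the $\sigma^2/n$ variance computed in the next examples.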
A very important point: so far, $\theta$ is deterministic but unknown, while $\hat\theta$ is a random variable.

Example 3: What is the mean of $\hat\theta$?

$$E\hat\theta = E\left[\frac{1}{n}\sum_{i=1}^{n}(y_i - \mu)\right] = \frac{1}{n}\sum_{i=1}^{n}\left(E y_i - \mu\right) = \frac{1}{n}\sum_{i=1}^{n}(\mu + \theta - \mu) = \theta.$$

An estimator with the property that $E\hat\theta = \theta$ is said to be an unbiased estimate.

Example 4: What is the variance of $\hat\theta$? For now, assume $\mu = 0$.

$$E(\hat\theta - \theta)^2 = E\left(\frac{1}{n}\sum_{i=1}^{n}(y_i - \theta)\right)^2 = \frac{1}{n^2}\sum_{i=1}^{n} E(y_i - \theta)^2.$$

This is because all the cross-terms yield $E(y_i - \theta)(y_j - \theta)$, $i \ne j$, which is $0$ since the $y_i$ are i.i.d. with mean $\theta$ (recall $\mu = 0$). It is straightforward to show that

$$\mathrm{var}\,\hat\theta = E\hat\theta^2 - \left(E\hat\theta\right)^2 = \frac{\sigma^2}{n}.$$

When $n = 1$, we are reduced to the first example: the estimate in the first example is also unbiased and has variance $\sigma^2$. As we add more observations ($n > 1$), the variance (uncertainty) in the estimate scales down by $1/n$, and goes to $0$ as $n \to \infty$. In this case, the ML estimate has the property that its variance goes to $0$ as $n \to \infty$.

Example 5: What is the distribution (density) of $\hat\theta$? Since $\hat\theta$ is a linear combination of jointly Gaussian random variables, it is itself Gaussian: $\hat\theta \sim \mathcal{N}(\theta, \sigma^2/n)$. This can be easily generalized.
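The two properties above, unbiasedness and variance $\sigma^2/n$, can be checked by a Monte Carlo experiment over many repetitions of the estimation. This is a sketch (not from the notes), with assumed parameter values and $\mu = 0$ as in Example 4.

```python
import random

# Monte Carlo check: the sample-mean ML estimate is unbiased and has
# variance sigma^2 / n (Examples 3 and 4, with mu = 0).
random.seed(1)
theta, sigma, n, trials = 2.0, 1.0, 50, 20_000  # assumed values

estimates = []
for _ in range(trials):
    y = [theta + random.gauss(0.0, sigma) for _ in range(n)]
    estimates.append(sum(y) / n)  # theta_hat = (1/n) sum y_i, since mu = 0

mean_hat = sum(estimates) / trials
var_hat = sum((e - mean_hat) ** 2 for e in estimates) / trials
print(mean_hat, var_hat, sigma ** 2 / n)  # empirical mean and variance vs theory
```

The empirical mean of the estimates concentrates around $\theta$ and the empirical variance around $\sigma^2/n = 0.02$, matching the derivation.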
Example 6: What is the ML estimate of $\theta \in \mathbb{R}^n$ when we observe $y = A\theta + r$, where $r \in \mathbb{R}^m$, $r \sim \mathcal{N}(\mu, \Sigma)$? Clearly, $A \in \mathbb{R}^{m \times n}$. Here

$$p(r) = \frac{1}{(2\pi)^{m/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2}(r - \mu)^T \Sigma^{-1} (r - \mu) \right). \qquad (4)$$

From the observation model, $y - A\theta$ is distributed as $\mathcal{N}(\mu, \Sigma)$. We have the log-likelihood function

$$\ln p_\theta(y) = \ln \frac{1}{(2\pi)^{m/2}|\Sigma|^{1/2}} - \frac{1}{2}(y - A\theta - \mu)^T \Sigma^{-1} (y - A\theta - \mu),$$

which leads to

$$\frac{\partial}{\partial\theta}\ln p_\theta(y) = -\frac{1}{2}\left[ (y - A\theta - \mu)^T \Sigma^{-1} (-A) + (y - A\theta - \mu)^T \Sigma^{-1} (-A) \right] = (y - A\theta - \mu)^T \Sigma^{-1} A = 0.$$

Transposing,

$$0 = A^T \Sigma^{-1} (y - A\theta - \mu),$$
$$A^T \Sigma^{-1} y - A^T \Sigma^{-1} \mu = A^T \Sigma^{-1} A\, \theta,$$
$$\hat\theta = \left(A^T \Sigma^{-1} A\right)^{-1} \left( A^T \Sigma^{-1} y - A^T \Sigma^{-1} \mu \right).$$

Recall the matrix-calculus identity

$$\frac{\partial}{\partial x}(Ax + b)^T C (Dx + e) = (Dx + e)^T C^T A + (Ax + b)^T C D.$$

Example 7: Is the above estimate unbiased?

Example 8: What is the variance of the above estimate?

Example 9: What is the distribution of the above estimate?
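As a concrete instance of Example 6, consider the special case $\Sigma = \sigma^2 I$, where the estimate reduces to $\hat\theta = (A^T A)^{-1} A^T (y - \mu\mathbf{1})$. The sketch below (illustrative, not from the notes) fits a two-parameter line, with the design matrix, noise statistics, and true $\theta$ all assumed; the $2 \times 2$ normal equations are solved by hand to keep the code self-contained.

```python
import random

# Sketch of Example 6 with Sigma = sigma^2 I: y = A theta + r,
# r_i ~ N(mu, sigma^2), and theta_hat = (A^T A)^{-1} A^T (y - mu).
random.seed(2)
theta_true = [1.0, -2.0]     # assumed true parameter
mu, sigma, m = 0.5, 0.1, 200  # assumed noise mean/std and number of rows

# Rows of A: [1, t_i] -- a line-fitting design matrix (assumed example).
A = [[1.0, i / m] for i in range(m)]
y = [a[0] * theta_true[0] + a[1] * theta_true[1] + random.gauss(mu, sigma)
     for a in A]

# Normal equations (A^T A) theta = A^T (y - mu), via an explicit 2x2 inverse.
s00 = sum(a[0] * a[0] for a in A)
s01 = sum(a[0] * a[1] for a in A)
s11 = sum(a[1] * a[1] for a in A)
b0 = sum(a[0] * (yi - mu) for a, yi in zip(A, y))
b1 = sum(a[1] * (yi - mu) for a, yi in zip(A, y))
det = s00 * s11 - s01 * s01
theta_hat = [(s11 * b0 - s01 * b1) / det,
             (s00 * b1 - s01 * b0) / det]
print(theta_hat)  # close to theta_true
```

For a general $\Sigma$, one would weight each term by $\Sigma^{-1}$ as in the derivation; a linear-algebra library is the practical choice there.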
Lecture 2: Wednesday

Example 10: Consider $r_i$, $i = 1, \dots, n$, i.i.d. $\mathcal{N}(\mu, \sigma^2)$. What is the ML estimate of $\mu$ and $\sigma^2$?

It is straightforward to show that

$$\hat\mu = \frac{1}{n}\sum_{i=1}^{n} r_i,$$

i.e., the ML estimate $\hat\mu$ is the sample mean. Note that this estimate is independent of the variance. For the variance, let $x = \sigma^2$ and note that the log-likelihood function satisfies

$$\frac{\partial}{\partial x} \ln p_x(r_1, \dots, r_n) = \frac{\partial}{\partial x} \sum_{i=1}^{n} \ln\left[ \frac{1}{(2\pi x)^{1/2}} \exp\left( -\frac{(r_i - \mu)^2}{2x} \right) \right]$$
$$= \frac{\partial}{\partial x} \sum_{i=1}^{n} \left[ -\ln\sqrt{2\pi} - \frac{1}{2}\ln x - \frac{(r_i - \mu)^2}{2x} \right]$$
$$= \sum_{i=1}^{n} \left[ -\frac{1}{2x} + \frac{(r_i - \mu)^2}{2x^2} \right] = 0.$$

Solving for $x$ yields

$$\frac{n}{2x} = \frac{1}{2x^2}\sum_{i=1}^{n}(r_i - \mu)^2 \quad\Longrightarrow\quad \hat{x} = \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(r_i - \mu)^2.$$

When the mean is unknown and is itself estimated by $\hat\mu$, the variance estimate is

$$\hat{x} = \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(r_i - \hat\mu)^2.$$
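The two ML estimates just derived, the sample mean and the $1/n$ sample variance, are easy to compute directly. This sketch (not from the notes; parameter values assumed) evaluates both on simulated i.i.d. Gaussian data.

```python
import math
import random

# Sketch of Example 10: ML estimates of mu and sigma^2 from i.i.d.
# N(mu, sigma^2) samples: the sample mean and the 1/n sample variance.
random.seed(3)
mu_true, sigma_true, n = 4.0, 1.5, 100_000  # assumed values
r = [random.gauss(mu_true, sigma_true) for _ in range(n)]

mu_hat = sum(r) / n
sigma2_hat = sum((ri - mu_hat) ** 2 for ri in r) / n  # note 1/n, not 1/(n-1)
print(mu_hat, math.sqrt(sigma2_hat))  # close to mu_true and sigma_true
```

With $n$ this large the $1/n$ versus $1/(n-1)$ distinction is invisible; the next example shows that for small $n$ it produces a systematic bias.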
Example 11: Is the above estimate unbiased? Let's check.

$$E\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} E(r_i - \hat\mu)^2 = \frac{1}{n}\sum_{i=1}^{n} E\left(r_i - \mu + \mu - \hat\mu\right)^2$$
$$= \frac{1}{n}\sum_{i=1}^{n} E\left[ (r_i - \mu)^2 - 2(r_i - \mu)(\hat\mu - \mu) + (\hat\mu - \mu)^2 \right]$$
$$= \sigma^2 - 2E\left[ (\hat\mu - \mu)\,\frac{1}{n}\sum_{i=1}^{n}(r_i - \mu) \right] + E(\hat\mu - \mu)^2$$
$$= \sigma^2 - 2E(\hat\mu - \mu)^2 + \frac{\sigma^2}{n}$$
$$= \sigma^2 - 2\frac{\sigma^2}{n} + \frac{\sigma^2}{n} = \sigma^2 - \frac{\sigma^2}{n} = \frac{n-1}{n}\,\sigma^2.$$

So $\hat\sigma^2$ turns out to be biased. We can resolve this problem by normalizing this estimate as

$$s^2 = \frac{n}{n-1}\,\hat\sigma^2 = \frac{1}{n-1}\sum_{i=1}^{n}(r_i - \hat\mu)^2,$$

which becomes an unbiased estimate of the variance.

A. MLE for random parameters

In the case of random parameters, the likelihood function is modified to $p(x \mid \theta)$, which is the conditional density of the observations $x$ given the unknown random parameter $\theta$. Everything else remains the same. The log-likelihood function then becomes $\ln p(x \mid \theta)$.
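The $(n-1)/n$ bias computed above is easiest to see with a small sample size. This Monte Carlo sketch (not from the notes; values assumed) averages the $1/n$ and $1/(n-1)$ variance estimates over many trials with $n = 5$.

```python
import random

# Monte Carlo check of the bias result: with small n, the 1/n variance
# estimate averages to (n-1)/n * sigma^2, while the 1/(n-1) estimate s^2
# averages to sigma^2.
random.seed(4)
mu, sigma2, n, trials = 0.0, 1.0, 5, 200_000  # assumed values

mle_sum = unbiased_sum = 0.0
for _ in range(trials):
    r = [random.gauss(mu, sigma2 ** 0.5) for _ in range(n)]
    mu_hat = sum(r) / n
    ss = sum((ri - mu_hat) ** 2 for ri in r)
    mle_sum += ss / n           # ML estimate: divides by n
    unbiased_sum += ss / (n - 1)  # s^2: divides by n - 1

print(mle_sum / trials, (n - 1) / n * sigma2)  # both about 0.8
print(unbiased_sum / trials)                   # about 1.0
```

With $n = 5$ the ML estimate under-reports the variance by 20 percent on average, exactly the $(n-1)/n$ factor derived above.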
II. MAXIMUM A POSTERIORI PROBABILITY (MAP) ESTIMATE

When a prior on the unknown random parameter $\theta$ is known, the principle of ML can be extended by maximizing the joint density, $\ln p(x, \theta)$. We write the log-likelihood as

$$\ln p(x, \theta) = \ln p(x \mid \theta)\, p(\theta) = \ln p(x \mid \theta) + \ln p(\theta),$$

and the estimate of $\theta$ is determined by maximizing the above. The above can also be viewed as maximizing

$$\ln p(x, \theta) = \ln p(\theta \mid x)\, p(x) = \ln p(\theta \mid x) + \ln p(x),$$

and the estimate is the solution of

$$\frac{\partial}{\partial\theta}\left[ \ln p(\theta \mid x) + \ln p(x) \right] = \frac{\partial}{\partial\theta} \ln p(\theta \mid x) = 0,$$

where $p(x)$ is the marginal density of $x$ after integrating $\theta$ out of the joint density. The conditional density $p(\theta \mid x)$ is called the posterior probability density of $\theta$ given $x$.

In summary, when $p(x \mid \theta)$ and $p(\theta)$ are known or can be computed, we use the first maximization; on the other hand, when $p(\theta \mid x)$ is known or can be computed, we opt for the second maximization. Notice that, in general, the conditional densities are known or easily computable, while the joint densities are much harder to compute.

Comparing MLE to MAP, the two estimates are the same when $\frac{\partial}{\partial\theta}\ln p(\theta) = 0$, i.e., when $p(\theta)$ is independent of $\theta$. MLE and MAP are point estimates, i.e., they estimate the parameter $\theta$ but do not provide, e.g., a confidence interval for $\theta$.
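A standard concrete case of the first maximization is a Gaussian likelihood with a Gaussian prior. The sketch below (an assumed illustrative setup, not from the notes) maximizes $\ln p(x \mid \theta) + \ln p(\theta)$ by a crude grid search and compares the result with the known closed form for this conjugate pair, $\hat\theta_{MAP} = (\tau^2 x + \sigma^2 \mu_0)/(\tau^2 + \sigma^2)$.

```python
# MAP sketch: x | theta ~ N(theta, sigma2), prior theta ~ N(mu0, tau2).
# All numeric values are assumed for illustration.
x, sigma2 = 2.0, 1.0
mu0, tau2 = 0.0, 4.0

def log_joint(theta):
    """ln p(x|theta) + ln p(theta), dropping theta-independent constants."""
    return -(x - theta) ** 2 / (2 * sigma2) - (theta - mu0) ** 2 / (2 * tau2)

# Crude grid search over theta in [-5, 5] with step 1e-4.
grid = [i / 10_000 for i in range(-50_000, 50_001)]
theta_map = max(grid, key=log_joint)

closed_form = (tau2 * x + sigma2 * mu0) / (tau2 + sigma2)
print(theta_map, closed_form)  # both near 1.6
```

Note how the prior pulls the estimate from the ML answer $\hat\theta = x = 2.0$ toward the prior mean $\mu_0 = 0$; with a flat prior the MAP and ML estimates would coincide, as stated above.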
III. BAYESIAN ESTIMATORS

Mother Nature conducts a random experiment that generates a parameter $\theta$ from a probability density function $p(\theta)$. This parameter $\theta$ then encodes, or parameterizes, the conditional (or measurement) density $p(x \mid \theta)$. A random experiment generates a measurement $x$ from $p(x \mid \theta)$. The problem is to estimate $\theta$ from $x$. We denote the estimate by $\hat\theta(x)$. The Bayesian setup consists of the following notions.

Loss function: The quality of the estimate $\hat\theta(x)$ is measured by a real-valued loss function. Some examples are:

Quadratic loss function: $L(\theta, \hat\theta(x)) = [\theta - \hat\theta(x)]^T [\theta - \hat\theta(x)]$.

Binary (0-1) loss function: $L(\theta, \hat\theta(x)) = 0$ if $\hat\theta(x) = \theta$, and $1$ otherwise.

Risk: The risk is defined as the loss function averaged over the density $p(x \mid \theta)$. The risk basically addresses the question: what is the average loss, or risk, associated with the estimate $\hat\theta(x)$? Mathematically,

$$R(\theta, \hat\theta) = E_x\, L(\theta, \hat\theta(x)) = \int L(\theta, \hat\theta(x))\, p(x \mid \theta)\, dx.$$

The notation $E_x$ indicates that the expectation is over the distribution of the random measurement $x$, with $\theta$ fixed.

Bayes risk: The Bayes risk is the risk averaged over the prior distribution on $\theta$:

$$R(\hat\theta) = E_\theta\, R(\theta, \hat\theta) = \int R(\theta, \hat\theta)\, p(\theta)\, d\theta = \iint L(\theta, \hat\theta(x))\, \underbrace{p(x \mid \theta)\, p(\theta)}_{p(x,\, \theta)}\, dx\, d\theta.$$

Bayes risk estimator: The Bayes risk estimator minimizes the Bayes risk:

$$\hat\theta_B = \arg\min_{\hat\theta} R(\hat\theta),$$

i.e., it is the value of $\hat\theta$ that minimizes the Bayes risk.
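The Bayes risk can be approximated by simulating Mother Nature's two-stage experiment. The sketch below (an assumed Gaussian setup, not from the notes) uses the known fact that under quadratic loss the Bayes risk estimator is the posterior mean, here the shrinkage rule $\hat\theta(x) = \tau^2 x / (\tau^2 + \sigma^2)$, and compares its Monte Carlo Bayes risk against the ML estimate $\hat\theta(x) = x$.

```python
import random

# Bayes risk by simulation: theta ~ N(0, tau2), then x | theta ~ N(theta, sigma2).
# Quadratic loss; compare the posterior-mean estimator with the ML estimator.
random.seed(5)
tau2, sigma2, trials = 4.0, 1.0, 100_000  # assumed values
shrink = tau2 / (tau2 + sigma2)           # posterior-mean shrinkage factor

risk_bayes = risk_ml = 0.0
for _ in range(trials):
    theta = random.gauss(0.0, tau2 ** 0.5)    # Mother Nature draws theta
    x = random.gauss(theta, sigma2 ** 0.5)    # measurement given theta
    risk_bayes += (theta - shrink * x) ** 2   # loss of the posterior mean
    risk_ml += (theta - x) ** 2               # loss of the ML estimate

print(risk_bayes / trials, risk_ml / trials)  # Bayes risk is the smaller one
```

The posterior-mean estimator achieves a Bayes risk near $\tau^2\sigma^2/(\tau^2 + \sigma^2) = 0.8$ in this setup, strictly below the ML estimator's $\sigma^2 = 1.0$, illustrating the minimization that defines $\hat\theta_B$.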