Estimation Theory Fredrik Rusek. Chapter 11

1 Estimation Theory Fredrik Rusek Chapter 11

4 Chapter 10 Bayesian Estimation. Section 10.8: Bayesian estimators for deterministic parameters. If no MVU estimator exists, or it is very hard to find, we can apply an MMSE estimator to deterministic parameters. Recall the form of the Bayesian estimator for DC levels in WGN, with weight α < 1. Compute the MSE for a given value of A: the variance is smaller than for the classical estimator, but the bias is large for large A.

5 Chapter 10 Bayesian Estimation. Section 10.8: Bayesian estimators for deterministic parameters. For a given value of A, the MSE of the Bayesian estimator is smaller than that of the classical estimator when A is close to the prior mean, but larger far away from it.
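As a numerical illustration of this comparison, the sketch below evaluates the MSE of both estimators as a function of the true A. It assumes the standard Gaussian-prior setup (not spelled out on the slide): x[n] = A + w[n] with noise variance sigma2, prior A ~ N(mu_a, sigma_a2), and Bayesian estimator alpha*xbar + (1 - alpha)*mu_a with alpha = sigma_a2/(sigma_a2 + sigma2/N). All function names and numeric values are illustrative.

```python
import numpy as np

def alpha_weight(sigma_a2, sigma2, n):
    # Weight on the sample mean; always below 1 (assumed Gaussian-prior form).
    return sigma_a2 / (sigma_a2 + sigma2 / n)

def mse_bayes(a, mu_a, sigma_a2, sigma2, n):
    # MSE for a fixed, deterministic A: variance plus squared bias.
    al = alpha_weight(sigma_a2, sigma2, n)
    variance = al**2 * sigma2 / n
    bias = (1.0 - al) * (mu_a - a)      # E[A_hat] - A
    return variance + bias**2

def mse_classical(sigma2, n):
    # Sample mean: unbiased, variance sigma2 / n for every A.
    return sigma2 / n

a_grid = np.linspace(-5.0, 5.0, 11)     # true values of A, prior mean at 0
bayes = mse_bayes(a_grid, mu_a=0.0, sigma_a2=1.0, sigma2=1.0, n=10)
classical = mse_classical(1.0, 10)

for a, m in zip(a_grid, bayes):
    print(f"A = {a:+.1f}  Bayesian MSE = {m:.4f}  classical MSE = {classical:.4f}")
```

With these assumed values the Bayesian MSE is below the classical one near A = 0 and above it at the edges of the grid.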

6 Chapter 10 Bayesian Estimation. Section 10.8: Bayesian estimators for deterministic parameters. However, the BMSE of the Bayesian estimator is smaller.
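A short worked version of this claim, under an assumed Gaussian prior A ~ N(μ_A, σ_A²) and the estimator Â = α x̄ + (1 − α) μ_A with α = σ_A² / (σ_A² + σ²/N) (these symbols are assumptions, since the slide equations are not available): averaging the MSE over the prior gives a BMSE strictly below the classical value σ²/N.

```latex
\begin{align*}
\mathrm{mse}(\hat{A}\,|\,A) &= \alpha^2 \frac{\sigma^2}{N} + (1-\alpha)^2 (A-\mu_A)^2 \\
\mathrm{Bmse}(\hat{A}) &= E_A\!\left[\mathrm{mse}(\hat{A}\,|\,A)\right]
  = \alpha^2 \frac{\sigma^2}{N} + (1-\alpha)^2 \sigma_A^2
  = \alpha \frac{\sigma^2}{N} < \frac{\sigma^2}{N}
\end{align*}
```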

13 Risk Functions. Block diagram: p(θ) → θ → p(x|θ) → x → Estimator → θ̂. Error: ε = θ − θ̂. The MMSE estimator minimizes the Bayes risk E[C(ε)], where the cost function is quadratic, C(ε) = ε². An estimator that minimizes the Bayes risk, for some cost function, is termed a Bayes estimator.

14 Let us now optimize for different cost functions. For a quadratic cost, we already know that the Bayes estimator is the posterior mean, θ̂ = E(θ|x).

16 Let us now optimize for different cost functions. The Bayes risk equals R = ∫ [ ∫ C(θ − θ̂) p(θ|x) dθ ] p(x) dx. Minimize the inner integral for every x to minimize the Bayes risk.

19 Let us now optimize for different cost functions. We need the derivative of the inner integral with respect to θ̂. Not only the integrand depends on θ̂, but also the limits of the integral, so this is not a standard differentiation.

20 Interlude: Leibniz's rule (very useful)

23 Leibniz's rule (very useful). Here u = θ̂, and the upper limit is φ_2(u) = θ̂.

27 Leibniz's rule (very useful). The lower limit does not depend on u, where u = θ̂.
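For reference, a statement of Leibniz's rule in the notation used on these slides (u, φ_1, φ_2), followed by its application to one half of the absolute-error risk. The integrand name h and the integration variable v are assumed symbols; the rule requires h and the limits to be differentiable in u.

```latex
\begin{align*}
\frac{\partial}{\partial u} \int_{\phi_1(u)}^{\phi_2(u)} h(u,v)\,dv
  &= \int_{\phi_1(u)}^{\phi_2(u)} \frac{\partial h(u,v)}{\partial u}\,dv
   + \frac{d\phi_2(u)}{du}\,h\big(u,\phi_2(u)\big)
   - \frac{d\phi_1(u)}{du}\,h\big(u,\phi_1(u)\big) \\[4pt]
\frac{\partial}{\partial \hat\theta} \int_{-\infty}^{\hat\theta} (\hat\theta-\theta)\,p(\theta|x)\,d\theta
  &= \int_{-\infty}^{\hat\theta} p(\theta|x)\,d\theta
   + 1\cdot(\hat\theta-\hat\theta)\,p(\hat\theta|x) - 0
   = \int_{-\infty}^{\hat\theta} p(\theta|x)\,d\theta
\end{align*}
```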

33 Let us now optimize for different cost functions. For the absolute-error cost, the Bayes risk is minimized when θ̂ is the median of the posterior.

37 Let us now optimize for different cost functions. Absolute-error cost: θ̂ = median of p(θ|x). Hit-or-miss cost: θ̂ = arg max of the posterior probability contained in the interval [θ̂ − δ, θ̂ + δ]. Let δ → 0: θ̂ = arg max over θ of p(θ|x), the maximum a posteriori (MAP) estimator.
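The three results (mean, median, maximum) can be checked by brute force on a grid. The sketch below uses a hypothetical skewed posterior, proportional to θ² exp(−θ) on [0, 10], and minimizes each Bayes risk numerically over the grid; the density, grid size, and δ are illustrative assumptions.

```python
import numpy as np

theta = np.linspace(0.0, 10.0, 1001)
d = theta[1] - theta[0]
post = theta**2 * np.exp(-theta)        # hypothetical skewed posterior
post /= post.sum() * d                  # normalize on the grid

def bayes_risk(cost):
    # risk[i] = integral of cost(theta - theta_hat_i) * p(theta | x) d theta
    err = theta[None, :] - theta[:, None]
    return (cost(err) * post[None, :]).sum(axis=1) * d

delta = 0.055                           # half-width of the hit-or-miss window
est_quadratic = theta[np.argmin(bayes_risk(lambda e: e**2))]
est_absolute = theta[np.argmin(bayes_risk(np.abs))]
est_hit_miss = theta[np.argmin(bayes_risk(lambda e: (np.abs(e) > delta).astype(float)))]

post_mean = (theta * post).sum() * d
post_median = theta[np.searchsorted(np.cumsum(post) * d, 0.5)]
post_mode = theta[np.argmax(post)]

print("quadratic   :", est_quadratic, " mean  :", post_mean)
print("absolute    :", est_absolute, " median:", post_median)
print("hit-or-miss :", est_hit_miss, " mode  :", post_mode)
```

Because the posterior is skewed, the three estimates differ; for a symmetric unimodal posterior they would coincide.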

39 Gaussian posterior. What is the relation between the mean, the median, and the maximum? For a Gaussian posterior all three coincide, so the three risk functions (quadratic, absolute-error, hit-or-miss) lead to identical estimators.

42 Extension to vector parameter. Suppose we have a vector θ of unknown parameters, and consider estimation of θ_1. It still holds that the MAP estimator uses the posterior of θ_1 given x. The parameters θ_2, ..., θ_N are nuisance parameters, but we can integrate them away: p(θ_1|x) = ∫ ... ∫ p(θ|x) dθ_2 ... dθ_N. The estimator is then formed from this marginal posterior.
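A minimal sketch of integrating a nuisance parameter away on a grid. The joint posterior below is a hypothetical correlated Gaussian with mean (1.0, −0.5), chosen only for illustration; the grid limits and the correlation value are assumptions.

```python
import numpy as np

t1 = np.linspace(-6.0, 6.0, 601)        # grid for theta_1 (parameter of interest)
t2 = np.linspace(-6.0, 6.0, 601)        # grid for theta_2 (nuisance parameter)
d1, d2 = t1[1] - t1[0], t2[1] - t2[0]
T1, T2 = np.meshgrid(t1, t2, indexing="ij")

rho = 0.8                               # hypothetical posterior correlation
u, v = T1 - 1.0, T2 + 0.5
joint = np.exp(-(u**2 - 2 * rho * u * v + v**2) / (2 * (1 - rho**2)))
joint /= joint.sum() * d1 * d2          # normalized p(theta_1, theta_2 | x)

marginal_1 = joint.sum(axis=1) * d2     # integrate theta_2 away
mmse_1 = (t1 * marginal_1).sum() * d1   # posterior mean of theta_1
map_1 = t1[np.argmax(marginal_1)]       # peak of the marginal posterior

print("MMSE estimate of theta_1:", mmse_1)
print("MAP estimate from the marginal:", map_1)
```

In this Gaussian case both estimates land at the posterior mean of θ_1; for a non-Gaussian joint posterior they would generally differ.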

45 Extension to vector parameter In vector form

46 Extension to vector parameter. Observations. Classical (non-Bayesian) approach: we must estimate all unknown parameters jointly, except if... what holds?

47 Extension to vector parameter. Observations. Classical (non-Bayesian) approach: we must estimate all unknown parameters jointly, except if the Fisher information matrix is diagonal. The vector MMSE estimator minimizes the MSE for each component of the unknown vector parameter θ, i.e., θ̂_i = E(θ_i|x) minimizes E[(θ_i − θ̂_i)²].

48 Performance of MMSE estimator

53 Performance of MMSE estimator. The derivation proceeds in annotated steps: the MMSE estimator is a function of x; Bayes' rule factors the joint PDF as p(x, θ) = p(θ|x) p(x); by definition, the inner integral over θ is the posterior variance; and for θ_1 this variance is element [1,1] of the posterior covariance matrix C_{θ|x}.
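Written out for the first component, the steps annotated on these slides plausibly correspond to the following chain (a reconstruction using standard notation, with C_{θ|x} denoting the posterior covariance matrix; the slide equations themselves are not available):

```latex
\begin{align*}
\mathrm{Bmse}(\hat\theta_1)
  &= E\!\left[(\theta_1-\hat\theta_1)^2\right]
   = \iint \big(\theta_1 - E(\theta_1|\mathbf{x})\big)^2\, p(\mathbf{x},\theta_1)\, d\theta_1\, d\mathbf{x} \\
  &= \int \left[ \int \big(\theta_1 - E(\theta_1|\mathbf{x})\big)^2\, p(\theta_1|\mathbf{x})\, d\theta_1 \right] p(\mathbf{x})\, d\mathbf{x} \\
  &= \int \left[\mathbf{C}_{\theta|x}\right]_{11}\, p(\mathbf{x})\, d\mathbf{x}
\end{align*}
```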

54 Additive property. Independent observations x_1, x_2; estimate θ. Assume that x_1, x_2, and θ are jointly Gaussian, and apply Theorem 10.2.

55 Additive property. Independent observations x_1, x_2; estimate θ. Assume that x_1, x_2, and θ are jointly Gaussian. Typo in the book: the expression should include the means as well. The result uses that the observations are independent.

56 Additive property. Independent observations x_1, x_2; estimate θ. Assume that x_1, x_2, and θ are jointly Gaussian. The MMSE estimate can be updated sequentially!
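The additive property can be checked numerically. The sketch below assumes a hypothetical scalar model θ = 2·x_1 − x_2 + v with independent Gaussian x_1, x_2, and v (all numbers are made up), and compares the batch MMSE estimate with one built up one observation at a time, means included.

```python
import numpy as np

# Hypothetical jointly Gaussian model: theta = a[0]*x1 + a[1]*x2 + v
m = np.array([1.0, -2.0])               # E[x1], E[x2]
var_x = np.array([0.5, 2.0])            # variances of the independent x1, x2
a = np.array([2.0, -1.0])
var_v = 0.3                             # variance of the independent term v

mean_theta = a @ m
c_theta_x = a * var_x                   # Cov(theta, x_i) = a_i * var(x_i)
c_xx = np.diag(var_x)                   # diagonal because x1, x2 are independent

def mmse_batch(x):
    # E(theta | x) = E(theta) + C_theta_x C_xx^{-1} (x - E(x))
    return mean_theta + c_theta_x @ np.linalg.solve(c_xx, x - m)

def mmse_additive(x):
    # Add the contribution of each observation separately.
    est = mean_theta
    for i in range(len(x)):
        est += c_theta_x[i] / var_x[i] * (x[i] - m[i])
    return est

x_obs = np.array([0.2, 1.5])
var_theta = (a**2 * var_x).sum() + var_v
post_var = var_theta - (c_theta_x**2 / var_x).sum()   # posterior variance

print(mmse_batch(x_obs), mmse_additive(x_obs), post_var)
```

The two estimates agree only because C_xx is diagonal; with correlated observations the per-observation terms would need to be formed from innovations instead.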

57 MAP estimator

58 MAP estimator. Benefits compared with MMSE: the normalizing denominator p(x) is not needed (it is typically hard to find), and the optimization is generally easier than finding the conditional expectation.

63 MAP vs ML estimator. Alexander Aljechin became world chess champion in 1927 (by defeating Capablanca). Aljechin defended his title twice, and regained it once. Magnus Carlsen became world champion in 2013, and defended the title once, in 2014. Now consider a title game. Observe Y = y1, where y1 = win. Two hypotheses: H1: Aljechin defends the title; H2: Carlsen defends the title. Given the above statistics, f(y1|H1) > f(y1|H2). ML rule: Aljechin takes the title (although he died in 1946).

64 MAP vs ML estimator. Same setup, with f(y1|H1) > f(y1|H2). MAP rule: the prior is f(H1) = 0, so Carlsen defends the title.
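The two decision rules fit in a few lines. The likelihood values below are made-up numbers that only respect the ordering f(y1|H1) > f(y1|H2); the prior encodes that H1 is impossible.

```python
# Made-up likelihoods with f(y1|H1) > f(y1|H2); not derived from real statistics.
likelihood = {"H1": 0.8, "H2": 0.6}
prior = {"H1": 0.0, "H2": 1.0}          # f(H1) = 0: Aljechin cannot play

# ML: maximize the likelihood alone.
ml_decision = max(likelihood, key=likelihood.get)

# MAP: maximize likelihood times prior (the posterior up to a constant).
map_decision = max(likelihood, key=lambda h: likelihood[h] * prior[h])

print("ML decision :", ml_decision)
print("MAP decision:", map_decision)
```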

65 Example: DC level in white noise, uniform prior U[-A_0, A_0]. The posterior is p(A|x) = p(x|A) p(A) / ∫ p(x|A) p(A) dA. For the MMSE estimator we got stuck here: we cannot put the denominator in closed form, and we cannot integrate the numerator. Let's try the MAP estimator.

67 Example: DC level in white noise, uniform prior U[-A_0, A_0]. The denominator of the posterior does not depend on A, so it is irrelevant. We need to maximize the numerator.

70 Example: DC level in white noise, uniform prior U[-A_0, A_0]. The MAP estimator can be found! Lesson learned (generally true): MAP is easier to find than MMSE.
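For this example, maximizing the numerator over |A| ≤ A_0 amounts to clipping the sample mean to the prior interval, since the Gaussian likelihood is unimodal in A with its peak at the sample mean. The sketch below compares that closed form with a brute-force grid search; the function names, noise variance, and data values are illustrative assumptions.

```python
import numpy as np

def map_dc_uniform(x, a0):
    # Closed form: sample mean clipped to the prior support [-a0, a0].
    return float(np.clip(np.mean(x), -a0, a0))

def map_by_grid(x, a0, sigma2=1.0, points=20001):
    # Brute force: maximize the log of the posterior numerator on the support.
    grid = np.linspace(-a0, a0, points)
    log_num = -((x[:, None] - grid[None, :]) ** 2).sum(axis=0) / (2 * sigma2)
    return float(grid[np.argmax(log_num)])

x_outside = np.array([1.9, 2.3, 1.6, 2.0])   # sample mean above A_0 = 1
x_inside = np.array([0.2, 0.5, 0.1, 0.4])    # sample mean inside the support

for x in (x_outside, x_inside):
    print(map_dc_uniform(x, 1.0), map_by_grid(x, 1.0))
```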

72 Element-wise MAP for vector-valued parameter. The no-integration-needed benefit is gone. The estimator θ̂_i = arg max of p(θ_i|x) minimizes the hit-or-miss risk for each i, where δ → 0.

73 Element-wise MAP for vector-valued parameter. Let us now define another risk function. It is easy to prove that, as δ → 0, the Bayes risk is minimized by the vector MAP estimator.

74 Element-wise MAP and vector-valued MAP are not the same. [Figure: vector-valued MAP solution versus element-wise MAP solution.]
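A tiny discrete example makes the difference concrete. The joint posterior below is a hypothetical probability table over θ_1, θ_2 ∈ {0, 1} (a pmf rather than a PDF, chosen for simplicity): the joint peak and the peaks of the two marginals point to different answers.

```python
import numpy as np

# Hypothetical joint posterior: rows index theta_1, columns index theta_2.
posterior = np.array([[0.4, 0.0],
                      [0.3, 0.3]])

# Vector MAP: location of the joint maximum.
vector_map = tuple(int(i) for i in np.unravel_index(np.argmax(posterior), posterior.shape))

# Element-wise MAP: maximize each marginal separately.
marginal_1 = posterior.sum(axis=1)      # p(theta_1 | x)
marginal_2 = posterior.sum(axis=0)      # p(theta_2 | x)
elementwise_map = (int(np.argmax(marginal_1)), int(np.argmax(marginal_2)))

print("vector MAP      :", vector_map)
print("element-wise MAP:", elementwise_map)
```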

76 Two properties of vector MAP. For jointly Gaussian x and θ, the conditional mean E(θ|x) coincides with the peak of p(θ|x). Hence, the vector MAP and the MMSE estimators coincide. Invariance does not hold for MAP (as opposed to the MLE).

78 Invariance. Why does invariance hold for the MLE? With α = g(θ), it holds that p(x|α) = p_θ(x|g^{-1}(α)). However, MAP involves the prior, and it does not hold that p_α(α) = p_θ(g^{-1}(α)), since the two distributions are related through the Jacobian.
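For a one-to-one, differentiable g (an assumption made here for the scalar case), the change-of-variables formula makes the Jacobian factor explicit and shows why the peak of the transformed posterior generally moves:

```latex
\begin{align*}
p_\alpha(\alpha) &= p_\theta\big(g^{-1}(\alpha)\big)\,\left|\frac{d\,g^{-1}(\alpha)}{d\alpha}\right| \\
p(\alpha|\mathbf{x}) &\propto p_\theta\big(\mathbf{x}\,|\,g^{-1}(\alpha)\big)\; p_\theta\big(g^{-1}(\alpha)\big)\,\left|\frac{d\,g^{-1}(\alpha)}{d\alpha}\right|
\end{align*}
```

Unless the Jacobian factor is constant in α (as for a linear g), the maximizer of p(α|x) is not g applied to the maximizer of p(θ|x).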

80 Example. Exponential; inverse gamma; MAP.

83 Example. Consider estimation of a function of the parameter: does invariance hold? (It holds for the MLE.)
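Since the slide equations are not available, the sketch below uses an assumed textbook-style setup that matches the labels: x[n] given θ is exponential, p(x[n]|θ) = θ exp(−θ x[n]), with an exponential prior p(θ) = λ exp(−λθ). The transformed parameter α = 1/θ then has the inverse gamma prior λ exp(−λ/α)/α². The closed forms in the code follow from setting the derivative of each log posterior to zero; the data summary (xbar, n) and λ are made-up values.

```python
import numpy as np

def map_theta(xbar, n, lam):
    # Maximizes n*log(theta) - theta*(n*xbar + lam) over theta > 0.
    return n / (n * xbar + lam)

def map_alpha(xbar, n, lam):
    # Maximizes -(n + 2)*log(alpha) - (n*xbar + lam)/alpha over alpha > 0.
    return (n * xbar + lam) / (n + 2)

def map_alpha_grid(xbar, n, lam):
    # Numerical check of map_alpha on a fine grid.
    alpha = np.linspace(0.01, 20.0, 200000)
    log_post = (-n * np.log(alpha) - n * xbar / alpha
                + np.log(lam) - lam / alpha - 2 * np.log(alpha))
    return float(alpha[np.argmax(log_post)])

xbar, n, lam = 2.0, 10, 1.5             # hypothetical sample mean, size, prior rate

print("1 / MAP(theta):", 1.0 / map_theta(xbar, n, lam))
print("MAP(alpha)    :", map_alpha(xbar, n, lam))
print("grid check    :", map_alpha_grid(xbar, n, lam))
```

Under these assumptions the MAP estimate of α differs from the reciprocal of the MAP estimate of θ, which is the failure of invariance; the MLE of α would equal the reciprocal of the MLE of θ.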
