Lecture 8: Information Theory and Statistics


Part II: Hypothesis Testing and Estimation

I-Hsiang Wang
Department of Electrical Engineering
National Taiwan University

December 23, 2015

Outline

1 Hypothesis Testing
2 Performance Evaluation of Estimators; MLE, Asymptotics, and Bayesian Estimators

Basic Setup

We begin with the simplest setup, binary hypothesis testing:

1. Two hypotheses regarding the observation $X$, indexed by $\theta \in \{0, 1\}$:
   $H_0 : X \sim P_0$ (null hypothesis, $\theta = 0$); $H_1 : X \sim P_1$ (alternative hypothesis, $\theta = 1$).
2. Goal: design a decision-making algorithm (test) $\phi : \mathcal{X} \to \{0, 1\}$, $x \mapsto \hat\theta$, to choose one of the two hypotheses, based on the observed realization of $X$, so that a certain cost (or risk) is minimized.
3. A popular measure of the cost is based on the probabilities of error:
   Probability of false alarm (false positive; type I error): $\alpha_\phi \triangleq P_{\mathrm{FA}}(\phi) \triangleq \Pr\{H_1 \text{ is chosen} \mid H_0\}$.
   Probability of miss detection (false negative; type II error): $\beta_\phi \triangleq P_{\mathrm{MD}}(\phi) \triangleq \Pr\{H_0 \text{ is chosen} \mid H_1\}$.

Deterministic Testing Algorithm

[Figure: the observation space $\mathcal{X}$ partitioned into decision regions $\mathcal{A}_1$ (acceptance region of $H_1$) and $\mathcal{A}_0$ (acceptance region of $H_0$).]

A test $\phi : \mathcal{X} \to \{0, 1\}$ is equivalently characterized by its corresponding acceptance (decision) regions:
$$\mathcal{A}_{\hat\theta}(\phi) \triangleq \phi^{-1}(\hat\theta) = \big\{x \in \mathcal{X} : \phi(x) = \hat\theta\big\}, \qquad \hat\theta = 0, 1.$$
Hence, the two types of error probability can be equivalently represented as
$$\alpha_\phi = \sum_{x \in \mathcal{A}_1(\phi)} P_0(x) = \sum_{x \in \mathcal{X}} \phi(x)\, P_0(x), \qquad \beta_\phi = \sum_{x \in \mathcal{A}_0(\phi)} P_1(x) = \sum_{x \in \mathcal{X}} (1 - \phi(x))\, P_1(x).$$
When the context is clear, we often drop the dependency on the test $\phi$ and simply write $\mathcal{A}_{\hat\theta}$.

Likelihood Ratio Test

Definition 1 (Likelihood Ratio Test)
A (deterministic) likelihood ratio test (LRT) is a test $\phi_\tau$, parametrized by a constant $\tau > 0$ (called the threshold), defined as follows:
$$\phi_\tau(x) = \begin{cases} 1 & \text{if } P_1(x) > \tau P_0(x) \\ 0 & \text{if } P_1(x) \le \tau P_0(x). \end{cases}$$
For $x \in \operatorname{supp} P_0$, the likelihood ratio is $L(x) \triangleq \frac{P_1(x)}{P_0(x)}$. Hence, the LRT is a thresholding algorithm on the likelihood ratio $L(x)$.

Remark: For computational convenience, one often works with the log-likelihood ratio (LLR) $\log L(x) = \log P_1(x) - \log P_0(x)$.
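
As a concrete illustration (a minimal numerical sketch, not from the slides; the alphabet and distributions below are made up), an LRT and its two error probabilities can be computed directly on a finite alphabet:

```python
import numpy as np

# Minimal sketch of a deterministic LRT on a finite alphabet.
# P0, P1 are the two hypothesis distributions; tau is the threshold.
def lrt(x, P0, P1, tau):
    """Return 1 (decide H1) iff P1(x) > tau * P0(x)."""
    return int(P1[x] > tau * P0[x])

# Toy example: X = {0, 1, 2} with two made-up distributions.
P0 = np.array([0.5, 0.3, 0.2])
P1 = np.array([0.2, 0.3, 0.5])
tau = 1.0

# alpha = sum_x phi(x) P0(x), beta = sum_x (1 - phi(x)) P1(x).
phi = np.array([lrt(x, P0, P1, tau) for x in range(3)])
alpha = np.sum(phi * P0)           # false alarm (type I error)
beta = np.sum((1 - phi) * P1)      # miss detection (type II error)
print(alpha, beta)                 # 0.2 and 0.5 for this toy example
```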

Trade-Off Between $\alpha$ ($P_{\mathrm{FA}}$) and $\beta$ ($P_{\mathrm{MD}}$)

Theorem 1 (Neyman-Pearson Lemma)
For a likelihood ratio test $\phi_\tau$ and any other deterministic test $\phi$,
$$\alpha_\phi \le \alpha_{\phi_\tau} \implies \beta_\phi \ge \beta_{\phi_\tau}.$$

pf: Observe that for all $x \in \mathcal{X}$, $0 \le (\phi_\tau(x) - \phi(x))(P_1(x) - \tau P_0(x))$, because
- if $P_1(x) - \tau P_0(x) > 0$, then $\phi_\tau(x) = 1$, and hence $\phi_\tau(x) - \phi(x) \ge 0$;
- if $P_1(x) - \tau P_0(x) \le 0$, then $\phi_\tau(x) = 0$, and hence $\phi_\tau(x) - \phi(x) \le 0$.

Summing over all $x \in \mathcal{X}$, we get
$$0 \le (1 - \beta_{\phi_\tau}) - (1 - \beta_\phi) - \tau(\alpha_{\phi_\tau} - \alpha_\phi) = (\beta_\phi - \beta_{\phi_\tau}) + \tau(\alpha_\phi - \alpha_{\phi_\tau}).$$
Since $\tau > 0$, from the above we conclude that $\alpha_\phi \le \alpha_{\phi_\tau} \implies \beta_\phi \ge \beta_{\phi_\tau}$.

[Figure: trade-off curves between $\beta$ ($P_{\mathrm{MD}}$) and $\alpha$ ($P_{\mathrm{FA}}$) over $[0, 1]$.]

Question: What is the optimal trade-off curve? What is the optimal test achieving the curve?

Randomized Testing Algorithm

Randomized tests include deterministic tests as special cases.

Definition 2 (Randomized Test)
A randomized test decides $\hat\theta = 1$ with probability $\phi(x)$ and $\hat\theta = 0$ with probability $1 - \phi(x)$, where $\phi$ is a mapping $\phi : \mathcal{X} \to [0, 1]$.

Note: A randomized test is characterized by $\phi$, as in deterministic tests.

Definition 3 (Randomized LRT)
A randomized likelihood ratio test (LRT) is a test $\phi_{\tau,\gamma}$, parametrized by constants $\tau > 0$ and $\gamma \in (0, 1)$, defined as follows:
$$\phi_{\tau,\gamma}(x) = \begin{cases} 1 & \text{if } P_1(x) > \tau P_0(x) \\ \gamma & \text{if } P_1(x) = \tau P_0(x) \\ 0 & \text{if } P_1(x) < \tau P_0(x). \end{cases}$$

Randomized LRT Achieves the Optimal Trade-Off

Consider the following optimization problem:

Neyman-Pearson Problem:
$$\underset{\phi : \mathcal{X} \to [0,1]}{\text{minimize}} \;\; \beta_\phi \qquad \text{subject to} \;\; \alpha_\phi \le \alpha^*.$$

Theorem 2 (Neyman-Pearson)
A randomized LRT $\phi_{\tau^*,\gamma^*}$ with parameters $(\tau^*, \gamma^*)$ satisfying $\alpha^* = \alpha_{\phi_{\tau^*,\gamma^*}}$ attains optimality for the Neyman-Pearson Problem.

pf: First argue that for any $\alpha^* \in (0, 1)$, one can find $(\tau^*, \gamma^*)$ such that
$$\alpha^* = \alpha_{\phi_{\tau^*,\gamma^*}} = \sum_{x \in \mathcal{X}} \phi_{\tau^*,\gamma^*}(x)\, P_0(x) = \sum_{x :\, L(x) > \tau^*} P_0(x) + \gamma^* \sum_{x :\, L(x) = \tau^*} P_0(x).$$
For any test $\phi$, due to an argument similar to that in Theorem 1, we have, for all $x \in \mathcal{X}$,
$$(\phi_{\tau^*,\gamma^*}(x) - \phi(x))\,(P_1(x) - \tau^* P_0(x)) \ge 0.$$
Summing over all $x \in \mathcal{X}$, we similarly get
$$(\beta_\phi - \beta_{\phi_{\tau^*,\gamma^*}}) + \tau^*(\alpha_\phi - \alpha_{\phi_{\tau^*,\gamma^*}}) \ge 0.$$
Hence, for any feasible test $\phi$ with $\alpha_\phi \le \alpha^* = \alpha_{\phi_{\tau^*,\gamma^*}}$, its probability of type II error satisfies $\beta_\phi \ge \beta_{\phi_{\tau^*,\gamma^*}}$.
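
To make the first step of the proof concrete, here is a hypothetical calibration routine (a sketch under the assumption of a finite alphabet with $P_0 > 0$ everywhere; not the lecture's code) that finds $(\tau^*, \gamma^*)$ meeting a target false-alarm level exactly, by sweeping the likelihood-ratio levels in decreasing order:

```python
import numpy as np

# Sketch: calibrate a randomized LRT so that alpha equals alpha_star.
def calibrate_lrt(P0, P1, alpha_star):
    L = P1 / P0                              # likelihood ratios (P0 > 0 assumed)
    mass_above = 0.0                         # P0-mass strictly above current level
    for tau in sorted(np.unique(L))[::-1]:   # unique L-values, descending
        tie = P0[L == tau].sum()             # P0-mass exactly at this level
        if mass_above + tie >= alpha_star:
            # randomize on the tie set: alpha = mass_above + gamma * tie
            gamma = (alpha_star - mass_above) / tie
            return tau, gamma
        mass_above += tie
    return 0.0, 1.0                          # alpha_star >= 1: always decide H1

P0 = np.array([0.5, 0.3, 0.2])
P1 = np.array([0.2, 0.3, 0.5])
tau, gamma = calibrate_lrt(P0, P1, alpha_star=0.3)
# Here L = [0.4, 1.0, 2.5]: level 2.5 carries P0-mass 0.2 < 0.3, so the
# routine stops at tau = 1.0 with gamma = (0.3 - 0.2) / 0.3 = 1/3.
```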

Bayesian Setup

Sometimes prior probabilities of the two hypotheses are known:
$$\pi_\theta \triangleq \Pr\{H_\theta \text{ is true}\}, \quad \theta = 0, 1, \qquad \pi_0 + \pi_1 = 1.$$
In this sense, one can view the index $\Theta$ as a (binary) random variable with prior distribution $\Pr\{\Theta = \theta\} = \pi_\theta$, for $\theta = 0, 1$.

With prior probabilities, it then makes sense to talk about the average probability of error for a test $\phi$, or more generally, the average cost (risk):
$$P_e(\phi) \triangleq \pi_0 \alpha_\phi + \pi_1 \beta_\phi = \mathbb{E}_{\Theta, X}\big[\mathbf{1}\{\Theta \ne \hat\Theta\}\big], \qquad R(\phi) \triangleq \mathbb{E}_{\Theta, X}\big[r_{\Theta, \hat\Theta}\big].$$
The Bayesian hypothesis testing problem is to test the two hypotheses with knowledge of the prior probabilities so that the average probability of error (or, in general, a risk function) is minimized.

Minimizing Bayes Risk

Consider the following problem of minimizing the Bayes risk:

Bayesian Problem:
$$\underset{\phi : \mathcal{X} \to [0,1]}{\text{minimize}} \;\; R(\phi) \triangleq \mathbb{E}_{\Theta, X}\big[r_{\Theta, \hat\Theta}\big], \quad \text{with known } (\pi_0, \pi_1) \text{ and } r_{\theta, \hat\theta}.$$

Theorem 3 (LRT is an Optimal Bayesian Test)
Assume $r_{0,0} < r_{0,1}$ and $r_{1,1} < r_{1,0}$. A deterministic LRT $\phi_\tau$ with threshold
$$\tau = \frac{(r_{0,1} - r_{0,0})\, \pi_0}{(r_{1,0} - r_{1,1})\, \pi_1}$$
attains optimality for the Bayesian Problem.

pf:
$$\begin{aligned} R(\phi) &= r_{0,0}\,\pi_0 \sum_{x \in \mathcal{X}} P_0(x)(1 - \phi(x)) + r_{0,1}\,\pi_0 \sum_{x \in \mathcal{X}} P_0(x)\,\phi(x) + r_{1,0}\,\pi_1 \sum_{x \in \mathcal{X}} P_1(x)(1 - \phi(x)) + r_{1,1}\,\pi_1 \sum_{x \in \mathcal{X}} P_1(x)\,\phi(x) \\ &= r_{0,0}\,\pi_0 + (r_{0,1} - r_{0,0})\,\pi_0 \sum_{x \in \mathcal{X}} P_0(x)\,\phi(x) + r_{1,0}\,\pi_1 + (r_{1,1} - r_{1,0})\,\pi_1 \sum_{x \in \mathcal{X}} P_1(x)\,\phi(x) \\ &= \sum_{x \in \mathcal{X}} \underbrace{\big[(r_{0,1} - r_{0,0})\,\pi_0 P_0(x) + (r_{1,1} - r_{1,0})\,\pi_1 P_1(x)\big]}_{(*)}\, \phi(x) + r_{0,0}\,\pi_0 + r_{1,0}\,\pi_1. \end{aligned}$$
For each $x \in \mathcal{X}$, we shall choose $\phi(x) \in [0, 1]$ such that $(*)$ is minimized. It is then obvious that we should choose
$$\phi(x) = \begin{cases} 1 & \text{if } (r_{0,1} - r_{0,0})\,\pi_0 P_0(x) + (r_{1,1} - r_{1,0})\,\pi_1 P_1(x) < 0 \\ 0 & \text{otherwise}, \end{cases}$$
which is exactly the LRT $\phi_\tau$ with the threshold $\tau$ given in Theorem 3.
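
For instance, under the 0-1 risk ($r_{0,0} = r_{1,1} = 0$, $r_{0,1} = r_{1,0} = 1$), Theorem 3 gives $\tau = \pi_0 / \pi_1$, i.e., the MAP rule. A small numerical sketch (the distributions and prior below are made up):

```python
import numpy as np

# Sketch: Bayes-optimal deterministic LRT for 0-1 risk, where the
# threshold from Theorem 3 reduces to tau = pi0 / pi1 (the MAP rule).
P0 = np.array([0.5, 0.3, 0.2])
P1 = np.array([0.2, 0.3, 0.5])
pi0, pi1 = 0.6, 0.4

tau = pi0 / pi1                         # threshold from Theorem 3
phi = (P1 > tau * P0).astype(float)     # decide H1 iff P1(x) > tau P0(x)

alpha = np.sum(phi * P0)                # type I error
beta = np.sum((1 - phi) * P1)           # type II error
Pe = pi0 * alpha + pi1 * beta           # average probability of error
print(tau, Pe)                          # tau = 1.5, Pe = 0.32 here
```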

Discussions

For binary hypothesis testing problems, the likelihood ratio $L(x) \triangleq \frac{P_1(x)}{P_0(x)}$ turns out to be a sufficient statistic. Moreover, a likelihood ratio test (LRT) is optimal in both the Bayesian and the Neyman-Pearson settings.

Extensions include:
- M-ary hypothesis testing
- Minimax risk optimization (with unknown prior)
- Composite hypothesis testing, etc.

Here we do not pursue these directions further. Instead, we would like to explore the asymptotic behavior of hypothesis testing, and the connection with information-theoretic tools.

i.i.d. Observations

So far we have focused on the general setting where the observation space $\mathcal{X}$ can be an arbitrary alphabet. In the following, we consider the product space $\mathcal{X}^n$, with a length-$n$ observation sequence $X^n$ drawn i.i.d. from one of the two distributions, and the two hypotheses are
$$H_0 : X_i \overset{\text{i.i.d.}}{\sim} P_0, \quad i = 1, 2, \ldots, n, \qquad\qquad H_1 : X_i \overset{\text{i.i.d.}}{\sim} P_1, \quad i = 1, 2, \ldots, n.$$
The corresponding error probabilities are denoted by
$$\alpha^{(n)} \triangleq P_{\mathrm{FA}}^{(n)} \triangleq \Pr\{H_1 \text{ is chosen} \mid H_0\}, \qquad \beta^{(n)} \triangleq P_{\mathrm{MD}}^{(n)} \triangleq \Pr\{H_0 \text{ is chosen} \mid H_1\}.$$
Throughout the lecture we assume $\mathcal{X} = \{a_1, a_2, \ldots, a_d\}$ is a finite set.

LRT under i.i.d. Observations (1)

With i.i.d. observations, the likelihood ratio of a sequence $x^n \in \mathcal{X}^n$ is
$$L(x^n) = \prod_{i=1}^{n} \frac{P_1(x_i)}{P_0(x_i)} = \prod_{a \in \mathcal{X}} \left(\frac{P_1(a)}{P_0(a)}\right)^{N(a \mid x^n)} = \prod_{a \in \mathcal{X}} \left(\frac{P_1(a)}{P_0(a)}\right)^{n\,\pi(a \mid x^n)},$$
where $N(a \mid x^n) \triangleq$ # of $a$'s in $x^n$, and $\pi(a \mid x^n) \triangleq \frac{1}{n} N(a \mid x^n)$ is the relative frequency of occurrence of symbol $a$ in the sequence $x^n$.

Note: From the above manipulation, we see that the collection of relative frequencies of occurrence (as an $|\mathcal{X}|$-dimensional probability vector),
$$\Pi_{x^n} \triangleq \big[\pi(a_1 \mid x^n)\;\; \pi(a_2 \mid x^n)\;\; \cdots\;\; \pi(a_d \mid x^n)\big]^{\mathsf{T}},$$
called the type of the sequence $x^n$, is a sufficient statistic for all the previously mentioned hypothesis testing problems.

LRT under i.i.d. Observations (2)

Let us further manipulate the LRT by taking the log-likelihood ratio:
$$\begin{aligned} L(x^n) \gtrless \tau_n &\iff \log L(x^n) \gtrless \log \tau_n \\ &\iff \sum_{a \in \mathcal{X}} n\,\pi(a \mid x^n) \log\frac{P_1(a)}{P_0(a)} \gtrless \log \tau_n \\ &\iff \sum_{a \in \mathcal{X}} \pi(a \mid x^n) \log\frac{\pi(a \mid x^n)}{P_0(a)} - \sum_{a \in \mathcal{X}} \pi(a \mid x^n) \log\frac{\pi(a \mid x^n)}{P_1(a)} \gtrless \tfrac{1}{n}\log \tau_n \\ &\iff D(\Pi_{x^n} \| P_0) - D(\Pi_{x^n} \| P_1) \gtrless \tfrac{1}{n}\log \tau_n. \end{aligned}$$
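
The identity $\frac{1}{n}\log L(x^n) = D(\Pi_{x^n} \| P_0) - D(\Pi_{x^n} \| P_1)$ is easy to verify numerically. The following sketch (toy distributions assumed) computes the type of a sampled sequence and checks that the two sides agree:

```python
import numpy as np

# Sketch: the type of a sequence, and the identity
#   (1/n) log L(x^n) = D(Pi || P0) - D(Pi || P1),
# where Pi is the empirical distribution (type) of x^n.
def type_of(xn, d):
    """Empirical distribution of a sequence over alphabet {0, ..., d-1}."""
    return np.bincount(xn, minlength=d) / len(xn)

def kl(Q, P):
    """KL divergence D(Q || P), with the convention 0 log 0 = 0."""
    mask = Q > 0
    return np.sum(Q[mask] * np.log(Q[mask] / P[mask]))

P0 = np.array([0.5, 0.3, 0.2])
P1 = np.array([0.2, 0.3, 0.5])
rng = np.random.default_rng(0)
xn = rng.choice(3, size=1000, p=P0)     # sample under H0

Pi = type_of(xn, 3)
llr = np.sum(np.log(P1[xn] / P0[xn])) / len(xn)   # (1/n) log L(x^n)
print(llr, kl(Pi, P0) - kl(Pi, P1))               # the two values agree
```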

Hypothesis Testing on the Probability Simplex

[Figure: the acceptance regions $\mathcal{A}_1, \mathcal{A}_0$ in the observation space $\mathcal{X}^n$ correspond to regions $\mathcal{F}_1^{(n)}, \mathcal{F}_0^{(n)}$ in the probability simplex $\mathcal{P}(\mathcal{X})$: $x^n \in \mathcal{A}_i \iff \Pi_{x^n} \in \mathcal{F}_i^{(n)} \implies$ decide $H_i$.]

[Figure: probability simplex $\mathcal{P}(\mathcal{X})$ with $P_0 \in \mathcal{F}_0^{(n)}$, $P_1 \in \mathcal{F}_1^{(n)}$, and a distribution $P^*$ on the boundary between the two regions.]

By Sanov's Theorem, we know that
$$\alpha^{(n)} = P_0^n\big(\mathcal{F}_1^{(n)}\big) \doteq 2^{-n D(P^* \| P_0)}, \qquad \beta^{(n)} = P_1^n\big(\mathcal{F}_0^{(n)}\big) \doteq 2^{-n D(P^* \| P_1)}.$$

Asymptotic Behaviors

1. Neyman-Pearson: $\beta^*(n, \varepsilon) \triangleq \min_{\phi_n : \mathcal{X}^n \to [0,1]} \beta^{(n)}_{\phi_n}$, subject to $\alpha^{(n)}_{\phi_n} \le \varepsilon$.
   It turns out that for all $\varepsilon \in (0, 1)$,
   $$\lim_{n \to \infty} \left\{-\tfrac{1}{n} \log \beta^*(n, \varepsilon)\right\} = D(P_0 \| P_1).$$

2. Bayesian: $P_e^{(n)*} \triangleq \min_{\phi_n : \mathcal{X}^n \to [0,1]} \left\{\pi_0\, \alpha^{(n)}_{\phi_n} + \pi_1\, \beta^{(n)}_{\phi_n}\right\}$.
   It turns out that
   $$\lim_{n \to \infty} \left\{-\tfrac{1}{n} \log P_e^{(n)*}\right\} = D(P_{\lambda^*} \| P_0) = D(P_{\lambda^*} \| P_1),$$
   where $P_\lambda(a) \triangleq \frac{(P_0(a))^{1-\lambda} (P_1(a))^{\lambda}}{\sum_{x \in \mathcal{X}} (P_0(x))^{1-\lambda} (P_1(x))^{\lambda}}$, $a \in \mathcal{X}$, and $\lambda^* \in (0, 1)$ is such that $D(P_{\lambda^*} \| P_0) = D(P_{\lambda^*} \| P_1)$.
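
The equalizing $\lambda^*$ can be found numerically by bisection, since $D(P_\lambda \| P_0) - D(P_\lambda \| P_1)$ is increasing in $\lambda$ (from negative at $\lambda = 0$ to positive at $\lambda = 1$). A sketch with assumed toy distributions, exponents in bits:

```python
import numpy as np

# Sketch: the tilted (geometric-mixture) distribution P_lambda and the
# equalizing lambda* with D(P_l || P0) = D(P_l || P1), found by bisection.
def kl(Q, P):
    return np.sum(Q * np.log2(Q / P))    # divergences in bits

def tilt(P0, P1, lam):
    w = P0**(1 - lam) * P1**lam
    return w / w.sum()

def equalizing_lambda(P0, P1, tol=1e-10):
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        lam = (lo + hi) / 2
        Pl = tilt(P0, P1, lam)
        # D(Pl||P0) - D(Pl||P1) is increasing in lambda
        if kl(Pl, P0) < kl(Pl, P1):
            lo = lam
        else:
            hi = lam
    return lam

P0 = np.array([0.5, 0.3, 0.2])
P1 = np.array([0.2, 0.3, 0.5])
lam = equalizing_lambda(P0, P1)
Pl = tilt(P0, P1, lam)
print(lam, kl(Pl, P0), kl(Pl, P1))   # the two divergences coincide
```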

Error Exponent in the Neyman-Pearson Setup

Theorem 4 (Chernoff-Stein)
For all $\varepsilon \in (0, 1)$, $\;\lim_{n \to \infty} \left\{-\frac{1}{n} \log \beta^*(n, \varepsilon)\right\} = D(P_0 \| P_1)$.

pf: We shall prove the achievability and the converse parts separately.

Achievability: construct a sequence of tests $\{\phi_n\}$ with $\alpha^{(n)}_{\phi_n} \le \varepsilon$ for $n$ sufficiently large, such that $\liminf_{n \to \infty} \left\{-\frac{1}{n} \log \beta^{(n)}_{\phi_n}\right\} \ge D(P_0 \| P_1)$.

Converse: for any sequence of tests $\{\phi_n\}$ with $\alpha^{(n)}_{\phi_n} \le \varepsilon$ for $n$ sufficiently large, show that $\limsup_{n \to \infty} \left\{-\frac{1}{n} \log \beta^{(n)}_{\phi_n}\right\} \le D(P_0 \| P_1)$.

We use the method of types to prove both the achievability and the converse. Alternatively, Chapter 11.8 of Cover & Thomas [1] uses a kind of weak typicality to prove the theorem.

Achievability: Consider the deterministic test
$$\phi_n(x^n) = \mathbf{1}\big\{D(\Pi_{x^n} \| P_0) \ge \delta_n\big\}, \qquad \delta_n \triangleq \tfrac{1}{n}\Big(\log\tfrac{1}{\varepsilon} + d \log(n+1)\Big).$$
In other words, it declares $H_1$ if $D(\Pi_{x^n} \| P_0) \ge \delta_n$, and $H_0$ otherwise.

Check the probability of type I error: by Prop. 4 in Part I, we have
$$\alpha^{(n)}_{\phi_n} = \Pr_{X_i \overset{\text{i.i.d.}}{\sim} P_0}\big\{D(\Pi_{X^n} \| P_0) \ge \delta_n\big\} \le 2^{-n\left(\delta_n - \frac{d \log(n+1)}{n}\right)} \overset{(a)}{=} \varepsilon,$$
where (a) is due to our construction.

Analyze the probability of type II error ((b) is due to Prop. 3 in Part I):
$$\beta^{(n)}_{\phi_n} = \sum_{Q \in \mathcal{P}_n :\, D(Q \| P_0) < \delta_n} P_1^n(\mathcal{T}_n(Q)) \overset{(b)}{\le} \sum_{Q \in \mathcal{P}_n :\, D(Q \| P_0) < \delta_n} 2^{-n D(Q \| P_1)} \le |\mathcal{P}_n|\, 2^{-n D_n}, \quad \text{where } D_n \triangleq \min_{Q \in \mathcal{P}_n :\, D(Q \| P_0) < \delta_n} \{D(Q \| P_1)\}.$$
Since $\lim_{n \to \infty} \delta_n = 0$, we have $\lim_{n \to \infty} D_n = D(P_0 \| P_1)$, and achievability is done.

Converse: We prove the converse for deterministic tests; the extension to randomized tests is left as an exercise (HW6).

Let $\mathcal{A}_i^{(n)} \triangleq \{x^n \mid \phi_n(x^n) = i\}$, the acceptance region of $H_i$, for $i = 0, 1$. Let $\mathcal{B}^{(n)} \triangleq \{x^n \mid D(\Pi_{x^n} \| P_0) < \varepsilon_n\}$, $\varepsilon_n \triangleq \frac{2d \log(n+1)}{n}$. By Prop. 4, we have
$$P_0^n\big(\mathcal{B}^{(n)}\big) = 1 - \Pr_{X_i \overset{\text{i.i.d.}}{\sim} P_0}\big\{D(\Pi_{X^n} \| P_0) \ge \varepsilon_n\big\} \ge 1 - 2^{-n\left(\varepsilon_n - \frac{d \log(n+1)}{n}\right)} = 1 - 2^{-d \log(n+1)} \to 1 \text{ as } n \to \infty.$$
Hence, for sufficiently large $n$, both $P_0^n\big(\mathcal{B}^{(n)}\big)$ and $P_0^n\big(\mathcal{A}_0^{(n)}\big) > 1 - \varepsilon$, and
$$P_0^n\big(\mathcal{B}^{(n)} \cap \mathcal{A}_0^{(n)}\big) = P_0^n\big(\mathcal{B}^{(n)}\big) + P_0^n\big(\mathcal{A}_0^{(n)}\big) - P_0^n\big(\mathcal{B}^{(n)} \cup \mathcal{A}_0^{(n)}\big) > 2(1 - \varepsilon) - 1 = 1 - 2\varepsilon.$$
Note $\mathcal{B}^{(n)} = \bigcup_{Q \in \mathcal{P}_n :\, D(Q \| P_0) < \varepsilon_n} \mathcal{T}_n(Q)$. Hence there exists $Q_n \in \mathcal{P}_n$ with $D(Q_n \| P_0) < \varepsilon_n$ such that
$$P_0^n\big(\mathcal{T}_n(Q_n) \cap \mathcal{A}_0^{(n)}\big) > (1 - 2\varepsilon)\, P_0^n(\mathcal{T}_n(Q_n)). \tag{1}$$

Key Observation: Note that the probability of each sequence in the same type class is the same, under any product distribution. Hence, (1) is equivalent to
$$\big|\mathcal{T}_n(Q_n) \cap \mathcal{A}_0^{(n)}\big| > (1 - 2\varepsilon)\, \big|\mathcal{T}_n(Q_n)\big|,$$
which implies $P_1^n\big(\mathcal{T}_n(Q_n) \cap \mathcal{A}_0^{(n)}\big) > (1 - 2\varepsilon)\, P_1^n(\mathcal{T}_n(Q_n))$.

Hence, for sufficiently large $n$, there exists $Q_n \in \mathcal{P}_n$ with $D(Q_n \| P_0) < \varepsilon_n$ such that
$$P_1^n\big(\mathcal{A}_0^{(n)}\big) \ge P_1^n\big(\mathcal{T}_n(Q_n) \cap \mathcal{A}_0^{(n)}\big) > (1 - 2\varepsilon)\, P_1^n(\mathcal{T}_n(Q_n)) \overset{(c)}{\ge} (1 - 2\varepsilon)\, |\mathcal{P}_n|^{-1}\, 2^{-n D(Q_n \| P_1)},$$
where (c) is due to Prop. 3.

Finally, as $\lim_{n \to \infty} \varepsilon_n = 0$, we have $\lim_{n \to \infty} D(Q_n \| P_1) = D(P_0 \| P_1)$, and the converse proof is done.

Error Exponent in the Bayesian Setup

Theorem 5 (Chernoff)
$$\lim_{n \to \infty}\left\{-\tfrac{1}{n} \log P_e^{(n)*}\right\} = D(P_{\lambda^*} \| P_0) = D(P_{\lambda^*} \| P_1) = \underbrace{\max_{\lambda \in [0,1]}\left\{-\log\Big(\sum_{x \in \mathcal{X}} (P_0(x))^{1-\lambda} (P_1(x))^{\lambda}\Big)\right\}}_{\text{Chernoff Information } CI(P_0, P_1)},$$
where $P_\lambda(a) = \frac{(P_0(a))^{1-\lambda} (P_1(a))^{\lambda}}{\sum_{x \in \mathcal{X}} (P_0(x))^{1-\lambda} (P_1(x))^{\lambda}}$, $a \in \mathcal{X}$, and $\lambda^* \in (0, 1)$ is such that $D(P_{\lambda^*} \| P_0) = D(P_{\lambda^*} \| P_1)$.

Note: The optimal Bayesian test (for minimizing $P_e$) is the maximum a posteriori (MAP) test: $\phi_{\mathrm{MAP}}(x^n) = \mathbf{1}\{\pi_1 P_1^n(x^n) \ge \pi_0 P_0^n(x^n)\}$.

[Figure: $D(P_\lambda \| P_0)$ (increasing) and $D(P_\lambda \| P_1)$ (decreasing) versus $\lambda$; the two curves intersect at only one point, and it lies in $[0, 1]$.]

[Figure: $D(P_\lambda \| P_0)$, $D(P_\lambda \| P_1)$, and $\min\{D(P_\lambda \| P_0), D(P_\lambda \| P_1)\}$ versus $\lambda$; the minimum is maximized at the crossing point $\lambda^*$.]

pf: The proof is based on an application of large deviations in analyzing the optimal test, MAP: $\phi_{\mathrm{MAP}}(x^n) = \mathbf{1}\{\pi_1 P_1^n(x^n) \ge \pi_0 P_0^n(x^n)\}$.

Analysis of the error probabilities of the MAP test:
$$\alpha^{(n)} = P_0^n\big(\mathcal{F}_1^{(n)}\big), \qquad \beta^{(n)} = P_1^n\big(\mathcal{F}_0^{(n)}\big),$$
where
$$\mathcal{F}_1^{(n)} \triangleq \Big\{Q \in \mathcal{P}(\mathcal{X}) \;\Big|\; D(Q \| P_0) - D(Q \| P_1) \ge \tfrac{1}{n}\log\tfrac{\pi_0}{\pi_1}\Big\}, \qquad \mathcal{F}_0^{(n)} \triangleq \Big\{Q \in \mathcal{P}(\mathcal{X}) \;\Big|\; D(Q \| P_0) - D(Q \| P_1) < \tfrac{1}{n}\log\tfrac{\pi_0}{\pi_1}\Big\}.$$

By Sanov's Theorem, we have
$$\lim_{n \to \infty}\left\{-\tfrac{1}{n}\log \alpha^{(n)}\right\} = \min_{Q \in \mathcal{F}_1} D(Q \| P_0), \qquad \lim_{n \to \infty}\left\{-\tfrac{1}{n}\log \beta^{(n)}\right\} = \min_{Q \in \mathcal{F}_0} D(Q \| P_1),$$
where $\mathcal{F}_1 \triangleq \{Q \in \mathcal{P}(\mathcal{X}) \mid D(Q \| P_0) - D(Q \| P_1) \ge 0\}$ and $\mathcal{F}_0 \triangleq \{Q \in \mathcal{P}(\mathcal{X}) \mid D(Q \| P_0) - D(Q \| P_1) \le 0\}$.

Exponents: Characterizing the two exponents is equivalent to solving the two (convex) optimization problems:
$$\min_{Q \in \mathcal{F}_1} D(Q \| P_0): \quad \underset{(Q_1, \ldots, Q_d)}{\text{minimize}} \;\sum_{l=1}^{d} Q_l \log\frac{Q_l}{P_0(a_l)} \quad \text{subject to} \quad \sum_{l=1}^{d} Q_l \log\frac{P_1(a_l)}{P_0(a_l)} \ge 0, \;\; Q_l \ge 0 \; (l = 1, \ldots, d), \;\; \sum_{l=1}^{d} Q_l = 1;$$
$$\min_{Q \in \mathcal{F}_0} D(Q \| P_1): \quad \underset{(Q_1, \ldots, Q_d)}{\text{minimize}} \;\sum_{l=1}^{d} Q_l \log\frac{Q_l}{P_1(a_l)} \quad \text{subject to} \quad \sum_{l=1}^{d} Q_l \log\frac{P_1(a_l)}{P_0(a_l)} \le 0, \;\; Q_l \ge 0 \; (l = 1, \ldots, d), \;\; \sum_{l=1}^{d} Q_l = 1.$$
It turns out that both problems have a common optimal solution
$$P_{\lambda^*}(a) = \frac{(P_0(a))^{1-\lambda^*} (P_1(a))^{\lambda^*}}{\sum_{x \in \mathcal{X}} (P_0(x))^{1-\lambda^*} (P_1(x))^{\lambda^*}}, \quad a \in \mathcal{X},$$
with $\lambda^* \in [0, 1]$ such that $D(P_{\lambda^*} \| P_0) = D(P_{\lambda^*} \| P_1)$.

Hence, both types of error probabilities have the same exponent, and so does the average error probability. This completes the proof of the first part.

Chernoff Information: To show that
$$CI(P_0, P_1) \triangleq \max_{\lambda \in [0,1]}\left\{-\log\Big(\sum_{x \in \mathcal{X}} (P_0(x))^{1-\lambda} (P_1(x))^{\lambda}\Big)\right\} = D(P_{\lambda^*} \| P_0),$$
simply observe that the equalization condition
$$D(P_{\lambda^*} \| P_0) = D(P_{\lambda^*} \| P_1) \iff \sum_{a \in \mathcal{X}} (P_0(a))^{1-\lambda^*} (P_1(a))^{\lambda^*} \big(\log P_0(a) - \log P_1(a)\big) = 0$$
is precisely the first-order optimality condition of the maximization over $\lambda$, and at $\lambda = \lambda^*$,
$$D(P_{\lambda^*} \| P_0) = D(P_{\lambda^*} \| P_1) = -\log\Big(\sum_{x \in \mathcal{X}} (P_0(x))^{1-\lambda^*} (P_1(x))^{\lambda^*}\Big).$$
Proof complete.
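
Numerically, one can check on a toy example that the maximized $-\log \sum_x (P_0(x))^{1-\lambda}(P_1(x))^{\lambda}$ and the two divergences at the maximizing $\lambda$ coincide (a sketch; the distributions are made up and exponents are in bits):

```python
import numpy as np

# Sketch: Chernoff information as max over lambda of -log sum P0^{1-l} P1^l,
# compared with D(P_{l*} || P0) and D(P_{l*} || P1) at the maximizer.
P0 = np.array([0.5, 0.3, 0.2])
P1 = np.array([0.2, 0.3, 0.5])

lams = np.linspace(0.0, 1.0, 100001)
# Z(l) = sum_x P0(x)^{1-l} P1(x)^l evaluated on a fine grid of lambdas
Z = np.array([(P0**(1 - l) * P1**l).sum() for l in lams])
CI = (-np.log2(Z)).max()
l_star = lams[(-np.log2(Z)).argmax()]

Pl = P0**(1 - l_star) * P1**l_star
Pl /= Pl.sum()                              # tilted distribution P_{l*}
D0 = np.sum(Pl * np.log2(Pl / P0))
D1 = np.sum(Pl * np.log2(Pl / P1))
print(CI, D0, D1)   # all three agree up to grid resolution
```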

Parametric Estimation

In this lecture we focus on parametric estimation, where samples of data are assumed to be drawn from a family of distributions on an alphabet $\mathcal{X}$,
$$\{P_\theta \in \mathcal{P}(\mathcal{X}) \mid \theta \in \Theta\},$$
where $\theta$ is called the parameter and $\Theta$ is the parameter set. (In this lecture, we mainly focus on $\mathcal{X} = \mathbb{R}$ or $\mathbb{R}^n$, where the $P_\theta$ are densities.)

Such a parametric framework is useful when one is familiar with certain properties of the data and has a good statistical model for the data. The parameter set $\Theta$ is hence fixed, not scaling with the number of data samples. In contrast, if such knowledge about the underlying data is not sufficient, the non-parametric framework might be more suitable.

Outline

Parametric estimation itself is a vast area. In this lecture we shall go through some basic results and then draw some connections between estimation theory and information theory. Topics to be discussed in this lecture:

1. Performance Evaluation of Estimators: bias, mean squared error, and the Cramér-Rao lower bound; risk function optimization
2. Maximum Likelihood Estimator (MLE)
3. Asymptotic Evaluation: consistency; efficiency
4. Bayesian Estimators

Estimator, Bias, Mean Squared Error

Definition 4 (Estimator)
Consider $X \sim P_\theta$ randomly generating the observed sample $x$, where $\theta$ is an unknown parameter lying in the parameter set $\Theta$. An estimator of $\theta$ based on the observed $x$ is a mapping $\phi : \mathcal{X} \to \Theta$, $x \mapsto \hat\theta$. An estimator of a function $z(\theta)$ is a mapping $\zeta : \mathcal{X} \to z(\Theta)$, $x \mapsto \hat z$.

For the case $\mathcal{X} = \mathbb{R}$ or $\mathbb{R}^n$, it is reasonable to consider the following two measures of performance of estimators.

Definition 5 (Bias, Mean Squared Error)
For an estimator $\phi(x)$ of $\theta$,
$$\mathrm{Bias}_\theta(\phi) \triangleq \mathbb{E}_{P_\theta}[\phi(X)] - \theta, \qquad \mathrm{MSE}_\theta(\phi) \triangleq \mathbb{E}_{P_\theta}\big[|\phi(X) - \theta|^2\big].$$

Risk Function

Fact 1 (MSE = Variance + (Bias)²)
For an estimator $\phi(x)$ of $\theta$, $\;\mathrm{MSE}_\theta(\phi) = \mathrm{Var}_{P_\theta}[\phi(X)] + (\mathrm{Bias}_\theta(\phi))^2$.

pf:
$$\begin{aligned} \mathrm{MSE}_\theta(\phi) &\triangleq \mathbb{E}_{P_\theta}\big[|\phi(X) - \theta|^2\big] = \mathbb{E}_{P_\theta}\big[(\phi(X) - \mathbb{E}_{P_\theta}[\phi(X)] + \mathbb{E}_{P_\theta}[\phi(X)] - \theta)^2\big] \\ &= \mathrm{Var}_{P_\theta}[\phi(X)] + (\mathrm{Bias}_\theta(\phi))^2 + 2\,\mathrm{Bias}_\theta(\phi)\, \underbrace{\mathbb{E}_{P_\theta}\big[\phi(X) - \mathbb{E}_{P_\theta}[\phi(X)]\big]}_{0}. \end{aligned}$$

MSE is a special case of the risk function of an estimator.

Definition 6 (Risk Function)
Let $r : \Theta \times \Theta \to \mathbb{R}$ denote the risk (cost) of estimating $\theta$ with $\hat\theta$. The risk function of an estimator $\phi$ is defined as $R_\theta(\phi) \triangleq \mathbb{E}_{P_\theta}[r(\theta, \phi(X))]$.

With risk functions as the performance measures of estimators, it is then possible to ask the following questions: What is the best estimator that minimizes the risk? What is the minimum risk?

But these questions are not explicit: optimal in what sense?
- Minimax: the worst-case risk (over $\Theta$) is minimized.
- Bayesian: with a prior distribution $\{\pi(\theta) \mid \theta \in \Theta\}$, the expected risk (Bayes risk) is minimized.

In the following, we do not pursue these directions further (a detailed treatment can be found in decision theory). Instead, we provide a parameter-dependent lower bound on the MSE of unbiased estimators, namely, the Cramér-Rao inequality. Later, we shall also briefly introduce results in the Bayesian setup.

Lower Bound on the MSE of Unbiased Estimators

Below we deal with densities and hence change notation from $P_\theta$ to $f_\theta$.

Definition 7 (Fisher Information)
The Fisher information of $\theta$ is defined as $J(\theta) \triangleq \mathbb{E}_{f_\theta}\Big[\big(\tfrac{\partial}{\partial\theta} \ln f_\theta(X)\big)^2\Big]$.

Definition 8 (Unbiased Estimator)
An estimator $\phi$ is unbiased if $\mathrm{Bias}_\theta(\phi) = 0$ for all $\theta \in \Theta$.

Now we are ready to state the theorem.

Theorem 6 (Cramér-Rao)
For any unbiased estimator $\phi$, we have $\mathrm{MSE}_\theta(\phi) \ge \frac{1}{J(\theta)}$, for all $\theta \in \Theta$.

Proof of the Cramér-Rao Inequality

pf: The proof is essentially an application of the Cauchy-Schwarz inequality. Let us begin with the observation that $J(\theta) = \mathrm{Var}_{f_\theta}[s_\theta(X)]$, where the score $s_\theta(X) \triangleq \frac{\partial}{\partial\theta} \ln f_\theta(X) = \frac{1}{f_\theta(X)} \frac{\partial}{\partial\theta} f_\theta(X)$, because
$$\mathbb{E}_{f_\theta}[s_\theta(X)] = \int f_\theta(x)\, \frac{1}{f_\theta(x)} \frac{\partial}{\partial\theta} f_\theta(x)\, dx = \int \frac{\partial}{\partial\theta} f_\theta(x)\, dx = \frac{d}{d\theta} \int f_\theta(x)\, dx = 0.$$
Hence, by the Cauchy-Schwarz inequality, we have
$$\big(\mathrm{Cov}_{f_\theta}(s_\theta(X), \phi(X))\big)^2 \le \mathrm{Var}_{f_\theta}[s_\theta(X)]\, \mathrm{Var}_{f_\theta}[\phi(X)].$$
Since $\mathrm{Bias}_\theta(\phi) = 0$, we have $\mathrm{MSE}_\theta(\phi) = \mathrm{Var}_{f_\theta}[\phi(X)]$, and hence
$$\mathrm{MSE}_\theta(\phi)\, J(\theta) \ge \big(\mathrm{Cov}_{f_\theta}(s_\theta(X), \phi(X))\big)^2.$$

It remains to prove that $\mathrm{Cov}_{f_\theta}(s_\theta(X), \phi(X)) = 1$:
$$\begin{aligned} \mathrm{Cov}_{f_\theta}(s_\theta(X), \phi(X)) &= \mathbb{E}_{f_\theta}[s_\theta(X)\, \phi(X)] - \underbrace{\mathbb{E}_{f_\theta}[s_\theta(X)]}_{0}\, \mathbb{E}_{f_\theta}[\phi(X)] = \mathbb{E}_{f_\theta}[s_\theta(X)\, \phi(X)] \\ &= \mathbb{E}_{f_\theta}\Big[\frac{1}{f_\theta(X)} \frac{\partial}{\partial\theta} f_\theta(X)\, \phi(X)\Big] = \int \frac{\partial}{\partial\theta} f_\theta(x)\, \phi(x)\, dx \\ &= \frac{d}{d\theta} \int f_\theta(x)\, \phi(x)\, dx = \frac{d}{d\theta}\, \mathbb{E}_{f_\theta}[\phi(X)] \overset{(a)}{=} \frac{d}{d\theta}\, \theta = 1, \end{aligned}$$
where (a) holds because $\phi$ is unbiased. The proof is complete.

Remark: The Cramér-Rao inequality can be extended to vector estimators, biased estimators, estimators of a function of $\theta$, etc.
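
A quick Monte Carlo sanity check (a sketch assuming the Gaussian location model, which is not worked out on the slides): for $X_i \sim \mathcal{N}(\theta, \sigma^2)$, $J(\theta) = 1/\sigma^2$ per sample, so $n$ i.i.d. samples give the bound $\mathrm{MSE} \ge \sigma^2/n$; the sample mean is unbiased and attains it exactly:

```python
import numpy as np

# Sketch: Cramer-Rao bound for the Gaussian location model.
# J(theta) = 1/sigma^2 per sample, so the bound for n samples is sigma^2 / n.
rng = np.random.default_rng(1)
theta, sigma, n, trials = 2.0, 1.5, 50, 200000

X = rng.normal(theta, sigma, size=(trials, n))
est = X.mean(axis=1)                  # sample mean, an unbiased estimator

mse = np.mean((est - theta)**2)       # empirical MSE over many trials
crlb = sigma**2 / n                   # 1 / (n J(theta))
print(mse, crlb)                      # the sample mean attains the bound
```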

Extensions of the Cramér-Rao Inequality

Below we list some extensions and leave the proofs as exercises.

Exercise 1 (Cramér-Rao Inequality for Unbiased Functional Estimators)
Prove that for any unbiased estimator $\zeta$ of $z(\theta)$, $\;\mathrm{MSE}_\theta(\zeta) \ge \frac{1}{J(\theta)} \big(\frac{d}{d\theta} z(\theta)\big)^2$.

Exercise 2 (Cramér-Rao Inequality for Biased Estimators)
Prove that for any estimator $\phi$ of the parameter $\theta$, $\;\mathrm{MSE}_\theta(\phi) \ge \frac{1}{J(\theta)} \big(1 + \frac{d}{d\theta} \mathrm{Bias}_\theta(\phi)\big)^2 + (\mathrm{Bias}_\theta(\phi))^2$.

Exercise 3 (Attainment of Cramér-Rao)
Show that a necessary and sufficient condition for an unbiased estimator $\phi$ to attain the Cramér-Rao lower bound is that there exists some function $g$ such that for all $x$,
$$g(\theta)\, (\phi(x) - \theta) = \frac{\partial}{\partial\theta} \ln f_\theta(x).$$

More on Fisher Information

Fisher information plays a key role in the Cramér-Rao lower bound. We make further remarks about it.

1. $J(\theta) \triangleq \mathbb{E}_{f_\theta}\big[(s_\theta(X))^2\big] = \mathrm{Var}_{f_\theta}[s_\theta(X)]$, where the score of $\theta$, $s_\theta(X) \triangleq \frac{\partial}{\partial\theta} \ln f_\theta(X) = \frac{1}{f_\theta(X)} \frac{\partial}{\partial\theta} f_\theta(X)$, is zero-mean.
2. Suppose $X_i \overset{\text{i.i.d.}}{\sim} f_\theta$; then for the estimation problem with observation $X^n$, the Fisher information is $J_n(\theta) = n J(\theta)$, where $J(\theta)$ is the Fisher information when the observation is just $X \sim f_\theta$.
3. For an exponential family $\{f_\theta \mid \theta \in \Theta\}$, it can be shown that $J(\theta) = -\mathbb{E}_{f_\theta}\big[\frac{\partial^2}{\partial\theta^2} \ln f_\theta(X)\big]$, which makes the computation of $J(\theta)$ simpler.

Maximum Likelihood Estimator

The maximum likelihood estimator (MLE) is a widely used estimator.

Definition 9 (Maximum Likelihood Estimator)
The maximum likelihood estimator (MLE) for estimating $\theta$ from a randomly drawn $X \sim P_\theta$ is defined as
$$\phi_{\mathrm{MLE}}(x) \triangleq \arg\max_{\theta \in \Theta} \{P_\theta(x)\}.$$
Here $P_\theta(x)$ is called the likelihood function.

Exercise 4 (MLE of a Gaussian with Unknown Mean and Variance)
Consider $X_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(\mu, \sigma^2)$ for $i = 1, 2, \ldots, n$, where $\theta \triangleq (\mu, \sigma^2)$ denotes the unknown parameter. Let $\bar x \triangleq \frac{1}{n}\sum_{i=1}^n x_i$. Show that
$$\phi_{\mathrm{MLE}}(x^n) = \Big(\bar x,\; \tfrac{1}{n}\textstyle\sum_{i=1}^n (x_i - \bar x)^2\Big).$$
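
A short sketch of Exercise 4's answer (with assumed toy data), checking the closed form against the Gaussian log-likelihood:

```python
import numpy as np

# Sketch: the Gaussian MLE (sample mean, 1/n-variance), sanity-checked
# against nearby candidates on the log-likelihood surface.
rng = np.random.default_rng(2)
x = rng.normal(3.0, 2.0, size=1000)

mu_hat = x.mean()
var_hat = np.mean((x - mu_hat)**2)    # note the 1/n factor, not 1/(n-1)

def loglik(mu, var):
    return -0.5 * len(x) * np.log(2 * np.pi * var) \
           - np.sum((x - mu)**2) / (2 * var)

# The closed-form MLE should beat (or tie) any perturbed candidate:
assert loglik(mu_hat, var_hat) >= loglik(mu_hat + 0.01, var_hat)
assert loglik(mu_hat, var_hat) >= loglik(mu_hat, var_hat * 1.01)
print(mu_hat, var_hat)
```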

Asymptotic Evaluations: Consistency

In the following we consider the observation of $n$ i.i.d. drawn samples $X_i \overset{\text{i.i.d.}}{\sim} P_\theta$, $i = 1, \ldots, n$, and give two ways of evaluating the performance of a sequence of estimators $\{\phi_n(x^n) \mid n \in \mathbb{N}\}$ as $n \to \infty$.

Definition 10 (Consistency)
A sequence of estimators $\{\zeta_n(x^n) \mid n \in \mathbb{N}\}$ is consistent if for all $\varepsilon > 0$,
$$\lim_{n \to \infty} \Pr_{X_i \overset{\text{i.i.d.}}{\sim} P_\theta}\big\{|\zeta_n(X^n) - z(\theta)| < \varepsilon\big\} = 1, \quad \forall \theta \in \Theta.$$
In other words, $\zeta_n(X^n) \overset{p}{\to} z(\theta)$ for all $\theta \in \Theta$.

Theorem 7 (MLE is Consistent)
For a family of densities $\{f_\theta \mid \theta \in \Theta\}$, under some regularity conditions on $f_\theta(x)$, the plug-in estimator $z(\phi_{\mathrm{MLE}}(x^n))$ is a consistent estimator of $z(\theta)$, where $z$ is a continuous function of $\theta$.

Asymptotic Evaluations: Efficiency

Motivated by the Cramér-Rao inequality, we would like to see if the lower bound is asymptotically attainable.

Definition 11 (Efficiency)
A sequence of estimators $\{\zeta_n(x^n) \mid n \in \mathbb{N}\}$ is asymptotically efficient if
$$\sqrt{n}\, \big(\zeta_n(X^n) - z(\theta)\big) \overset{d}{\to} \mathcal{N}\Big(0,\; \tfrac{1}{J(\theta)}\big(\tfrac{d}{d\theta} z(\theta)\big)^2\Big) \quad \text{as } n \to \infty.$$

Theorem 8 (MLE is Asymptotically Efficient)
For a family of densities $\{f_\theta \mid \theta \in \Theta\}$, under some regularity conditions on $f_\theta(x)$, the plug-in estimator $z(\phi_{\mathrm{MLE}}(x^n))$ is an asymptotically efficient estimator of $z(\theta)$, where $z$ is a continuous function of $\theta$.
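
As an illustration of Definition 11 (a sketch with an assumed exponential model, not from the slides): for $X_i \sim \mathrm{Exp}(\theta)$ with rate $\theta$, the MLE is $1/\bar x$ and $J(\theta) = 1/\theta^2$, so $\sqrt{n}\,(\hat\theta_n - \theta)$ should have variance close to $\theta^2$:

```python
import numpy as np

# Sketch: asymptotic efficiency of the MLE of an exponential rate.
# Here J(theta) = 1/theta^2, so sqrt(n)(MLE - theta) ~ N(0, theta^2).
rng = np.random.default_rng(3)
theta, n, trials = 2.0, 500, 20000

X = rng.exponential(1 / theta, size=(trials, n))
mle = 1 / X.mean(axis=1)              # MLE of the rate parameter

scaled = np.sqrt(n) * (mle - theta)
print(scaled.var(), theta**2)         # empirical variance vs 1/J(theta)
```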

Bayesian Estimators

In the Bayesian setting, the prior distribution of the parameter, $\pi(\theta)$ for $\theta \in \Theta$, is known, and hence the joint distribution of $(\theta, X)$ is $\pi(\theta)\, P_\theta(x)$. The goal is then to find an estimator that minimizes the Bayes risk, defined as the average risk function, averaged over the random $\theta$:
$$R(\phi) \triangleq \mathbb{E}_{\theta \sim \pi}[R_\theta(\phi)] = \mathbb{E}_{(\theta, X) \sim \pi \cdot P_\theta}[r(\theta, \phi(X))].$$
The optimal Bayesian estimator is $\phi^*(\cdot) \triangleq \arg\min_{\phi : \mathcal{X} \to \Theta} \{R(\phi)\}$.

Below we give some examples of Bayesian estimators, for several kinds of risks: 0-1 risk, squared-error risk, and absolute-error risk.

1. 0-1 risk $r(\theta, \hat\theta) = \mathbf{1}\{\theta \ne \hat\theta\}$: This kind of risk is reasonable for finite $\Theta$, and the Bayes risk is the same as the average probability of error. The optimal Bayesian estimator is
$$\phi_{\mathrm{MAP}}(x) = \arg\max_{\theta \in \Theta} \{\pi(\theta)\, P_\theta(x)\},$$
called the maximum a posteriori probability (MAP) estimator.

2. Squared-error risk $r(\theta, \hat\theta) = |\theta - \hat\theta|^2$: The optimal Bayesian estimator is
$$\phi_{\mathrm{MMSE}}(x) = \mathbb{E}_{\theta \sim \pi(\theta \mid X = x)}[\theta \mid X = x],$$
where $\pi(\theta \mid X = x) \triangleq \frac{\pi(\theta)\, P_\theta(x)}{\sum_{\theta' \in \Theta} \pi(\theta')\, P_{\theta'}(x)}$ is the a posteriori probability.

3. Absolute-error risk $r(\theta, \hat\theta) = |\theta - \hat\theta|$: The optimal Bayesian estimator is the median of $\pi(\theta \mid X = x)$.
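
A toy sketch of all three estimators on a posterior computed by Bayes' rule (the discrete model and all numbers below are made up for illustration):

```python
import numpy as np
from math import comb

# Hypothetical model: Theta = {0, 1, 2} with prior pi, and
# X | theta ~ Binomial(5, p_theta) for made-up success probabilities.
thetas = np.array([0.0, 1.0, 2.0])
ps = np.array([0.2, 0.5, 0.8])          # p_theta for each theta
prior = np.array([0.5, 0.3, 0.2])

x = 4                                    # observed count out of 5
like = np.array([comb(5, x) * p**x * (1 - p)**(5 - x) for p in ps])
post = prior * like
post /= post.sum()                       # posterior pi(theta | X = x)

map_est = thetas[post.argmax()]                  # 0-1 risk -> MAP
mmse_est = np.sum(thetas * post)                 # squared error -> posterior mean
med_est = thetas[np.searchsorted(post.cumsum(), 0.5)]  # absolute error -> median
print(map_est, mmse_est, med_est)
```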
