Chapter 8. Hypothesis Testing. Po-Ning Chen. Department of Communications Engineering. National Chiao-Tung University. Hsin Chu, Taiwan 30050


Error exponent and divergence (II:8-1)

Definition 8.1 (exponent) A real number a is said to be the exponent of a sequence of non-negative quantities {a_n}_{n≥1} converging to zero if

    a = lim_{n→∞} −(1/n) log a_n.

In operation, the exponent is an index of the exponential rate of convergence of the sequence a_n: for any γ > 0,

    e^{−n(a+γ)} ≤ a_n ≤ e^{−n(a−γ)}

for n large enough.

Recall that in proving the channel coding theorem, the probability of decoding error for channel block codes can be made arbitrarily close to zero when the rate of the codes is less than the channel capacity. This result can be written mathematically as: P_e(C_n) → 0 as n → ∞, provided R = limsup_{n→∞} (1/n) log |C_n| < C, where C_n is the optimal code for block length n. From the theorem, we only know that the decoding error vanishes as the block length increases; it does not reveal how fast the decoding error approaches zero.
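As a small numerical illustration of Definition 8.1 (an added sketch, not part of the original slides), the following Python snippet evaluates −(1/n) log a_n for a_n = n² e^{−3n}; the polynomial factor n² is subexponential, so the estimate approaches the exponent 3.

import math

def empirical_exponent(log_a_n, n):
    # -(1/n) log a_n, taking log a_n as input to avoid numerical underflow
    return -log_a_n / n

# a_n = n^2 * exp(-3n): log a_n = 2 log n - 3 n, so the exponent is 3
for n in (10, 100, 1000, 10000):
    log_a_n = 2 * math.log(n) - 3 * n
    print(n, empirical_exponent(log_a_n, n))   # tends to 3 as n grows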

Error exponent and divergence (II:8-2)

In other words, we do not know the rate of convergence of the decoding error. Sometimes this information is very important, especially when one has to decide on a block length sufficient to achieve some error bound.

The first step in investigating the rate of convergence of the decoding error is to compute its exponent, provided the decoding error decays to zero exponentially fast (it indeed does for memoryless channels). This exponent, as a function of the rate, is called the channel reliability function, and will be discussed in the next chapter.

For hypothesis testing problems, the type II error probability at a fixed test level also decays to zero as the number of observations increases. As it turns out, its exponent is the divergence of the null hypothesis distribution against the alternative hypothesis distribution.

Stein's lemma (II:8-3)

Lemma 8.2 (Stein's lemma) For a sequence of i.i.d. observations X^n, drawn from either the null hypothesis distribution P_{X^n} or the alternative hypothesis distribution P_{X̂^n}, the type II error satisfies

    (∀ ε ∈ (0,1))  lim_{n→∞} −(1/n) log β_n*(ε) = D(P_X ‖ P_{X̂}),

where β_n*(ε) = min_{α_n ≤ ε} β_n, and α_n and β_n represent the type I and type II errors, respectively.

Proof: [1. Forward part] In the forward part, we prove that there exists an acceptance region for the null hypothesis such that

    liminf_{n→∞} −(1/n) log β_n(ε) ≥ D(P_X ‖ P_{X̂}).

Step 1: divergence typical set. For any δ > 0, define the divergence typical set as

    A_n(δ) = { x^n : | (1/n) log [ P_{X^n}(x^n) / P_{X̂^n}(x^n) ] − D(P_X ‖ P_{X̂}) | < δ }.

Note that on the divergence typical set,

    P_{X̂^n}(x^n) ≤ P_{X^n}(x^n) e^{−n(D(P_X‖P_{X̂}) − δ)}.

Stein's lemma (II:8-4)

Step 2: computation of the type I error. By the weak law of large numbers, P_{X^n}(A_n(δ)) → 1. Hence,

    α_n = P_{X^n}(A_n^c(δ)) < ε

for sufficiently large n.

Step 3: computation of the type II error.

    β_n(ε) = P_{X̂^n}(A_n(δ))
           = Σ_{x^n ∈ A_n(δ)} P_{X̂^n}(x^n)
           ≤ Σ_{x^n ∈ A_n(δ)} P_{X^n}(x^n) e^{−n(D(P_X‖P_{X̂}) − δ)}
           = e^{−n(D(P_X‖P_{X̂}) − δ)} Σ_{x^n ∈ A_n(δ)} P_{X^n}(x^n)
           = e^{−n(D(P_X‖P_{X̂}) − δ)} (1 − α_n).

Hence,

    −(1/n) log β_n(ε) ≥ D(P_X‖P_{X̂}) − δ − (1/n) log(1 − α_n),

Stein's lemma (II:8-5)

which implies

    liminf_{n→∞} −(1/n) log β_n(ε) ≥ D(P_X‖P_{X̂}) − δ.

The above inequality is true for any δ > 0. Therefore,

    liminf_{n→∞} −(1/n) log β_n(ε) ≥ D(P_X‖P_{X̂}).

[2. Converse part] In the converse part, we will prove that for any acceptance region B_n for the null hypothesis satisfying the type I error constraint, i.e.,

    α_n(B_n) = P_{X^n}(B_n^c) ≤ ε,

its type II error β_n(B_n) satisfies

    limsup_{n→∞} −(1/n) log β_n(B_n) ≤ D(P_X‖P_{X̂}).

Stein's lemma (II:8-6)

Hence,

    β_n(B_n) = P_{X̂^n}(B_n)
             ≥ P_{X̂^n}(B_n ∩ A_n(δ))
             = Σ_{x^n ∈ B_n ∩ A_n(δ)} P_{X̂^n}(x^n)
             ≥ Σ_{x^n ∈ B_n ∩ A_n(δ)} P_{X^n}(x^n) e^{−n(D(P_X‖P_{X̂}) + δ)}
             = e^{−n(D(P_X‖P_{X̂}) + δ)} P_{X^n}(B_n ∩ A_n(δ))
             ≥ e^{−n(D(P_X‖P_{X̂}) + δ)} (1 − P_{X^n}(B_n^c) − P_{X^n}(A_n^c(δ)))
             = e^{−n(D(P_X‖P_{X̂}) + δ)} (1 − α_n(B_n) − P_{X^n}(A_n^c(δ)))
             ≥ e^{−n(D(P_X‖P_{X̂}) + δ)} (1 − ε − P_{X^n}(A_n^c(δ))).

Hence,

    −(1/n) log β_n(B_n) ≤ D(P_X‖P_{X̂}) + δ − (1/n) log(1 − ε − P_{X^n}(A_n^c(δ))),

which implies that

    limsup_{n→∞} −(1/n) log β_n(B_n) ≤ D(P_X‖P_{X̂}) + δ.

The above inequality is true for any δ > 0. Therefore,

    limsup_{n→∞} −(1/n) log β_n(B_n) ≤ D(P_X‖P_{X̂}).
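The following Python sketch (an added illustration, not part of the original slides) checks Stein's lemma numerically for Bernoulli(p) versus Bernoulli(q) observations: with the divergence typical set A_n(δ) as the acceptance region, the type I error stays small while −(1/n) log β_n approaches D(P_X ‖ P_{X̂}). The values p = 0.5, q = 0.2 and δ = 0.05 are arbitrary choices for the demonstration.

import math

def kl(p, q):
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def log_binom_pmf(n, k, r):
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(r) + (n - k) * math.log(1 - r))

def stein_check(p, q, n, delta):
    D = kl(p, q)
    l1, l0 = math.log(p / q), math.log((1 - p) / (1 - q))   # per-symbol log-likelihood ratios
    alpha, log_beta_terms = 0.0, []
    for k in range(n + 1):                                   # k = number of ones in x^n
        llr = (k * l1 + (n - k) * l0) / n                    # (1/n) log [P_{X^n}/P_{Xhat^n}]
        if abs(llr - D) < delta:                             # x^n lies in A_n(delta): accept null
            log_beta_terms.append(log_binom_pmf(n, k, q))    # contributes to beta_n under q
        else:                                                # rejected: type I error under p
            alpha += math.exp(log_binom_pmf(n, k, p))
    m = max(log_beta_terms)
    log_beta = m + math.log(sum(math.exp(t - m) for t in log_beta_terms))
    return alpha, -log_beta / n, D

for n in (100, 1000, 5000):
    a, exponent, D = stein_check(p=0.5, q=0.2, n=n, delta=0.05)
    print(n, round(a, 4), round(exponent, 4), "D =", round(D, 4))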

Composition of a sequence of i.i.d. observations (II:8-7)

Stein's lemma gives the exponent of the type II error probability at a fixed test level. As a result, this exponent, which is the divergence of the null hypothesis distribution against the alternative hypothesis distribution, is independent of the type I error bound ε for i.i.d. observations.

Specifically, in an i.i.d. environment, the probability of each sequence x^n depends only on its composition, which is defined as the |X|-dimensional vector

    ( #1(x^n)/n, #2(x^n)/n, ..., #k(x^n)/n ),

where X = {1, 2, ..., k} and #i(x^n) is the number of occurrences of symbol i in x^n. The probability of x^n can therefore be written as

    P_{X^n}(x^n) = P_X(1)^{#1(x^n)} P_X(2)^{#2(x^n)} ··· P_X(k)^{#k(x^n)}.

Note that #1(x^n) + ··· + #k(x^n) = n.

Since the composition of a sequence determines its probability deterministically, all sequences with the same composition share the same statistical properties, and hence should be treated the same when processed.

Composition of a sequence of i.i.d. observations (II:8-8)

Instead of manipulating the sequences of observations based on the typical-set-like concept, we may focus on their compositions. As it turns out, such an approach yields simpler proofs and better geometrical explanations for theories in an i.i.d. environment. (It needs to be pointed out that, for cases where the composition alone cannot determine the probability, this viewpoint does not seem to be effective.)

Lemma 8.3 (polynomial bound on the number of compositions) The number of compositions increases polynomially fast, while the number of possible sequences increases exponentially fast.

Proof: Let P_n denote the set of all possible compositions. Then

    |P_n| ≤ (n + 1)^{|X|}.
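The bound of Lemma 8.3 can be checked directly (an added illustration, not on the slides): for an alphabet of size |X|, the exact number of compositions is the stars-and-bars count C(n + |X| − 1, |X| − 1), which grows polynomially in n, in contrast with the |X|^n possible sequences.

from math import comb

def num_compositions(n, alphabet_size):
    # number of ways to distribute n symbol counts over |X| symbols
    return comb(n + alphabet_size - 1, alphabet_size - 1)

k = 3  # |X|
for n in (5, 10, 50, 100):
    # exact count   <=   (n+1)^|X| bound   <<   |X|^n sequences
    print(n, num_compositions(n, k), (n + 1) ** k, k ** n)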

Composition of a sequence of i.i.d. observations (II:8-9)

Lemma 8.4 (probability of sequences of the same composition) The probability of the sequences of composition C with respect to the product distribution P_{X^n} satisfies

    (n+1)^{−|X|} e^{−n D(P_C ‖ P_X)} ≤ P_{X^n}(C) ≤ e^{−n D(P_C ‖ P_X)},

where P_C is the composition distribution for composition C, and C (by abusing notation without ambiguity) is also used to represent the set of all sequences (in X^n) of composition C.

Theorem 8.5 (Sanov's theorem) Let E_n be the set consisting of all compositions over the finite alphabet X whose composition distribution belongs to P̄. Fix a sequence of product distributions P_{X^n} = ∏_{i=1}^n P_X. Then

    liminf_{n→∞} −(1/n) log P_{X^n}(E_n) ≥ inf_{P_C ∈ P̄} D(P_C ‖ P_X),

where P_C is the composition distribution for composition C. If, in addition, for every distribution P in P̄ there exists a sequence of composition distributions P_{C_1}, P_{C_2}, P_{C_3}, ... ∈ P̄ such that limsup_{n→∞} D(P_{C_n} ‖ P_X) = D(P ‖ P_X), then

    limsup_{n→∞} −(1/n) log P_{X^n}(E_n) ≤ inf_{P ∈ P̄} D(P ‖ P_X).

Geometrical interpretation for Sanov's theorem (II:8-10)

[Figure: the geometric meaning of Sanov's theorem: the set P̄ of distributions (containing P_1, P_2, ...), the reference distribution P_X outside it, and the minimizer P_min of D(P ‖ P_X) over P ∈ P̄.]

Geometrical interpretation for Sanov's theorem (II:8-11)

Example 8.6 Question: One wants to roughly estimate the probability that the average of the throws is greater than or equal to 4 when tossing a fair die n times. Observe that whether the requirement is satisfied depends only on the composition of the observations. Let E_n be the set of compositions which satisfy the requirement:

    E_n = { C : Σ_{i=1}^6 i P_C(i) ≥ 4 }.

To minimize D(P_C ‖ P_X) for C ∈ E_n, we can use the Lagrange multiplier technique (since divergence is convex in its first argument), with the constraints on P_C being

    Σ_{i=1}^6 i P_C(i) = k  and  Σ_{i=1}^6 P_C(i) = 1,  for each k ≥ 4.

So it becomes a matter of minimizing

    Σ_{i=1}^6 P_C(i) log [ P_C(i) / P_X(i) ] + λ_1 ( Σ_{i=1}^6 i P_C(i) − k ) + λ_2 ( Σ_{i=1}^6 P_C(i) − 1 ).

Geometrical interpretation for Sanov's theorem (II:8-12)

By taking the derivatives, we find that the minimizer should be of the form

    P_C(i) = e^{λ_1 i} / Σ_{j=1}^6 e^{λ_1 j},

where λ_1 is chosen to satisfy

    Σ_{i=1}^6 i P_C(i) = k.   (8.1.1)

Since the above is true for all k ≥ 4, it suffices to take the smallest one as our solution, i.e., k = 4. Finally, by solving (8.1.1) for k = 4 numerically (see the sketch below), the minimizer is P_C = (0.1031, ...), and the exponent of the desired probability is D(P_C ‖ P_X) nats. Consequently,

    P_{X^n}(E_n) ≈ e^{−n D(P_C ‖ P_X)}.
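The following Python sketch (added; not part of the slides) solves (8.1.1) for k = 4 by bisection on λ_1 and evaluates the resulting exponent D(P_C ‖ P_X) in nats; its first entry reproduces the value 0.1031 quoted above.

import math

def tilted(lmbda):
    # P_C(i) proportional to exp(lambda_1 * i), i = 1,...,6
    w = [math.exp(lmbda * i) for i in range(1, 7)]
    s = sum(w)
    return [x / s for x in w]

def mean(p):
    return sum(i * p[i - 1] for i in range(1, 7))

# bisection on lambda_1 so that the tilted mean equals k = 4
lo, hi = 0.0, 5.0
for _ in range(100):
    mid = (lo + hi) / 2
    if mean(tilted(mid)) < 4:
        lo = mid
    else:
        hi = mid

p_star = tilted((lo + hi) / 2)
D = sum(p * math.log(6 * p) for p in p_star)    # D(P_C || uniform), in nats
print([round(p, 4) for p in p_star], round(D, 4))
# P_{X^n}(E_n) then decays roughly like exp(-n * D)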

Divergence typical set on composition (II:8-13)

Divergence typical set in Stein's lemma:

    A_n(δ) = { x^n : | (1/n) log [ P_{X^n}(x^n) / P_{X̂^n}(x^n) ] − D(P_X ‖ P_{X̂}) | < δ }.

Divergence typical set on compositions:

    T_n(δ) = { x^n ∈ X^n : D(P_{C_{x^n}} ‖ P_X) ≤ δ },

where C_{x^n} represents the composition of x^n. The fact that P_{X^n}(T_n(δ)) → 1 is justified by

    1 − P_{X^n}(T_n(δ)) = Σ_{C : D(P_C‖P_X) > δ} P_{X^n}(C)
                        ≤ Σ_{C : D(P_C‖P_X) > δ} e^{−n D(P_C‖P_X)}   (from Lemma 8.4)
                        ≤ Σ_{C : D(P_C‖P_X) > δ} e^{−nδ}
                        ≤ (n+1)^{|X|} e^{−nδ}   (cf. Lemma 8.3).

Universal source coding on composition (II:8-14)

Universal code for an i.i.d. source: a code f_n : X^n → ∪_{i≥1} {0,1}^i such that

    (1/n) Σ_{x^n ∈ X^n} P_{X^n}(x^n) ℓ(f_n(x^n)) → H(X)

as n goes to infinity.

Example 8.7 (universal encoding using compositions) Binary-index the compositions using log_2 (n+1)^{|X|} bits, and denote this binary index for composition C by a(C). Let C_{x^n} denote the composition to which x^n belongs, i.e., x^n ∈ C_{x^n}. Binary-index the elements in C using n H(P_C) bits, and denote this binary index for the elements in C by b(C_{x^n}).

For each composition C, we know that the number of sequences x^n in C is at most 2^{n H(P_C)}. (Here, H(P_C) is measured in bits, i.e., the logarithmic base in the entropy is 2; see the proof of Lemma 8.4.)

Universal source coding on composition (II:8-15)

Define a universal encoding function f_n as

    f_n(x^n) = concatenation{ a(C_{x^n}), b(C_{x^n}) }.

Then this encoding rule is a universal code for all i.i.d. sources.
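The following Python sketch (an added illustration) computes the length of the two-part code of Example 8.7, ⌈|X| log_2(n+1)⌉ + ⌈n H(P_C)⌉ bits, and shows that for an i.i.d. source, whatever its distribution, the per-symbol average length approaches H(X).

import math, random
from collections import Counter

def code_length(xn, alphabet):
    # length of a(C_{x^n}) plus length of b(C_{x^n}), in bits
    n = len(xn)
    counts = Counter(xn)
    h_pc = -sum((c / n) * math.log2(c / n) for c in counts.values())   # H(P_C) in bits
    return math.ceil(len(alphabet) * math.log2(n + 1)) + math.ceil(n * h_pc)

alphabet = [0, 1, 2]
p = [0.7, 0.2, 0.1]                      # source distribution, unknown to the encoder
H = -sum(q * math.log2(q) for q in p)    # true source entropy in bits

random.seed(0)
for n in (100, 1000, 10000):
    avg = sum(code_length(random.choices(alphabet, weights=p, k=n), alphabet)
              for _ in range(20)) / (20 * n)
    print(n, round(avg, 4), "vs H(X) =", round(H, 4))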

Universal source coding on composition (II:8-16)

Proof:

    ℓ̄_n = Σ_{x^n ∈ X^n} P_{X^n}(x^n) ℓ(a(C_{x^n})) + Σ_{x^n ∈ X^n} P_{X^n}(x^n) ℓ(b(C_{x^n}))
         ≤ Σ_{x^n ∈ X^n} P_{X^n}(x^n) log_2 (n+1)^{|X|} + Σ_{x^n ∈ X^n} P_{X^n}(x^n) n H(P_{C_{x^n}})
         = |X| log_2 (n+1) + Σ_{C} P_{X^n}(C) n H(P_C).

Hence,

    (1/n) ℓ̄_n ≤ |X| log_2 (n+1) / n + Σ_{C} P_{X^n}(C) H(P_C).

Universal source coding on composition (II:8-17)

    Σ_{C} P_{X^n}(C) H(P_C)
      = Σ_{C ∈ T_n(δ)} P_{X^n}(C) H(P_C) + Σ_{C ∉ T_n(δ)} P_{X^n}(C) H(P_C)
      ≤ max_{C : D(P_C‖P_X) ≤ δ/log 2} H(P_C) + Σ_{C : D(P_C‖P_X) > δ/log 2} P_{X^n}(C) H(P_C)
      ≤ max_{C : D(P_C‖P_X) ≤ δ/log 2} H(P_C) + Σ_{C : D(P_C‖P_X) > δ/log 2} 2^{−n D(P_C‖P_X)} H(P_C)   (from Lemma 8.4)
      ≤ max_{C : D(P_C‖P_X) ≤ δ/log 2} H(P_C) + Σ_{C : D(P_C‖P_X) > δ/log 2} e^{−nδ} H(P_C)
      ≤ max_{C : D(P_C‖P_X) ≤ δ/log 2} H(P_C) + Σ_{C : D(P_C‖P_X) > δ/log 2} e^{−nδ} log_2 |X|
      ≤ max_{C : D(P_C‖P_X) ≤ δ/log 2} H(P_C) + (n+1)^{|X|} e^{−nδ} log_2 |X|,

where the second term of the last step vanishes as n → ∞. (Note that when the base-2 logarithm is taken in the divergence instead of the natural logarithm, the range [0, δ] in

Universal source coding on composition (II:8-18)

T_n(δ) should be replaced by [0, δ/log(2)].) It remains to show that

    max_{C : D(P_C‖P_X) ≤ δ/log(2)} H(P_C) ≤ H(X) + γ(δ),

where γ(δ) depends only on δ and approaches zero as δ → 0. ...

Likelihood ratio versus divergence (II:8-19)

Recall that the Neyman-Pearson lemma indicates that the optimal test for two hypotheses is of the form

    P_{X^n}(x^n) / P_{X̂^n}(x^n) ≷ τ.   (8.1.2)

This is the likelihood ratio test, and the quantity P_{X^n}(x^n)/P_{X̂^n}(x^n) is called the likelihood ratio. If a log operation is performed on both sides of (8.1.2), the test remains equivalent.

Likelihood ratio versus divergence (II:8-20)

    log [ P_{X^n}(x^n) / P_{X̂^n}(x^n) ]
      = Σ_{i=1}^n log [ P_X(x_i) / P_{X̂}(x_i) ]
      = Σ_{a∈X} #a(x^n) log [ P_X(a) / P_{X̂}(a) ]
      = Σ_{a∈X} n P_{C_{x^n}}(a) log [ P_X(a) / P_{X̂}(a) ]
      = n Σ_{a∈X} P_{C_{x^n}}(a) log [ (P_X(a)/P_{C_{x^n}}(a)) · (P_{C_{x^n}}(a)/P_{X̂}(a)) ]
      = n [ Σ_{a∈X} P_{C_{x^n}}(a) log (P_{C_{x^n}}(a)/P_{X̂}(a)) − Σ_{a∈X} P_{C_{x^n}}(a) log (P_{C_{x^n}}(a)/P_X(a)) ]
      = n [ D(P_{C_{x^n}} ‖ P_{X̂}) − D(P_{C_{x^n}} ‖ P_X) ].

Hence, (8.1.2) is equivalent to

    D(P_{C_{x^n}} ‖ P_{X̂}) − D(P_{C_{x^n}} ‖ P_X) ≷ (1/n) log τ.   (8.1.3)

This equivalence means that for hypothesis testing, the selection of the acceptance region can be made upon compositions instead of individual observations.

Likelihood ratio versus divergence (II:8-21)

In other words, the optimal decision function can be defined as

    φ(C) = 0, if composition C is classified as belonging to the null hypothesis according to (8.1.3);
           1, otherwise.
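A sketch of the composition-based test (8.1.3) in Python (an added illustration, not from the slides): the decision depends on x^n only through its composition P_C.

import math
from collections import Counter

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def phi(xn, alphabet, P_X, P_Xhat, log_tau=0.0):
    n = len(xn)
    counts = Counter(xn)
    P_C = [counts.get(a, 0) / n for a in alphabet]
    # accept the null hypothesis (return 0) iff
    #   D(P_C || P_Xhat) - D(P_C || P_X) > (1/n) log tau
    return 0 if kl(P_C, P_Xhat) - kl(P_C, P_X) > log_tau / n else 1

alphabet = [0, 1]
P_X, P_Xhat = [0.5, 0.5], [0.8, 0.2]
print(phi([0, 1, 1, 0, 1, 0, 0, 1], alphabet, P_X, P_Xhat))   # balanced sample -> null (0)
print(phi([0, 0, 0, 0, 0, 0, 0, 1], alphabet, P_X, P_Xhat))   # skewed sample   -> alternative (1)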

Exponent of Bayesian cost (II:8-22)

Randomization is of no help to the Bayesian test: any randomized decision rule

    φ(x^n) = 0 with probability η;  1 with probability 1 − η

satisfies

    π_0 η P_{X^n}(x^n) + π_1 (1−η) P_{X̂^n}(x^n) ≥ min{ π_0 P_{X^n}(x^n), π_1 P_{X̂^n}(x^n) }.

Now suppose the acceptance region for the null hypothesis is

    A = { C : D(P_C ‖ P_{X̂}) − D(P_C ‖ P_X) > τ′ }.

Then by Sanov's theorem, the exponent of the type II error β_n is

    min_{C ∈ A} D(P_C ‖ P_{X̂}).

Similarly, the exponent of the type I error α_n is

    min_{C ∈ A^c} D(P_C ‖ P_X).

Exponent of Bayesian cost (II:8-23)

Lagrange multipliers: by taking the derivative of

    D(P′_X ‖ P_{X̂}) + λ ( D(P′_X ‖ P_{X̂}) − D(P′_X ‖ P_X) − τ′ ) + ν ( Σ_{x∈X} P′_X(x) − 1 )

with respect to each P′_X(x), we have

    log [ P′_X(x) / P_{X̂}(x) ] + 1 + λ log [ P_X(x) / P_{X̂}(x) ] + ν = 0.

Solving these equations, we obtain that the optimal P′_X is of the form

    P′_X(x) = P_λ(x) = P_X^λ(x) P_{X̂}^{1−λ}(x) / Σ_{a∈X} P_X^λ(a) P_{X̂}^{1−λ}(a).

The geometrical explanation for P_λ is that it lies on the straight line between P_X and P_{X̂} (in the sense of the divergence measure) over the probability space.

Exponent of Bayesian cost (II:8-24)

[Figure: the divergence view of hypothesis testing: the boundary D(P_C ‖ P_X) = D(P_C ‖ P_{X̂}) − τ′ splits the composition space; P_λ lies on the segment between P_X and P_{X̂}, at divergences D(P_λ ‖ P_X) and D(P_λ ‖ P_{X̂}) from the two hypothesis distributions.]

Exponent of Bayesian cost (II:8-25)

When λ → 0, P_λ → P_{X̂}; when λ → 1, P_λ → P_X. Usually, P_λ is named the tilted or twisted distribution.

The value of λ depends on τ′ = (1/n) log τ. It is known from detection theory that the best τ for Bayes testing is π_1/π_0, which is fixed. Therefore,

    τ′ = lim_{n→∞} (1/n) log (π_1/π_0) = 0,

which implies that the optimal exponent for the Bayes error is the minimum of D(P_λ ‖ P_X) subject to D(P_λ ‖ P_X) = D(P_λ ‖ P_{X̂}), namely the "mid-point" of the line segment (P_X, P_{X̂}) on the probability space in the divergence sense (the λ at which the two divergences coincide). This quantity is called the Chernoff bound.
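A numerical sketch (added) of the tilted family P_λ and the Chernoff bound: bisection finds the λ at which D(P_λ ‖ P_X) = D(P_λ ‖ P_{X̂}); the common value of the two divergences is the exponent of the Bayes error probability. The distributions below are arbitrary examples.

import math

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def tilt(P_X, P_Xhat, lam):
    # P_lambda proportional to P_X^lambda * P_Xhat^(1 - lambda)
    w = [px ** lam * qx ** (1 - lam) for px, qx in zip(P_X, P_Xhat)]
    s = sum(w)
    return [x / s for x in w]

P_X, P_Xhat = [0.5, 0.5], [0.9, 0.1]

lo, hi = 0.0, 1.0
for _ in range(60):
    lam = (lo + hi) / 2
    P = tilt(P_X, P_Xhat, lam)
    if kl(P, P_X) - kl(P, P_Xhat) > 0:   # P_lambda still closer to P_Xhat: increase lambda
        lo = lam
    else:
        hi = lam

P = tilt(P_X, P_Xhat, lam)
print(round(lam, 4), round(kl(P, P_X), 4), round(kl(P, P_Xhat), 4))   # the last two agree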

Large deviations theory (II:8-26)

Large deviations theory is basically concerned with techniques for computing the exponent of an exponentially decaying probability.

Tilted or twisted distribution (II:8-27)

Suppose the probability of a set, P_{X^n}(A_n), decreases to zero exponentially fast, and its exponent is equal to a > 0. Over the probability space, let P̄ denote the set of those distributions P′_X for which P′_{X^n}(A_n) exhibits zero exponent. Then, applying a concept similar to Sanov's theorem, we can expect that

    a = min_{P′_X ∈ P̄} D(P′_X ‖ P_X).

Now suppose the minimizer of the above problem lies on the surface f(P′_X) = τ for some constant τ and some differentiable function f(·). Then the minimizer should be of the form

    (∀ a ∈ X)  P′_X(a) = P_X(a) e^{λ ∂f(P′_X)/∂P′_X(a)} / Σ_{a′∈X} P_X(a′) e^{λ ∂f(P′_X)/∂P′_X(a′)}.

As a result, P′_X is the distribution obtained from P_X by exponentially twisting it via the partial derivative of the function f. Note that P′_X is usually written as P_X^{(λ)}, since it is generated by twisting P_X with twisting factor λ.

Conventional twisted distribution (II:8-28)

The conventional definition of the twisted distribution is based on the divergence function, i.e.,

    f(P′_X) = D(P′_X ‖ P_{X̂}) − D(P′_X ‖ P_X).

Since

    ∂D(P′_X ‖ P_X) / ∂P′_X(a) = log [ P′_X(a) / P_X(a) ] + 1,

the twisted distribution becomes

    (∀ a ∈ X)  P′_X(a) = P_X(a) e^{λ log [P_{X̂}(a)/P_X(a)]} / Σ_{a′∈X} P_X(a′) e^{λ log [P_{X̂}(a′)/P_X(a′)]}
                       = P_X^{1−λ}(a) P_{X̂}^{λ}(a) / Σ_{a′∈X} P_X^{1−λ}(a′) P_{X̂}^{λ}(a′).

Cramér's theorem (II:8-29)

Question: Consider a sequence of i.i.d. random variables X^n, and suppose that we are interested in the probability of the set

    { (X_1 + ··· + X_n)/n > τ }.

Observe that (X_1 + ··· + X_n)/n can be re-written as Σ_{a∈X} a P_C(a). Therefore, the function f becomes

    f(P′_X) = Σ_{a∈X} a P′_X(a),

and its partial derivative with respect to P′_X(a) is a. The resulting twisted distribution is

    (∀ a ∈ X)  P_X^{(λ)}(a) = P_X(a) e^{λ a} / Σ_{a′∈X} P_X(a′) e^{λ a′}.

So the exponent of P_{X^n}{ (X_1 + ··· + X_n)/n > τ } is

    min_{ {P′_X : Σ_{a∈X} a P′_X(a) > τ} } D(P′_X ‖ P_X) = min_{ {λ : Σ_{a∈X} a P_X^{(λ)}(a) > τ} } D(P_X^{(λ)} ‖ P_X).

Cramér's theorem (II:8-30)

It should be pointed out that Σ_{a′∈X} P_X(a′) e^{λ a′} is the moment generating function of P_X (evaluated at λ).

The conventional Cramér result does not use the divergence. Instead, it introduces the large deviation rate function, defined by

    I_X(x) = sup_{θ∈R} [ θx − log M_X(θ) ],   (8.2.4)

where M_X(θ) is the moment generating function of X. In this formulation, the exponent of the above probability is respectively lower- and upper-bounded by

    inf_{x ≥ τ} I_X(x)  and  inf_{x > τ} I_X(x).

An example of how to obtain the exponent bounds is illustrated in the next subsection.
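A sketch (added) that evaluates the rate function (8.2.4) numerically for a Bernoulli(p) variable and compares it with the divergence D(Bern(x) ‖ Bern(p)), which it equals for 0 < x < 1.

import math

def log_mgf(theta, p):
    # log M_X(theta) for X ~ Bernoulli(p)
    return math.log((1 - p) + p * math.exp(theta))

def rate_function(x, p, grid=2000, span=50.0):
    # crude numerical supremum over theta in [-span, span]
    return max(theta * x - log_mgf(theta, p)
               for theta in (-span + 2 * span * i / grid for i in range(grid + 1)))

def kl_bern(x, p):
    return x * math.log(x / p) + (1 - x) * math.log((1 - x) / (1 - p))

p = 0.3
for x in (0.4, 0.5, 0.7):
    print(x, round(rate_function(x, p), 4), round(kl_bern(x, p), 4))   # the two nearly coincide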

Exponent and moment generating function (II:8-31)

A) Preliminaries: Observe that since E[X] = µ < λ and E[|X − µ|²] < ∞,

    Pr{ (X_1 + ··· + X_n)/n ≥ λ } → 0  as n → ∞.

Hence, we can compute its rate of convergence (to zero).

B) Upper bound on the probability:

    Pr{ (X_1 + ··· + X_n)/n ≥ λ }
      = Pr{ θ(X_1 + ··· + X_n) ≥ θnλ },  for any θ > 0
      = Pr{ exp(θ(X_1 + ··· + X_n)) ≥ exp(θnλ) }
      ≤ E[ exp(θ(X_1 + ··· + X_n)) ] / exp(θnλ)   (by Markov's inequality)
      = E^n[ exp(θX) ] / exp(θnλ)
      = ( M_X(θ) / exp(θλ) )^n.

Hence,

    liminf_{n→∞} −(1/n) log Pr{ (X_1 + ··· + X_n)/n > λ } ≥ θλ − log M_X(θ).

Exponent and moment generating function (II:8-32)

Since the above inequality holds for every θ > 0, we have

    liminf_{n→∞} −(1/n) log Pr{ (X_1 + ··· + X_n)/n > λ } ≥ max_{θ>0} [ θλ − log M_X(θ) ] = θ*λ − log M_X(θ*),

where θ* > 0 is the optimizer of the maximum operation. (The positivity of θ* can be easily verified from the concavity of the function θλ − log M_X(θ) in θ, together with the fact that its derivative at θ = 0 equals (λ − µ), which is strictly greater than 0.)

Consequently,

    liminf_{n→∞} −(1/n) log Pr{ (X_1 + ··· + X_n)/n > λ } ≥ θ*λ − log M_X(θ*) = sup_{θ∈R} [ θλ − log M_X(θ) ] = I_X(λ).

C) Lower bound on the probability: omitted.

Theories on large deviations (II:8-33)

In this section, we will derive inequalities on the exponent of the probability Pr{ Z_n/n ∈ [a,b] }, which constitute a slight extension of the Gärtner-Ellis theorem.

Extension of Gärtner-Ellis upper bounds (II:8-34)

Definition 8.8 In this subsection, {Z_n}_{n=1}^∞ will denote an infinite sequence of arbitrary random variables.

Definition 8.9 Define

    ϕ_n(θ) = (1/n) log E[ exp{θ Z_n} ]  and  ϕ̄(θ) = limsup_{n→∞} ϕ_n(θ).

The sup-large deviation rate function of an arbitrary random sequence {Z_n}_{n=1}^∞ is defined as

    Ī(x) = sup_{ {θ∈R : ϕ̄(θ) > −∞} } [ θx − ϕ̄(θ) ].   (8.3.5)

The range of the supremum operation in (8.3.5) is always non-empty since ϕ̄(0) = 0, i.e., 0 ∈ {θ∈R : ϕ̄(θ) > −∞}. Hence, Ī(x) is always defined.

With the above definition, the first extension theorem of Gärtner-Ellis can be stated as follows.

Theorem 8.10 For a, b ∈ R and a ≤ b,

    limsup_{n→∞} (1/n) log Pr{ Z_n/n ∈ [a,b] } ≤ − inf_{x∈[a,b]} Ī(x).

The bound obtained in the above theorem is not in general tight.

Extension of Gärtner-Ellis upper bounds (II:8-35)

Example 8.11 Suppose that Pr{Z_n = 0} = 1 − e^{−2n} and Pr{Z_n = −2n} = e^{−2n}. Then from Definition 8.9, we have

    ϕ_n(θ) = (1/n) log E[ e^{θZ_n} ] = (1/n) log [ 1 − e^{−2n} + e^{−(θ+1)·2n} ],

and

    ϕ̄(θ) = limsup_{n→∞} ϕ_n(θ) = { 0,        for θ ≥ −1;
                                    −2(θ+1),  for θ < −1. }

Hence, {θ∈R : ϕ̄(θ) > −∞} = R and

    Ī(x) = sup_{θ∈R} [ θx − ϕ̄(θ) ] = sup_{θ∈R} [ θx + 2(θ+1)·1{θ < −1} ]
         = { −x,  for −2 ≤ x ≤ 0;
             ∞,   otherwise, }

where 1{·} represents the indicator function of a set.

Extension of Gärtner-Ellis upper bounds (II:8-36)

Consequently, by Theorem 8.10,

    limsup_{n→∞} (1/n) log Pr{ Z_n/n ∈ [a,b] } ≤ − inf_{x∈[a,b]} Ī(x)
      = { 0,   for 0 ∈ [a,b];
          b,   for b ∈ [−2, 0];
          −∞,  otherwise. }

The exponent of Pr{ Z_n/n ∈ [a,b] } in the above example is indeed given by

    lim_{n→∞} (1/n) log Pr{ Z_n/n ∈ [a,b] } = − inf_{x∈[a,b]} I*(x),

where

    I*(x) = { 2,  for x = −2;
              0,  for x = 0;
              ∞,  otherwise. }   (8.3.6)

Thus, the upper bound obtained in Theorem 8.10 is not tight.
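A numerical sketch (added) for Example 8.11: approximate ϕ̄(θ) by ϕ_n(θ) at a large n, form Ī(x) = sup_θ [θx − ϕ̄(θ)] over a grid of θ, and compare with I*(x). At x = −1 the value of Ī is 1 while the true rate is infinite, illustrating that the Theorem 8.10 bound is not tight.

import math

def phi_n(theta, n):
    # (1/n) log E[exp(theta Z_n)], with Z_n = 0 w.p. 1 - e^{-2n} and Z_n = -2n w.p. e^{-2n},
    # evaluated in the log domain to avoid overflow/underflow
    a = math.log1p(-math.exp(-2 * n))          # log(1 - e^{-2n})
    b = -2 * n * (theta + 1)                   # log(e^{-2n} * e^{-2n theta})
    return (max(a, b) + math.log1p(math.exp(min(a, b) - max(a, b)))) / n

def I_bar(x, n=200):
    thetas = [t / 100.0 for t in range(-500, 501)]
    return max(theta * x - phi_n(theta, n) for theta in thetas)

for x in (-2.0, -1.0, 0.0):
    print(x, round(I_bar(x), 3))   # approximately 2, 1, 0; the true rates are 2, infinity, 0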

Extension of Gärtner-Ellis upper bounds (II:8-37)

Definition 8.12 Define

    ϕ_n(θ; h) = (1/n) log E[ exp{ n θ h(Z_n/n) } ]  and  ϕ̄_h(θ) = limsup_{n→∞} ϕ_n(θ; h),

where h(·) is a given real-valued continuous function. The twisted sup-large deviation rate function of an arbitrary random sequence {Z_n}_{n=1}^∞ with respect to a real-valued continuous function h(·) is defined as

    J̄_h(x) = sup_{ {θ∈R : ϕ̄_h(θ) > −∞} } [ θ h(x) − ϕ̄_h(θ) ].   (8.3.7)

Theorem 8.13 Suppose that h(·) is a real-valued continuous function. Then for a, b ∈ R and a ≤ b,

    limsup_{n→∞} (1/n) log Pr{ Z_n/n ∈ [a,b] } ≤ − inf_{x∈[a,b]} J̄_h(x).

Extension of Gärtner-Ellis upper bounds (II:8-38)

Example 8.14 Let us again investigate the {Z_n}_{n=1}^∞ defined in Example 8.11. Take h(x) = (1/2)(x+2)² − 1. Then from Definition 8.12, we have

    ϕ_n(θ; h) = (1/n) log E[ exp{nθ h(Z_n/n)} ] = (1/n) log [ exp{nθ} − exp{n(θ−2)} + exp{−n(θ+2)} ],

and

    ϕ̄_h(θ) = limsup_{n→∞} ϕ_n(θ; h) = { −(θ+2),  for θ ≤ −1;
                                         θ,        for θ > −1. }

Hence, {θ∈R : ϕ̄_h(θ) > −∞} = R and

    J̄_h(x) = sup_{θ∈R} [ θ h(x) − ϕ̄_h(θ) ] = { −(1/2)(x+2)² + 2,  for x ∈ [−4, 0];
                                                ∞,                  otherwise. }

Extension of Gärtner-Ellis upper bounds (II:8-39)

Consequently, by Theorem 8.13,

    limsup_{n→∞} (1/n) log Pr{ Z_n/n ∈ [a,b] } ≤ − inf_{x∈[a,b]} J̄_h(x)
      = { max{ (a+2)²/2, (b+2)²/2 } − 2,  for −4 ≤ a < b ≤ 0;
          −∞,                              for a > 0 or b < −4;
          0,                               otherwise. }   (8.3.8)

For b ∈ (−2, 0) and a ∈ [−2 − √(2b+4), b), the upper bound attained in the previous example is strictly less than that given in Example 8.11, and hence an improvement is obtained. However, for b ∈ (−2, 0) and a < −2 − √(2b+4), the upper bound in (8.3.8) is actually looser. Accordingly, we combine the two upper bounds from Examples 8.11 and 8.14 to get

    limsup_{n→∞} (1/n) log Pr{ Z_n/n ∈ [a,b] } ≤ − max{ inf_{x∈[a,b]} J̄_h(x), inf_{x∈[a,b]} Ī(x) }
      = { 0,                for 0 ∈ [a,b];
          (1/2)(b+2)² − 2,  for b ∈ [−2, 0];
          −∞,               otherwise. }

Extension of Gärtner-Ellis upper bounds (II:8-40)

Theorem 8.15 For a, b ∈ R and a ≤ b,

    limsup_{n→∞} (1/n) log Pr{ Z_n/n ∈ [a,b] } ≤ − inf_{x∈[a,b]} J̄(x),

where J̄(x) = sup_{h∈H} J̄_h(x) and H is the set of all real-valued continuous functions.

Example 8.16 Let us again study the {Z_n}_{n=1}^∞ in Example 8.11 (also in Example 8.14). Suppose c > 1. Take h_c(x) = c_1 (x + c_2)² − c, where

    c_1 = ( c + √(c²−1) ) / 2  and  c_2 = 2√(c+1) / ( √(c+1) + √(c−1) )

(so that h_c(0) = 1 and h_c(−2) = −1). Then from Definition 8.12, we have

    ϕ_n(θ; h_c) = (1/n) log E[ exp{nθ h_c(Z_n/n)} ] = (1/n) log [ exp{nθ} − exp{n(θ−2)} + exp{−n(θ+2)} ],

and

    ϕ̄_{h_c}(θ) = limsup_{n→∞} ϕ_n(θ; h_c) = { −(θ+2),  for θ ≤ −1;
                                                θ,        for θ > −1. }

Extension of Gärtner-Ellis upper bounds (II:8-41)

Hence, {θ∈R : ϕ̄_{h_c}(θ) > −∞} = R and

    J̄_{h_c}(x) = sup_{θ∈R} [ θ h_c(x) − ϕ̄_{h_c}(θ) ] = { −c_1 (x + c_2)² + c + 1,  for x ∈ [−2c_2, 0];
                                                          ∞,                        otherwise. }

From Theorem 8.15,

    J̄(x) = sup_{h∈H} J̄_h(x) ≥ max{ liminf_{c→∞} J̄_{h_c}(x), Ī(x) } = I*(x),

where I*(x) is defined in (8.3.6). Consequently,

    limsup_{n→∞} (1/n) log Pr{ Z_n/n ∈ [a,b] } ≤ − inf_{x∈[a,b]} J̄(x) ≤ − inf_{x∈[a,b]} I*(x)
      = { 0,   if 0 ∈ [a,b];
          −2,  if −2 ∈ [a,b] and 0 ∉ [a,b];
          −∞,  otherwise, }

and a tight upper bound is finally obtained!

Extension of Gärtner-Ellis upper bounds (II:8-42)

Definition 8.17 Define ϕ̲_h(θ) = liminf_{n→∞} ϕ_n(θ; h), where ϕ_n(θ; h) was defined in Definition 8.12. The twisted inf-large deviation rate function of an arbitrary random sequence {Z_n}_{n=1}^∞ with respect to a real-valued continuous function h(·) is defined as

    J̲_h(x) = sup_{ {θ∈R : ϕ̲_h(θ) > −∞} } [ θ h(x) − ϕ̲_h(θ) ].

Theorem 8.18 For a, b ∈ R and a ≤ b,

    liminf_{n→∞} (1/n) log Pr{ Z_n/n ∈ [a,b] } ≤ − inf_{x∈[a,b]} J̲(x),

where J̲(x) = sup_{h∈H} J̲_h(x) and H is the set of all real-valued continuous functions.

Extension of Gärtner-Ellis lower bounds (II:8-43)

We hope to know when

    limsup_{n→∞} (1/n) log Pr{ Z_n/n ∈ (a,b) } ≥ − inf_{x∈(a,b)} J̄_h(x).   (8.3.9)

Definition 8.19 Define the sup-Gärtner-Ellis set with respect to a real-valued continuous function h(·) as

    D̄_h = ∪_{ {θ∈R : ϕ̄_h(θ) > −∞} } D̄(θ; h),

where

    D̄(θ; h) = { x ∈ R : limsup_{t↓0} [ϕ̄_h(θ+t) − ϕ̄_h(θ)]/t ≤ h(x) ≤ liminf_{t↓0} [ϕ̄_h(θ) − ϕ̄_h(θ−t)]/t }.

Let us briefly remark on the sup-Gärtner-Ellis set defined above. It can be derived that the sup-Gärtner-Ellis set reduces to

    D̄_h = ∪_{ {θ∈R : ϕ̄_h(θ) > −∞} } { x ∈ R : ϕ̄_h′(θ) = h(x) },

if the derivative ϕ̄_h′(θ) exists for all θ.

Extension of Gärtner-Ellis lower bounds (II:8-44)

Theorem 8.20 Suppose that h(·) is a real-valued continuous function. Then if (a,b) ⊂ D̄_h,

    limsup_{n→∞} (1/n) log Pr{ Z_n/n ∈ (a,b) } ≥ − inf_{x∈(a,b)} J̄_h(x).

Example 8.21 Suppose Z_n = X_1 + ··· + X_n, where {X_i}_{i=1}^n are i.i.d. Gaussian random variables with mean 1 and variance 1 if n is even, and with mean −1 and variance 1 if n is odd. Then the exact large deviation rate formula Ī*(x) that satisfies, for all a < b,

    − inf_{x∈[a,b]} Ī*(x) ≥ limsup_{n→∞} (1/n) log Pr{ Z_n/n ∈ [a,b] }
                          ≥ limsup_{n→∞} (1/n) log Pr{ Z_n/n ∈ (a,b) } ≥ − inf_{x∈(a,b)} Ī*(x)

is

    Ī*(x) = (|x| − 1)² / 2.   (8.3.10)

Case A: h(x) = x. For the affine h(·), ϕ_n(θ) = θ + θ²/2 when n is even, and ϕ_n(θ) = −θ + θ²/2

Extension of Gärtner-Ellis lower bounds (II:8-45)

when n is odd. Hence, ϕ̄(θ) = |θ| + θ²/2, and

    D̄_h = ( ∪_{θ>0} {v ∈ R : v = 1+θ} ) ∪ ( ∪_{θ<0} {v ∈ R : v = −1+θ} ) ∪ {v ∈ R : 1 ≤ v ≤ −1} (an empty set)
         = (1, ∞) ∪ (−∞, −1).

Therefore, Theorem 8.20 cannot be applied to any a and b with (a,b) ⊂ [−1, 1]. By deriving

    Ī(x) = sup_{θ∈R} { xθ − ϕ̄(θ) } = { (|x| − 1)²/2,  for |x| > 1;
                                        0,             for |x| ≤ 1, }

we obtain, for any a ∈ (−∞,−1) ∪ (1,∞),

    lim_{ε↓0} limsup_{n→∞} (1/n) log Pr{ Z_n/n ∈ (a−ε, a+ε) } ≥ − lim_{ε↓0} inf_{x∈(a−ε,a+ε)} Ī(x) = − (|a| − 1)²/2,

Extension of Gärtner-Ellis lower bounds (II:8-46)

which can be shown to be tight by Theorem 8.13 (or directly by (8.3.10)). Note that the above inequality does not hold for any a ∈ (−1, 1). To fill the gap, a different h(·) must be employed.

Case B: h(x) = |x − a|. For n even,

    E[ e^{nθ h(Z_n/n)} ] = E[ e^{nθ |Z_n/n − a|} ]
      = ∫_{−∞}^{na} e^{−θx + nθa} (1/√(2πn)) e^{−(x−n)²/(2n)} dx + ∫_{na}^{∞} e^{θx − nθa} (1/√(2πn)) e^{−(x−n)²/(2n)} dx
      = e^{nθ(θ−2+2a)/2} ∫_{−∞}^{na} (1/√(2πn)) e^{−[x − n(1−θ)]²/(2n)} dx + e^{nθ(θ+2−2a)/2} ∫_{na}^{∞} (1/√(2πn)) e^{−[x − n(1+θ)]²/(2n)} dx
      = e^{nθ(θ−2+2a)/2} Φ( (θ+a−1)√n ) + e^{nθ(θ+2−2a)/2} Φ( (θ−a+1)√n ),

where Φ(·) represents the unit Gaussian cdf.

Extension of Gärtner-Ellis lower bounds (II:8-47)

Similarly, for n odd,

    E[ e^{nθ h(Z_n/n)} ] = e^{nθ(θ+2+2a)/2} Φ( (θ+a+1)√n ) + e^{nθ(θ−2−2a)/2} Φ( (θ−a−1)√n ).

Observe that for any b ∈ R,

    lim_{n→∞} (1/n) log Φ(b√n) = { 0,      for b ≥ 0;
                                   −b²/2,  for b < 0. }

Hence,

    ϕ̄_h(θ) = { −(|a| − 1)²/2,        for θ < |a| − 1;
               θ[θ + 2(1 − |a|)]/2,  for |a| − 1 ≤ θ < 0;
               θ[θ + 2(1 + |a|)]/2,  for θ ≥ 0. }

Extension of Gärtner-Ellis lower bounds (II:8-48)

Therefore,

    D̄_h = ( ∪_{θ>0} { x ∈ R : |x−a| = θ + 1 + |a| } ) ∪ ( ∪_{θ<0} { x ∈ R : |x−a| = θ + 1 − |a| } )
         = (−∞, a−1−|a|) ∪ (a−1+|a|, a+1−|a|) ∪ (a+1+|a|, ∞),

and

    J̄_h(x) = { ( |x−a| − 1 + |a| )²/2,  for a−1+|a| < x < a+1−|a|;
               ( |x−a| − 1 − |a| )²/2,  for x > a+1+|a| or x < a−1−|a|;
               0,                        otherwise. }   (8.3.11)

Extension of Gärtner-Ellis lower bounds (II:8-49)

We then apply Theorem 8.20 to obtain

    lim_{ε↓0} limsup_{n→∞} (1/n) log Pr{ Z_n/n ∈ (a−ε, a+ε) }
      ≥ − lim_{ε↓0} inf_{x∈(a−ε,a+ε)} J̄_h(x) = − lim_{ε↓0} (ε − 1 + |a|)²/2 = − (|a| − 1)²/2.

Note that the above lower bound is valid for any a ∈ (−1, 1), and can be shown to be tight, again, by Theorem 8.13 (or directly by (8.3.10)). Finally, by combining the results of Cases A) and B), the true large deviation rate of {Z_n}_{n≥1} is completely characterized.

Extension of Gärtner-Ellis lower bounds (II:8-50)

Definition 8.22 Define the inf-Gärtner-Ellis set with respect to a real-valued continuous function h(·) as

    D̲_h = ∪_{ {θ∈R : ϕ̲_h(θ) > −∞} } D̲(θ; h),

where

    D̲(θ; h) = { x ∈ R : limsup_{t↓0} [ϕ̲_h(θ+t) − ϕ̲_h(θ)]/t ≤ h(x) ≤ liminf_{t↓0} [ϕ̲_h(θ) − ϕ̲_h(θ−t)]/t }.

Theorem 8.23 Suppose that h(·) is a real-valued continuous function. Then if (a,b) ⊂ D̲_h,

    liminf_{n→∞} (1/n) log Pr{ Z_n/n ∈ (a,b) } ≥ − inf_{x∈(a,b)} J̲_h(x).

Properties (II:8-51)

Property 8.24 Let Ī(x) and I̲(x) be the sup- and inf- large deviation rate functions of an infinite sequence of arbitrary random variables {Z_n}_{n=1}^∞, respectively. Denote m_n = (1/n) E[Z_n]. Let m̄ = limsup_{n→∞} m_n and m̲ = liminf_{n→∞} m_n. Then

1. Ī(x) and I̲(x) are both convex.
2. Ī(x) is continuous over {x ∈ R : Ī(x) < ∞}. Likewise, I̲(x) is continuous over {x ∈ R : I̲(x) < ∞}.
3. Ī(x) attains its minimum value 0 for m̲ ≤ x ≤ m̄.
4. I̲(x) ≥ 0, but I̲(x) does not necessarily attain its minimum value at both x = m̄ and x = m̲.

Properties (II:8-52)

Property 8.25 Suppose that h(·) is a real-valued continuous function. Let J̄_h(x) and J̲_h(x) be the corresponding twisted sup- and inf- large deviation rate functions, respectively. Denote m_n(h) = E[h(Z_n/n)]. Let

    m̄_h = limsup_{n→∞} m_n(h)  and  m̲_h = liminf_{n→∞} m_n(h).

Then

1. J̄_h(x) ≥ 0, with equality if m̲_h ≤ h(x) ≤ m̄_h.
2. J̲_h(x) ≥ 0, but J̲_h(x) does not necessarily attain its minimum value at both x = m̄_h and x = m̲_h.

Probabilistic subexponential behavior (II:8-53)

Subexponential behavior: a_n = (1/n) exp{−2n} and b_n = (1/√n) exp{−2n} have the same exponent, but contain different subexponential terms.

Berry-Esseen theorem for compound i.i.d. sequence (II:8-54)

The Berry-Esseen theorem states that the distribution of the sum of independent zero-mean random variables {X_i}_{i=1}^n, normalized by the standard deviation of the sum, differs from the Gaussian distribution by at most C r_n / s_n³, where s_n² and r_n are respectively the sums of the marginal variances and of the marginal absolute third moments, and C is an absolute constant. Specifically, for every a ∈ R,

    | Pr{ (X_1 + ··· + X_n)/s_n ≤ a } − Φ(a) | ≤ C r_n / s_n³,   (8.4.12)

where Φ(·) represents the unit Gaussian cdf.

The striking feature of this theorem is that the upper bound depends only on the variance and the absolute third moment, and hence can provide a good asymptotic estimate based on only the first three moments.

The absolute constant C is commonly taken to be 6. When {X_i}_{i=1}^n are identically distributed, in addition to independent, the absolute constant can be reduced to 3, and has been reported to be improved even further.

Definition: compound i.i.d. sequence. The samples that we are concerned with in this section actually consist of two i.i.d. subsequences (and the sequence is therefore named a compound i.i.d. sequence).
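A quick numerical illustration (added) of the bound (8.4.12) for an i.i.d. Bernoulli(0.2) sequence, using the constant C = 3 quoted above for the i.i.d. case: the exact Kolmogorov distance to the Gaussian is compared with C r_n / s_n³.

import math

def phi_cdf(a):
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def kolmogorov_distance(n, p):
    # exact sup_a |Pr{(X_1+...+X_n - n p)/s_n <= a} - Phi(a)| for Bernoulli(p) summands
    s_n = math.sqrt(n * p * (1 - p))
    logpmf = lambda k: (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                        + k * math.log(p) + (n - k) * math.log(1 - p))
    cdf, worst = 0.0, 0.0
    for k in range(n + 1):
        a = (k - n * p) / s_n
        worst = max(worst, abs(cdf - phi_cdf(a)))   # just below the jump at a
        cdf += math.exp(logpmf(k))
        worst = max(worst, abs(cdf - phi_cdf(a)))   # just after the jump
    return worst

p, C = 0.2, 3.0
rho = p * (1 - p) * ((1 - p) ** 2 + p ** 2)         # E|X - mu|^3 for Bernoulli(p)
for n in (20, 100, 500):
    bound = C * n * rho / (n * p * (1 - p)) ** 1.5  # C r_n / s_n^3
    print(n, round(kolmogorov_distance(n, p), 4), "<=", round(bound, 4))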

Berry-Esseen theorem for compound i.i.d. sequence (II:8-55)

Lemma 8.26 (smoothing lemma) Fix the bandlimited filtering function

    v_T(x) = [1 − cos(Tx)] / (πTx²) = 2 sin²(Tx/2) / (πTx²) = (T/(2π)) sinc²( Tx/(2π) ) = Four⁻¹[ Λ( f / (T/(2π)) ) ].

For any cumulative distribution function H(·) on the real line R,

    sup_{x∈R} |Δ_T(x)| ≥ η/2 − [ 6 / (Tπ√(2π)) ] h( Tη / (2√(2π)) ),

where

    Δ_T(t) = ∫ [ H(t−x) − Φ(t−x) ] v_T(x) dx,   η = sup_{x∈R} | H(x) − Φ(x) |,

and

    h(u) = ∫_u^∞ [1 − cos(x)] / x² dx = π/2 − ∫_0^u [sin(x)/x] dx + [1 − cos(u)]/u,  if u ≥ 0;  h(u) = 0, otherwise.

Berry-Esseen theorem for compound i.i.d. sequence (II:8-56)

Lemma 8.27 For any cumulative distribution function H(·) with characteristic function ϕ_H(ζ),

    η ≤ (1/π) ∫_{−T}^{T} | ϕ_H(ζ) − e^{−(1/2)ζ²} | dζ/|ζ| + [ 12 / (Tπ√(2π)) ] h( Tη / (2√(2π)) ),

where η and h(·) are defined in Lemma 8.26.

Theorem 8.28 (Berry-Esseen theorem for compound i.i.d. sequences) Let Y_n = Σ_{i=1}^n X_i be the sum of independent random variables, among which {X_i}_{i=1}^d are identically Gaussian distributed, and {X_i}_{i=d+1}^n are identically distributed but not necessarily Gaussian.

Denote the mean-variance pairs of X_1 and X_{d+1} by (µ, σ²) and (µ̂, σ̂²), respectively. Define

    ρ = E[ |X_1 − µ|³ ],   ρ̂ = E[ |X_{d+1} − µ̂|³ ],

and

    s_n² = Var[Y_n] = σ² d + σ̂² (n − d).

Also denote the cdf of (Y_n − E[Y_n]) / s_n by H_n(·).

Berry-Esseen theorem for compound i.i.d. sequence (II:8-57)

Then for all y ∈ R,

    | H_n(y) − Φ(y) | ≤ C_{n,d} · [ 2(n−d−1) ] / [ π( 2(n−d) − 3√2 ) ] · ρ̂ / ( σ̂² s_n ),

where C_{n,d} is the unique positive number satisfying

    (π/6) C_{n,d} − h(C_{n,d}) = [ ( 2(n−d) − 3√2 ) / ( 2(n−d−1) ) ] · [ 6 / ( √2 (3−√2) ) + 9 / ( 2(11 − 6√2) (n−d)^{3/2} ) ],

provided that n − d ≥ 3.

Berry-Esseen theorem for compound i.i.d. sequence (II:8-58)

[Figure: the function (π/6)u − h(u) plotted against u.]

Berry-Esseen theorem for compound i.i.d. sequence (II:8-59)

By letting d = 0, the Berry-Esseen inequality for i.i.d. sequences can also be readily obtained from the previous theorem.

Corollary 8.29 (Berry-Esseen theorem for i.i.d. sequences) Let Y_n = Σ_{i=1}^n X_i be the sum of independent random variables with a common marginal distribution. Denote the marginal mean and variance by (µ̂, σ̂²). Define ρ̂ = E[ |X_1 − µ̂|³ ]. Also denote the cdf of (Y_n − nµ̂)/(√n σ̂) by H_n(·). Then for all y ∈ R,

    | H_n(y) − Φ(y) | ≤ C_n · [ 2(n−1) ] / [ π( 2n − 3√2 ) ] · ρ̂ / ( σ̂³ √n ),

where C_n is the unique positive solution of

    (π/6) u − h(u) = [ ( 2n − 3√2 ) / ( 2(n−1) ) ] · [ 6 / ( √2 (3−√2) ) + 9 / ( 2(11 − 6√2) n^{3/2} ) ],

provided that n ≥ 3.

Berry-Esseen theorem for compound i.i.d. sequence (II:8-60)

Let us briefly remark on the previous corollary.

We observe from numerical computation that the quantity

    C_n · 2(n−1) / ( π(2n − 3√2) )

is decreasing in n (cf. the picture in slide II:8-62).

We can upper-bound C_n by the unique positive solution D_n of

    (π/6) u − h(u) = 6 / ( √2 (3−√2) ) + 9 / ( 2(11 − 6√2) n^{3/2} ),

which is strictly decreasing in n. Hence,

    C_n · 2(n−1) / ( π(2n − 3√2) ) ≤ E_n := D_n · 2(n−1) / ( π(2n − 3√2) ),

and the right-hand side of the above inequality is strictly decreasing in n (since both D_n and (n−1)/(2n − 3√2) are decreasing), taking the values E_3 = 4.1911, ..., E_9 = 2.0363, ..., and decreasing toward its limit as n → ∞. If the property of strict decrease is preferred, one can use D_n instead of C_n in the Berry-Esseen inequality. Note that C_n and D_n converge to a common limit as n goes to infinity.

Berry-Esseen theorem for compound i.i.d. sequence (II:8-61)

Numerical results show that this quantity lies below 2 when n ≥ 9, and is smaller than 1.68 for n ≥ 100. In other words, we can upper-bound the quantity by 1.68 for n ≥ 100, and therefore establish a better estimate of the original Berry-Esseen constant.

Berry-Esseen theorem for compound i.i.d. sequence (II:8-62)

[Figure: the Berry-Esseen constant C_n · 2(n−1)/(π(2n − 3√2)) as a function of the sample size n, with n plotted in log scale.]

Generalized Neyman-Pearson Hypothesis Testing (II:8-63)

The general expression for the Neyman-Pearson type-II error exponent subject to a constant bound on the type-I error has been proved for arbitrary observations. In this section, we state the results in terms of the ε-inf/sup-divergence rates.

Theorem 8.30 (Neyman-Pearson type-II error exponent for a fixed test level) Consider a sequence of random observations whose probability distribution is governed by either P_X (null hypothesis) or P_{X̂} (alternative hypothesis). Then the type-II error exponent satisfies

    lim_{δ↓ε} D̄_δ(X ‖ X̂) ≥ limsup_{n→∞} −(1/n) log β_n*(ε) ≥ D̄_ε(X ‖ X̂),

    lim_{δ↑ε} D̲_δ(X ‖ X̂) ≥ liminf_{n→∞} −(1/n) log β_n*(ε) ≥ D̲_ε(X ‖ X̂),

where β_n*(ε) represents the minimum type-II error probability subject to a fixed type-I error bound ε ∈ [0, 1).

The general formula for the Neyman-Pearson type-II error exponent subject to an exponential test level has also been proved in terms of the ε-inf/sup-divergence rates.

Generalized Neyman-Pearson Hypothesis Testing (II:8-64)

Theorem 8.31 (Neyman-Pearson type-II error exponent for an exponential test level) Fix s ∈ (0,1) and ε ∈ [0,1). It is possible to choose decision regions for a binary hypothesis testing problem with arbitrary datawords of blocklength n (which are governed by either the null hypothesis distribution P_X or the alternative hypothesis distribution P_{X̂}) such that

    liminf_{n→∞} −(1/n) log β_n ≥ D̲_ε(X̂^{(s)} ‖ X̂)  and  limsup_{n→∞} −(1/n) log α_n ≥ D̄_{(1−ε)}(X̂^{(s)} ‖ X),   (8.5.13)

or

    liminf_{n→∞} −(1/n) log β_n ≥ D̄_ε(X̂^{(s)} ‖ X̂)  and  limsup_{n→∞} −(1/n) log α_n ≥ D̲_{(1−ε)}(X̂^{(s)} ‖ X),   (8.5.14)

where X̂^{(s)} exhibits the tilted distributions { P^{(s)}_{X̂^n} }_{n=1}^∞ defined by

    dP^{(s)}_{X̂^n}(x^n) = [ 1 / Ω_n(s) ] exp{ s log [ dP_{X^n} / dP_{X̂^n} ](x^n) } dP_{X̂^n}(x^n),

and

    Ω_n(s) = ∫_{X^n} exp{ s log [ dP_{X^n} / dP_{X̂^n} ](x^n) } dP_{X̂^n}(x^n).

Here, α_n and β_n are the type-I and type-II error probabilities, respectively.


More information

EE376A - Information Theory Final, Monday March 14th 2016 Solutions. Please start answering each question on a new page of the answer booklet.

EE376A - Information Theory Final, Monday March 14th 2016 Solutions. Please start answering each question on a new page of the answer booklet. EE376A - Information Theory Final, Monday March 14th 216 Solutions Instructions: You have three hours, 3.3PM - 6.3PM The exam has 4 questions, totaling 12 points. Please start answering each question on

More information

A Very Brief Summary of Statistical Inference, and Examples

A Very Brief Summary of Statistical Inference, and Examples A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2009 Prof. Gesine Reinert Our standard situation is that we have data x = x 1, x 2,..., x n, which we view as realisations of random

More information

CALCULUS JIA-MING (FRANK) LIOU

CALCULUS JIA-MING (FRANK) LIOU CALCULUS JIA-MING (FRANK) LIOU Abstract. Contents. Power Series.. Polynomials and Formal Power Series.2. Radius of Convergence 2.3. Derivative and Antiderivative of Power Series 4.4. Power Series Expansion

More information

The Maximum-Likelihood Soft-Decision Sequential Decoding Algorithms for Convolutional Codes

The Maximum-Likelihood Soft-Decision Sequential Decoding Algorithms for Convolutional Codes The Maximum-Likelihood Soft-Decision Sequential Decoding Algorithms for Convolutional Codes Prepared by Hong-Bin Wu Directed by Prof. Po-Ning Chen In Partial Fulfillment of the Requirements For the Degree

More information

MAT 135B Midterm 1 Solutions

MAT 135B Midterm 1 Solutions MAT 35B Midterm Solutions Last Name (PRINT): First Name (PRINT): Student ID #: Section: Instructions:. Do not open your test until you are told to begin. 2. Use a pen to print your name in the spaces above.

More information

MATH 117 LECTURE NOTES

MATH 117 LECTURE NOTES MATH 117 LECTURE NOTES XIN ZHOU Abstract. This is the set of lecture notes for Math 117 during Fall quarter of 2017 at UC Santa Barbara. The lectures follow closely the textbook [1]. Contents 1. The set

More information

Universal Anytime Codes: An approach to uncertain channels in control

Universal Anytime Codes: An approach to uncertain channels in control Universal Anytime Codes: An approach to uncertain channels in control paper by Stark Draper and Anant Sahai presented by Sekhar Tatikonda Wireless Foundations Department of Electrical Engineering and Computer

More information

A One-to-One Code and Its Anti-Redundancy

A One-to-One Code and Its Anti-Redundancy A One-to-One Code and Its Anti-Redundancy W. Szpankowski Department of Computer Science, Purdue University July 4, 2005 This research is supported by NSF, NSA and NIH. Outline of the Talk. Prefix Codes

More information

Refined Bounds on the Empirical Distribution of Good Channel Codes via Concentration Inequalities

Refined Bounds on the Empirical Distribution of Good Channel Codes via Concentration Inequalities Refined Bounds on the Empirical Distribution of Good Channel Codes via Concentration Inequalities Maxim Raginsky and Igal Sason ISIT 2013, Istanbul, Turkey Capacity-Achieving Channel Codes The set-up DMC

More information

Lecture 5: Asymptotic Equipartition Property

Lecture 5: Asymptotic Equipartition Property Lecture 5: Asymptotic Equipartition Property Law of large number for product of random variables AEP and consequences Dr. Yao Xie, ECE587, Information Theory, Duke University Stock market Initial investment

More information

18.2 Continuous Alphabet (discrete-time, memoryless) Channel

18.2 Continuous Alphabet (discrete-time, memoryless) Channel 0-704: Information Processing and Learning Spring 0 Lecture 8: Gaussian channel, Parallel channels and Rate-distortion theory Lecturer: Aarti Singh Scribe: Danai Koutra Disclaimer: These notes have not

More information

Master s Written Examination

Master s Written Examination Master s Written Examination Option: Statistics and Probability Spring 016 Full points may be obtained for correct answers to eight questions. Each numbered question which may have several parts is worth

More information

Dispersion of the Gilbert-Elliott Channel

Dispersion of the Gilbert-Elliott Channel Dispersion of the Gilbert-Elliott Channel Yury Polyanskiy Email: ypolyans@princeton.edu H. Vincent Poor Email: poor@princeton.edu Sergio Verdú Email: verdu@princeton.edu Abstract Channel dispersion plays

More information

Recitation 2: Probability

Recitation 2: Probability Recitation 2: Probability Colin White, Kenny Marino January 23, 2018 Outline Facts about sets Definitions and facts about probability Random Variables and Joint Distributions Characteristics of distributions

More information

Lecture 2: August 31

Lecture 2: August 31 0-704: Information Processing and Learning Fall 206 Lecturer: Aarti Singh Lecture 2: August 3 Note: These notes are based on scribed notes from Spring5 offering of this course. LaTeX template courtesy

More information

Lecture 5 Channel Coding over Continuous Channels

Lecture 5 Channel Coding over Continuous Channels Lecture 5 Channel Coding over Continuous Channels I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw November 14, 2014 1 / 34 I-Hsiang Wang NIT Lecture 5 From

More information

n! (k 1)!(n k)! = F (X) U(0, 1). (x, y) = n(n 1) ( F (y) F (x) ) n 2

n! (k 1)!(n k)! = F (X) U(0, 1). (x, y) = n(n 1) ( F (y) F (x) ) n 2 Order statistics Ex. 4.1 (*. Let independent variables X 1,..., X n have U(0, 1 distribution. Show that for every x (0, 1, we have P ( X (1 < x 1 and P ( X (n > x 1 as n. Ex. 4.2 (**. By using induction

More information

LECTURE NOTES 57. Lecture 9

LECTURE NOTES 57. Lecture 9 LECTURE NOTES 57 Lecture 9 17. Hypothesis testing A special type of decision problem is hypothesis testing. We partition the parameter space into H [ A with H \ A = ;. Wewrite H 2 H A 2 A. A decision problem

More information

Chapter 3, 4 Random Variables ENCS Probability and Stochastic Processes. Concordia University

Chapter 3, 4 Random Variables ENCS Probability and Stochastic Processes. Concordia University Chapter 3, 4 Random Variables ENCS6161 - Probability and Stochastic Processes Concordia University ENCS6161 p.1/47 The Notion of a Random Variable A random variable X is a function that assigns a real

More information

Lecture 2. Capacity of the Gaussian channel

Lecture 2. Capacity of the Gaussian channel Spring, 207 5237S, Wireless Communications II 2. Lecture 2 Capacity of the Gaussian channel Review on basic concepts in inf. theory ( Cover&Thomas: Elements of Inf. Theory, Tse&Viswanath: Appendix B) AWGN

More information

Multiple Random Variables

Multiple Random Variables Multiple Random Variables Joint Probability Density Let X and Y be two random variables. Their joint distribution function is F ( XY x, y) P X x Y y. F XY ( ) 1, < x

More information

Convergence of Square Root Ensemble Kalman Filters in the Large Ensemble Limit

Convergence of Square Root Ensemble Kalman Filters in the Large Ensemble Limit Convergence of Square Root Ensemble Kalman Filters in the Large Ensemble Limit Evan Kwiatkowski, Jan Mandel University of Colorado Denver December 11, 2014 OUTLINE 2 Data Assimilation Bayesian Estimation

More information

n! (k 1)!(n k)! = F (X) U(0, 1). (x, y) = n(n 1) ( F (y) F (x) ) n 2

n! (k 1)!(n k)! = F (X) U(0, 1). (x, y) = n(n 1) ( F (y) F (x) ) n 2 Order statistics Ex. 4. (*. Let independent variables X,..., X n have U(0, distribution. Show that for every x (0,, we have P ( X ( < x and P ( X (n > x as n. Ex. 4.2 (**. By using induction or otherwise,

More information

topics about f-divergence

topics about f-divergence topics about f-divergence Presented by Liqun Chen Mar 16th, 2018 1 Outline 1 f-gan: Training Generative Neural Samplers using Variational Experiments 2 f-gans in an Information Geometric Nutshell Experiments

More information

A proof of a partition conjecture of Bateman and Erdős

A proof of a partition conjecture of Bateman and Erdős proof of a partition conjecture of Bateman and Erdős Jason P. Bell Department of Mathematics University of California, San Diego La Jolla C, 92093-0112. US jbell@math.ucsd.edu 1 Proposed Running Head:

More information

EE514A Information Theory I Fall 2013

EE514A Information Theory I Fall 2013 EE514A Information Theory I Fall 2013 K. Mohan, Prof. J. Bilmes University of Washington, Seattle Department of Electrical Engineering Fall Quarter, 2013 http://j.ee.washington.edu/~bilmes/classes/ee514a_fall_2013/

More information