Large Deviations Performance of Interval Algorithm for Random Number Generation

Size: px
Start display at page:

Download "Large Deviations Performance of Interval Algorithm for Random Number Generation"

Transcription

1 Large Deviations Performance of Interval Algorithm for Random Number Generation Akisato KIMURA Tomohiko UYEMATSU February 22, 999 No. AK-TR Abstract We investigate large deviations performance of interval algorithm for random number generation. First, we show that the length of input sequence per the length of output sequence approaches to the ratio of entropies of input and output distributions almost surely. Next, we investigate large deviations performance especially for intrinsic randomness. We show that the length of output fair random bits per input sample approaches to the entropy of the input source almost surely, and we can determine the exponent in this case. Further, we consider to obtain the fixed number of fair random bits from the input sequence with fixed length. We show that the approximation error measured by the variational distance and divergence vanishes exponentially as the length of input sequence tends to infinity, if the number of output random bits per input sample is below the entropy of the source. Contrarily, the approximation error measured by the variational distance approaches to two exponentially and the approximation error measured by the divergence approaches to infinity linearly, if the number of random bits per input sample is above the entropy of the source. Dept. of Electrical and Electronic Eng., Tokyo Institute of Technology, 2-2- Ookayama, Meguro-ku, Tokyo , Japan

2 I. Introduction Random number generation is a problem of simulating some prescribed target distribution by using a given source. This problem has been investigated in computer science, and has a close relation to information theory, 2, 3]. Some practical algorithms for random number generation have been proposed so far, i.e., 3, 4, 5]. In this paper, we consider the interval algorithm proposed by Han and Hoshi 3]. Performance of the interval algorithm has already been investigated in 3, 6, 7]. Han and Hoshi 3] have showed that the expected length of input sequence per the length of output sequence can be characterized by the ratio of entropies of the input and output distributions. Uyematsu and Kanaya 6] have investigated large deviations performance of the interval algorithm where the distribution of input source is uniform. Further, Uchida and Han 7] have extended the result of Uyematsu and Kanaya to stationary ergodic Markov process. We investigate large deviations performance, where the input and output distributions is both non-uniform. First, we show that the length of input sequence per the length of output sequence approaches to the ratio of entropies of input and output distributions almost surely. Next, we investigate large deviations performance especially for intrinsic randomness. We show that the length of output fair random bits per input sample approaches to the entropy of the input source almost surely, and we can determine the exponent in this case. Further, we consider to obtain the fixed number of fair random bits from the input sequence with fixed length. We show that the approximation error measured by the variational distance and divergence vanishes exponentially as the length of input sequence tends to infinity, if the number of random bits per input sample is below the entropy of the source. Contrarily, the approximation error measured by the variational distance approaches to two exponentially and the approximation error measured by the divergence approaches to infinity linearly, if the number of random bits per input sample is above the entropy of the source. II. Basic Definitions (a) Discrete Memoryless Source Let X be a finite set. We denote by M(X ) the set of all probability distributions on X. Throughout this paper, by a source X with alphabet X, we mean a discrete memoryless source (DMS) of distribution P X M(X ). To denote a source, we will use both notations X and P X interchangeably. 2

3 For random variable X which has a distribution P X, we shall denote this entropy as H(P X )andh(x), interchangeably. H(P X ) = x X P X (x)logp X (x). Further, for arbitrary distributions P, Q M(X ), we denote by D(P Q) the information divergence D(P Q) = x X P (x)log P (x) Q(x). Lastly, we denote by d(p, Q) thevariational distance or l distance between two distributions P and Q on X d(p, Q) = P (x) Q(x). x X From now on, all logarithms and exponentials are to the base two. (b) Type of Sequence The type of a sequence x X n is defined as a distribution Px M(X ), where Px(a) isgivenby Px(a) = (number of occurrences of a X in x). () n We shall write P n or P for the set of types of sequences in X n. We denoted by TP n or T P the set of sequences of type P in X n. On the contrary for a distribution P M(X ), if T P then we denote by P the type of sequences in X n. We introduce some well-known facts, cf. Csiszár-Körner 8]: For the set of types in X n,wehave P n (n +) X (2) where denotes the cardinality of the set. For the set of sequences of type P in X n, (n +) X exp(nh(p )) T P exp(nh(p )) (3) If x T P,wethenhave From (3)(4), Q n (x) =exp n{d(p Q)+H(P )}]. (4) (n +) X exp( nd(p Q)) Q n (T P ) exp( nd(p Q)) (5) 3

4 (c) Intrinsic Randomness In this paper, we especially investigate the problem to generate a uniform random number with as large size as possible from a general source X = {X n } n=. This problem is called intrinsic randomness problem 9]. Here, we shall introduce basic definitions and a result for intrinsic randomness problem. Definition : For arbitrary source X = {X n } n=,rateris achievable Intrinsic Randomness (IR) rate if and only if there exists a map ϕ n : X n U Mn such that lim inf n n log M n R lim d(u M n n,ϕ n (X n )) = 0, where U Mn = {, 2,,Mn } and U Mn is a uniform distribution on U Mn. Definition 2 (sup achievable IR rate): S(X) =sup{r R is achievable IR rate} As for the characterization of IR rate, Vembu and Verdú 2]provedthe following fundamental theorem. Theorem : For any stationary source X, where H(X) is the entropy rate of X. S(X) =H(X) (6) III. Interval Algorithm In this chapter, we introduce the interval algorithm for random number generation, proposed by Han and Hoshi 3]. Let us consider to produce an i.i.d. random sequence Y n =(Y,Y 2,,Y n ). Each random variable Y i (i =, 2,,n) is subject to a generic distribution q =(q,q 2,,q N ). We generate this sequence by using an i.i.d. random sequence X,X 2,, with a generic distribution p =(p,p 2,,p M ). Interval Algorithm for Generating Random Process 4

5 a) Partition an unit interval 0, ) into N disjoint subinterval J(),J(2),,J(N) such that b) Set J(i) = Q i,q i ) i =, 2,,N i Q i = q k i =, 2,,N; Q 0 =0. P j = k= j p k j =, 2,,M; P 0 =0. k= 2) Set s = t = λ (null string), α s = γ t =0,β s = δ t =,I(s) =α s,β s ), J(t) =γ t,δ t ), and m =. 3) Obtain an output symbol from the source X to have a value a {, 2,,M}, and generate the subinterval of I(s) where I(sa) = α sa,β sa ) α sa = α s +(β s α s )P a β sa = α s +(β s α s )P a. 4a) If I(sa) is entirely contained in some J(ti) (i =, 2,,N), then output i as the value of the mth random number Y m and set t = ti. Otherwise, go to 5). 4b) If m = n then stop the algorithm. Otherwise, partition the interval J(t) γ t,δ t )inton disjoint subinterval J(t),J(t2),,J(tN) such that where J(tj) = γ tj,δ tj ) j =, 2,,N γ tj = γ t +(δ t γ t )Q j δ tj = γ t +(δ t γ t )Q j and set m = m + and go to 4a). 5

6 5) Set s = sa andgoto3). Han and Hoshi have shown that E(L) lim = H(p) n n H(q), (7) where E(L) is the average length of input sequence to obtain output sequences of length n. IV. Almost Sure Convergence Theorem We shall investigate large deviations performance of the interval algorithm for random number generation. Let us consider to produce an i.i.d. random sequence Y n =(Y,Y 2,,Y n ). Each random variable Y i (i =, 2,,n) is subject to a generic distribution P Y on Y. We generate this sequence by using an i.i.d. random sequence X,X 2, with a generic distribution P X on X.WedenotebyT n (x, y) the length of input sequence x X necessary to generate y Y n. Then, we obtain the following theorem: Theorem 2: lim n n T n(x, Y n )= H(Y ) a.s. (8) H(X) Before the proof of theorem, we shall give a necessary definition and some lemmas for strongly typical sequence 4]. Definition 3: LetPx be a type of the sequence x X n. For δ>0anda distribution P M(X ), if Px satisfies D(Px P ) δ then we call x X n P -typical sequence or strongly typical sequence. Further, when a random variable X with alphabet X has the distribution P,wealsocallx X n X-typical sequence. We shall write Tδ n(p )ort δ(p ) for the set of P -typical sequences in X n, and Tδ n(x) ort δ(x) for the set of X-typical sequences in X n. Lemma : For every 0 <δ 8,ifx T δ n (P )then n log P n (x) H(P ) γ n (9) where γ n = δ 2δ log 2δ X. (0) 6

7 Lemma 2: Suppose that a sequence {δ n } satisfies lim δ n = 0, lim nδ n = n n. For every P M(X ), P n (T δn (P )) ɛ n () where { ɛ n =exp ( n δ n ) } X log(n +). (2) n Proof of Theorem 2: Suppose that a sequence {δ n } satisfies lim δ n = 0 and lim nδ n =. n n Due to the nature of the interval algorithm, we can correspond each y Y n to a distinct subinterval J(y) of0, ) with its width PY n (y). On the other hand, we can also correspond each x X to I(x) withitswidth (x). Then, each subinterval I(x) corresponds to the input sequence. If the subinterval I(x) is included in some J(y), then the input sequence corresponding to I(x) can terminate the algorithm. PX (a)achievable part Assume that we don t have to terminate the algorithm for x / T δ (X). 7

8 Then, by using Lemma,2 and (2)-(5), we obtain { } Pr n T n(x, Y n ) R 2 max (x) y Y n : PY n(y) min x T δ (X) P X (x) + y Y n : PY n(y) min x T δ (X) P X (x) D(Q P Y )+H(Q)R(H(X)+γ n ) + D(Q P Y )+H(Q) R(H(X)+γ n ) D(Q P Y )+H(Q)R(H(X)+γ n ) + D(Q P Y )+H(Q) R(H(X)+γ n) P X x T δ (X) P n Y (y)+ Q P n 2exp n{d(q P Y ) 2Rγ n x X : x T δ (X) PX (x) 2exp{ n(r H(X) H(Q) Rγ n)} exp( nd(q P Y )) + ɛ n 2exp{ n(r H(X) H(Q) Rγ n )} exp{ n(d(q P Y ) 2Rγ n )} + ɛ n + R H(X) H(Q) D(Q P Y )+Rγ n + }]+ɛ n 2(n +) Y exp n min {D(Q P Y ) 2Rγ n Q M(Y) + R H(X) H(Q) D(Q P Y )+Rγ n + }]+ɛ n where x + =max{0,x}. Here,letbe E r (R, P X,P Y ) = min Q M(X ) {D(Q P Y )+ R H(X) H(Q) D(Q P Y ) + }. E r (R, P X,P Y ) = 0 if and only if Q = P Y and RH(X) H(Q), i.e. R H(Y ) H(X). This implies that E r(r, P X,P Y ) > 0 if and only if R> H(Y ) H(X). From this for every δ>0, there exists a sufficiently large n 0 such that γ n < δh(x) H(X) H(Y ) +δ 8

9 for all n n 0. Therefore, for all n n 0 we can see min {D(Q P Y ) 2Rγ n + R H(X) H(Q) D(Q P Y )+Rγ n + } Q M(X ) { ( ) H(Y ) = min D(Q P Y ) 2 Q M(Y) H(X) + δ γ n ( ) + H(Y ) H(Y ) H(Q)+δH(X) D(Q P Y )+ H(X) + δ + γ n > 0 R= H(Y ) H(X) +δ + } Hence, n= { Pr n T n(x, Y n ) H(Y ) } H(X) δ < (3) (b)converse part If n is sufficiently large, we can δ n 8 for all n n. Thus from Lemma, if x T δ (X) then exp{ (H(X)+γ )}PX (x) exp{ (H(X) γ )}. Let N(X ) be an integer such that exp{(h(x)+2γ )} N(X ) exp{(h(x)+3γ )}. It is easy to see PX (x) > N(X ) for all x T δ (X). Assume that all x / T δ (X) stop the algorithm, by using Lemma,2 and (2)-(5) we obtain { } Pr n T n(x, Y n ) R y Y n : P n Y (y) N(X ) P n Y (y)+ H(Q)+D(Q P Y ) n log(n(x )) H(Q)+D(Q P Y )R(H(X)+3γ n ) x X : x/ T δ (X) PX (x) exp( nd(q P Y )) + ɛ n exp( nd(q P Y )) + ɛ n (n +) Y exp{ n min Q M(Y): H(Q)+D(Q P Y )R(H(X)+3γ n ) D(Q P Y )} + ɛ n. 9

10 Here, let be F (R, P X,P Y ) = min D(Q P Y ). Q M(Y): H(Q)+D(Q P Y )R H(X) F (R, P X,P Y ) = 0 if and only if Q = P Y and H(Q) RH(X), i.e. R H(Y ) H(X). This implies that F (R, P X,P Y ) > 0 if and only if R< H(Y ) H(X). From this, for every δ > 0thereexistsasufficientlylargen 2 satisfying γ n < δh(x) H(Y ) for all n n 2. Therefore, for all n n 2 we can see 3 H(X) δ min D(Q P Y ) Q M(Y): H(Q)+D(Q P Y )R(H(X)+3γ n) H(Y ) R= H(X) δ Hence, = min Q M(Y): > 0 H(Q)+D(Q P Y )H(Y ) δh(x)+3γ n n= D(Q P Y ) H(Y ) H(X) δ { Pr n T n(x, Y ) H(Y ) } H(X) δ < (4) From (3)(4), by using the Borel-Cantelli s principle (e.g. 0]) we can obtain (8). Contrarily, let us consider to generate an i.i.d. random sequence Y,Y 2, by using an i.i.d. random sequence X n =(X,X 2,,X n ). We denote by L n (X n,y) the length of the generated sequence. Then from Theorem 2, we immediately obtain the following corollary. Corollary : lim n n L n(x n,y)= H(X) H(Y ) a.s. (5) V. Almost Sure Convergence of Number of Fair Bits per Input Sample In above chapter, we showed that the length of input sequence per the length of output sequence converges to the ratio of entropies of the input and output distributions almost surely. To investigate asymptotic properties, we consider more restricted case. Let us consider to produce a sequence of fair bits by using an i.i.d. random 0

11 sequence X n =(X,X 2,,X n )oflengthn. Each random variable X i (i =, 2,,n) is subject to a generic distribution P X on X.WedenotebyL n (x) the number of generated fair bits from the input sequence x X n. Here, we define the following functions: E r (R, P X ) = min H(Q)R D(Q P X ), (6) E sp (R, P X ) = min D(Q P X ), (7) D(Q P X )+H(Q)R F (R, P X ) = min D(Q P X ), (8) D(Q P X )+H(Q) R G(R, P X ) = min H(Q) R D(Q P X ). (9) Then, we obtain the following large deviations performances of interval algorithm: Theorem 3: ForR>0, lim inf { n n log Pr n L n(x n ) R} ] E r (R, P X ). (20) For R>R min = max x X log P X (x) lim sup { n n log Pr n L n(x n ) R} ] E sp (R, P X ). (2) Further, E r (R, P X ) > 0 if and only if R<H(X), E sp (R, P X ) > 0ifand only if R min <R<H(X), and E r (R, P X ) <E sp (R, P X )forr<h(x). Proof: We can show this theorem in a similar manner as Theorem 2. We can correspond each x X n to a distinct subinterval I(x) of0, ) with its width PX n (x). Partition a unit interval 0, ) into exp() subintervals J i = (i ) exp( ), iexp( )) i =, 2,, exp(). First, we shall show (20). The number of input sequences not to stop

12 the algorithm is not more than exp(). Then we obtain { } Pr n L n(x n ) R ( ) T Q, exp() PX n (x) min x T Q Q P n min = ], exp{ n(h(q) R)} exp( nd(q P X )) exp n{d(q P X )+ H(Q) R + }] Q P n ] (n +) X exp n min {D(Q P X)+ H(Q) R + } Q M(X ) which implies lim inf { }] n n log Pr n L n(x n ) R min {D(Q P X)+ H(Q) R + }. Q M(X ) Note that D(Q P X )+H(Q) R resp. H(Q)] is a linear (resp. convex) function of Q. Then, min H(Q) R = min {D(Q P X )+ H(Q) R + } H(Q)=R = min H(Q)=R {D(Q P X )+ H(Q) R + } D(Q P X ). Hence, we obtain (20). E r (R, P X ) = 0 if and only if Q = P X and H(Q) R, i.e. R H(X). This implies that E r (R, P X ) > 0 if and only if R<H(X). 2

13 Next, we show (2). We have { } Pr n L n(x n ) R n X (x) P x X n : PX n (x) exp( ) H(Q)+D(Q P X )R (n +) X exp( nd(q P X )) (n +) X exp{ n min D(Q P X )} H(Q)+D(Q P X )R which implies (2) for R>R min. It should be noted that the minimum of (7) is taken over the non-empty set of Q if R>R min. E sp (R, P X ) = 0 if and only if Q = P X and H(Q) R, i.e. R H(X). This implies that E sp (R, P X ) > 0 if and only if R min <R<H(X). Theorem 4: For0<R<R max = min x X lim inf n log P X(x), n log Pr { n L n(x n ) R} ] F (R, P X ). (22) For 0 <R<log X, lim sup { n n log Pr n L n(x n ) R} ] G(R, P X ). (23) Further, F (R, P X ) > 0 if and only if H(X) <R<R max, G(R, P X ) > 0if and only if H(X) <R<log X, andf (R, P X ) <G(R, P X )forr>h(x). Proof: We can show this theorem in a similar manner as the proof of Theorem 3. First, we shall show (22). We have { } Pr n L n(x n n ) R X (x) P x X n : PX n (x)exp( ) H(Q)+D(Q P X ) R exp( nd(q P X )) (n +) X exp{ nf (R, P X )} 3

14 which implies (22) for R<R max. It should be noted that the minimum of (8) is taken over the non-empty set of Q if R<R max. F (R, P X ) = 0 if and only if Q = P X and H(Q) R, i.e. R H(X). This implies that F (R, P X ) > 0 if and only if H(X) <R<R max. Next, we show (23). We have { } Pr n L n(x n ) R x T Q, T Q 2exp() 2 T Q P n X(x) 2 (n +) X T Q 2exp() 2 (n +) X H(Q) R+ log 2(n+) X n 2 (n +) X exp{ n min exp( nd(q P X )) exp( nd(q P X )) H(Q) R+ log 2(n+) X n D(Q P X )}. By the continuity of divergence, we can obtain (23) for R<log X. It should be noted that the minimum of (9) is taken over the non-empty set of Q if R<log X. G(R, P X ) = 0 if and only if Q = P X and H(Q) R, i.e. R H(X). This implies that G(R, P X ) > 0 if and only if H(X) <R<log X. Remark : Let us consider to produce a specified number of fair bits by using a sequence from the source X. We denote by T n (X) the length of input sequence to obtain fair bits of length n. Then, we obtain similar relations as (20)-(23). For example, corresponding to (20), we have lim inf { n n log Pr n T n(x) R} ] Ẽr(R, P X ) (24) where Ẽ r (R, P X )= min RD(Q P X ). (25) H(Q)/R VI. Error Exponent for Intrinsic Randomness 4

15 In this chapter, let us consider to produce fixed number of random bits with an input sequence of length n. In this case, we cannot generate fair bits exactly but approximately. First, we modify the interval algorithm for generating random process so that the algorithm outputs a specified sequence Y whenever the algorithm does not stop with an input sequence of length n, where Y = {0, }. The modified algorithm can be described below. Modified Interval Algorithm for Generating Fair Bits with Fixed Input Length a) Partition an unit interval 0, ) into disjoint subinterval J(0),J() such that J(i) = 2 i, ) (i +) i =0,. 2 b) Set P j = j p k j =, 2,,M; P 0 =0. k= 2) Set s = t = λ (null string), α s = γ t =0,β s = δ t =,I(s) =α s,β s ), J(t) =γ t,δ t ), l =0,andm =. 3) If l = n then output as the output sequence Y,andstopthe algorithm. Otherwise obtain an input symbol from the source X to have a value a {, 2,,M}, and generate the subinterval of I(s) where and set l = l +. I(sa) = α sa,β sa ) α sa = α s +(β s α s )P a β sa = α s +(β s α s )P a, 4a) If I(sa) is entirely contained in some J(ti) (i =0, ), then set t = ti. Otherwise, go to 5). 4b) If m = then output t as the output sequence Y,andstopthe algorithm. Otherwise, partition the interval J(t) γ t,δ t ) into disjoint subinterval J(t0),J(t) such that J(tj) = γ tj,δ tj ) j =0, 5

16 where γ tj = γ t + 2 j(δ t γ t ) and set m = m + and go to 4a). 5) Set s = sa andgoto3). δ tj = γ t + 2 (j +)(δ t γ t ), (a) Approximation Error by Variational Distance We first measured the approximation error by the variational distance between the desired and approximated output distribution. Then, we obtain the following theorems: Theorem 5: If the modified interval algorithm is used, then we have lim inf n n log d ( U exp(),py ) ] E r (R, P X ), (26) where U exp() is a uniform distribution on U exp() = {, 2,, exp()}, PY denote the output distribution of the modified interval algorithm, and E r (R, P X ) is given by (6). Further, for R>R min, lim sup n n log d ( U exp(),py ) ] E sp (R, P X ) (27) where E sp (R, P X ) is given by (7). Proof: First, we shall show (26). The number of input sequences to output a specified sequence Y is not more than exp(). Then, we 6

17 obtain d(u exp(), P Y )= = y Y : y = 2 y Y : y 2 min x T Q y Y exp( ) P Y (y) exp( ) P Y (y) + exp( ) P Y (y) ( ) T Q, exp() PX n (x) y Y : y (exp( ) P Y ] 2(n +) X exp n min {D(Q P X)+ H(Q) R + } Q M(X ) (y)) which implies (26). Next, we show (27). In a similar manner as the proof of (26), we obtain d ( U exp(), PY ) =2 exp( ) P Y (y) 2 P x X n : PX n (x) exp( ) y Y : y n X(x) 2(n +) X exp{ n min D(Q P X )} D(Q P X )+H(Q)R which implies (27) for R>R min. This theorem implies that if the length of output sequence per input sample is below the entropy of the source, the approximation error measured by the variational distance vanishes exponentially as the length of input sequence tends to infinity, by using the modified interval algorithm. Next theorem shows the upper bounds of the error exponent. Theorem 6: Let P Y denote a distribution on U exp() using any algorithm for random number generation with fixed input length n. Then for R> R min, lim sup ] n n log d(u exp(), P Y ) E sp (R, P X ), (28) 7

18 where E sp (R, P X ) is given by (7). Proof: It should be noted that P Y exp( ) (y) 2exp( ). Then, we have d(u exp(), P Y )= x X n : P n X (x) 2exp( ) y Y P Y exp( ) 2 P n X(x) 2 (n +) X D(Q P X )+H(Q)R n (y) 2 P Y (y) P Y (y) if exp( nd(q P X )) 2 (n +) X exp{ n min D(Q P X )}. D(Q P X )+H(Q)R n From the continuity of divergence, we can obtain (28). Note that E r (R, P X ) <E sp (R, P X )forr<h(x). Hence, it is still an open problem to obtain the exact error exponent of the proposed algorithm. Next theorem shows the converse result. Theorem 7: If the modified interval algorithm is used, then for R<R max, lim inf n n log { 2 d(u exp(), PY ) } ] F (R, P X ), (29) where F (R, P X ) is given by (8). Further, for R<log X lim sup n where G(R, P X ) is given by (9). n log { 2 d(u exp(),p Y ) } ] G(R, P X ), (30) Proof: First, we shall show (29). From the equality a + b = a b + 8

19 2min(a, b), we obtain 2 d ( U exp(), PY ) = 2 exp( ) P Y (y) = 2 = 2 2 y Y y Y min y Y : y ( ) exp( ), PY (y) PY (y)+2exp( ) P x X n : PX n (x)exp( ) n X(x)+2exp( ) 2(n +) X exp{ n min D(Q P X )+H(Q) R D(Q P X )} +2exp( ). Now that F (H(X),P X )=0,F(R, P X ) is monotonously increasing for R H(X) and D(Q P X ) = Q(x)log Q(x) P (x) x X Q(x)logP X (x) x X Q(x)logmin P X( x) bx X x X = log min P X( x) bx X = R max, then F (R, P X ) <Rfor R<R max. Hence, from the convexity of divergence we can obtain (29) for R<R max. Next, we show (30). In a similar manner as the proof of (29), we have 2 d(u exp(), P Y )=2 y Y min 2 x X n : x T Q, T Q 2exp() 2 T Q P n X(x) ( exp( ), P Y (y) ) (n +) X exp{ n min H(Q) R+ n log 2(n+) X D(Q P X )} 9

20 which implies (30) for R<log X. This theorem implies that if the length of output sequence per input sample is above the entropy of the source, the approximation error measured by the variational distance approaches to two exponentially as the length of input sequence tends to infinity. Next theorem was due to Ohama. Theorem 8 5]: Consider the optimum algorithm for random number generation with fixed input length n, let P Y denote the distribution on Y which minimizes the variational distance. Then, we have n { log 2 d(u exp(), P Y )} ] = F (R, P X ) (3) where lim n F (R, P X )= min {D(Q P X)+ R H(Q) D(Q P X ) + }. (32) Q M(X ) Further, F (R, P X ) F (R, P X ) and equality holds for R R 0,where R 0 = D(U X P X )+log X. (33) Proof: (a)converse part For x X n such that PX n (x) exp( ), we assign x to a certain y Y one by one. We shall denote the set of these y Y by A. Also, for x X n such that PX n (x) exp( ), we assign as many x as possible to a certain y A c, on condition that the sum of probability of assigned x is not over the probability of y. We shall denote the set of these y A c by B. If there are some x to be corresponded to no y, we assign these x to suitable y B one by one. We shall denote the set of these y B by B 2. 20

21 Then, we have 2 d(u exp(), P Y ) = 2 ( ) exp( ), P Y (y) y Y min = 2 y A exp( )+2 = 2 2 y B B c 2 2 exp( )+2 y A y B P Y (y)+2 exp( ) y B 2 P Y (y) exp( )+2 P x X n : PX n (x) exp( ) x X n : PX n (x)exp( ) D(Q P X )+H(Q)R = 2 Q P n exp exp{ n(r H(Q))} +2 n X (x) D(Q P X )+H(Q) R exp( nd(q P X )) n{d(q P X )+ R H(Q) D(Q P X ) + } ] 2(n +) X exp n min {D(Q P X)+ R H(Q) D(Q P X ) + } Q M(X ) which implies lim inf n n log { 2 d(u exp(), P Y )} ] F (R, P X ). (b)achievable part (b-i) Suppose that R R 0.Weassignx X n such that PX n (x) exp( ) to y Y one by one, and arbitrary for other x X n. Then, we have 2 d(u exp(), P Y )=2 ( ) exp( ), P Y (y) 2 y Y min exp( ) x X n : PX n (x) exp( ) 2(n +) X D(Q P X )+H(Q)R exp{ n(r H(Q))} 2(n +) X exp{ n min (R H(Q))}. D(Q P X )+H(Q)R 2 ]

22 By the way since R R 0, from the convexity of R H(Q) wehave Therefore, we obtain min (R H(Q)) = R log X. D(Q P X )+H(Q)R min {D(Q P X)+ R H(Q) D(Q P X ) + } Q M(X ) = min (R H(Q)), min Here, min D(Q P X )+H(Q)R = min R log X, Then, we have Q =arg min D(Q P X )+H(Q) R min D(Q P X )+H(Q) R D(Q P X )+H(Q) R D(Q P X ). D(Q P X ). D(Q P X ) D(Q P X ) (R log X ) R H(Q ) (R log X ) H(Q )+log X 0 which implies that min {D(Q P X)+ R H(Q) D(Q P X ) + } = R log X Q M(X ) for R R 0. Hence, we have lim sup n for R R 0. n { log 2 d(u exp(), P Y )} ] F (R, P X ) (b-ii) Suppose that R<R 0. We can select a type Q such that D(Q P X )+H(Q ) R and Q minimizes D(Q P X ) > 0. Then, we assign as many x T Q as possible to y Y, on condition that the sum of the 22

23 probability of assigned x is not over the probability of y, and arbitrary for x / T Q. In this case, the number of x corresponding to a certain y is k = exp( ) exp n{d(q P X )+H(Q )}] Thus, for a sufficiently large n 0 and all n n 0,thenumberofy to be assigned is upperbounded as follows. exp(nh(q )) exp(nh(q )) + k k exp(nh(q )) exp n{r H(Q ) D(Q P X )}] + Therefore, we have. 2exp(nH(Q )) exp n{r H(Q ) D(Q P X )}] + = 2expn{R D(Q P X )}]+ exp(). 2 d(u exp(), P Y ) = 2 ( ) exp( ), P Y (y) y Y min 2 x X n : x T Q P n X(x) 2(n +) X exp{ n min D(Q P X )}. D(Q P X )+H(Q) R By the way, note that R H(Q) resp.d(q P X )+H(Q)] is convex (resp. linear) function of Q. Thus, for R<R 0, (R H(Q)) can be attained at its boundary, that is, min D(Q P X )+H(Q)R min (R H(Q)) = min (R H(Q)) D(Q P X )+H(Q)R D(Q P X )+H(Q)=R = min D(Q P X )+H(Q)=R D(Q P X ). 23

24 This implies that for R<R 0 min {D(Q P X)+ R H(Q) D(Q P X ) + } = min D(Q P X ). Q M(X ) D(Q P X )+H(Q) R Hence, we have lim sup n n { } ] log 2 d(u exp(), P Y ) F (R, P X ) for R<R 0. From (a)(b-i)(b-ii), we obtain (3). F (R, P X ) = 0 if and only if Q = P X and R H(Q), i.e. R H(X). This implies that F (R, P X ) > 0 if and only if R>H(X). Theorem 7 and 8 imply that the modified interval algorithm is not optimum if R R 0. (b) Approximation Error by Divergence Next, we shall consider to measure the approximation error by the divergence between the desired and approximated output distribution. First, we show the following lemma. Lemma 3 Let P n,q n be arbitrary distributions on X n.if then d(p n,q n ) ɛ, D(P n Q n ) ɛ log P n minq n min, where Pmin n (resp. Qn min ) is the minimum of P n (resp. Q n )onx n. Proof: Using the inequality Q n (x)logq n (x) Q n (x)logp n (x), (34) x X n x X n we have (P n (x)logp n (x) Q n (x)logq n (x)) x X n (P n (x) Q n (x)) log P n (x) x X n log Pmin n P n (x) Q n (x) x X n = d(p n,q n )logpmin n ɛ log Pmin n. 24

25 Hence, we obtain D(P n Q n ) = P n (x)log P n (x) Q n (x) x X n = P n (x)logp n (x) P n (x)logq n (x) x X n x X n Q n (x)logq n (x) ɛ log Pmin n P n (x)logq n (x) x X n x X n = (Q n (x) P n (x)) log Q n (x) ɛ log Pmin n x X n log Q n min P n (x) Q n (x) ɛlog Pmin n x X n ɛ log P n minq n min. From Theorem 5 and Lemma 3, we immediately obtain the following corollary. Corollary 2: If the modified interval algorithm is used, then we have lim inf n n log D ( U exp() PY ) ] E r (R, P X ) (35) where E r (R, P X ) is given by (6). This corollary implies that if the length of output sequence per input sample is below the entropy of the source, the approximation error measured by the divergence also vanishes exponentially as the length of input sequence tends to infinity. Remark 2: Han9] has showed that there exists an algorithm for random number generation of which normalized divergence vanishes. However as shown in Corollary 2, for DMS (more generally finite-state unifilar sources), even divergence can vanish as the length of input sequence tends to infinity. Next, we show the following lemmas. Lemma 4: LetP n,q n be arbitrary distributions on X n.if 2 d(p n,q n ) ɛ, 25

26 then D(P n Q n )+D(Q n P n ) log ɛ. Proof: Using the log-sum inequality8] and (34), we obtain D(P n Q n )+D(Q n P n ) = P n (x)log P n (x) Q n (x) + Q n (x)log Qn (x) P n (x) x X n x X n = (P n (x)logp n (x)+q n (x)logq n (x)) x X n (P n (x)logq n (x)+q n (x)logp n (x)) x X n (P n (x)+q n (x)) log P n (x) log Q n (x) x X n = (P n (x)+q n (x)) log P n (x) Q n (x) x X n = (P n (x)+q n (x)) log max(p n (x), Qn (x)) min(p n (x), Q n (x)) x X n P n P n (x) (x)log min(p n (x), Q n (x)) x X n log min(p n (x), Q n (x)) x X n log ɛ. Lemma 5: LetP n,q n be arbitrary distributions on X n.if 2 d(p n,q n ) ɛ, then Proof: Note that D(P n Q n ) (2 ɛ)logp n min Qn min. d(p n,q n ) 2 ɛ. 26

27 Then, in a similar manner as the proof of Lemma 4, we obtain D(P n Q n ) = P n (x)log P n (x) Q n (x) x X n = P n (x)logp n (x) P n (x)logq n (x) x X n x X n Q n (x)logq n (x) (2 ɛ)logpmin n P n (x)logq n (x) x X n x X n = (Q n (x) P n (x)) log Q n (x) (2 ɛ)logpmin n x X n log Q n min P n (x) Q n (x) (2 ɛ)logpmin n x X n (2 ɛ)logpminq n n min. From these lemmas and Theorem 7, we immediately obtain the following corollary. Corollary 3: If the modified interval algorithm is used, then for R<R max, lim inf n { n D ( U exp() P Y ) + D ( P Y U exp() ) } F (R, P X ), (36) where F (R, P X ) is given by (8). Further, for R<log X lim sup n n D ( U exp() PY ) R( log min P X( x)). (37) bx X This corollary implies that if the length of output sequence per input sample is above the entropy of the source, the approximation error measured by the divergence approaches to two linearly as the length of input sequence tends to infinity. Remark 3: We can easily extend the result of Chapter IV, V and VI to stationary ergodic Markov source by using Markov type ]. Further, we can extend these result to finite-state unifilar Markov source by using the definition of type in 2, 3]. VII. Conclusion 27

28 We have investigated large deviations performance of the interval algorithm for random number generation. We have showed almost surely proposition for i.i.d. random sequence. We have clarified some asymptotic properties, when target random number is subject to uniform distribution. As future researches, we are going to generalize our results to more complex sources. References ] D. Knuth and A. Yao, The complexity of nonuniform random number generation, Algorithm and Complexity, New Directions and Results, pp , ed. by J. F. Traub, Academic Press, New York, ]S.VembuandS.Verdú, Generating random bits from and arbitrary source: Fundamental limits, IEEE Trans. on Inform. Theory, vol.it- 4, pp , Sep ] T. S. Han and M. Hoshi, Interval algorithm for random number generation, IEEE Trans. on Inform. Theory, vol.43, pp.599-6, Mar ] F. Kanaya, An asymptotically optimal algorithm for generating Markov random sequences, Proc. of SITA 97, pp.77-80, Matsuyama, Japan, Dec. 997 (in Japanese). 5] Y. Ohama, Fixed to fixed length random number generation using one dimensional piecewise linear maps, Proc. of SITA 98, pp.57-60, Gifu, Japan, Dec. 998 (in Japanese). 6] T. Uyematsu and F. Kanaya, Methods of channel simulation achieving conditional resolvability by statistically stable transformation, submitted to IEEE Trans. on Inform. Theory 7] O. Uchida and T. S. Han, Performance analysis of interval algorithm for generating Markov processes, Proc. of SITA 98, pp.65-68, Gifu, Japan, Dec ] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. New York: Academic, 98. 9] T. S. Han: Information-Spectrum Methods in Information Theory, Baifukan, Tokyo, 998 (in Japanese). 28

29 0] P. C. Shields: The ergodic theory of discrete sample paths, Graduate Studies in Math. vol.3, American Math. Soc., 996. ] L. D. Davisson, G. Longo and A. Sgarro, Error exponent for the noiseless of finite ergodic Markov sources, IEEE Trans. on Inform. Theory, vol.it-27, pp , Jul ] N. Merhav, Universal coding with minimum probability of codeword length overflow, IEEE Trans. on Inform. Theory, vol.37, pp , May ] N. Merhav and D. L. Neuhoff, Variable-to-fixed length codes provide better large deviations performance than fixed-to-variable length codes, IEEE Trans. on Inform. Theory, vol.38, pp.35-40, Jan ] T. Uyematsu: Today s Shannon Theory, Baifukan, Tokyo, 998 (in Japanese). 29

Large Deviations Performance of Knuth-Yao algorithm for Random Number Generation

Large Deviations Performance of Knuth-Yao algorithm for Random Number Generation Large Deviations Performance of Knuth-Yao algorithm for Random Number Generation Akisato KIMURA akisato@ss.titech.ac.jp Tomohiko UYEMATSU uematsu@ss.titech.ac.jp April 2, 999 No. AK-TR-999-02 Abstract

More information

EECS 750. Hypothesis Testing with Communication Constraints

EECS 750. Hypothesis Testing with Communication Constraints EECS 750 Hypothesis Testing with Communication Constraints Name: Dinesh Krithivasan Abstract In this report, we study a modification of the classical statistical problem of bivariate hypothesis testing.

More information

(each row defines a probability distribution). Given n-strings x X n, y Y n we can use the absence of memory in the channel to compute

(each row defines a probability distribution). Given n-strings x X n, y Y n we can use the absence of memory in the channel to compute ENEE 739C: Advanced Topics in Signal Processing: Coding Theory Instructor: Alexander Barg Lecture 6 (draft; 9/6/03. Error exponents for Discrete Memoryless Channels http://www.enee.umd.edu/ abarg/enee739c/course.html

More information

Arimoto Channel Coding Converse and Rényi Divergence

Arimoto Channel Coding Converse and Rényi Divergence Arimoto Channel Coding Converse and Rényi Divergence Yury Polyanskiy and Sergio Verdú Abstract Arimoto proved a non-asymptotic upper bound on the probability of successful decoding achievable by any code

More information

Information measures in simple coding problems

Information measures in simple coding problems Part I Information measures in simple coding problems in this web service in this web service Source coding and hypothesis testing; information measures A(discrete)source is a sequence {X i } i= of random

More information

An Achievable Error Exponent for the Mismatched Multiple-Access Channel

An Achievable Error Exponent for the Mismatched Multiple-Access Channel An Achievable Error Exponent for the Mismatched Multiple-Access Channel Jonathan Scarlett University of Cambridge jms265@camacuk Albert Guillén i Fàbregas ICREA & Universitat Pompeu Fabra University of

More information

INFORMATION THEORY AND STATISTICS

INFORMATION THEORY AND STATISTICS CHAPTER INFORMATION THEORY AND STATISTICS We now explore the relationship between information theory and statistics. We begin by describing the method of types, which is a powerful technique in large deviation

More information

Subset Source Coding

Subset Source Coding Fifty-third Annual Allerton Conference Allerton House, UIUC, Illinois, USA September 29 - October 2, 205 Subset Source Coding Ebrahim MolavianJazi and Aylin Yener Wireless Communications and Networking

More information

An Extended Fano s Inequality for the Finite Blocklength Coding

An Extended Fano s Inequality for the Finite Blocklength Coding An Extended Fano s Inequality for the Finite Bloclength Coding Yunquan Dong, Pingyi Fan {dongyq8@mails,fpy@mail}.tsinghua.edu.cn Department of Electronic Engineering, Tsinghua University, Beijing, P.R.

More information

Convexity/Concavity of Renyi Entropy and α-mutual Information

Convexity/Concavity of Renyi Entropy and α-mutual Information Convexity/Concavity of Renyi Entropy and -Mutual Information Siu-Wai Ho Institute for Telecommunications Research University of South Australia Adelaide, SA 5095, Australia Email: siuwai.ho@unisa.edu.au

More information

The Method of Types and Its Application to Information Hiding

The Method of Types and Its Application to Information Hiding The Method of Types and Its Application to Information Hiding Pierre Moulin University of Illinois at Urbana-Champaign www.ifp.uiuc.edu/ moulin/talks/eusipco05-slides.pdf EUSIPCO Antalya, September 7,

More information

EE 4TM4: Digital Communications II. Channel Capacity

EE 4TM4: Digital Communications II. Channel Capacity EE 4TM4: Digital Communications II 1 Channel Capacity I. CHANNEL CODING THEOREM Definition 1: A rater is said to be achievable if there exists a sequence of(2 nr,n) codes such thatlim n P (n) e (C) = 0.

More information

Chapter 4. Data Transmission and Channel Capacity. Po-Ning Chen, Professor. Department of Communications Engineering. National Chiao Tung University

Chapter 4. Data Transmission and Channel Capacity. Po-Ning Chen, Professor. Department of Communications Engineering. National Chiao Tung University Chapter 4 Data Transmission and Channel Capacity Po-Ning Chen, Professor Department of Communications Engineering National Chiao Tung University Hsin Chu, Taiwan 30050, R.O.C. Principle of Data Transmission

More information

Information Theory and Statistics, Part I

Information Theory and Statistics, Part I Information Theory and Statistics, Part I Information Theory 2013 Lecture 6 George Mathai May 16, 2013 Outline This lecture will cover Method of Types. Law of Large Numbers. Universal Source Coding. Large

More information

Homework Set #2 Data Compression, Huffman code and AEP

Homework Set #2 Data Compression, Huffman code and AEP Homework Set #2 Data Compression, Huffman code and AEP 1. Huffman coding. Consider the random variable ( x1 x X = 2 x 3 x 4 x 5 x 6 x 7 0.50 0.26 0.11 0.04 0.04 0.03 0.02 (a Find a binary Huffman code

More information

A new converse in rate-distortion theory

A new converse in rate-distortion theory A new converse in rate-distortion theory Victoria Kostina, Sergio Verdú Dept. of Electrical Engineering, Princeton University, NJ, 08544, USA Abstract This paper shows new finite-blocklength converse bounds

More information

Correlation Detection and an Operational Interpretation of the Rényi Mutual Information

Correlation Detection and an Operational Interpretation of the Rényi Mutual Information Correlation Detection and an Operational Interpretation of the Rényi Mutual Information Masahito Hayashi 1, Marco Tomamichel 2 1 Graduate School of Mathematics, Nagoya University, and Centre for Quantum

More information

Capacity of a channel Shannon s second theorem. Information Theory 1/33

Capacity of a channel Shannon s second theorem. Information Theory 1/33 Capacity of a channel Shannon s second theorem Information Theory 1/33 Outline 1. Memoryless channels, examples ; 2. Capacity ; 3. Symmetric channels ; 4. Channel Coding ; 5. Shannon s second theorem,

More information

Variable Length Codes for Degraded Broadcast Channels

Variable Length Codes for Degraded Broadcast Channels Variable Length Codes for Degraded Broadcast Channels Stéphane Musy School of Computer and Communication Sciences, EPFL CH-1015 Lausanne, Switzerland Email: stephane.musy@ep.ch Abstract This paper investigates

More information

Coding on Countably Infinite Alphabets

Coding on Countably Infinite Alphabets Coding on Countably Infinite Alphabets Non-parametric Information Theory Licence de droits d usage Outline Lossless Coding on infinite alphabets Source Coding Universal Coding Infinite Alphabets Enveloppe

More information

Information Theory and Hypothesis Testing

Information Theory and Hypothesis Testing Summer School on Game Theory and Telecommunications Campione, 7-12 September, 2014 Information Theory and Hypothesis Testing Mauro Barni University of Siena September 8 Review of some basic results linking

More information

Information Theory in Intelligent Decision Making

Information Theory in Intelligent Decision Making Information Theory in Intelligent Decision Making Adaptive Systems and Algorithms Research Groups School of Computer Science University of Hertfordshire, United Kingdom June 7, 2015 Information Theory

More information

EE5139R: Problem Set 4 Assigned: 31/08/16, Due: 07/09/16

EE5139R: Problem Set 4 Assigned: 31/08/16, Due: 07/09/16 EE539R: Problem Set 4 Assigned: 3/08/6, Due: 07/09/6. Cover and Thomas: Problem 3.5 Sets defined by probabilities: Define the set C n (t = {x n : P X n(x n 2 nt } (a We have = P X n(x n P X n(x n 2 nt

More information

Coding for Discrete Source

Coding for Discrete Source EGR 544 Communication Theory 3. Coding for Discrete Sources Z. Aliyazicioglu Electrical and Computer Engineering Department Cal Poly Pomona Coding for Discrete Source Coding Represent source data effectively

More information

On the Reliability Function of Variable-Rate Slepian-Wolf Coding

On the Reliability Function of Variable-Rate Slepian-Wolf Coding entropy Article On the Reliability Function of Variable-Rate Slepian-Wolf Coding Jun Chen 1,2, *, Da-ke He 3, Ashish Jagmohan 4 and Luis A. Lastras-Montaño 4 1 College of Electronic Information and Automation,

More information

EE5139R: Problem Set 7 Assigned: 30/09/15, Due: 07/10/15

EE5139R: Problem Set 7 Assigned: 30/09/15, Due: 07/10/15 EE5139R: Problem Set 7 Assigned: 30/09/15, Due: 07/10/15 1. Cascade of Binary Symmetric Channels The conditional probability distribution py x for each of the BSCs may be expressed by the transition probability

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Science Transmission of Information Spring 2006

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Science Transmission of Information Spring 2006 MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Science 6.44 Transmission of Information Spring 2006 Homework 2 Solution name username April 4, 2006 Reading: Chapter

More information

Hypothesis Testing with Communication Constraints

Hypothesis Testing with Communication Constraints Hypothesis Testing with Communication Constraints Dinesh Krithivasan EECS 750 April 17, 2006 Dinesh Krithivasan (EECS 750) Hyp. testing with comm. constraints April 17, 2006 1 / 21 Presentation Outline

More information

Multiaccess Channels with State Known to One Encoder: A Case of Degraded Message Sets

Multiaccess Channels with State Known to One Encoder: A Case of Degraded Message Sets Multiaccess Channels with State Known to One Encoder: A Case of Degraded Message Sets Shivaprasad Kotagiri and J. Nicholas Laneman Department of Electrical Engineering University of Notre Dame Notre Dame,

More information

An instantaneous code (prefix code, tree code) with the codeword lengths l 1,..., l N exists if and only if. 2 l i. i=1

An instantaneous code (prefix code, tree code) with the codeword lengths l 1,..., l N exists if and only if. 2 l i. i=1 Kraft s inequality An instantaneous code (prefix code, tree code) with the codeword lengths l 1,..., l N exists if and only if N 2 l i 1 Proof: Suppose that we have a tree code. Let l max = max{l 1,...,

More information

Solutions to Set #2 Data Compression, Huffman code and AEP

Solutions to Set #2 Data Compression, Huffman code and AEP Solutions to Set #2 Data Compression, Huffman code and AEP. Huffman coding. Consider the random variable ( ) x x X = 2 x 3 x 4 x 5 x 6 x 7 0.50 0.26 0. 0.04 0.04 0.03 0.02 (a) Find a binary Huffman code

More information

Pointwise Redundancy in Lossy Data Compression and Universal Lossy Data Compression

Pointwise Redundancy in Lossy Data Compression and Universal Lossy Data Compression Pointwise Redundancy in Lossy Data Compression and Universal Lossy Data Compression I. Kontoyiannis To appear, IEEE Transactions on Information Theory, Jan. 2000 Last revision, November 21, 1999 Abstract

More information

Hash Property and Fixed-rate Universal Coding Theorems

Hash Property and Fixed-rate Universal Coding Theorems 1 Hash Property and Fixed-rate Universal Coding Theorems Jun Muramatsu Member, IEEE, Shigeki Miyake Member, IEEE, Abstract arxiv:0804.1183v1 [cs.it 8 Apr 2008 The aim of this paper is to prove the achievability

More information

ECE 4400:693 - Information Theory

ECE 4400:693 - Information Theory ECE 4400:693 - Information Theory Dr. Nghi Tran Lecture 8: Differential Entropy Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 1 / 43 Outline 1 Review: Entropy of discrete RVs 2 Differential

More information

Information Theory. Lecture 5 Entropy rate and Markov sources STEFAN HÖST

Information Theory. Lecture 5 Entropy rate and Markov sources STEFAN HÖST Information Theory Lecture 5 Entropy rate and Markov sources STEFAN HÖST Universal Source Coding Huffman coding is optimal, what is the problem? In the previous coding schemes (Huffman and Shannon-Fano)it

More information

EE376A: Homework #3 Due by 11:59pm Saturday, February 10th, 2018

EE376A: Homework #3 Due by 11:59pm Saturday, February 10th, 2018 Please submit the solutions on Gradescope. EE376A: Homework #3 Due by 11:59pm Saturday, February 10th, 2018 1. Optimal codeword lengths. Although the codeword lengths of an optimal variable length code

More information

DISCRIMINATING two distributions is treated as a fundamental problem in the field of statistical inference. This problem

DISCRIMINATING two distributions is treated as a fundamental problem in the field of statistical inference. This problem Discrimination of two channels by adaptive methods and its application to quantum system Masahito Hayashi 1 arxiv:0804.0686v1 [quant-ph] 4 Apr 2008 Abstract The optimal exponential error rate for adaptive

More information

Network coding for multicast relation to compression and generalization of Slepian-Wolf

Network coding for multicast relation to compression and generalization of Slepian-Wolf Network coding for multicast relation to compression and generalization of Slepian-Wolf 1 Overview Review of Slepian-Wolf Distributed network compression Error exponents Source-channel separation issues

More information

Lecture 5 - Information theory

Lecture 5 - Information theory Lecture 5 - Information theory Jan Bouda FI MU May 18, 2012 Jan Bouda (FI MU) Lecture 5 - Information theory May 18, 2012 1 / 42 Part I Uncertainty and entropy Jan Bouda (FI MU) Lecture 5 - Information

More information

Chaos, Complexity, and Inference (36-462)

Chaos, Complexity, and Inference (36-462) Chaos, Complexity, and Inference (36-462) Lecture 7: Information Theory Cosma Shalizi 3 February 2009 Entropy and Information Measuring randomness and dependence in bits The connection to statistics Long-run

More information

Second-Order Asymptotics in Information Theory

Second-Order Asymptotics in Information Theory Second-Order Asymptotics in Information Theory Vincent Y. F. Tan (vtan@nus.edu.sg) Dept. of ECE and Dept. of Mathematics National University of Singapore (NUS) National Taiwan University November 2015

More information

Source Coding with Lists and Rényi Entropy or The Honey-Do Problem

Source Coding with Lists and Rényi Entropy or The Honey-Do Problem Source Coding with Lists and Rényi Entropy or The Honey-Do Problem Amos Lapidoth ETH Zurich October 8, 2013 Joint work with Christoph Bunte. A Task from your Spouse Using a fixed number of bits, your spouse

More information

Universal source coding for complementary delivery

Universal source coding for complementary delivery SITA2006 i Hakodate 2005.2. p. Uiversal source codig for complemetary delivery Akisato Kimura, 2, Tomohiko Uyematsu 2, Shigeaki Kuzuoka 2 Media Iformatio Laboratory, NTT Commuicatio Sciece Laboratories,

More information

A Novel Asynchronous Communication Paradigm: Detection, Isolation, and Coding

A Novel Asynchronous Communication Paradigm: Detection, Isolation, and Coding A Novel Asynchronous Communication Paradigm: Detection, Isolation, and Coding The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation

More information

Upper Bounds on the Capacity of Binary Intermittent Communication

Upper Bounds on the Capacity of Binary Intermittent Communication Upper Bounds on the Capacity of Binary Intermittent Communication Mostafa Khoshnevisan and J. Nicholas Laneman Department of Electrical Engineering University of Notre Dame Notre Dame, Indiana 46556 Email:{mhoshne,

More information

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 53, NO. 12, DECEMBER

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 53, NO. 12, DECEMBER IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 53, NO 12, DECEMBER 2007 4457 Joint Source Channel Coding Error Exponent for Discrete Communication Systems With Markovian Memory Yangfan Zhong, Student Member,

More information

IN [1], Forney derived lower bounds on the random coding

IN [1], Forney derived lower bounds on the random coding Exact Random Coding Exponents for Erasure Decoding Anelia Somekh-Baruch and Neri Merhav Abstract Random coding of channel decoding with an erasure option is studied By analyzing the large deviations behavior

More information

Lecture 2: August 31

Lecture 2: August 31 0-704: Information Processing and Learning Fall 206 Lecturer: Aarti Singh Lecture 2: August 3 Note: These notes are based on scribed notes from Spring5 offering of this course. LaTeX template courtesy

More information

x log x, which is strictly convex, and use Jensen s Inequality:

x log x, which is strictly convex, and use Jensen s Inequality: 2. Information measures: mutual information 2.1 Divergence: main inequality Theorem 2.1 (Information Inequality). D(P Q) 0 ; D(P Q) = 0 iff P = Q Proof. Let ϕ(x) x log x, which is strictly convex, and

More information

Common Information. Abbas El Gamal. Stanford University. Viterbi Lecture, USC, April 2014

Common Information. Abbas El Gamal. Stanford University. Viterbi Lecture, USC, April 2014 Common Information Abbas El Gamal Stanford University Viterbi Lecture, USC, April 2014 Andrew Viterbi s Fabulous Formula, IEEE Spectrum, 2010 El Gamal (Stanford University) Disclaimer Viterbi Lecture 2

More information

Two Applications of the Gaussian Poincaré Inequality in the Shannon Theory

Two Applications of the Gaussian Poincaré Inequality in the Shannon Theory Two Applications of the Gaussian Poincaré Inequality in the Shannon Theory Vincent Y. F. Tan (Joint work with Silas L. Fong) National University of Singapore (NUS) 2016 International Zurich Seminar on

More information

On Large Deviation Analysis of Sampling from Typical Sets

On Large Deviation Analysis of Sampling from Typical Sets Communications and Signal Processing Laboratory (CSPL) Technical Report No. 374, University of Michigan at Ann Arbor, July 25, 2006. On Large Deviation Analysis of Sampling from Typical Sets Dinesh Krithivasan

More information

Entropy and Ergodic Theory Lecture 4: Conditional entropy and mutual information

Entropy and Ergodic Theory Lecture 4: Conditional entropy and mutual information Entropy and Ergodic Theory Lecture 4: Conditional entropy and mutual information 1 Conditional entropy Let (Ω, F, P) be a probability space, let X be a RV taking values in some finite set A. In this lecture

More information

Fixed-Length-Parsing Universal Compression with Side Information

Fixed-Length-Parsing Universal Compression with Side Information Fixed-ength-Parsing Universal Compression with Side Information Yeohee Im and Sergio Verdú Dept. of Electrical Eng., Princeton University, NJ 08544 Email: yeoheei,verdu@princeton.edu Abstract This paper

More information

Tight Bounds for Symmetric Divergence Measures and a New Inequality Relating f-divergences

Tight Bounds for Symmetric Divergence Measures and a New Inequality Relating f-divergences Tight Bounds for Symmetric Divergence Measures and a New Inequality Relating f-divergences Igal Sason Department of Electrical Engineering Technion, Haifa 3000, Israel E-mail: sason@ee.technion.ac.il Abstract

More information

Lecture 6: Gaussian Channels. Copyright G. Caire (Sample Lectures) 157

Lecture 6: Gaussian Channels. Copyright G. Caire (Sample Lectures) 157 Lecture 6: Gaussian Channels Copyright G. Caire (Sample Lectures) 157 Differential entropy (1) Definition 18. The (joint) differential entropy of a continuous random vector X n p X n(x) over R is: Z h(x

More information

Soft Covering with High Probability

Soft Covering with High Probability Soft Covering with High Probability Paul Cuff Princeton University arxiv:605.06396v [cs.it] 20 May 206 Abstract Wyner s soft-covering lemma is the central analysis step for achievability proofs of information

More information

Information Theoretic Limits of Randomness Generation

Information Theoretic Limits of Randomness Generation Information Theoretic Limits of Randomness Generation Abbas El Gamal Stanford University Shannon Centennial, University of Michigan, September 2016 Information theory The fundamental problem of communication

More information

A General Formula for Compound Channel Capacity

A General Formula for Compound Channel Capacity A General Formula for Compound Channel Capacity Sergey Loyka, Charalambos D. Charalambous University of Ottawa, University of Cyprus ETH Zurich (May 2015), ISIT-15 1/32 Outline 1 Introduction 2 Channel

More information

SHARED INFORMATION. Prakash Narayan with. Imre Csiszár, Sirin Nitinawarat, Himanshu Tyagi, Shun Watanabe

SHARED INFORMATION. Prakash Narayan with. Imre Csiszár, Sirin Nitinawarat, Himanshu Tyagi, Shun Watanabe SHARED INFORMATION Prakash Narayan with Imre Csiszár, Sirin Nitinawarat, Himanshu Tyagi, Shun Watanabe 2/41 Outline Two-terminal model: Mutual information Operational meaning in: Channel coding: channel

More information

Feedback Capacity of a Class of Symmetric Finite-State Markov Channels
