Large Deviations Performance of Knuth-Yao algorithm for Random Number Generation


Large Deviations Performance of Knuth-Yao algorithm for Random Number Generation

Akisato KIMURA (akisato@ss.titech.ac.jp)
Tomohiko UYEMATSU (uematsu@ss.titech.ac.jp)

April 2, 1999
No. AK-TR-1999-02

Abstract

We investigate the large deviations performance of the algorithm for random number generation proposed by Knuth and Yao. First, we show that the number of input fair random bits per output symbol approaches the entropy of the output source almost surely, and that the large deviations performance of this algorithm is equal to that of the interval algorithm proposed by Han and Hoshi. Next, we consider generating an output sequence of fixed length from a fixed number of input fair random bits. We show that the approximation error, measured by the variational distance, vanishes exponentially at the same speed as for the interval algorithm as the length of the output sequence tends to infinity, provided that the number of input fair bits per output sample is above the entropy of the source. Conversely, the approximation error measured by the variational distance approaches two exponentially at the same speed as for the interval algorithm if the number of input fair bits per output sample is below the entropy.

Dept. of Electrical and Electronic Eng., Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-ku, Tokyo 152-8552, Japan

I. Introduction

Random number generation is the problem of simulating a prescribed target distribution by using a given source. This problem has been investigated in computer science and has a close relation to information theory [1, 2, 3]. Several practical algorithms for random number generation have been proposed, e.g. [1, 3, 4, 5]. In this paper, we consider the algorithm proposed by Knuth and Yao [1]. The Knuth-Yao algorithm can be regarded as a starting point of practical algorithms for random number generation, and several modified algorithms have been proposed, but its asymptotic properties have not yet been clarified. On the other hand, the performance of the interval algorithm proposed by Han and Hoshi [3] has already been investigated in [3, 6, 7, 8]. In particular, Uyematsu and Kanaya [6] investigated the large deviations performance of the interval algorithm in the case where the distribution of the input source is uniform. We investigate the large deviations performance of the Knuth-Yao algorithm under the same conditions as Uyematsu and Kanaya.

First, we show that the number of input fair random bits per output symbol approaches the entropy of the output source almost surely, and that the large deviations performance of this algorithm is equal to that of the interval algorithm. Next, we consider generating an output sequence of fixed length from a fixed number of input fair random bits. We show that the approximation error, measured by the variational distance, vanishes exponentially at the same speed as for the interval algorithm as the length of the output sequence tends to infinity, provided that the number of input fair bits per output sample is above the entropy of the source. Conversely, the approximation error approaches two exponentially at the same speed as for the interval algorithm if the number of input fair bits per output sample is below the entropy.

II. Basic Definitions

(a) Discrete Memoryless Source

Let $\mathcal{X}$ be a finite set. We denote by $\mathcal{M}(\mathcal{X})$ the set of all probability distributions on $\mathcal{X}$. Throughout this paper, by a source $X$ with alphabet $\mathcal{X}$ we mean a discrete memoryless source (DMS) with distribution $P_X \in \mathcal{M}(\mathcal{X})$. To denote a source, we use the notations $X$ and $P_X$ interchangeably. For a random variable $X$ with distribution $P_X$, we denote its entropy by $H(P_X)$ or $H(X)$, interchangeably:

$$H(P_X) = -\sum_{x \in \mathcal{X}} P_X(x) \log P_X(x).$$

entropy as H(P X )andh(x), interchangeably. H(P X ) = x X P X (x)logp X (x). Further, for arbitrary distributions P, Q M(X ), we denote by D(P Q) the information divergence D(P Q) = x X P (x)log P (x) Q(x). Lastly, we denote by d(p, Q) thevariational distance or l distance between two distributions P and Q on X d(p, Q) = P (x) Q(x). x X From now on, all logarithms and exponentials are to the base two. (b) Type of Sequence The type of a sequence x X n is defined as a distribution Px M(X ), where Px(a) isgivenby Px(a) = (number of occurrences of a X in x). () n We shall write P n or P for the set of types of sequences in X n. We denoted by TP n or T P the set of sequences of type P in X n. On the contrary for a distribution P M(X ), if T P then we denote by P the type of sequences in X n. We introduce some well-known facts, cf. Csiszár-Körner [9]: For the set of types in X n,wehave P n (n +) X (2) where denotes the cardinality of the set. For the set of sequences of type P in X n, (n +) X exp(nh(p )) T P exp(nh(p )) (3) If x T P,wethenhave From (3)(4), Q n (x) =exp[ n{d(p Q)+H(P )}]. (4) (n +) X exp( nd(p Q)) Q n (T P ) exp( nd(p Q)) (5) 3
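To make these definitions concrete, the following short Python sketch (ours, not part of the original report; all function names are hypothetical) computes the type of a sequence and the quantities $H(P)$, $D(P\|Q)$ and $d(P,Q)$ with base-2 logarithms, as used throughout the paper.

```python
from collections import Counter
from math import log2

def type_of(x, alphabet):
    """Empirical distribution (type) P_x of a sequence x over a finite alphabet."""
    n = len(x)
    counts = Counter(x)
    return {a: counts[a] / n for a in alphabet}

def entropy(P):
    """H(P) = -sum_x P(x) log2 P(x)."""
    return -sum(p * log2(p) for p in P.values() if p > 0)

def divergence(P, Q):
    """D(P||Q) = sum_x P(x) log2(P(x)/Q(x)); +inf if some P(x) > 0 while Q(x) = 0."""
    d = 0.0
    for a, p in P.items():
        if p > 0:
            if Q.get(a, 0) == 0:
                return float('inf')
            d += p * log2(p / Q[a])
    return d

def variational_distance(P, Q):
    """d(P,Q) = sum_x |P(x) - Q(x)| (l_1 distance)."""
    support = set(P) | set(Q)
    return sum(abs(P.get(a, 0.0) - Q.get(a, 0.0)) for a in support)

# Example: the type of a binary sequence and its relation to a Bernoulli(0.3) source.
x = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
P = type_of(x, alphabet=[0, 1])
Q = {0: 0.7, 1: 0.3}
print(P, entropy(P), divergence(P, Q), variational_distance(P, Q))
```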

(c) Resolvability

In this paper we investigate in particular the problem of generating a general source $Y = \{Y^n\}_{n=1}^{\infty}$ by using a uniform random number of as small a size as possible. This problem is called the resolvability problem [10]. Here we introduce basic definitions for the resolvability problem.

Definition 1: For an arbitrary source $Y = \{Y^n\}_{n=1}^{\infty}$, a rate $R$ is an achievable resolvability rate if and only if there exists a map $\varphi_n : \mathcal{U}_{M_n} \to \mathcal{Y}^n$ such that

$$\limsup_{n\to\infty} \frac{1}{n} \log M_n \le R, \qquad \lim_{n\to\infty} d(Y^n, \varphi_n(U_{M_n})) = 0,$$

where $\mathcal{U}_{M_n} = \{1, 2, \cdots, M_n\}$ and $U_{M_n}$ is uniformly distributed on $\mathcal{U}_{M_n}$.

Definition 2 (infimum achievable resolvability rate):

$$S(Y) = \inf\{R \mid R \text{ is an achievable resolvability rate}\}.$$

For the characterization of the resolvability rate, Han and Verdú [11] proved the following fundamental theorem.

Theorem 1: For any stationary source $Y$,

$$S(Y) = H(Y), \qquad (6)$$

where $H(Y)$ is the entropy rate of $Y$.
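As a minimal illustration of Definition 1 (our own sketch, not a construction from the paper; the allocation rule and function names are assumptions for illustration only), the following code builds one particular map $\varphi_n$ from $M_n$ equiprobable points to $\mathcal{Y}^n$ by a quantile rule and evaluates the resulting variational distance $d(Y^n, \varphi_n(U_{M_n}))$.

```python
from itertools import product

def product_dist(p, n):
    """P_Y^n for an i.i.d. source with single-letter distribution p (dict symbol -> prob)."""
    dist = {}
    for y in product(p.keys(), repeat=n):
        prob = 1.0
        for s in y:
            prob *= p[s]
        dist[y] = prob
    return dist

def quantile_map(target, M):
    """A particular map phi_n: {1,...,M} -> Y^n.  Point j is sent to the sequence whose
    cumulative-probability interval contains (j - 0.5)/M; returned is the distribution
    of phi_n(U_M), i.e. the fraction of the M points assigned to each y."""
    ys = list(target.keys())
    induced = {y: 0 for y in ys}
    cum, idx = target[ys[0]], 0
    for j in range(1, M + 1):
        u = (j - 0.5) / M
        while u > cum and idx < len(ys) - 1:
            idx += 1
            cum += target[ys[idx]]
        induced[ys[idx]] += 1
    return {y: c / M for y, c in induced.items()}

def variational_distance(P, Q):
    return sum(abs(P[y] - Q[y]) for y in P)

# Example: q = (0.9, 0.1), so H(Y) ~= 0.469 bit.  With M_n = 2^n (rate R = 1 > H(Y))
# the distance shrinks as n grows, consistent with Theorem 1.
for n in (4, 8, 12):
    target = product_dist({0: 0.9, 1: 0.1}, n)
    approx = quantile_map(target, M=2 ** n)
    print(n, variational_distance(target, approx))
```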

a) Partition an unit interval [0, ) into N disjoint subinterval J(),J(2),,J(N) such that b) Set J(i) = [Q i,q i ) i =, 2,,N i Q i = q k i =, 2,,N; Q 0 =0. P j = k= j p k j =, 2,,M; P 0 =0. k= 2) Set s = t = λ (null string), α s = γ t =0,β s = δ t =,I(s) =[α s,β s ), J(t) =[γ t,δ t ), and m =. 3) Obtain an output symbol from the source X to have a value a {, 2,,M}, and generate the subinterval of I(s) where I(sa) = [α sa,β sa ) α sa = α s +(β s α s )P a β sa = α s +(β s α s )P a. 4a) If I(sa) is entirely contained in some J(ti) (i =, 2,,N), then output i as the value of the mth random number Y m and set t = ti. Otherwise, go to 5). 4b) If m = n then stop the algorithm. Otherwise, partition the interval J(t) [γ t,δ t )inton disjoint subinterval J(t),J(t2),,J(tN) such that where J(tj) = [γ tj,δ tj ) j =, 2,,N γ tj = γ t +(δ t γ t )Q j δ tj = γ t +(δ t γ t )Q j and set m = m + and go to 4a). 5

5) Set $s = sa$ and go to 3).

Here, we denote by $T_n(x, y)$ the length of the input sequence $x$ necessary to generate the sequence $y \in \mathcal{Y}^n$. Han and Hoshi [3] have shown the following bound:

$$\frac{H(Y^n)}{H(X)} \le E[T_n(X, Y^n)] \le \frac{H(Y^n)}{H(X)} + \frac{\log(2(M-1))}{H(X)} + \frac{h(p_{\max})}{(1 - p_{\max}) H(X)},$$

where $p_{\max} = \max_{1 \le j \le M} p_j$ and $h(\cdot)$ is the binary entropy function. In the special case where fair random bits are used for the input sequence, this bound reduces to

$$H(Y^n) \le E[T_n(Y^n)] \le H(Y^n) + 3. \qquad (7)$$
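The following Python sketch is our own rendering of the interval algorithm above, specialized to fair input bits (the case leading to (7)); the function names and the use of exact rational arithmetic are our choices, not the paper's. It returns an output sequence together with the number of fair bits consumed, i.e. $T_n(y)$.

```python
from fractions import Fraction
import random

def interval_algorithm(q, n, rand_bit=lambda: random.getrandbits(1)):
    """Generate n i.i.d. symbols with target distribution q = (q_1, ..., q_N) from fair bits.
    Returns (output sequence, number of fair bits consumed).  Exact Fraction arithmetic
    avoids the underflow a floating-point implementation would suffer for large n."""
    Q = [Fraction(0)]
    for qi in q:                                  # cumulative probabilities Q_0, ..., Q_N
        Q.append(Q[-1] + Fraction(qi))

    alpha, beta = Fraction(0), Fraction(1)        # current input interval I(s)
    gamma, delta = Fraction(0), Fraction(1)       # current output interval J(t)
    output, bits = [], 0

    while len(output) < n:
        # Steps 4a)/4b): emit every symbol the current input interval already determines.
        while len(output) < n:
            width = delta - gamma
            hit = None
            for i in range(len(q)):
                lo, hi = gamma + width * Q[i], gamma + width * Q[i + 1]
                if lo <= alpha and beta <= hi:    # I(s) entirely contained in J(ti)
                    hit = (i + 1, lo, hi)         # symbols are 1, ..., N as in the paper
                    break
            if hit is None:
                break
            output.append(hit[0])
            gamma, delta = hit[1], hit[2]
        if len(output) >= n:
            break
        # Step 3): read one fair bit and halve the input interval I(s).
        a = rand_bit()
        bits += 1
        mid = (alpha + beta) / 2
        alpha, beta = (alpha, mid) if a == 0 else (mid, beta)
    return output, bits

# Example: the empirical bit cost per output symbol should be close to H(q) = 1.5 bits,
# in line with (7) and Corollary 1 below.
q = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
y, t = interval_algorithm(q, 200)
print(t / len(y))
```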

(b) Knuth-Yao Algorithm

In this section, we describe the Knuth-Yao algorithm proposed by Knuth and Yao [1]. Let us consider producing an i.i.d. random sequence $Y^n = (Y_1, Y_2, \cdots, Y_n)$, where each random variable $Y_i$ $(i = 1, 2, \cdots, n)$ is subject to a generic distribution $q = (q_1, q_2, \cdots, q_N)$ on $\mathcal{Y} = \{1, 2, \cdots, N\}$. We generate this random sequence by using a sequence of fair bits. The algorithm maps sequences of fair bits $X_1, X_2, \cdots$ to possible outcomes $Y^n$ by means of a binary tree called a generating tree. Leaves of the tree are marked with output symbols $Y^n$, and the path to a leaf is given by a sequence of fair bits. The generating tree must satisfy the following properties.

1. The tree should be complete, i.e. every node is either a leaf or has two descendants in the tree.

2. The probability of a leaf at depth $k$ is $\exp(-k)$. Many leaves may be labeled with the same output symbol, but the total probability of all these leaves should be equal to the desired probability of that output symbol.

3. The expected number of fair bits required to generate $Y$ is equal to the expected depth of the tree.

The following algorithm constructs a generating tree satisfying these properties; an executable sketch is given at the end of this subsection.

Knuth-Yao Algorithm for Constructing a Generating Tree

1a) Prepare a root of the tree, and set $i = 1$.

1b) For each sequence $y = (y_1, y_2, \cdots, y_n) \in \mathcal{Y}^n$ from the DMS $Y$, let the binary expansion of the probability $P_{Y^n}(y)$ be

$$P_{Y^n}(y) = \sum_{k} \alpha_k^{(y)} \exp(-k), \quad \alpha_k^{(y)} \in \{0, 1\}.$$

2) If the nodes at depth $i-1$ are not labeled with any symbols, then attach two branches and leaves to these nodes, and label one of the branches with 0, the other with 1.

3) For all $y \in \mathcal{Y}^n$ satisfying $\alpha_i^{(y)} = 1$, label one of the leaves attached in step 2) with $y$.

4) If all leaves are labeled, then stop the algorithm. Otherwise, set $i = i + 1$ and go to 2).

Here, we denote by $T_n(y)$ the length of the input sequence necessary to generate the sequence $y \in \mathcal{Y}^n$. Knuth and Yao [1] have shown the following bound:

$$H(Y^n) \le E[T_n(Y^n)] \le H(Y^n) + 2. \qquad (8)$$

This bound is slightly tighter than that of the interval algorithm (see (7)). From this, one might expect the Knuth-Yao algorithm to be better than the interval algorithm in terms of large deviations performance.
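The generating tree need not be stored explicitly. The sketch below (our own implementation, not code from the paper) walks the implicit tree column by column of the binary expansions: at depth $k$ one fair bit is consumed, and the leaves at that depth are exactly the symbols $y$ with $\alpha_k^{(y)} = 1$. It returns a sample together with the number of fair bits used, so the empirical average of the second component estimates $E[T_n(Y^n)]$ in (8).

```python
from fractions import Fraction
from itertools import product
import random

def bit_of(p, k):
    """k-th digit (k >= 1) of the binary expansion of p in [0,1), i.e. alpha_k."""
    return int(p * 2 ** k) % 2

def product_dist(q, n):
    """Exact i.i.d. distribution P_Y^n over length-n sequences (q: dict symbol -> Fraction)."""
    dist = {}
    for y in product(q.keys(), repeat=n):
        prob = Fraction(1)
        for s in y:
            prob *= q[s]
        dist[y] = prob
    return dist

def knuth_yao_sample(p, rand_bit=lambda: random.getrandbits(1)):
    """Draw one sample from p (dict symbol -> Fraction) by walking the implicit
    generating tree.  Returns (symbol, number of fair bits consumed)."""
    node, depth = 0, 0
    while True:
        node = 2 * node + rand_bit()      # descend one level, consuming one fair bit
        depth += 1
        for y, prob in p.items():         # depth-k leaves: one per symbol with alpha_k = 1
            node -= bit_of(prob, depth)
            if node < 0:
                return y, depth

# Example: n = 8 symbols from q = (1/2, 1/4, 1/4); by (8), E[T_n] <= H(Y^n) + 2 = 14 bits.
q = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 4)}
p_n = product_dist(q, n=8)
print([knuth_yao_sample(p_n) for _ in range(5)])
```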

IV. Large Deviations Analysis of the Length of the Input Sequence

In this chapter, we investigate the large deviations performance of the length of the input sequence in the Knuth-Yao algorithm, and compare it with the performance of the interval algorithm. Let us consider producing an i.i.d. random sequence $Y^n = (Y_1, Y_2, \cdots, Y_n)$, where each random variable $Y_i$ $(i = 1, 2, \cdots, n)$ is subject to a generic distribution $P_Y$ on $\mathcal{Y}$. We generate this random sequence by using a sequence of fair bits, and we denote by $T_n(y)$ the length of the input sequence needed to generate $y \in \mathcal{Y}^n$.

Here, we define the following functions:

$$E_r(R, P_Y) = \min_{Q \in \mathcal{M}(\mathcal{Y})} \{ D(Q\|P_Y) + |R - H(Q) - D(Q\|P_Y)|^+ \} \qquad (9)$$

$$E_{sp}(R, P_Y) = \min_{Q \in \mathcal{M}(\mathcal{Y}):\, D(Q\|P_Y)+H(Q) \ge R} D(Q\|P_Y) \qquad (10)$$

$$F(R, P_Y) = \min_{Q \in \mathcal{M}(\mathcal{Y}):\, D(Q\|P_Y)+H(Q) \le R} D(Q\|P_Y) \qquad (11)$$

$$G(R, P_Y) = \min_{Q \in \mathcal{M}(\mathcal{Y}):\, H(Q) \le R} D(Q\|P_Y) \qquad (12)$$

where $|x|^+ = \max\{0, x\}$.

(a) Interval Algorithm

When the interval algorithm is used, this problem is equivalent to channel simulation (investigated by Uyematsu and Kanaya [6]) in the case where the channel input is a single fixed symbol. By specializing the results of Uyematsu and Kanaya [6], we can easily obtain the following theorem.

Theorem 2:

$$\liminf_{n\to\infty} \left[ -\frac{1}{n} \log \Pr\left\{ \frac{1}{n} T_n(Y^n) \ge R \right\} \right] \ge E_r(R, P_Y). \qquad (13)$$

Also, for $R > R_{\min} = -\max_{y \in \mathcal{Y}} \log P_Y(y)$,

$$\lim_{n\to\infty} \left[ -\frac{1}{n} \log \Pr\left\{ \frac{1}{n} T_n(Y^n) \le R \right\} \right] = F(R, P_Y). \qquad (14)$$

Further, $E_r(R, P_Y) > 0$ if and only if $R > H(Y)$, and $F(R, P_Y) > 0$ if and only if $R_{\min} < R < H(Y)$.

(b) Knuth-Yao Algorithm

When the Knuth-Yao algorithm is used, we can investigate the large deviations performance in a similar manner as for the interval algorithm. We then obtain the following theorems.

Theorem 3:

$$\liminf_{n\to\infty} \left[ -\frac{1}{n} \log \Pr\left\{ \frac{1}{n} T_n(Y^n) \ge R \right\} \right] \ge E_r(R, P_Y), \qquad (15)$$

where $E_r(R, P_Y)$ is given by (9).

Proof: If $n$ is sufficiently large, we may assume without loss of generality that $nR$ is an integer. First, we modify the generating tree of the Knuth-Yao algorithm so that every leaf has depth $nR$, by the following operation.

1. If the depth of a leaf with a symbol $y \in \mathcal{Y}^n$ is smaller than $nR$, then construct a subtree under this leaf until all of its descendants have depth $nR$, and label all the descendants with $y$.

2. If the nodes at depth $nR$ are not leaves, then cut all the branches attached below these nodes, and label the resulting leaves with no symbol.

In this binary tree, the probability of each leaf is $\exp(-nR)$. Therefore, this operation corresponds to labeling $\lfloor P_{Y^n}(y)/\exp(-nR) \rfloor$ leaves with each $y \in \mathcal{Y}^n$ satisfying $P_{Y^n}(y) \ge \exp(-nR)$, where $\lfloor x \rfloor$ is the largest integer not exceeding $x$. The leaves with no symbol correspond to input sequences of fair bits of length $nR$ for which the algorithm does not stop within $nR$ steps, while the labeled leaves correspond to input sequences of length $nR$ for which the algorithm stops. From this observation, using (2)-(5) we have

$$\Pr\left\{ \frac{1}{n} T_n(Y^n) \ge R \right\} \le \sum_{y:\, P_{Y^n}(y) \le \exp(-nR)} P_{Y^n}(y) + \sum_{y:\, P_{Y^n}(y) > \exp(-nR)} \exp(-nR)$$
$$\le \sum_{Q \in \mathcal{P}_n:\, D(Q\|P_Y)+H(Q) \ge R} \exp(-nD(Q\|P_Y)) + \sum_{Q \in \mathcal{P}_n:\, D(Q\|P_Y)+H(Q) < R} \exp\{-n(R - H(Q))\}$$
$$\le \sum_{Q \in \mathcal{P}_n} \exp[-n\{ D(Q\|P_Y) + |R - H(Q) - D(Q\|P_Y)|^+ \}]$$
$$\le (n+1)^{|\mathcal{Y}|} \exp\{-n E_r(R, P_Y)\},$$

which implies (15).

Theorem 4: For $R > R_{\min} = -\max_{y \in \mathcal{Y}} \log P_Y(y)$,

$$\lim_{n\to\infty} \left[ -\frac{1}{n} \log \Pr\left\{ \frac{1}{n} T_n(Y^n) \le R \right\} \right] = F(R, P_Y), \qquad (16)$$

where $F(R, P_Y)$ is given by (11).

Proof: In a similar manner as in the proof of Theorem 3, we obtain

$$\Pr\left\{ \frac{1}{n} T_n(Y^n) \le R \right\} = \sum_{y \in \mathcal{Y}^n} \lfloor P_{Y^n}(y) \exp(nR) \rfloor \exp(-nR)$$
$$\le \sum_{y:\, P_{Y^n}(y) \ge \exp(-nR)} P_{Y^n}(y)$$
$$\le \sum_{Q \in \mathcal{P}_n:\, D(Q\|P_Y)+H(Q) \le R} \exp(-nD(Q\|P_Y))$$
$$\le (n+1)^{|\mathcal{Y}|} \exp\{-n F(R, P_Y)\},$$

which implies

$$\liminf_{n\to\infty} \left[ -\frac{1}{n} \log \Pr\left\{ \frac{1}{n} T_n(Y^n) \le R \right\} \right] \ge F(R, P_Y) \qquad (17)$$

for $R > R_{\min}$. It should be noted that the minimum in (11) is taken over a non-empty set of $Q$ if $R > R_{\min}$. Also, we can obtain

$$\Pr\left\{ \frac{1}{n} T_n(Y^n) \le R \right\} \ge \sum_{y:\, P_{Y^n}(y) \ge \exp(-nR)} \frac{1}{2} P_{Y^n}(y)$$
$$\ge \frac{1}{2} (n+1)^{-|\mathcal{Y}|} \sum_{Q \in \mathcal{P}_n:\, D(Q\|P_Y)+H(Q) \le R} \exp(-nD(Q\|P_Y))$$
$$\ge \frac{1}{2} (n+1)^{-|\mathcal{Y}|} \exp\Big\{-n \min_{Q \in \mathcal{P}_n:\, D(Q\|P_Y)+H(Q) \le R} D(Q\|P_Y)\Big\},$$

which implies

$$\limsup_{n\to\infty} \left[ -\frac{1}{n} \log \Pr\left\{ \frac{1}{n} T_n(Y^n) \le R \right\} \right] \le F(R, P_Y) \qquad (18)$$

for $R > R_{\min}$. From (17) and (18), we obtain (16).

Theorem 5: For $R < R_{\max} = -\min_{y \in \mathcal{Y}} \log P_Y(y)$,

$$\limsup_{n\to\infty} \left[ -\frac{1}{n} \log \Pr\left\{ \frac{1}{n} T_n(Y^n) \ge R \right\} \right] \le E_{sp}(R, P_Y). \qquad (19)$$

Further, $E_{sp}(R, P_Y) > 0$ if and only if $H(Y) < R < R_{\max}$.

Proof: In a similar manner as in the proof of Theorem 3, we obtain

$$\Pr\left\{ \frac{1}{n} T_n(Y^n) \ge R \right\} \ge \sum_{y:\, P_{Y^n}(y) \le \exp(-nR)} P_{Y^n}(y)$$
$$\ge (n+1)^{-|\mathcal{Y}|} \sum_{Q \in \mathcal{P}_n:\, D(Q\|P_Y)+H(Q) \ge R} \exp(-nD(Q\|P_Y))$$
$$\ge (n+1)^{-|\mathcal{Y}|} \exp\Big\{-n \min_{Q \in \mathcal{P}_n:\, D(Q\|P_Y)+H(Q) \ge R} D(Q\|P_Y)\Big\},$$

which implies (19) for $R < R_{\max}$. It should be noted that the minimum in (10) is taken over a non-empty set of $Q$ if $R < R_{\max}$. Moreover, $E_{sp}(R, P_Y) = 0$ if and only if $Q = P_Y$ satisfies the constraint $D(Q\|P_Y) + H(Q) \ge R$, i.e. $R \le H(Y)$. This implies that $E_{sp}(R, P_Y) > 0$ if and only if $H(Y) < R < R_{\max}$.

From Theorems 3 and 4, by using the Borel-Cantelli principle (see e.g. [12]), we immediately obtain the following corollary.

Corollary 1:

$$\lim_{n\to\infty} \frac{1}{n} T_n(Y^n) = H(Y) \quad \text{a.s.} \qquad (20)$$

From Theorems 3 and 4, we can conclude that the Knuth-Yao algorithm has the same large deviations performance for the length of the input sequence as the interval algorithm using a sequence of fair bits.
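As a numerical sanity check on the exponent functions (9)-(12), they can be evaluated for a binary source by brute force over candidate distributions $Q$. The sketch below is our own illustration (a coarse grid search; the function and parameter names are ours), not a method used in the paper.

```python
from math import log2, inf

def binary_exponents(p, R, grid=20000):
    """Brute-force evaluation of E_r, E_sp, F, G of (9)-(12) for a binary source
    P_Y = (p, 1-p) at rate R, searching over Q = (q, 1-q).  Logs are base 2."""
    e_r, e_sp, f, g = inf, inf, inf, inf
    for i in range(1, grid):
        q = i / grid
        H = -(q * log2(q) + (1 - q) * log2(1 - q))                    # H(Q)
        D = q * log2(q / p) + (1 - q) * log2((1 - q) / (1 - p))       # D(Q || P_Y)
        e_r = min(e_r, D + max(0.0, R - H - D))
        if D + H >= R:
            e_sp = min(e_sp, D)
        if D + H <= R:
            f = min(f, D)
        if H <= R:
            g = min(g, D)
    return e_r, e_sp, f, g

# Example: P_Y = (0.8, 0.2), so H(Y) ~= 0.722.  For R above the entropy, E_r > 0 and F = 0;
# for R below the entropy, E_r = 0 and F > 0, matching the positivity conditions above.
for R in (0.5, 0.722, 0.9):
    print(R, binary_exponents(0.8, R))
```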

V. Error Exponent for the Resolvability Problem

In this chapter, let us consider producing an i.i.d. random sequence $Y^n = (Y_1, Y_2, \cdots, Y_n)$ using a fixed number of fair bits. In this case, we cannot generate the random sequence exactly, but only approximately.

(a) Interval Algorithm

First, we modify the interval algorithm so that it outputs a specified sequence $y_0 \in \mathcal{Y}^n$ whenever the algorithm does not stop with an input of fair bits of length $nR$. The modified algorithm can be described as follows.

Modified Interval Algorithm for Generating a Random Process with a Fixed Number of Fair Bits

1) Partition the unit interval $[0,1)$ into $N$ disjoint subintervals $J(1), J(2), \cdots, J(N)$ such that

$$J(i) = [Q_{i-1}, Q_i), \quad i = 1, 2, \cdots, N,$$
$$Q_i = \sum_{k=1}^{i} q_k, \quad i = 1, 2, \cdots, N; \quad Q_0 = 0.$$

2) Set $s = t = \lambda$ (the null string), $\alpha_s = \gamma_t = 0$, $\beta_s = \delta_t = 1$, $I(s) = [\alpha_s, \beta_s)$, $J(t) = [\gamma_t, \delta_t)$, $l = 0$, and $m = 1$.

3) If $l = nR$ then output $y_0$ as the output sequence $Y^n$ and stop the algorithm. Otherwise, obtain an output symbol from the source $X$, say with value $a \in \{0, 1\}$, generate the subinterval of $I(s)$

$$I(sa) = [\alpha_{sa}, \beta_{sa}),$$
$$\alpha_{sa} = \alpha_s + \frac{1}{2} a (\beta_s - \alpha_s),$$
$$\beta_{sa} = \alpha_s + \frac{1}{2} (a+1) (\beta_s - \alpha_s),$$

and set $l = l + 1$.

4a) If $I(sa)$ is entirely contained in some $J(ti)$ $(i = 1, 2, \cdots, N)$, then set $t = ti$. Otherwise, go to 5).

4b) If $m = n$ then output $t$ as the output sequence $Y^n$ and stop the algorithm. Otherwise, partition the interval $J(t) \equiv [\gamma_t, \delta_t)$ into $N$ disjoint subintervals $J(t1), J(t2), \cdots, J(tN)$ such that

$$J(tj) = [\gamma_{tj}, \delta_{tj}), \quad j = 1, 2, \cdots, N,$$
$$\gamma_{tj} = \gamma_t + (\delta_t - \gamma_t) Q_{j-1},$$
$$\delta_{tj} = \gamma_t + (\delta_t - \gamma_t) Q_j,$$

and set $m = m + 1$ and go to 4a).

5) Set $s = sa$ and go to 3).

This problem has also been investigated by Uyematsu and Kanaya [6]. They measured the approximation error by the variational distance between the desired and the approximated distribution, and showed the following theorems for the error exponent of the interval algorithm.

Theorem 6 [6]: If the modified interval algorithm is used for random number generation of an i.i.d. source with a generic distribution $P_Y$, then we have

$$\liminf_{n\to\infty} \left[ -\frac{1}{n} \log d(P_{Y^n}, \tilde{P}_{Y^n}) \right] \ge E_r(R, P_Y), \qquad (21)$$

where $\tilde{P}_{Y^n}$ denotes the output distribution of the modified interval algorithm, and $E_r(R, P_Y)$ is given by (9).

Theorem 7 [6]: If the modified interval algorithm is used for random number generation of an i.i.d. source with a generic distribution $P_Y$, then for $R > R_{\min}$ we have

$$\lim_{n\to\infty} \left[ -\frac{1}{n} \log\{ 2 - d(P_{Y^n}, \tilde{P}_{Y^n}) \} \right] = F(R, P_Y), \qquad (22)$$

where $F(R, P_Y)$ is given by (11).

Theorem 6 implies that if the number of fair bits per output symbol is above the entropy of the output source, then the approximation error measured by the variational distance vanishes exponentially as the length of the output sequence tends to infinity. Conversely, Theorem 7 implies that if the number of fair bits per output symbol is below the entropy, the approximation error approaches two. They have also shown the following theorems for the optimum error exponent of algorithms for random number generation.

Theorem 8 [6]: Let $\tilde{P}_{Y^n}$ denote the distribution on $\mathcal{Y}^n$ produced by any algorithm for random number generation using fair bits of length $nR$. Then for $R < R_{\max}$,

$$\limsup_{n\to\infty} \left[ -\frac{1}{n} \log d(P_{Y^n}, \tilde{P}_{Y^n}) \right] \le E_{sp}(R, P_Y), \qquad (23)$$

where $E_{sp}(R, P_Y)$ is given by (10). Further, $E_{sp}(R, P_Y) \ge E_r(R, P_Y)$, and equality holds for $R \le R_0$, where

$$R_0 = D(U_Y \| P_Y) + \log |\mathcal{Y}|$$

and $U_Y$ denotes the uniform distribution on $\mathcal{Y}$.

Theorem 9 [6]: Consider the optimum algorithm for random number generation using fair bits of length $nR$, and let $\tilde{P}_{Y^n}$ denote the distribution on $\mathcal{Y}^n$ that minimizes the variational distance. Then,

$$\lim_{n\to\infty} \left[ -\frac{1}{n} \log\{ 2 - d(P_{Y^n}, \tilde{P}_{Y^n}) \} \right] = G(R, P_Y). \qquad (24)$$

Further, $G(R, P_Y) > 0$ if and only if $R < H(Y)$, and $F(R, P_Y) > G(R, P_Y)$ for $R < H(Y)$.

From Theorems 6 and 8, the algorithm proposed by Uyematsu and Kanaya is optimum for $H(Y) < R \le R_0$, but it is still an open problem to obtain the optimum error exponent for $R > R_0$. Also, Theorems 7 and 9 imply that the algorithm is not optimum for $R < H(Y)$.

(b) Knuth-Yao Algorithm

In a similar manner as in the previous section, we modify the Knuth-Yao algorithm so that it outputs a specified sequence $y_0 \in \mathcal{Y}^n$ whenever the algorithm does not stop with an input of fair bits of length $nR$.

Modified Knuth-Yao Algorithm for Constructing a Generating Tree with Fixed Input Length

1a) Prepare a root of the tree, and set $i = 1$.

1b) For each sequence $y = (y_1, y_2, \cdots, y_n) \in \mathcal{Y}^n$ from the DMS $Y$, let the binary expansion of the probability $P_{Y^n}(y)$ be

$$P_{Y^n}(y) = \sum_{k} \alpha_k^{(y)} \exp(-k), \quad \alpha_k^{(y)} \in \{0, 1\}.$$

2) If the nodes at depth $i-1$ are not labeled with any symbols, then attach two branches and leaves to these nodes, and label one of the branches with 0, the other with 1.

3) For all $y \in \mathcal{Y}^n$ satisfying $\alpha_i^{(y)} = 1$, label one of the leaves attached in step 2) with $y$.

4a) If $i = nR$, then label all the leaves having no label with $y_0$, and stop the algorithm.

4b) If all leaves are labeled, then stop the algorithm. Otherwise, set $i = i + 1$ and go to 2).
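The output distribution of this modified construction can be written down in closed form: each $y \neq y_0$ receives $\lfloor P_{Y^n}(y) \exp(nR) \rfloor$ of the $\exp(nR)$ leaves at depth $nR$, and every remaining leaf is labeled $y_0$. The sketch below (our own illustration; choosing $y_0$ as the most probable sequence is our assumption, as the paper only requires a specified sequence) computes $\tilde{P}_{Y^n}$ in this way and evaluates the approximation error $d(P_{Y^n}, \tilde{P}_{Y^n})$ studied in the theorems below.

```python
from fractions import Fraction
from itertools import product

def product_dist(q, n):
    """Exact i.i.d. distribution P_Y^n over length-n sequences (q: dict symbol -> Fraction)."""
    dist = {}
    for y in product(q.keys(), repeat=n):
        prob = Fraction(1)
        for s in y:
            prob *= q[s]
        dist[y] = prob
    return dist

def modified_knuth_yao_dist(target, nR, y0):
    """Output distribution of the Knuth-Yao tree truncated at depth nR,
    with every unlabeled depth-nR leaf relabeled y0."""
    unit = Fraction(1, 2 ** nR)
    approx = {y: (prob // unit) * unit for y, prob in target.items()}  # floor to multiples of 2^-nR
    approx[y0] += Fraction(1) - sum(approx.values())                   # leftover leaves go to y0
    return approx

def variational_distance(P, Q):
    return sum(abs(P[y] - Q[y]) for y in P)

# Example: q = (0.7, 0.3), n = 10, H(Y) ~= 0.881 bit per symbol.
# For R = 1.2 > H(Y) the distance is small (Theorem 10 says it decays like exp(-n E_r));
# for R = 0.6 < H(Y) it is close to 2, in accordance with Theorem 11.
q = {0: Fraction(7, 10), 1: Fraction(3, 10)}
n = 10
target = product_dist(q, n)
y0 = max(target, key=target.get)
for R in (1.2, 0.6):
    approx = modified_knuth_yao_dist(target, round(n * R), y0)
    print(R, float(variational_distance(target, approx)))
```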

We measure the approximation error by the variational distance between the desired and the approximated distribution. Then, we obtain the following theorems.

Theorem 10: If the modified Knuth-Yao algorithm is used for random number generation of an i.i.d. source with a generic distribution $P_Y$, then we have

$$\liminf_{n\to\infty} \left[ -\frac{1}{n} \log d(P_{Y^n}, \tilde{P}_{Y^n}) \right] \ge E_r(R, P_Y), \qquad (25)$$

where $\tilde{P}_{Y^n}$ denotes the output distribution of the modified Knuth-Yao algorithm, and $E_r(R, P_Y)$ is given by (9).

Proof:

$$d(P_{Y^n}, \tilde{P}_{Y^n}) = \sum_{y \in \mathcal{Y}^n} |P_{Y^n}(y) - \tilde{P}_{Y^n}(y)|$$
$$= \sum_{y \neq y_0} |P_{Y^n}(y) - \tilde{P}_{Y^n}(y)| + \sum_{y \neq y_0} (P_{Y^n}(y) - \tilde{P}_{Y^n}(y))$$
$$= 2 \sum_{y \neq y_0} (P_{Y^n}(y) - \tilde{P}_{Y^n}(y))$$
$$\le 2 \sum_{y:\, P_{Y^n}(y) > \exp(-nR)} \exp(-nR) + 2 \sum_{y:\, P_{Y^n}(y) \le \exp(-nR)} P_{Y^n}(y)$$
$$\le 2(n+1)^{|\mathcal{Y}|} \exp\Big\{-n \min_{Q \in \mathcal{P}_n} \big( D(Q\|P_Y) + |R - H(Q) - D(Q\|P_Y)|^+ \big)\Big\},$$

which implies (25).

Theorem 11: If the modified Knuth-Yao algorithm is used for random number generation of an i.i.d. source with a generic distribution $P_Y$, then for $R > R_{\min}$ we have

$$\lim_{n\to\infty} \left[ -\frac{1}{n} \log\{ 2 - d(P_{Y^n}, \tilde{P}_{Y^n}) \} \right] = F(R, P_Y), \qquad (26)$$

where $F(R, P_Y)$ is given by (11).

Proof: From the identity $a + b = |a - b| + 2\min(a, b)$, we obtain

$$2 - d(P_{Y^n}, \tilde{P}_{Y^n}) = 2 \sum_{y \in \mathcal{Y}^n} \min(P_{Y^n}(y), \tilde{P}_{Y^n}(y))$$
$$= 2 \sum_{y \neq y_0} \tilde{P}_{Y^n}(y) + 2 P_{Y^n}(y_0)$$
$$\le 2 \sum_{y:\, P_{Y^n}(y) \ge \exp(-nR)} P_{Y^n}(y) + 2\exp(-nR)$$
$$\le 2(n+1)^{|\mathcal{Y}|} \exp\Big\{-n \min_{Q \in \mathcal{P}_n:\, D(Q\|P_Y)+H(Q) \le R} D(Q\|P_Y)\Big\} + 2\exp(-nR),$$

which implies

$$\liminf_{n\to\infty} \left[ -\frac{1}{n} \log\{ 2 - d(P_{Y^n}, \tilde{P}_{Y^n}) \} \right] \ge F(R, P_Y)$$

for $R > R_{\min}$. We can obtain the reverse inequality as follows:

$$2 - d(P_{Y^n}, \tilde{P}_{Y^n}) = 2 \sum_{y \neq y_0} \tilde{P}_{Y^n}(y) + 2 \tilde{P}_{Y^n}(y_0)$$
$$\ge 2 \sum_{y:\, P_{Y^n}(y) \ge \exp(-nR)} \frac{1}{2} P_{Y^n}(y)$$
$$\ge (n+1)^{-|\mathcal{Y}|} \exp\Big\{-n \min_{Q \in \mathcal{P}_n:\, D(Q\|P_Y)+H(Q) \le R} D(Q\|P_Y)\Big\}.$$

Then, we obtain (26).

From these theorems and Theorems 6 and 7, we can conclude that the Knuth-Yao algorithm has the same error exponent, measured by the variational distance, as the interval algorithm using a sequence of fair bits.

VI. Conclusion

We have investigated the large deviations performance of the Knuth-Yao algorithm. We have clarified some of its asymptotic properties, and we have shown that its performance is the same as that of the interval algorithm using a sequence of fair bits. As future research, we intend to generalize our results to more general sources.

References

[1] D. Knuth and A. Yao, "The complexity of nonuniform random number generation," in Algorithms and Complexity: New Directions and Recent Results, ed. by J. F. Traub, pp. 357-428, Academic Press, New York, 1976.

[2] S. Vembu and S. Verdú, "Generating random bits from an arbitrary source: Fundamental limits," IEEE Trans. on Inform. Theory, vol. IT-41, pp. 1322-1332, 1995.

[3] T. S. Han and M. Hoshi, "Interval algorithm for random number generation," IEEE Trans. on Inform. Theory, vol. 43, pp. 599-611, Mar. 1997.

[4] F. Kanaya, "An asymptotically optimal algorithm for generating Markov random sequences," Proc. of SITA '97, pp. 77-80, Matsuyama, Japan, Dec. 1997 (in Japanese).

[5] Y. Ohama, "Fixed to fixed length random number generation using one dimensional piecewise linear maps," Proc. of SITA '98, pp. 57-60, Gifu, Japan, Dec. 1998 (in Japanese).

[6] T. Uyematsu and F. Kanaya, "Methods of channel simulation achieving conditional resolvability by statistically stable transformation," submitted to IEEE Trans. on Inform. Theory.

[7] O. Uchida and T. S. Han, "Performance analysis of interval algorithm for generating Markov processes," Proc. of SITA '98, pp. 65-68, Gifu, Japan, Dec. 1998.

[8] A. Kimura and T. Uyematsu, "Large deviations performance of interval algorithm for random number generation," Proc. of Memorial Workshop for the 50th Anniversary of Shannon Theory, pp. 1-4, Yamanashi, Japan, Jan. 1999.

[9] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. New York: Academic Press, 1981.

[10] T. S. Han, Information-Spectrum Methods in Information Theory, Baifukan, Tokyo, 1998 (in Japanese).

[11] T. S. Han and S. Verdú, "Approximation theory of output statistics," IEEE Trans. on Inform. Theory, vol. IT-39, pp. 752-772, May 1993.

[12] P. C. Shields, The Ergodic Theory of Discrete Sample Paths, Graduate Studies in Mathematics, vol. 13, American Math. Soc., 1996.