Efficiently Generating Random Bits from Finite State Markov Chains


Hongchao Zhou and Jehoshua Bruck, Fellow, IEEE

(Hongchao Zhou and Jehoshua Bruck are with the Department of Electrical Engineering, California Institute of Technology, Pasadena, CA 91125, USA; e-mail: hzhou@caltech.edu; bruck@caltech.edu. This work was supported in part by the NSF Expeditions in Computing Program under grant CCF-0832824.)

Abstract—The problem of random number generation from an uncorrelated random source (of unknown probability distribution) dates back to von Neumann's 1951 work. Elias (1972) generalized von Neumann's scheme and showed how to achieve optimal efficiency in unbiased random bits generation. Hence, a natural question is: what if the sources are correlated? Both Elias and Samuelson proposed methods for generating unbiased random bits in the case of correlated sources (of unknown probability distribution); specifically, they considered finite Markov chains. However, their proposed methods are either inefficient or have implementation difficulties. Blum (1986) devised an algorithm for efficiently generating random bits from degree-2 finite Markov chains in expected linear time; however, his beautiful method is still far from optimal in information efficiency. In this paper, we generalize Blum's algorithm to arbitrary-degree finite Markov chains and combine it with Elias's method for efficient generation of unbiased bits. As a result, we provide the first known algorithm that generates unbiased random bits from an arbitrary finite Markov chain, operates in expected linear time and achieves the information-theoretic upper bound on efficiency.

Index Terms—Random sequence, random bits generation, Markov chain.

I. INTRODUCTION

The problem of random number generation dates back to von Neumann [1], who considered the problem of simulating an unbiased coin by using a biased coin with unknown probability. He observed that when one focuses on a pair of coin tosses, the events HT and TH have the same probability; hence, HT produces the output symbol 0 and TH produces the output symbol 1. The other two possible events, namely HH and TT, are ignored: they do not produce any output symbols. More efficient algorithms to generate random bits from a biased coin were proposed by Hoeffding and Simons [2], Stout and Warren [3] and Peres [4]. Elias [5] gave an optimal procedure such that the expected number of unbiased random bits generated per coin toss is asymptotically equal to the entropy of the biased coin. On the other hand, Knuth and Yao [6] gave a simple procedure to generate an arbitrary distribution from an unbiased coin. Han and Hoshi [7] generalized this problem to the case where the given coin is biased with a known distribution.

In this paper, we study the problem of generating random bits from an arbitrary and unknown finite Markov chain. The input to our problem is a sequence of symbols that represents a random trajectory through the states of the Markov chain; given this input sequence, our algorithm generates an independent unbiased binary sequence called the output sequence. This problem was first studied by Samuelson [8]. His approach was to focus on a single state (ignoring the other states), treat the transitions out of this state as the input process, and hence reduce the problem of correlated sources to the problem of a single random source; obviously, this method is not efficient.
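To make von Neumann's observation concrete, here is a minimal Python sketch of his pairing scheme (the function name and interface are ours, for illustration only):

```python
import random

def von_neumann(flips):
    """Von Neumann's scheme: map disjoint pairs of biased coin flips to
    unbiased bits.  HT -> 0, TH -> 1; HH and TT produce no output."""
    bits = []
    it = iter(flips)
    for a, b in zip(it, it):           # consume the flips in disjoint pairs
        if a != b:
            bits.append(0 if (a, b) == ('H', 'T') else 1)
    return bits

# Within every pair, P[HT] = P[TH] = p(1 - p), so the output is unbiased
# even though the coin's bias p is unknown.
flips = random.choices('HT', weights=[0.8, 0.2], k=10000)
print(von_neumann(flips)[:20])
```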
Elias [5] suggested making use of all the states: for each state, we produce an independent output sequence from the transitions out of this state, and the final output sequence is generated by pasting (concatenating) this collection of output sequences. However, neither Samuelson nor Elias proved that their methods work for arbitrary Markov chains. In fact, Blum [9] probably realized this, as he mentioned that: (i) Elias's algorithm is excellent, but certain difficulties arise in trying to use it (or the original von Neumann scheme) to generate bits in expected linear time from a Markov chain, and (ii) "Elias has suggested a way to use all the symbols produced by a MC (Markov Chain). His algorithm approaches the maximum possible efficiency for a one-state MC. For a multi-state MC, his algorithm produces arbitrarily long finite sequences. He does not, however, show how to paste these finite sequences together to produce infinitely long independent unbiased sequences."

Blum [9] worked on this problem and derived a beautiful algorithm to generate random bits from a degree-2 Markov chain in expected linear time by extending the single-coin von Neumann scheme. While his approach can be extended to arbitrary out-degrees (the general Markov chain model used in this paper), the information efficiency is still far from optimality, due to the limitation (compared to Elias's algorithm) of the von Neumann scheme.

In this paper, we generalize Blum's algorithm to arbitrary-degree finite Markov chains and combine it with Elias's method for efficient generation of unbiased bits. As a result, we provide the first known algorithm that generates unbiased random bits from arbitrary finite Markov chains, operates in expected linear time and achieves the information-theoretic upper bound on efficiency. Specifically, we propose Algorithm A, a simple modification of Elias's suggestion for generating random bits; it operates on finite sequences and its efficiency reaches the information-theoretic upper bound in the limit of long input sequences. There are several variants of Algorithm A. In addition, we propose Algorithm B, a combination of Blum's and Elias's algorithms; it generates infinitely long sequences of random bits in expected linear time. One of the ideas in our algorithms is that, for the sake of achieving independence, we ignore some input symbols. Hence, a natural question is: can we improve the efficiency by minimizing the number of input symbols that we ignore? We provide a positive answer to this question and describe Algorithm C; it is the first known optimal algorithm (in terms of efficiency) for random bits generation from an input sequence of length N, and it has polynomial time complexity.

The remainder of this paper is organized as follows. Section II introduces Elias's scheme for generating random bits from any biased coin. Section III presents our main lemma, which allows the generalization to arbitrary Markov chains. Algorithm A, a simple modification of Elias's suggestion for Markov chains, is presented and analyzed in Section IV. Algorithm B, a generalization of Blum's algorithm, is presented in Section V. An optimal algorithm, called Algorithm C, is provided in Section VI; it can generate random bits from an arbitrary Markov chain with maximal efficiency. Section VII addresses the computational complexity of our algorithms and shows that both Algorithm A and Algorithm C can generate outputs in O(N log^3 N log log N) time. Finally, Section VIII provides an evaluation of our algorithms through simulations.

II. ELIAS'S SCHEME FOR GENERATION OF RANDOM BITS

Consider a sequence of length N generated by a biased n-face coin,

X = x_1 x_2 ... x_N ∈ {s_1, s_2, ..., s_n}^N,

such that the probability to get s_i is p_i, with Σ_{i=1}^{n} p_i = 1. While we are given the sequence X, the probabilities p_1, p_2, ..., p_n are unknown. The question is: how can we generate an independent and unbiased sequence of 0's and 1's from X? The efficiency of a generation algorithm is defined as the ratio between the expected length of the output sequence and the length N of the input sequence.

Elias [5] proposed an optimal (in terms of efficiency) generation algorithm; for the sake of completeness we describe it here. His method is based on the following idea: the n^N possible input sequences of length N can be partitioned into classes such that all the sequences in the same class have the same number of s_k's, for each 1 ≤ k ≤ n. Note that for every class, the members of the class have the same probability to be generated. For example, let n = 2 and N = 4; we can divide the n^N = 16 possible input sequences into 5 classes:

S_0 = {s_1 s_1 s_1 s_1}
S_1 = {s_1 s_1 s_1 s_2, s_1 s_1 s_2 s_1, s_1 s_2 s_1 s_1, s_2 s_1 s_1 s_1}
S_2 = {s_1 s_1 s_2 s_2, s_1 s_2 s_1 s_2, s_1 s_2 s_2 s_1, s_2 s_1 s_1 s_2, s_2 s_1 s_2 s_1, s_2 s_2 s_1 s_1}
S_3 = {s_1 s_2 s_2 s_2, s_2 s_1 s_2 s_2, s_2 s_2 s_1 s_2, s_2 s_2 s_2 s_1}
S_4 = {s_2 s_2 s_2 s_2}

Now, our goal is to assign a string of bits (the output) to each possible input sequence, such that any two output sequences Y and Y' with the same length (say k) have the same probability to be generated, namely c_k 2^{-k} for some 0 ≤ c_k ≤ 1. The idea is that for any given class we partition the members of the class into groups whose sizes are powers of 2; to a group with 2^i members (for some i) we assign the 2^i binary strings of length i. Note that when the class size is odd we have to exclude one member of the class. We now demonstrate the idea using the example above. We cannot assign any bits to the lone sequence in S_0; so if the input sequence is s_1 s_1 s_1 s_1, the output sequence is ϕ (the empty sequence).
There are 4 sequences in S_1, and we assign the binary strings as follows:

s_1 s_1 s_1 s_2 → 00,  s_1 s_1 s_2 s_1 → 01
s_1 s_2 s_1 s_1 → 10,  s_2 s_1 s_1 s_1 → 11

Similarly, for S_2, there are 6 sequences that can be divided into a group of 4 and a group of 2:

s_1 s_1 s_2 s_2 → 00,  s_1 s_2 s_1 s_2 → 01
s_1 s_2 s_2 s_1 → 10,  s_2 s_1 s_1 s_2 → 11
s_2 s_1 s_2 s_1 → 0,   s_2 s_2 s_1 s_1 → 1

In general, for a class with W members that were not assigned yet, assign the 2^j possible output binary sequences of length j to 2^j distinct unassigned members, where 2^j ≤ W < 2^{j+1}. Repeat the procedure for the rest of the members that were not assigned. Note that when a class has an odd number of members, there will be one and only one member assigned to ϕ (the empty string).

The foregoing assignment algorithm is not efficient, as it uses a lookup table; however, efficient assignment can be achieved by using the lexicographic order. Given an input sequence X of length N, the output sequence can be written as a function of X, denoted by f_E(X) and called Elias's function. Pae and Loui [10] showed that Elias's function is computable in polynomial time. Next we describe the procedure to compute f_E(X) using the concept of a rank.

Computing the Elias Function

1) Given X, determine |S(X)|, the size of the class that includes X. It is the corresponding multinomial coefficient.
2) Compute the rank r(X) of X in its class with respect to the lexicographic order (assume that the rank of the first element is 0).
3) Determine the output sequence based on r(X) and |S(X)|, as follows. Let

|S(X)| = α_l 2^l + α_{l-1} 2^{l-1} + ... + α_0 2^0,

where α_l, α_{l-1}, ..., α_0 is the binary expansion of |S(X)| with α_l = 1. If r(X) < 2^l, then f_E(X) is the l-digit binary representation of r(X). If

α_l 2^l + ... + α_{i+1} 2^{i+1} ≤ r(X) < α_l 2^l + ... + α_i 2^i,

then f_E(X) is the i-digit binary representation of

r(X) − α_l 2^l − α_{l-1} 2^{l-1} − ... − α_{i+1} 2^{i+1}.

For example, consider the class S_2 in the example above. For each X ∈ S_2 we have |S(X)| = 6 = (110)_2, and we obtain the following:

r(s_1 s_1 s_2 s_2) = 000,  f_E(s_1 s_1 s_2 s_2) = 00
r(s_1 s_2 s_1 s_2) = 001,  f_E(s_1 s_2 s_1 s_2) = 01
r(s_1 s_2 s_2 s_1) = 010,  f_E(s_1 s_2 s_2 s_1) = 10
r(s_2 s_1 s_1 s_2) = 011,  f_E(s_2 s_1 s_1 s_2) = 11
r(s_2 s_1 s_2 s_1) = 100,  f_E(s_2 s_1 s_2 s_1) = 0
r(s_2 s_2 s_1 s_1) = 101,  f_E(s_2 s_2 s_1 s_1) = 1
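The procedure above is easy to express in code. The following Python sketch (our own illustration, with symbols encoded as comparable values) computes |S(X)|, the lexicographic rank r(X), and the resulting output bits; the rank is computed by the straightforward multinomial-counting method rather than the fast procedure of Section VII:

```python
from collections import Counter
from math import factorial

def class_size(counts):
    """|S(X)|: the multinomial coefficient of the symbol counts."""
    size = factorial(sum(counts.values()))
    for c in counts.values():
        size //= factorial(c)
    return size

def lex_rank(seq):
    """r(X): rank of seq among all rearrangements of its multiset,
    in lexicographic order (the first element has rank 0)."""
    counts = Counter(seq)
    rank = 0
    for x in seq:
        for y in sorted(counts):
            if y >= x:
                break
            if counts[y] > 0:
                counts[y] -= 1
                rank += class_size(counts)  # smaller completions at this position
                counts[y] += 1
        counts[x] -= 1
    return rank

def f_E(seq):
    """Elias's function: map the rank to output bits, group by group."""
    size, r, offset = class_size(Counter(seq)), lex_rank(seq), 0
    for i in range(size.bit_length() - 1, -1, -1):  # bits alpha_i of |S(X)|
        if (size >> i) & 1:
            if r < offset + (1 << i):
                return format(r - offset, 'b').zfill(i) if i > 0 else ''
            offset += 1 << i

# The example above: the class of s1 s2 s2 s1 (encoded 1,2,2,1) has 6 members.
print(f_E([1, 2, 2, 1]))   # rank 2 -> '10'
```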

The following property follows directly from the algorithm for computing the Elias function. It basically says that if a given class generates a binary sequence of length k, it generates all the possible 2^k binary sequences of length k. We will apply this property in the proofs in this paper.

Lemma 1 (Property of f_E). Given an input sequence X of length N such that Y = f_E(X) ∈ {0, 1}^k, for any Y' ∈ {0, 1}^k there exists one and only one input sequence X' that is a permutation of X such that f_E(X') = Y'.

III. MAIN LEMMA: EQUIVALENCE OF EXIT SEQUENCES

Our goal is to efficiently generate random bits from a Markov chain with unknown transition probabilities. The paradigm we study is that a Markov chain generates the sequence of states that it visits, and this sequence of states is the input sequence to our algorithm for generating random bits. Specifically, we express an input sequence as X = x_1 x_2 ... x_N with x_i ∈ {s_1, s_2, ..., s_n}, where {s_1, s_2, ..., s_n} indicate the states of the Markov chain.

We first define the following notation:

x_a: the a-th element of X
X[a]: the a-th element of X
X[a : b]: the subsequence of X from the a-th to the b-th element
X^a: X[1 : a]
Y ≡ X: Y is a permutation of X
Y ≐ X: Y is a permutation of X and y_{|Y|} = x_{|X|}, i.e., the last element is fixed (we call this a tail-fixed permutation)

The key idea is that for a given Markov chain, we can treat each state as a coin and treat its exits as the corresponding coin tosses. Namely, we can generate a collection of sequences

π(X) = [π_1(X), π_2(X), ..., π_n(X)],

called exit sequences, where π_i(X) is the sequence of states following s_i in X, namely,

π_i(X) = {x_{j+1} : x_j = s_i, 1 ≤ j < N}.

For example, assume that the input sequence is

X = s_1 s_4 s_2 s_1 s_3 s_2 s_3 s_1 s_1 s_2 s_3 s_4 s_1,

so that π_1(X) is obtained by reading off the state after each occurrence of s_1, and similarly for the other states. Then the exit sequences are:

π_1(X) = s_4 s_3 s_1 s_2
π_2(X) = s_1 s_3 s_3
π_3(X) = s_2 s_1 s_4
π_4(X) = s_2 s_1

Lemma 2 (Uniqueness). An input sequence X is uniquely determined by x_1 and π(X).

Proof: Given x_1 and π(X), following the work of Blum [9], x_1 x_2 ... x_N can be uniquely constructed in the following way. Initially, set the starting state to x_1. Inductively, if x_i = s_k, then set x_{i+1} to the first element of π_k(X) and remove the first element of π_k(X). Finally, we uniquely generate the sequence x_1 x_2 ... x_N.

Lemma 3 (Equal-probability). Two input sequences X = x_1 x_2 ... x_N and Y = y_1 y_2 ... y_N with x_1 = y_1 have the same probability to be generated if π_i(X) ≡ π_i(Y) for all 1 ≤ i ≤ n.

Proof: Note that the probability to generate X is

P[X] = P[x_1] P[x_2 | x_1] ... P[x_N | x_{N-1}],

and the probability to generate Y is

P[Y] = P[y_1] P[y_2 | y_1] ... P[y_N | y_{N-1}].

By permuting the terms in the expressions above, it is not hard to see that P[X] = P[Y] if x_1 = y_1 and π_i(X) ≡ π_i(Y) for all 1 ≤ i ≤ n. Basically, the exit sequences describe the edges that are used in the trajectory of the Markov chain; the multisets of edges in the trajectories that correspond to X and Y are identical, hence P[X] = P[Y].
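The exit-sequence decomposition and the construction in the proof of Lemma 2 translate directly into code. The following Python sketch is our own illustration; later sketches in this document reuse these two functions:

```python
def exit_sequences(X, states):
    """pi(X): for each state, the sequence of states that follow it in X."""
    pi = {s: [] for s in states}
    for a, b in zip(X, X[1:]):
        pi[a].append(b)
    return pi

def reconstruct(x1, pi):
    """Blum's construction from the proof of Lemma 2: rebuild X from its
    first state and its exit sequences."""
    pi = {s: list(seq) for s, seq in pi.items()}   # work on copies
    X = [x1]
    while pi[X[-1]]:
        X.append(pi[X[-1]].pop(0))
    return X

X = ['s1','s4','s2','s1','s3','s2','s3','s1','s1','s2','s3','s4','s1']
pi = exit_sequences(X, {'s1','s2','s3','s4'})
print(pi['s1'])                      # ['s4', 's3', 's1', 's2']
assert reconstruct('s1', pi) == X    # Lemma 2: (x1, pi(X)) determines X
```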
In [8], Samuelson considered a two-state Markov chain and pointed out that one may generate unbiased random bits by applying the von Neumann scheme to the exit sequence of state s_1. Later, in order to increase the efficiency, Elias [5] suggested a way to use all the symbols produced by a Markov chain. His main idea is to concatenate the output sequences that correspond to π_1(X), π_2(X), ... as the final output. However, neither Samuelson nor Elias showed how to correctly paste the output sequences of the different parts to generate the final output, and they also did not prove that their methods can generate unbiased random bits from an arbitrary Markov chain.

Now we consider two straightforward methods: (1) only use f_E(π_1(X)) as the final output; (2) concatenate all the sequences f_E(π_1(X)), f_E(π_2(X)), ... as the final output. However, neither of them can generate random bits from an arbitrary Markov chain. Let's consider a simple example of a two-state Markov chain in which P[s_2|s_1] = p_1 and P[s_1|s_2] = p_2, as shown in Fig. 1.

Fig. 1. An instance of a Markov chain.

Assume that a sequence of length N = 4 is generated by this Markov chain and the starting state is s_1. Then the probabilities of the possible input sequences and their corresponding output sequences under the two methods are given in Table I.

TABLE I

Input sequence  | Probability              | f_E(π_1(X)) | f_E(π_1(X)) f_E(π_2(X))
s_1 s_1 s_1 s_1 | (1 − p_1)^3              | ϕ           | ϕ
s_1 s_1 s_1 s_2 | (1 − p_1)^2 p_1          | 0           | 0
s_1 s_1 s_2 s_1 | (1 − p_1) p_1 p_2        | 0           | 0
s_1 s_1 s_2 s_2 | (1 − p_1) p_1 (1 − p_2)  | 0           | 0
s_1 s_2 s_1 s_1 | p_1 p_2 (1 − p_1)        | 1           | 1
s_1 s_2 s_1 s_2 | p_1^2 p_2                | ϕ           | ϕ
s_1 s_2 s_2 s_1 | p_1 (1 − p_2) p_2        | ϕ           | 1
s_1 s_2 s_2 s_2 | p_1 (1 − p_2)^2          | ϕ           | ϕ

We can see that when the input sequence length is N = 4, the probabilities to produce a bit 0 or a bit 1 are different for some p_1, p_2 under both methods.

It is not easy to figure out how to generate random bits from an arbitrary Markov chain. As Blum said in [9]: "Elias's algorithm is excellent, but certain difficulties arise in trying to use it (or the original von Neumann scheme) to generate random bits in expected linear time from a Markov chain." What are the difficulties? The difficulty of generating random bits from an arbitrary Markov chain in expected linear (or polynomial) time comes from the fact that it is hard to extract independence from the given Markov chain. It seems that the exit sequence of each state is independent, since each exit of a state does not affect the other exits of that state. However, this is not always true when the length N of the input sequence is given: then the exit sequence of a state may not be independent. Let's still consider the example of the two-state Markov chain in Fig. 1, and assume the starting state is s_1. If 1 − p_1 > 0, then with non-zero probability we have π_1(X) = s_1 s_1 ... s_1 of length N − 1. But it is impossible to have π_1(X) = s_2 s_2 ... s_2 of length N − 1. That means π_1(X) is not an independent sequence. The main reason is that, although each exit of a state does not affect the other exits, it does affect the length of the exit sequence. In fact, π_1(X) is an independent sequence if the length of π_1(X) is given, rather than the length of X.

So far, we have seen that it is hard to extract an independent sequence from an arbitrary Markov chain efficiently, so we have to approach the problem from another angle. According to Lemma 3, permuting the exit sequences does not change the probability of a sequence, provided the new sequence exists after the permutations. However, whether the new sequence exists or not depends on the permutations of the exit sequences. Let's go back to the initial example, where

X = s_1 s_4 s_2 s_1 s_3 s_2 s_3 s_1 s_1 s_2 s_3 s_4 s_1

and

π(X) = [s_4 s_3 s_1 s_2, s_1 s_3 s_3, s_2 s_1 s_4, s_2 s_1].

If we permute the last exit sequence s_2 s_1 to s_1 s_2, we cannot find a new sequence whose starting state is s_1 and whose exit sequences are

[s_4 s_3 s_1 s_2, s_1 s_3 s_3, s_2 s_1 s_4, s_1 s_2].

This can be verified by simply trying to construct such a sequence with the method of Blum (given in the proof of Lemma 2). But if we permute the first exit sequence s_4 s_3 s_1 s_2 into s_1 s_2 s_3 s_4, we can find such a new sequence, namely

Y = s_1 s_1 s_2 s_1 s_3 s_2 s_3 s_1 s_4 s_2 s_3 s_4 s_1.

This observation motivates us to study the characterization of exit sequences that are feasible in Markov chains (or finite state machines).

Definition 1 (Feasibility). Given a starting state s_α and a collection of sequences Λ = [Λ_1, Λ_2, ..., Λ_n], we say that (s_α, Λ) is feasible if and only if there exists a sequence X and a Markov chain such that x_1 = s_α and π(X) = Λ.
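Feasibility in the sense of Definition 1 can be tested directly with Blum's construction: follow the walk and check whether every exit sequence is completely consumed. A Python sketch (ours, in the style of the earlier reconstruct() sketch):

```python
def is_feasible(x1, Lam):
    """Definition 1: is there a sequence X with first state x1 and pi(X) = Lam?
    Follow the walk of Lemma 2 until it gets stuck, then check completeness."""
    Lam = {s: list(seq) for s, seq in Lam.items()}   # work on copies
    cur = x1
    while Lam.get(cur):
        cur = Lam[cur].pop(0)
    return all(len(seq) == 0 for seq in Lam.values())

Lam = {'s1': ['s4','s3','s1','s2'], 's2': ['s1','s3','s3'],
       's3': ['s2','s1','s4'], 's4': ['s2','s1']}
print(is_feasible('s1', Lam))        # True
Lam['s4'] = ['s1','s2']              # permute the last exit sequence
print(is_feasible('s1', Lam))        # False, as observed above
```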
Based on the definition of feasibility, we present the main technical lemma of the paper. Recall from the notation of this section that a sequence Y is a tail-fixed permutation of X, denoted Y ≐ X, if and only if (1) Y is a permutation of X, and (2) X and Y have the same last element, namely y_{|Y|} = x_{|X|}.

Lemma 4 (Main Lemma: Equivalence of Exit Sequences). Given a starting state s_α and two collections of sequences Λ = [Λ_1, Λ_2, ..., Λ_n] and Γ = [Γ_1, Γ_2, ..., Γ_n] such that Λ_i ≐ Γ_i (tail-fixed permutation) for all 1 ≤ i ≤ n, (s_α, Λ) is feasible if and only if (s_α, Γ) is feasible.

In the rest of this section we prove the main lemma. To illustrate the claim in the lemma, we express s_α and Λ by a directed graph that has labels on the vertices and edges. For example, when s_α = s_1 and Λ = [s_4 s_3 s_1 s_2, s_1 s_3 s_3, s_2 s_1 s_4, s_2 s_1], we have the directed graph in Fig. 2. The vertex set and the edge set are

V = {s_0, s_1, s_2, ..., s_n},
E = {(s_i, Λ_i[k])} ∪ {(s_0, s_α)}.

For each edge (s_i, Λ_i[k]), the label of the edge is k. For the edge (s_0, s_α), the label is 1. Namely, the label set of the outgoing edges of each state is {1, 2, ...}.

Fig. 2. Directed graph G with labels.

Fig. 3. Directed graph G with new labels.

Given the labeling of the directed graph as defined above, we say that it contains a complete walk if there is a path in the graph that visits all the edges, without visiting any edge twice, in the following way: (1) start from s_0; (2) at each vertex, choose the unvisited edge with the minimal label and follow it. Obviously, the labeling corresponding to (s_α, Λ) is a complete walk if and only if (s_α, Λ) is feasible. In this case, for short, we also say that (s_α, Λ) is a complete walk. Before continuing with the proof of the main lemma, we first give Lemma 5 and Lemma 6.

Lemma 5. Assume (s_α, Λ) with Λ = [Λ_1, Λ_2, ..., Λ_n] is a complete walk that ends at state s_χ. Then (s_α, Γ) with Γ = [Λ_1, ..., Γ_χ, ..., Λ_n] is also a complete walk ending at s_χ, if Λ_χ ≡ Γ_χ (permutation).

Proof: The proof of this lemma is given in the appendix.

For example, we know that when s_α = s_1 and Λ = [s_4 s_3 s_1 s_2, s_1 s_3 s_3, s_2 s_1 s_4, s_2 s_1], (s_α, Λ) is feasible. The labeling of the directed graph corresponding to (s_α, Λ) is given in Fig. 2; it is a complete walk starting at state s_0 and ending at state s_1. The path of the walk is

s_0 s_1 s_4 s_2 s_1 s_3 s_2 s_3 s_1 s_1 s_2 s_3 s_4 s_1.

By permuting the labels of the outgoing edges of s_1, we obtain the graph shown in Fig. 3. The new labeling of G is also a complete walk ending at state s_1, and its path is

s_0 s_1 s_1 s_2 s_1 s_3 s_2 s_3 s_1 s_4 s_2 s_3 s_4 s_1.

Based on Lemma 5, we have Lemma 6.

Lemma 6. Given a starting state s_α and two collections of sequences Λ = [Λ_1, Λ_2, ..., Λ_n] and Γ = [Λ_1, ..., Γ_k, ..., Λ_n] such that Γ_k ≐ Λ_k (tail-fixed permutation), (s_α, Λ) and (s_α, Γ) have the same feasibility.

Proof: We prove that if (s_α, Λ) is feasible, then (s_α, Γ) is also feasible. If (s_α, Λ) is feasible, there exists a sequence X such that s_α = x_1 and Λ = π(X). Suppose the last element of X is x_N = s_χ.

When k = χ, according to Lemma 5, (s_α, Γ) is feasible.

When k ≠ χ, we assume that Λ_k = π_k(X) = x_{k_1} x_{k_2} ... x_{k_w}, where x_{k_1}, ..., x_{k_w} are the exits of state s_k in X. Let's consider the subsequence X' = x_1 x_2 ... x_{k_w − 1} of X, i.e., X truncated just before its last exit of s_k. Then π_k(X') = Λ_k^{|Λ_k| − 1}, and the last element of X' is s_k. According to Lemma 5, there exists a sequence x'_1 x'_2 ... x'_{k_w − 1} with x'_1 = x_1 and x'_{k_w − 1} = x_{k_w − 1} = s_k such that

π(x'_1 x'_2 ... x'_{k_w − 1}) = [π_1(X), ..., Γ_k^{|Γ_k| − 1}, π_{k+1}(X), ..., π_n(X)],

because Γ_k^{|Γ_k| − 1} ≡ Λ_k^{|Λ_k| − 1}. Let x'_{k_w} x'_{k_w + 1} ... x'_N = x_{k_w} x_{k_w + 1} ... x_N, i.e., concatenate x_{k_w} x_{k_w + 1} ... x_N to the end of x'_1 x'_2 ... x'_{k_w − 1}. This generates a sequence x'_1 x'_2 ... x'_N whose exit sequence of state s_k is

Γ_k^{|Γ_k| − 1} x_{k_w} = Γ_k

(since Γ_k and Λ_k have the same last element x_{k_w}), and whose exit sequence of state s_i with i ≠ k is Λ_i = π_i(X). So if (s_α, Λ) is feasible, then (s_α, Γ) is also feasible. Similarly, if (s_α, Γ) is feasible, then (s_α, Λ) is feasible. As a result, (s_α, Λ) and (s_α, Γ) have the same feasibility.

According to the lemma above, (s_α, [Λ_1, Λ_2, ..., Λ_n]) and (s_α, [Γ_1, Λ_2, ..., Λ_n]) have the same feasibility, (s_α, [Γ_1, Λ_2, ..., Λ_n]) and (s_α, [Γ_1, Γ_2, ..., Λ_n]) have the same feasibility, ..., and (s_α, [Γ_1, Γ_2, ..., Γ_{n−1}, Λ_n]) and (s_α, [Γ_1, Γ_2, ..., Γ_n]) have the same feasibility; so the statement of the main lemma is true.

Here, we give an equivalent statement of the Main Lemma (Lemma 4).

Lemma 7. Given an input sequence X = x_1 x_2 ... x_N with x_N = s_χ produced by a Markov chain, consider any [Λ_1, Λ_2, ..., Λ_n] such that

1) Λ_i is a permutation (≡) of π_i(X) for i = χ;
2) Λ_i is a tail-fixed permutation (≐) of π_i(X) for i ≠ χ.

Then there exists exactly one sequence X' = x'_1 x'_2 ... x'_N such that x'_1 = x_1 and π(X') = [Λ_1, Λ_2, ..., Λ_n]; and for this X' we have x'_N = x_N.

One might think that Lemma 7 is stronger than the Main Lemma (Lemma 4); however, we will show that these two lemmas are equivalent.

It is obvious that if the statement of Lemma 7 is true, then the statement of the Main Lemma is also true. Now we show that if the statement of the Main Lemma is true, then the statement of Lemma 7 is also true.

Proof: Given X = x_1 x_2 ... x_N, let's add one more symbol s_{n+1} to the end of X (s_{n+1} is different from all the states in X); we get a new sequence x_1 x_2 ... x_N s_{n+1}, whose exit sequences are

[π_1(X), π_2(X), ..., π_χ(X) s_{n+1}, ..., π_n(X), ϕ].

According to the Main Lemma, there exists another sequence x'_1 x'_2 ... x'_N x'_{N+1} whose exit sequences are

[Λ_1, Λ_2, ..., Λ_χ s_{n+1}, ..., Λ_n, ϕ]

and x'_1 = x_1. Clearly, the last symbol of this sequence is s_{n+1}, i.e., x'_{N+1} = s_{n+1}; as a result, x'_N = s_χ. Now, by removing the last element from x'_1 x'_2 ... x'_N x'_{N+1}, we get a new sequence X' = x'_1 x'_2 ... x'_N whose exit sequences are

[Λ_1, Λ_2, ..., Λ_χ, ..., Λ_n]

with x'_1 = x_1. Moreover, we have x'_N = s_χ. This completes the proof.

We can demonstrate the equivalence on the example from the beginning of this section. Let

X = s_1 s_4 s_2 s_1 s_3 s_2 s_3 s_1 s_1 s_2 s_3 s_4 s_1

with χ = 1; its exit sequences are

[s_4 s_3 s_1 s_2, s_1 s_3 s_3, s_2 s_1 s_4, s_2 s_1].

After permuting all the exit sequences (for i ≠ 1, we keep the last element of the i-th sequence fixed), we get a new group of exit sequences

[s_1 s_2 s_3 s_4, s_3 s_1 s_3, s_1 s_2 s_4, s_2 s_1].

Based on these new exit sequences, we can generate a new input sequence

X' = s_1 s_1 s_2 s_3 s_1 s_3 s_2 s_1 s_4 s_2 s_3 s_4 s_1.

IV. ALGORITHM A: MODIFICATION OF ELIAS'S SUGGESTION

In the section above, we saw that Elias suggested pasting the outputs of the different exit sequences together as the final output, but that simple direct concatenation does not always work. By modifying the way these outputs are pasted, we get Algorithm A, which generates unbiased random bits from any Markov chain.

Algorithm A
Input: a sequence X = x_1 x_2 ... x_N produced by a Markov chain, where x_i ∈ S = {s_1, s_2, ..., s_n}.
Output: a sequence of 0's and 1's.
Main function:
  Suppose x_N = s_χ.
  for i := 1 to n do
    if i = χ then
      output f_E(π_i(X))
    else
      output f_E(π_i(X)^{|π_i(X)| − 1})
    end if
  end for
Comment: f_E(·) is Elias's function.

The only difference between Algorithm A and direct concatenation is that Algorithm A ignores the last symbol of each exit sequence, except that of the final state s_χ. Let's go back to the example of the two-state Markov chain with P[s_2|s_1] = p_1 and P[s_1|s_2] = p_2 in Fig. 1, which demonstrated that the two simple methods (including direct concatenation) do not always work. Again assume that an input sequence of length N = 4 is generated by this Markov chain and the starting state is s_1. The probability of each possible input sequence and its corresponding output sequence (based on Algorithm A) are given by:

Input sequence  | Probability              | Output sequence
s_1 s_1 s_1 s_1 | (1 − p_1)^3              | ϕ
s_1 s_1 s_1 s_2 | (1 − p_1)^2 p_1          | ϕ
s_1 s_1 s_2 s_1 | (1 − p_1) p_1 p_2        | 0
s_1 s_1 s_2 s_2 | (1 − p_1) p_1 (1 − p_2)  | ϕ
s_1 s_2 s_1 s_1 | p_1 p_2 (1 − p_1)        | 1
s_1 s_2 s_1 s_2 | p_1^2 p_2                | ϕ
s_1 s_2 s_2 s_1 | p_1 (1 − p_2) p_2        | ϕ
s_1 s_2 s_2 s_2 | p_1 (1 − p_2)^2          | ϕ

We can see that when the input sequence length is N = 4, a bit 0 and a bit 1 have the same probability to be generated, and no longer sequences are generated. In this case, the output sequence is independent and unbiased.
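Algorithm A itself is only a few lines once the earlier sketches of exit_sequences() and f_E() are available (our illustration; the sorted() call fixes an arbitrary processing order for the states):

```python
def algorithm_A(X, states):
    """Algorithm A: apply f_E to every exit sequence, dropping the last exit
    of each state except the final state chi, and concatenate the outputs."""
    pi = exit_sequences(X, states)
    chi = X[-1]
    out = ''
    for s in sorted(states):
        seq = pi[s] if s == chi else pi[s][:-1]   # ignore the tail if s != chi
        out += f_E(seq)
    return out

X = ['s1','s4','s2','s1','s3','s2','s3','s1','s1','s2','s3','s4','s1']
print(algorithm_A(X, {'s1','s2','s3','s4'}))   # '11001' with this ordering
```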
Now, our goal is to prove that all the sequences generated by Algorithm A are independent and unbiased. In order to prove this, we need to show that any two output sequences Y and Y' of the same length have the same probability to be generated.

Let S_u(Y) denote the set of input sequences that yield the output sequence Y in Algorithm A. In the following lemma, we show that there is a one-to-one mapping between S_u(Y) and S_u(Y') whenever |Y| = |Y'|.

Lemma 8 (Mapping for Algorithm A). In Algorithm A, assume an input sequence X = x_1 x_2 ... x_N yields an output sequence Y ∈ {0, 1}^k for some k. Then for any Y' ∈ {0, 1}^k, there exists one and only one sequence X' = x'_1 x'_2 ... x'_N that yields Y' such that

1) x'_1 = x_1 and x'_N = x_N = s_χ for some χ;
2) for i = χ, π_i(X') ≡ π_i(X) and |f_E(π_i(X'))| = |f_E(π_i(X))|;
3) for all i ≠ χ, π_i(X') ≐ π_i(X) and |f_E(π_i(X')^{|π_i(X')| − 1})| = |f_E(π_i(X)^{|π_i(X)| − 1})|.

Proof: First we prove that for any Y' with |Y'| = |Y| there exists an input sequence X' = x'_1 x'_2 ... x'_N satisfying all the requirements.

Given the desired sequence Y', we construct X' as follows. Let's split Y' into n segments Y'_1, Y'_2, ..., Y'_n, namely Y' = Y'_1 Y'_2 ... Y'_n, such that the length of Y'_i for 1 ≤ i ≤ n is given by

|Y'_i| = |f_E(π_i(X))| if i = χ,
|Y'_i| = |f_E(π_i(X)^{|π_i(X)| − 1})| if i ≠ χ.

According to Lemma 1, if i = χ, we can find Λ_i ≡ π_i(X) such that f_E(Λ_i) = Y'_i. If i ≠ χ, we can find Λ_i such that Λ_i^{|Λ_i| − 1} ≡ π_i(X)^{|π_i(X)| − 1}, Λ_i[|Λ_i|] = π_i(X)[|π_i(X)|], and f_E(Λ_i^{|Λ_i| − 1}) = Y'_i. Based on Lemma 7 (the equivalent statement of the Main Lemma), there exists a sequence X' = x'_1 x'_2 ... x'_N with x'_1 = x_1 and x'_N = x_N whose exit sequences are (Λ_1, Λ_2, ..., Λ_n). It is not hard to check that X' satisfies all the requirements of the lemma.

Next, we prove that there is at most one sequence satisfying the requirements. Assume there are two (or more) sequences U, V satisfying the requirements. According to Lemma 2, they must have different exit sequences; hence there exists a state s_i such that π_i(U) ≠ π_i(V). If i = χ, we have π_i(U) ≡ π_i(V), so f_E(π_i(U)) ≠ f_E(π_i(V)); that means different segments Y'_i are produced, which contradicts our assumption. If i ≠ χ, we have π_i(U) ≐ π_i(V), so f_E(π_i(U)^{|π_i(U)| − 1}) ≠ f_E(π_i(V)^{|π_i(V)| − 1}), which also leads to a contradiction. This completes the proof.

Theorem 1 (Algorithm A). Let the sequence generated by a Markov chain be used as input to Algorithm A. Then the output of Algorithm A is an independent unbiased sequence.

Proof: Assume the length of the input sequence is N. We show that any Y, Y' ∈ {0, 1}^k have the same probability to be generated. Let S_u(Y) and S_u(Y') denote the sets of input sequences yielding the output sequences Y and Y'. By Lemma 8, there is a one-to-one mapping between the elements of S_u(Y) and S_u(Y'). By Lemma 3, this mapping preserves the probability of the sequences being generated. As a result, the probability for the Markov chain to generate a sequence in S_u(Y) equals the probability to generate a sequence in S_u(Y'). So Y and Y' have the same probability to be generated; that is, all output sequences of the same length k are generated with the same probability. The result of the theorem is immediate from this conclusion.

Theorem 2 (Efficiency). Let X be a sequence of length N generated by a Markov chain and used as input to Algorithm A, and let M denote the length of its output sequence. Then the limiting efficiency η_N = E[M]/N as N → ∞ realizes the upper bound H(X)/N.

Proof: Here, the upper bound H(X)/N is provided by Elias [5], and we can use the same argument as in Elias's paper [5] to prove this theorem. Let X_i denote the next state following s_i. Clearly, X_i is a random variable for 1 ≤ i ≤ n, whose entropy is denoted by H(X_i). Let u = (u_1, ..., u_n) denote the stationary distribution of the Markov chain; then we have

lim_{N→∞} H(X)/N = Σ_{i=1}^{n} u_i H(X_i).

As N → ∞, there exists an ϵ_N → 0 such that, with probability 1 − ϵ_N, |π_i(X)| > (u_i − ϵ_N) N for all 1 ≤ i ≤ n. Using Algorithm A, with probability 1 − ϵ_N, the length M of the output sequence is bounded below by

Σ_{i=1}^{n} (1 − ϵ_N)(|π_i(X)| − 1) η_i,

where η_i is the efficiency of f_E(·) when the input is π_i(X) or π_i(X)^{|π_i(X)| − 1}. According to Theorem 2 in Elias's paper [5], we know that η_i → H(X_i) as |π_i(X)| → ∞. So with probability 1 − ϵ_N, the length M of the output sequence is bounded below by

Σ_{i=1}^{n} (1 − ϵ_N)((u_i − ϵ_N) N − 1)(1 − ϵ_N) H(X_i).

Then we have

lim_{N→∞} E[M]/N ≥ lim_{N→∞} Σ_{i=1}^{n} (1 − ϵ_N)^3 ((u_i − ϵ_N) N − 1) H(X_i)/N = lim_{N→∞} H(X)/N.

At the same time, E[M]/N is upper bounded by H(X)/N. Then

lim_{N→∞} E[M]/N = lim_{N→∞} H(X)/N,

which completes the proof.
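The limit Σ_i u_i H(X_i) in Theorem 2 is simply the entropy rate of the chain, which is easy to evaluate numerically. A small Python sketch (ours, using numpy; the transition matrix is an assumed example):

```python
import numpy as np

def entropy_rate(P):
    """lim H(X)/N = sum_i u_i * H(X_i), with u the stationary distribution
    (left eigenvector of P for eigenvalue 1) and H(X_i) the entropy of row i."""
    w, v = np.linalg.eig(P.T)
    u = np.real(v[:, np.argmin(np.abs(w - 1))])
    u /= u.sum()
    with np.errstate(divide='ignore', invalid='ignore'):
        logs = np.where(P > 0, np.log2(P), 0.0)
    return float(u @ (-(P * logs).sum(axis=1)))

# Two-state chain of Fig. 1 with p1 = 0.3, p2 = 0.6: at most this many
# unbiased output bits per input symbol, in the limit of long inputs.
P = np.array([[0.7, 0.3], [0.4, 0.6]])
print(entropy_rate(P))
```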
Given an input sequence, it is efficient to generate independent unbiased sequences using Algorithm A. However, Algorithm A has some limitations: (1) the complete input sequence has to be stored; (2) for a long input sequence it is computationally intensive, as its cost depends on the input length; (3) the method works for finite sequences and does not lend itself to stream processing. In order to address these limitations, we introduce two variants of Algorithm A.

In the first variant of Algorithm A, instead of applying Elias's function directly to Λ_i = π_i(X) for i = χ (or Λ_i = π_i(X)^{|π_i(X)| − 1} for i ≠ χ), we first split Λ_i into several segments with lengths k_{i1}, k_{i2}, ... and then apply Elias's function to each of the segments separately. It can be proved that this variant of Algorithm A generates independent unbiased sequences from an arbitrary Markov chain, as long as k_{i1}, k_{i2}, ... do not depend on the order of the elements in each exit sequence. For example, we can split Λ_i into two segments of lengths ⌊|Λ_i|/2⌋ and ⌈|Λ_i|/2⌉; we can also split it into three segments of lengths (a, a, |Λ_i| − 2a), and so on. Generally, the shorter each segment is, the faster we can obtain the final output, but at the same time we may have to sacrifice some information efficiency.

The second variant of Algorithm A is based on the following idea: a sequence generated by a Markov chain can be split into several subsequences that are mutually independent, so we can apply Algorithm A to each of these subsequences and then concatenate their output sequences together as the final output.

In order to do this, given a sequence X = x_1 x_2 ..., we can use x_1 = s_α as a special state to cut the sequence X. For example, in practice, we can set a constant k; if there exists a minimal integer i such that x_i = s_α and i > k, then we can split X into two sequences x_1 x_2 ... x_i and x_i x_{i+1} ... (note that both sequences contain the element x_i). For the second sequence x_i x_{i+1} ..., we repeat the same procedure. Iteratively, we can split a sequence X into several sequences that are independent of each other. These sequences, with the exception of the last one, start and end with s_α.

V. ALGORITHM B: GENERALIZATION OF BLUM'S ALGORITHM

In [9], Blum proposed a beautiful algorithm to generate an independent unbiased sequence of 0's and 1's from any Markov chain by extending the von Neumann scheme. His algorithm can deal with infinitely long sequences and uses only constant space and expected linear time. The only drawback of his algorithm is that its efficiency is still far from the information-theoretic upper bound, due to the limitation (compared to Elias's algorithm) of the von Neumann scheme. In this section, we propose Algorithm B, a generalization of Blum's algorithm that applies Elias's scheme instead of the von Neumann scheme. Algorithm B keeps the good properties of Blum's algorithm, and at the same time its efficiency can approach the information-theoretic upper bound.

Algorithm B
Input: a sequence (or a stream) x_1 x_2 ... produced by a Markov chain, where x_i ∈ {s_1, s_2, ..., s_n}.
Parameter: n positive integer functions (window sizes) ϖ_i(k) with k ≥ 1, for 1 ≤ i ≤ n.
Output: a sequence (or a stream) of 0's and 1's.
Main function:
  E_i = ϕ (empty) for all 1 ≤ i ≤ n.
  k_i = 1 for all 1 ≤ i ≤ n.
  c: the index of the current state, namely, s_c = x_1.
  while the next input symbol is s_j (≠ null) do
    if |E_j| ≥ ϖ_j(k_j) then
      output f_E(E_j).
      E_j = ϕ.
      k_j = k_j + 1.
    end if
    E_c = E_c s_j (add s_j to E_c).
    c = j.
  end while

For example, set ϖ_i(k) = 4 for all 1 ≤ i ≤ n, and let the input sequence be

X = s_1 s_1 s_1 s_2 s_2 s_2 s_1 s_2 s_2.

The exit sequences of X are

π(X) = [s_1 s_1 s_2 s_2, s_2 s_2 s_1 s_2].

If we treat X as the prefix of the input stream of the algorithm above, the output f_E(s_2 s_2 s_1 s_2) is produced as soon as state s_2 is visited once more. Note that the algorithm does not output f_E(s_1 s_1 s_2 s_2) until the Markov chain reaches state s_1 again. Timing is crucial.

Note that Blum's algorithm is the special case of Algorithm B obtained by setting the window size functions ϖ_i(k) = 2 for all 1 ≤ i ≤ n and k ∈ {1, 2, ...}; namely, Algorithm B is a generalization of Blum's algorithm.

Assume that a sequence of symbols X = x_1 x_2 ... x_N with x_N = s_χ has been read by the algorithm above. We want to show that for any N, the output sequence is always independent and unbiased. For all i with 1 ≤ i ≤ n, we can write

π_i(X) = F_{i1} F_{i2} ... F_{i m_i} E_i,

where the F_{ij} with 1 ≤ j ≤ m_i are the segments used to generate outputs. For all i, j we have |F_{ij}| = ϖ_i(j), and

0 ≤ |E_i| < ϖ_i(m_i + 1) if i = χ,
0 < |E_i| ≤ ϖ_i(m_i + 1) otherwise.

See Fig. 4(a) for a simple illustration.

Fig. 4. The simplified expressions for the exit sequences of X and X'.
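Below is a streaming Python sketch of Algorithm B (our illustration; it assumes the f_E() sketch from Section II and uses a constant window of 4 by default). Note how a full buffer E_j is flushed only upon the next visit to s_j, before the new exit is recorded:

```python
class AlgorithmB:
    """Streaming Algorithm B: one exit buffer per state, flushed through f_E
    whenever the window fills; window(i, k) plays the role of w_i(k)."""
    def __init__(self, states, window=lambda i, k: 4):
        self.E = {s: [] for s in states}   # exit buffers E_i
        self.k = {s: 1 for s in states}    # window indices k_i
        self.window = window
        self.cur = None                    # current state s_c

    def push(self, s):
        """Feed one symbol of the stream; return any bits produced."""
        if self.cur is None:               # the first symbol fixes the start state
            self.cur = s
            return ''
        out = ''
        if len(self.E[s]) >= self.window(s, self.k[s]):
            out = f_E(self.E[s])           # flush before recording the new exit
            self.E[s] = []
            self.k[s] += 1
        self.E[self.cur].append(s)         # record the exit of the previous state
        self.cur = s
        return out

gen = AlgorithmB({'s1', 's2'})
for x in ['s1','s1','s1','s2','s2','s2','s1','s2','s2']:
    gen.push(x)                            # buffers fill; nothing is emitted yet
print(gen.push('s2'))                      # next visit to s2 flushes E_2: prints '10'
```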
Lemma 9 (Mapping for Algorithm B). In Algorithm B, assume an input sequence X = x_1 x_2 ... x_N yields an output sequence Y ∈ {0, 1}^k for some k. Then for any Y' ∈ {0, 1}^k, there exists one and only one sequence X' = x'_1 x'_2 ... x'_N that yields Y' such that

1) x'_1 = x_1;
2) for all i with 1 ≤ i ≤ n,

π_i(X) = F_{i1} F_{i2} ... F_{i m_i} E_i,
π_i(X') = F'_{i1} F'_{i2} ... F'_{i m_i} E_i;

3) for all i, j, F'_{ij} ≡ F_{ij} and |f_E(F'_{ij})| = |f_E(F_{ij})|.

Proof: Let's construct such a sequence X' satisfying all the requirements of the lemma:

1) Let S = {(i, j) : 1 ≤ i ≤ n, 1 ≤ j ≤ m_i}, and let w = Σ_i m_i be the number of segments generated from X.

2) Construct a sequence X'' with π_i(X'') = F''_{i1} F''_{i2} ... F''_{i m_i} E_i such that

a) x''_1 = x_1;
b) for all i, j, F''_{ij} = F_{ij} if (i, j) ∈ S, and F''_{ij} = F'_{ij} otherwise,

as shown in Fig. 4(b). According to the equivalent statement of the main lemma (Lemma 7), such a sequence X'' exists and is unique.

3) Treat X'' as an input sequence, and let F''_{i_1 j_1}, F''_{i_2 j_2}, ..., F''_{i_w j_w} be the ordered segments used to produce outputs; i.e., when the input sequence is X'', the output sequence is

f_E(F''_{i_1 j_1}) f_E(F''_{i_2 j_2}) ... f_E(F''_{i_w j_w}).

4) Let k = max{m : 1 ≤ m ≤ w, (i_m, j_m) ∈ S}, and perform the following operations:

a) let S = S \ {(i_k, j_k)};
b) let l = |f_E(F_{i_k j_k})|, and (using Lemma 1) choose F'_{i_k j_k} such that F'_{i_k j_k} ≡ F_{i_k j_k} and f_E(F'_{i_k j_k}) = Y'[|Y'| − l + 1 : |Y'|];
c) let Y' = Y'[1 : |Y'| − l].

5) Repeat steps 2)-4) until S becomes empty (ϕ).

6) Repeat steps 2) and 3) once more, and let X' = X''.

We want to prove that the X' constructed by this procedure satisfies all the requirements of Lemma 9. It is obvious that X' satisfies conditions 1) to 3); the only thing we need to prove is that X' yields Y'. Note that steps 2) and 3) are performed w + 1 times in total. We claim that after the k-th iteration of step 3), the following holds:

1) G = f_E(F'_{i_{w−k+2} j_{w−k+2}}) ... f_E(F'_{i_w j_w}) is a suffix of the desired output Y' (with G = ϕ for k = 1);
2) S = {(i_m, j_m) : 1 ≤ m ≤ w − k + 1}.

Let's prove this claim by induction. After the first iteration, G = ϕ, so the claim is true. Now assume the claim is true after the k-th iteration; we prove that it is also true after the (k+1)-th iteration, for k ≤ w. In what follows, unstarred symbols refer to the k-th iteration, and starred symbols (*) refer to the (k+1)-th iteration.

By assumption, when the input sequence is X'', the output is

f_E(F''_{i_1 j_1}) f_E(F''_{i_2 j_2}) ... f_E(F''_{i_w j_w}),

where {(i_m, j_m) : 1 ≤ m ≤ w − k + 1} = S and G is a suffix of Y'. According to step 4), X''* is constructed from X'' by replacing F_{i_{w−k+1} j_{w−k+1}} with F'_{i_{w−k+1} j_{w−k+1}}. We now show that X''* satisfies the properties of the claim.

Let x''_α be the α-th symbol of X'', chosen such that once x''_α is read, the algorithm outputs f_E(F''_{i_{w−k+1} j_{w−k+1}}). When the input sequence is X''^α (a prefix of X''), the output sequence is

f_E(F''_{i_1 j_1}) f_E(F''_{i_2 j_2}) ... f_E(F''_{i_{w−k+1} j_{w−k+1}}).

According to the equivalent statement of the main lemma (Lemma 7), if F_{i_{w−k+1} j_{w−k+1}} is replaced by F'_{i_{w−k+1} j_{w−k+1}}, we get a new sequence U^α of length α such that, when the input sequence is U^α, the algorithm outputs f_E(F''_{i_1 j_1}), ..., f_E(F''_{i_{w−k} j_{w−k}}) in some order, and outputs f_E(F'_{i_{w−k+1} j_{w−k+1}}) at the end (after reading the last symbol). Note that after processing U^α, all the values of the variables stored by the algorithm are the same as after processing X''^α. Hence, if we concatenate U^α with X''[α + 1 : N], we get a new sequence X''* = U^α X''[α + 1 : N], which is exactly the sequence constructed in step 2) of the next iteration. This X''* has the following properties:

1) its output sequence ends with f_E(F'_{i_{w−k+1} j_{w−k+1}}) f_E(F''_{i_{w−k+2} j_{w−k+2}}) ... f_E(F''_{i_w j_w});
2) S* = S \ {(i_{w−k+1}, j_{w−k+1})}.

From this it is not hard to see that f_E(F'_{i_{w−k+1} j_{w−k+1}}) f_E(F'_{i_{w−k+2} j_{w−k+2}}) ... f_E(F'_{i_w j_w}) is a suffix of Y' and that (i*_m, j*_m) = (i_m, j_m) for all w − k + 1 ≤ m ≤ w. As a result, the claim is true for the (k+1)-th iteration. By induction, the claim is true for all 1 ≤ k ≤ w + 1. Taking k = w + 1, we get

|f_E(F'_{i_1 j_1})| + ... + |f_E(F'_{i_w j_w})| = |Y'|,

i.e., X' = X'' yields Y', and this X' satisfies all the requirements of the lemma.

Now we show that there is at most one sequence X' satisfying all the requirements of the lemma. Assume there are two such sequences, whose outputs are

f_E(G_{i_1 j_1}) f_E(G_{i_2 j_2}) ... f_E(G_{i_w j_w})

and

f_E(H_{i'_1 j'_1}) f_E(H_{i'_2 j'_2}) ... f_E(H_{i'_w j'_w}),

where the G's and H's are the segments producing the outputs of the two input sequences. By induction, we prove that for all 1 ≤ k ≤ w, we have i_k = i'_k, j_k = j'_k and G_{i_k j_k} = H_{i'_k j'_k}. When k = w: according to Lemma 7, it is not hard to get i_w = i'_w and j_w = j'_w; therefore G_{i_w j_w} ≡ H_{i'_w j'_w}. Both outputs end with the same suffix of Y', i.e., f_E(G_{i_w j_w}) = f_E(H_{i'_w j'_w}), so by Lemma 1 we have G_{i_w j_w} = H_{i'_w j'_w}; hence the conclusion is true for k = w. Assuming the conclusion is true for all k with l < k ≤ w, the same argument shows that it is also true for l. As a result, the conclusion is true for all 1 ≤ k ≤ w. Then, based on the uniqueness in Lemma 2, the two sequences are identical, i.e., the sequence X' is unique.

This completes the proof.

Theorem 3 (Algorithm B). Let the sequence generated by a Markov chain be used as input to Algorithm B. Then Algorithm B generates an independent unbiased sequence in expected linear time.

Proof: We can use the same method as in the proof for Algorithm A; the only difference is that here we use the mapping obtained in Lemma 9.

Normally, the window size functions ϖ_i(k) for 1 ≤ i ≤ n can be any positive integer functions. Here, we fix all the window size functions to a constant ϖ. By increasing the value of ϖ, we can increase the efficiency of the scheme, but at the same time it may cost more storage space and require more waiting time. It is important to analyze the relationship between the scheme's efficiency and the window size ϖ.

Theorem 4 (Efficiency). Let MC be a Markov chain with transition matrix P, and let the sequence generated by MC be used as input to Algorithm B with constant window size ϖ. As the length of the input sequence goes to infinity, the limiting efficiency of Algorithm B is

η(ϖ) = Σ_{i=1}^{n} u_i η_i(ϖ),

where u = (u_1, u_2, ..., u_n) is the stationary distribution of the Markov chain, and η_i(ϖ) is the efficiency of Elias's scheme when the input sequence of length ϖ is generated by an n-face coin with distribution (p_{i1}, p_{i2}, ..., p_{in}).

Proof: Assume the input sequence is X of length N. As N → ∞, there exists an ϵ_N → 0 such that, with probability 1 − ϵ_N, (u_i − ϵ_N) N < |π_i(X)| < (u_i + ϵ_N) N for all 1 ≤ i ≤ n. The efficiency of Algorithm B for input length N, written η_N(ϖ), satisfies

Σ_{i=1}^{n} ⌊(|π_i(X)| − 1)/ϖ⌋ η_i(ϖ) ϖ / N ≤ η_N(ϖ) ≤ Σ_{i=1}^{n} ⌈|π_i(X)|/ϖ⌉ η_i(ϖ) ϖ / N.

With probability 1 − ϵ_N, we therefore have

Σ_{i=1}^{n} (⌊(u_i − ϵ_N) N / ϖ⌋ − 1) η_i(ϖ) ϖ / N ≤ η_N(ϖ) ≤ Σ_{i=1}^{n} ⌈(u_i + ϵ_N) N / ϖ⌉ η_i(ϖ) ϖ / N.

So as N → ∞, we get

η(ϖ) = Σ_{i=1}^{n} u_i η_i(ϖ).

This completes the proof.

Let's define α(l) = Σ_k n_k 2^{n_k}, where l = Σ_k 2^{n_k} is the standard binary expansion of l; α(l) is the total number of output bits that f_E assigns across a class of size l. By analyzing Elias's scheme, we get

η_i(ϖ) = (1/ϖ) Σ_{k_1 + ... + k_n = ϖ} α(ϖ! / (k_1! k_2! ... k_n!)) p_{i1}^{k_1} p_{i2}^{k_2} ... p_{in}^{k_n}.

Based on this formula, we can study the relationship between the limiting efficiency and the window size numerically (see Section VIII). As the window size becomes large, the limiting efficiency approaches the information-theoretic upper bound.
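The quantities α(l) and η_i(ϖ) are straightforward to evaluate; the following Python sketch (ours) computes them by enumerating the count vectors (k_1, ..., k_n) of a window:

```python
from math import factorial
from itertools import product

def alpha(l):
    """alpha(l) = sum_k n_k * 2^(n_k) over the binary expansion l = sum_k 2^(n_k):
    the total number of bits f_E assigns across a class of size l."""
    return sum(k << k for k in range(l.bit_length()) if (l >> k) & 1)

def eta_i(p, w):
    """Efficiency of Elias's scheme on length-w windows from an n-face coin
    with distribution p (the formula above)."""
    total = 0.0
    for ks in product(range(w + 1), repeat=len(p)):
        if sum(ks) != w:
            continue
        size, prob = factorial(w), 1.0
        for kj, pj in zip(ks, p):
            size //= factorial(kj)
            prob *= pj ** kj
        total += alpha(size) * prob
    return total / w

# Per-state window efficiencies for the chain of Fig. 1 with p1 = 0.3, p2 = 0.6;
# then eta(4) = u_1 * eta_i((0.7, 0.3), 4) + u_2 * eta_i((0.4, 0.6), 4).
print(eta_i((0.7, 0.3), 4), eta_i((0.4, 0.6), 4))
```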
VI. ALGORITHM C: OPTIMAL ALGORITHM

In this section, we construct an optimal algorithm, in the sense that its information efficiency is maximized for every length of the input sequence. Following the definition of Pae and Loui in [10], we define randomizing functions:

Definition 2 (Randomizing function [10]). A function f : {s_1, s_2, ..., s_n}^N → {0, 1}^* is a randomizing function if for each k and for each w ∈ {0, 1}^{k−1},

P[Y[k] = 0 | Y[1 : k−1] = w] = P[Y[k] = 1 | Y[1 : k−1] = w]

for any Markov chain, where Y ∈ {0, 1}^* is the output of the function.

According to this definition, an algorithm generates independent unbiased sequences if and only if it can be treated as a randomizing function.

Lemma 10 (Necessary condition for a randomizing function). If a function f : {s_1, s_2, ..., s_n}^N → {0, 1}^* is a randomizing function, then for each w ∈ {0, 1}^*, |B_{w0}| = |B_{w1}|, where

B_w = {X ∈ {s_1, s_2, ..., s_n}^N : f(X)^{|w|} = w}.

Moreover, let K = {k_{ij}} be an n × n non-negative integer matrix with Σ_{i=1}^{n} Σ_{j=1}^{n} k_{ij} = N − 1, and define

S_K = {X ∈ {s_1, s_2, ..., s_n}^N : k_j(π_i(X)) = k_{ij} ∀ i, j},

where k_j(X) is the function counting the number of s_j's in X. Then for any such matrix K, we have

|B_{w0} ∩ S_K| = |B_{w1} ∩ S_K|.

Proof: Pae and Loui proved this result in the case of coin flips; using the same argument, we can prove the case of Markov chains. According to Lemma 2.2 in [10], a function f is randomizing if and only if P[X ∈ B_{w0}] = P[X ∈ B_{w1}]. Here we can write

P[X ∈ B_{w0}] = Σ_K |B_{w0} ∩ S_K| φ(K),

where φ(K) is the probability to generate any particular sequence whose exit-sequence counts are specified by K (if such an input sequence exists). It is easy to get that

φ(K) = Π_{i=1}^{n} Π_{j=1}^{n} p_{ij}^{k_{ij}},

with p_{ij} the transition probability from state s_i to state s_j and Σ_{i,j=1}^{n} k_{ij} = N − 1. Similarly,

P[X ∈ B_{w1}] = Σ_K |B_{w1} ∩ S_K| φ(K).

So we have

Σ_K (|B_{w0} ∩ S_K| − |B_{w1} ∩ S_K|) φ(K) = 0.

The set of polynomials {φ(K)}, over the choices of K, is linearly independent in the vector space of functions of the transition probabilities p_{ij} ∈ [0, 1]. We can conclude that |B_{w0} ∩ S_K| = |B_{w1} ∩ S_K|; and since B_{w0} = ∪_K (B_{w0} ∩ S_K), we also get |B_{w0}| = |B_{w1}|.

Recall that α(l) = Σ_k n_k 2^{n_k}, where l = Σ_k 2^{n_k} is the standard binary expansion of l. Then we have the following lemma.

Lemma 11 (Sufficient condition for an optimal algorithm). If there exists a randomizing function f* such that, for any n × n non-negative integer matrix K with Σ_{i=1}^{n} Σ_{j=1}^{n} k_{ij} = N − 1,

Σ_{X ∈ S_K} |f*(X)| = α(|S_K|),

then f* is optimal in efficiency for generating independent unbiased sequences from a Markov chain with unknown transition probabilities.

Proof: Let h denote an arbitrary randomizing function. According to Lemma 2.9 in [10], we know that

Σ_{X ∈ S_K} |h(X)| ≤ α(|S_K|).

Then the average output length of h is

(1/N) Σ_K Σ_{X ∈ S_K} |h(X)| φ(K) ≤ (1/N) Σ_K α(|S_K|) φ(K) = (1/N) Σ_K Σ_{X ∈ S_K} |f*(X)| φ(K),

so f* is the optimal one. This completes the proof.

Based on the lemmas above, we can construct the following algorithm (Algorithm C), which generates unbiased random bits from an arbitrary Markov chain with maximal efficiency.

Algorithm C
Input: a sequence X = x_1 x_2 ... x_N produced by a Markov chain, where x_i ∈ S = {s_1, s_2, ..., s_n}.
Output: a sequence of 0's and 1's.
Main function:
1) Compute the matrix K = {k_{ij}} with k_{ij} = k_j(π_i(X)).
2) Define the class of X as

S(X) = {X' : k_j(π_i(X')) = k_{ij} ∀ i, j},

and compute |S(X)|.
3) Compute the rank r(X) of X in S(X) with respect to a given order.
4) Based on |S(X)| and r(X), determine the output sequence, in the same way as in the computation of Elias's function.

In Algorithm C, for an input sequence X with x_N = s_χ, we rank the members of the class with respect to the lexicographic order of θ(X) and σ(X). Here, we define

θ(X) = (π_1(X)[|π_1(X)|], ..., π_n(X)[|π_n(X)|]),

which is the vector of the last symbols of the π_i(X) for 1 ≤ i ≤ n, and σ(X) is the complement of θ(X) in π(X), namely,

σ(X) = (π_1(X)^{|π_1(X)| − 1}, ..., π_n(X)^{|π_n(X)| − 1}).

For example, when the input sequence is

X = s_1 s_4 s_2 s_1 s_3 s_2 s_3 s_1 s_1 s_2 s_3 s_4 s_1,

its exit sequences are

π(X) = [s_4 s_3 s_1 s_2, s_1 s_3 s_3, s_2 s_1 s_4, s_2 s_1].

Then for this input sequence X we have

θ(X) = (s_2, s_3, s_4, s_1),
σ(X) = [s_4 s_3 s_1, s_1 s_3, s_2 s_1, s_2].

Based on the lexicographic order defined above, both |S(X)| and r(X) can be obtained by brute-force search; however, that requires too much computation. In the next section, we show that both |S(X)| and r(X) are computable in almost expected linear time. In the rest of this section, we show that Algorithm C generates independent unbiased sequences from any Markov chain.

Theorem 5 (Algorithm C). Let the sequence generated by a Markov chain be used as input to Algorithm C. Then the output of Algorithm C is an independent unbiased sequence. Moreover, Algorithm C is optimal.

Proof: Assume the length of the input sequence is N. We want to show that for any Y ∈ {0, 1}^k and Y' ∈ {0, 1}^k, Y and Y' have the same probability to be generated. Let S_u(Y) and S_u(Y') denote the sets of input sequences yielding the output sequences Y and Y'. For each sequence X ∈ S_u(Y), there exists one and only one K such that X ∈ S_K. According to (the analogue of) Lemma 1, there exists one and only one sequence X' ∈ S_K such that X' yields Y'. Since both X and X' are in S_K, they have the same probability to be generated. So the elements of S_u(Y) and S_u(Y') are in one-to-one correspondence, and Y and Y' have the same probability to be generated. Using the same argument as in Theorem 1, we conclude that Algorithm C generates independent unbiased sequences.

If we treat Algorithm C as a function f_C, then f_C is a randomizing function, and it is not hard to verify that f_C satisfies the condition in Lemma 11. So f_C is optimal; furthermore, Algorithm C is optimal.
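For very short inputs, Algorithm C can be realized by brute force, which makes the definition concrete. The sketch below is our own illustration: it reuses exit_sequences() from earlier, fixes the starting state (matching the equal-probability condition of Lemma 3), and ranks the class by plain lexicographic order of the sequences; any fixed order preserves uniformity, and the (θ(X), σ(X)) order of the paper matters only for the fast computation of Section VII:

```python
from itertools import permutations

def bits_from_rank(r, size):
    """The same rank-to-bits assignment as in Elias's function."""
    offset = 0
    for i in range(size.bit_length() - 1, -1, -1):
        if (size >> i) & 1:
            if r < offset + (1 << i):
                return format(r - offset, 'b').zfill(i) if i > 0 else ''
            offset += 1 << i

def k_matrix(Z, states):
    """K = {k_ij}: the number of s_j's in the exit sequence of s_i."""
    pi = exit_sequences(Z, states)
    return tuple(tuple(pi[i].count(j) for j in sorted(states))
                 for i in sorted(states))

def algorithm_C(X, states):
    """Brute-force Algorithm C: enumerate the class S(X), rank X in it,
    and emit bits.  Exponential in N; Section VII gives the fast version."""
    K = k_matrix(X, states)
    S = sorted(set(p for p in permutations(X)
                   if p[0] == X[0] and k_matrix(p, states) == K))
    return bits_from_rank(S.index(tuple(X)), len(S))

X = ['s1','s1','s2','s1','s2']
print(algorithm_C(X, {'s1','s2'}))   # class {s1s1s2s1s2, s1s2s1s1s2}: prints '0'
```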
In Algorithm A, the limiting efficiency η_N = E[M]/N (as N → ∞) realizes the bound H(X)/N. Since Algorithm C is optimal, it has the same or higher efficiency; therefore, the limiting efficiency of Algorithm C as N → ∞ also realizes the bound H(X)/N.

VII. FAST COMPUTATION OF THE ALGORITHMS

In [11], Ryabko and Matchikina showed that when the input sequence of length N is generated by a biased coin with two faces, Elias's function is computable in O(N log^3 N log log N) time. This result is a little surprising, because |S_K| has O(N) bits and there are still a lot of operations on |S_K| needed to compute Elias's function. In this section, we first generalize Ryabko and Matchikina's procedure to the case where the input sequence of length N is generated by a biased n-face coin instead of a biased 2-face coin.

We show that their conclusion about the computational complexity is still true; as a result, Algorithm A is computable in O(N log^3 N log log N) time. We also apply the same idea of fast computation to the optimal algorithm (Algorithm C), so that the optimal algorithm is also computable in O(N log^3 N log log N) time.

A. Fast Computation of Elias's Function

Let's define an order on the states: (1) s_i < s_j iff i < j; (2) s_i = s_j iff i = j. Given any sequence X ∈ {s_1, s_2, ..., s_n}^N, we let k_i(X) be the number of s_i's in X for all 1 ≤ i ≤ n, and we write

k(X) = (k_1(X), k_2(X), ..., k_n(X))

as the counting function. Given an input sequence X = x_1 x_2 ... x_N with x_i ∈ {s_1, s_2, ..., s_n} from an n-face coin, we produce the output sequence f_E(X) based on the procedure in Section II. In that procedure, the class of X is S(X) = {X' : k(X') = k(X)}, and r(X) is the rank of X in S(X) with respect to the lexicographic order. The computation of |S(X)| and r(X) dominates the computational complexity of f_E(X).

In [12], Borwein pointed out that N! is computable in O(N (log N log log N)^2) time. As a result, |S(X)| is computable in O(N (log N log log N)^2) time.

Now, let's consider the computation of r(X). Following the idea of [11], we suppose that log_2 N is an integer; otherwise, we can add s_0's (a symbol smaller than all other states) at the beginning of X to make log_2 N an integer, which does not change the rank of X in S(X). Let's define r_i(X) as the number of sequences X' ∈ S(X) such that X'^{N−i} = X^{N−i} and x'_{N−i+1} < x_{N−i+1}; then we have

r(X) = Σ_{i=1}^{N} r_i(X).

Let T_i denote the suffix of X of length i, i.e., the subsequence of X from the (N − i + 1)-th symbol to the end. Then

r_i(X) = Σ_{s_w < x_{N−i+1}} (k_w(T_i)/i) · i! / (k_1(T_i)! ... k_n(T_i)!).

Define the values

ρ^0_i = i / k_{w_i}(T_i),   λ^0_i = (Σ_{s_w < x_{N−i+1}} k_w(T_i)) / i,

where w_i is the index of the state x_{N−i+1}, namely x_{N−i+1} = s_{w_i}. Then

r(X) = Σ_{i=1}^{N} λ^0_i ρ^0_i ρ^0_{i−1} ... ρ^0_1.

In order to compute r(X) quickly, the following calculations are performed:

ρ^s_i = ρ^{s−1}_{2i−1} ρ^{s−1}_{2i},   λ^s_i = λ^{s−1}_{2i−1} + λ^{s−1}_{2i} ρ^{s−1}_{2i−1},

for s = 1, 2, ..., log_2 N and i = 1, 2, ..., N/2^s. It is not difficult to see that

r(X) = λ^{log_2 N}_1.

(Here, differently from [11], we do not have a factor |S(X)| in this expression, because we use different definitions of ρ^0_i and λ^0_i.) Using the same argument as in [11], we conclude that r(X) = λ^{log_2 N}_1 is computable in O(N log^3 N log log N) time, which leads us to the following lemma.

Lemma 12 (Computational complexity of f_E). Given any input sequence X = x_1 x_2 ... x_N with x_i ∈ {s_1, s_2, ..., s_n} generated by a biased n-face coin, Elias's function f_E(X) is computable in O(N log^3 N log log N) time.

Based on this lemma, it is easy to get a corollary about Algorithm A.

Theorem 6 (Computational complexity of Algorithm A). Let the sequence generated by a Markov chain be used as input to Algorithm A. If the length of the input sequence is N, then the output sequence can be obtained in O(N log^3 N log log N) time.

B. Fast Computation of the Optimal Algorithm

Now, let's consider the computational complexity of Algorithm C, which is also dominated by the computation of |S(X)| and r(X). Intuitively, in order to get the optimal output sequence using Algorithm C, we might need many more computations than Algorithm A. Surprisingly, just as for Algorithm A, we show that in Algorithm C, |S(X)| is computable in O(N (log N log log N)^2) time and r(X) is computable in O(N log^3 N log log N) time.

Lemma 13. |S(X)| in Algorithm C is computable in O(N (log N log log N)^2) time.

Proof: The idea for computing |S(X)| in Algorithm C is to divide S(X) into classes S(X, θ) indexed by θ, such that

S(X, θ) = {X' : k_j(π_i(X')) = k_{ij} ∀ i, j, and θ(X') = θ},

where k_{ij} = k_j(π_i(X)) is the number of s_j's in π_i(X) for all 1 ≤ i, j ≤ n, and θ(X') is the vector of the last symbols of π(X'), defined as in the last section.
As a result, we have |S(X)| = Σ_θ |S(X, θ)|. Although it is not easy to calculate |S(X)| directly, it is much easier to compute |S(X, θ)| for a given θ. For a given θ = (θ_1, θ_2, ..., θ_n), we first need to determine whether S(X, θ) is empty or not. To do this, we construct a collection of exit sequences Λ = [Λ_1, Λ_2, ..., Λ_n] by moving the first θ_i in π_i(X) to the end, for all 1 ≤ i ≤ n. According to the main lemma, S(X, θ) is empty if and only if π_i(X) does not include θ_i for some i, or (x_1, Λ) is not feasible. If S(X, θ) is not empty, then (x_1, Λ) is feasible. According to the main lemma, we can perform tail-fixed permutations on Λ_1, Λ_2, ..., Λ_n, and the resulting new sequences still belong to S(X, θ). So we get

|S(X, θ)| = Π_{i=1}^{n} (k_{i1} + k_{i2} + ... + k_{in} − 1)! / (k_{i1}! ... (k_{iθ_i} − 1)! ... k_{in}!).
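This closed form translates directly into code. A Python sketch (ours; it assumes the feasibility of (x_1, Λ) has already been checked, e.g., with the is_feasible() sketch of Section III):

```python
from math import factorial

def class_size_theta(K, theta, states):
    """|S(X, theta)| via the product formula: for each state i, count the
    tail-fixed arrangements of pi_i with last symbol theta_i."""
    order = sorted(states)
    total = 1
    for i, row in enumerate(K):
        if sum(row) == 0:
            continue                 # state never left; no exit sequence
        t = order.index(theta[i])
        if row[t] == 0:
            return 0                 # theta_i does not occur in pi_i(X)
        num = factorial(sum(row) - 1)
        den = 1
        for j, kij in enumerate(row):
            den *= factorial(kij - 1 if j == t else kij)
        total *= num // den
    return total

# K and theta(X) for the running example X = s1 s4 s2 s1 s3 s2 s3 s1 s1 s2 s3 s4 s1:
K = ((1, 1, 1, 1), (1, 0, 2, 0), (1, 1, 0, 1), (1, 1, 0, 0))
theta = ('s2', 's3', 's4', 's1')
print(class_size_theta(K, theta, {'s1','s2','s3','s4'}))   # 6 * 2 * 2 * 1 = 24
```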