arXiv: v5 [cs.IT] 28 Feb 2015


Sampling with arbitrary precision

Luc Devroye, Claude Gravel

October, 28

Abstract

We study the problem of generating a continuous random variable when a source of independent fair coins is available. We first motivate the choice of a natural criterion for measuring accuracy, the Wasserstein $L_\infty$ metric, and then show a universal lower bound for the expected number of required fair coins as a function of the accuracy. In the case of an absolutely continuous random variable with finite differential entropy, several algorithms are presented that match the lower bound up to a constant, which can be eliminated by generating random variables in batches.

Keywords: random number generation, random bit model, differential entropy, partition entropy, inversion, probability integral transform, tree-based algorithms, random sampling

1 Introduction

Knuth and Yao [1] showed that the expected number of independent Bernoulli(1/2) random bits needed to generate an integer-valued random variable $X$ whose distribution is given by $p_i = \mathbb{P}\{X = i\}$, where $\sum_i p_i = 1$, is at least equal to the binary entropy of $X$:

$$\mathcal{E} = \mathcal{E}(X) = \mathcal{E}(\{p_i\}_{i \in \mathbb{N}}) \overset{\text{def}}{=} \sum_i p_i \log_2 \frac{1}{p_i}.$$

They also exhibited an algorithm, dubbed the DDG tree algorithm, for which the expected number of random Bernoulli(1/2) bits is not more than $\mathcal{E} + 2$. By grouping, one can thus develop algorithms for generating batches of $m$ independent copies of $X$ such that the expected number of Bernoulli(1/2) random bits per random variable does not exceed $\mathcal{E} + 2/m$. While these results settle the discrete random variate case quite satisfactorily, the generation of continuous or mixed random variables has not been treated satisfactorily in the literature. The objective of this note is to study the number of Bernoulli(1/2) random bits needed to generate a random variable $X \in \mathbb{R}^d$ with a given precision $\epsilon > 0$, provided that we can define precision in a satisfactory manner.
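The entropy bound and its batched version can be checked numerically. The following is a minimal sketch, not part of the original note: the function names and the example probability vector are hypothetical, chosen only for illustration.

```python
import math


def binary_entropy(probs):
    """Binary entropy sum_i p_i * log2(1/p_i); atoms with p_i = 0 contribute nothing."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)


def per_variate_bounds(probs, m=1):
    """Lower bound E and upper bound E + 2/m on expected fair bits per variate
    when m independent copies are generated as one batch."""
    entropy = binary_entropy(probs)
    return entropy, entropy + 2.0 / m


if __name__ == "__main__":
    probs = [0.5, 0.25, 0.125, 0.125]  # hypothetical example distribution
    for m in (1, 4, 16):
        low, high = per_variate_bounds(probs, m)
        print(f"m={m}: {low:.4f} <= expected bits per variate <= {high:.4f}")
```

As the batch size $m$ grows, the gap $2/m$ between the two bounds shrinks, which is the sense in which grouping removes the additive constant.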
Note that any algorithm that takes as input the accuracy parameter $\epsilon > 0$ returns a random variable $Y_\epsilon = f_T(B_1, \ldots, B_T)$, where $B_1, B_2, \ldots, B_T$ are independent identically distributed (i.i.d.) Bernoulli(1/2) bits, $T$ is the number of bits needed, and $f_1, f_2, \ldots$ is a given sequence of functions.

Author affiliations: McGill University; Université de Montréal.

For a vector $v \in \mathbb{R}^d$, let $\|v\|_p$ denote the $\ell_p$-norm of $v$ for $p \ge 1$:

$$\|v\|_p = \Big( \sum_{i=1}^d |v_i|^p \Big)^{1/p}.$$

For $p = \infty$, the $\infty$-norm is $\|v\|_\infty = \sup_{1 \le i \le d} |v_i|$. With $d = 1$, all $p$-norms are the same for $p \in [1, \infty]$. An algorithm with accuracy $\epsilon$ is such that for some coupling of $X$ (the target random variable) and $Y_\epsilon$, $\|X - Y_\epsilon\|_p \le \epsilon$, where $p$ is usually 2 or $\infty$. This natural notion of accuracy corresponds to the Wasserstein $L_\infty$ metric between two probability measures. For a random variable $X$, we denote by $\mathcal{L}(X)$ the distribution of $X$. Let $\mathcal{M}$ denote the space of all distributions of pairs of random variables $(X, Y) \in \mathbb{R}^d \times \mathbb{R}^d$ with fixed marginal distributions $F$ and $G$, respectively. Then the Wasserstein $L_\infty$ distance between $X$ and $Y$, or between $F$ and $G$, is

$$W_p(F, G) = \inf \{ \operatorname{ess\,sup} \|X - Y\|_p : (X, Y) \in \mathcal{M} \},$$

where ess sup denotes the essential supremum. This is a distance metric:

$$\operatorname{dist}(X, Y) \overset{\text{def}}{=} W_p(F, G).$$

If $\operatorname{dist}(X, Y) \le \epsilon$, then there exists a random variable $Y$ (output) coupled with $X$ (target) such that $\operatorname{ess\,sup} \|X - Y\|_p \le \epsilon$, i.e., with probability one, $\|X - Y\|_p < \epsilon$. This definition of distance should satisfy simulation professionals in the sense that if their calculations require the evaluation of $\Psi(X_1, \ldots, X_d)$, where $X_1, \ldots, X_d$ are given independent random variables and $\Psi$ is a real-valued function, then, with probability one,

$$|\Psi(Y_1, \ldots, Y_d) - \Psi(X_1, \ldots, X_d)| \le \sup_{y : \|y - X\|_p < \epsilon} |\Psi(y_1, \ldots, y_d) - \Psi(X_1, \ldots, X_d)|,$$

which is usually a quantity that can easily be controlled. We believe that software packages should have the capability of accepting $\epsilon$ as an input parameter in random variate generation.

It is interesting that $\mathbb{E}(T)$, the expected value of $T$, can be related to the entropy almost in the way Knuth and Yao did for discrete random variables in [1]. Our note provides the foundational background for such a study in terms of universal lower bounds and various useful upper bounds for particular algorithms. We include examples for the main distributions. Several authors have addressed the problem of arbitrary precision in sampling algorithms.
These include Flajolet and Saheb [2], who explain how to generate the first $k$ bits of an exponential random variable for an integer $k$. Karney [3] describes an algorithm for the standard normal distribution. Lumbroso's thesis [4] also discusses arbitrary precision sampling.

The quantity that appears in our lower bound is the partition entropy. More precisely, let $\mathcal{A}$ be a partition of $\mathbb{R}^d$ and let $\epsilon > 0$ be a fixed parameter for the precision. The partition entropy of $X$ with respect to $\mathcal{A}$ is the quantity

$$\mathcal{E}_{\mathcal{A}}(X) = \sum_{A \in \mathcal{A}} \mathbb{P}\{X \in A\} \log_2 \frac{1}{\mathbb{P}\{X \in A\}}.$$

While our results apply to all distributions, we will mainly focus on absolutely continuous distributions, i.e., random variables $X$ with density $f$. We recall the definition of differential entropy:

$$\mathcal{E}(f) \overset{\text{def}}{=} \int_{x \in \mathbb{R}^d} f(x) \log_2 \frac{1}{f(x)} \, dx.$$

The differential entropy can be ill-defined, $-\infty$, finite or $+\infty$. For more information on differential entropy and entropy in general, one can read Cover and Thomas [5]. When $X$ has compact support, the case $+\infty$ cannot occur. When $f$ is bounded, the case $-\infty$ is excluded. When $\mathcal{E}(\lfloor X_1 \rfloor, \ldots, \lfloor X_d \rfloor) < \infty$, it can be shown that $\mathcal{E}(f)$ is well-defined and either finite or $-\infty$; see Rényi [6], Csiszár [7] for a proof. Our main result is the following:

Theorem. Let $X \in \mathbb{R}^d$ be a target random vector with density $f$, and assume that $\mathcal{E}(\lfloor X_1 \rfloor, \ldots, \lfloor X_d \rfloor) < \infty$. Consider any algorithm that for given $\epsilon > 0$ outputs $Y = Y_\epsilon$ using $T$ random fair coins, such that $W_p(X, Y_\epsilon) \le \epsilon$. Then

$$\mathbb{E}(T) \ge \mathcal{E}(f) + d \log_2 \frac{1}{\epsilon} - \log_2 \frac{\big( 2 \Gamma(\tfrac{1}{p} + 1) \big)^d}{\Gamma(\tfrac{d}{p} + 1)}.$$

For $p = 1$ and $p = \infty$, the third term in the lower bound is $\log_2 (2^d / d!)$ and $d$, respectively. For $d = 1$, it is 1.

The second part of this note describes several algorithms that come within a constant term of this lower bound, and therefore are basically optimal if grouping is used for generation. Before tackling all that, we introduce a brief section in which we recall the main exact sampling algorithms for discrete random variables, and their properties, as these will be essential for the understanding of the main algorithms.

2 Bounds for the discrete case

In this section, we give simple proofs of two important results for generating discrete random variables: the optimal algorithm of Knuth and Yao [1] and the more practical but slightly suboptimal algorithm of Han and Hoshi [8]. We recall that we want to sample $X$ with probability vector $(p_1, p_2, \ldots)$. Every $p_i$ is decomposed into its binary representation

$$p_i = \sum_j \frac{b_{i,j}}{2^j},$$

where $b_{i,j} \in \{0, 1\}$ for all $i, j$.
Consider the new random variable $Z$ with probability vector

$$\Big( \frac{b_{1,1}}{2}, \frac{b_{1,2}}{2^2}, \ldots, \frac{b_{2,1}}{2}, \frac{b_{2,2}}{2^2}, \ldots \Big) \overset{\text{def}}{=} (p_{1,1}, p_{1,2}, \ldots, p_{2,1}, p_{2,2}, \ldots), \qquad (2)$$

with $p_{i,j} = b_{i,j} / 2^j$. Any algorithm for generating a discrete random variable that uses a source of i.i.d. fair coins $B_1, B_2, \ldots$ and is based on a stopping time $T$ at which it returns an output can be viewed as a binary tree, where $B_1, B_2, \ldots$ uniquely determines an infinite path in the tree (by the rule that 0 is left and 1 is right). We refer to this general class of algorithms as tree-based algorithms; they include all possible practical algorithms. Leaves in this tree correspond to outputs. The algorithm of Knuth and Yao can be implemented by a binary tree, a DDG tree, in which each leaf at level $j$ corresponds to a bit $b_{i,j}$ in (2). One randomly walks down this tree starting at the root and reaches the leaf for $b_{i,j}$ with probability $1/2^j = p_{i,j}$. At that point, the value $i$ is returned, and indeed, $\mathbb{P}\{X = i\} = \sum_j p_{i,j} = p_i$, as required. A tree-based algorithm is optimal if it minimizes $\mathbb{E}(T)$ for a given probability vector.

Theorem 1 (Knuth and Yao [1]). The expected number of bits of an optimum tree-based algorithm for sampling $(p_1, p_2, \ldots)$ is bounded from below by $\mathcal{E}(\{p_i\}_i)$ and from above by $\mathcal{E}(\{p_i\}_i) + 2$.

Proof of Theorem 1. Given a probability vector $(p_1, p_2, \ldots, p_n)$ with $n$ possibly infinite, let the binary expansion of $p_i$ for $i \in \{1, \ldots, n\}$ be

$$p_i = \sum_{j \ge 1} \frac{b_{i,j}}{2^j}, \qquad b_{i,j} \in \{0, 1\}.$$

If $T$ denotes the number of bits required by an optimal algorithm for sampling $(p_1, p_2, \ldots, p_n)$, then for $t \ge 1$,

$$\mathbb{P}\{T = t\} = \frac{\text{number of leaves at level } t}{2^t} = \sum_{i=1}^n \frac{b_{i,t}}{2^t}.$$

Therefore,

$$\mathbb{E}(T) = \sum_{t \ge 1} t \, \mathbb{P}\{T = t\} = \sum_{t \ge 1} t \sum_{i=1}^n \frac{b_{i,t}}{2^t} = \sum_{i=1}^n \Big( \sum_{t \ge 1} \frac{t \, b_{i,t}}{2^t} \Big). \qquad (3)$$

We now show that the quantity within parentheses in (3) is lower bounded by $p_i \log_2 (1/p_i)$ and upper bounded by $p_i \log_2 (1/p_i) + 2 p_i$, and then the result follows. For convenience, let $x \in [0, 1]$ and let its binary expansion be $x = \sum_j x_j / 2^j$. To complete the proof, it remains to prove that

$$x \log_2 \frac{1}{x} \le \sum_{j \ge 1} \frac{j x_j}{2^j} \le x \log_2 \frac{1}{x} + 2x. \qquad (4)$$

If $m$ is the index of the first non-zero coefficient of the binary expansion of $x$, then there are two cases: either (1) $x = 2^{-m}$, or (2) $2^{-m} < x < 2 \cdot 2^{-m}$. The inequalities are strict in case 2 since $x \ne 2^{-m}$. In the first case, the only non-zero coefficient is $x_m = 1$, so the middle sum in (4) equals $m 2^{-m} = x \log_2 (1/x)$ and (4) is obviously true. In the second case, $2^{-m} < x < 2 \cdot 2^{-m}$ or, equivalently, $0 < m + \log_2 x < 1$.

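Then for the upper bound,

$$\sum_{j \ge 1} \frac{j x_j}{2^j} = \frac{m}{2^m} + \sum_{j \ge m+1} \frac{j x_j}{2^j} \le x (m + 1) < x (2 - \log_2 x),$$

where the middle inequality uses $\sum_{j > m} (j - m - 1) x_j 2^{-j} \le \sum_{j > m} (j - m - 1) 2^{-j} = 2^{-m}$ and the last one uses $m - 1 < \log_2 (1/x)$. For the lower bound,

$$\sum_{j \ge 1} \frac{j x_j}{2^j} = \sum_{j \ge m} \frac{j x_j}{2^j} \ge m \sum_{j \ge m} \frac{x_j}{2^j} = m x > x \log_2 \frac{1}{x}.$$

We now recall the Han-Hoshi algorithm published in [8], which implements the inversion method. Given $p_1, \ldots, p_n$ with $n$ finite or countably infinite, the algorithm partitions the interval $[0, 1)$ into a countable collection of disjoint subintervals $[Q_{i-1}, Q_i)$, where $Q_0 = 0$ and

$$Q_i = \sum_{k=1}^i p_k, \qquad i \in \{1, \ldots, n\}.$$

The idea behind the algorithm is to iteratively refine a random interval $I \subseteq [0, 1)$ and to stop when $I \subseteq [Q_{i-1}, Q_i)$ for a certain $i \in \{1, \ldots, n\}$. The inversion principle says that if $U$ is uniformly distributed on $[0, 1]$, then the unique $i \in \{1, \ldots, n\}$ such that $Q_{i-1} \le U < Q_i$ is distributed according to $p_1, \ldots, p_n$. For a binary random source of unbiased i.i.d. bits, their algorithm is as follows:

Algorithm 1 The Han-Hoshi algorithm using a binary source
    $T \leftarrow 0$
    $\alpha_0 \leftarrow 0$
    $\beta_0 \leftarrow 1$
    repeat
        $T \leftarrow T + 1$
        $B \leftarrow$ RandomBit
        $\alpha_T \leftarrow \alpha_{T-1} + (\beta_{T-1} - \alpha_{T-1}) B / 2$
        $\beta_T \leftarrow \alpha_{T-1} + (\beta_{T-1} - \alpha_{T-1}) (B + 1) / 2$
        $I \leftarrow [\alpha_T, \beta_T)$
    until $I \subseteq [Q_{i-1}, Q_i)$ for some $i$
    return $i$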
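Algorithm 1 translates almost line by line into code. The sketch below is an illustration under stated assumptions rather than the authors' implementation: it assumes a finite probability vector given as exact rationals, half-open cells $[Q_{i-1}, Q_i)$, and a caller-supplied `next_bit` function (a hypothetical name) that returns fair bits.

```python
from bisect import bisect_right
from fractions import Fraction
import random


def han_hoshi(probs, next_bit):
    """Return (i, T): a 1-based index i distributed as probs, and the number T of bits used.

    probs must be exact rationals (Fraction, int, or str) summing to 1.
    """
    cum = [Fraction(0)]                      # cum[i] = Q_i, with Q_0 = 0
    for p in probs:
        cum.append(cum[-1] + Fraction(p))
    if cum[-1] != 1:
        raise ValueError("probabilities must sum to 1 exactly")
    alpha, beta, t = Fraction(0), Fraction(1), 0
    while True:                              # repeat ... until of Algorithm 1
        t += 1
        b = next_bit()
        half = (beta - alpha) / 2
        alpha, beta = alpha + half * b, alpha + half * (b + 1)
        i = bisect_right(cum, alpha)         # unique i with Q_{i-1} <= alpha < Q_i
        if beta <= cum[i]:                   # [alpha, beta) is inside [Q_{i-1}, Q_i)
            return i, t


if __name__ == "__main__":
    rng = random.Random(2015)
    probs = [Fraction(1, 3), Fraction(1, 6), Fraction(1, 2)]
    print([han_hoshi(probs, lambda: rng.getrandbits(1)) for _ in range(5)])
```

Exact rational arithmetic keeps the stopping test free of rounding issues; with floating-point cumulative sums the containment check could fail near cell boundaries.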

Figures 1 and 2 are examples that show the underlying DDG tree during the execution of the Han and Hoshi algorithm.

Figure 1: Illustration of the algorithm of Han and Hoshi on the vector $(p_1, p_2, p_3, p_4, p_5, p_6, p_7) = (\tfrac{1}{16}, \tfrac{5}{32}, \tfrac{5}{32}, \tfrac{9}{32}, \tfrac{3}{16}, \tfrac{1}{32}, \tfrac{1}{8})$. The cumulative values are $q_1 = \tfrac{1}{16}$, $q_2 = \tfrac{7}{32}$, $q_3 = \tfrac{3}{8}$, $q_4 = \tfrac{21}{32}$, $q_5 = \tfrac{27}{32}$, $q_6 = \tfrac{7}{8}$, and $q_7 = 1$.

Figure 2: Illustration of the Han and Hoshi algorithm on a vector $(p_1, p_2, p_3, p_4)$ specified through the values of $p_1$, $p_1 + p_2$, and $p_1 + p_2 + p_3$.

Let $T$ be the number of random coins needed, which is also the number of iterations of the repeat loop. For $T \ge 0$, $[\alpha_T, \beta_T) \supseteq [\alpha_{T+1}, \beta_{T+1})$. To every node (internal or external) corresponds an interval $[\alpha_T, \beta_T)$. The root corresponds to the interval $[0, 1)$. Each internal node corresponds to an interval $[\alpha_T, \beta_T)$ that is not contained in any of the intervals $[Q_{i-1}, Q_i)$. If the source produces $B = 0$, then the left child corresponds to the interval $[\alpha_T, (\alpha_T + \beta_T)/2) = [\alpha_{T+1}, \beta_{T+1})$, and if $B = 1$, then the right child corresponds to $[(\alpha_T + \beta_T)/2, \beta_T) = [\alpha_{T+1}, \beta_{T+1})$. Each leaf (external node) corresponds to an interval $[\alpha_T, \beta_T)$ entirely contained in some $[Q_{i-1}, Q_i)$, upon which the integer $i$ is returned; overall, $i$ is returned with probability $Q_i - Q_{i-1} = p_i$. The following theorem was proved by Han and Hoshi [8]:

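Theorem 2. For the Han-Hoshi algorithm,

$$\sum_i p_i \log_2 \frac{1}{p_i} \le \mathbb{E}(T) \le \sum_i p_i \log_2 \frac{1}{p_i} + 3.$$

Proof of Theorem 2. Our new proof partitions the leaves $L_i$ for symbol $i$ in the DDG tree arbitrarily into two sets, $A_i$ and $B_i$, such that $A_i$ and $B_i$ each possess at most one leaf per level. Let

$$\alpha_i = \sum_{u \in A_i} p(u), \qquad \beta_i = \sum_{u \in B_i} p(u),$$

where $p(u)$ is the probability attached to leaf $u$, i.e., $1/2^{\operatorname{depth}(u)}$. We have $p_i = \alpha_i + \beta_i$. By nesting and elementary calculations,

$$\sum_i p_i \log_2 \frac{1}{p_i} \le \sum_i \alpha_i \log_2 \frac{1}{\alpha_i} + \sum_i \beta_i \log_2 \frac{1}{\beta_i} \le \sum_i p_i \log_2 \frac{1}{p_i} + 1. \qquad (5)$$

Let $\alpha_i(j)$ be the $j$-th bit in the binary expansion of $\alpha_i$, and let $\beta_i(j)$ be the $j$-th bit for $\beta_i$. Then

$$\mathbb{E}(T) = \sum_i \sum_j \frac{j \alpha_i(j)}{2^j} + \sum_i \sum_j \frac{j \beta_i(j)}{2^j} \overset{\text{def}}{=} I + II.$$

As in the proof of Theorem 1, we have

$$\sum_i \alpha_i \log_2 \frac{1}{\alpha_i} \le I \le \sum_i \Big( \alpha_i \log_2 \frac{1}{\alpha_i} + 2 \alpha_i \Big), \qquad \sum_i \beta_i \log_2 \frac{1}{\beta_i} \le II \le \sum_i \Big( \beta_i \log_2 \frac{1}{\beta_i} + 2 \beta_i \Big),$$

so that, using (5),

$$\sum_i p_i \log_2 \frac{1}{p_i} \le \mathbb{E}(T) \le \sum_i p_i \log_2 \frac{1}{p_i} + 1 + 2 \sum_i p_i = \sum_i p_i \log_2 \frac{1}{p_i} + 3.$$

3 Lower bound for generating continuous random vectors

In this section, we give a lower bound for the complexity of sampling any continuous distribution to an arbitrary precision. Let $\mathcal{A}$ be a countable partition of $\mathbb{R}^d$, and let $\epsilon > 0$ be a fixed precision parameter. Consider the infinite graph $G$ whose vertices are the sets $A \in \mathcal{A}$ and whose edges are all pairs $(A, B) \in \mathcal{A} \times \mathcal{A}$ such that

$$\inf_{x \in A, \, y \in B} \|x - y\|_p < \epsilon.$$

Therefore, if $(A, B)$ is not an edge of $G$, then $\|x - y\|_p \ge \epsilon$ for all $x \in A$, $y \in B$. Let $\Delta$ be the maximal degree of any vertex of $G$. We now state a lemma that we shall use in conjunction with the Knuth-Yao result, mentioned and reproved in the previous section, in order to prove our main theorem mentioned in the introduction.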

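Lemma 1. Let $X$ be a target random vector of $\mathbb{R}^d$. Let $Y$ be an output with the property that, with probability one, $\|X - Y\|_p < \epsilon$. Let $T$ be the number of bits used to generate $Y$ by an algorithm. Then

$$\mathbb{E}(T) \ge \sup_{\mathcal{A}} \big\{ \mathcal{E}_{\mathcal{A}}(X) - \log_2 (\Delta + 1) \big\},$$

where $\mathcal{A}$ and $\Delta$ are as above.

We can maximize the bound from Lemma 1, of course, by selecting the most advantageous combination of partition $\mathcal{A}$ and degree $\Delta$. The bound from Lemma 1 coincides with the bound in Shannon [9] when the distribution of $X$ is discrete with a finite number of atoms since, in that case, $\Delta = 0$ by choosing $\epsilon$ sufficiently small.

Proof of Lemma 1. Let $X$ and $Y$ be two dependent random variables of $\mathbb{R}^d$, and denote $p_{AB} = \mathbb{P}\{X \in A, Y \in B\}$. Note that $p_{AB} = 0$ if $(A, B)$ is not an edge of $G$. Thus, by Jensen's inequality,

$$\mathcal{E}_{\mathcal{A}}(X) - \mathcal{E}_{\mathcal{A}}(Y) = \sum_{(A, B) \in \mathcal{A} \times \mathcal{A}} \mathbb{P}\{X \in A, Y \in B\} \log_2 \frac{\mathbb{P}\{Y \in B\}}{\mathbb{P}\{X \in A\}} \le \log_2 \Big( \sum_{(A, B) \in G} \mathbb{P}\{X \in A, Y \in B\} \frac{\mathbb{P}\{Y \in B\}}{\mathbb{P}\{X \in A\}} \Big)$$

$$\le \log_2 \Big( \sum_{B \in \mathcal{A}} \sum_{A : (A, B) \in G} \mathbb{P}\{Y \in B\} \Big) \le \log_2 \Big( (\Delta + 1) \sum_{B \in \mathcal{A}} \mathbb{P}\{Y \in B\} \Big) = \log_2 (\Delta + 1).$$

If $T$ is the random number of bits needed to generate a discrete random variable $Y$ that outputs a vertex $A$ of $G$ with probability $\mathbb{P}\{Y \in A\}$, then

$$\mathbb{E}(T) \ge \mathcal{E}_{\mathcal{A}}(Y) \ \text{(by Knuth-Yao)} \ \ge \mathcal{E}_{\mathcal{A}}(X) - \log_2 (\Delta + 1),$$

and therefore

$$\mathbb{E}(T) \ge \sup_{\mathcal{A}} \big\{ \mathcal{E}_{\mathcal{A}}(X) - \log_2 (\Delta + 1) \big\}. \qquad (6)$$

It is interesting to recall a general result from Csiszár [7] about the hypercube partition entropy of an absolutely continuous random vector $X$ that will become useful later. Of particular interest to us is the cubic partition $\mathcal{A}_h$ parametrized by $h > 0$. The cells of this partition are of the form

$$\prod_{j=1}^d [i_j h, (i_j + 1) h), \qquad (i_1, \ldots, i_d) \in \mathbb{Z}^d.$$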

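We recall that if $\lfloor X \rfloor \overset{\text{def}}{=} (\lfloor X_1 \rfloor, \ldots, \lfloor X_d \rfloor)$ has finite entropy (a condition we refer to as Rényi's condition), then if $X$ has a density $f$, $\mathcal{E}(f) = \int f \log_2 (1/f)$ is well-defined, i.e., it is either finite or $-\infty$. We have:

Lemma 2. Under Rényi's condition, for a general partition $\mathcal{A}$ and a random variable $X \in \mathbb{R}^d$ with density $f$,

$$\mathcal{E}_{\mathcal{A}}(X) \ge \mathcal{E}(f) + \sum_{A \in \mathcal{A}} \mathbb{P}\{X \in A\} \log_2 \frac{1}{\lambda(A)},$$

where $\lambda$ denotes the Lebesgue measure. In particular,

$$\mathcal{E}_{\mathcal{A}_h}(X) \ge \mathcal{E}(f) + d \log_2 \frac{1}{h}.$$

Proof of Lemma 2. Fix $A \in \mathcal{A}$. If $Z$ is uniform on $A$ and $Y = f(Z)$, then $\mathbb{P}\{X \in A\} = \int_A f = \lambda(A) \mathbb{E}(Y)$. Thus

$$\mathbb{P}\{X \in A\} \log_2 \frac{\lambda(A)}{\mathbb{P}\{X \in A\}} = \lambda(A) \, \mathbb{E}(Y) \log_2 \frac{1}{\mathbb{E}(Y)},$$

and, by Jensen's inequality and the concavity of $x \log_2 (1/x)$,

$$\mathbb{E}(Y) \log_2 \frac{1}{\mathbb{E}(Y)} \ge \mathbb{E} \Big\{ Y \log_2 \frac{1}{Y} \Big\} = \frac{1}{\lambda(A)} \int_A f \log_2 \frac{1}{f}.$$

The inequality follows by summing over $A \in \mathcal{A}$.

Lemma 3 (Csiszár [7]). Let $X \in \mathbb{R}^d$ have density $f$, and let Rényi's condition be satisfied. If $\mathcal{E}(f) > -\infty$, then as $h \downarrow 0$,

$$\mathcal{E}_{\mathcal{A}_h}(X) = \mathcal{E}(f) + d \log_2 \frac{1}{h} + o(1).$$

If $\mathcal{E}(f) = -\infty$, then as $h \downarrow 0$,

$$\mathcal{E}_{\mathcal{A}_h}(X) - d \log_2 \frac{1}{h} \to -\infty.$$

Remark 1. The fifth theorem of Csiszár [7] stipulates that if $\mathcal{E}(\lfloor X_1 \rfloor, \ldots, \lfloor X_d \rfloor) < \infty$ and the distribution of $X$ is not absolutely continuous, then, as $h \downarrow 0$,

$$\mathcal{E}_{\mathcal{A}_h}(X) - d \log_2 \frac{1}{h} \to -\infty.$$

For more information about the asymptotic theory for the entropy of partitioned distributions as the partitions become finer, one can consult Rényi [6], Csiszár [7], Csiszár [10], and Linder and Zeger [11].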
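Lemmas 2 and 3 can be illustrated numerically in dimension one. The sketch below is only an illustration: it assumes a rate-1 exponential target, for which $\mathcal{E}(f) = \log_2 e$, and computes the gap $\mathcal{E}_{\mathcal{A}_h}(X) - \mathcal{E}(f) - \log_2 (1/h)$ by direct summation over the cells $[ih, (i+1)h)$. The function names and the truncation tolerance `tol` are hypothetical choices.

```python
import math


def cubic_partition_entropy_exponential(h, tol=1e-15):
    """Partition entropy (in bits) of a rate-1 exponential for the cells [i*h, (i+1)*h)."""
    q = -math.expm1(-h)                 # probability of the first cell: 1 - exp(-h)
    total, i = 0.0, 0
    while True:
        p = math.exp(-i * h) * q        # P{X in [i*h, (i+1)*h)}
        if p < tol:                     # the remaining tail is numerically negligible
            return total
        total += p * math.log2(1.0 / p)
        i += 1


def lemma3_gap(h):
    """Partition entropy minus E(f) + log2(1/h); nonnegative by Lemma 2, o(1) by Lemma 3."""
    differential_entropy = math.log2(math.e)
    return cubic_partition_entropy_exponential(h) - (differential_entropy + math.log2(1.0 / h))


if __name__ == "__main__":
    for h in (1.0, 0.1, 0.01):
        print(f"h={h}: gap={lemma3_gap(h):.3e}")
```

The gap stays positive and shrinks as $h$ decreases, consistent with the inequality of Lemma 2 and the limit in Lemma 3.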

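Theorem 3. Let $X \in \mathbb{R}^d$ have density $f$. Let $Y$ be an output with the property that, with probability one, $\|X - Y\|_p < \epsilon$. Then, under Rényi's condition,

$$\mathbb{E}(T) \ge \mathcal{E}(f) + d \log_2 \frac{1}{\epsilon} - \log_2 V_{d,p},$$

where

$$V_{d,p} = \frac{2^d \, \Gamma(\tfrac{1}{p} + 1)^d}{\Gamma(\tfrac{d}{p} + 1)}$$

is the volume of the unit $\ell_p$-ball in $\mathbb{R}^d$, and $T$ is the number of random bits needed to generate $Y$.

Proof of Theorem 3. Let $\mathcal{A}_h$ be a cubic partition. Then

$$\mathbb{E}(T) \ge \sup_h \big\{ \mathcal{E}_{\mathcal{A}_h}(X) - \log_2 (\Delta_h + 1) \big\},$$

where $\Delta_h$ is the maximal degree in the graph on $\mathcal{A}_h \times \mathcal{A}_h$ defined by connecting $A \in \mathcal{A}_h$ with $B \in \mathcal{A}_h$ if $\inf_{x \in A, y \in B} \|x - y\|_p < \epsilon$. We set $h = \epsilon / n$ and use

$$\mathbb{E}(T) \ge \limsup_{n \to \infty} \big\{ \mathcal{E}_{\mathcal{A}_{\epsilon/n}}(X) - \log_2 (\Delta_{\epsilon/n} + 1) \big\}.$$

Observe that if $B_r$ denotes the $\ell_p$-ball of radius $r$ centered at 0, then by elementary geometric considerations,

$$\frac{\lambda(B_\epsilon)}{h^d} \le \Delta_h \le \frac{\lambda(B_{\epsilon + 2 h d^{1/p}})}{h^d},$$

so that as $n \to \infty$,

$$\Delta_{\epsilon/n} \sim n^d \lambda(B_1) = V_{d,p} \, n^d.$$

Also,

$$\mathcal{E}_{\mathcal{A}_{\epsilon/n}}(X) = \mathcal{E}(f) + d \log_2 \frac{n}{\epsilon} + o(1),$$

so that

$$\mathcal{E}_{\mathcal{A}_{\epsilon/n}}(X) - \log_2 (\Delta_{\epsilon/n} + 1) = \mathcal{E}(f) + d \log_2 \frac{n}{\epsilon} - \log_2 \big( V_{d,p} \, n^d \big) + o(1) \to \mathcal{E}(f) + d \log_2 \frac{1}{\epsilon} - \log_2 V_{d,p}.$$

4 Upper bound for partition-based algorithms

Consider a random variable $X \in \mathbb{R}^d$. We call a partition $\mathcal{A}$ an $\epsilon$-partition if for every set $A \in \mathcal{A}$, there exists $x_A \in A$ (called the center) such that

$$\sup_{y \in A} \|x_A - y\|_p \le \epsilon.$$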

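Then any algorithm that selects $A \in \mathcal{A}$ with probability $p(A) \overset{\text{def}}{=} \mathbb{P}\{X \in A\} = \int_A f$ can be used to generate a random variable $Y$ that approximates $X$ to within $\epsilon$. After generating $A$, set $Y = x_A$. Then, necessarily, there is a coupling $(X, Y)$ with $\|X - Y\|_p \le \epsilon$. If the selection of $A$ is done with the help of the method of Knuth and Yao, then, if $T$ still denotes the number of random bits required,

$$\mathbb{E}(T) \le \mathcal{E}_{\mathcal{A}}(X) + 2.$$

For $p = \infty$, we can take $\mathcal{A} = \mathcal{A}_{2\epsilon}$, the cubic partition with sides $2\epsilon$. For $d = 1$, a simple partition into intervals of length $2\epsilon$ can be used for all values of $p$. If $X$ has a density $f$ and $p = \infty$ or $d = 1$, then the procedure suggested above has, as $\epsilon \downarrow 0$,

$$\mathbb{E}(T) \le \mathcal{E}_{\mathcal{A}_{2\epsilon}}(X) + 2 = \mathcal{E}(f) + d \log_2 \frac{1}{2\epsilon} + 2 + o(1) = \mathcal{E}(f) + d \log_2 \frac{1}{\epsilon} + 2 - d + o(1),$$

where in the last step we assume Rényi's condition and $\mathcal{E}(f) > -\infty$. Compare with the lower bound

$$\mathbb{E}(T) \ge \mathcal{E}(f) + d \log_2 \frac{1}{\epsilon} - d,$$

and note that the difference is $2 + o(1)$. For later reference, we recall these values of $\mathcal{E}(f)$ for the main distributions:

Uniform[0, 1]: $\mathcal{E}(f) = 0$,
Exponential: $\mathcal{E}(f) = \log_2 e$,
Normal(0, 1): $\mathcal{E}(f) = \log_2 \sqrt{2 \pi e}$.

Recall that for $X \in \mathbb{R}$ and $a > 0$, a scale factor $a$ shows up as $\log_2 a$ in the upper and lower bounds because $\mathcal{E}(aX) = \mathcal{E}(X) + \log_2 a$.

For general $p \in [1, \infty)$, we can take $\mathcal{A} = \mathcal{A}_{2\epsilon / d^{1/p}}$, the cubic partition with sides $2\epsilon / d^{1/p}$. Under Rényi's condition and $\mathcal{E}(f) > -\infty$, we have

$$\mathbb{E}(T) \le \mathcal{E}_{\mathcal{A}_{2\epsilon / d^{1/p}}}(X) + 2 = \mathcal{E}(f) + d \log_2 \frac{1}{\epsilon} + 2 - d + \frac{d}{p} \log_2 d + o(1).$$

The difference with the lower bound is

$$D = 2 + \frac{d}{p} \log_2 d + d \log_2 \Gamma\Big(\frac{1}{p} + 1\Big) - \log_2 \Gamma\Big(\frac{d}{p} + 1\Big) + o(1).$$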

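Using $\Gamma(1 + u) \sim (u/e)^u \sqrt{2 \pi u}$ as $u \to \infty$, we obtain

$$D = 2 + d \log_2 \Big( \Gamma\Big(\frac{1}{p} + 1\Big) (e p)^{1/p} \Big) - \frac{1}{2} \log_2 \Big( \frac{2 \pi d}{p} \Big) + o(1),$$

which unfortunately increases linearly with $d$. To avoid this growing differential (which we did not have for $p = \infty$), it seems necessary to consider partitions that better approximate $\ell_p$-balls.

5 Upper bound for inversion-based algorithms

The inversion method for generating a random variable $X$ with distribution function $F$ uses the property that $X = F^{-1}(U)$ has distribution function $F$, where $F^{-1}$ denotes the inverse and $U$ is uniform on $[0, 1]$. One can use this method as a basis for generating an approximation using only a few random bits. In particular, if

$$U = 0.U_1 U_2 \ldots = \sum_{j \ge 1} \frac{U_j}{2^j},$$

and $U_1, U_2, \ldots$ are independent Bernoulli(1/2) random variables, then setting

$$U_t^- = 0.U_1 \ldots U_t = \sum_{j=1}^t \frac{U_j}{2^j}, \qquad U_t^+ = 0.U_1 \ldots U_t + 2^{-t} = U_t^- + 2^{-t},$$

we have

$$U_t^- \le U \le U_t^+.$$

Note that $U_0^- = 0$ and $U_0^+ = 1$.

Figure 3: Inversion method illustrated. The graph of $F(x)$ maps the interval $[U_t^-, U_t^- + 2^{-t}]$ containing $U$ on the vertical axis to the interval $[X_t^-, X_t^+] = [F^{-1}(U_t^-), F^{-1}(U_t^- + 2^{-t})]$ containing $X = F^{-1}(U)$ on the horizontal axis.

The number of random coins is

$$T = \min \{ t \ge 0 : F^{-1}(U_t^+) - F^{-1}(U_t^-) \le 2\epsilon \}.$$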
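The stopping rule for $T$ can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' code: `inv_cdf` stands for an oracle for $F^{-1}$, `next_bit` is a hypothetical source of fair bits, and ordinary floating-point arithmetic replaces exact arithmetic, so the guarantee only holds up to rounding. The returned value is the midpoint of the final interval $[F^{-1}(U_T^-), F^{-1}(U_T^+)]$, which lies within $\epsilon$ of $X = F^{-1}(U)$.

```python
import math
import random


def inversion_sample(inv_cdf, eps, next_bit):
    """Return (Y, T): an eps-accurate approximation Y of X = inv_cdf(U) and the bits used."""
    numerator, t = 0, 0                  # U_t^- = numerator / 2**t, U_t^+ = U_t^- + 2**-t
    while True:
        x_lo = inv_cdf(numerator / 2 ** t)
        x_hi = inv_cdf((numerator + 1) / 2 ** t)
        if x_hi - x_lo <= 2 * eps:       # stopping rule defining T
            return (x_lo + x_hi) / 2, t
        numerator = 2 * numerator + next_bit()
        t += 1


def exponential_inverse_cdf(u):
    """Inverse distribution function of the rate-1 exponential, with F^{-1}(1) = +infinity."""
    return math.inf if u >= 1.0 else -math.log1p(-u)


if __name__ == "__main__":
    rng = random.Random(7)
    for eps in (1e-1, 1e-3, 1e-6):
        y, t = inversion_sample(exponential_inverse_cdf, eps, lambda: rng.getrandbits(1))
        print(f"eps={eps}: Y={y:.7f} using T={t} bits")
```

For the uniform law on $[0, 1]$ (identity `inv_cdf`), the number of bits does not depend on the bits drawn: with $\epsilon = 1/16$ the loop always stops after 3 bits.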

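If we define

$$Y = \frac{F^{-1}(U_T^+) + F^{-1}(U_T^-)}{2},$$

then $X$ and $Y$ are coupled in such a way that $|X - Y| \le \epsilon$. The $T$ defined above is also the number of bits needed to generate $Y$. Observe that the inversion method requires $F^{-1}$ as a black box, also called an oracle. On the other hand, it avoids the cumbersome calculation of the cell probabilities and the set-up of the Knuth-Yao DDG tree, and thus shines by its simplicity.

In spirit, the inversion method mimics the method of Han and Hoshi, and indeed, this observation leads to a simple bound. Let $\mathcal{A}_{2\epsilon}$ be a cubic partition of $\mathbb{R}$ into intervals of equal length $2\epsilon$. Denote the probabilities of these intervals by $p(A) = \mathbb{P}\{X \in A\}$, $A \in \mathcal{A}_{2\epsilon}$. Assume that we select an interval from $\mathcal{A}_{2\epsilon}$ following this law by the method of Han and Hoshi, using the random bits $U_1, U_2, U_3, \ldots$ also used in the inversion method. It is easy to see that the number of bits needed before halting in the inversion method is smaller. Therefore, for the inversion method, we have

$$\mathbb{E}(T) \le \mathcal{E}_{\mathcal{A}_{2\epsilon}}(X) + 3.$$

From this and Lemma 3, we conclude:

Theorem 4. If $X$ has a density $f$ satisfying Rényi's condition and if $\int f \log_2 (1/f) > -\infty$, then as $\epsilon \downarrow 0$,

$$\mathbb{E}(T) \le \log_2 \frac{1}{\epsilon} + \int f \log_2 \frac{1}{f} + 2 + o(1).$$

Remark 2. Comparing Theorem 4 with the universal lower bound $\mathbb{E}(T) \ge \log_2 (1/\epsilon) + \mathcal{E}(f) - 1$, we see that the difference is $3 + o(1)$.

Remark 3. The partition-based method has an upper bound for $\mathbb{E}(T)$ that is one less. However, the simplicity of the inversion method should not be underestimated. In addition, one can tighten the analysis under additional conditions on $f$ such as unimodality, monotonicity, or for specific forms.

Theorem 5. Assume that $X$ has a bounded nonincreasing density $f$ on $[0, \infty)$. Then for the inversion method, as $\epsilon \downarrow 0$,

$$\mathbb{E}(T) \le \log_2 \frac{1}{\epsilon} + \int f \log_2 \frac{1}{f} + o(1).$$

Proof of Theorem 5. Define $X_t^- = F^{-1}(U_t^-)$ and $X_t^+ = F^{-1}(U_t^+)$ as in Figure 3. Then

$$\mathbb{E}(T) = \sum_{t \ge 0} \mathbb{P}\{X_t^+ - X_t^- > 2\epsilon\} \le \sum_{t \ge 0} \mathbb{P}\Big\{ f(X_t^+) < \frac{2^{-t}}{2\epsilon} \Big\},$$

because

$$2^{-t} = \int_{X_t^-}^{X_t^+} f \ge f(X_t^+) (X_t^+ - X_t^-).$$

Hence

$$\mathbb{E}(T) \le \sum_{t \ge 0} \mathbb{P}\Big\{ f(X) < \frac{2^{-t}}{2\epsilon} \Big\} + \sum_{t \ge 0} \mathbb{P}\Big\{ f(X_t^+) < \frac{2^{-t}}{2\epsilon} \le f(X) \Big\} \overset{\text{def}}{=} I + II.$$

Now, for $\epsilon$ small enough that $2 \epsilon f(0) \le 1$,

$$I \le \mathbb{E}\Big\{ 1 + \log_2 \frac{1}{2 \epsilon f(X)} \Big\} = \log_2 \frac{1}{\epsilon} + \int f \log_2 \frac{1}{f},$$

even if the latter integral is infinite. The theorem follows if we can show that $II = o(1)$. To this end, note that

$$II \le \sum_{t \ge 0} \mathbb{P}\Big\{ f(X_t^+) < \frac{2^{-t}}{2\epsilon} \le f(X_t^-) \Big\}.$$

For a fixed value of $t$, we see that $f(X_t^+) < 2^{-t}/(2\epsilon) \le f(X_t^-)$ only if $X$ falls in the interval that captures the value $2^{-t}/(2\epsilon)$, if such an interval exists. But the probability of each interval is precisely $1/2^t$. However, if $2^{-t}/(2\epsilon) > f(0)$, then no such interval exists. Thus,

$$II \le \sum_{t \ge \log_2 \frac{1}{2 \epsilon f(0)}} 2^{-t} \le 4 \epsilon f(0) = o(1).$$

For the uniform $[0, 1]$ density, we have $\mathcal{E}(f) = 0$, and so the bound of Theorem 5 is $\mathbb{E}(T) \le \log_2 (1/\epsilon) + o(1)$. However, the $o(1)$ can be omitted in this case, as the following simple calculation shows:

$$\mathbb{E}(T) = \sum_{t \ge 0} \mathbb{P}\{T > t\} = \sum_{t \ge 0} \sum_{i=0}^{2^t - 1} 2^{-t} \, \mathbf{1}\Big\{ F^{-1}\Big(\frac{i+1}{2^t}\Big) - F^{-1}\Big(\frac{i}{2^t}\Big) > 2\epsilon \Big\} = \sum_{t \ge 0} \sum_{i=0}^{2^t - 1} 2^{-t} \, \mathbf{1}\Big\{ \frac{i+1}{2^t} - \frac{i}{2^t} > 2\epsilon \Big\} = \sum_{t \ge 0} \mathbf{1}\Big\{ t < \log_2 \frac{1}{2\epsilon} \Big\}$$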

$$\le 1 + \log_2 \frac{1}{2\epsilon} = \log_2 \frac{1}{\epsilon}.$$

Theorem 5 improves over the bound for the partition method for $d = 1$ by $1 + o(1)$. Under other regularity conditions, one can hope to obtain similar bounds that beat the partition bound.

For the exponential density, the inversion method yields

$$\mathbb{E}(T) \le \log_2 \frac{1}{\epsilon} + \mathcal{E}(f) + o(1) = \log_2 \frac{1}{\epsilon} + \log_2 e + o(1),$$

where $\log_2 e \approx 1.4427$. Flajolet and Saheb [2] proposed a method for the exponential law that has $\mathbb{E}\{T\} \le \log_2 (1/\epsilon) + \phi$ as $\epsilon \downarrow 0$, where $\phi$ is a numerical constant.

For the normal law, Karney [3] proposes a method that addresses the variable approximation issue but does not offer explicit bounds. Inversion would yield

$$\mathbb{E}\{T\} \le \log_2 \frac{1}{\epsilon} + \log_2 \sqrt{2 \pi e} + o(1),$$

but the drawback is that this requires an oracle for $F^{-1}$, the inverse gaussian distribution function. Even the partition method requires a nontrivial oracle, namely $F$. To sidestep this, one can use a slightly more expensive method based on the Box-Müller method [12], which states that the pair of random variables $(\sqrt{2E} \, V_1, \sqrt{2E} \, V_2)$, with $E$ exponential and $(V_1, V_2)$ uniform on the unit circle, provides a standard gaussian in $\mathbb{R}^2$ of zero mean and unit covariance matrix. The random variable $\sqrt{2E}$ is Maxwell, i.e., it has density $r e^{-r^2/2}$, $r > 0$, and its differential entropy is

$$\mathcal{E}(f_{\text{Maxwell}}) = \int_0^\infty f_{\text{Maxwell}}(r) \log_2 \frac{1}{f_{\text{Maxwell}}(r)} \, dr = \log_2 (e) \int_0^\infty f_{\text{Maxwell}}(r) \Big( -\log r + \frac{r^2}{2} \Big) dr = \log_2 (e) \Big( 1 + \frac{\gamma - \log 2}{2} \Big),$$

where $\gamma$ is the Euler-Mascheroni constant.

We sketch the procedure, which also serves as an example for more complicated random variate generation problems. Assume that the two normals are required with $\epsilon$-accuracy each (this corresponds to the choice of $d = 2$ and $p = \infty$). Then we first generate a Maxwell random variable $M$ by inversion, noting that $F(r) = 1 - e^{-r^2/2}$,

16 F u 2 log u. The Maxwell random variable M is needed with 2-accuracy. The Maxwell law is unimodal with mode at r. Its left piece has probability e. So we first pick a piece randomly using on average no more than two bits. The we apply inversion on the appropriate piece. By Theorem 5, we use T random bits where 2 ET log 2 + E f Maxwell o. The generated approximation is called M. Next we generate a uniform random variable U [, 2π with accuracy /2 M + /2. The generated value U [, 2π has U U /2 M +/2. Since U has differential entropy log 22π, we see that the expected number of bits, T 2, needed is bounded by M + /2 ET 2 E log 2 /2 M + /2 E log 2 log log 2 2π + o + log 2 2π + o /2 + E log 2 M + log 2 2π + o by the dominated convergence theorem. Then we return M sinu, M cosu, and claim that jointly, To see this, note that and similarly for the cosine. Next, M sinu M sinu M cosu M cosu. sinu sinu U U /2 M + /2 M sinu M sinu M M sinu + M sinu sinu M M + M U U 2 + M /2 M + /

17 Putting everything together, we see that the total expected number of bits is not more than 2 ET + ET 2 2 log 2 + E log 2 M + Ef Maxwell log 2 2π + o 2 log log 2 2πe + o 2 log o. The lower bound for generating two independent gaussians is 4 + o less, i.e., 2 log log 2 2πe 2 2 log Batch generation 6. Randomness extraction Turning a sequence of i.i.d. random variables X, X 2,... into a sequence of i.i.d. Bernoulli /2 bits has been the subject of many papers. The setting of interest to us is the following. Let F, F 2,... be a possibly infinite number of cumulative distributions supported on the positive integers. Let p, p 2,... be a fixed probability vector. Let X, X, X 2,... be i.i.d. random integers drawn from p, p 2,.... Given X, X 2,..., X n for n N, draw Y, Y 2,..., Y n independently from the distributions F X, F X2,..., F Xn. As a special case, we have the classical setting when p and then Y, Y 2,..., Y n are i.i.d. according to F. Let F, F 2,... have binary entropies given by E, E 2,..., all assumed to be finite. In other words the entropy of Y X i is denoted by E i. Assume also that E def p i E i <. i Theorem 6. There exits an algorithm described below that, upon input X, Y, X 2, Y 2,..., X n, Y n outputs a sequence of R n i.i.d. Bernoulli /2 bits where R n n p E as n. Furthermore, these bits are independent of X,..., X n. Theorem 6 describes how many perfect random bits we can extract from Y, Y 2,..., Y n, i.e., R n should be near the information content, the entropy of Y, Y 2,... Y n. Not surprisingly then, the way to achieve this can be inspired by the optimal or near-optimal methods of compression, and, in particular, arithmetic coding. Note that one can assume Y i F X i V i for i {,..., n} where V, V 2,..., V n are i.i.d. uniform random variables on [, ]. For all i N, let the binary expansions of p i be p i j b ij 2 j where b ij {, }. 7

18 def For convenience, p ij b ij for all i, j N N. Also, for all i N, let 2 j { if j, F i j j p i k p ik if j. Following the methodology of arithmetic coding, associate a uniform[, ] random variable U with binary expansion.u U 2 with the infinite data sequence X, Y,..., X n, Y n. The bits U i, i, are i.i.d. Bernoulli/2 and independent of X, X 2, X 3,... To be more precise, consider this algorithm. Algorithm 2 Randomness extraction Input: A sequence of pairs X, Y,..., X n, Y n with X l and Y l as described previously for l {,..., n}. : U 2: U + 3: for l to n do 4: U l U l + U + l U l FXl Y l 5: U + l U l + U + l U l FXl Y l 6: end for 7: R n max { t : 2 t U n 2 t U + n } {R n is the number of bits of the longest prefix common to both U n and U + n.} 8: return 2 Rn U n To verify the correctness of Algorithm 2, the intervals [U l, U + l ] are nested. More precisely, for all l, [U l, U + l ] [U l+, U + l+ ]. Define U lim sup n U n For every iteration, we have lim inf n U + n, and note U [U n, U + n ] U U j U + j U j Since R n max { t : 2 t U n 2 t U + n }, The bits U, U 2,..., U Rn n [U l, U + l ]. l L Uniform[U j, U + j ]. 2 Rn 2Rn U Un U U n + 2 Rn U +. 2 Rn are clearly i.i.d. Proof of Theorem 6. Let t R and consider the two cases {R n t} and {R n < t}. We show that t is concentrated around ne. Before considering the two cases, we compute the useful quantity px E log 2 log p 2 p i + log 2 p ij X Y p i j ij p i log 2 p i + p ij log 2 p ij i i j 8

19 EX + p i EX + def E. i j p i E i + EX i p ij pi log p 2 + log i p 2 ij pi Note that { Rn t } {U + n U n < 2 t } { n l l p Xl Y l p Xl < 2 t } { n pxl log 2 p Xl Y l } > t. 7 The pairs X, Y,..., X n, Y n are i.i.d. and therefore, by the previous calculation, px E log 2 E. p X Y By the law of large numbers, for all >, P{R n > ne + } as n. We have { Rn < t } { } 2 t U n + > 2 t Un. By the law of large numbers, for all >, if t ne, P {U n + Un } { n pxl 2 t P log 2 p Xl Y l For an arbitrary fixed integer k >, { P{R n < t + k} P U + n U n 2 t+k o k, li } t. } { + P U n + Un < } 2 t+k, 2t U n + > 2 t Un { which is as small as desired by the choice of k. The 2/2 k term is due to the fact that the event U + n Un <, 2 t U + 2 t+k n > 2 t Un } occurs only if U m 2 and m N. t 2 t+k 6.2 Batch generation algorithm based on a general DDG tree algorithm Assume given a random variable X N with fixed probability vector p, p 2,... of finite entropy denoted by EX. Assume that we employ a given DDG tree based algorithm for the generation of X. In this tree, let L be the set of leaves and let labelu be the label of leaf u L. Define Let du be the depth of u L. Then we have L i {u L : labelu i} for i N. P{X i} def p i u L i 2 du. 9
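A DDG tree in which symbol $i$ owns one leaf at every depth $j$ where the $j$-th binary digit of $p_i$ equals 1 (the Knuth-Yao construction of Section 2) can be walked level by level without storing the tree. The sketch below is an illustration under assumptions, not the authors' implementation: it assumes a finite probability vector of exact rationals, a hypothetical `next_bit` source of fair bits, and the convention that the leaves of a level are numbered before its internal nodes. It returns the label $X$ together with the depth $d(u)$ of the exit leaf.

```python
from fractions import Fraction
import random


def ddg_sample(probs, next_bit):
    """Return (i, depth): 1-based label i drawn from probs and the depth of the exit leaf."""
    rem = [Fraction(p) for p in probs]   # rem[i] holds the not yet consumed binary digits of p_i
    if sum(rem) != 1:
        raise ValueError("probabilities must sum to 1 exactly")
    position, depth = 0, 0               # position of the current node among internal nodes
    while True:
        depth += 1
        position = 2 * position + next_bit()   # move to the left (0) or right (1) child
        for i in range(len(rem)):
            rem[i] *= 2
            if rem[i] >= 1:              # digit b_{i,depth} = 1: a leaf labelled i+1 at this depth
                rem[i] -= 1
                position -= 1
                if position < 0:         # the walk has landed on that leaf
                    return i + 1, depth


if __name__ == "__main__":
    rng = random.Random(1976)
    probs = [Fraction(1, 16), Fraction(5, 32), Fraction(5, 32), Fraction(9, 32),
             Fraction(3, 16), Fraction(1, 32), Fraction(1, 8)]
    print([ddg_sample(probs, lambda: rng.getrandbits(1)) for _ in range(5)])
```

The counter `position` is the only state carried between levels, so memory use is linear in the number of symbols even when some $p_i$ has an infinite binary expansion.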

If the algorithm returns a variable $X$, then we know that we must have exited via a leaf in $L_X$. Given $X = i$, each exit-leaf has a given probability:

$$\mathbb{P}\{\text{Exit via leaf } u \in L_i\} = \frac{1/2^{d(u)}}{p_i}.$$

Let us call the random exit-leaf $Y$. The DDG tree algorithm thus returns a pair $(X, Y)$. We have

$$\sum_i p_i \, \mathcal{E}(Y \mid X = i) = \sum_i p_i \sum_{u \in L_i} \frac{2^{-d(u)}}{p_i} \log_2 \frac{p_i}{2^{-d(u)}} = \sum_i \sum_{u \in L_i} 2^{-d(u)} \log_2 2^{d(u)} + \sum_i p_i \log_2 p_i = \sum_{u \in L} 2^{-d(u)} \log_2 2^{d(u)} - \mathcal{E}(X) = \mathcal{E}(Y) - \mathcal{E}(X).$$

For example, for the Knuth-Yao DDG algorithm, we have $\mathcal{E}(Y) - \mathcal{E}(X) \le 2$, while for the Han-Hoshi algorithm, we have $\mathcal{E}(Y) - \mathcal{E}(X) \le 3$. Our method of batch generation will be valid for any DDG tree with finite $\mathcal{E}(Y)$.

The algorithm for batch generating i.i.d. random variables $X_1, X_2, \ldots, X_n$ uses an atomic operation FetchBit that first gets a bit from a queue $Q$ if this queue is not empty, and otherwise gets a bit from a Bernoulli(1/2) generator. It is understood that FetchBit drives the DDG tree algorithm.

Algorithm 3 Batch generation
    $Q \leftarrow \emptyset$ {Initially, the queue is empty.}
    $R_0 \leftarrow 0$ {There is no recycled bit initially.}
    for $i = 1$ to $n$ do
        Generate $(X_i, Y_i)$ by a DDG tree algorithm. {The DDG algorithm uses the operation FetchBit to get bits either from the source or from the queue $Q$.}
        output $X_i$
        Feed $(X_i, Y_i)$ to the retrieval algorithm (randomness extraction procedure), and recover $R_i - R_{i-1}$ bits, which are added to $Q$.
    end for

Theorem 7. The batch generation algorithm uses $N_n$ random bits, where

$$\frac{N_n}{n} \overset{p}{\to} \mathcal{E}(X) \quad \text{as } n \to \infty,$$

whenever $\mathcal{E}(Y) < \infty$.

Remark 4. By the Knuth-Yao lower bound, $\mathbb{E}(N_n) \ge n \mathcal{E}(X)$, and therefore the procedure is asymptotically optimal to within $o_p(n)$ bits. The symbol $o_p(n)$ means that the quantity is $o(n)$ in probability as $n \to \infty$.

21 Proof of Theorem 7. We choose a large integer constant k and look at N nk. Let Q t be the size of the queue at time t, and set Q. For j {,..., nk}, let T j be the number of bits needed for generating X j without extraction. The T j are i.i.d. random variables. Then we have the following simple identity: nk N nk T j R nk + Q nk. By the law of large numbers, j T + T T nk ET + T T nk p as n. We note that ET + T T nk nkey because random variables T j are i.i.d. By Theorem 6, we also have that R nk /nk p EY EX as n. Therefore, N nk nk EY + o p EY EX + o p + Q nk nk EX + o p + Q nk nk. The result follows if Q nk /nk p as n. For this, we need only consider an upper bound, since Q nk, and then Q nj Q nj + R nj R nj min j k {T nj T nj, Q nj }. Since R n /n p EY EX, we have R max nj R nj j k n EY EX p 8 and T max nj T nj j k n EY p. 9 Fix >, and let A be the event that both lefthand sides in 8 and 9 are less than, so that P{A c } o. The critical observation is that on A, Q nj + EY EX + n EY n Q nj if Q nj EY n, Q nj + EY EX + n Q nj else. and therefore, max { Q nj, EY EX + }, if 2 EX, max Q nj EY EX + n j k and Q nk nk EY EX +. k 2

22 If we choose k large enough such that EY EX + /k, then { } Qnk P nk > P{A c } o. If batch generation is applied to the partition method for continuous distributions, and the Knuth-Yao or Han-Hoshi method are used for the discrete part of that method, then the expected number of random bits needed per random variable, under Rényi s condition and Ef >, is bounded from above by d log 2 + Ef d + o, thus matching the lower bound for the case p, and all dimensions. 7 Conclusion and outlook Using the notion of maximal coupling between a generated random variable and a theoretical target random variable, we were able to lay the groundwork for a theory of generating with universal lower bound in terms of the number of random Bernoulli/2 bits needed. In order to grace the world s software libraries with variable accuracy generators, much more work is needed, both algorithmic and theoretic. We have shown that the well-known inversion method is nearly optimal in an information-theoretic sense, and will submit further work and other paradigms in the near future. Appendix: Generation of an exponential by convolution The method given in this appendix is based on taking the sum of independent random variables. Using convolution does not require an oracle for F or F which were required by the algorithms given in this paper. Its range of applicability seems however restricted to the uniform and the exponential as suggested by Kakutani s result [3] that can be stated as follows: Theorem 8 Kakutani 948. For all i N, let p i [, ] and let X i be independent Bernoulli random variables such that P{X i } p i. If X i X i2 i, then X is singular X is absolutely continuous X is discrete i i p i 2 2 diverges, p i 2 2 converges, 2 + pi >. 
First of all, as shown in [4], if $X$ is an exponential random variable with mean 1, then $\lfloor X \rfloor$ is distributed as a geometric random variable with parameter $1/e$, and $\{X\}$, the fractional part of $X$, is distributed as a truncated exponential random variable on the interval $[0, 1)$, and $\lfloor X \rfloor$ and $\{X\}$ are independent. We therefore concentrate on the fractional part. The following theorem tells us that the fractional part is the convolution of independent Bernoulli random variables.
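Before turning to that theorem, the decomposition of a mean-one exponential $X$ into $\lfloor X \rfloor$ and $\{X\}$ can be checked directly from the distribution function $1 - e^{-x}$. The Python sketch below is only an illustration; the function names `joint`, `geometric_pmf` and `truncated_exp_cdf` are chosen here and do not come from the paper. It compares the joint probability $\mathbb{P}\{\lfloor X \rfloor = n, \{X\} \le u\}$ with the product of the two marginals.

```python
import math

def joint(n, u):
    """P{floor(X) = n, frac(X) <= u} = P{n <= X <= n + u} for X exponential with mean 1."""
    return math.exp(-n) - math.exp(-(n + u))

def geometric_pmf(n):
    """P{floor(X) = n} = (1 - 1/e) e^{-n} for n = 0, 1, 2, ..."""
    return (1.0 - math.exp(-1.0)) * math.exp(-n)

def truncated_exp_cdf(u):
    """P{frac(X) <= u} = (1 - e^{-u}) / (1 - e^{-1}) for 0 <= u <= 1."""
    return (1.0 - math.exp(-u)) / (1.0 - math.exp(-1.0))

# Largest discrepancy between the joint law and the product of the marginals on a small grid.
worst = max(
    abs(joint(n, u) - geometric_pmf(n) * truncated_exp_cdf(u))
    for n in range(8)
    for u in (0.0, 0.1, 0.5, 0.9, 1.0)
)
print(worst)
```

The discrepancy is at the level of floating-point rounding, in line with the independence of $\lfloor X \rfloor$ and $\{X\}$.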

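Theorem 9. Let $X_1, \ldots, X_j, \ldots$ be a sequence of independent Bernoulli distributed random variables with

$$\mathbb{P}\{X_j = 2^{-j}\} = p_j \in [0, 1], \quad \text{and} \quad \mathbb{P}\{X_j = 0\} = 1 - p_j \quad \text{for all } j \in \mathbb{N}.$$

Let $X = \sum_{j=1}^{\infty} X_j$. If

$$\frac{p_j}{1 - p_j} = e^{-1/2^j},$$

then $X$ is a truncated exponential random variable, i.e., the p.d.f. of $X$ is

$$f_X(x) = \frac{e^{-x}}{1 - e^{-1}} \quad \text{for } x \in [0, 1].$$

Proof of Theorem 9. Since

$$p_j = \frac{e^{-1/2^j}}{e^{-1/2^j} + 1},$$

the Fourier transform of $X_j$ is

$$\mathbb{E}\, e^{\imath X_j t} = p_j e^{\imath t/2^j} + 1 - p_j = \frac{e^{(-1+\imath t)/2^j} + 1}{e^{-1/2^j} + 1}.$$

Since $X$ is the sum of the independent $X_j$'s, we have

$$\mathbb{E}\, e^{\imath X t} = \prod_{j=1}^{\infty} \mathbb{E}\, e^{\imath X_j t} = \prod_{j=1}^{\infty} \frac{e^{(-1+\imath t)/2^j} + 1}{e^{-1/2^j} + 1} = \frac{e^{-1+\imath t} - 1}{(e^{-1} - 1)(-1 + \imath t)},$$

which is the Fourier transform of $f_X(x)$.

We can thus generate $\{X\}$ with precision $\varepsilon$ if we set $k = \lceil \log_2(1/\varepsilon) \rceil$, and let

$$Y = \sum_{j=1}^{k} X_j = \sum_{j=1}^{k} \frac{\mathrm{Bernoulli}(p_j)}{2^j}.$$

Since a raw Bernoulli random variable requires 2 bits on average, this simple method, which has accuracy guarantee $|Y - \{X\}| \le \varepsilon$, uses on average not more than $2k = 2\lceil \log_2(1/\varepsilon) \rceil$ bits. On the other hand, the lower bound for $\{X\}$ is

$$\mathbb{E}(T) \ge \log_2\frac{1}{\varepsilon} + \mathcal{E}(\{X\}) - 1, \quad \text{where} \quad \mathcal{E}(\{X\}) = \log_2(e - 1) - \frac{\log_2 e}{e - 1}.$$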
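The simple method above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: fair coins are simulated with `random.getrandbits`, each Bernoulli($p_j$) is obtained by comparing fresh fair bits with the binary digits of the double-precision value of $p_j$ (a standard technique that uses 2 fair bits on average), and the names `fair_bit`, `bernoulli` and `fractional_part` are hypothetical.

```python
import math
import random

def fair_bit():
    """One simulated fair coin flip (stand-in for the source of Bernoulli(1/2) bits)."""
    return random.getrandbits(1)

def bernoulli(p, counter):
    """Return 1 with probability p using fair bits; counter[0] accumulates the bits consumed."""
    # Compare a lazily generated uniform U = 0.B1B2... with the binary expansion of p,
    # stopping at the first position where the two expansions differ.
    while True:
        p *= 2.0
        digit = 1 if p >= 1.0 else 0   # next binary digit of p
        p -= digit
        counter[0] += 1
        b = fair_bit()
        if b != digit:
            return 1 if b < digit else 0   # U < p exactly when U has a 0 where p has a 1

def fractional_part(eps, counter):
    """Return Y = sum_{j<=k} X_j with k = ceil(log2(1/eps)) and X_j = 2^{-j} Bernoulli(p_j)."""
    k = max(1, math.ceil(math.log2(1.0 / eps)))
    y = 0.0
    for j in range(1, k + 1):
        p_j = 1.0 / (1.0 + math.exp(2.0 ** -j))   # solves p_j / (1 - p_j) = exp(-1 / 2^j)
        y += bernoulli(p_j, counter) * 2.0 ** -j
    return y

counter = [0]
y = fractional_part(2.0 ** -16, counter)
print(y, counter[0])
```

Averaged over many calls, the bit counter should come out close to $2k$ per generated value, which is the factor 2 discussed in the text.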

The factor 2 in $2\log_2(1/\varepsilon)$ can be avoided when batch generation is used. It can also be eliminated at a tremendous storage cost if the vector $(2X_1, \ldots, 2^k X_k)$ is generated using the algorithm of Knuth-Yao since we know the individual probabilities, i.e.,

$$\mathbb{P}\left\{ (2X_1, \ldots, 2^k X_k) = (2x_1, \ldots, 2^k x_k) \right\} = \prod_{j=1}^{k} p_j^{x_j 2^j} (1 - p_j)^{1 - x_j 2^j}$$

for all $(2x_1, \ldots, 2^k x_k) \in \{0, 1\}^k$. One can show that

$$\mathcal{E}(2X_1, \ldots, 2^k X_k) = \sum_{j=1}^{k} \mathcal{E}(X_j) = k + \log_2(e - 1) - \frac{\log_2 e}{e - 1} + o(1).$$

Thus, for the Knuth-Yao method for this vector,

$$\mathbb{E}(T) \le \sum_{j=1}^{k} \mathcal{E}(X_j) + 2 = \sum_{j=1}^{k} \left( p_j \log_2\frac{1}{p_j} + (1 - p_j)\log_2\frac{1}{1 - p_j} \right) + 2 = k + \log_2(e - 1) - \frac{\log_2 e}{e - 1} + 2 + o(1).$$

Again using Knuth-Yao, a geometric($1/e$) random variable can be generated exactly using not more than

$$\frac{\log_2 e}{e - 1} + \log_2\frac{e}{e - 1} + 2$$

random bits. An exponential random variable generated by this method has an overall expected complexity

$$\mathbb{E}(T) \le \left\lceil \log_2\frac{1}{\varepsilon} \right\rceil + \log_2 e + 4 + o(1) \quad \text{as } \varepsilon \downarrow 0.$$

Acknowledgment

Both authors thank Tamas Linder for his help. Claude Gravel wants to thank Gilles Brassard from Université de Montréal for his financial support.

References

[1] D. E. Knuth and A. C.-C. Yao, The complexity of nonuniform random number generation, in Algorithms and Complexity: New Directions and Recent Results, J. F. Traub, Ed. New York: Carnegie-Mellon University, Computer Science Department, Academic Press, 1976, reprinted in Knuth's Selected Papers on Analysis of Algorithms (CSLI, 2000).

[2] P. Flajolet and N. Saheb, The complexity of generating an exponentially distributed variate, Journal of Algorithms, vol. 7, 1986.
[3] C. F. F. Karney, Sampling exactly from the normal distribution, 2013, arXiv.
[4] J. Lumbroso, Probabilistic algorithms for data streaming and random generation, Ph.D. dissertation, Université Pierre et Marie Curie - Paris 6, 2012.
[5] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.
[6] A. Rényi, On the dimension and entropy of probability distributions, Acta Mathematica Academiae Scientiarum Hungarica, vol. 10, 1959.
[7] I. Csiszár, Some remarks on the dimension and entropy of random variables, Acta Mathematica Academiae Scientiarum Hungarica, vol. 12, 1961.
[8] T. S. Han and M. Hoshi, Interval algorithm for random number generation, IEEE Transactions on Information Theory, vol. 43, no. 2, March 1997.
[9] C. E. Shannon, A mathematical theory of communication, Bell Sys. Tech. Journal, vol. 27, 1948.
[10] I. Csiszár, On the dimension and entropy of order α of the mixture of probability distributions, Acta Mathematica Academiae Scientiarum Hungarica, vol. 13, 1962.
[11] T. Linder and K. Zeger, Asymptotic entropy-constrained performance of tessellating and universal randomized lattice quantization, IEEE Transactions on Information Theory, vol. 40, no. 2, 1994.
[12] G. E. Box and M. E. Muller, A note on the generation of random normal deviates, Ann. Math. Stat., vol. 29, pp. 610-611, 1958.
[13] S. Kakutani, On equivalence of infinite product measures, Annals of Mathematics, 1948.
[14] L. Devroye, Non-Uniform Random Variate Generation. Springer, 1986.


More information

Handout 5. α a1 a n. }, where. xi if a i = 1 1 if a i = 0.

Handout 5. α a1 a n. }, where. xi if a i = 1 1 if a i = 0. Notes on Complexity Theory Last updated: October, 2005 Jonathan Katz Handout 5 1 An Improved Upper-Bound on Circuit Size Here we show the result promised in the previous lecture regarding an upper-bound

More information

1 Complex Networks - A Brief Overview

1 Complex Networks - A Brief Overview Power-law Degree Distributions 1 Complex Networks - A Brief Overview Complex networks occur in many social, technological and scientific settings. Examples of complex networks include World Wide Web, Internet,

More information

Introduction and Preliminaries

Introduction and Preliminaries Chapter 1 Introduction and Preliminaries This chapter serves two purposes. The first purpose is to prepare the readers for the more systematic development in later chapters of methods of real analysis

More information

Lattices for Distributed Source Coding: Jointly Gaussian Sources and Reconstruction of a Linear Function

Lattices for Distributed Source Coding: Jointly Gaussian Sources and Reconstruction of a Linear Function Lattices for Distributed Source Coding: Jointly Gaussian Sources and Reconstruction of a Linear Function Dinesh Krithivasan and S. Sandeep Pradhan Department of Electrical Engineering and Computer Science,

More information

PROBABILITY VITTORIA SILVESTRI

PROBABILITY VITTORIA SILVESTRI PROBABILITY VITTORIA SILVESTRI Contents Preface. Introduction 2 2. Combinatorial analysis 5 3. Stirling s formula 8 4. Properties of Probability measures Preface These lecture notes are for the course

More information

CS 6820 Fall 2014 Lectures, October 3-20, 2014

CS 6820 Fall 2014 Lectures, October 3-20, 2014 Analysis of Algorithms Linear Programming Notes CS 6820 Fall 2014 Lectures, October 3-20, 2014 1 Linear programming The linear programming (LP) problem is the following optimization problem. We are given

More information

7 Convergence in R d and in Metric Spaces

7 Convergence in R d and in Metric Spaces STA 711: Probability & Measure Theory Robert L. Wolpert 7 Convergence in R d and in Metric Spaces A sequence of elements a n of R d converges to a limit a if and only if, for each ǫ > 0, the sequence a

More information

U e = E (U\E) e E e + U\E e. (1.6)

U e = E (U\E) e E e + U\E e. (1.6) 12 1 Lebesgue Measure 1.2 Lebesgue Measure In Section 1.1 we defined the exterior Lebesgue measure of every subset of R d. Unfortunately, a major disadvantage of exterior measure is that it does not satisfy

More information

HAMILTON CYCLES IN RANDOM REGULAR DIGRAPHS

HAMILTON CYCLES IN RANDOM REGULAR DIGRAPHS HAMILTON CYCLES IN RANDOM REGULAR DIGRAPHS Colin Cooper School of Mathematical Sciences, Polytechnic of North London, London, U.K. and Alan Frieze and Michael Molloy Department of Mathematics, Carnegie-Mellon

More information

Convexity/Concavity of Renyi Entropy and α-mutual Information

Convexity/Concavity of Renyi Entropy and α-mutual Information Convexity/Concavity of Renyi Entropy and -Mutual Information Siu-Wai Ho Institute for Telecommunications Research University of South Australia Adelaide, SA 5095, Australia Email: siuwai.ho@unisa.edu.au

More information

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014 Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 Problem Set 3 Issued: Thursday, September 25, 2014 Due: Thursday,

More information

Lec 05 Arithmetic Coding

Lec 05 Arithmetic Coding ECE 5578 Multimedia Communication Lec 05 Arithmetic Coding Zhu Li Dept of CSEE, UMKC web: http://l.web.umkc.edu/lizhu phone: x2346 Z. Li, Multimedia Communciation, 208 p. Outline Lecture 04 ReCap Arithmetic

More information