arxiv: v5 [cs.it] 28 Feb 2015
Sampling with arbitrary precision

Luc Devroye, Claude Gravel

October, 28

Abstract

We study the problem of the generation of a continuous random variable when a source of independent fair coins is available. We first motivate the choice of a natural criterion for measuring accuracy, the Wasserstein $L_\infty$ metric, and then show a universal lower bound for the expected number of required fair coins as a function of the accuracy. In the case of an absolutely continuous random variable with finite differential entropy, several algorithms are presented that match the lower bound up to a constant, which can be eliminated by generating random variables in batches.

Keywords: random number generation, random bit model, differential entropy, partition entropy, inversion, probability integral transform, tree-based algorithms, random sampling

1 Introduction

Knuth and Yao [1] showed that the expected number of independent Bernoulli(1/2) random bits needed to generate an integer-valued random variable $X$ whose distribution is given by $p_i = P\{X = i\}$, where $\sum_i p_i = 1$, is at least equal to the binary entropy of $X$:

$$\mathcal{E} \equiv \mathcal{E}(X) \equiv \mathcal{E}(\{p_i\}_i) \stackrel{\text{def}}{=} \sum_i p_i \log_2 \frac{1}{p_i}.$$

They also exhibited an algorithm, dubbed the DDG tree algorithm, for which the expected number of random Bernoulli(1/2) bits is not more than $\mathcal{E} + 2$. By grouping, one can thus develop algorithms for generating batches of $m$ independent copies of $X$ such that the expected number of Bernoulli(1/2) random bits per random variable does not exceed $\mathcal{E} + 2/m$. While these results settle the discrete random variate case quite satisfactorily, the generation of continuous or mixed random variables has not been treated satisfactorily in the literature. The objective of this note is to study the number of Bernoulli(1/2) random bits needed to generate a random variable $X \in \mathbb{R}^d$ with a given precision $\epsilon > 0$, provided that we can define precision in a satisfactory manner.
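The Knuth-Yao bounds quoted above can be checked numerically on small examples. The following is a minimal sketch, not the authors' code: the function names `binary_entropy` and `ddg_expected_bits` are hypothetical, and the expected depth of the DDG tree is computed from the binary expansions of the $p_i$ as $\sum_i \sum_j j\, b_{i,j} 2^{-j}$, the formula derived in Section 2. It assumes every $p_i < 1$ and is exact only for dyadic rationals.

```python
from fractions import Fraction
from math import log2


def binary_entropy(probs):
    """Binary entropy sum_i p_i * log2(1/p_i) of a probability vector."""
    return sum(float(p) * log2(1.0 / float(p)) for p in probs if p > 0)


def ddg_expected_bits(probs, max_bits=64):
    """Expected depth sum_i sum_j j * b_ij / 2^j of the Knuth-Yao DDG tree.

    Exact for dyadic rationals given as Fractions with p_i < 1; other
    expansions are truncated after max_bits bits (an assumption of this sketch).
    """
    total = Fraction(0)
    for p in probs:
        rest = Fraction(p)
        for j in range(1, max_bits + 1):
            weight = Fraction(1, 2 ** j)
            if rest >= weight:  # bit b_ij equals 1: one leaf at level j
                rest -= weight
                total += j * weight
            if rest == 0:
                break
    return total


probs = [Fraction(3, 4), Fraction(1, 4)]
h = binary_entropy(probs)
t = ddg_expected_bits(probs)
print(f"entropy = {h:.4f}, expected bits = {float(t):.4f}")
```

For this vector the expected number of bits lies between the entropy and the entropy plus 2, as the theorem requires.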
Note that any algorithm that takes as input the accuracy parameter $\epsilon > 0$ returns a random variable $Y = f_T(B_1, \ldots, B_T)$, where $B_1, B_2, \ldots, B_T$ are independent identically distributed (i.i.d.) Bernoulli(1/2) bits, $T$ is the number of bits needed, and $f_1, f_2, \ldots$ is a given sequence of functions. (Author affiliations: McGill University and Université de Montréal.) For a vector $v \in \mathbb{R}^d$,
let $\|v\|_p$ denote the $\ell_p$-norm of $v$ for $p \ge 1$:

$$\|v\|_p = \Big( \sum_{i=1}^d |v_i|^p \Big)^{1/p}.$$

For $p = \infty$, the $\infty$-norm is $\|v\|_\infty = \sup_{1 \le i \le d} |v_i|$. With $d = 1$, all $p$-norms are the same for $p \in [1, \infty]$. An algorithm with accuracy $\epsilon$ is such that for some coupling of $X$ (the target random variable) and $Y$, $\|X - Y\|_p \le \epsilon$, where $p$ is usually 2 or $\infty$. This natural notion of accuracy corresponds to the Wasserstein $L_\infty$ metric between two probability measures. For a random variable $X$, we denote by $\mathcal{L}(X)$ the distribution of $X$. Let $\mathcal{M}$ denote the space of all distributions of pairs of random variables $(X, Y) \in \mathbb{R}^d \times \mathbb{R}^d$ with fixed marginal distributions $F$ and $G$, respectively. Then the Wasserstein $L_\infty$ distance between $X$ and $Y$, or between $F$ and $G$, is

$$W_p(F, G) = \inf \{ \operatorname{ess\,sup} \|X - Y\|_p : (X, Y) \in \mathcal{M} \},$$

where ess sup denotes the essential supremum. This is a distance metric:

$$\operatorname{dist}(X, Y) \stackrel{\text{def}}{=} W_p(F, G).$$

If $\operatorname{dist}(X, Y) \le \epsilon$, then there exists a random variable $Y$ (output) coupled with $X$ (target) such that $\operatorname{ess\,sup} \|X - Y\|_p \le \epsilon$, i.e., the bound on $\|X - Y\|_p$ holds with probability one. This definition of distance should satisfy simulation professionals in the sense that if their calculations require the evaluation of $\Psi(X_1, \ldots, X_d)$, where $X_1, \ldots, X_d$ are given independent random variables and $\Psi$ is a real-valued function, then, with probability one,

$$|\Psi(Y_1, \ldots, Y_d) - \Psi(X_1, \ldots, X_d)| \le \sup_{y : \|y - X\|_p < \epsilon} |\Psi(y_1, \ldots, y_d) - \Psi(X_1, \ldots, X_d)|,$$

which is usually a quantity that can easily be controlled. We believe that software packages should have the capability of accepting $\epsilon$ as an input parameter in random variate generation.

It is interesting that $\mathbb{E}(T)$, the expected value of $T$, can be related to the entropy almost in the way Knuth and Yao did for discrete random variables in [1]. Our note provides the foundational background for such a study in terms of universal lower bounds and various useful upper bounds for particular algorithms. We include examples for the main distributions.

Several authors have addressed the problem of arbitrary precision in sampling algorithms.
These include Flajolet and Saheb [2], who explain how to generate the first $k$ bits of an exponential random variable for an integer $k$. Karney [3] describes an algorithm for the standard normal distribution. Lumbroso's thesis [4] also discusses arbitrary precision sampling.

The quantity that appears in our lower bound is the partition entropy. More precisely, let $\mathcal{A}$ be a partition of $\mathbb{R}^d$ and let $\epsilon > 0$ be a fixed parameter for the precision. The partition entropy of $X$ with respect to $\mathcal{A}$ is the quantity

$$\mathcal{E}_{\mathcal{A}}(X) = \sum_{A \in \mathcal{A}} P\{X \in A\} \log_2 \frac{1}{P\{X \in A\}}.$$
While our results apply to all distributions, we will mainly focus on absolutely continuous distributions, i.e., random variables $X$ with density $f$. We recall the definition of differential entropy:

$$\mathcal{E}(f) \stackrel{\text{def}}{=} \int_{x \in \mathbb{R}^d} f(x) \log_2 \frac{1}{f(x)} \, dx.$$

The differential entropy can be ill-defined, $-\infty$, finite or $+\infty$. For more information on differential entropy and entropy in general, one can read Cover and Thomas [5]. When $X$ has compact support, the case $+\infty$ cannot occur. When $f$ is bounded, the case $-\infty$ is excluded. When $\mathcal{E}(\lfloor X_1 \rfloor, \ldots, \lfloor X_d \rfloor) < \infty$, it can be shown that $\mathcal{E}(f)$ is well-defined and either finite or $-\infty$; see Rényi [6] and Csiszár [7] for a proof. Our main result is the following:

Theorem. Let $X \in \mathbb{R}^d$ be a target random vector with density $f$, and assume that $\mathcal{E}(\lfloor X_1 \rfloor, \ldots, \lfloor X_d \rfloor) < \infty$. Consider any algorithm that for given $\epsilon > 0$ outputs $Y = Y(\epsilon)$ using $T$ random fair coins, such that $W_p(X, Y) \le \epsilon$. Then

$$\mathbb{E}(T) \ge \mathcal{E}(f) + d \log_2 \frac{1}{\epsilon} - \log_2 \frac{\big(2\Gamma(\tfrac{1}{p} + 1)\big)^d}{\Gamma(\tfrac{d}{p} + 1)}.$$

For $p = 1$ and $p = \infty$, the third term in the lower bound is $\log_2(2^d/d!)$ and $d$, respectively. For $d = 1$, it is 1.

The second part of this note describes several algorithms that come within a constant term of this lower bound, and therefore are basically optimal if grouping is used for generation. Before tackling all that, we introduce a brief section in which we recall the main exact sampling algorithms for discrete random variables and their properties, as these will be essential for the understanding of the main algorithms.

2 Bounds for the discrete case

In this section, we give simple proofs of two important results for generating discrete random variables: the optimal algorithm of Knuth and Yao [1] and the more practical but slightly suboptimal algorithm of Han and Hoshi [8]. We recall that we want to sample $X$ with probability vector $(p_1, p_2, \ldots)$. Every $p_i$ is decomposed into its binary representation

$$p_i = \sum_{j=1}^{\infty} b_{i,j} 2^{-j},$$

where $b_{i,j} \in \{0, 1\}$ for all $i, j$.
Consider the new random variable $Z$ with probability vector

$$\Big( \frac{b_{1,1}}{2}, \frac{b_{1,2}}{2^2}, \ldots, \frac{b_{2,1}}{2}, \frac{b_{2,2}}{2^2}, \ldots \Big) \stackrel{\text{def}}{=} (p_{1,1}, p_{1,2}, \ldots, p_{2,1}, p_{2,2}, \ldots), \qquad (2)$$

with $p_{i,j} = b_{i,j}/2^j$. Any algorithm for generating a discrete random variable that uses a source of i.i.d. fair coins $B_1, B_2, \ldots$ and is based on a stopping time $T$ at which it returns an output can be viewed as a binary tree, where $B_1, B_2, \ldots$ uniquely determines an infinite path in the tree by the rule that 0 is left and 1 is right. We refer to this general class of algorithms as tree-based algorithms; they include all possible practical algorithms. Leaves in this tree correspond to outputs. The algorithm
of Knuth and Yao can be implemented by a binary tree, a DDG tree, in which each leaf at level $j$ corresponds to a bit $b_{i,j} = 1$ in (2). One randomly walks down this tree starting at the root and reaches the leaf for $b_{i,j}$ with probability $1/2^j = p_{i,j}$. At that point, the value $i$ is returned, and indeed, $P\{X = i\} = \sum_j p_{i,j} = p_i$, as required. A tree-based algorithm is optimal if it minimizes $\mathbb{E}(T)$ for a given probability vector.

Theorem (Knuth and Yao [1]). The expected number of bits of an optimum tree-based algorithm for sampling $(p_1, p_2, \ldots)$ is bounded from below by $\mathcal{E}(\{p_i\}_i)$ and from above by $\mathcal{E}(\{p_i\}_i) + 2$.

Proof. Given a probability vector $(p_1, p_2, \ldots, p_n)$ with $n$ possibly infinite, let the binary expansion of $p_i$ for $i \in \{1, \ldots, n\}$ be

$$p_i = \sum_{j=1}^{\infty} b_{i,j} 2^{-j}, \qquad b_{i,j} \in \{0, 1\}.$$

If $T$ denotes the number of bits required by an optimal algorithm for sampling $(p_1, p_2, \ldots, p_n)$, then for $t \ge 1$,

$$P\{T = t\} = \frac{\text{number of leaves at level } t}{2^t} = \sum_{i=1}^{n} \frac{b_{i,t}}{2^t}.$$

Therefore,

$$\mathbb{E}(T) = \sum_{t=1}^{\infty} t \, P\{T = t\} = \sum_{t=1}^{\infty} t \sum_{i=1}^{n} \frac{b_{i,t}}{2^t} = \sum_{i=1}^{n} \Big( \sum_{t=1}^{\infty} \frac{t \, b_{i,t}}{2^t} \Big). \qquad (3)$$

We now show that the quantity within parentheses in (3) is lower bounded by $p_i \log_2(1/p_i)$ and upper bounded by $p_i \log_2(1/p_i) + 2p_i$, and then the result follows. For convenience, let $x \in [0, 1]$ and let its binary expansion be $x = \sum_{j=1}^{\infty} x_j 2^{-j}$. To complete the proof, it remains to prove that

$$x \log_2 \frac{1}{x} \le \sum_{j=1}^{\infty} j x_j 2^{-j} \le x \log_2 \frac{1}{x} + 2x. \qquad (4)$$

If $m$ is the index of the first non-zero coefficient of the binary expansion of $x$, then there are two cases: either (1) $x = 2^{-m}$ or (2) $2^{-m} < x < 2^{1-m}$. The inequalities are strict for case 2 since $x \ne 2^{-m}$. For the first case, the sum in (4) equals $m 2^{-m} = x \log_2(1/x)$, and (4) is obviously true. For the second case, $2^{-m} < x < 2 \cdot 2^{-m}$ or, equivalently, $0 < m + \log_2 x < 1$.
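Then for the upper bound, bounding the contribution of the bits beyond position $m$ yields

$$\sum_{j=1}^{\infty} j x_j 2^{-j} < x (2 - \log_2 x) = x \log_2 \frac{1}{x} + 2x,$$

and for the lower bound,

$$\sum_{j=1}^{\infty} j x_j 2^{-j} = \sum_{j \ge m} j x_j 2^{-j} \ge m \sum_{j \ge m} x_j 2^{-j} = m x > x \log_2 \frac{1}{x}.$$

We now recall the Han-Hoshi algorithm published in [8], which implements the inversion method. Given $p_1, \ldots, p_n$ with $n$ finite or countably infinite, the algorithm partitions the interval $[0, 1)$ into a countable collection of disjoint subintervals $[Q_{i-1}, Q_i)$, where $Q_0 = 0$ and

$$Q_i = \sum_{k=1}^{i} p_k, \qquad i \in \{1, \ldots, n\}.$$

The idea behind the algorithm is to iteratively refine a random interval $I \subseteq [0, 1)$ and to stop when $I \subseteq [Q_{i-1}, Q_i)$ for a certain $i \in \{1, \ldots, n\}$. The inversion principle says that if $U$ is uniformly distributed on $[0, 1]$, then the unique $i \in \{1, \ldots, n\}$ such that $Q_{i-1} \le U < Q_i$ is distributed according to $p_1, \ldots, p_n$. For a binary random source of unbiased i.i.d. bits, their algorithm is as follows:

Algorithm 1 The Han-Hoshi algorithm using a binary source
  $T \leftarrow 0$
  $\alpha_T \leftarrow 0$
  $\beta_T \leftarrow 1$
  repeat
    $T \leftarrow T + 1$
    $B \leftarrow$ Random Bit
    $\alpha_T \leftarrow \alpha_{T-1} + (\beta_{T-1} - \alpha_{T-1}) B / 2$
    $\beta_T \leftarrow \alpha_{T-1} + (\beta_{T-1} - \alpha_{T-1}) (B + 1) / 2$
    $I \leftarrow [\alpha_T, \beta_T)$
  until $I \subseteq [Q_{i-1}, Q_i)$ for some $i$
  Return $i$.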
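The interval-refinement loop of the Han-Hoshi algorithm can be sketched in a few lines. This is an illustrative sketch rather than the authors' implementation: the function name `han_hoshi` and the callable `random_bit` are hypothetical, the cells are indexed as $[Q_{i-1}, Q_i)$ with $Q_0 = 0$, and exact rational arithmetic (`fractions.Fraction`) is assumed so that interval containment is tested without rounding.

```python
from fractions import Fraction
import random


def han_hoshi(probs, random_bit):
    """Return (i, t): a 1-based index drawn from probs and the bits consumed."""
    # Cumulative boundaries Q_0 = 0, Q_i = p_1 + ... + p_i.
    q = [Fraction(0)]
    for p in probs:
        q.append(q[-1] + Fraction(p))

    alpha, beta = Fraction(0), Fraction(1)
    t = 0
    while True:
        # One iteration of the repeat loop: halve the current interval.
        t += 1
        b = random_bit()
        width = beta - alpha
        alpha, beta = alpha + width * b / 2, alpha + width * (b + 1) / 2
        # Stop when [alpha, beta) lies inside a single cell [Q_{i-1}, Q_i).
        for i in range(1, len(q)):
            if q[i - 1] <= alpha and beta <= q[i]:
                return i, t


if __name__ == "__main__":
    probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
    print(han_hoshi(probs, lambda: random.getrandbits(1)))
```

Passing the bit source as a callable makes it possible to replay a fixed bit sequence, which is convenient for checking which leaf of the underlying DDG tree a given path reaches.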
The following two figures, Figures 1 and 2, are examples that show the underlying DDG tree during the execution of the Han and Hoshi algorithm.

Figure 1: Illustration of the algorithm of Han and Hoshi on the vector $(p_1, p_2, p_3, p_4, p_5, p_6, p_7) = (\tfrac{1}{16}, \tfrac{5}{32}, \tfrac{5}{32}, \tfrac{9}{32}, \tfrac{3}{16}, \tfrac{1}{32}, \tfrac{1}{8})$, together with its cumulative values $q_1, \ldots, q_7$.

Figure 2: Illustration of the Han and Hoshi algorithm on a vector $(p_1, p_2, p_3, p_4)$ specified through the binary expansions of the cumulative sums $p_1$, $p_1 + p_2$, and $p_1 + p_2 + p_3$.

Let $T$ be the number of random coins needed, which is also the number of iterations of the repeat loop. For $T \ge 0$, $[\alpha_{T+1}, \beta_{T+1}) \subset [\alpha_T, \beta_T)$. To every node (internal or external) corresponds an interval $[\alpha_T, \beta_T)$. The root corresponds to the interval $[0, 1)$. Each internal node corresponds to an interval $[\alpha_T, \beta_T)$ that is not contained in any one of the intervals $[Q_{i-1}, Q_i)$. If the source produces $B = 0$, then the left child corresponds to the interval $[\alpha_T, (\alpha_T + \beta_T)/2) = [\alpha_{T+1}, \beta_{T+1})$, and if $B = 1$, then the right child corresponds to $[(\alpha_T + \beta_T)/2, \beta_T) = [\alpha_{T+1}, \beta_{T+1})$. Each leaf (external node) corresponds to an interval $[\alpha_T, \beta_T)$ entirely contained in some $[Q_{i-1}, Q_i)$, upon which the integer $i$ is returned; overall, $i$ is returned with probability $p_i = Q_i - Q_{i-1}$. The following theorem was proved by Han and Hoshi [8]:
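Theorem 2. For the Han-Hoshi algorithm,

$$\sum_i p_i \log_2 \frac{1}{p_i} \le \mathbb{E}(T) \le \sum_i p_i \log_2 \frac{1}{p_i} + 3.$$

Proof of Theorem 2. Our new proof partitions the leaves $L_i$ for symbol $i$ in the DDG tree arbitrarily into two sets, $A_i$ and $B_i$, such that $A_i$ and $B_i$ each possess at most one leaf per level. Let

$$\alpha_i = \sum_{u \in A_i} p(u), \qquad \beta_i = \sum_{u \in B_i} p(u),$$

where $p(u)$ is the probability attached to leaf $u$, i.e., $1/2^{\text{depth}(u)}$. We have $p_i = \alpha_i + \beta_i$. By nesting and elementary calculations,

$$\sum_i p_i \log_2 \frac{1}{p_i} \le \sum_i \alpha_i \log_2 \frac{1}{\alpha_i} + \sum_i \beta_i \log_2 \frac{1}{\beta_i} \le \sum_i p_i \log_2 \frac{1}{p_i} + 1. \qquad (5)$$

Let $\alpha_i(j)$ be the $j$-th bit in the binary expansion of $\alpha_i$, and let $\beta_i(j)$ be the $j$-th bit for $\beta_i$. Then

$$\mathbb{E}(T) = \sum_i \sum_j j \, \alpha_i(j) 2^{-j} + \sum_i \sum_j j \, \beta_i(j) 2^{-j} \stackrel{\text{def}}{=} I + II.$$

As in the proof of the Knuth-Yao theorem, we have

$$\sum_i \alpha_i \log_2 \frac{1}{\alpha_i} \le I \le \sum_i \Big( \alpha_i \log_2 \frac{1}{\alpha_i} + 2\alpha_i \Big), \qquad \sum_i \beta_i \log_2 \frac{1}{\beta_i} \le II \le \sum_i \Big( \beta_i \log_2 \frac{1}{\beta_i} + 2\beta_i \Big),$$

so that, using (5) and $\sum_i (\alpha_i + \beta_i) = 1$,

$$\sum_i p_i \log_2 \frac{1}{p_i} \le \mathbb{E}(T) \le \sum_i p_i \log_2 \frac{1}{p_i} + 3.$$

3 Lower bound for generating continuous random vectors

In this section, we give a lower bound for the complexity of sampling any continuous distribution to an arbitrary precision. Let $\mathcal{A}$ be a countable partition of $\mathbb{R}^d$, and let $\epsilon > 0$ be a fixed precision parameter. Consider the infinite graph $G$ whose vertices are the sets $A \in \mathcal{A}$ and whose edges are all pairs $(A, B) \in \mathcal{A} \times \mathcal{A}$ such that

$$\inf_{x \in A, \, y \in B} \|x - y\|_p < \epsilon.$$

Therefore, if $(A, B)$ is not an edge of $G$, then $\|x - y\|_p \ge \epsilon$ for all $x \in A$, $y \in B$. Let $\Delta$ be the maximal degree of any vertex of $G$. We now state a lemma that we shall use in conjunction with the Knuth-Yao result, mentioned and reproved in the previous section, in order to prove our main theorem mentioned in the introduction.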
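Lemma 1. Let $X$ be a target random vector of $\mathbb{R}^d$. Let $Y$ be an output with the property that, with probability one, $\|X - Y\|_p < \epsilon$. Let $T$ be the number of bits used to generate $Y$ by an algorithm. Then

$$\mathbb{E}(T) \ge \sup_{\mathcal{A}} \{ \mathcal{E}_{\mathcal{A}}(X) - \log_2(\Delta + 1) \},$$

where $\mathcal{A}$ and $\Delta$ are as above.

We can maximize the bound from Lemma 1, of course, by selecting the most advantageous partition $\mathcal{A}$ and combination. The bound from Lemma 1 coincides with the bound in Shannon [9] when the distribution of $X$ is discrete with a finite number of atoms since, in that case, $\Delta = 0$ by choosing $\epsilon$ sufficiently small.

Proof of Lemma 1. Let $X$ and $Y$ be two dependent random variables of $\mathbb{R}^d$, and denote $p_{AB} = P\{X \in A, Y \in B\}$. Note that $p_{AB} = 0$ if $(A, B)$ is not an edge of $G$. Thus

$$\mathcal{E}_{\mathcal{A}}(X) - \mathcal{E}_{\mathcal{A}}(Y) = \sum_{(A, B) \in \mathcal{A} \times \mathcal{A}} P\{X \in A, Y \in B\} \log_2 \frac{P\{Y \in B\}}{P\{X \in A\}}$$

$$\le \log_2 \Big( \sum_{(A, B) \in \mathcal{A} \times \mathcal{A}} P\{X \in A, Y \in B\} \frac{P\{Y \in B\}}{P\{X \in A\}} \Big) \quad \text{(by Jensen's inequality)}$$

$$= \log_2 \Big( \sum_{B \in \mathcal{A}} P\{Y \in B\} \sum_{A \in \mathcal{A}} \frac{P\{X \in A, Y \in B\}}{P\{X \in A\}} \Big) \le \log_2 \Big( \sum_{B \in \mathcal{A}} P\{Y \in B\} (\Delta + 1) \Big) = \log_2(\Delta + 1).$$

If $T$ is the random number of bits needed to generate a discrete random variable $Y$ that outputs a vertex $A$ of $G$ with probability $P\{Y \in A\}$, then

$$\mathbb{E}(T) \ge \mathcal{E}_{\mathcal{A}}(Y) \quad \text{(by Knuth-Yao)} \quad \ge \mathcal{E}_{\mathcal{A}}(X) - \log_2(\Delta + 1),$$

and therefore

$$\mathbb{E}(T) \ge \sup_{\mathcal{A}} \{ \mathcal{E}_{\mathcal{A}}(X) - \log_2(\Delta + 1) \}. \qquad (6)$$

It is interesting to recall a general result from Csiszár [7] about the hypercube partition entropy of an absolutely continuous random vector $X$ that will become useful later. Of particular interest to us is the cubic partition $\mathcal{A}_h$ parametrized by $h > 0$. The cells of this partition are of the form

$$\prod_{j=1}^{d} [i_j h, (i_j + 1) h), \qquad (i_1, \ldots, i_d) \in \mathbb{Z}^d.$$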
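We recall that if $\lfloor X \rfloor \stackrel{\text{def}}{=} (\lfloor X_1 \rfloor, \ldots, \lfloor X_d \rfloor)$ has finite entropy (a condition we refer to as Rényi's condition), then if $X$ has a density $f$, $\mathcal{E}(f) = \int f \log_2(1/f)$ is well-defined, i.e., it is either finite or $-\infty$. We have

Lemma 2. Under Rényi's condition, for a general partition $\mathcal{A}$ and a random variable $X \in \mathbb{R}^d$ with density $f$,

$$\mathcal{E}_{\mathcal{A}}(X) \ge \mathcal{E}(f) + \sum_{A \in \mathcal{A}} P\{X \in A\} \log_2 \frac{1}{\lambda(A)},$$

where $\lambda$ denotes the Lebesgue measure. In particular,

$$\mathcal{E}_{\mathcal{A}_h}(X) \ge \mathcal{E}(f) + d \log_2 \frac{1}{h}.$$

Proof of Lemma 2. Fix $A \in \mathcal{A}$. If $Z$ is uniform on $A$ and $Y = f(Z)$, then $P\{X \in A\} = \int_A f = \lambda(A) \, \mathbb{E}(Y)$. Thus

$$P\{X \in A\} \log_2 \frac{\lambda(A)}{P\{X \in A\}} = \lambda(A) \, \mathbb{E}(Y) \log_2 \frac{1}{\mathbb{E}(Y)},$$

and, by Jensen's inequality and the concavity of $x \log_2(1/x)$,

$$\mathbb{E}(Y) \log_2 \frac{1}{\mathbb{E}(Y)} \ge \mathbb{E}\Big( Y \log_2 \frac{1}{Y} \Big) = \frac{1}{\lambda(A)} \int_A f \log_2 \frac{1}{f}.$$

The inequality follows by summing over $A \in \mathcal{A}$.

Lemma 3 (Csiszár [7]). Let $X \in \mathbb{R}^d$ have density $f$, and let Rényi's condition be satisfied. If $\mathcal{E}(f) > -\infty$, then as $h \downarrow 0$,

$$\mathcal{E}_{\mathcal{A}_h}(X) = \mathcal{E}(f) + d \log_2 \frac{1}{h} + o(1).$$

If $\mathcal{E}(f) = -\infty$, then as $h \downarrow 0$,

$$\mathcal{E}_{\mathcal{A}_h}(X) - d \log_2 \frac{1}{h} \to -\infty.$$

Remark 1. The fifth theorem of Csiszár [7] stipulates that if $\mathcal{E}(\lfloor X_1 \rfloor, \ldots, \lfloor X_d \rfloor) < \infty$ and $X$ is not absolutely continuous, then, as $h \downarrow 0$,

$$\mathcal{E}_{\mathcal{A}_h}(X) - d \log_2 \frac{1}{h} \to -\infty.$$

For more information about the asymptotic theory for the entropy of partitioned distributions as the partitions become finer, one can consult Rényi [6], Csiszár [7], Csiszár [10], and Linder and Zeger [11].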
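Lemmas 2 and 3 can be illustrated numerically in dimension one. The sketch below, under stated assumptions, computes the partition entropy of a standard normal variable with respect to the cells $[ih, (i+1)h)$ and compares it with $\mathcal{E}(f) + \log_2(1/h)$, where $\mathcal{E}(f) = \log_2 \sqrt{2\pi e}$. The helper names `normal_cdf` and `partition_entropy_normal` are hypothetical, and cells outside $[-12, 12]$ are ignored because their mass is negligible.

```python
from math import e, erf, log2, pi, sqrt


def normal_cdf(x):
    """Distribution function of the standard normal law."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def partition_entropy_normal(h, span=12.0):
    """Entropy (in bits) of the cell index floor(X / h) for X standard normal.

    Cells outside [-span, span] are skipped; this truncation is an
    assumption of the sketch, not part of the definition.
    """
    n = int(span / h) + 1
    total = 0.0
    for i in range(-n, n):
        mass = normal_cdf((i + 1) * h) - normal_cdf(i * h)
        if mass > 0.0:
            total += mass * log2(1.0 / mass)
    return total


diff_entropy = log2(sqrt(2.0 * pi * e))  # differential entropy of N(0, 1)
for h in (0.5, 0.1, 0.01):
    lhs = partition_entropy_normal(h)
    rhs = diff_entropy + log2(1.0 / h)
    print(f"h = {h}: partition entropy = {lhs:.6f}, E(f) + log2(1/h) = {rhs:.6f}")
```

The partition entropy stays above $\mathcal{E}(f) + \log_2(1/h)$, as Lemma 2 requires, and the gap shrinks as $h$ decreases, in line with Lemma 3.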
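Theorem 3. Let $X \in \mathbb{R}^d$ have density $f$. Let $Y$ be an output with the property that with probability one, $\|X - Y\|_p < \epsilon$. Then, under Rényi's condition,

$$\mathbb{E}(T) \ge \mathcal{E}(f) + d \log_2 \frac{1}{\epsilon} - \log_2 V_{d,p},$$

where

$$V_{d,p} = \frac{2^d \, \Gamma(\tfrac{1}{p} + 1)^d}{\Gamma(\tfrac{d}{p} + 1)}$$

is the volume of the unit $\ell_p$-ball in $\mathbb{R}^d$, and $T$ is the number of random bits needed to generate $Y$.

Proof of Theorem 3. Let $\mathcal{A}_h$ be a cubic partition. Then

$$\mathbb{E}(T) \ge \sup_h \{ \mathcal{E}_{\mathcal{A}_h}(X) - \log_2(\Delta_h + 1) \},$$

where $\Delta_h$ is the maximal degree in the graph on $\mathcal{A}_h \times \mathcal{A}_h$ defined by connecting $A \in \mathcal{A}_h$ with $B \in \mathcal{A}_h$ if $\inf_{x \in A, y \in B} \|x - y\|_p < \epsilon$. We set $h = \epsilon/n$ and use

$$\mathbb{E}(T) \ge \limsup_{n \to \infty} \big( \mathcal{E}_{\mathcal{A}_{\epsilon/n}}(X) - \log_2(\Delta_{\epsilon/n} + 1) \big).$$

Observe that if $B_r$ denotes the $\ell_p$-ball of radius $r$ centered at 0, then by elementary geometric considerations,

$$\frac{\lambda(B_\epsilon)}{h^d} \le \Delta_h \le \frac{\lambda(B_{\epsilon + 2 h d^{1/p}})}{h^d},$$

so that as $n \to \infty$,

$$\Delta_{\epsilon/n} \sim \frac{n^d}{\epsilon^d} \lambda(B_\epsilon) = V_{d,p} \, n^d.$$

Also,

$$\mathcal{E}_{\mathcal{A}_{\epsilon/n}}(X) \ge \mathcal{E}(f) + d \log_2 \frac{n}{\epsilon},$$

so that

$$\mathcal{E}_{\mathcal{A}_{\epsilon/n}}(X) - \log_2(\Delta_{\epsilon/n} + 1) \ge \mathcal{E}(f) + d \log_2 \frac{n}{\epsilon} - \log_2 \big( 1 + V_{d,p} \, n^d (1 + o(1)) \big) \to \mathcal{E}(f) + d \log_2 \frac{1}{\epsilon} - \log_2 V_{d,p}.$$

4 Upper bound for partition-based algorithms

Consider a random variable $X \in \mathbb{R}^d$. We call a partition $\mathcal{A}$ an $\epsilon$-partition if for every set $A \in \mathcal{A}$, there exists $x_A \in A$, called the center, such that

$$\sup_{y \in A} \|x_A - y\|_p \le \epsilon.$$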
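Then any algorithm that selects $A \in \mathcal{A}$ with probability $p(A) \stackrel{\text{def}}{=} P\{X \in A\} = \int_A f$ can be used to generate a random variable $Y$ that approximates $X$ to within $\epsilon$. After generating $A$, set $Y = x_A$. Then, necessarily, there is a coupling $(X, Y)$ with $\|X - Y\|_p \le \epsilon$. If the selection of $A$ is done with the help of the method of Knuth and Yao, then, if $T$ still denotes the number of random bits required,

$$\mathbb{E}(T) \le \mathcal{E}_{\mathcal{A}}(X) + 2.$$

For $p = \infty$, we can take $\mathcal{A} = \mathcal{A}_{2\epsilon}$, the cubic partition with sides $2\epsilon$. For $d = 1$, a simple partition into intervals of length $2\epsilon$ can be used for all values of $p$. If $X$ has a density $f$ and $p = \infty$ or $d = 1$, then the procedure suggested above has, as $\epsilon \downarrow 0$,

$$\mathbb{E}(T) \le \mathcal{E}_{\mathcal{A}_{2\epsilon}}(X) + 2 = \mathcal{E}(f) + d \log_2 \frac{1}{2\epsilon} + 2 + o(1) = \mathcal{E}(f) + d \log_2 \frac{1}{\epsilon} - d + 2 + o(1),$$

where in the last step we assume Rényi's condition and $\mathcal{E}(f) > -\infty$. Compare with the lower bound

$$\mathbb{E}(T) \ge \mathcal{E}(f) + d \log_2 \frac{1}{\epsilon} - d,$$

and note that the difference is $2 + o(1)$. For later reference, we recall the values of $\mathcal{E}(f)$ for the main distributions:

Uniform[0, 1]: $\mathcal{E}(f) = 0$,
Exponential(1): $\mathcal{E}(f) = \log_2 e$,
Normal(0, 1): $\mathcal{E}(f) = \log_2 \sqrt{2\pi e}$.

Recall that for $X \in \mathbb{R}$ and $a > 0$, a scale factor $a$ shows up as $\log_2 a$ in the upper and lower bounds because $\mathcal{E}(aX) = \mathcal{E}(X) + \log_2 a$.

For general $p \in [1, \infty)$, we can take $\mathcal{A} = \mathcal{A}_{2\epsilon/d^{1/p}}$, the cubic partition with sides $2\epsilon/d^{1/p}$. Under Rényi's condition and $\mathcal{E}(f) > -\infty$, we have

$$\mathbb{E}(T) \le \mathcal{E}_{\mathcal{A}_{2\epsilon/d^{1/p}}}(X) + 2 = \mathcal{E}(f) + d \log_2 \frac{1}{\epsilon} + 2 - d + \frac{d}{p} \log_2 d + o(1).$$

The difference with the lower bound is

$$D = 2 + \frac{d}{p} \log_2 d + d \log_2 \Gamma\Big( \frac{1}{p} + 1 \Big) - \log_2 \Gamma\Big( \frac{d}{p} + 1 \Big) + o(1).$$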
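The lower bound of Theorem 3 and the leading terms of the cubic-partition upper bound are easy to evaluate side by side. The following sketch uses hypothetical function names and drops the $o(1)$ terms, so the numbers are asymptotic guides rather than exact bit counts. The log-gamma function is used so that large values of $d$ do not overflow.

```python
from math import lgamma, log, log2

INF = float("inf")


def log2_unit_ball_volume(d, p):
    """log2 of V_{d,p} = (2 * Gamma(1/p + 1))**d / Gamma(d/p + 1); p may be INF."""
    if p == INF:
        return float(d)  # the unit ball is the cube [-1, 1]^d
    return d * (1.0 + lgamma(1.0 / p + 1.0) / log(2.0)) - lgamma(d / p + 1.0) / log(2.0)


def lower_bound(entropy_f, d, p, eps):
    """Universal lower bound E(f) + d*log2(1/eps) - log2(V_{d,p})."""
    return entropy_f + d * log2(1.0 / eps) - log2_unit_ball_volume(d, p)


def cubic_partition_upper_bound(entropy_f, d, p, eps):
    """Leading terms (o(1) dropped) of the Knuth-Yao bound for the cubic partition."""
    shrink = 0.0 if p == INF else (d / p) * log2(d)
    return entropy_f + d * log2(1.0 / eps) + 2.0 - d + shrink


if __name__ == "__main__":
    eps = 2.0 ** -20
    for d in (1, 2, 10, 100):
        gap = cubic_partition_upper_bound(0.0, d, 2, eps) - lower_bound(0.0, d, 2, eps)
        print(f"d = {d}, p = 2: gap D = {gap:.3f} bits")
```

For $p = \infty$ or $d = 1$ the gap is exactly 2, while for $p = 2$ it grows with $d$, which is the behavior discussed in the text.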
Using $\Gamma(1 + u) \sim (u/e)^u \sqrt{2\pi u}$ as $u \to \infty$, we obtain

$$D = 2 + d \log_2 \Big( \Gamma\Big( \frac{1}{p} + 1 \Big) (ep)^{1/p} \Big) - \frac{1}{2} \log_2 \frac{2\pi d}{p} + o(1),$$

which unfortunately increases linearly with $d$. To avoid this growing differential (which we did not have for $p = \infty$), it seems necessary to consider partitions that better approximate $\ell_p$-balls.

5 Upper bound for inversion-based algorithms

The inversion method for generating a random variable $X$ with distribution function $F$ uses the property that $X = F^{-1}(U)$ has distribution function $F$, where $F^{-1}$ denotes the inverse, and $U$ is uniform on $[0, 1]$. One can use this method as a basis for generating an approximation using only a few random bits. In particular, if

$$U = \sum_{j=1}^{\infty} U_j 2^{-j} = 0.U_1 U_2 \ldots,$$

and $U_1, U_2, \ldots$ are independent Bernoulli(1/2) random variables, then setting

$$U_t^- = 0.U_1 \ldots U_t, \qquad U_t^+ = 0.U_1 \ldots U_t + 2^{-t},$$

we have $U_t^- \le U \le U_t^+$. Note that $U_0^- = 0$ and $U_0^+ = 1$. Graphically, the interval $[U_t^-, U_t^+]$ is mapped by $F^{-1}$ to the interval $[X_t^-, X_t^+] = [F^{-1}(U_t^-), F^{-1}(U_t^+)]$, which contains $X = F^{-1}(U)$.

Figure 3: Inversion method illustrated.

The number of random coins is

$$T = \min \{ t : F^{-1}(U_t^+) - F^{-1}(U_t^-) \le 2\epsilon \}.$$
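The stopping rule for $T$ translates directly into a loop that reveals one bit of $U$ at a time. The sketch below is an illustration under stated assumptions, not the authors' code: the names `inversion_sample`, `inverse_cdf`, and `random_bit` are hypothetical, the oracle for $F^{-1}$ is assumed to accept exact rationals in $[0, 1]$ (returning infinity at the endpoints if the support is unbounded), and the output is taken to be the midpoint of the final interval.

```python
from fractions import Fraction
from math import inf, log
import random


def inversion_sample(inverse_cdf, eps, random_bit, max_bits=200):
    """Return (y, t): an eps-accurate output and the number of fair bits used."""
    lo, width = Fraction(0), Fraction(1)  # U_t^- and 2^{-t}
    t = 0
    while True:
        x_lo = inverse_cdf(lo)            # F^{-1}(U_t^-)
        x_hi = inverse_cdf(lo + width)    # F^{-1}(U_t^+)
        if x_hi - x_lo <= 2 * eps:
            return (x_lo + x_hi) / 2, t
        if t >= max_bits:
            raise RuntimeError("precision not reached within max_bits")
        width /= 2
        lo += width * random_bit()        # reveal the next bit of U
        t += 1


def exponential_inverse_cdf(u):
    """F^{-1}(u) = -log(1 - u) for the exponential law of mean 1."""
    return inf if u >= 1 else -log(1.0 - float(u))


if __name__ == "__main__":
    bit = lambda: random.getrandbits(1)
    print(inversion_sample(exponential_inverse_cdf, 1e-3, bit))
```

For the uniform law, where the inverse is the identity, the loop stops after exactly $\lceil \log_2(1/(2\epsilon)) \rceil$ bits regardless of the bits drawn.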
If we define

$$Y = \frac{F^{-1}(U_T^+) + F^{-1}(U_T^-)}{2},$$

then $X$ and $Y$ are coupled in such a way that $|X - Y| \le \epsilon$. The $T$ defined above is also the number of bits needed to generate $Y$. Observe that the inversion method requires $F^{-1}$ in a black box, also called an oracle. On the other hand, it avoids the cumbersome calculation of the cell probabilities and the set-up of the Knuth-Yao DDG tree, and thus shines by its simplicity.

In spirit, the inversion method mimics the method of Han and Hoshi, and indeed, this observation leads to a simple bound. Let $\mathcal{A}_{2\epsilon}$ be a cubic partition of $\mathbb{R}$ into intervals of equal length $2\epsilon$. Denote the probabilities of these intervals by $p(A) = P\{X \in A\}$, $A \in \mathcal{A}_{2\epsilon}$. Assume that we select an interval from $\mathcal{A}_{2\epsilon}$ following this law by the method of Han and Hoshi, using the random bits $U_1, U_2, U_3, \ldots$ also used in the inversion method. It is easy to see that the number of bits needed before halting in the inversion method is smaller. Therefore, for the inversion method, we have

$$\mathbb{E}(T) \le \mathcal{E}_{\mathcal{A}_{2\epsilon}}(X) + 3.$$

From this and Lemma 3, we conclude

Theorem 4. If $X$ has a density $f$ satisfying Rényi's condition and if $\int f \log_2(1/f) > -\infty$, then as $\epsilon \downarrow 0$,

$$\mathbb{E}(T) \le \log_2 \frac{1}{\epsilon} + \int f \log_2 \frac{1}{f} + 2 + o(1).$$

Remark 2. Comparing Theorem 4 with the universal lower bound $\mathbb{E}(T) \ge \log_2(1/\epsilon) + \mathcal{E}(f) - 1$, we see that the difference is $3 + o(1)$.

Remark 3. The partition-based method has an upper bound for $\mathbb{E}(T)$ that is one less. Moreover, the simplicity of the inversion method cannot be underestimated. In addition, one can tighten the analysis under additional conditions on $f$ such as unimodality, monotonicity, or for specific forms.

Theorem 5. Assume that $X$ has a bounded nonincreasing density $f$ on $[0, \infty)$. Then for the inversion method, as $\epsilon \downarrow 0$,

$$\mathbb{E}(T) \le \log_2 \frac{1}{\epsilon} + \int f \log_2 \frac{1}{f} + o(1).$$

Proof of Theorem 5. Define $X_t^- = F^{-1}(U_t^-)$ and $X_t^+ = F^{-1}(U_t^+)$ as in Figure 3. Then

$$\mathbb{E}(T) = \sum_{t \ge 0} P\{ X_t^+ - X_t^- > 2\epsilon \}$$
14 t t { P fx t + < } 2 t 2 because X + 2 t t f fx t + X+ t X t X t { P fx < } { 2 t + P fx t + 2 < } 2 t 2 < fx I + II. t Now, { } I E + log 2 2fX log 2 + f log 2 f even if the latter integral is infinite. The Theorem follows if we can show that II o. To this end, note that { II P fx t + < } 2 t 2 fx t. t For a fixed value of t, we see that fx t + < 2 t 2 fx t only if X falls in the interval that captures the value 2 t 2, if such an interval exists. But the probability of each interval is precisely /2 t. However, if 2 t 2 > f, then no such interval exists. Thus, II {t 2 t } log 2 2f t 4f o. For the uniform [, ] density, we have Ef, and so the bound of Theorem 5 is ET log 2 + o. However the o can be omitted in this case as the following simple calculation shows: ET P{T > t} t t t t 2 t i 2 t i {F i+ 2 t F i 2 t >2} 2 t { i+ 2 t i 2 t >2} 2 t {t log2 2 } 4
15 log 2 + log 2 2 log Theorem 5 improves over the bound for the partition method for d by + o. Under other regularity conditions, one can hope to obtain similar bounds that beat the partition bound. For the exponential density, the inversion method yields ET log 2 + Ef + o log 2 + log 2 e + o, where log 2 e Flajolet and Saheb [2] proposed a method for the exponential law that has E{T } log ϕ, where ϕ.2 as. For the normal law, Karney [3] proposes a method that addresses the variable approximation issue but does not offer explicit bounds. Inversion would yield E{T } log 2 + log 2 2πe + o, but the drawback is that this requires the presence of an oracle for F, the inverse gaussian distribution function. Even the partition method requires a nontrivial oracle, namely F. To sidestep this, one can use a slightly more expensive method based on the Box-Müller [2], which states that the pair of random variables 2EV, 2EV 2 with E exponential and V, V 2 uniform on the unit circle, provides a standard gaussian in R 2 of zero mean and unit covariance matrix. The random variable 2E is Maxwell, i.e., it has density re r2 /2, r >, and its differential entropy is Ef Maxwell f Maxwell r log 2 dr log 2 log , f Maxwell r f Maxwell r log r γ log2 + + r2 2 where γ is the Euler-Mascheroni constant. We sketch the procedure, which also serves as an example for more complicated random variate generation problems. Assume that the two normals are required with -accuracy each this corresponds to the choice of d 2 and p. Then we first generate a Maxwell random variable M by inversion, noting that F r e r2 2, dr 5
16 F u 2 log u. The Maxwell random variable M is needed with 2-accuracy. The Maxwell law is unimodal with mode at r. Its left piece has probability e. So we first pick a piece randomly using on average no more than two bits. The we apply inversion on the appropriate piece. By Theorem 5, we use T random bits where 2 ET log 2 + E f Maxwell o. The generated approximation is called M. Next we generate a uniform random variable U [, 2π with accuracy /2 M + /2. The generated value U [, 2π has U U /2 M +/2. Since U has differential entropy log 22π, we see that the expected number of bits, T 2, needed is bounded by M + /2 ET 2 E log 2 /2 M + /2 E log 2 log log 2 2π + o + log 2 2π + o /2 + E log 2 M + log 2 2π + o by the dominated convergence theorem. Then we return M sinu, M cosu, and claim that jointly, To see this, note that and similarly for the cosine. Next, M sinu M sinu M cosu M cosu. sinu sinu U U /2 M + /2 M sinu M sinu M M sinu + M sinu sinu M M + M U U 2 + M /2 M + /
17 Putting everything together, we see that the total expected number of bits is not more than 2 ET + ET 2 2 log 2 + E log 2 M + Ef Maxwell log 2 2π + o 2 log log 2 2πe + o 2 log o. The lower bound for generating two independent gaussians is 4 + o less, i.e., 2 log log 2 2πe 2 2 log Batch generation 6. Randomness extraction Turning a sequence of i.i.d. random variables X, X 2,... into a sequence of i.i.d. Bernoulli /2 bits has been the subject of many papers. The setting of interest to us is the following. Let F, F 2,... be a possibly infinite number of cumulative distributions supported on the positive integers. Let p, p 2,... be a fixed probability vector. Let X, X, X 2,... be i.i.d. random integers drawn from p, p 2,.... Given X, X 2,..., X n for n N, draw Y, Y 2,..., Y n independently from the distributions F X, F X2,..., F Xn. As a special case, we have the classical setting when p and then Y, Y 2,..., Y n are i.i.d. according to F. Let F, F 2,... have binary entropies given by E, E 2,..., all assumed to be finite. In other words the entropy of Y X i is denoted by E i. Assume also that E def p i E i <. i Theorem 6. There exits an algorithm described below that, upon input X, Y, X 2, Y 2,..., X n, Y n outputs a sequence of R n i.i.d. Bernoulli /2 bits where R n n p E as n. Furthermore, these bits are independent of X,..., X n. Theorem 6 describes how many perfect random bits we can extract from Y, Y 2,..., Y n, i.e., R n should be near the information content, the entropy of Y, Y 2,... Y n. Not surprisingly then, the way to achieve this can be inspired by the optimal or near-optimal methods of compression, and, in particular, arithmetic coding. Note that one can assume Y i F X i V i for i {,..., n} where V, V 2,..., V n are i.i.d. uniform random variables on [, ]. For all i N, let the binary expansions of p i be p i j b ij 2 j where b ij {, }. 7
18 def For convenience, p ij b ij for all i, j N N. Also, for all i N, let 2 j { if j, F i j j p i k p ik if j. Following the methodology of arithmetic coding, associate a uniform[, ] random variable U with binary expansion.u U 2 with the infinite data sequence X, Y,..., X n, Y n. The bits U i, i, are i.i.d. Bernoulli/2 and independent of X, X 2, X 3,... To be more precise, consider this algorithm. Algorithm 2 Randomness extraction Input: A sequence of pairs X, Y,..., X n, Y n with X l and Y l as described previously for l {,..., n}. : U 2: U + 3: for l to n do 4: U l U l + U + l U l FXl Y l 5: U + l U l + U + l U l FXl Y l 6: end for 7: R n max { t : 2 t U n 2 t U + n } {R n is the number of bits of the longest prefix common to both U n and U + n.} 8: return 2 Rn U n To verify the correctness of Algorithm 2, the intervals [U l, U + l ] are nested. More precisely, for all l, [U l, U + l ] [U l+, U + l+ ]. Define U lim sup n U n For every iteration, we have lim inf n U + n, and note U [U n, U + n ] U U j U + j U j Since R n max { t : 2 t U n 2 t U + n }, The bits U, U 2,..., U Rn n [U l, U + l ]. l L Uniform[U j, U + j ]. 2 Rn 2Rn U Un U U n + 2 Rn U +. 2 Rn are clearly i.i.d. Proof of Theorem 6. Let t R and consider the two cases {R n t} and {R n < t}. We show that t is concentrated around ne. Before considering the two cases, we compute the useful quantity px E log 2 log p 2 p i + log 2 p ij X Y p i j ij p i log 2 p i + p ij log 2 p ij i i j 8
19 EX + p i EX + def E. i j p i E i + EX i p ij pi log p 2 + log i p 2 ij pi Note that { Rn t } {U + n U n < 2 t } { n l l p Xl Y l p Xl < 2 t } { n pxl log 2 p Xl Y l } > t. 7 The pairs X, Y,..., X n, Y n are i.i.d. and therefore, by the previous calculation, px E log 2 E. p X Y By the law of large numbers, for all >, P{R n > ne + } as n. We have { Rn < t } { } 2 t U n + > 2 t Un. By the law of large numbers, for all >, if t ne, P {U n + Un } { n pxl 2 t P log 2 p Xl Y l For an arbitrary fixed integer k >, { P{R n < t + k} P U + n U n 2 t+k o k, li } t. } { + P U n + Un < } 2 t+k, 2t U n + > 2 t Un { which is as small as desired by the choice of k. The 2/2 k term is due to the fact that the event U + n Un <, 2 t U + 2 t+k n > 2 t Un } occurs only if U m 2 and m N. t 2 t+k 6.2 Batch generation algorithm based on a general DDG tree algorithm Assume given a random variable X N with fixed probability vector p, p 2,... of finite entropy denoted by EX. Assume that we employ a given DDG tree based algorithm for the generation of X. In this tree, let L be the set of leaves and let labelu be the label of leaf u L. Define Let du be the depth of u L. Then we have L i {u L : labelu i} for i N. P{X i} def p i u L i 2 du. 9
If the algorithm returns a variable $X$, then we know that we must have exited via a leaf in $L_X$. Given $X = i$, each exit-leaf has a given probability:

$$P\{ \text{Exit via leaf } u \in L_i \} = \frac{1/2^{d(u)}}{p_i}.$$

Let us call the random exit-leaf $Y$. The DDG tree algorithm thus returns a pair $(X, Y)$. We have

$$\sum_i p_i \, \mathcal{E}(Y \mid X = i) = \sum_i p_i \sum_{u \in L_i} \frac{1}{p_i 2^{d(u)}} \log_2 \big( p_i 2^{d(u)} \big) = \sum_i \sum_{u \in L_i} \frac{1}{2^{d(u)}} \log_2 2^{d(u)} + \sum_i p_i \log_2 p_i$$

$$= \sum_{u \in L} \frac{1}{2^{d(u)}} \log_2 2^{d(u)} - \mathcal{E}(X) = \mathcal{E}(Y) - \mathcal{E}(X).$$

For example, for the Knuth-Yao DDG algorithm, we have $\mathcal{E}(Y) - \mathcal{E}(X) \le 2$, while for the Han-Hoshi algorithm, we have $\mathcal{E}(Y) - \mathcal{E}(X) \le 3$. Our method of batch generation will be valid for any DDG tree with finite $\mathcal{E}(Y)$.

The algorithm for batch generating i.i.d. random variables $X_1, X_2, \ldots, X_n$ uses an atomic operation FetchBit that first gets a bit from a queue $Q$ if this queue is not empty, and otherwise gets a bit from a Bernoulli(1/2) generator. It is understood that FetchBit drives the DDG tree algorithm.

Algorithm 3 Batch generation
  $Q \leftarrow \emptyset$ {Initially, the queue is empty.}
  $R_0 \leftarrow 0$ {There is no recycled bit initially.}
  for $i = 1$ to $n$ do
    Generate $(X_i, Y_i)$ by a DDG tree algorithm. {The DDG algorithm uses the operation FetchBit to get bits either from the source or from the queue $Q$.}
    return $X_i$
    Feed $(X_i, Y_i)$ to the retrieval algorithm (randomness extraction procedure), and recover $R_i - R_{i-1}$ bits, which are added to $Q$.
  end for

Theorem 7. The batch generation algorithm uses $N_n$ random bits, where

$$\frac{N_n}{n} \stackrel{p}{\to} \mathcal{E}(X) \quad \text{as } n \to \infty,$$

whenever $\mathcal{E}(Y) < \infty$.

Remark 4. By the Knuth-Yao lower bound, $\mathbb{E}(N_n) \ge n \mathcal{E}(X)$, and therefore, the procedure is asymptotically optimal to within $o_p(n)$ bits. The symbol $o_p(n)$ means that the quantity is $o(n)$ in probability as $n \to \infty$.
21 Proof of Theorem 7. We choose a large integer constant k and look at N nk. Let Q t be the size of the queue at time t, and set Q. For j {,..., nk}, let T j be the number of bits needed for generating X j without extraction. The T j are i.i.d. random variables. Then we have the following simple identity: nk N nk T j R nk + Q nk. By the law of large numbers, j T + T T nk ET + T T nk p as n. We note that ET + T T nk nkey because random variables T j are i.i.d. By Theorem 6, we also have that R nk /nk p EY EX as n. Therefore, N nk nk EY + o p EY EX + o p + Q nk nk EX + o p + Q nk nk. The result follows if Q nk /nk p as n. For this, we need only consider an upper bound, since Q nk, and then Q nj Q nj + R nj R nj min j k {T nj T nj, Q nj }. Since R n /n p EY EX, we have R max nj R nj j k n EY EX p 8 and T max nj T nj j k n EY p. 9 Fix >, and let A be the event that both lefthand sides in 8 and 9 are less than, so that P{A c } o. The critical observation is that on A, Q nj + EY EX + n EY n Q nj if Q nj EY n, Q nj + EY EX + n Q nj else. and therefore, max { Q nj, EY EX + }, if 2 EX, max Q nj EY EX + n j k and Q nk nk EY EX +. k 2
If we choose $k$ large enough such that $(\mathcal{E}(Y) - \mathcal{E}(X) + \epsilon)/k \le \epsilon$, then

$$P\Big\{ \frac{Q_{nk}}{nk} > \epsilon \Big\} \le P\{A^c\} = o(1).$$

If batch generation is applied to the partition method for continuous distributions, and the Knuth-Yao or Han-Hoshi method is used for the discrete part of that method, then the expected number of random bits needed per random variable, under Rényi's condition and $\mathcal{E}(f) > -\infty$, is bounded from above by

$$d \log_2 \frac{1}{\epsilon} + \mathcal{E}(f) - d + o(1),$$

thus matching the lower bound for the case $p = \infty$ and all dimensions.

7 Conclusion and outlook

Using the notion of maximal coupling between a generated random variable and a theoretical target random variable, we were able to lay the groundwork for a theory of generation with a universal lower bound in terms of the number of random Bernoulli(1/2) bits needed. In order to grace the world's software libraries with variable accuracy generators, much more work is needed, both algorithmic and theoretic. We have shown that the well-known inversion method is nearly optimal in an information-theoretic sense, and will submit further work and other paradigms in the near future.

Appendix: Generation of an exponential by convolution

The method given in this appendix is based on taking the sum of independent random variables. Using convolution does not require an oracle for $F$ or $F^{-1}$, which were required by the algorithms given in this paper. Its range of applicability seems however restricted to the uniform and the exponential, as suggested by Kakutani's result [13], which can be stated as follows:

Theorem 8 (Kakutani 1948). For all $i \in \mathbb{N}$, let $p_i \in [0, 1]$ and let $X_i$ be independent Bernoulli random variables such that $P\{X_i = 1\} = p_i$. If $X = \sum_i X_i / 2^i$, then

$X$ is singular $\iff \sum_i (p_i - \tfrac{1}{2})^2$ diverges,
$X$ is absolutely continuous $\iff \sum_i (p_i - \tfrac{1}{2})^2$ converges,
$X$ is discrete $\iff \prod_i \big( \tfrac{1}{2} + |p_i - \tfrac{1}{2}| \big) > 0$.
First of all, as shown in [14], if $X$ is an exponential random variable with mean 1, then $\lfloor X \rfloor$ is distributed as a geometric random variable with parameter $1/e$, and $\{X\}$, the fractional part of $X$, is distributed as a truncated exponential random variable on the interval $[0, 1)$, and $\lfloor X \rfloor$ and $\{X\}$ are independent. We therefore concentrate on the fractional part. The following theorem tells us that the fractional part is the convolution of independent Bernoulli random variables.
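Theorem 9. Let $X_1, \ldots, X_j, \ldots$ be a sequence of independent Bernoulli distributed random variables with

$$P\{X_j = 2^{-j}\} = p_j \in [0, 1], \quad \text{and} \quad P\{X_j = 0\} = 1 - p_j \quad \text{for all } j \in \mathbb{N}.$$

Let $X = \sum_{j=1}^{\infty} X_j$. If

$$\frac{p_j}{1 - p_j} = e^{-1/2^j},$$

then $X$ is a truncated exponential random variable, i.e., the p.d.f. of $X$ is

$$f_X(x) = \frac{e^{-x}}{1 - e^{-1}} \quad \text{for } x \in [0, 1].$$

Proof of Theorem 9. Since

$$p_j = \frac{e^{-1/2^j}}{e^{-1/2^j} + 1},$$

the Fourier transform of $X_j$ is

$$\mathbb{E}\, e^{\imath X_j t} = p_j e^{\imath t/2^j} + (1 - p_j) = \frac{e^{(-1+\imath t)/2^j} + 1}{e^{-1/2^j} + 1}.$$

Since $X$ is the sum of the independent $X_j$'s, we have

$$\mathbb{E}\, e^{\imath X t} = \prod_{j=1}^{\infty} \mathbb{E}\, e^{\imath X_j t} = \prod_{j=1}^{\infty} \frac{e^{(-1+\imath t)/2^j} + 1}{e^{-1/2^j} + 1} = \frac{1 - e^{-1+\imath t}}{(1 - e^{-1})(1 - \imath t)},$$

which is the Fourier transform of $f_X$.

We can thus generate $\{X\}$ with precision $\epsilon$ if we set $k = \lceil \log_2(1/\epsilon) \rceil$, and let

$$Y = \sum_{j=1}^{k} X_j, \qquad X_j = 2^{-j}\, \mathrm{Bernoulli}(p_j).$$

Since a raw Bernoulli random variable requires 2 bits on average, this simple method, which has accuracy guarantee $|Y - \{X\}| \le \epsilon$, uses an average of not more than $2k = 2\lceil \log_2(1/\epsilon) \rceil$ bits. On the other hand, the lower bound for $\{X\}$ is

$$\mathbb{E}(T) \ge \log_2(1/\epsilon) + \mathcal{E}(\{X\}), \quad \text{where} \quad \mathcal{E}(\{X\}) = \log_2\frac{e-1}{e} + \frac{e-2}{e-1}\log_2 e.$$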
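The simple convolution method described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `bernoulli_from_fair_bits` and `truncated_exponential` and the `next_bit` callback are hypothetical, the probabilities $p_j$ are computed in double precision rather than to arbitrary precision, and each Bernoulli($p_j$) is obtained by the standard comparison of a stream of fair bits with the binary expansion of $p_j$, which consumes 2 bits on average.

```python
import math
import random


def bernoulli_from_fair_bits(p, next_bit):
    """Return (x, bits_used) with P{x = 1} = p, consuming fair bits lazily.

    A uniform U = 0.b1 b2 ... built from fair bits is compared with the
    binary expansion of p; the loop stops at the first differing digit.
    """
    bits_used = 0
    while p > 0.0:
        p *= 2.0                       # shift out the next binary digit of p
        digit = 1 if p >= 1.0 else 0
        p -= digit
        bit = next_bit()
        bits_used += 1
        if bit != digit:
            # U < p exactly when U has a 0 where p has a 1.
            return (1 if digit == 1 else 0), bits_used
    # Every remaining digit of p is 0, hence U >= p.
    return 0, bits_used


def truncated_exponential(eps, next_bit):
    """Return (y, bits_used), where y = X_1 + ... + X_k and k = ceil(log2(1/eps))."""
    k = max(1, math.ceil(math.log2(1.0 / eps)))
    y, bits_used = 0.0, 0
    for j in range(1, k + 1):
        # p_j solves p_j / (1 - p_j) = exp(-2^-j); double precision is an assumption.
        p_j = 1.0 / (1.0 + math.exp(2.0 ** -j))
        x, used = bernoulli_from_fair_bits(p_j, next_bit)
        y += x * 2.0 ** -j
        bits_used += used
    return y, bits_used


if __name__ == "__main__":
    rng = random.Random(2015)          # stand-in for a source of fair coins
    y, used = truncated_exponential(2.0 ** -20, lambda: rng.getrandbits(1))
    print(y, used)
```

Passing the bit source as a callback keeps the bit count honest: the sketch touches randomness only through `next_bit`, so the returned `bits_used` is the number of fair coins actually consumed.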
The factor 2 in $2\log_2(1/\epsilon)$ can be avoided when batch generation is used. It can also be eliminated at a tremendous storage cost if the vector $(2X_1, \ldots, 2^k X_k)$ is generated using the algorithm of Knuth-Yao since we know the individual probabilities, i.e.,

$$P\bigl\{ (2X_1, \ldots, 2^k X_k) = (2x_1, \ldots, 2^k x_k) \bigr\} = \prod_{j=1}^{k} p_j^{x_j 2^j} (1 - p_j)^{1 - x_j 2^j}$$

for all $(2x_1, \ldots, 2^k x_k) \in \{0, 1\}^k$. One can show that

$$\mathcal{E}(2X_1, \ldots, 2^k X_k) = \sum_{j=1}^{k} \mathcal{E}(X_j) = \sum_{j=1}^{k} \left( p_j \log_2\frac{1}{p_j} + (1 - p_j)\log_2\frac{1}{1 - p_j} \right) = k + \log_2\frac{e-1}{e} + \frac{e-2}{e-1}\log_2 e + o(1).$$

Thus, for the Knuth-Yao method for this vector,

$$\mathbb{E}(T) \le \sum_{j=1}^{k} \mathcal{E}(X_j) + 2 = k + \log_2\frac{e-1}{e} + \frac{e-2}{e-1}\log_2 e + 2 + o(1).$$

Again using Knuth-Yao, a geometric($1/e$) random variable can be generated exactly using not more than

$$\log_2\frac{e}{e-1} + \frac{\log_2 e}{e-1} + 2$$

random bits. An exponential random variable generated by this method has an overall expected complexity

$$\mathbb{E}(T) \le \log_2(1/\epsilon) + O(1) \quad \text{as } \epsilon \downarrow 0.$$

Acknowledgment

Both authors thank Tamas Linder for his help. Claude Gravel wants to thank Gilles Brassard from Université de Montréal for his financial support.

References

[1] D. E. Knuth and A. C.-C. Yao, "The complexity of nonuniform random number generation," in Algorithms and Complexity: New Directions and Recent Results, J. F. Traub, Ed. New York: Carnegie-Mellon University, Computer Science Department, Academic Press, 1976, reprinted in Knuth's Selected Papers on Analysis of Algorithms (CSLI, 2000).
[2] P. Flajolet and N. Saheb, "The complexity of generating an exponentially distributed variate," Journal of Algorithms, vol. 7, 1986.
[3] C. F. F. Karney, "Sampling exactly from the normal distribution," 2013, arXiv.
[4] J. Lumbroso, "Probabilistic algorithms for data streaming and random generation," Ph.D. dissertation, Université Pierre et Marie Curie - Paris 6, 2012.
[5] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.
[6] A. Rényi, "On the dimension and entropy of probability distributions," Acta Mathematica Academiae Scientiarum Hungarica, vol. 10, 1959.
[7] I. Csiszár, "Some remarks on the dimension and entropy of random variables," Acta Mathematica Academiae Scientiarum Hungarica, vol. 12, 1961.
[8] T. S. Han and M. Hoshi, "Interval algorithm for random number generation," IEEE Transactions on Information Theory, vol. 43, no. 2, March 1997.
[9] C. E. Shannon, "A mathematical theory of communication," Bell Sys. Tech. Journal, vol. 27, 1948.
[10] I. Csiszár, "On the dimension and entropy of order α of the mixture of probability distributions," Acta Mathematica Academiae Scientiarum Hungarica, vol. 13, 1962.
[11] T. Linder and K. Zeger, "Asymptotic entropy-constrained performance of tessellating and universal randomized lattice quantization," IEEE Transactions on Information Theory, vol. 40, no. 2, 1994.
[12] G. E. Box and M. E. Muller, "A note on the generation of random normal deviates," Ann. Math. Stat., vol. 29, pp. 610-611, 1958.
[13] S. Kakutani, "On equivalence of infinite product measures," Annals of Mathematics, 1948.
[14] L. Devroye, Non-Uniform Random Variate Generation. Springer.
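The entropy expressions used in the appendix can be sanity-checked numerically. The sketch below is an illustration only: the helper names `binary_entropy` and `digit_vector_entropy` and the two constants are hypothetical, and the closed forms are the ones obtained from $p_j = 1/(1 + e^{1/2^j})$ and from the geometric law $P\{\lfloor X \rfloor = n\} = (1 - 1/e)\,e^{-n}$. It checks that the summed binary entropies of $X_1, \ldots, X_k$ exceed $\mathcal{E}(\{X\})$ by $k$ up to a vanishing term, and that the entropies of the integer and fractional parts add up to $\log_2 e$, the differential entropy in bits of a mean-1 exponential.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) random variable, 0 < p < 1."""
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def digit_vector_entropy(k):
    """Sum of the binary entropies of X_1..X_k with p_j = 1 / (1 + exp(2^-j))."""
    return sum(binary_entropy(1.0 / (1.0 + math.exp(2.0 ** -j)))
               for j in range(1, k + 1))


E = math.e
# Differential entropy (bits) of the truncated exponential on [0, 1].
FRACTIONAL_ENTROPY = math.log2((E - 1) / E) + (E - 2) / (E - 1) * math.log2(E)
# Entropy (bits) of the geometric law of the integer part.
GEOMETRIC_ENTROPY = math.log2(E / (E - 1)) + math.log2(E) / (E - 1)

if __name__ == "__main__":
    for k in (5, 10, 20):
        # The first difference approaches the second value as k grows.
        print(k, digit_vector_entropy(k) - k, FRACTIONAL_ENTROPY)
    print(GEOMETRIC_ENTROPY + FRACTIONAL_ENTROPY, math.log2(E))
```

Under these formulas the fractional-part entropy is slightly negative (about -0.06 bits) and the geometric entropy is about 1.5 bits, so the additive constants in the Knuth-Yao bounds are dominated by the two "+2" terms and the rounding of $k$.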