Entropy and Limit theorems in Probability Theory
Shigeki Aida

(This is one of the lectures of Mathematics B in the Graduate School of Science at Tohoku University.)

1 Introduction

Important Notice: Solve at least one problem from the following Problems 1-8 and submit the report to me by June 9.

What is entropy? Entropy represents the uncertainty of probabilistic phenomena. The following definition is due to Shannon.

Definition 1.1 (Shannon). Let us consider a finite set E = {A_1, ..., A_N}. A nonnegative function P on E is called a probability distribution if ∑_{i=1}^N P({A_i}) = 1. Each A_i is called an elementary event, and a subset of E is called an event. For this probability distribution P, we define the entropy by

  H(P) = -∑_{i=1}^N P({A_i}) log P({A_i}).  (1.1)

Remark 1.2. We use the convention 0 log 0 = 0. If we do not mention the base of the logarithm, log means the natural logarithm log_e. For a nonnegative sequence {p_i}_{i=1}^N with ∑_{i=1}^N p_i = 1 we also write

  H(p_1, ..., p_N) = -∑_{i=1}^N p_i log p_i.  (1.2)

Example 1.3.
(1) Coin tossing: E = {H, T} and P_1({H}) = P_1({T}) = 1/2. We have H(P_1) = log 2.
(2) Dice: E = {1, 2, 3, 4, 5, 6} and P_2({i}) = 1/6 (1 ≤ i ≤ 6). Then we have H(P_2) = log 6.
(3) Unfair dice: E = {1, 2, 3, 4, 5, 6}, P_3({1}) = 9/10, P_3({i}) = 1/50 (2 ≤ i ≤ 6). Then

  H(P_3) = (1/10) log 50 + (9/10) log(10/9) < log 6 = H(P_2).

Problem 1. For an unfair dice E = {1, 2, 3, 4, 5, 6} with the probability P_4({1}) = 8/10, P_4({2}) = 1/10, P_4({i}) = 1/40 (i = 3, 4, 5, 6), calculate the entropy H(P_4). Is H(P_4) bigger than H(P_3)?

In the above examples (1) and (2), the entropies are nothing but log #(all elementary events), because all elementary events have equal probabilities. The notion of entropy also appeared in statistical mechanics; of course, that discovery came before the one in information theory. The following is a basic property of the entropy.

Theorem 1.4. Suppose that #E = N. Then for any probability distribution P, we have

  0 ≤ H(P) ≤ log N.  (1.4)

The minimum value is attained by the probability measures such that P({A_i}) = 1 for some i. The maximum is attained by the uniform distribution P, namely, P({A_i}) = 1/N for all 1 ≤ i ≤ N.
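As a quick numerical illustration (a Python sketch added here for convenience, not part of the original problem set; the distributions are those of Example 1.3 and the helper name entropy is ours), one can compute these entropies and check the bounds of Theorem 1.4 directly:

    import math

    def entropy(p):
        """H(P) = -sum_i p_i log p_i with the convention 0 log 0 = 0 (natural log)."""
        return -sum(pi * math.log(pi) for pi in p if pi > 0)

    P1 = [1/2, 1/2]               # coin tossing
    P2 = [1/6] * 6                # fair dice
    P3 = [9/10] + [1/50] * 5      # unfair dice of Example 1.3 (3)

    for name, p in (("P1", P1), ("P2", P2), ("P3", P3)):
        # Theorem 1.4: 0 <= H(P) <= log N, with the maximum only for the uniform distribution.
        print(f"H({name}) = {entropy(p):.4f},  log N = {math.log(len(p)):.4f}")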
For the proof, we refer to the proof of Theorem 3.1 below. The notion of entropy can be used to solve the following problem.

Problem. Here are eight gold coins and a balance. One of the coins is an imitation and it is slightly lighter than the others. How many times do you need to use the balance to find the imitation?

Solution: In information theory, the entropy stands for the quantity of information. In the above problem, there are eight equally likely possibilities for which coin is the imitation, so the entropy is log 8. We gain information by using the balance. One use of the balance gives one of three outcomes: (1) both pans have the same weight, (2) the left pan is lighter, (3) the right pan is lighter. So one weighing carries the information log 3. Thus, by using the balance k times, we obtain an amount of information k log 3. If k log 3 < log 8, we do not get full information, so we need k ≥ 2. It is also not difficult to see that two weighings are enough. In general, if the number of coins N satisfies 3^{n-1} < N ≤ 3^n, then n weighings are enough. We refer to [15] for the details and various other examples.

Problem 2. In the above problem, how many times do you need to use the balance in the case where N = 7? Also present a method of using the balance.
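The counting in the solution is easy to sketch in code (an illustration only; the helper name min_weighings and the sample coin counts are ours): the function returns the smallest k with 3^k ≥ N, which is the number of weighings the entropy argument singles out. The matching weighing strategy still has to be exhibited by hand, as Problem 2 asks.

    import math

    def min_weighings(num_coins):
        """Smallest k with 3**k >= num_coins, i.e. k log 3 >= log(num_coins)."""
        k = 0
        while 3 ** k < num_coins:
            k += 1
        return k

    for n in (7, 8, 27):
        print(f"{n} coins: entropy log {n} = {math.log(n):.3f}, one weighing gives "
              f"log 3 = {math.log(3):.3f}, so {min_weighings(n)} weighings are needed")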
In order to go into more detail, we recall basic notions in probability theory.

2 Basic notions in probability theory

Mathematically, a probability space is defined to be a measure space whose total measure equals 1. We refer the audience to textbooks, e.g. [4], [11], [13], for the precise definitions.

Definition 2.1. A triplet (Ω, F, P) is called a probability space if the following hold.
(1) Ω is a set and F is a family of subsets of Ω satisfying:
  (i) if A_1, A_2, ..., A_i, ... ∈ F, then ∪_{i=1}^∞ A_i ∈ F;
  (ii) if A ∈ F, then A^c ∈ F;
  (iii) Ω, ∅ ∈ F.
(2) To each A ∈ F a nonnegative number P(A) is assigned, satisfying:
  (i) 0 ≤ P(A) ≤ 1 for all A ∈ F;
  (ii) P(Ω) = 1;
  (iii) (σ-additivity) if A_1, A_2, ..., A_i, ... ∈ F and A_i ∩ A_j = ∅ (i ≠ j), then P(∪_{i=1}^∞ A_i) = ∑_{i=1}^∞ P(A_i).

The nonnegative function P: F → [0, 1] is called a probability measure on Ω. A ∈ F is called an event and P(A) is called the probability of A. A function X on Ω is called a random variable (or a measurable function) if for any interval I = [a, b],

  X^{-1}(I) := {ω ∈ Ω | X(ω) ∈ I} ∈ F

holds. For a random variable X, define

  µ_X(I) = P(X ∈ I),  (2.1)

where I denotes an interval in ℝ. µ_X is also a probability measure on ℝ and it is called the law of X or the distribution of X.

Problem 3. Let (Ω, F, P) be a probability space. Prove that if A_i ∈ F (i = 1, 2, ...), then ∩_{i=1}^∞ A_i ∈ F.

Problem 4. Let (Ω, F, P) be a probability space and let A, B ∈ F. Under the assumption that A ⊂ B, prove that P(A) ≤ P(B).

In this course, we consider the following types of random variables.
(1) The case where X is a discrete-type random variable: namely, X takes finitely many values {a_1, ..., a_n}. Then p_i := P({ω ∈ Ω | X(ω) = a_i}) =: P(X = a_i) satisfies ∑_{i=1}^n p_i = 1. The probability distribution µ_X of X is given by µ_X({a_i}) = p_i (1 ≤ i ≤ n).
(2) The case where X has a density function: that is, there exists a nonnegative function f(x) on ℝ such that for any interval [a, b],

  P({ω ∈ Ω | X(ω) ∈ [a, b]}) = ∫_a^b f(x) dx.

Definition 2.2. For a random variable X, we denote the expectation and the variance by m and σ² respectively. Namely:
(i) the case where X is a discrete-type random variable taking values a_1, ..., a_n:

  m = E[X] = ∑_{i=1}^n a_i P(X = a_i),  (2.2)
  σ² = E[(X - m)²] = ∑_{i=1}^n (a_i - m)² P(X = a_i);  (2.3)

(ii) the case where X is a continuous-type random variable with density function f:

  m = E[X] = ∫_{-∞}^{∞} x f(x) dx,  (2.4)
  σ² = E[(X - m)²] = ∫_{-∞}^{∞} (x - m)² f(x) dx.  (2.5)
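As a small illustration of Definition 2.2 (a sketch; the exponential density e^{-x} on [0, ∞) is an assumed example, not one used in the notes), the discrete formulas are finite sums and the continuous ones can be approximated by a Riemann sum:

    import math

    # Discrete case (Definition 2.2 (i)): a fair dice.
    values = [1, 2, 3, 4, 5, 6]
    probs = [1/6] * 6
    m = sum(a * p for a, p in zip(values, probs))
    var = sum((a - m) ** 2 * p for a, p in zip(values, probs))
    print(f"dice: m = {m:.4f}, sigma^2 = {var:.4f}")          # 3.5 and 35/12

    # Continuous case (Definition 2.2 (ii)): assumed density f(x) = e^{-x} on [0, infinity),
    # integrated by a crude midpoint rule on [0, 40].
    dx = 1e-3
    xs = [(i + 0.5) * dx for i in range(int(40 / dx))]
    fs = [math.exp(-x) for x in xs]
    m = sum(x * f for x, f in zip(xs, fs)) * dx
    var = sum((x - m) ** 2 * f for x, f in zip(xs, fs)) * dx
    print(f"exponential: m = {m:.4f}, sigma^2 = {var:.4f}")   # both close to 1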
Definition 2.3 (Independence of random variables). Let {X_i}_{i=1}^N be random variables on a probability space (Ω, F, P), where N is a natural number or N = ∞. {X_i}_{i=1}^N are said to be independent if for any finite subfamily {X_{i_k}}_{k=1}^m ⊂ {X_i}_{i=1}^N (m ≤ N) and any -∞ < a_k < b_k < ∞, the following holds:

  P(X_{i_1} ∈ [a_1, b_1], ..., X_{i_m} ∈ [a_m, b_m]) = ∏_{k=1}^m P(X_{i_k} ∈ [a_k, b_k]).  (2.6)

Definition 2.4. Let µ be a probability distribution on ℝ. Let {X_i}_{i=1}^∞ be independent random variables such that the probability distribution of X_i equals the same distribution µ for all i. Then {X_i}_{i=1}^∞ is said to be i.i.d. (= independent and identically distributed) with the distribution µ.

3 Entropy and the law of large numbers: Shannon-McMillan's theorem

Suppose we are given a set of numbers A = {1, ..., N} (N ≥ 2). We call A the alphabet and its elements are called letters. A finite sequence (ω_1, ω_2, ..., ω_n) (ω_i ∈ A) is called a sentence of length n. The set of sentences of length n is the product space A^n := {(ω_1, ..., ω_n) | ω_i ∈ A}. Let P be a probability distribution on A. We write P({i}) = p_i. In this section, we define the entropy of P by using the logarithm to the base N:

  H(P) = -∑_{i=1}^N P({i}) log_N P({i}).  (3.1)

We can prove:

Theorem 3.1. For every P, 0 ≤ H(P) ≤ 1 holds. H(P) = 0 holds if and only if P({i}) = 1 for some i ∈ A. H(P) = 1 holds if and only if P is the uniform distribution, that is, p_i = 1/N for all i.

Lemma 3.2. Let g(x) = x log x or g(x) = -log x. Then for any {m_i}_{i=1}^N with m_i ≥ 0 and ∑_{i=1}^N m_i = 1 and any nonnegative sequence {x_i}_{i=1}^N, we have

  g(∑_{i=1}^N m_i x_i) ≤ ∑_{i=1}^N m_i g(x_i).  (3.2)

Furthermore, when m_i > 0 for all i, the equality in (3.2) holds if and only if x_1 = ... = x_N.
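Lemma 3.2 is Jensen's inequality for the convex functions x log x and -log x. The following sketch (a pure numerical sanity check; the weights, points and helper names are ours) verifies (3.2) on many randomly generated instances:

    import math
    import random

    def g1(x):                     # x log x, with the convention g1(0) = 0
        return x * math.log(x) if x > 0 else 0.0

    def g2(x):                     # -log x (only evaluated at x > 0 below)
        return -math.log(x)

    random.seed(0)
    N = 6
    for g in (g1, g2):
        for _ in range(1000):
            m = [random.random() for _ in range(N)]
            s = sum(m)
            m = [mi / s for mi in m]                            # weights summing to 1
            x = [random.uniform(0.01, 5.0) for _ in range(N)]   # positive points
            lhs = g(sum(mi * xi for mi, xi in zip(m, x)))
            rhs = sum(mi * g(xi) for mi, xi in zip(m, x))
            assert lhs <= rhs + 1e-12, (lhs, rhs)
    print("inequality (3.2) holds in all sampled cases")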
Proof of Theorem 3.1: First, we consider the lower bound. Applying (3.2) to the case where m_i = x_i = p_i and g(x) = -log x, we have

  H(p_1, ..., p_N) = -∑_{i=1}^N p_i log_N p_i ≥ -log_N(∑_{i=1}^N p_i²) ≥ -log_N 1 = 0.  (3.3)

Clearly, the equality in (3.3) holds if and only if p_i = 1 for some i. Next, we consider the upper bound. Applying Lemma 3.2 to the case where m_i = 1/N, x_i = p_i and g(x) = x log x, for any probability distribution {p_i} we have

  g(∑_{i=1}^N p_i / N) ≤ (1/N) ∑_{i=1}^N g(p_i).  (3.4)

Since ∑_{i=1}^N p_i = 1, this implies

  (1/N) log(1/N) ≤ (1/N) ∑_{i=1}^N p_i log p_i.

Thus -∑_{i=1}^N p_i log p_i ≤ log N, and hence -∑_{i=1}^N p_i log_N p_i ≤ 1. By the last assertion of Lemma 3.2, the equality holds iff p_i = 1/N for all i. □

Now we consider the following situation. There is a memoryless information source S which sends out a letter according to the probability distribution P at each time, independently. Mathematically, we consider i.i.d. {X_i}_{i=1}^∞ with the distribution P. We consider the coding problem for the sequence of letters.

Basic observation:
(1) Suppose that P({1}) = 1 and P({i}) = 0 (2 ≤ i ≤ N). Then the random sequence X_i is actually the deterministic sequence (1, 1, ..., 1, ...), so there is no variety in the sequences at all. In this case we do not need to send the whole sequence. In fact, if we know that the entropy of P is 0, then immediately after receiving the first letter we know that all subsequent letters are 1. Namely, we can encode all sentences, whatever their lengths, into just one letter.
(2) Suppose that N ≥ 3 and consider a probability measure such that P({1}) = P({2}) = 1/2 and P({i}) = 0 for 3 ≤ i ≤ N. Then sequences containing a letter i ≥ 3 are never sent out, so the number of possible sequences of length n under P is 2^n. On the other hand, the number of all sequences over the alphabet A of length k is N^k. Thus, if N^k ≥ 2^n, that is, k/n ≥ log_N 2 = H(P), then all possible sentences of length n can be encoded into sentences of length k (≤ n), and decoding is also possible. More precisely, we can prove the following claim.

Claim. If k/n ≥ H(P), then there exist an encoder ϕ: A^n → A^k and a decoder ψ: A^k → A^n such that

  P(ψ(ϕ(X_1, ..., X_n)) ≠ (X_1, ..., X_n)) = 0.  (3.5)

The probability P(ψ(ϕ(X_1, ..., X_n)) ≠ (X_1, ..., X_n)) is called the error probability. For general P, we can prove the following theorem [6].

Theorem 3.3 (Shannon-McMillan). Take a positive number R > H(P). For any ε > 0, there exists M ∈ ℕ such that for all n ≥ M and all k satisfying k ≥ nR, there exist ϕ: A^n → A^k and ψ: A^k → A^n such that

  P(ψ(ϕ(X_1, ..., X_n)) ≠ (X_1, ..., X_n)) ≤ ε.  (3.6)

R is called the coding rate.
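Before the proof, here is a small numerical illustration of the mechanism behind Theorem 3.3 (a sketch only; the alphabet size, the distribution P and the rate R below are assumed values, not taken from the notes). It enumerates, for a 3-letter alphabet, the set of "high-probability" sentences that will be called C_n in the proof below, and checks that this set is small enough to be encoded at rate R while the remaining sentences carry little probability:

    import itertools
    import math

    P = [0.7, 0.2, 0.1]                          # assumed source distribution
    N = len(P)
    H = -sum(p * math.log(p, N) for p in P)      # entropy to the base N, as in (3.1)
    R = H + 0.15                                 # an assumed coding rate R > H(P)

    for n in (4, 8, 12):
        threshold = N ** (-n * R)
        mass_outside, size_C = 0.0, 0
        for seq in itertools.product(range(N), repeat=n):
            prob = math.prod(P[i] for i in seq)
            if prob >= threshold:                # the sentence belongs to C_n
                size_C += 1
            else:
                mass_outside += prob
        print(f"n={n:2d}: #C_n = {size_C:6d} <= N^(nR) = {N ** (n * R):9.1f}, "
              f"error probability = {mass_outside:.4f}")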
We need the following estimate for the proof of the above theorem.

Lemma 3.4. Let {Z_i}_{i=1}^∞ be i.i.d. Suppose that E[|Z_i|] < ∞ and E[|Z_i|²] < ∞. Then

  P( |(Z_1 + ... + Z_n)/n - m| ≥ δ ) ≤ σ²/(n δ²),  (3.7)

where m = E[Z_i] and σ² = E[(Z_i - m)²].

Problem 5. Prove Lemma 3.4 using Chebyshev's inequality.

This lemma immediately implies the following weak law of large numbers.

Theorem 3.5. Under the same assumptions on {Z_i} as in Lemma 3.4, for any δ > 0,

  lim_{n→∞} P( |(Z_1 + ... + Z_n)/n - m| ≥ δ ) = 0.  (3.8)
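A quick Monte Carlo check of (3.7) (a sketch; fair-dice rolls are used as an assumed choice of the i.i.d. variables Z_i, and the tolerance δ and trial count are ours): the empirical frequency of a large deviation of the sample mean is compared with the bound σ²/(nδ²):

    import random

    random.seed(1)
    m, var = 3.5, 35 / 12            # mean and variance of one fair-dice roll
    delta, trials = 0.25, 5000

    for n in (50, 200, 800):
        bad = 0
        for _ in range(trials):
            mean = sum(random.randint(1, 6) for _ in range(n)) / n
            if abs(mean - m) >= delta:
                bad += 1
        bound = var / (n * delta ** 2)
        print(f"n={n:3d}: empirical P(|mean - m| >= {delta}) = {bad / trials:.4f}, "
              f"bound (3.7) = {bound:.4f}")

The bound is far from sharp, but it tends to 0 as n → ∞, which is all the weak law of large numbers (3.8) needs.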
Proof of Theorem 3.3: Take n ∈ ℕ. The probability distribution of the i.i.d. finite sequence {X_i}_{i=1}^n is the probability distribution P_n on A^n such that for any {a_i}_{i=1}^n,

  P_n({ω_1 = a_1, ..., ω_n = a_n}) = ∏_{i=1}^n P({a_i}).  (3.9)

Let us consider the random variables on A^n

  Z_i(ω) = -log_N P({ω_i})  (1 ≤ i ≤ n).

Then {Z_i}_{i=1}^n are i.i.d. and their expectation and variance are finite. In fact, we have

  m = E[Z_i] = -∑_{i=1}^N P({i}) log_N P({i}) = H(P),
  σ² = E[(Z_i - E[Z_i])²] = ∑_{i=1}^N p_i (log_N p_i)² - H(P)².  (3.10)

Take δ > 0 such that R > H(P) + δ. By Lemma 3.4,

  P_n( | -(1/n) ∑_{i=1}^n log_N P({ω_i}) - H(P) | ≥ δ ) ≤ σ²/(n δ²).  (3.11)

Hence, for any ε > 0, there exists M ∈ ℕ such that

  P_n( -(1/n) ∑_{i=1}^n log_N P({ω_i}) ≥ H(P) + δ ) ≤ ε  for all n ≥ M.  (3.12)

Noting that

  { (ω_1, ..., ω_n) | -(1/n) ∑_{i=1}^n log_N P({ω_i}) < H(P) + δ }
    = { (ω_1, ..., ω_n) | ∏_{i=1}^n P({ω_i}) > N^{-n(H(P)+δ)} }
    ⊂ { (ω_1, ..., ω_n) | ∏_{i=1}^n P({ω_i}) ≥ N^{-nR} } =: C_n,  (3.13)

by (3.12) we have, for n ≥ M,

  P( (X_1, ..., X_n) ∉ C_n ) = P_n({ (ω_1, ..., ω_n) ∈ A^n | ∏_{i=1}^n P({ω_i}) < N^{-nR} })
    ≤ P_n({ (ω_1, ..., ω_n) ∈ A^n | ∏_{i=1}^n P({ω_i}) ≤ N^{-n(H(P)+δ)} }) ≤ ε.  (3.14)

On the other hand, we have

  1 ≥ P_n(C_n) = P_n({ (ω_1, ..., ω_n) ∈ A^n | ∏_{i=1}^n P({ω_i}) ≥ N^{-nR} }) ≥ #C_n · N^{-nR},  (3.15)

hence

  #C_n ≤ N^{nR}.  (3.16)

By this estimate, if k ≥ nR, then #A^k = N^k ≥ N^{nR} ≥ #C_n, so there exist an injective map φ: C_n → A^k and a map ψ: A^k → C_n such that ψ(φ(ω_1, ..., ω_n)) = (ω_1, ..., ω_n) for any (ω_1, ..., ω_n) ∈ C_n. By taking a map ϕ: A^n → A^k which satisfies ϕ|_{C_n} = φ, we have

  P(ψ(ϕ(X_1, ..., X_n)) ≠ (X_1, ..., X_n)) ≤ P((X_1, ..., X_n) ∉ C_n) ≤ ε.  (3.17)

This completes the proof. □

4 Entropy and the central limit theorem

Let {X_i}_{i=1}^∞ be i.i.d. such that m = E[X_i] = 0 and σ² = E[X_i²] = 1. Let

  S_n = (X_1 + ... + X_n)/√n.

Then we have:

Theorem 4.1 (Central limit theorem = CLT). For any -∞ < a < b < ∞,

  lim_{n→∞} P(S_n ∈ [a, b]) = ∫_a^b (1/√(2π)) e^{-x²/2} dx.  (4.1)

The probability distribution whose density is G(x) = (1/√(2π)) e^{-x²/2} is called the normal distribution (Gaussian distribution) with mean 0 and variance 1, and the notation N(0, 1) stands for this distribution. A standard proof of the CLT uses the characteristic function of S_n, ϕ(t) = E[e^{itS_n}] (t ∈ ℝ).
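A Monte Carlo sketch of Theorem 4.1 (the choice X_i uniform on [-√3, √3], which has mean 0 and variance 1, and the interval [a, b] = [-1, 1] are assumptions made only for this illustration): the empirical probability P(S_n ∈ [a, b]) is compared with the Gaussian integral, evaluated via the error function:

    import math
    import random

    def gaussian_prob(a, b):
        """Integral of (1/sqrt(2 pi)) e^{-x^2/2} over [a, b], via the error function."""
        Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
        return Phi(b) - Phi(a)

    random.seed(2)
    a, b, trials = -1.0, 1.0, 20000
    c = math.sqrt(3.0)                 # X_i uniform on [-sqrt(3), sqrt(3)]: mean 0, variance 1
    for n in (2, 8, 64):
        hits = sum(
            1 for _ in range(trials)
            if a <= sum(random.uniform(-c, c) for _ in range(n)) / math.sqrt(n) <= b
        )
        print(f"n={n:2d}: P(S_n in [{a}, {b}]) ~ {hits / trials:.4f},  "
              f"N(0,1) gives {gaussian_prob(a, b):.4f}")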
Below, we assume:

Assumption 4.2. The distribution of X_i has the density function f, namely, P(X_i ∈ [a, b]) = ∫_a^b f(x) dx.

Then we can prove that the distribution of S_n also has a density function f_n(x), by Lemma 4.3 (1). In this case, (4.1) is equivalent to

  lim_{n→∞} ∫_a^b f_n(x) dx = ∫_a^b G(x) dx.  (4.2)

By using the entropy of S_n, we can prove the stronger result

  lim_{n→∞} ∫_{-∞}^{∞} |f_n(x) - G(x)| dx = 0  (4.3)

under additional assumptions on f. For a distribution P which has the density f(x), and a random variable X whose distribution is P, we define the entropy H and Fisher's information I by

  H(P) = -∫_{-∞}^{∞} f(x) log f(x) dx,  (4.4)
  I(P) = ∫_{-∞}^{∞} f'(x)²/f(x) dx.  (4.5)

We may denote H(P) by H(X) or H(f), and I(P) by I(X) or I(f). We summarize basic results on H and I.

Lemma 4.3.
(1) If independent random variables X and Y have the density functions f and g respectively, then aX + Y (a > 0) has the density function

  h(x) = ∫_{-∞}^{∞} (1/a) f((x - y)/a) g(y) dy.

(2) (Gibbs's lemma) Let f(x) be the density of a probability distribution whose mean is 0 and whose variance is 1, that is,

  ∫_{-∞}^{∞} x f(x) dx = 0,  (4.6)
  ∫_{-∞}^{∞} x² f(x) dx = 1.  (4.7)

Then we have

  H(f) ≤ H(G).  (4.8)

The equality holds if and only if f(x) = G(x) for all x.

(3) (Shannon-Stam inequality) Let X, Y be independent random variables whose density functions satisfy (4.6) and (4.7). Then for a, b with a² + b² = 1, we have

  a² H(X) + b² H(Y) ≤ H(aX + bY).  (4.9)

The equality holds iff the laws of X and Y are N(0, 1).

(4) (Blachman-Stam inequality) Let X, Y be independent random variables whose density functions satisfy (4.6) and (4.7). Then for a, b with a² + b² = 1,

  I(aX + bY) ≤ a² I(X) + b² I(Y).  (4.10)

(5) (Csiszár-Kullback-Pinsker inequality) For a probability density function f satisfying (4.6) and (4.7), we have

  (1/2) ( ∫_{-∞}^{∞} |f(x) - G(x)| dx )² ≤ H(G) - H(f).  (4.11)

(6) (Stam's inequality) For a probability density function f, we have

  (1/(2πe)) e^{2H(f)} ≥ 1/I(f).  (4.12)

(7) (Gross's inequality) For a probability density function f, we have

  H(f) ≥ (1/2) log(2πe) - (1/2)(I(f) - 1).  (4.13)
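The quantities (4.4)-(4.5) and several of these inequalities are easy to check numerically. The sketch below (the bimodal Gaussian mixture is an assumed test density with mean 0 and variance 1, not an example from the notes, and the grid parameters are ours) integrates H and I on a grid; one should observe H(G) = (1/2) log(2πe) ≈ 1.4189, I(G) = 1, and H(f) ≤ H(G) as Gibbs's lemma requires:

    import math

    def gaussian(x, mean=0.0, sd=1.0):
        return math.exp(-(x - mean) ** 2 / (2 * sd * sd)) / (sd * math.sqrt(2 * math.pi))

    # Assumed test density with mean 0 and variance 1: an equal mixture of
    # N(-0.8, 0.36) and N(+0.8, 0.36); note 0.8**2 + 0.36 = 1.
    def f(x):
        return 0.5 * gaussian(x, -0.8, 0.6) + 0.5 * gaussian(x, 0.8, 0.6)

    def entropy(density, lo=-12.0, hi=12.0, n=100000):
        """H of (4.4) by the midpoint rule."""
        dx = (hi - lo) / n
        return -sum(p * math.log(p) * dx
                    for p in (density(lo + (i + 0.5) * dx) for i in range(n)) if p > 0)

    def fisher(density, lo=-12.0, hi=12.0, n=100000, eps=1e-5):
        """I of (4.5) by the midpoint rule, with a central-difference derivative."""
        dx = (hi - lo) / n
        total = 0.0
        for i in range(n):
            x = lo + (i + 0.5) * dx
            p = density(x)
            dp = (density(x + eps) - density(x - eps)) / (2 * eps)
            total += dp * dp / p * dx
        return total

    print(f"H(G) = {entropy(gaussian):.4f}  (exact value 0.5*log(2*pi*e) = "
          f"{0.5 * math.log(2 * math.pi * math.e):.4f})")
    print(f"H(f) = {entropy(f):.4f}  <= H(G), as in (4.8)")
    print(f"I(G) = {fisher(gaussian):.4f}  (exact value 1)")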
Problem 6. Using Stam's inequality and the elementary inequality log x ≤ x - 1 (x > 0), prove Gross's inequality.

Problem 7. Prove Stam's inequality by using Gross's inequality in the following way. Show that f_t(x) = t f(tx) is a probability density function for any t > 0. Next, substituting f_t into Gross's inequality, obtain an estimate for H(f). Optimize the estimate for H(f) by choosing a suitable t and prove Stam's inequality.

Problem 8. Prove that Gross's inequality is equivalent to the following logarithmic Sobolev inequality of Gross: for all u = u(x) with ∫_{-∞}^{∞} u(x)² dµ(x) = 1, we have

  ∫_{-∞}^{∞} u(x)² log u(x)² dµ(x) ≤ 2 ∫_{-∞}^{∞} u'(x)² dµ(x),  (4.14)

where dµ(x) = (1/√(2π)) e^{-x²/2} dx.

Remark 4.4. N(f) = (1/(2πe)) e^{2H(f)} is called Shannon's entropy power functional. Stam [9] proved his inequality in 1959. This inequality reveals a relation between the Fisher information and the Shannon entropy. Later, Gross [5] proved his log-Sobolev inequality in the form (4.14) in 1975. He also proved that the log-Sobolev inequality is equivalent to the hypercontractivity of the corresponding Ornstein-Uhlenbeck semigroup. Hypercontractivity is very important in the study of quantum field theory, and Gross's motivation came from the study of quantum field theory. After Gross's work, the importance of the log-Sobolev inequality became widely known. Meanwhile, Stam's contribution had been forgotten, but some people noted it, as in [2, 3], and some people call the inequality the Stam-Gross logarithmic Sobolev inequality, as in [10]. My recent work [1] is related to the semiclassical problem for P(φ)₂-Hamiltonians which appear in constructive quantum field theory. In that work, I used the logarithmic Sobolev inequality to determine the asymptotic behavior of the ground state energy (= the lowest eigenvalue of the Hamiltonian) in the semiclassical limit ℏ → 0.

By Lemma 4.3 (1), S_n has a density function f_n. Also note that H(S_{nm}) ≥ H(S_n) for any n, m ∈ ℕ. We can prove that f_n converges to G.

Theorem 4.5. Assume that f is a C¹-function and I(f) < ∞. Then for any n, f_n is a continuous function. Moreover we have

  lim_{n→∞} f_n(x) = G(x) for all x,  (4.15)
  lim_{n→∞} H(f_n) = H(G),  (4.16)
  lim_{n→∞} ∫_{-∞}^{∞} |f_n(x) - G(x)| dx = 0.  (4.17)

Sketch of proof: By the Shannon-Stam inequality, we can prove (4.16). This and the Csiszár-Kullback-Pinsker inequality imply (4.17). On the other hand, the Blachman-Stam inequality implies I(f_n) ≤ I(f). This and (4.17) imply (4.15). See [2], [7] and the references therein for the details.
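To close this section, Stam's inequality (4.12) and the entropy power functional N(f) of Remark 4.4 can be checked numerically on a concrete density (again, the bimodal mixture below is only an assumed example, and the grid parameters are ours):

    import math

    def gaussian(x, mean=0.0, sd=1.0):
        return math.exp(-(x - mean) ** 2 / (2 * sd * sd)) / (sd * math.sqrt(2 * math.pi))

    # Assumed test density (mean 0, variance 1): equal mixture of N(-0.8, 0.36), N(+0.8, 0.36).
    f = lambda x: 0.5 * gaussian(x, -0.8, 0.6) + 0.5 * gaussian(x, 0.8, 0.6)

    lo, hi, n, eps = -12.0, 12.0, 100000, 1e-5
    dx = (hi - lo) / n
    xs = [lo + (i + 0.5) * dx for i in range(n)]

    H = -sum(f(x) * math.log(f(x)) for x in xs) * dx                  # entropy (4.4)
    I = sum((f(x + eps) - f(x - eps)) ** 2 / (4 * eps * eps) / f(x)   # Fisher information (4.5)
            for x in xs) * dx

    entropy_power = math.exp(2 * H) / (2 * math.pi * math.e)          # Shannon's entropy power N(f)
    print(f"H(f) = {H:.4f}, I(f) = {I:.4f}")
    print(f"Stam (4.12): N(f) = {entropy_power:.4f} >= 1/I(f) = {1 / I:.4f}")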
5 Boltzmann's H-theorem, Markov chains and entropy

We recall the kinetic theory of rarefied gases. Let (v_x^i(t), v_y^i(t), v_z^i(t)) be the velocity of the i-th molecule at time t (1 ≤ i ≤ N), where N denotes the number of molecules. The velocities v^i(t) = (v_x^i(t), v_y^i(t), v_z^i(t)) obey Newton's equation of motion, but N is very big and it is almost meaningless to follow the behavior of each v^i(t). Boltzmann considered the probability distribution of the velocity, say f_t(v_x, v_y, v_z) dv_x dv_y dv_z, and derived the following H-theorem:

Theorem 5.1 (Boltzmann). Let

  H(t) = ∫ f_t(v_x, v_y, v_z) log f_t(v_x, v_y, v_z) dv_x dv_y dv_z.

Then

  (d/dt) H(t) ≤ 0.

Remark 5.2. In statistical mechanics, the entropy is nothing but -kH(t), where k is Boltzmann's constant (= 1.38 × 10⁻²³ J/K). Therefore, the above theorem implies that the entropy increases.

Some people raised questions about the H-theorem.
(i) Newton's equations of motion for the particles x(t) = (x_i(t))_{i=1}^N moving in a potential U read as follows:

  m_i ẍ_i(t) = -∂U/∂x_i(x(t)),  (5.1)
  x_i(0) = x_{i,0},  (5.2)
  ẋ_i(0) = v_i,  (5.3)

where x_{i,0}, v_i are the initial positions and the initial velocities, and m_i is the mass of the i-th particle. The time-reversed dynamics x̃(t) is the solution of (5.1) with initial velocity -v_i. Clearly, it is impossible that the entropies of both x(t) and x̃(t) increase. This is a contradiction.
(ii) The second question is based on Poincaré's recurrence theorem: there exists a time t_0 such that x(t_0) is close to x(0). Then at that time t_0 the entropy should also be close to its initial value. Therefore, the monotone property of the entropy cannot hold.
The reason why the above paradoxes appear lies in the statistical treatment of rarefied gases. We refer to [4] for recent progress based on probabilistic models. (Note: Poincaré's recurrence theorem holds under some additional assumptions on U.)

From now on, we consider stochastic dynamics called Markov chains, and prove that the entropy of the dynamics increases as time goes on.
Example 5.3. There are data on the weather in a certain city, in the form of transition probabilities:

    Today          Tomorrow: fine day     Tomorrow: rainy day
    Fine day       probability ...        probability ...
    Rainy day      probability ...        probability ...

If today (= the 0-th day) is fine, how much is the probability that the n-th day is also fine?

Solution: Let p_k be the probability that the k-th day is fine and set q_k = 1 - p_k, that is, the probability that the k-th day is rainy. Then (p_k, q_k) satisfies the recurrence relation

  (p_k, q_k) = (p_{k-1}, q_{k-1}) P,  (p_0, q_0) = (1, 0),

where P is the 2 × 2 matrix formed by the transition probabilities in the table. So we obtain

  (p_k, q_k) = (1, 0) P^k.

Definition 5.4. Let E = {S_1, ..., S_N} be a finite set. E is called a state space. We consider a random motion of a particle on E. Let {p_{ij}}_{i,j∈E} be nonnegative numbers satisfying

  ∑_{j=1}^N p_{ij} = 1 for all 1 ≤ i ≤ N.  (5.5)

p_{ij} denotes the probability that the particle moves from S_i to S_j. If the probability that the particle is located at S_i (1 ≤ i ≤ N) at time n is π_i, then the probability that the particle is located at S_j at time n + 1 is ∑_{i=1}^N π_i p_{ij}. That is, if the initial distribution of the particle is π(0) = (π_1, ..., π_N), then the distribution π(n) = (π_1(n), ..., π_N(n)) at time n is given by

  π(n) = π(0) P^n,  (5.6)

where P denotes the N × N matrix whose (i, j)-element is p_{ij}. Below, we denote by p^{(n)}_{ij} the (i, j)-element of P^n.
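The computation π(n) = π(0)P^n of (5.6) is straightforward to carry out; in the sketch below the transition probabilities (fine → fine = 0.8, rainy → fine = 0.4) and the helper names are assumed sample choices made only for this illustration, since Example 5.3 can be stated with any such table:

    def mat_mul(A, B):
        """Product of two matrices given as lists of rows."""
        return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
                for i in range(len(A))]

    def distribution_at(pi0, P, n):
        """pi(n) = pi(0) P^n, computed by repeated right multiplication as in (5.6)."""
        pi = [list(pi0)]                        # a 1 x N row vector
        for _ in range(n):
            pi = mat_mul(pi, P)
        return pi[0]

    # Assumed transition matrix: rows = today (fine, rainy), columns = tomorrow (fine, rainy).
    P = [[0.8, 0.2],
         [0.4, 0.6]]
    pi0 = [1.0, 0.0]                            # day 0 is fine
    for n in (1, 2, 5, 20):
        p_fine, p_rainy = distribution_at(pi0, P, n)
        print(f"day {n:2d}: P(fine) = {p_fine:.4f}, P(rainy) = {p_rainy:.4f}")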
We prove the following.

Theorem 5.5. Assume that
(A1) ∑_{i=1}^N p_{ij} = 1 for all 1 ≤ j ≤ N;
(A2) (mixing property of the Markov chain) there exists n_0 ∈ ℕ such that p^{(n_0)}_{ij} > 0 for any i, j ∈ {1, ..., N}.
Then for any initial distribution π = (π_1, ..., π_N), we have

  lim_{n→∞} π_i(n) = 1/N for all 1 ≤ i ≤ N,  (5.7)

where π(n) = (π_i(n))_{1≤i≤N} = π P^n.

Remark 5.6. (A1) is equivalent to

  (1/N, ..., 1/N) = (1/N, ..., 1/N) P,

that is, the uniform distribution is invariant under P. Also, (A1) holds if p_{ij} = p_{ji} for all i, j ∈ E.

This theorem can be proved by using the entropy of π(n) and Lemma 3.2. Recall

  H(π) = -∑_{i=1}^N π_i log π_i.

Lemma 5.7. Let Q = (q_{ij}) be an (N, N)-probability matrix, that is, q_{ij} ≥ 0 for all i, j and ∑_{j=1}^N q_{ij} = 1 for all i. Assume ∑_{i=1}^N q_{ij} = 1 for any j. Then for any probability distribution π, H(πQ) ≥ H(π). In addition, if q_{ij} > 0 for all i, j, then H(πQ) > H(π) for all π ≠ (1/N, ..., 1/N).

Proof. We have

  H(πQ) = -∑_{i=1}^N (πQ)_i log (πQ)_i = -∑_{i=1}^N ( ∑_{k=1}^N π_k q_{ki} ) log( ∑_{k=1}^N π_k q_{ki} ).  (5.8)

Since ∑_{k=1}^N q_{ki} = 1, by applying Lemma 3.2 (with g(x) = x log x) to the case where m_k = q_{ki}, x_k = π_k,

  ( ∑_{k=1}^N π_k q_{ki} ) log( ∑_{k=1}^N π_k q_{ki} ) ≤ ∑_{k=1}^N q_{ki} π_k log π_k.  (5.9)

(5.8), (5.9) and ∑_{i=1}^N q_{ki} = 1 imply H(πQ) ≥ H(π). The last assertion follows from the last assertion of Lemma 3.2. □

By using this lemma, we prove Theorem 5.5.

Proof of Theorem 5.5. Since π(n) moves in a bounded subset of ℝ^N, accumulation points exist. That is, there exist x = (x_1, ..., x_N) and a subsequence {π(n_k)}_{k=1}^∞ such that lim_{k→∞} π(n_k) = x. x is also a probability distribution and satisfies H(x) = lim_{k→∞} H(π(n_k)). Note that P^{n_0} is a probability matrix and all elements of P^{n_0} are positive. So if x ≠ (1/N, ..., 1/N), then H(xP^{n_0}) > H(x). But xP^{n_0} = lim_{k→∞} π P^{n_k + n_0} and H(xP^{n_0}) = lim_{k→∞} H(π P^{n_k + n_0}). Since H(π(n)) is a non-decreasing (hence convergent) sequence, both limits equal lim_{n→∞} H(π(n)), so H(xP^{n_0}) = H(x). This is a contradiction. So x = (1/N, ..., 1/N), which completes the proof. □
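Lemma 5.7 and Theorem 5.5 can be observed numerically. In the sketch below the symmetric (hence, by Remark 5.6, doubly stochastic) 3 × 3 matrix is an assumed example with all entries positive, so (A1) and (A2) hold; one sees H(π(n)) increase and π(n) approach the uniform distribution:

    import math

    def entropy(p):
        """H(pi) = -sum_i pi_i log pi_i, with 0 log 0 = 0."""
        return -sum(x * math.log(x) for x in p if x > 0)

    def step(pi, P):
        """One step pi -> pi P of the Markov chain."""
        N = len(pi)
        return [sum(pi[k] * P[k][j] for k in range(N)) for j in range(N)]

    # Assumed symmetric (doubly stochastic) transition matrix on 3 states, all entries > 0.
    P = [[0.5, 0.3, 0.2],
         [0.3, 0.4, 0.3],
         [0.2, 0.3, 0.5]]

    pi = [1.0, 0.0, 0.0]                   # start concentrated at state S_1
    for n in range(8):
        print(f"n={n}: pi(n) = {[round(x, 4) for x in pi]},  H(pi(n)) = {entropy(pi):.4f}")
        pi = step(pi, P)
    print(f"log 3 = {math.log(3):.4f}  (entropy of the uniform distribution)")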
References

[1] S. Aida, Semi-classical limit of the lowest eigenvalue of a Schrödinger operator on a Wiener space. II. P(φ)₂-model on a finite volume. J. Funct. Anal., no. 10.
[2] A. Barron, Entropy and the central limit theorem. Ann. Probab. 14 (1986), no. 1, 336-342.
[3] E. Carlen, Superadditivity of Fisher's information and logarithmic Sobolev inequalities, J. Funct. Anal. 101 (1991), 194-211.
[4] (In Japanese.)
[5] L. Gross, Logarithmic Sobolev inequalities. Amer. J. Math. 97 (1975), no. 4, 1061-1083.
[6] A. I. Khinchin, Mathematical Foundations of Information Theory, Dover Books on Advanced Mathematics, 1957.
[7] P. L. Lions and G. Toscani, A strengthened central limit theorem for smooth densities, J. Funct. Anal. 129 (1995).
[8] C. E. Shannon, The mathematical theory of communication, Bell Syst. Techn. Journ. 27 (1948), 379-423.
[9] A. J. Stam, Some inequalities satisfied by the quantities of information of Fisher and Shannon, Information and Control 2 (1959), 101-112.
[10] C. Villani, Topics in Optimal Transportation. Graduate Studies in Mathematics, 58. American Mathematical Society, Providence, RI, 2003.
[11] D. Williams, Probability with Martingales, Cambridge Mathematical Textbooks, 1991.
[12] K. Yoshida, Markoff process with a stable distribution, Proc. Imp. Acad. Tokyo 16 (1940).
[13] (In Japanese.)
[14] (In Japanese.)
[15] (In Japanese.)