Concentration Inequalities
I. Moment generating functions, the Chernoff method, and sub-Gaussian and sub-exponential random variables

a. Goal for this section: given a random variable $X$, how does $X$ concentrate around its mean? That is, assuming w.l.o.g. that $E[X] = 0$, how well can we bound $P(X \ge t)$?

b. Chernoff bounds.

1. First, recall Markov's inequality: if $X \ge 0$, then $P(X \ge t) \le E[X]/t$.

2. Extension: use moment generating functions to get exponential tails (the Chernoff bound). For any $\lambda > 0$,
$$P(X \ge t) = P(e^{\lambda X} \ge e^{\lambda t}) \le E[e^{\lambda X}] e^{-\lambda t} = \varphi_X(\lambda) e^{-\lambda t},$$
where $\varphi_X(\lambda) := E[e^{\lambda X}]$ is the moment generating function of $X$.

3. In particular, we have
$$P(X \ge t) \le \inf_{\lambda \ge 0} \exp\big(\log \varphi_X(\lambda) - \lambda t\big).$$

c. Sub-Gaussian and sub-exponential random variables.

1. A mean-zero random variable $X$ is sub-Gaussian with parameter $\sigma^2$ if for all $\lambda \in \mathbb{R}$,
$$E[e^{\lambda X}] \le \exp\Big(\frac{\lambda^2 \sigma^2}{2}\Big).$$
If $X \sim N(0, \sigma^2)$, then this holds with equality.

2. A mean-zero random variable $X$ is sub-exponential with parameters $(\tau^2, b)$ if for all $\lambda$ with $|\lambda| \le 1/b$,
$$E[e^{\lambda X}] \le \exp\Big(\frac{\lambda^2 \tau^2}{2}\Big).$$
Any sub-Gaussian random variable is sub-exponential.
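As a quick numerical illustration of the Chernoff recipe (the following Python sketch is not part of the original notes; the grid over $\lambda$ and the values of $t$ are arbitrary choices), for $X \sim N(0, \sigma^2)$ we have $\log \varphi_X(\lambda) = \lambda^2 \sigma^2/2$, so the infimum above can be computed both numerically and in closed form as $\exp(-t^2/(2\sigma^2))$, and compared to the exact Gaussian tail:

```python
# Sketch: the Chernoff bound for X ~ N(0, sigma^2), where
# log phi_X(lambda) = lambda^2 sigma^2 / 2, versus the exact tail.
import math
import numpy as np

sigma = 1.0
lam = np.linspace(0.0, 10.0, 10_001)  # grid for the infimum over lambda >= 0
for t in [1.0, 2.0, 3.0]:
    chernoff_grid = np.exp(lam**2 * sigma**2 / 2 - lam * t).min()
    chernoff_closed = math.exp(-t**2 / (2 * sigma**2))  # optimum at lambda = t / sigma^2
    exact_tail = 0.5 * math.erfc(t / (sigma * math.sqrt(2)))
    print(f"t={t}: exact={exact_tail:.3e}, "
          f"grid bound={chernoff_grid:.3e}, closed form={chernoff_closed:.3e}")
```

The output shows the usual picture: the Chernoff bound captures the correct exponential rate $e^{-t^2/(2\sigma^2)}$ while being loose by a polynomial factor.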
d. Examples and consequences.

1. Example 2.1 (Bounded random variables): Suppose that $X \in [a, b]$, where $-\infty < a \le b < +\infty$ and $E[X] = 0$. Then Hoeffding's lemma states that
$$E[e^{\lambda X}] \le \exp\Big(\frac{\lambda^2 (b - a)^2}{8}\Big),$$
so that $X$ is $(b - a)^2/4$-sub-Gaussian.

2. Chernoff bounds extend naturally to sums of independent random variables. For example, Hoeffding's inequality is the following.

Proposition 2.2. Let $X_i$ be independent, mean-zero, $\sigma_i^2$-sub-Gaussian random variables. Then
$$P\Big(\sum_{i=1}^n X_i \ge t\Big) \le \exp\Big(-\frac{t^2}{2 \sum_{i=1}^n \sigma_i^2}\Big).$$

Proof: We simply apply the Chernoff technique repeatedly, then optimize over $\lambda \ge 0$. Indeed, we have
$$P\Big(\sum_{i=1}^n X_i \ge t\Big) \le E\Big[\exp\Big(\lambda \sum_{i=1}^n X_i\Big)\Big] e^{-\lambda t} = E\Big[\exp\Big(\lambda \sum_{i=1}^{n-1} X_i\Big)\Big] E[\exp(\lambda X_n)] e^{-\lambda t} \le E\Big[\exp\Big(\lambda \sum_{i=1}^{n-1} X_i\Big)\Big] \exp\Big(\frac{\lambda^2 \sigma_n^2}{2}\Big) e^{-\lambda t} \le \exp\Big(\frac{\lambda^2}{2} \sum_{i=1}^n \sigma_i^2 - \lambda t\Big),$$
where we have used independence to peel off the term involving $X_n$ and then induction. Taking derivatives of the term inside the exponent with respect to $\lambda$, we see that $\lambda = t / \sum_{i=1}^n \sigma_i^2 \ge 0$ minimizes the expression, whence we obtain the desired result.

As an immediate corollary of this proposition and Example 2.1, we obtain the usual Hoeffding bound: if $X_i \in [a_i, b_i]$ and $E[X_i] = 0$, then
$$P\Big(\sum_{i=1}^n X_i \ge t\Big) \le \exp\Big(-\frac{2 t^2}{\sum_{i=1}^n (b_i - a_i)^2}\Big).$$
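To see Hoeffding's inequality in action, here is a small Monte Carlo sketch (not part of the original notes; the choice of Uniform$[-1, 1]$ variables, $n$, and the number of trials are arbitrary). With $a_i = -1$ and $b_i = 1$, the corollary above reads $P(\sum_i X_i \ge t) \le \exp(-t^2/(2n))$:

```python
# Sketch: Monte Carlo check of Hoeffding's bound for X_i ~ Uniform[-1, 1],
# for which sum_i (b_i - a_i)^2 = 4n and the bound is exp(-t^2 / (2n)).
import numpy as np

rng = np.random.default_rng(0)
n, trials = 100, 100_000
sums = rng.uniform(-1.0, 1.0, size=(trials, n)).sum(axis=1)
for t in [10.0, 15.0, 20.0]:
    empirical = (sums >= t).mean()
    bound = np.exp(-t**2 / (2 * n))
    print(f"t={t}: empirical={empirical:.2e}, Hoeffding bound={bound:.2e}")
```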
II. Entropy and concentration

a. We would like to develop techniques to give control over more complicated functions than simply sums of the $X_i$; suppose we have $Z = f(X_1, \ldots, X_n)$ and we would like to know if $Z$ is concentrated around its mean.

b. Let $\phi : \mathbb{R} \to \mathbb{R}$ be a convex function. The $\phi$-entropy of a random variable $X$ is
$$H_\phi(X) := E[\phi(X)] - \phi(E[X]), \tag{2.0.1}$$
assuming the relevant expectations exist.

1. Example: if $\phi(t) = t^2$, then $H_\phi(X) = E[X^2] - E[X]^2 = \mathrm{Var}(X)$. Note that $H_\phi(X) \ge 0$ always by Jensen's inequality, and strictly so for non-constant $X$ with strictly convex $\phi$.

c. Idea: if $X$ is concentrated around its mean, then $H_\phi(X)$ should be small as well, at least for nice $\phi$.

1. The entropy we focus on: take $\phi(t) = t \log t$, which gives us the entropy
$$H(X) = E[X \log X] - E[X] \log E[X],$$
as long as $X \ge 0$.

2. In particular, consider the transformation $e^{\lambda X}$. Then, assuming $E[e^{\lambda X}] < \infty$, we study $H(e^{\lambda X})$.

d. The Herbst argument (making rigorous the idea that $H(X)$ being small should imply concentration of $X$).

Proposition 2.3. Let $X$ be a random variable and assume that there exists a constant $\sigma^2 < \infty$ such that
$$H(e^{\lambda X}) \le \frac{\lambda^2 \sigma^2}{2} \varphi_X(\lambda) \tag{2.0.2}$$
for all $\lambda \in \mathbb{R}$ (or $\lambda \in \mathbb{R}_+$), where $\varphi_X(\lambda) = E[e^{\lambda X}]$ denotes the moment generating function of $X$. Then $X - E[X]$ is $\sigma^2$-sub-Gaussian.

Proof: Let $\varphi = \varphi_X$ for shorthand. The proof proceeds by an integration argument, where we show that $\log \varphi(\lambda) \le \lambda E[X] + \lambda^2 \sigma^2/2$. First, note that $\varphi'(\lambda) = E[X e^{\lambda X}]$, so that inequality (2.0.2) is equivalent to
$$\lambda \varphi'(\lambda) - \varphi(\lambda) \log \varphi(\lambda) = H(e^{\lambda X}) \le \frac{\lambda^2 \sigma^2}{2} \varphi(\lambda),$$
and dividing both sides by $\lambda^2 \varphi(\lambda)$ yields the equivalent statement
$$\frac{\varphi'(\lambda)}{\lambda \varphi(\lambda)} - \frac{\log \varphi(\lambda)}{\lambda^2} \le \frac{\sigma^2}{2}.$$
But by inspection, we have
$$\frac{d}{d\lambda}\Big[\frac{\log \varphi(\lambda)}{\lambda}\Big] = \frac{\varphi'(\lambda)}{\lambda \varphi(\lambda)} - \frac{\log \varphi(\lambda)}{\lambda^2}.$$
Moreover, we have that
$$\lim_{\lambda \to 0} \frac{\log \varphi(\lambda)}{\lambda} = \lim_{\lambda \to 0} \frac{\log \varphi(\lambda) - \log \varphi(0)}{\lambda} = \frac{\varphi'(0)}{\varphi(0)} = E[X].$$
Integrating from $0$ to any $\lambda_0 \ge 0$, we thus obtain
$$\frac{\log \varphi(\lambda_0)}{\lambda_0} - E[X] = \int_0^{\lambda_0} \frac{d}{d\lambda}\Big[\frac{\log \varphi(\lambda)}{\lambda}\Big] \, d\lambda \le \int_0^{\lambda_0} \frac{\sigma^2}{2} \, d\lambda = \frac{\sigma^2 \lambda_0}{2}.$$
Multiplying each side by $\lambda_0$ gives
$$\log E[e^{\lambda_0 (X - E[X])}] = \log E[e^{\lambda_0 X}] - \lambda_0 E[X] \le \frac{\sigma^2 \lambda_0^2}{2},$$
as desired.
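The Herbst condition (2.0.2) can also be checked numerically. The sketch below (not part of the original notes; the sample size and values of $\lambda$ are arbitrary) estimates $H(e^{\lambda X}) = \lambda E[X e^{\lambda X}] - \varphi(\lambda) \log \varphi(\lambda)$ by Monte Carlo for a standard Gaussian, for which the condition holds with equality:

```python
# Sketch: Monte Carlo check that H(e^{lam X}) = (lam^2 sigma^2 / 2) phi(lam)
# for X ~ N(0, sigma^2), i.e. the Herbst condition holds with equality.
import numpy as np

rng = np.random.default_rng(1)
sigma = 1.0
x = rng.normal(0.0, sigma, size=2_000_000)
for lam in [0.25, 0.5, 1.0]:
    e = np.exp(lam * x)
    phi = e.mean()                                   # phi_X(lam) = E[e^{lam X}]
    ent = (lam * x * e).mean() - phi * np.log(phi)   # H(e^{lam X})
    rhs = lam**2 * sigma**2 / 2 * phi
    print(f"lam={lam}: H estimate={ent:.4f}, (lam^2 sigma^2 / 2) phi={rhs:.4f}")
```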
1. Note: this argument can be extended to sub-exponential random variables.

III. Information-theoretic inequalities

a. Idea: let us relate divergences to entropy quantities. For this part, let
$$X_{\setminus i} = (X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n)$$
be the collection of all variables except $X_i$.

b. Intermediate step: Han's inequality (a numerical check appears below the proof).

Proposition 2.4. Let $X_1, \ldots, X_n$ be discrete random variables. Then
$$H(X_1^n) \le \frac{1}{n-1} \sum_{i=1}^n H(X_{\setminus i}).$$

Proof: The proof is a consequence of the chain rule for entropy and the fact that conditioning reduces entropy. We have
$$H(X_1^n) = H(X_i \mid X_{\setminus i}) + H(X_{\setminus i}) \le H(X_i \mid X_1^{i-1}) + H(X_{\setminus i}),$$
since $X_1^{i-1}$ is contained in $X_{\setminus i}$. Writing this inequality for each $i = 1, \ldots, n$ and summing, we obtain
$$n H(X_1^n) \le \sum_{i=1}^n H(X_{\setminus i}) + \sum_{i=1}^n H(X_i \mid X_1^{i-1}) = \sum_{i=1}^n H(X_{\setminus i}) + H(X_1^n),$$
where the equality is the chain rule, and subtracting $H(X_1^n)$ from both sides gives the result.
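Proposition 2.4 is easy to sanity-check on a small alphabet. The sketch below (not part of the original notes; the alphabet $\{0,1\}^3$ and the random seed are arbitrary choices) draws a random joint pmf and compares the two sides of the inequality:

```python
# Sketch: check Han's inequality H(X_1^n) <= (1/(n-1)) sum_i H(X_{minus i})
# for a random joint pmf on {0,1}^3.
import numpy as np

rng = np.random.default_rng(2)
n = 3
q = rng.random((2,) * n)
q /= q.sum()  # a random joint pmf

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

h_joint = entropy(q)
h_minus = [entropy(q.sum(axis=i)) for i in range(n)]  # H(X_{minus i}): sum out X_i
print("H(X_1^n)                =", round(h_joint, 4))
print("(1/(n-1)) sum_i H(X_-i) =", round(sum(h_minus) / (n - 1), 4))
```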
c. Intermediate step: a divergence version of Han's inequality. Let $Q$ be an arbitrary distribution over $\mathcal{X}^n$ and let $P = P_1 \times \cdots \times P_n$ be a product distribution. For $A \subset \mathcal{X}^{n-1}$, define the marginal distributions
$$Q^{(i)}(A) := Q(X_{\setminus i} \in A) \quad \text{and} \quad P^{(i)}(A) := P(X_{\setminus i} \in A).$$

Proposition 2.5. With the above definitions,
$$D_{\mathrm{kl}}(Q \| P) \le \sum_{i=1}^n \big[ D_{\mathrm{kl}}(Q \| P) - D_{\mathrm{kl}}(Q^{(i)} \| P^{(i)}) \big].$$

Proof: We have seen earlier in the notes (recall the definition of the KL divergence as a supremum over all quantizers and the surrounding discussion) that it is no loss of generality to assume that $\mathcal{X}$ is discrete. Thus, writing the probability mass functions
$$q^{(i)}(x_{\setminus i}) = \sum_{x} q(x_1, \ldots, x_{i-1}, x, x_{i+1}, \ldots, x_n) \quad \text{and} \quad p^{(i)}(x_{\setminus i}) = \prod_{j \ne i} p_j(x_j),$$
we have that Han's inequality (Proposition 2.4) is equivalent to
$$(n-1) \sum_{x_1^n} q(x_1^n) \log q(x_1^n) \ge \sum_{i=1}^n \sum_{x_{\setminus i}} q^{(i)}(x_{\setminus i}) \log q^{(i)}(x_{\setminus i}).$$
Now, by subtracting $(n-1) \sum_{x_1^n} q(x_1^n) \log p(x_1^n)$ from both sides of the preceding display, we obtain
$$(n-1) D_{\mathrm{kl}}(Q \| P) = (n-1) \sum_{x_1^n} q(x_1^n) \log q(x_1^n) - (n-1) \sum_{x_1^n} q(x_1^n) \log p(x_1^n) \ge \sum_{i=1}^n \sum_{x_{\setminus i}} q^{(i)}(x_{\setminus i}) \log q^{(i)}(x_{\setminus i}) - (n-1) \sum_{x_1^n} q(x_1^n) \log p(x_1^n).$$
We expand the final term. Indeed, by the product nature of the distribution $p$, we have $\sum_{j \ne i} \log p_j(x_j) = \log p^{(i)}(x_{\setminus i})$, so
$$(n-1) \sum_{x_1^n} q(x_1^n) \log p(x_1^n) = \sum_{i=1}^n \sum_{x_1^n} q(x_1^n) \sum_{j \ne i} \log p_j(x_j) = \sum_{i=1}^n \sum_{x_{\setminus i}} q^{(i)}(x_{\setminus i}) \log p^{(i)}(x_{\setminus i}).$$
Noting that
$$\sum_{x_{\setminus i}} q^{(i)}(x_{\setminus i}) \log q^{(i)}(x_{\setminus i}) - \sum_{x_{\setminus i}} q^{(i)}(x_{\setminus i}) \log p^{(i)}(x_{\setminus i}) = D_{\mathrm{kl}}(Q^{(i)} \| P^{(i)})$$
and rearranging gives the desired result.

d. Tilting a distribution.

1. This is a frequent idea (large deviations, statistics, reliability, heavy-tailed data).

2. Intuition: Let $Y = f(X_1, \ldots, X_n) \ge 0$. If $Y$ is concentrated around its mean (under the distribution $P$), we would expect that $f \approx \mathrm{const}$ under $P$, that is, $f(x_1^n) p(x_1^n) \approx c \, p(x_1^n)$, and thus the tilted distribution
$$q(x_1^n) := \frac{f(x_1^n) p(x_1^n)}{E_P[f(X_1^n)]}$$
should have $D_{\mathrm{kl}}(Q \| P)$ small (a numerical check of this construction appears below).
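Combining the tilting construction with Proposition 2.5, the sketch below (not part of the original notes; the product distribution, the tilt function $f$, and the seed are all arbitrary illustrative choices) builds a product $P$ on $\{0,1\}^3$, tilts it to obtain $Q$, and checks the divergence inequality numerically:

```python
# Sketch: numerically verify D(Q||P) <= sum_i [D(Q||P) - D(Q^(i)||P^(i))]
# (Proposition 2.5) for a tilted distribution Q on {0,1}^3.
import numpy as np

rng = np.random.default_rng(3)
marg = [rng.dirichlet([1.0, 1.0]) for _ in range(3)]
p = marg[0][:, None, None] * marg[1][None, :, None] * marg[2][None, None, :]
f = rng.random((2, 2, 2))      # an arbitrary non-negative tilt function
q = f * p / (f * p).sum()      # tilted distribution q = f p / E_P[f]

def kl(a, b):
    mask = a > 0
    return (a[mask] * np.log(a[mask] / b[mask])).sum()

d = kl(q, p)
deficits = sum(d - kl(q.sum(axis=i), p.sum(axis=i)) for i in range(3))
print(f"D(Q||P) = {d:.4f} <= sum of deficits = {deficits:.4f}")
```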
6 Stanford Statistics 3/Electrical Engineering 377 Theorem.6. Let X,...,X n be independent random variables and Y = f(x n, where f is a non-negative function. Define H(Y X \i = E[Y logy X \i ]. Then [ ] H(Y E H(Y X \i. (.0.3 Proof It is clear that if inequality (.0.3 holds for Y, it also holds identically for cy, so we assume without loss of generality that E P [Y] =. Thus, by defining the tilted distribution q(x n = f(xn p(xn, we have Q(Xn =, and moreover, we have D kl (Q P = q(x n log q(xn p(x n dxn = f(x n p(x n logf(x n dx n = H(Y, and similarly, if φ(t = tlogt, then D kl (Q P D kl ( Q (i P (i = E[φ(Y] X n ( f(x i,x,x n i+p i (xdx log p(i (x \i f(x i,x,x n i+ p i(xdx p (i p (i (x \i dx \i (x \i = E[φ(Y] E[Y x \i ]loge[y x \i ]p (i (x \i dx \i X n = E[φ(Y] E[φ(E[Y X \i ]]. Noting by the tower property of expectations that E[φ(Y] E[φ(E[Y X \i ]] = E[E[φ(Y X \i ] E[φ(E[Y X \i ]]] = E[H(Y X \i ] and using Han s inequality for relative entropies (Proposition.4 gives H(Y = D kl (Q P [D kl (Q P D kl ( Q (i P (i] = E[H(Y X \i ], which is our desired result. 3. Some intuition: if we can show that individually H(Y X \i is not too big, then the Herbst argument (Proposition.3 coupled with the Hoeffding-type bound will give strong sub-gaussian tails. IV. Convex functions and concentration 5