COMP2610/COMP6261 Information Theory

Lecture 9: Probabilistic Inequalities

Mark Reid and Aditya Menon
Research School of Computer Science, The Australian National University
August 19th, 2014

Last time
- Mutual information chain rule
- Jensen's inequality
- Information cannot hurt
- Data processing inequality

Review: Data-Processing Inequality

Theorem: If X → Y → Z forms a Markov chain, then I(X; Y) ≥ I(X; Z).

- X is the state of the world, Y is the data gathered, and Z is the processed data
- No clever manipulation of the data can improve the inferences that can be made from the data
- No processing of Y, deterministic or random, can increase the information that Y contains about X

This time
- Markov's inequality
- Chebyshev's inequality
- Law of large numbers

Outline
1. Properties of expectation and variance
2. Markov's inequality
3. Chebyshev's inequality
4. Law of large numbers
5. Wrapping up

Expectation and Variance

Let X be a random variable over an alphabet 𝒳, with probability distribution p.

- Expected value: E[X] = ∑_{x ∈ 𝒳} x p(x)
- Variance: V[X] = E[(X − E[X])²] = E[X²] − (E[X])²
- Standard deviation: √V[X]
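The following minimal Python sketch (my addition, not from the lecture) evaluates these definitions for a fair six-sided die, an arbitrary example chosen just to make the formulas concrete:

```python
import numpy as np

# Fair six-sided die: outcomes 1..6, each with probability 1/6.
outcomes = np.arange(1, 7)
p = np.full(6, 1 / 6)

mean = np.sum(outcomes * p)              # E[X] = sum_x x p(x)
second_moment = np.sum(outcomes**2 * p)  # E[X^2]
variance = second_moment - mean**2       # V[X] = E[X^2] - (E[X])^2
std = np.sqrt(variance)                  # standard deviation

print(mean, variance, std)  # 3.5  2.9166...  1.7078...
```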

Properties of Expectation

A key property of expectations is linearity:

E[∑_{i=1}^{n} X_i] = ∑_{i=1}^{n} E[X_i].

This holds even if the variables are dependent!

We also have, for any a ∈ ℝ, E[aX] = a E[X].

Properties of Variance

Variance is additive for independent random variables:

V[∑_{i=1}^{n} X_i] = ∑_{i=1}^{n} V[X_i].

This does not hold in general if the variables are dependent.

We also have, for any a ∈ ℝ, V[aX] = a² V[X].
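As a quick check (my own sketch, not from the slides), the snippet below shows linearity of expectation holding for strongly dependent variables while additivity of variance fails:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 2 * x + 1  # Y is completely determined by X, hence strongly dependent

# Linearity of expectation holds regardless of dependence:
print(np.mean(x + y), np.mean(x) + np.mean(y))  # approximately equal

# Additivity of variance fails here: V[X + Y] = V[3X + 1] = 9 V[X],
# whereas V[X] + V[Y] = V[X] + 4 V[X] = 5 V[X].
print(np.var(x + y), np.var(x) + np.var(y))  # roughly 9 vs 5
```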

Markov's Inequality: Motivation

- 1000 school students sit an examination
- The busy principal is only told that the average score is 40 out of 100
- The principal wants to estimate the maximum number of students who could have scored more than 80
- Would it make sense to ask about the minimum number of students?

Markov's Inequality: Motivation

Let x be the number of students who score more than 80. We know

40 × 1000 ≥ 80x + S,

where S is the total score of the remaining students. Exam scores are nonnegative, so certainly S ≥ 0. Thus 80x ≤ 40 × 1000, i.e. x ≤ 500.

Can we formalise this more generally?
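In code, the principal's arithmetic is a one-liner (a trivial check I've added, using the numbers above):

```python
total = 40 * 1000    # total score across all 1000 students
x_max = total // 80  # 80 * x <= total  =>  x <= total / 80
print(x_max)         # 500
```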

Markov's Inequality

Theorem: Let X be a nonnegative random variable. Then, for any λ > 0,

p(X ≥ λ) ≤ E[X] / λ.

This bounds the probability of observing a large outcome.

Markov's Inequality: Alternate Statement

Corollary: Let X be a nonnegative random variable. Then, for any λ > 0,

p(X ≥ λ E[X]) ≤ 1/λ.

Observations of a nonnegative random variable are unlikely to be much larger than its expected value.
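Here is a small simulation sketch (my addition; the exponential distribution is an arbitrary nonnegative example) comparing the empirical tail probability with the Markov bound:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=1_000_000)  # nonnegative, E[X] = 1

for lam in [1, 2, 4, 8]:
    empirical = np.mean(x >= lam)  # p(X >= lambda), estimated by simulation
    bound = np.mean(x) / lam       # Markov bound E[X] / lambda
    print(f"lambda={lam}: p ~ {empirical:.4f} <= bound {bound:.4f}")
```

The bound always holds but can be very loose: here p(X ≥ 8) ≈ 0.0003 against a bound of 0.125.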

Markov's Inequality: Proof

E[X] = ∑_{x ∈ 𝒳} x p(x)
     = ∑_{x < λ} x p(x) + ∑_{x ≥ λ} x p(x)
     ≥ ∑_{x ≥ λ} x p(x)     (nonnegativity of the random variable)
     ≥ ∑_{x ≥ λ} λ p(x)
     = λ p(X ≥ λ).

Markov's Inequality: Illustration

[Four slides of figures illustrating the bound, from http://justindomke.wordpress.com/]

Chebyshev's Inequality: Motivation

- Markov's inequality only uses the mean of the distribution
- What about the spread of the distribution (variance)?

[Figure omitted]

Chebyshev's Inequality

Theorem: Let X be a random variable with E[X] < ∞. Then, for any λ > 0,

p(|X − E[X]| ≥ λ) ≤ V[X] / λ².

- Bounds the probability of observing an unexpected outcome
- Does not require nonnegativity
- Two-sided bound

Chebyshev's Inequality: Alternate Statement

Corollary: Let X be a random variable with E[X] < ∞. Then, for any λ > 0,

p(|X − E[X]| ≥ λ √V[X]) ≤ 1/λ².

Observations are unlikely to occur several standard deviations away from the mean.
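A quick empirical check (my sketch, using a standard normal as an arbitrary example) of the corollary's k-standard-deviation form:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(1_000_000)  # E[X] = 0, V[X] = 1

for k in [1, 2, 3]:
    empirical = np.mean(np.abs(x) >= k)  # p(|X - E[X]| >= k * std)
    print(f"k={k}: p ~ {empirical:.4f} <= 1/k^2 = {1 / k**2:.4f}")
```

For the normal distribution the true tail probabilities (≈ 0.32, 0.046, 0.003) sit far below the Chebyshev bounds (1, 0.25, 0.11); the bound is distribution-free, so it is often loose.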

Chebyshev's Inequality: Proof

Define Y = (X − E[X])². Then, by Markov's inequality, for any ν > 0,

p(Y ≥ ν) ≤ E[Y] / ν.

But E[Y] = V[X]. Also,

Y ≥ ν ⟺ |X − E[X]| ≥ √ν.

Thus, setting λ = √ν,

p(|X − E[X]| ≥ λ) ≤ V[X] / λ².

Chebyshev's Inequality: Illustration

For a binomial with N trials and success probability θ, we have, e.g.,

p(|X − Nθ| ≥ √(2Nθ(1 − θ))) ≤ 1/2.

Chebyshev's Inequality: Example

- Suppose we have a coin with bias θ, i.e. p(X = 1) = θ
- Say we flip the coin n times and observe x₁, …, xₙ ∈ {0, 1}
- We use the maximum-likelihood estimator of θ:

θ̂ₙ = (x₁ + … + xₙ) / n

How large should n be so that a 5% error occurs with at most 1% probability, i.e.

p(|θ̂ₙ − θ| ≥ 0.05) ≤ 0.01?

Chebyshev's Inequality: Example

Observe that

E[θ̂ₙ] = (1/n) ∑_{i=1}^{n} E[xᵢ] = θ
V[θ̂ₙ] = (1/n²) ∑_{i=1}^{n} V[xᵢ] = θ(1 − θ) / n.

Thus, applying Chebyshev's inequality to θ̂ₙ,

p(|θ̂ₙ − θ| ≥ 0.05) ≤ θ(1 − θ) / (n (0.05)²).

We are guaranteed this is at most 0.01 if

θ(1 − θ) / ((0.05)² (0.01)) ≤ n.

When θ = 0.5, this requires n ≥ 10,000!
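The sample-size calculation, plus a simulation to compare against the bound (my sketch, reusing the numbers above):

```python
import numpy as np

theta, eps, delta = 0.5, 0.05, 0.01

# Chebyshev sample-size bound: n >= theta * (1 - theta) / (eps^2 * delta)
n = int(np.ceil(theta * (1 - theta) / (eps**2 * delta)))
print(n)  # 10000

# Simulate 2000 experiments of n flips each, and measure how often
# the maximum-likelihood estimate misses theta by at least eps.
rng = np.random.default_rng(3)
flips = rng.random((2000, n)) < theta
theta_hat = flips.mean(axis=1)
print(np.mean(np.abs(theta_hat - theta) >= eps))  # far below 0.01
```

The observed error rate is essentially zero: Chebyshev guarantees at most 1%, but the true deviation probability at n = 10,000 is vastly smaller, since the bound must hold for every distribution with this variance.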

Independent and Identically Distributed

Let X₁, …, Xₙ be random variables such that:
- Each Xᵢ is independent of each Xⱼ (i ≠ j)
- Each Xᵢ has the same distribution as each Xⱼ

Then we say that X₁, …, Xₙ are independent and identically distributed (iid).

Example: for n independent flips of an unbiased coin, X₁, …, Xₙ are iid Bernoulli(1/2).

Law of Large Numbers

Theorem: Let X₁, …, Xₙ be a sequence of iid random variables with E[Xᵢ] = µ and V[Xᵢ] < ∞. Define

X̄ₙ = (X₁ + … + Xₙ) / n.

Then, for any ε > 0,

lim_{n→∞} p(|X̄ₙ − µ| > ε) = 0.

Given enough trials, the empirical average (e.g. the success frequency for coin flips) will be close to the expected value.

Law of Large Numbers: Proof

Since the Xᵢ are identically distributed, E[X̄ₙ] = µ. Since the Xᵢ are independent, writing σ² = V[Xᵢ],

V[X̄ₙ] = V[(X₁ + … + Xₙ) / n] = V[X₁ + … + Xₙ] / n² = nσ² / n² = σ² / n.

Law of Large Numbers: Proof (continued)

Applying Chebyshev's inequality to X̄ₙ,

p(|X̄ₙ − µ| ≥ ε) ≤ V[X̄ₙ] / ε² = σ² / (nε²).

As n → ∞, the right-hand side → 0. Thus p(|X̄ₙ − µ| < ε) → 1.
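A short simulation sketch (my addition) that reproduces the kind of running-average plot shown on the next two slides:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000
flips = rng.random(n) < 0.5  # Bernoulli(1/2) trials
running_mean = np.cumsum(flips) / np.arange(1, n + 1)

# The running empirical mean settles near mu = 0.5 as n grows.
for k in [10, 100, 1_000, 10_000, 50_000]:
    print(k, running_mean[k - 1])
```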

Law of Large Numbers: Illustration

N = 1000 trials with a Bernoulli random variable with parameter 1/2.

[Figure: running empirical mean over the 1000 trials]

Law of Large Numbers: Illustration

N = 50000 trials with a Bernoulli random variable with parameter 1/2.

[Figure: running empirical mean over the 50000 trials]

Summary & Conclusions
- Markov's inequality
- Chebyshev's inequality
- Law of large numbers

Next time
- Ensembles and sequences
- Typical sets
- Asymptotic Equipartition Property (AEP)