Chain Rules for Entropy

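The entropy of a collection of random variables is the sum of conditional entropies.

Theorem: Let $X_1, X_2, \ldots, X_n$ be random variables with joint probability mass function $p(x_1, x_2, \ldots, x_n)$. Then

$$H(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} H(X_i \mid X_{i-1}, \ldots, X_1).$$

The proof is obtained by repeated application of the two-variable expansion rule for entropies, $H(X, Y) = H(X) + H(Y \mid X)$.

Conditional Mutual Information

We define the conditional mutual information of random variables $X$ and $Y$ given $Z$ as

$$I(X; Y \mid Z) = H(X \mid Z) - H(X \mid Y, Z) = E_{p(x,y,z)} \log \frac{p(X, Y \mid Z)}{p(X \mid Z)\, p(Y \mid Z)}.$$

Mutual information also satisfies a chain rule:

$$I(X_1, X_2, \ldots, X_n; Y) = \sum_{i=1}^{n} I(X_i; Y \mid X_{i-1}, \ldots, X_1).$$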
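The chain rule for entropy can be checked numerically in the two-variable case, $H(X, Y) = H(X) + H(Y \mid X)$. The following is a minimal sketch; the joint distribution `joint` is a hypothetical example chosen for illustration and does not come from the slides.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability terms are skipped."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint pmf p(x, y), used only for illustration.
joint = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}

# Marginal p(x), obtained by summing the joint pmf over y.
p_x = {}
for (x, _), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p

# H(Y|X) = sum over x of p(x) * H(Y | X = x).
h_y_given_x = 0.0
for x, px in p_x.items():
    conditional = [p / px for (xx, _), p in joint.items() if xx == x]
    h_y_given_x += px * entropy(conditional)

h_joint = entropy(joint.values())
h_x = entropy(p_x.values())

# The chain rule says this gap is zero (up to rounding error).
chain_rule_gap = h_joint - (h_x + h_y_given_x)
print(h_joint, h_x + h_y_given_x)
```

The same loop extends to $n$ variables by conditioning each $X_i$ on all earlier variables.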

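Convex Function

We recall the definition of a convex function. A function $f$ is said to be convex over an interval $(a, b)$ if for every $x_1, x_2 \in (a, b)$ and $0 \le \lambda \le 1$,

$$f(\lambda x_1 + (1 - \lambda) x_2) \le \lambda f(x_1) + (1 - \lambda) f(x_2).$$

A function $f$ is said to be strictly convex if equality holds only if $\lambda = 0$ or $\lambda = 1$.

Theorem: If the function $f$ has a second derivative which is non-negative (positive) everywhere, then the function is convex (strictly convex).

Jensen's Inequality

If $f$ is a convex function and $X$ is a random variable, then

$$E f(X) \ge f(E X).$$

Moreover, if $f$ is strictly convex, then equality implies that $X = E X$ with probability 1, i.e. $X$ is a constant.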
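Jensen's inequality can be illustrated with a small discrete random variable. The sketch below assumes the strictly convex function $f(x) = x^2$; the support `values` and the probabilities `probs` are hypothetical choices, not taken from the slides.

```python
def expectation(values, probs, f=lambda v: v):
    """E[f(X)] for a discrete random variable with the given support and pmf."""
    return sum(p * f(v) for v, p in zip(values, probs))

def f(v):
    """A strictly convex function (its second derivative is 2 > 0)."""
    return v * v

# Hypothetical non-constant random variable.
values = [-1.0, 0.0, 2.0]
probs = [0.25, 0.5, 0.25]

lhs = expectation(values, probs, f)   # E f(X)
rhs = f(expectation(values, probs))   # f(E X)

# A constant random variable, for which equality must hold.
const_lhs = expectation([3.0], [1.0], f)
const_rhs = f(expectation([3.0], [1.0]))

print(lhs, rhs, const_lhs, const_rhs)
```

For the non-constant variable the inequality is strict, while the constant variable attains equality, matching the strict-convexity statement.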

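Information Inequality

Theorem: Let $p(x)$, $q(x)$, $x \in \chi$, be two probability mass functions. Then

$$D(p \,\|\, q) \ge 0,$$

with equality if and only if $p(x) = q(x)$ for all $x$.

Corollary (non-negativity of mutual information): For any two random variables $X$, $Y$,

$$I(X; Y) \ge 0,$$

with equality if and only if $X$ and $Y$ are independent.

Bounded Entropy

We show that the uniform distribution over the range $\chi$ is the maximum entropy distribution over this range. It follows that any random variable with this range has an entropy no greater than $\log |\chi|$.

Theorem: $H(X) \le \log |\chi|$, where $|\chi|$ denotes the number of elements in the range of $X$, with equality if and only if $X$ has a uniform distribution over $\chi$.

Proof: Let $u(x) = 1 / |\chi|$ be the uniform probability mass function over $\chi$, and let $p(x)$ be the probability mass function for $X$. Then

$$D(p \,\|\, u) = \sum_{x} p(x) \log \frac{p(x)}{u(x)} = \log |\chi| - H(X).$$

Hence, by the non-negativity of the relative entropy,

$$0 \le D(p \,\|\, u) = \log |\chi| - H(X).$$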
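The identity used in the bounded-entropy proof, $D(p \,\|\, u) = \log |\chi| - H(X)$, can be verified numerically. This is a minimal sketch: the pmf `p` is a hypothetical example, and `kl_divergence` assumes $q(x) > 0$ wherever $p(x) > 0$.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability terms are skipped."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def kl_divergence(p, q):
    """Relative entropy D(p || q) in bits; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical pmf on a four-element range, and the uniform pmf on the same range.
p = [0.5, 0.25, 0.125, 0.125]
u = [1.0 / len(p)] * len(p)

d_pu = kl_divergence(p, u)
gap = math.log2(len(p)) - entropy(p)   # log|chi| - H(X)

print(d_pu, gap)
```

Both quantities agree and are non-negative, so $H(X) \le \log |\chi|$ for this example; replacing `p` by `u` drives both to zero.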

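Conditioning Reduces Entropy

Theorem: $H(X \mid Y) \le H(X)$, with equality if and only if $X$ and $Y$ are independent.

Proof: $0 \le I(X; Y) = H(X) - H(X \mid Y)$.

Intuitively, the theorem says that knowing another random variable $Y$ can only reduce the uncertainty in $X$. Note that this is true only on the average. Specifically, $H(X \mid Y = y)$ may be greater than, less than, or equal to $H(X)$, but on the average

$$H(X \mid Y) = \sum_{y} p(y)\, H(X \mid Y = y) \le H(X).$$

Example: Let $(X, Y)$ have the following joint distribution $p(x, y)$:

          X = 1   X = 2
Y = 1     0       3/4
Y = 2     1/8     1/8

Then $H(X) = H(1/8, 7/8) = 0.544$ bits, $H(X \mid Y = 1) = 0$ bits and $H(X \mid Y = 2) = 1$ bit. We calculate $H(X \mid Y) = \frac{3}{4} H(X \mid Y = 1) + \frac{1}{4} H(X \mid Y = 2) = 0.25$ bits. Thus the uncertainty in $X$ is increased if $Y = 2$ is observed and decreased if $Y = 1$ is observed, but uncertainty decreases on the average.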
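The numbers in this example can be reproduced directly. The sketch below assumes the joint distribution is $p(1,1) = 0$, $p(2,1) = 3/4$, $p(1,2) = p(2,2) = 1/8$, with keys written as `(x, y)`; this layout is a reading of the example's table and should be treated as an assumption.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability terms are skipped."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

# joint[(x, y)] = p(x, y); the table layout is an assumption (see lead-in).
joint = {(1, 1): 0.0, (2, 1): 0.75, (1, 2): 0.125, (2, 2): 0.125}

xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})

# Marginals p(x) and p(y).
p_x = {x: sum(joint[(x, y)] for y in ys) for x in xs}
p_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}

h_x = entropy(p_x.values())

# H(X | Y = y) for each y, then the average H(X | Y).
h_x_given = {y: entropy([joint[(x, y)] / p_y[y] for x in xs]) for y in ys}
h_x_given_y = sum(p_y[y] * h_x_given[y] for y in ys)

print(round(h_x, 3), h_x_given, h_x_given_y)
```

Observing $Y = 2$ raises the uncertainty about $X$ above $H(X)$, yet the average $H(X \mid Y)$ stays below $H(X)$.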

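Independence Bound on Entropy

Let $X_1, X_2, \ldots, X_n$ be random variables with joint probability mass function $p(x_1, x_2, \ldots, x_n)$. Then

$$H(X_1, X_2, \ldots, X_n) \le \sum_{i=1}^{n} H(X_i),$$

with equality if and only if the $X_i$ are independent.

Proof: By the chain rule for entropies,

$$H(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} H(X_i \mid X_{i-1}, \ldots, X_1) \le \sum_{i=1}^{n} H(X_i),$$

where the inequality follows directly from the previous theorem. We have equality if and only if $X_i$ is independent of $X_{i-1}, \ldots, X_1$ for all $i$, i.e. if and only if the $X_i$'s are independent.

Fano's Inequality

Suppose that we know a random variable $Y$ and we wish to guess the value of a correlated random variable $X$. Fano's inequality relates the probability of error in guessing the random variable $X$ to its conditional entropy $H(X \mid Y)$. It will be crucial in proving the converse to Shannon's channel capacity theorem.

We know that the conditional entropy of a random variable $X$ given another random variable $Y$ is zero if and only if $X$ is a function of $Y$. Hence we can estimate $X$ from $Y$ with zero probability of error if and only if $H(X \mid Y) = 0$. Extending this argument, we expect to be able to estimate $X$ with a low probability of error only if the conditional entropy $H(X \mid Y)$ is small. Fano's inequality quantifies this idea.

Suppose that we wish to estimate a random variable $X$ with a distribution $p(x)$. We observe a random variable $Y$ that is related to $X$ by the conditional distribution $p(y \mid x)$.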
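The claim that $H(X \mid Y) = 0$ exactly when $X$ is a function of $Y$ can be illustrated by comparing two cases. This is a minimal sketch; both joint distributions are hypothetical examples (a deterministic map $X = Y \bmod 2$ and a noisy binary relation), not taken from the slides.

```python
import math
from collections import defaultdict

def conditional_entropy(joint):
    """H(X|Y) in bits for a joint pmf given as {(x, y): probability}."""
    p_y = defaultdict(float)
    for (_, y), p in joint.items():
        p_y[y] += p
    # H(X|Y) = -sum over (x, y) of p(x, y) * log2 p(x | y).
    return sum(-p * math.log2(p / p_y[y]) for (_, y), p in joint.items() if p > 0)

# Case 1: X = Y mod 2 with Y uniform on {0, 1, 2, 3}, so X is a function of Y.
deterministic = {(y % 2, y): 0.25 for y in range(4)}

# Case 2: X agrees with a uniform binary Y with probability 0.9.
noisy = {(0, 0): 0.45, (1, 0): 0.05, (0, 1): 0.05, (1, 1): 0.45}

h_det = conditional_entropy(deterministic)
h_noisy = conditional_entropy(noisy)

print(h_det, h_noisy)
```

In the first case $Y$ determines $X$ and the conditional entropy vanishes; in the second some uncertainty about $X$ remains after observing $Y$, so no estimator can be error-free.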

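From $Y$ we calculate a function $g(Y) = \hat{X}$, where $\hat{X}$ is an estimate of $X$ and takes on values in $\hat{\chi}$. We will not restrict the alphabet $\hat{\chi}$ to be equal to $\chi$, and we will also allow the function $g(Y)$ to be random. We wish to bound the probability that $\hat{X} \ne X$. We observe that $X \to Y \to \hat{X}$ forms a Markov chain. Define the probability of error:

$$P_e = \Pr\{\hat{X} \ne X\}.$$

Theorem:

$$H(P_e) + P_e \log |\chi| \ge H(X \mid Y).$$

The inequality can be weakened to

$$1 + P_e \log |\chi| \ge H(X \mid Y), \quad \text{or} \quad P_e \ge \frac{H(X \mid Y) - 1}{\log |\chi|}.$$

Remark: Note that $P_e = 0$ implies that $H(X \mid Y) = 0$, as intuition suggests.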
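Fano's inequality can be checked on a small example. The sketch below assumes the form $H(P_e) + P_e \log |\chi| \ge H(X \mid Y)$ and its weakened version $P_e \ge (H(X \mid Y) - 1) / \log |\chi|$. The ternary joint distribution and the maximum a posteriori estimator `g` are hypothetical choices made for illustration.

```python
import math
from collections import defaultdict

def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Hypothetical setup: Y uniform on {0, 1, 2}; X equals Y with probability 0.8
# and takes each of the two other values with probability 0.1.
alphabet = [0, 1, 2]
joint = {(x, y): (0.8 if x == y else 0.1) / 3 for x in alphabet for y in alphabet}

p_y = defaultdict(float)
for (_, y), p in joint.items():
    p_y[y] += p

def g(y):
    """Maximum a posteriori estimate of X from Y (an illustrative estimator)."""
    return max(alphabet, key=lambda x: joint[(x, y)])

# P_e = Pr{g(Y) != X} and H(X|Y) in bits.
p_error = sum(p for (x, y), p in joint.items() if g(y) != x)
h_x_given_y = sum(-p * math.log2(p / p_y[y]) for (_, y), p in joint.items() if p > 0)

fano_lhs = binary_entropy(p_error) + p_error * math.log2(len(alphabet))
weak_lower_bound = (h_x_given_y - 1) / math.log2(len(alphabet))

print(p_error, h_x_given_y, fano_lhs, weak_lower_bound)
```

Here the weakened bound is negative and therefore uninformative, which is typical when $H(X \mid Y) < 1$ bit; the bound only becomes useful for larger conditional entropies.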