
Bounds on the expected entropy and KL-divergence of sampled multinomial distributions

Brandon C. Roy
bcroy@media.mit.edu

Original: May 18, 2011. Revised: June 6, 2011.

Abstract

Information theoretic quantities calculated from a sampled multinomial distribution deviate from the true quantities as a function of the number of samples. In this report, bounds on the expected entropy and KL-divergence of a sampled distribution are derived. These bounds can be helpful in understanding the error between the empirical quantity and the true quantity for a distribution.

1 Expected entropy lower bound

Consider a multinomial distribution $p$ with $B$ bins, and estimates of $p$ obtained by sampling. Let $\hat{p}_n$ be the distribution obtained by taking $n$ samples from $p$. What is the expected entropy of $\hat{p}_n$ as a function of $n$? We should expect that for $n = 1$ sample, all the probability mass will be in one particular bin of $\hat{p}_1$ and the entropy should be 0. As $n \to \infty$ we expect $\hat{p}_n \to p$ and $H(\hat{p}_n) \to H(p)$. This derivation explores this relationship. We want to know the expected value of the entropy of $\hat{p}_n$,

\[ E[H(\hat{p}_n)] = E\left[-\sum_{i=1}^{B} \hat{p}_{n,i} \log \hat{p}_{n,i}\right] = -\sum_{i=1}^{B} E[\hat{p}_{n,i} \log \hat{p}_{n,i}] \tag{1} \]

We now consider only the expected value term in (1).

The maximum likelihood estimate of $p_i$ from $n$ samples is $\hat{p}_{n,i} = x_i/n$ for all $i = 1, \ldots, B$, where $x_i$ is the number of samples that land in bin $i$. Thus, for a particular bin $i$ we have

\[ E[\hat{p}_{n,i} \log \hat{p}_{n,i}] = E\left[\frac{x_i}{n} \log \frac{x_i}{n}\right] = \sum_{k=0}^{n} \Pr(x_i = k) \, \frac{k}{n} \log \frac{k}{n} \tag{2} \]

Assuming the $n$ samples are i.i.d. draws from $p$, the count $x_i$ is binomially distributed with $\Pr(x_i = k) = \binom{n}{k} p_i^k (1 - p_i)^{n-k}$, and the expectation in (2) can be calculated as

\begin{align*}
\sum_{k=0}^{n} \Pr(x_i = k) \, \frac{k}{n} \log \frac{k}{n}
&= \sum_{k=0}^{n} \frac{n!}{(n-k)! \, k!} \, p_i^k (1 - p_i)^{n-k} \, \frac{k}{n} \log \frac{k}{n} \\
&= \sum_{k=1}^{n} \frac{n!}{(n-k)! \, k!} \, p_i^k (1 - p_i)^{n-k} \, \frac{k}{n} \log \frac{k}{n}
\end{align*}

Note that in this equation, $p_i$ is the true probability of bin $i$ in $p$ rather than the estimated value. Also note that the last line above results from the fact that when $k = 0$ the summand is zero. Next we simplify $k/k!$, pull an $n$ out of $n!$, and pull a $p_i$ to the front of the sum, obtaining

\[ = p_i \sum_{k=1}^{n} \frac{(n-1)!}{(n-k)! \, (k-1)!} \, p_i^{k-1} (1 - p_i)^{n-k} \log \frac{k}{n} \]
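As a quick numerical sanity check of equation (2), the following sketch (a minimal Python/NumPy example; the bin probability $p_i$ and sample size $n$ are arbitrary values chosen here, not taken from the report) compares the exact binomial sum against a Monte Carlo estimate of $E[\hat{p}_{n,i} \log \hat{p}_{n,i}]$:

```python
import numpy as np
from scipy.stats import binom

# Hypothetical example values: true bin probability and sample size.
p_i, n = 0.3, 50

# Exact value of equation (2): the k = 0 term is zero, so start the sum at k = 1.
k = np.arange(1, n + 1)
exact = np.sum(binom.pmf(k, n, p_i) * (k / n) * np.log(k / n))

# Monte Carlo estimate: draw binomial counts and average (x/n) log(x/n).
rng = np.random.default_rng(0)
x = rng.binomial(n, p_i, size=200_000)
vals = np.where(x > 0, (x / n) * np.log(np.maximum(x, 1) / n), 0.0)
print(f"exact: {exact:.6f}   monte carlo: {vals.mean():.6f}")
```

The two values should agree to a few decimal places, with the Monte Carlo estimate improving as the number of draws grows.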

Now, let $j = k - 1$, $m = n - 1$ and apply Jensen's inequality in the following derivation:

\begin{align}
&= p_i \sum_{j=0}^{m} \frac{m!}{(m-j)! \, j!} \, p_i^j (1 - p_i)^{m-j} \log \frac{j+1}{m+1} \tag{3} \\
&= p_i \sum_{j=0}^{m} \Pr(x = j) \log \frac{j+1}{m+1} \qquad \text{where } x \sim \mathrm{Binomial}(m, p_i) \notag \\
&\le p_i \log \sum_{j=0}^{m} \Pr(x = j) \, \frac{j+1}{m+1} \tag{4} \\
&= p_i \log \frac{m p_i + 1}{m+1} \tag{5} \\
&= p_i \log \frac{(n-1) p_i + 1}{n} \tag{6} \\
&= p_i \log\left(p_i + \frac{1 - p_i}{n}\right) \tag{7}
\end{align}

We obtain (4) from (3) using Jensen's inequality for the concave function $\log(\cdot)$. Equation (5) is just the expected value of the function $\frac{j+1}{m+1}$ for the binomial distribution. Substituting back $m = n - 1$ yields (6), which simplifies to (7). Putting all this back together, we have

\[ E[\hat{p}_{n,i} \log \hat{p}_{n,i}] \le p_i \log\left(p_i + \frac{1 - p_i}{n}\right) \tag{8} \]
\[ -E[\hat{p}_{n,i} \log \hat{p}_{n,i}] \ge -p_i \log\left(p_i + \frac{1 - p_i}{n}\right) \tag{9} \]

Recalling equation (1), and replacing the expectation with our lower bound, we have

\begin{align}
E[H(\hat{p}_n)] &= -\sum_{i=1}^{B} E[\hat{p}_{n,i} \log \hat{p}_{n,i}] \tag{10} \\
&\ge -\sum_{i=1}^{B} p_i \log\left(p_i + \frac{1 - p_i}{n}\right) \tag{11}
\end{align}

Note that intuitively, as $n \to \infty$, $\log(p_i + \frac{1-p_i}{n}) \to \log p_i$ in equation (11), yielding the true entropy. When $n = 1$, $\log(p_i + (1 - p_i)) = \log(1) = 0$, implying $H(\hat{p}_1) = 0$ as expected. Moreover, for each $n$, $-\log(p_i + \frac{1-p_i}{n}) < -\log p_i$ and therefore contributes a smaller factor to the total entropy, implying that for small $n$ the expected entropy is smaller.
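Per bin, the bound in (8) can be checked directly against the exact binomial sum from equation (2). The sketch below (Python; the example distribution and sample size are arbitrary, not values from the report) verifies that the exact expectation never exceeds the Jensen bound:

```python
import numpy as np
from scipy.stats import binom

# Hypothetical example distribution and sample size.
p = np.array([0.5, 0.3, 0.15, 0.05])
n = 25

k = np.arange(1, n + 1)
for p_i in p:
    # Exact E[p_hat log p_hat] for one bin, via the binomial sum in equation (2).
    exact = np.sum(binom.pmf(k, n, p_i) * (k / n) * np.log(k / n))
    # Jensen upper bound from equation (8).
    bound = p_i * np.log(p_i + (1 - p_i) / n)
    assert exact <= bound + 1e-12
    print(f"p_i={p_i:.2f}: exact={exact:.5f} <= bound={bound:.5f}")
```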

Equation (11) can also be written as

\begin{align}
-\sum_{i=1}^{B} p_i \log\left(p_i \left(1 + \frac{1 - p_i}{n p_i}\right)\right)
&= -\sum_{i=1}^{B} p_i \log p_i - \sum_{i=1}^{B} p_i \log\left(1 + \frac{1 - p_i}{n p_i}\right) \notag \\
&= H(p) - \sum_{i=1}^{B} p_i \log\left(1 + \frac{1 - p_i}{n p_i}\right) \tag{12}
\end{align}

Equation (12) is useful because it isolates the component that varies with $n$. If $c_i = \frac{1 - p_i}{p_i}$ then $x = c_i/n \le 1$ for sufficiently large $n$, and the Taylor expansion of $\log(1 + x)$ can be applied. The Taylor series of $\log(1+x)$ for $-1 < x \le 1$ is

\[ \log(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots \tag{13} \]

and so applying it for $x = c_i/n$ we have

\[ \log(1 + c_i/n) = \frac{c_i}{n} - \frac{c_i^2}{2n^2} + \frac{c_i^3}{3n^3} - \frac{c_i^4}{4n^4} + \cdots \]

Plugging this into the summation in (12) yields the first line below, and in the second line we apply the fact that $p_i c_i^k = (1 - p_i) c_i^{k-1}$ (so that, in particular, $\sum_i p_i c_i = \sum_i (1 - p_i) = B - 1$) to get

\begin{align*}
\sum_{i=1}^{B} p_i \left(\frac{c_i}{n} - \frac{c_i^2}{2n^2} + \frac{c_i^3}{3n^3} - \cdots\right)
&= \frac{B-1}{n} - \sum_{i=1}^{B} \frac{(1 - p_i) c_i}{2n^2} + \sum_{i=1}^{B} \frac{(1 - p_i) c_i^2}{3n^3} - \cdots \\
&\approx \frac{B-1}{n}
\end{align*}

with the approximation improving as $n$ increases. Also note that this preserves the bound, since including the first odd number of terms of the Taylor series (i.e., 1, 3, 5, ... terms) is always greater than the whole series. Therefore, we can write an alternative bound on the expected entropy as

\[ E[H(\hat{p}_n)] \ge H(p) - \frac{B-1}{n} \tag{14} \]

This shows that the lower bound approaches the true entropy as $1/n$, a property that will come into play later for the KL-divergence. We performed some simulations to obtain the expected entropy as a function of $n$ and compared this to the lower bound obtained using this derivation. Interestingly, this lower bound seems to be a good approximation to $H(\hat{p}_n)$, and for $n = 1$ and $n \to \infty$ it seems that equality holds. These simulations are shown in figure 1.
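A simulation along the lines of figure 1 can be reproduced with a short script. The sketch below (Python; the distribution $p$ and the number of Monte Carlo repetitions are arbitrary choices, not the report's actual settings) estimates $E[H(\hat{p}_n)]$ for several $n$ and prints it next to the bounds (11) and (14):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example distribution; the report's actual p is not specified.
p = np.array([0.4, 0.25, 0.15, 0.1, 0.05, 0.05])
B = len(p)
H_p = -np.sum(p * np.log(p))

def entropy_hat(counts):
    """Plug-in entropy of the empirical distribution, with 0 log 0 = 0."""
    q = counts / counts.sum()
    nz = q > 0
    return -np.sum(q[nz] * np.log(q[nz]))

for n in [1, 10, 100, 1000, 10000]:
    # Monte Carlo estimate of E[H(p_hat_n)] over repeated multinomial draws.
    H_mc = np.mean([entropy_hat(rng.multinomial(n, p)) for _ in range(2000)])
    lb_exact = -np.sum(p * np.log(p + (1 - p) / n))  # equation (11)
    lb_taylor = H_p - (B - 1) / n                    # equation (14)
    print(f"n={n:>6}  E[H]~{H_mc:.4f}  bound(11)={lb_exact:.4f}  bound(14)={lb_taylor:.4f}")
```

For $n = 1$ the Monte Carlo estimate and bound (11) should both be 0, and both bounds should approach $H(p)$ from below as $n$ grows (bound (14) can be negative for very small $n$, where it is vacuous).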

Figure 1: Simulations of the expected entropy for various sample sizes and the lower bound. [Two panels: entropy vs. $n$, and log entropy vs. $\log(n)$, each showing the expected entropy and the expected entropy lower bound; plot data not reproduced.]

1.1 Expected entropy upper bound

The lower bound of the expected entropy converges to the true entropy of the distribution as $n \to \infty$. This is not surprising, as $\hat{p}_n \to p$. Nevertheless, we can also find an upper bound on the expected entropy to better understand how it varies with $n$. Starting with equation (2), we have

\begin{align*}
E[\hat{p}_{n,i} \log \hat{p}_{n,i}] &= \sum_{k=0}^{n} \Pr(x_i = k) \, \frac{k}{n} \log \frac{k}{n} \\
&= \sum_{k=0}^{n} \Pr(x_i = k) \, \frac{k}{n} \log \frac{\Pr(x_i = k) \frac{k}{n}}{\Pr(x_i = k)}
\end{align*}

Let $a_k = \Pr(x_i = k) \frac{k}{n}$ and $b_k = \Pr(x_i = k)$. Then rewriting, and applying the log-sum inequality, yields

\[ = \sum_{k=0}^{n} a_k \log \frac{a_k}{b_k} \ge \left(\sum_{k=0}^{n} a_k\right) \log \frac{\sum_k a_k}{\sum_k b_k} = p_i \log p_i \]

since $\sum_k a_k = E[x_i]/n = p_i$ and $\sum_k b_k = 1$. Therefore, $E[\hat{p}_{n,i} \log \hat{p}_{n,i}] \ge p_i \log p_i$, or equivalently, $-E[\hat{p}_{n,i} \log \hat{p}_{n,i}] \le -p_i \log p_i$. Putting this bound on equation (2) back into equation (1) gives

\[ E[H(\hat{p}_n)] \le -\sum_{i=1}^{B} p_i \log p_i = H(p) \]

In other words, the expected entropy of the sampled distribution obtained after $n$ samples is upper bounded by the entropy of the true distribution. It would be nice to find a tighter upper bound on the expected entropy, namely, one that varies with $n$. One crude way to show that the upper bound increases with $n$ is to find the maximum entropy $\hat{p}_n$ for each $n$. The maximum entropy $\hat{p}_n$ would be one where each sample lands in a new bin, so that for $n \le B$ we have $H(\hat{p}_n) = -\log(1/n) = \log n$. However, this is not a very satisfying upper bound. It may be more fruitful to focus on probabilistic bounds, which may instead take the variance into account.
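The per-bin inequality $E[\hat{p}_{n,i} \log \hat{p}_{n,i}] \ge p_i \log p_i$ from the log-sum step can also be checked numerically. A minimal sketch (Python; the example values are arbitrary, not from the report):

```python
import numpy as np
from scipy.stats import binom

# Hypothetical example values.
p_i, n = 0.2, 40

k = np.arange(1, n + 1)
exact = np.sum(binom.pmf(k, n, p_i) * (k / n) * np.log(k / n))
lower = p_i * np.log(p_i)  # log-sum inequality bound

# exact >= lower, so each bin's entropy contribution is at most -p_i log p_i.
print(f"E[p_hat log p_hat] = {exact:.5f} >= p_i log p_i = {lower:.5f}")
```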

2 Expected cross entropy

The cross entropy between $q$ and $p$, here denoted as $H(q, p) = -\sum_i q_i \log p_i$, can be thought of as the cost in bits of encoding $q$ using a code for $p$. Suppose we have $\hat{q}_n$, the distribution obtained by taking $n$ samples from $q$. Then what is the expected cross entropy $E[H(\hat{q}_n, p)]$? Similar to the expected entropy calculation, we seek

\[ E[H(\hat{q}_n, p)] = E\left[-\sum_{i=1}^{B} \hat{q}_{n,i} \log p_i\right] = -\sum_{i=1}^{B} E[\hat{q}_{n,i} \log p_i] = -\sum_{i=1}^{B} E\left[\frac{x_i}{n} \log p_i\right] \]

The expected value $E[x_i]$ above is just the expected count for bin $i$ under the true probability distribution $q$. Thus, $E[x_i] = n q_i$ and

\[ -\sum_{i=1}^{B} \frac{1}{n} \log(p_i) \, E[x_i] = -\sum_{i=1}^{B} q_i \log p_i = H(q, p) \tag{15} \]

So the expected cross entropy $E[H(\hat{q}_n, p)]$ is just the true cross entropy $H(q, p)$. Note that $H(p, p) = H(p)$, and in the special case where $q = p$ we have that $E[H(\hat{p}_n, p)] = H(p)$.
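Because the estimator $\hat{q}_{n,i}$ is unbiased and $\log p_i$ is fixed, this result is exact for every $n$, which a short simulation makes easy to see. The sketch below (Python; $q$, $p$, and $n$ are arbitrary example values, not from the report) compares the Monte Carlo average of $H(\hat{q}_n, p)$ with $H(q, p)$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical example distributions and sample size.
q = np.array([0.5, 0.3, 0.2])
p = np.array([0.4, 0.4, 0.2])
n = 50

H_qp = -np.sum(q * np.log(p))  # true cross entropy H(q, p)

# Monte Carlo estimate of E[H(q_hat_n, p)]. No smoothing of q_hat is needed
# because the logarithm is only applied to the fixed distribution p.
trials = [-np.sum(rng.multinomial(n, q) / n * np.log(p)) for _ in range(50_000)]
print(f"E[H(q_hat, p)] ~ {np.mean(trials):.5f}   H(q, p) = {H_qp:.5f}")
```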

3 Expected KL-divergence upper bound

The KL-divergence between distributions $q$ and $p$ is written as $D(q \| p) = \sum_i q_i \log \frac{q_i}{p_i}$. This can be rewritten as

\begin{align}
D(q \| p) &= \sum_{i=1}^{B} q_i \log q_i - \sum_{i=1}^{B} q_i \log p_i \tag{16} \\
&= H(q, p) - H(q) \tag{17}
\end{align}

For all $q$ and $p$ of the same dimension, $D(q \| p) \ge 0$ with equality iff $q = p$. The KL-divergence can be thought of as the additional bits required to encode $q$ using a code for $p$ rather than the code for $q$. What is the expected KL-divergence of a sampled distribution $\hat{p}_n$ from the true distribution $p$? Using the results from the previous sections, we have

\begin{align}
E[D(\hat{p}_n \| p)] &= E[H(\hat{p}_n, p) - H(\hat{p}_n)] \notag \\
&= E[H(\hat{p}_n, p)] - E[H(\hat{p}_n)] \notag \\
&= H(p) - E[H(\hat{p}_n)] \notag \\
&\le H(p) + \sum_{i=1}^{B} p_i \log\left(p_i + \frac{1 - p_i}{n}\right) \tag{18}
\end{align}

If we instead write the KL-divergence bound using equation (14), we have

\[ E[D(\hat{p}_n \| p)] \le H(p) - H(p) + \frac{B-1}{n} = \frac{B-1}{n} \tag{19} \]

For comparing a sampled distribution $\hat{q}_n$ against $p$, we have

\[ E[D(\hat{q}_n \| p)] \le H(q, p) - H(q) + \frac{B-1}{n} = D(q \| p) + \frac{B-1}{n} \]

In other words, for a given number of samples $n$, we expect the sampled KL-divergence to be within a certain range of the true KL-divergence, depending on $n$. We tested this upper bound by taking $n$ samples of $p$ to obtain $\hat{p}_n$, computing $D(\hat{p}_n \| p)$, and repeating this many times for each $n$. The average of $D(\hat{p}_n \| p)$ is an estimate of the expected KL-divergence for that $n$. We then computed the upper bound using equation (18) and plotted it as well. The simulation results are shown in figure 2.
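This test is straightforward to replicate. The sketch below (Python; the distribution $p$, the set of sample sizes, and the repetition count are arbitrary stand-ins for the report's unstated settings) averages $D(\hat{p}_n \| p)$ over repeated draws and prints the bounds (18) and (19) alongside:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical example distribution.
p = np.array([0.4, 0.25, 0.15, 0.1, 0.05, 0.05])
B = len(p)

def mean_kl(n, reps=2000):
    """Average D(p_hat_n || p) over repeated samples, with 0 log 0 = 0."""
    total = 0.0
    for _ in range(reps):
        q = rng.multinomial(n, p) / n
        nz = q > 0
        total += np.sum(q[nz] * np.log(q[nz] / p[nz]))
    return total / reps

for n in [100, 1000, 10000]:
    emp = mean_kl(n)
    bound18 = -np.sum(p * np.log(p)) + np.sum(p * np.log(p + (1 - p) / n))  # eq. (18)
    bound19 = (B - 1) / n                                                   # eq. (19)
    print(f"n={n:>6}  E[D]~{emp:.5f}  bound(18)={bound18:.5f}  bound(19)={bound19:.5f}")
```

The empirical average should sit below both bounds, and all three quantities should decay roughly as $1/n$.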

Figure 2: KL-divergence for $\hat{p}_n$ sampled from $p$ for various $n$, and the upper bound on the expected value. [Two panels: $D(\hat{p}_n \| p)$ vs. the number of samples, and $\log_{10} D(\hat{p}_n \| p)$ vs. $\log_{10}$ of the number of samples, each showing the empirical value and the upper bound; plot data not reproduced.]