arxiv:cond-mat/ v3 19 Jul 2004


Bias Analysis in Entropy Estimation

arXiv:cond-mat/0403192 v3 19 Jul 2004

Thomas Schürmann
Research, Westdeutsche Genossenschafts-Zentralbank eG, Ludwig-Erhard-Allee 20, 40227 Düsseldorf, Germany
Email: thomas.schuermann@vr.wgz-bank.de

Abstract: We consider the problem of finite sample corrections for entropy estimation. New estimates of the Shannon entropy are proposed and their systematic error (the bias) is computed analytically. We find that our results cover correction formulas of current entropy estimates recently discussed in literature. The trade-off between bias reduction and the increase of the corresponding statistical error is analyzed.

PACS: 89.70.+c, 02.50.Fz, 05.45.Tp

Statistical fluctuations of small samples induce both statistical and systematic deviations of entropy estimates. In the naive (likelihood) estimator one replaces the discrete probabilities $p_i$, for $i = 1, \ldots, M$, in the Shannon entropy [1]

$$H = -\sum_{i=1}^{M} p_i \ln p_i, \tag{1}$$

by maximum likelihood estimates $\hat p_i$. More precisely, we consider samples of $N$ observations, and let $n_i$ be the frequency of realization $i$ in the ensemble. Then, with the choice $\hat p_i = n_i/N$, the naive estimate

$$\hat H = -\sum_{i=1}^{M} \hat p_i \ln \hat p_i, \tag{2}$$

leads to a systematic underestimation of the entropy $H$. There is a series of publications trying to improve the estimation error successively with suitable correction terms. One approach is to apply a Taylor expansion around the probability $p_i$ to the ln-function [2, 3, 4]. A detailed computation of the expectation value of $\hat H$ with respect to the multinomial distribution

$$\rho(n_1, \ldots, n_M; p_1, \ldots, p_M, N) = N! \prod_{i=1}^{M} \frac{p_i^{n_i}}{n_i!}, \tag{3}$$

up to the second order was given by Harris [3] and gives

$$E[\hat H] = H - \frac{M-1}{2N} + \frac{1}{12N^2}\left(1 - \sum_{i=1}^{M} \frac{1}{p_i}\right) + O(N^{-3}). \tag{4}$$

The $O(1/N)$ correction term was first obtained by Miller [2]. The term of order $1/N^2$ involves the unknown probabilities $p_i$, and cannot generally be estimated reliably. In particular, it would not be sufficient to replace them by $\hat p_i$ in this term. In order to extend the estimation beyond corrections of order $1/N$, Paninski [5] applies Bernstein approximating polynomials, which are defined as a linear combination of binomial polynomials. It can be shown, using results from approximation theory, that there exist expansion coefficients such that the maximum (over all $p_i$) systematic deviations are of the order $1/N^2$. This is better than the order $1/N$ rate offered by the correction terms mentioned above. Unfortunately, the good approximation properties of this estimator are a result of a delicate balancing of large, oscillating coefficients, and the variance of the corresponding estimator turns out to be very large [5]. Thus, to find a good estimator, one has to minimize bounds on bias and variance simultaneously. The result is a regularized least-squares problem, whose closed-form solution is well known.
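As an illustration of the estimators discussed so far, the following minimal sketch computes the naive estimate (2) and adds the first-order term $(M-1)/(2N)$ read off from (4). The function names are hypothetical, and taking $M$ as the number of non-empty classes when it is not supplied is an assumption made here, not something the text prescribes.

```python
import math


def naive_entropy(counts):
    """Plug-in estimate (2): -sum (n_i/N) ln(n_i/N), skipping empty classes."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


def miller_entropy(counts, num_states=None):
    """Naive estimate plus the first-order correction (M - 1) / (2N) from (4)."""
    total = sum(counts)
    if num_states is None:
        # Assumption: M defaults to the number of observed (non-empty) classes.
        num_states = sum(1 for n in counts if n > 0)
    return naive_entropy(counts) + (num_states - 1) / (2 * total)
```

For two equally frequent classes with five observations each, the naive estimate equals $\ln 2$ and the corrected value is larger by $1/20$.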

However, one can only hope that the solution of the regularized problem implies a good polynomial approximation of the entropy function. The latter also depends on whether the experimenter is more interested in reducing bias than variance, or vice versa. An alternative approach, where only observables appear in the correction term, was proposed by Grassberger [6]. There it was assumed that all $p_i \ll 1$, so that each $n_i$ is a random variable which should follow a Poisson distribution. To start with, we consider Rényi entropies of order $q$

$$H(q) = \frac{1}{1-q} \ln \sum_{i=1}^{M} p_i^q. \tag{5}$$

The Shannon case results from taking the limit $q \to 1$, i.e. $H = \lim_{q \to 1} H(q)$. For the estimation of $H(q)$ it seems obvious first to ask for an unbiased estimator of any term $p_i^q$ of the sum (5). In the case of integer values of $q \geq 1$ the situation is trivial because the unique unbiased estimator $\widehat{p^q}$ is (see footnote 1)

$$\widehat{p^q} = \frac{(N-q)!}{N!} \, \frac{n!}{(n-q)!}, \tag{6}$$

with $\widehat{p^q} := 0$ for $n < q$. However, to achieve $q \to 1$, it is necessary to look first for a generalization for arbitrary $q$. As shown in [6], the analytical continuation of the estimator is non-trivial since a naive replacement of the factorials in (6) by $\Gamma$-functions is biased. Indeed, unbiased estimators of $p^q$ do not exist for non-integer values of $q$. Nevertheless, in [6] an interesting estimator of $p^q$ was proposed which is at least asymptotically unbiased for large $N$, and is also a good approximation in the case of small samples. The corresponding estimator of the Shannon entropy is (see footnote 2) [6]

$$\hat H_\psi = \sum_{i=1}^{M} \frac{n_i}{N} \left( \ln N - \psi(n_i) - \frac{(-1)^{n_i}}{n_i (n_i + 1)} \right). \tag{7}$$

For the interesting case of small probabilities $p_i \ll 1$ the estimate (7) is less biased than the estimator obtained by the Miller correction. A further improvement, related to the latter approach, which is also based on the assumption of Poisson distributed frequencies, was recently proposed by Grassberger [7]. The corresponding estimator of the Shannon entropy is

$$\hat H_G = \psi(N) - \frac{1}{N} \sum_{i=1}^{M} n_i \left( \psi(n_i) + (-1)^{n_i} \int_0^1 \frac{t^{n_i - 1}}{1 + t} \, dt \right). \tag{8}$$

The correction term of the earlier estimator $\hat H_\psi$ is recovered by a series expansion of the integrand in (8) up to the second order. The higher order terms of the integrand lead to successive bias reductions compared to (7).

Footnote 1: For simplification the index $i$ will be omitted.

Footnote 2: The summation is defined for all $n_i > 0$. The digamma function $\psi(n)$ is the logarithmic derivative of the $\Gamma$-function, see e.g. [8].
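The two estimators above can be sketched in a few lines of standard-library Python. This is only an illustrative sketch under stated assumptions: the function names are hypothetical, `entropy_g` follows Eq. (8) as written in this text (with $\psi(N)$ as the leading term), and the integral $\int_0^1 t^{n-1}/(1+t)\,dt$ is evaluated with the elementary recursion $I_1 = \ln 2$, $I_{k+1} = 1/k - I_k$, which is a choice made here rather than a method taken from the paper.

```python
import math

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """psi(n) for a positive integer n: -gamma + sum_{k=1}^{n-1} 1/k."""
    return -EULER_GAMMA + sum(1.0 / k for k in range(1, n))


def alt_integral(n):
    """int_0^1 t^(n-1)/(1+t) dt via I_1 = ln 2 and I_{k+1} = 1/k - I_k."""
    value = math.log(2.0)
    for k in range(1, n):
        value = 1.0 / k - value
    return value


def entropy_psi(counts):
    """Estimator (7): ln N - (1/N) sum n_i [psi(n_i) + (-1)^n_i / (n_i (n_i + 1))]."""
    total = sum(counts)
    acc = sum(n * (digamma_int(n) + (-1) ** n / (n * (n + 1)))
              for n in counts if n > 0)
    return math.log(total) - acc / total


def entropy_g(counts):
    """Estimator (8): psi(N) - (1/N) sum n_i [psi(n_i) + (-1)^n_i I_{n_i}]."""
    total = sum(counts)
    acc = sum(n * (digamma_int(n) + (-1) ** n * alt_integral(n))
              for n in counts if n > 0)
    return digamma_int(total) - acc / total
```

Both functions skip empty classes, in line with the convention that the sums run over $n_i > 0$.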
At this point, one might ask whether further improvements in bias reduction are possible. Moreover, it is of special interest to consider the trade-off between bias reduction and the increase of the corresponding statistical error. In the following theorem, we propose a family of new entropy estimators and determine their systematic error analytically. We will present a detailed analysis of the bias and show that the entropy estimators above are specific examples of our general results. In view of the following computations we note that the Shannon entropy is a sum of terms, $h(p_i) = -p_i \ln p_i$, which exclusively depend on the class $i$, for $i = 1, \ldots, M$. Therefore, when we consider expectation values with respect to $n_i$, the computations can be carried out by replacing the joint distribution (3) by the binomial distribution

$$P(n; p, N) = \binom{N}{n} p^n (1-p)^{N-n}, \tag{9}$$

for $0 \leq n \leq N$ and $E[n] = Np$. Now let us consider the following

Theorem: Let $\xi > 0$ be a real number and

$$\hat h_\xi(n) = \frac{n}{N} \left( \psi(N) - \psi(n) - (-1)^n \int_0^{1/\xi - 1} \frac{t^{n-1}}{1+t} \, dt \right), \tag{10}$$

be a parametric family of estimators of the function $h(p) = -p \ln p$. For the particular case $n = 0$, let $\hat h_\xi(0) = 0$. Then we have the identity

$$E[\hat h_\xi(n)] = -p \ln p + b_\xi(p), \tag{11}$$

and

$$b_\xi(p) = -p \int_0^{1 - p/\xi} \frac{t^{N-1}}{1-t} \, dt \tag{12}$$

is the bias of the estimator $\hat h_\xi(n)$.
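The identity (11) can be checked numerically for small samples. The sketch below is an illustration only, with hypothetical function names: it evaluates the estimator (10), computes its exact expectation under the binomial distribution (9), and compares the result with $-p \ln p + b_\xi(p)$. Two evaluation choices are assumptions made here rather than steps from the text: the integral in (10) uses the recursion $J_1 = \ln(1+a)$, $J_{k+1} = a^k/k - J_k$ with $a = 1/\xi - 1$, and the integral in (12) uses the elementary closed form $\int_0^u t^{N-1}/(1-t)\,dt = -\ln(1-u) - \sum_{k=1}^{N-1} u^k/k$.

```python
import math


def h_xi(n, total, xi):
    """Estimator (10) of -p ln p from a count n out of `total` observations."""
    if n == 0:
        return 0.0
    a = 1.0 / xi - 1.0
    # J_n = int_0^a t^(n-1)/(1+t) dt, built up from J_1 = ln(1 + a).
    integral = math.log1p(a)
    for k in range(1, n):
        integral = a ** k / k - integral
    # psi(N) - psi(n) for integers is the partial harmonic sum from n to N-1.
    psi_diff = sum(1.0 / k for k in range(n, total))
    return (n / total) * (psi_diff - (-1) ** n * integral)


def bias_xi(p, total, xi):
    """Bias (12), using the closed form of int_0^u t^(N-1)/(1-t) dt, u = 1 - p/xi."""
    u = 1.0 - p / xi
    integral = -math.log(p / xi) - sum(u ** k / k for k in range(1, total))
    return -p * integral


def expected_h_xi(p, total, xi):
    """Exact expectation of h_xi under the binomial distribution (9)."""
    return sum(
        math.comb(total, n) * p ** n * (1 - p) ** (total - n) * h_xi(n, total, xi)
        for n in range(total + 1)
    )
```

Calling `expected_h_xi(p, total, xi)` and `-p * math.log(p) + bias_xi(p, total, xi)` with the same arguments should agree up to floating-point rounding, and `bias_xi(p, total, p)` vanishes because the upper limit of the integral in (12) is then zero.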

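From the theorem we directly obtain an estimator of the Shannon entropy by summation of (10), i.e. $\hat H_S = \sum_{i=1}^{M} \hat h_\xi(n_i)$. Using a similar notation as in [7] we receive the following expression

$$\hat H_S = \psi(N) - \frac{1}{N} \sum_{i=1}^{M} n_i S_{n_i} \tag{13}$$

with

$$S_n = \psi(n) + (-1)^n \int_0^{1/\xi - 1} \frac{t^{n-1}}{1+t} \, dt. \tag{14}$$

Proof: For real $q$ we consider the finite Taylor series approximation of $p^q$ around $\xi > 0$, i.e.

$$T_N(p) = \sum_{n=0}^{N} \binom{q}{n} \xi^{q-n} (p - \xi)^n, \tag{15}$$

with

$$\binom{q}{n} = \frac{q (q-1) \cdots (q-n+1)}{n!}. \tag{16}$$

We expand the brackets on the right hand side of (15) and rearrange the terms in order to obtain the following double summation

$$T_N(p) = \sum_{k=0}^{N} (-1)^k p^k \xi^{q-k} \sum_{n=k}^{N} \binom{q}{n} \binom{n}{k} (-1)^n. \tag{17}$$

For simplification we introduce the substitution

$$F(N, q, k, \xi) = (-1)^k \xi^{q-k} \sum_{n=k}^{N} \binom{q}{n} \binom{n}{k} (-1)^n. \tag{18}$$

Then, by further algebraic manipulations we obtain the identity

$$\frac{T_N(p)}{(1-p)^N} = \sum_{n=0}^{N} \Theta^n \sum_{k=0}^{n} \binom{N-k}{n-k} F(N, q, k, \xi) \tag{19}$$

with $\Theta = p/(1-p)$. The right hand side of the latter expression is a polynomial in $\Theta$ whose $N+1$ coefficients are all independent of the probability $p$. On the other hand, there is an unbiased estimator, say $\hat\delta_{q,\xi}(n)$, of the expansion $T_N(p)$, since the expectation value of $\hat\delta_{q,\xi}(n)$ can also be expressed by a polynomial of finite order in $\Theta$, i.e.

$$\frac{E[\hat\delta_{q,\xi}(n)]}{(1-p)^N} = \sum_{n=0}^{N} \binom{N}{n} \hat\delta_{q,\xi}(n) \, \Theta^n. \tag{20}$$

To obtain the explicit expression of $\hat\delta_{q,\xi}(n)$, we consider the necessary and sufficient condition for unbiasedness

$$E[\hat\delta_{q,\xi}(n)] = T_N(p). \tag{21}$$

After inserting (19) and (20) into the condition (21) and then comparing coefficients, it follows

$$\hat\delta_{q,\xi}(n) = \binom{N}{n}^{-1} \sum_{k=0}^{n} \binom{N-k}{n-k} F(N, q, k, \xi). \tag{22}$$

This unbiased estimator of $T_N(p)$ is unique because the identity (21) is satisfied for arbitrary $\Theta$. Next we carry out the derivative of $\hat\delta_{q,\xi}(n)$ with respect to $q$, and consider the limes $q \to 1$. For this purpose we note that the derivative of the binomial coefficient (16) is

$$\left. \frac{d}{dq} \binom{q}{n} \right|_{q=1} = \begin{cases} 0, & n = 0, \\ 1, & n = 1, \\ \dfrac{(-1)^n}{n (n-1)}, & n \geq 2. \end{cases} \tag{23}$$

By direct computation it follows that the negative derivative of the estimator $\hat\delta_{q,\xi}(n)$ for $q \to 1$ is given by the expression

$$-\lim_{q \to 1} \frac{d \hat\delta_{q,\xi}(n)}{dq} = \frac{n}{N} \bigl( \psi(N) - \psi(n) \bigr) + \frac{\xi}{N} \left( 1 - \frac{1}{\xi} \right)^{n} - \frac{n}{N} (-1)^n \int_0^{1/\xi - 1} \frac{t^{n-1}}{1+t} \, dt. \tag{24}$$

On the other hand, when applying the same procedure to the Taylor series expansion $T_N(p)$, we find

$$-\lim_{q \to 1} \frac{d T_N(p)}{dq} = -p \ln p + \frac{\xi}{N} \left( 1 - \frac{p}{\xi} \right)^{N} - p \int_0^{1 - p/\xi} \frac{t^{N-1}}{1-t} \, dt. \tag{25}$$

Equating both by using (21) and applying the trivial identity $E[(1 - 1/\xi)^n] \equiv (1 - p/\xi)^N$, by using the notation of the theorem, we obtain the result

$$E[\hat h_\xi(n)] = -p \ln p + b_\xi(p). \tag{26}$$

Thus, the claim (11) has been proven. Finally, we consider the residual term, $R_{N+1}$, of the Taylor series expansion $T_N(p)$. By definition, the identity $p^q = T_N(p) + R_{N+1}(p)$ is valid. Using the latter and applying the ordinary integral representation of $R_{N+1}$, we find the following relation between the bias and the first derivative of the residual term

$$\lim_{q \to 1} \frac{d R_{N+1}}{dq} = b_\xi(p) + \frac{\xi}{N} \left( 1 - \frac{p}{\xi} \right)^{N}. \tag{27}$$

[Figure 1: Systematic error of $\hat h_\xi(n)$ for samples of N = 6 observations and p = .7. Several special cases of $\xi$ are shown. The case $\xi = e^{-1/2}$ slightly improves the estimator $\hat H_\psi$ (see the dot with the circle). Axes: bias versus $\xi$. Legend: $b_\xi(p)$; $\xi = 1/\exp(0.5)$; $\xi = 1/2$ (Grassberger 2003); $\xi = Np$ (turning point); $\xi = p$ (zero bias); $\xi = p/2$.]

[Figure 2: Numerical computation of the statistical error $\sigma_\xi(p)$ for samples of N = 6 observations and p = .7. The minimum of $\sigma_\xi(p)$ is obtained by the solution of $E[\hat h_\xi \, \partial_\xi \hat h_\xi] = h \, \partial_\xi b_\xi$. Axes: statistical error versus $\xi$. Legend: $\sigma_\xi(p)$; $\xi = 1/\exp(0.5)$; $\xi = 1/2$ (Grassberger 2003); $\xi = Np$ (turning point of $b_\xi(p)$); minimum of $\sigma_\xi(p)$.]

Every point on the continuous line in Fig.1 is the bias of the corresponding estimator $\hat h_\xi(n)$. It is unbiased for $\xi = p$, and there is a turning point for $\xi = Np$. The estimator is asymptotically unbiased, i.e. $b_\xi(p) \to 0$ for $N \to \infty$, if $\xi \geq p/2$. On the other hand, in Fig.2 we see the mean square error (statistical error) $\sigma_\xi^2(p) = E[(\hat h_\xi(n) - h(p))^2]$. The trade-off between bias and the statistical error of the estimator is shown in Fig.3. Typically, one is more interested in the error of the entire sum over the states $i$ in Eq.(1). If there are $M$ terms, and if each is roughly of the same order of magnitude, then the total bias and the total variance are both $\propto M$, thus the statistical deviation increases only as $M^{1/2}$. Therefore, the more terms one has (the larger $M$), the more one is interested in using small values of $\xi \geq p/2$, if one wants the total statistical and the total systematic deviations to have the same size. Thus, the interesting estimators lie between both extremes, i.e. the minimum statistical error, and the case $\xi = p$ with vanishing bias. The following particular cases are especially interesting to focus on:

$\xi = 1$: In this case we obtain the trivial estimator for $h(p)$

$$\hat h_1(n) = \frac{n}{N} \bigl( \psi(N) - \psi(n) \bigr), \tag{28}$$

and $\hat h_1(n) = 0$ for $n = 0$. By the identity (11) we receive the following expectation value

$$E[\hat h_1(n)] = -p \ln p - p \int_0^{1-p} \frac{t^{N-1}}{1-t} \, dt. \tag{29}$$

The latter expression has been recently mentioned in [7] (citation [14] in it). In the asymptotic regime $n \gg 1$ it leads to the Miller correction (see footnote 3), i.e.

$$\hat h_1(n) = -\frac{n}{N} \ln \frac{n}{N} + \frac{1}{2N} + O(1/N^2).$$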
$\xi = e^{-1/2}$: The Grassberger estimator $\hat H_\psi$ is a special case, since it is not exactly covered by our theorem. However, it can be very well approximated if the Taylor expansion is chosen around the particular value of $\xi = e^{-1/2}$. By numerical analysis we verified that the corresponding estimator $\hat h_{e^{-1/2}}(n)$ is less biased than the estimator (7), for any $N > 1$ and arbitrary $p$. In Fig.1, we see that there is almost no difference between both estimators. However, by numerical verification, slight improvements become visible for larger probabilities (e.g. p > .8). In the case of a single observation, i.e. $N = 1$, there is no difference between the two, for any $p$.

$\xi = 1/2$: This case is identical to the Grassberger estimator (8), see [7]. As shown in Fig.1, it is less biased than the Miller estimator and the estimator $\hat h_{e^{-1/2}}(n)$. But the statistical error of $\hat h_{1/2}(n)$ is slightly bigger, as we can see in Fig.2. In the left half of the unit interval, i.e. $\xi < 1/2$, we obtain further reduction of the bias. But now one has to be attentive since we have the limes $\hat h_\xi(n) \to \infty$ for $n \to \infty$. Although $n$ is always finite in practice, this behavior is an indication that the statistical error of all estimators with $\xi \ll 1$ could increase very fast. The dramatic increase of $\sigma_\xi(p)$ for $\xi \to 0$ is shown in Fig.3. Therefore, the particular choice $\xi = 1/2$ seems to be very suitable for estimation because it has the smallest $b_\xi(p)$ with $\hat h \to h(p)$ for $N \to \infty$ and any $p \in (0, 1]$. On the other hand, the most conservative case is given by the minimum variance estimator (see Fig.3). In this case the value of the statistical error and the absolute value of the bias are comparable. A compromise between both extremes might be the estimator for $\xi = e^{-1/2} \approx 0.6$. This case is less biased than the minimum variance estimator, and less risky than the Grassberger estimator $\hat H_G$.

[Figure 3: Trade-off between bias and statistical error for samples of N = 6 and p = .7. The reduction of the bias for $\xi < p$ is related to a strongly increasing statistical error. On the other hand, the bias corresponding to the minimum statistical error is larger than all the above mentioned estimators. Axes: bias versus statistical error (logarithmic scale). Legend: $b_\xi(p)$ vs. $\sigma_\xi(p)$; $\xi = 1/\exp(0.5)$; $\xi = 1/2$ (Grassberger 2003); $\xi = Np$ (turning point of $b_\xi(p)$); minimum of $\sigma_\xi(p)$; $\xi = p$ (zero bias).]

To sum up the above analysis, we see that it is not possible to decide which of the many estimators $\hat h_\xi(n)$ should be generally preferred. A good choice of the parameter $\xi$ always depends on the special application under consideration and the individual preference of the scientist.

Footnote 3: This is because in the asymptotic regime we have the relation $\psi(x) \approx \ln x - 1/(2x)$.

References

[1] C. E. Shannon and W. Weaver, The Mathematical Theory of Communication (University of Illinois Press, Urbana, IL 1949).
[2] G. Miller, Note on the bias of information estimates. In H. Quastler, ed., Information theory in psychology II-B, pp 95-100 (Free Press, Glencoe, IL 1955).
[3] B. Harris, Colloquia Math. Soc. Janos Bolyai, p. 323 (1975).
[4] H. Herzel, Sys. Anal. Mod. Sim. 5, 435 (1988).
[5] L. Paninski, Neural Computation 15, 1191 (2003).
[6] P. Grassberger, Phys. Lett. A 128, 369 (1988).
[7] P. Grassberger, www.arxiv.org, physics/0307138 (2003).
[8] M. Abramowitz and I. Stegun, eds., Handbook of Mathematical Functions (Dover, New York 1965).