Entropy Principle in Direct Derivation of Benford's Law

Oded Kafri
Varicom Communications, Tel Aviv 6865, Israel
oded@varicom.co.il

The uneven distribution of digits in numerical data, known as Benford's law, was discovered in 1881. Since then, this law has been shown to hold for copious numerical data relating to economics, physics and even prime numbers. Although it attracts considerable attention, there is no a priori probabilistic criterion for when a data set should or should not obey the law. Here a general criterion is suggested, namely that any file of digits in the Shannon limit (that is, having maximum entropy) has a Benford's law distribution of digits.

Benford's law is an empirical uneven distribution of digits that is found in many random numerical data. Numerical data from natural sources that are expected to be random exhibit an uneven distribution of the first-order digits that fits the equation

ρ(n) = log_10(1 + 1/n), where n = 1, 2, 3, 4, 5, 6, 7, 8, 9.   (1)

Namely, digit 1 appears as the first digit with probability ρ(1), which is about 6.5 times higher than the probability ρ(9) of digit 9 (Fig. 1).

Fig. 1. Benford's law predicts a decreasing frequency of first digits, from 1 through 9.

Eq. (1) was suggested by Newcomb in 1881 from observations of the physical wear and tear of books containing logarithmic tables [1]. Benford further explored the phenomenon in 1938, empirically checked it for a wide range of numerical data [2], and unsuccessfully attempted to present a formal proof. Since then Benford's law has also been found in prime numbers [3], physical constants, Fibonacci numbers and many more [3,4,5].
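For concreteness, Eq. (1) is easy to tabulate; the following short Python sketch (an illustration, not part of the original paper) prints the nine first-digit probabilities and the ρ(1)/ρ(9) ratio:

```python
import math

# Benford first-digit probabilities, Eq. (1): rho(n) = log10(1 + 1/n)
rho = {n: math.log10(1 + 1 / n) for n in range(1, 10)}

for n, p in rho.items():
    print(f"digit {n}: {p:.3f}")  # 0.301 for digit 1 down to 0.046 for digit 9

print(f"ratio rho(1)/rho(9): {rho[1] / rho[9]:.2f}")  # 6.58, the "about 6.5" quoted above
assert abs(sum(rho.values()) - 1.0) < 1e-12  # a proper probability distribution
```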

Benford's law attracts considerable attention [6]. Attempts at explanation are based on scale-invariance [7] and base-invariance [8,9,10] principles. However, there are no a priori well-defined probabilistic criteria for when a data set should or should not obey the law [4]. Benford's distribution of digits is counterintuitive, as one expects that random numbers would result in a uniform distribution of their digits, namely ρ(n) = 1/9, as in the case of an unbiased lottery. This is the reason why Benford's law is used by the income tax agencies of several nations and states for fraud detection in large companies and accounting businesses [4,11,12]. Usually, when a fraud is committed, the digits are invoked with equal probabilities and the distribution of digits does not follow Eq. (1).

In this paper Benford's law is derived according to a standard probabilistic argument. It is assumed, counter to the common intuition that digits are the logical units that comprise numbers, that the logical units are the 1's. For example, the digit 8 comprises 8 units, etc. This model can easily be viewed as a model of balls and boxes, namely:

A) Digit n is equivalent to a "box" containing n non-interacting balls.
B) A sequence of N such "boxes" is equivalent to a number or a numerical file.
C) All possible configurations of the boxes and balls, for a given number of balls, have equal probability.

The last assumption is the definition of equilibrium and randomness in statistical physics. In information theory it means that the file is in the Shannon limit (a compressed file). A number is written as a combination of ordered digits assuming a given base B. When we have a number with N digits of base B, we can describe the number as a set of

N boxes, each containing a number of balls n, where n can be any integer from 0 to B - 1. We designate the total number of balls in a number as P. An unbiased distribution of balls in boxes means an equal probability for any ball to be in any box. Hereafter, it is shown that this assumption is equivalent to assumption C and yields Benford's law. The "intuitive" distribution, in which each box has an equal probability of having any digit n (n balls), does not mean an equal probability for any single ball to be in any box, but an equal probability for any group of n balls (the digit n) to be in any box.

For example, for base B = 4 there are four digits: 0, 1, 2, 3. The highest value of a 3-digit number in this base is 333, which contains 9 balls. There is only one possible configuration to distribute the 9 balls in 3 boxes (because of the limit of 3 balls per box). However, in the case of 3 balls in 3 boxes there are several possible configurations, namely: 300, 030, 003, 210, 201, 120, 021, 102, 012, and 111. We see that digit 1 appears 9 times, digit 2 appears 6 times, and digit 3 appears 3 times. It is worth noting that the ratio of the digits, 9:6:3, i.e. ρ(1) = 0.5, ρ(2) = 0.333, and ρ(3) = 0.166, is independent of N, which is the reason why 0 is not included in Benford's law.

As we see in the example above, each box has the same probability of having 1, 2 or 3 balls as the other boxes; however, the probability of a box having 3 balls is smaller than the probability of a box having 2 balls, and the probability of a box having one ball is the highest. The reason is that in order for a box to have several balls it has to score a ball several times; since the probability of a box scoring n balls decreases as n increases, lower-value digits have higher probability. The formal calculation of the distribution of balls in the boxes was done by writing all possible ten configurations (in general

(N + P - 1)!/(P!(N - 1)!) configurations), giving each one of them an equal probability, and then counting the total number of each digit, regardless of its location.

The distribution of P balls in N boxes in equilibrium is a classic thermodynamic problem. Equilibrium is defined in statistical mechanics as a statistical ensemble in which all the possible configurations have an equal probability. The equilibrium distribution function ρ(n) (the fraction of boxes having n balls) is calculated in such a way that it yields maximum entropy, which means equal probability for all the configurations (microstates). The standard way to find the distribution function in equilibrium is to maximize the entropy S (S is proportional to the logarithm of the number of configurations Ω) under the constraint of a fixed number of balls P. To do this, we apply the Stirling approximation; since

Ω(N, P) = (N + P - 1)!/((N - 1)! P!)

we obtain

S = ln Ω ≈ N{(1 + n) ln(1 + n) - n ln n},   (2)

where n = P/N. The number n is any integer (limited by B - 1). If we designate the number of boxes having n balls by φ(n), then P = Σ_{n=1}^{B-1} n φ(n). It should be noted that we count all the boxes in all the configurations, excluding the empty boxes (the 0's) and the boxes that contain more balls than B - 1. To find the φ(n) that maximizes S we use the Lagrange multipliers method, namely, we define the function

f(n) = N{(1 + n) ln(1 + n) - n ln n} - β(P - Σ_{n=1}^{B-1} n φ(n)).

The first term is

Shannon entropy and the second term is the conservation of the balls; β is the Lagrange multiplier. To find φ(n) we substitute ∂f(n)/∂n = 0, hence

φ(n) = (N/β) ln((n + 1)/n).   (3)

We are interested in the normalized distribution, namely,

ρ(n) = φ(n) / Σ_{n=1}^{B-1} φ(n).   (4)

Since Σ_{n=1}^{B-1} φ(n) = (N/β) ln B, it follows that

ρ(n) = ln(1 + 1/n) / ln B = log_B(1 + 1/n),   (5)

namely, Benford's law. In the normalization of φ(n) the quantity N/β disappeared. That means that the distribution function is independent of the digit location and of P and N, and is a function of B only. That is the reason why Benford's law is so general.

Reconsidering the example of 3 balls in 3 boxes, we calculate from Eq. (5) that ρ(1) = 0.5, ρ(2) ≈ 0.29 and ρ(3) ≈ 0.21. The total number of non-zero digits (1, 2 and 3) is 18, and the distribution points to the ratio 9:5:4, as compared with the ratio 9:6:3 obtained in the numerical example. The deviation from the theoretical calculation is explained by the fact that the Stirling approximation yields a better fit as the number of digits grows.
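The worked example can be checked end to end in a few lines. This Python sketch (an illustration, not from the original paper) enumerates the configurations of 3 balls in 3 boxes in base 4, confirms the ten-configuration count and the 9:6:3 digit counts, and compares them with the Eq. (5) weights, whose unnormalized sum telescopes to ln B so that the prefactor N/β cancels:

```python
import math
from itertools import product

B, N, P = 4, 3, 3  # base, number of boxes, total balls, as in the worked example

# All configurations of P balls in N boxes, at most B-1 = 3 balls per box
configs = [c for c in product(range(B), repeat=N) if sum(c) == P]

# Stars-and-bars count used in the text: (N+P-1)!/((N-1)! P!)
# (the per-box cap is irrelevant here, since no box can exceed 3 balls anyway)
omega = math.factorial(N + P - 1) // (math.factorial(N - 1) * math.factorial(P))
print(len(configs), omega)  # both are 10

# Empirical digit counts across all configurations: 9:6:3 for digits 1:2:3
counts = {d: sum(c.count(d) for c in configs) for d in range(1, B)}
print(counts)

# Eqs. (3)-(5): phi(n) proportional to ln((n+1)/n); the sum telescopes,
# ln(2/1) + ln(3/2) + ln(4/3) = ln(4) = ln(B), so rho(n) = log_B(1 + 1/n)
phi = [math.log((n + 1) / n) for n in range(1, B)]
assert abs(sum(phi) - math.log(B)) < 1e-12
rho = [p / math.log(B) for p in phi]
print([round(r, 3) for r in rho])  # 0.5, 0.292, 0.208 -> roughly 9:5:4 of 18 digits
```

The small mismatch between the enumerated 9:6:3 and the theoretical 9:5:4 is the Stirling-approximation error discussed in the text.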

Benford's law distribution was recently shown to be a special case of the Planck distribution of photons at a given frequency [13]. It is intriguing that the digit distribution of prime numbers also obeys the Planck statistics.

Acknowledgment: This paper was written due to the encouragement of Alex Ely Kossovsky, who is truly a Benford's law expert and enthusiast. I acknowledge his most valuable comments.

References:
1. S. Newcomb, "Note on the frequency of use of the different digits in natural numbers," Amer. J. Math., 4 (1881) 39-40.
2. F. Benford, "The law of anomalous numbers," Proc. Amer. Phil. Soc., 78 (1938) 551-572.
3. D. Cohen and K. Talbot, "Prime numbers and the first digit phenomenon," J. Number Theory, 18 (1984) 261-268.
4. L. Zyga, "Numbers follow a surprising law of digits, and scientists can't explain why," www.physorg.com (2007).
5. Torres et al., "How do numbers begin? (The first digit law)," Eur. J. Phys., 28 (2007) L17-25.
6. New York Times, Aug. 4 (1998); Monday, November 24 (2008).
7. D. Cohen, "An explanation of the first digit phenomenon," J. Combin. Theory Ser. A, 20 (1976) 367-370.
8. T. Hill, "Base-invariance implies Benford's law," Proc. Amer. Math. Soc., 123:3 (1995) 887-895.
9. T. Hill, "A statistical derivation of the significant-digit law," Statistical Science, 10:4 (1996) 354-363.
10. A. E. Kossovsky, "Towards a better understanding of the leading digits phenomena," http://arxiv.org/abs/math/0612627
11. M. Nigrini, "The detection of income evasion through an analysis of digital distributions," Ph.D. thesis, Dept. of Accounting, Univ. Cincinnati, Cincinnati, OH (1992).
12. M. Nigrini, "A taxpayer compliance application of Benford's law," J. Amer. Taxation Assoc., 18 (1996) 72-91.
13. O. Kafri, "Sociological inequality and the second law," arXiv:0805.3206