The Measure of Information: Uniqueness of the Logarithmic Uncertainty Measure

Almudena Colacito
Information Theory, Fall 2014
December 17th, 2014

The Measurement of Information [R. V. L. Hartley, 1928]

"A quantitative measure of information is developed which is based on physical as contrasted with psychological considerations."

How much choice is involved?

- Shannon's set of axioms;
- Proof that the measure we obtain is indeed the entropy;
- Other sets of axioms: comparisons and consequences;
- Logarithm: why?


Shannon's axioms

Suppose we have a set of possible events whose probabilities of occurrence are p_1, p_2, ..., p_n. The measure H(p_1, ..., p_n) should satisfy:

1. H is continuous in the p_i;
2. if p_i = 1/n for all i, then H is a monotonically increasing function of n;
3. if a choice is broken down into two successive choices, the original H is the weighted sum of the individual values of H.

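To make axiom 3 concrete, here is a small Python check (my addition, not part of the original slides) of Shannon's own decomposition example: choosing directly among outcomes with probabilities 1/2, 1/3, 1/6 carries the same uncertainty as first choosing between two equally likely halves and then, with probability 1/2, splitting the second half as 2/3 versus 1/3.

```python
import math

def H(*probs):
    """Shannon entropy in bits of a finite distribution (zero terms ignored)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Direct choice among three outcomes.
direct = H(1/2, 1/3, 1/6)

# Two-stage choice: axiom 3 says H is the weighted sum of the individual H values.
two_stage = H(1/2, 1/2) + (1/2) * H(2/3, 1/3)

assert math.isclose(direct, two_stage)
print(direct, two_stage)  # both approximately 1.459 bits
```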

Uniqueness of the Uncertainty Measure

Theorem. The only H satisfying the three assumptions above is of the form
H = -K \sum_{i=1}^{n} p_i \log(p_i),
where K is a positive constant.

Proof (sketch): consider A(n) := H(1/n, ..., 1/n); one shows A(n) = K log(n), and the general case follows for rational and then, by continuity, arbitrary probabilities.

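A minimal sketch of the formula in code (my addition), taking K = 1 and logarithms in base 2 so that H is measured in bits; it also checks that the equiprobable case A(n) = H(1/n, ..., 1/n) reduces to log2(n), the quantity the proof works with.

```python
import math

def entropy(probs, K=1.0):
    """H = -K * sum(p_i * log2(p_i)); K = 1 with base-2 logs gives bits."""
    assert math.isclose(sum(probs), 1.0), "probabilities must sum to 1"
    return -K * sum(p * math.log2(p) for p in probs if p > 0)

# For the uniform distribution, A(n) = H(1/n, ..., 1/n) = log2(n).
for n in (2, 4, 8, 16):
    assert math.isclose(entropy([1 / n] * n), math.log2(n))

print(entropy([0.5, 0.25, 0.25]))  # 1.5 bits
```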

Alternative set of axioms [CT]

Let H_m(p_1, p_2, ..., p_m) be a sequence of symmetric functions satisfying the following properties:

1. Normalization: H_2(1/2, 1/2) = 1;
2. Continuity: H_2(p, 1 - p) is a continuous function of p;
3. Grouping: H_m(p_1, p_2, ..., p_m) = H_{m-1}(p_1 + p_2, p_3, ..., p_m) + (p_1 + p_2) H_2(p_1/(p_1 + p_2), p_2/(p_1 + p_2)).

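As a sanity check (my addition), the usual entropy formula does satisfy the grouping property above; here it is verified numerically on an arbitrary four-point distribution.

```python
import math

def H(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

p = [0.1, 0.2, 0.3, 0.4]  # example distribution, m = 4
p1, p2, rest = p[0], p[1], p[2:]

lhs = H(p)
rhs = H([p1 + p2] + rest) + (p1 + p2) * H([p1 / (p1 + p2), p2 / (p1 + p2)])

assert math.isclose(lhs, rhs)
print(lhs, rhs)  # both approximately 1.846 bits
```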

Alternative set of axioms [Carter]

Let I(p) be an information measure, where p denotes the probability of an event.

1. I(p) >= 0 (information is non-negative);
2. I(1) = 0 (we get no information from an event that is certain to occur);
3. if p_1 and p_2 are the probabilities of two independent events, then I(p_1 p_2) = I(p_1) + I(p_2) (additivity, the key axiom);
4. I is a continuous and monotonic function of the probability (slight changes in probability give slight changes in information).

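The surprisal I(p) = log2(1/p) meets all four requirements; the snippet below (my addition) spot-checks them, with additivity exercised on two independent fair coin flips.

```python
import math

def I(p):
    """Surprisal in bits: I(p) = log2(1/p), defined for 0 < p <= 1."""
    return math.log2(1 / p)

# Axioms 1 and 2: non-negative, and zero information from a certain event.
assert all(I(p) >= 0 for p in (0.01, 0.5, 1.0))
assert I(1.0) == 0

# Axiom 3: additivity for independent events (two fair coin flips).
p1, p2 = 0.5, 0.5
assert math.isclose(I(p1 * p2), I(p1) + I(p2))  # 2 bits = 1 bit + 1 bit

# Axiom 4 (monotonicity, spot check): rarer events are more informative.
assert I(0.1) > I(0.2) > I(0.9)
```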

Comparisons between axiomatisations

- I(p^2) = I(p * p) = I(p) + I(p) = 2 I(p), by axiom (3);
- by induction on n: I(p^n) = I(p * ... * p) = n I(p);
- I(p) = I((p^{1/m})^m) = m I(p^{1/m}), hence I(p^{1/m}) = (1/m) I(p);
- combining the two, I(p^{n/m}) = (n/m) I(p); by continuity, for any 0 < p <= 1 and a > 0: I(p^a) = a I(p).

We obtain, again, I(p) = log(1/p) as the measure of information.

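The final jump, from the scaling identity I(p^a) = a I(p) to the logarithm itself, is worth one extra line. Here is a short derivation (my addition, written in LaTeX) that fixes the free constant by choosing base 2, i.e. I(1/2) = 1 bit:

```latex
% Any probability p in (0,1] can be written as a power of 1/2:
%   p = (1/2)^a  with  a = \log_{1/2} p = \log_2(1/p) \ge 0.
% Applying the scaling identity I(q^a) = a \, I(q) with q = 1/2:
\[
  I(p) \;=\; I\!\left(\left(\tfrac{1}{2}\right)^{\log_2(1/p)}\right)
       \;=\; \log_2\!\left(\tfrac{1}{p}\right) I\!\left(\tfrac{1}{2}\right)
       \;=\; \log_2\!\left(\tfrac{1}{p}\right).
\]
% The constant I(1/2) is free in Carter's axioms; choosing it equal to 1
% amounts to choosing the base of the logarithm, so in general
% I(p) = log(1/p) up to a positive multiplicative constant.
```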

Logarithm: why?

"The most natural choice is the logarithmic function." (Shannon, 1948)

- It is practically more useful;
- It is nearer to our intuitive feelings;
- It is mathematically more suitable.


Here it is: the Entropy! [John von Neumann]

"You should call it entropy, for two reasons. In the first place, your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, nobody knows what entropy really is, so in a debate you will always have the advantage."

"Claude Shannon invented a way to measure the amount of information in a message without defining the word information itself, nor even addressing the question of the meaning of the message." (Hans Christian von Baeyer)

THANK YOU!

References

1. C. E. Shannon, "A Mathematical Theory of Communication", The Bell System Technical Journal, Vol. 27, pp. 379-423, 623-656, July, October 1948.
2. R. V. L. Hartley, "Transmission of Information", The Bell System Technical Journal, 1928.
3. T. M. Cover, J. A. Thomas, Elements of Information Theory, second edition, Wiley-Interscience, New York, 2006.
4. T. Carter, An Introduction to Information Theory and Entropy, http://astarte.csustan.edu/~tom/sfi-csss, California State University Stanislaus.

Baudot System