CHAPTER 4 ENTROPY AND INFORMATION


In the literature one finds the terms information theory and communication theory used interchangeably. As there seems to be no well-established convention for their use, here the work of Shannon, directed toward the transmission of messages, will be called communication theory, and the later generalization of these ideas by Jaynes will be called information theory.

4.1 COMMUNICATION THEORY

Communication theory deals with the transmission of messages through communication channels, where such topics as channel capacity and fidelity of transmission in the presence of noise must be considered. The first comprehensive exposition of this theory was given by Shannon and Weaver [1] in 1949. For Shannon and Weaver the term information is narrowly defined and is devoid of subjective qualities such as meaning. In their words:

    Information is a measure of one's freedom of choice when one selects a message.... the amount of information is defined, in the simplest of cases, to be measured by the logarithm of the number of available choices.

[1] C.E. Shannon and W. Weaver, The Mathematical Theory of Communication, University of Illinois Press, Urbana, 1949.

Shannon and Weaver considered a message N places in length, defined p_i as the probability of occurrence at any place of the i-th symbol from the set of n possible symbols, and specified that the measure of information per place, H, should have the following characteristics:

1. The function H should be a continuous function of the probabilities p_i.

2. If all the p_i's are equal, p_i = 1/n, then H should be a monotonic increasing function of n.

3. If A and B are two different independent events, and if we consider their joint occurrence, AB, as a compound event, then the uncertainty in AB is the sum of the separate uncertainties about A and B. That is, H(AB) = H(A) + H(B).

Shannon and Weaver showed that only the following function satisfies these conditions:

    H = -K Σ_i p_i log p_i                                  (4-1)

Because K is an arbitrary constant, H is a relative measure and any convenient base can be chosen for the logarithm. If binary numbers are used, n = 2 and p_i = 1/2, and the use of two-based logarithms yields

    H = -K [ (1/2) log₂(1/2) + (1/2) log₂(1/2) ] = K

Setting K equal to unity we obtain H = 1, one unit of information per place, a bit. Alternatively, for a message in decimal numbers we have n = 10 and p_i = 1/10, and on using ten-base logarithms we find

    H = -10 K [ (1/10) log(1/10) ] = K

Again, setting K equal to unity will result in one unit of information per place.

The term H has been called the information per symbol, the uncertainty per symbol, or the entropy. The terms information and uncertainty are not as incompatible as it would first appear. In fact, they are complementary in that the amount of information received is equal to the uncertainty removed on receipt of the message. For a message of N places the total amount of information contained is NH, the product of a quantity and an intensive factor.
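By way of illustration, the calculation behind Eq. (4-1) is easy to reproduce numerically. The short Python sketch below is an added example (the function name shannon_H is ours); it evaluates H for a set of probabilities in an arbitrary logarithm base and confirms the two cases worked above: one bit per place for binary digits with two-based logarithms, and one ten-base unit per place for decimal digits with ten-base logarithms.

    import math

    def shannon_H(probabilities, base=10.0, K=1.0):
        """Information per place, H = -K * sum(p_i * log_base(p_i)), Eq. (4-1).

        Terms with p_i = 0 contribute nothing to the sum.
        """
        return -K * sum(p * math.log(p, base) for p in probabilities if p > 0)

    # Binary digits, two-based logarithms: one bit per place.
    print(shannon_H([0.5, 0.5], base=2))        # -> 1.0

    # Decimal digits, ten-base logarithms: one ten-base unit per place.
    print(shannon_H([0.1] * 10, base=10))       # -> approximately 1.0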

Consider a message in the form of a three-digit decimal number (e.g., 329). It has just been shown that with ten-base logarithms H = 1, and therefore the amount of information in the message, NH, is 3 ten-base units. Now consider this same number to be expressed in binary notation. Here it will be necessary to use a message length of ten places (2^10 = 1024). At each place the probability is 1/2, and the use of ten-base logarithms gives

    H = -K [ (1/2) log(1/2) + (1/2) log(1/2) ] = K log 2

Again, K = 1 and we obtain H = 0.30103 and NH = 10(0.30103) = 3.01, for a total of slightly over 3 ten-base units of information. If we had chosen to use two-based logarithms in Eq. (4-1), we would have obtained 9.966 and 10 two-based units [2] for the decimal and binary representations respectively. We would expect the amount of information in the message to be independent of the number system used, and we see that this is essentially the case. The minor difference is due to a slight mismatch between the number systems: 1000 as compared to 1024. Because there are 1024 possibilities in the binary system and only 1000 in the decimal system, slightly more uncertainty is eliminated in specifying one member of a set of 1024 than one of a set of 1000.

Equation (4-1) was selected so that for any set of n symbols the maximum value of H occurs when all p_i are equal. This is the expected situation for a numerical message, but not for a message expressed in a language such as English. Because a language has structure, all symbols do not have the same frequency or probability of occurrence. Excluding punctuation, messages in English can be composed of the 26 characters of the alphabet and the space. If probabilities based on occurrence frequencies are used in Eq. (4-1), the value of H is 1.211 ten-base units. This is less than the value of 1.431 ten-base units computed from Eq. (4-1) using equal p_i of 1/27. The uncertainty per symbol is less in a structured language because we could make an educated guess.

[2] Note that log₂ M = log M / log 2.
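The bookkeeping in this comparison can be checked directly. The following sketch is an added illustration (the helper total_information is ours); it computes NH for the three-place decimal and ten-place binary representations in both ten-base and two-base units, reproducing the values 3, 3.01, 9.966, and 10 quoted above.

    import math

    def total_information(N, n, base):
        """N H for a message of N places drawn from n equally likely
        symbols, with logarithms taken in the given base."""
        H = math.log(n, base)      # -sum((1/n) * log(1/n)) = log(n)
        return N * H

    # Three-digit decimal number versus its ten-place binary representation.
    print(total_information(3, 10, base=10))   # -> 3.0   ten-base units
    print(total_information(10, 2, base=10))   # -> 3.01  ten-base units
    print(total_information(3, 10, base=2))    # -> 9.966 two-base units
    print(total_information(10, 2, base=2))    # -> 10.0  two-base units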

In considering the transfer of information between two persons A and B, communication theory represents the chain as A/a--b/B, where a--b is the physical part, the communication channel with transmitter a and receiver b. Communication theory is concerned only with the physical aspect of the exchange and takes no cognizance of meaning. At the receiver b it may be appropriate to consider the message as only one possibility out of a computable number of alternatives, but this certainly is not the attitude of A in originating the message.

Even in the simplest of cases, the quantity NH does not give an unambiguous measure of the information exchanged between A and B. For example, consider the various possibilities facing a present-day Paul Revere and his compatriot. By prior agreement the message could be coded to "0" or "1" (add one to get "one if by land, two if by sea"). If binary digits were used, the message would carry 1.0 two-base unit or 0.301 ten-base units of information. If the agreement were to send either "LAND" or "SEA", the message would carry 4.844 or 3.633 ten-base units respectively [3], and the amount of information would appear to depend on which route the British took. Would more information be imparted if the British were coming by land? This problem might be avoided if the prearranged code were "L" or "S". In this case either alternative imparts the same information, 1.211 ten-base units; but this is seen to be different from the information imparted by the use of binary digits. If no code were prearranged, the message might read "BRITISH COME BY LAND". This message of twenty places would convey 24.22 ten-base units of information. Note that in all cases the same amount of information was exchanged between A and B, while the measurement in terms of the transmission from a to b varies from 0.301 to 24.22 units.

[3] We have seen that H = 1.211 ten-base units for messages in English, and we obtain 4(1.211) = 4.844 and 3(1.211) = 3.633.
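As a rough added illustration of why NH fails as an unambiguous measure here, the sketch below tallies NH for each coding scheme, taking H = 0.301 ten-base units per binary digit and H = 1.211 ten-base units per English character as given above.

    import math

    H_BINARY_DIGIT = math.log10(2)   # about 0.301 ten-base units per binary place
    H_ENGLISH_CHAR = 1.211           # ten-base units per English character (from the text)

    # N H for each prearranged coding of the same single fact.
    messages = {
        'binary digit "1"':          1 * H_BINARY_DIGIT,
        '"LAND"':                    4 * H_ENGLISH_CHAR,
        '"SEA"':                     3 * H_ENGLISH_CHAR,
        'single letter "L" or "S"':  1 * H_ENGLISH_CHAR,
        '"BRITISH COME BY LAND"':   20 * H_ENGLISH_CHAR,
    }

    for coding, NH in messages.items():
        print(f"{coding:26s} {NH:6.3f} ten-base units")
    # The same fact reaches B in every case, yet N H ranges from about
    # 0.301 to 24.22 ten-base units.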

Communication is a process, and in communication theory it is restricted to the physical process connecting a and b. Communication theory has been successfully applied and is a well-established tool of communication engineers.[4] On the other hand, information is a concept which is elusive and, as we have seen, is difficult, or perhaps impossible, to quantify.

The similarity between Shannon's H and the entropy of statistical mechanics, S_SM, can be seen by comparing Eq. (4-1) with Eq. (2-10):

    S_SM = -k Σ_i P_i ln P_i                                (2-10)

The k in Eq. (2-10) is not arbitrary but is Boltzmann's constant, and P_i is the probability that the system is in the i-th quantum state. Note that Eq. (2-10) was derived from a realistic model within an accepted conceptual structure and can be expected to be a reasonable reflection of reality. On the other hand, we have seen that Eq. (4-1) results from probabilistic considerations arising from the useful, but limited, concept of information. There is no physically realistic connection between H and S_SM, and the designation of H as entropy can only lead to confusion. In quantum statistical mechanics it is possible to show that S_SM reduces to a heat effect divided by absolute temperature, in conformance with the thermodynamic definition of entropy. However, no similar reduction is possible for H; in fact, the association of a heat effect with H would make no sense. Further, in examining the ramifications of the communication theory of Shannon and Weaver, one receives the impression that there is no distinct advantage in identifying H with entropy. None of the usual applications are exploited; in fact, the identification of system and surroundings would seem out of place.

The association of H with entropy is regrettable, for it has resulted in much confusion, especially in regard to entropy and information. The entropy-as-information interpretation has a long history dating back to Maxwell, but the idea became more strongly established in 1929 with the work of Szilard. Here the concept of the entropy of information was invoked by Szilard in order to save the second law of thermodynamics from the onslaught of a Maxwell demon of his devising. Szilard's work will be discussed in chapter 5.

[4] See J.R. Pierce, An Introduction to Information Theory, 2nd ed., Dover Publications, New York, 1980.

4.2 JAYNES' FORMALISM

E.T. Jaynes has been the major force in generalizing certain aspects of Shannon's communication theory into what is sometimes known as information theory.[5] The cornerstone of this formulation is Jaynes' Principle, which provides a means of determining the least-biased probability distribution. This principle merely states that, subject to the constraint that the sum of the P_i's is unity and any other known constraints, the least-biased probability distribution is obtained by maximizing Shannon's H.

In applying Jaynes' formalism to quantum statistical mechanics, the entropy of statistical mechanics, S_SM, is used in place of Shannon's H. The entropy

    S_SM = -k Σ_i P_i ln P_i                                (2-10)

and the usual constraints

    Σ_i P_i = 1        and        Σ_i P_i E_i = U

are written, and the method of Lagrangian multipliers is used to find the set of P_i that maximizes S_SM subject to these constraints. Once the probability distribution is found, it is used in Eqs. (2-6) and (2-10) to obtain the internal energy and the entropy respectively. In this procedure the expression for the entropy is postulated and used to find the probability distribution, as opposed to the usual procedure of postulating the probability distribution and then determining the entropy.

[5] E.T. Jaynes, Phys. Rev., 106, 620 (1957) and Phys. Rev., 108, 171 (1957).
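A minimal numerical sketch of this procedure is given below; it is an added illustration using hypothetical energy levels, not a calculation from the text. Maximizing S_SM subject to Σ P_i = 1 and Σ P_i E_i = U by Lagrangian multipliers gives P_i proportional to exp(-βE_i), with the multiplier β fixed by the energy constraint; the sketch finds β by bisection and reports the resulting distribution and entropy (with k set to 1).

    import math

    def maxent_distribution(energies, U, beta_lo=-50.0, beta_hi=50.0, tol=1e-12):
        """Least-biased P_i maximizing S_SM = -k * sum(P_i ln P_i)
        subject to sum(P_i) = 1 and sum(P_i * E_i) = U.

        The Lagrange-multiplier solution is P_i = exp(-beta * E_i) / Z;
        beta is found by bisection so that the mean energy equals U."""

        def mean_energy(beta):
            weights = [math.exp(-beta * E) for E in energies]
            Z = sum(weights)
            return sum(w * E for w, E in zip(weights, energies)) / Z

        # Mean energy decreases monotonically with beta, so bisect on beta.
        lo, hi = beta_lo, beta_hi
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if mean_energy(mid) > U:
                lo = mid
            else:
                hi = mid
        beta = 0.5 * (lo + hi)
        weights = [math.exp(-beta * E) for E in energies]
        Z = sum(weights)
        return [w / Z for w in weights], beta

    # Hypothetical three-level system; U must lie between the lowest and
    # highest levels.  With k = 1, the entropy is -sum(P_i ln P_i).
    levels = [0.0, 1.0, 2.0]
    P, beta = maxent_distribution(levels, U=0.7)
    S = -sum(p * math.log(p) for p in P if p > 0)
    print(P, beta, S)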

This inverted approach of the Jaynes formalism has been claimed to be more general than the approach of quantum statistical mechanics because it relies on logic rather than physics.[6] However, because the entropy in Jaynes' formalism is the entropy of physics, the claim is unconvincing.

Jaynes' Principle appears to possess considerable generality for statistical inference. It has proven useful in many fields and can be viewed as a tool which provides the best information available from a particular situation. However, it is unfortunate that this success has prompted a few enthusiasts [7] to propose the elevation of Jaynes' formalism to the status of a fundamental axiom of science and to state that it subsumes statistical mechanics. Such claims, together with giving the name entropy to Shannon's H, have served to establish the putative association of entropy with information. This association often leads to a subjective interpretation of entropy.

4.3 ENTROPY AND INFORMATION

The association of an increase in entropy with a decrease in information is an idea that dates back to the time of Maxwell,[8] and the development of information theory (Jaynes' formalism) has seemed to reinforce this interpretation. According to this view, the condition Ω_2 > Ω_1, which leads by Eq. (2-11a) to an increase in entropy, represents an observer's loss of information about the microscopic state of the system.

[6] O. Costa de Beauregard and M. Tribus, Helv. Phys. Acta, 47, 238, 1974; also reprinted in H.S. Leff and A.F. Rex, Maxwell's Demon: Entropy, Information, and Computing, Princeton University Press, Princeton, N.J., 1990.
[7] See, for example, M. Tribus, Thermostatics and Thermodynamics, D. Van Nostrand Co., Princeton, NJ, 1961.
[8] S. Brush, The Kind of Motion We Call Heat, North Holland Publishing Co., Amsterdam, 1976.

Accordingly, one reasons that there are more possibilities in state 2, and therefore the increase in Ω implies more uncertainty or a loss of information. This view presents two difficulties. First, because Ω is not a virtual observable quantity, it is doubtful that an observer could have access to this type of information. The information associated with Ω concerns not the system, but our description of the system. Second, it is unreasonable to believe that ΔS, a thermodynamic property change that depends on objectively determined macrostates, could also depend on microscopic information gained or lost by an observer.

In an effort to blunt the subjectivity criticism, Jaynes [9] has suggested the following carefully worded definition of information:

    The entropy of a thermodynamic system is a measure of the degree of ignorance of a person whose sole knowledge about its microstate consists of the values of the macroscopic quantities X_i which define its thermodynamic state. This is a completely "objective" quantity, in the sense that it is a function only of the X_i, and does not depend on anybody's personality. There is then no reason why it cannot be measured in the laboratory.

Here, we will assume that the term knowledge is synonymous with information, although in general usage they often have different connotations. Also, one wonders what type of knowledge would alleviate a person's ignorance of the microstate: virtual observable quantities such as positions and velocities, or the identification of quantum states and their corresponding probabilities. If Jaynes were speaking in quantum terms, there is a double dose of subjectivity here. First, we have introduced quantities such as Ω, which are mental constructs that relate to our description of the system rather than to the system itself.

[9] E.T. Jaynes, in The Maximum Entropy Formalism, R.D. Levine and M. Tribus, eds., M.I.T. Press, Cambridge, 1979.

Second, we now say that the macroscopic behavior of the system, as reflected in the value of the entropy, is dependent on the extent of our knowledge of these model parameters.

It would also be instructive to test the quantum interpretation through the use of Eq. (2-11a), which relates the statistical entropy change to Ω_2/Ω_1, provided informational value could be assigned to the knowledge of Ω, a model parameter. This information should depend not on the value of Ω but on the range of numbers from which it is to be chosen. Thus, if Ω_1 and Ω_2 lie within the range 1 to N_MAX, they will have the same informational value regardless of their numerical values, and there would be no change in our information concerning the system as it moves between macroscopic states. We reach the same conclusion by noting that the number of position coordinates and velocity components is always 6N regardless of the macroscopic state of the system: a constant amount of microscopic knowledge. Thus, from the viewpoint of "information" there would be no "entropy" change, and we see that the concept of entropy as a measure of microscopic information is inconsistent as well as extremely subjective.