Anatomy of a Bit. Reading for this lecture: CMR articles Yeung and Anatomy.


Information in a single measurement: an alternative rationale for entropy and the entropy rate.
A measurement has $k$ possible outcomes, so a single outcome can be stored in $\log_{10} k$ decimal digits or $\log_2 k$ bits, and a series of $n$ measurements can be stored in $n \log_2 k$ bits.
The number $M$ of possible sequences of $n$ measurements with counts $n_1, n_2, \ldots, n_k$ is
$$M = \binom{n}{n_1\, n_2\, \cdots\, n_k} = \frac{n!}{n_1!\, n_2! \cdots n_k!}.$$

To specify (code) which particular sequence occurred, store
$$k \log_2 n \;+\; \log_2 M \;+\; \log_2 n$$
bits: $k \log_2 n$ is the maximum number of bits to store the count $n_i$ of each of the $k$ output values, $\log_2 M$ is the number of bits to specify the particular observed sequence within the class of sequences that have counts $n_1, n_2, \ldots, n_k$, and $\log_2 n$ is the number of bits to specify the number of bits in $n$ itself.
Applying Stirling's approximation $n! \approx \sqrt{2\pi n}\,(n/e)^n$ to $M$:
$$\log_2 M \;\approx\; -n \sum_{i=1}^{k} \frac{n_i}{n}\log_2\frac{n_i}{n} \;=\; n\,H\!\left[\tfrac{n_1}{n}, \tfrac{n_2}{n}, \ldots, \tfrac{n_k}{n}\right] \;=\; n\,H[X].$$
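
A quick numerical check of this Stirling step (a sketch of my own, not from the lecture; the alphabet size k = 3, the chosen counts, and the helper names are illustrative):

# Sketch, not from the lecture: compare the exact log2 M with its Stirling
# approximation n*H[X]. Alphabet size, counts, and helper names are illustrative.
from math import lgamma, log, log2

def log2_factorial(n):
    """log2(n!) via the log-gamma function (safe for large n)."""
    return lgamma(n + 1) / log(2)

def log2_multinomial(counts):
    """log2 of M = n! / (n_1! n_2! ... n_k!)."""
    n = sum(counts)
    return log2_factorial(n) - sum(log2_factorial(c) for c in counts)

def entropy_bits(counts):
    """H[X] in bits of the empirical distribution n_i / n."""
    n = sum(counts)
    return -sum((c / n) * log2(c / n) for c in counts if c > 0)

for n in (8, 100, 10_000):
    counts = [n // 2, n // 4, n // 4]   # k = 3 outcomes, frequencies 1/2, 1/4, 1/4
    print(f"n = {n:6d}   log2 M = {log2_multinomial(counts):10.2f}"
          f"   n*H[X] = {n * entropy_bits(counts):10.2f}")

As $n$ grows the relative difference shrinks, which is why $\log_2 M$ and $n H[X]$ can be used interchangeably above.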

We can compress whenever $H[X] < \log_2 k$.
1-symbol redundancy: $R_1 = \log_2 k - H[X]$.
(Figure: bar chart decomposing $\log_2 k$ into $R_1$ and $H[X]$; panels (a)-(c).)

Typical sources are not IID, so instead use the entropy rate
$$h_\mu = \lim_{L\to\infty} \frac{H(L)}{L}.$$
Intrinsic redundancy (total predictability): $\log_2 k - h_\mu$. Intrinsic randomness: $h_\mu$.
(Figure: bar chart decomposing $\log_2 k$ into the intrinsic redundancy and $h_\mu$; panels (a)-(c).)
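
A rough illustration of this limit (my own sketch, not lecture code; the Golden Mean sampler, sample size, and block-length range are assumed choices for the example):

# Sketch, not lecture code: estimate the entropy rate of a non-IID binary source.
import random
from collections import Counter
from math import log2

def golden_mean_sample(n, seed=1):
    """Binary sequence with no two consecutive 0s: after a 0 emit 1, else flip a fair coin."""
    rng, seq, prev = random.Random(seed), [], 1
    for _ in range(n):
        prev = 1 if prev == 0 else rng.randint(0, 1)
        seq.append(prev)
    return seq

def block_entropy(seq, L):
    """Empirical block entropy H(L) in bits."""
    counts = Counter(tuple(seq[i:i + L]) for i in range(len(seq) - L + 1))
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

seq = golden_mean_sample(200_000)
k = 2                                              # alphabet size
Hs = [0.0] + [block_entropy(seq, L) for L in range(1, 9)]
for L in range(1, 9):
    h_est = Hs[L] - Hs[L - 1]                      # h_mu ~ H(L) - H(L-1)
    print(f"L={L}  H(L)/L = {Hs[L] / L:.3f}   H(L)-H(L-1) = {h_est:.3f}"
          f"   redundancy = {log2(k) - h_est:.3f}")
# Both estimates approach h_mu = 2/3 bit/symbol for this source, so the
# intrinsic redundancy log2(k) - h_mu approaches roughly 1/3 bit.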

In addition, measurements occur in the context of a past and a future. A semantically rich decomposition of a single measurement:
(Figure: stacked-bar decompositions of $H[X]$ into the quantities $\rho_\mu$, $h_\mu$, $q_\mu$, $b_\mu$, $r_\mu$, and $w_\mu$; panels (a)-(d).)

A semantically rich decomposition of a single measurement: $\rho_\mu$ is the part anticipated from prior observations, and $h_\mu$ is the random component.
(Figure: panels (a)-(d), as above.)

A semantically rich decomposition of a single measurement, continued:
$b_\mu$: shared between the past and the current observation, but its relevance stops there (stationary process only); equivalently, the part of the present relevant for predicting the future.
$q_\mu$: anticipated by the past, present currently, and plays a role in future behavior. Can be negative!
$r_\mu$: ephemeral, exists only for the present.
(Figure: panels (a)-(d), as above.)

A semantically rich decomposition of a single measurement, continued: $w_\mu$ is the structural information; $r_\mu$ is ephemeral, existing only for the present.
(Figure: panels (a)-(d), as above.)

Settings. Before: predict the present from the past,
$$h_\mu = H[X_0 \mid X_{:0}].$$
(Timeline: past $X_{:0} = \cdots X_{-3} X_{-2} X_{-1}$, present $X_0$, future $X_{1:} = X_1 X_2 X_3 \cdots$.)

Today, prediction from the omniscient view: the present in the context of both the past and the future.
(Timeline: the same, with both $X_{:0}$ and $X_{1:}$ available.)

Notation. $\ell$-blocks: $X_{t:t+\ell} = X_t X_{t+1} \cdots X_{t+\ell-1}$. Past: $X_{:0} = \cdots X_{-2} X_{-1}$. Present: $X_0$. Future: $X_{1:} = X_1 X_2 \cdots$. For any information measure $F$, write $F(\ell) = F[X_{0:\ell}]$.

Recall the block entropy, $H(\ell) \approx E + h_\mu \ell$, where the excess entropy is $E = I[X_{:0}; X_{0:}]$.
Recall the total correlation:
$$T[X_{0:N}] = \sum_{\substack{A \in \mathcal{P}(N) \\ |A| = 1}} H[X_A] \;-\; H[X_{0:N}].$$
For a stationary time series this gives the block total correlation
$$T(\ell) = \ell\, H[X_0] - H(\ell),$$
with $T(0) = 0$ and $T(1) = 0$. It compares the process's block entropy to that of IID random variables.
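
A small worked example (mine, not from the slides), using the Golden Mean Process with fair-coin branching: the symbol distribution is $\Pr(0) = 1/3$, $\Pr(1) = 2/3$, so $H[X_0] = H(1/3) \approx 0.918$ bits, while the three allowed length-2 blocks 01, 10, 11 are equally likely, so $H(2) = \log_2 3 \approx 1.585$ bits. Hence $T(1) = H[X_0] - H(1) = 0$, and $T(2) = 2 H[X_0] - H(2) \approx 0.25$ bits, which for a stationary process is exactly the adjacent-symbol mutual information $I[X_0; X_1]$.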

Block entropy versus total correlation: $T(\ell)$ is monotone increasing and convex up, with linear asymptote $T(\ell) \approx -E + \rho_\mu\, \ell$.
(Figure: $H(\ell)$ and $T(\ell)$ versus block length $\ell$ [symbols]; $H(\ell)$ has intercept $E$ and $T(\ell)$ has intercept $-E$.)

Block entropy versus total correlation... Note: H(`)+T (`) =`H(1) Plug in asymptotes gives first decomposition: H(1) = h µ + µ µ Anticipated from prior observations q µ w µ H[X] h µ Random component r µ r µ (a) (b) (c) (d) 14

Scaling of other block informations. Define:
Bound information: $B(\ell) = H(\ell) - R(\ell)$, where $R(\ell)$ is the residual block entropy (its rate is $r_\mu$; see the Anatomy reading).
Enigmatic information: $Q(\ell) = T(\ell) - B(\ell)$.
Local exogenous information: $W(\ell) = B(\ell) + T(\ell)$.
Asymptotes:
$$R(\ell) \approx E_R + \ell\, r_\mu, \quad B(\ell) \approx E_B + \ell\, b_\mu, \quad Q(\ell) \approx E_Q + \ell\, q_\mu, \quad W(\ell) \approx E_W + \ell\, w_\mu.$$
(Figure: $R(\ell)$, $B(\ell)$, $Q(\ell)$, $W(\ell)$ versus block length $\ell$ [symbols], with intercepts $E_R$, $E_B$, $E_Q$, $E_W$.)

Scaling of other block informations, continued.
Block entropy: $H(\ell) = B(\ell) + R(\ell) = (E_B + E_R) + \ell\,(b_\mu + r_\mu)$. Thus $h_\mu = b_\mu + r_\mu$ and $E = E_B + E_R$.
Block total correlation: $T(\ell) = B(\ell) + Q(\ell) = (E_B + E_Q) + \ell\,(b_\mu + q_\mu)$. Thus $\rho_\mu = b_\mu + q_\mu$ and $-E = E_B + E_Q$.
Together these give the second decomposition of $H[X]$: $H[X] = q_\mu + 2 b_\mu + r_\mu$.
(Figure: annotated bar decomposition, panels (a)-(d), as above.)

Block local exogenous information: $W(\ell) = B(\ell) + T(\ell) = (E_B - E) + \ell\,(b_\mu + \rho_\mu)$. Thus $w_\mu = b_\mu + \rho_\mu$.
From the definitions, $R(\ell) + W(\ell) = \ell\, H(1)$. Plugging in the asymptotes gives the third decomposition of $H[X]$:
$$H(1) = w_\mu + r_\mu,$$
structural information plus ephemeral information.
(Figure: panels (a)-(d), as above.)
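
A numerical sketch of these rates in action (mine, not from the lecture): finite windows of length L stand in for the semi-infinite past and future, the Golden Mean sampler is an assumed test process, and b_mu, q_mu, w_mu are obtained through the identities just derived, so the three decompositions of H[X_0] reassemble exactly by construction; the interesting output is the numerical size of each piece.

# Sketch, not lecture code: estimate the anatomy rates for a sample process.
# Finite windows of length L approximate the semi-infinite past and future,
# so all values carry finite-L and finite-sample bias.
import random
from collections import Counter
from math import log2

def golden_mean_sample(n, seed=2):
    """Binary Golden Mean Process sample: no two consecutive 0s."""
    rng, seq, prev = random.Random(seed), [], 1
    for _ in range(n):
        prev = 1 if prev == 0 else rng.randint(0, 1)
        seq.append(prev)
    return seq

def joint_entropy(seq, windows, L):
    """Empirical joint entropy (bits) of the symbols in the given half-open
    offset windows, offsets taken relative to the current time t."""
    counts = Counter(
        tuple(x for a, b in windows for x in seq[t + a:t + b])
        for t in range(L, len(seq) - L)
    )
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

L = 3                                   # window length standing in for infinity
seq = golden_mean_sample(200_000)
past, now, fut = (-L, 0), (0, 1), (1, 1 + L)

H0   = joint_entropy(seq, [now], L)                                         # H[X_0]
h_mu = joint_entropy(seq, [past, now], L) - joint_entropy(seq, [past], L)   # H[X_0 | past]
r_mu = (joint_entropy(seq, [past, now, fut], L)
        - joint_entropy(seq, [past, fut], L))                               # H[X_0 | past, future]
b_mu = h_mu - r_mu                      # I[X_0 ; future | past]
rho  = H0 - h_mu                        # I[X_0 ; past]
q_mu = rho - b_mu                       # co-information of past, present, future
w_mu = b_mu + rho                       # local exogenous information

print(f"H[X_0] = {H0:.3f}  h_mu = {h_mu:.3f}  rho_mu = {rho:.3f}")
print(f"r_mu = {r_mu:.3f}  b_mu = {b_mu:.3f}  q_mu = {q_mu:.3f}  w_mu = {w_mu:.3f}")
print(f"decompositions of H[X_0]: {h_mu + rho:.3f}  "
      f"{q_mu + 2 * b_mu + r_mu:.3f}  {w_mu + r_mu:.3f}")
# The three printed sums (predictive, atomic, structural) all reassemble H[X_0].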

Block multivariate mutual information: the co-information $I(\ell) = I[X_0; X_1; \cdots; X_{\ell-1}]$, the information shared by all $\ell$ observations in the block (built from the entropies of all sub-blocks $X_A$, $A \in \mathcal{P}(\ell)$).
Asymptote: does $I(\ell)$ grow like an intercept plus $\ell\, i_\mu$? Observations of a finite-state process tend to independence: if $h_\mu > 0$ then $\lim_{\ell\to\infty} I(\ell) = 0$, so $i_\mu = 0$ and the intercept is $0$.
(Figure: $I(\ell)$ versus block length $\ell$ [symbols].)

Measurement decomposition in the process I-diagram.
(Figure: I-diagram of $H[X_{:0}]$, $H[X_0]$, and $H[X_{1:}]$. The atoms inside $H[X_0]$ are $r_\mu$, $b_\mu$ (appearing twice, by stationarity), and $q_\mu$; the atom shared by past and future outside the present is $\sigma_\mu$.)

Predictive decomposition: $H[X_0] = \rho_\mu + h_\mu$. (I-diagram figure.)

Atomic decomposition: $H[X_0] = q_\mu + 2 b_\mu + r_\mu$. (I-diagram figure.)

Structural decomposition: $H[X_0] = w_\mu + r_\mu$. (I-diagram figure.)

Measurement decomposition in the process I-diagram, continued. What is $\sigma_\mu$? It is not a component of the information in a single observation: it is information shared between the past and the future that is not in the present. Hidden-state information!
(Figure: the same I-diagram, with $\sigma_\mu$ in $H[X_{:0}] \cap H[X_{1:}]$ outside $H[X_0]$.)

Measurement decomposition in the process I-diagram, continued. Decomposition of the excess entropy:
$$E = b_\mu + q_\mu + \sigma_\mu.$$
(I-diagram figure, as above.)
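
A rough numerical illustration of this split (my own sketch; the two sample processes, the window length L, and the function names are assumptions of the example). The Golden Mean Process is Markov in its symbols, so past and future are independent given the present and sigma_mu should come out near zero; the Even Process carries its phase in hidden states, so sigma_mu should be clearly positive.

# Sketch, not lecture code: sigma_mu estimated as E - (b_mu + q_mu) = E - rho_mu,
# with length-L windows approximating the infinite past and future.
import random
from collections import Counter
from math import log2

def golden_mean(n, seed=3):
    """No two consecutive 0s: after a 0 emit 1, else fair coin."""
    rng, seq, prev = random.Random(seed), [], 1
    for _ in range(n):
        prev = 1 if prev == 0 else rng.randint(0, 1)
        seq.append(prev)
    return seq

def even_process(n, seed=3):
    """Runs of 1s always have even length (hidden two-state machine)."""
    rng, seq, state = random.Random(seed), [], "A"
    for _ in range(n):
        if state == "A" and rng.random() < 0.5:
            seq.append(0)                    # stay in A
        elif state == "A":
            seq.append(1); state = "B"       # start a pair of 1s
        else:
            seq.append(1); state = "A"       # finish the pair of 1s
    return seq

def joint_entropy(seq, windows, L):
    """Empirical joint entropy (bits) of symbols in offset windows relative to t."""
    counts = Counter(
        tuple(x for a, b in windows for x in seq[t + a:t + b])
        for t in range(L, len(seq) - L)
    )
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

L = 5
past, now, fut = (-L, 0), (0, 1), (1, 1 + L)
for name, seq in [("Golden Mean", golden_mean(200_000)),
                  ("Even Process", even_process(200_000))]:
    E_est = (joint_entropy(seq, [past], L) + joint_entropy(seq, [now, fut], L)
             - joint_entropy(seq, [past, now, fut], L))      # I[past ; present+future]
    rho   = (joint_entropy(seq, [past], L) + joint_entropy(seq, [now], L)
             - joint_entropy(seq, [past, now], L))            # b_mu + q_mu
    print(f"{name:12s}  E ~ {E_est:.3f}   b_mu+q_mu ~ {rho:.3f}"
          f"   sigma_mu ~ {E_est - rho:.3f}")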

Example processes (state-transition diagrams, edges labeled probability | symbol): the Even Process, the Golden Mean Process, and the Noisy Random Phase Slip Process.
(Figure: the three machines; the Even and Golden Mean Processes have states A and B, the Noisy Random Phase Slip Process has states A through E.)

Example: the parametrized Golden Mean Process with self-loop probability $p$: state A emits 1 with probability $p$ and stays in A, or emits 0 with probability $1-p$ and moves to B; state B emits 1 and returns to A.
(Figure: entropy-rate decomposition [bits/symbol] versus the self-loop probability $p$, showing $h_\mu$, $b_\mu$, and $r_\mu$.)
As $p \to 0$ the process is nearly period 2: randomness affects the phase and so is remembered, and $r_\mu \to 0$.
As $p \to 1$ the process is nearly period 1 (all 1s): randomness disrupts the run of 1s but is not remembered, so $b_\mu \to 0$ and $h_\mu \approx r_\mu$.
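
A sketch that reproduces this sweep numerically (my code, not the lecture's; it assumes the parametrization described above, under which the analytic rate is $h_\mu = H(p)/(2-p)$ since entropy is produced only in state A, visited a fraction $1/(2-p)$ of the time; the window length L = 3 suffices here because the symbol sequence is order-1 Markov):

# Sketch, not lecture code: parametrized Golden Mean sweep over p.
import random
from collections import Counter
from math import log2

def parametrized_golden_mean(p, n, seed=0):
    """State A: emit 1 (prob p, stay) or 0 (prob 1-p, go to B); state B: emit 1, return to A."""
    rng, seq, state = random.Random(seed), [], "A"
    for _ in range(n):
        if state == "A" and rng.random() < p:
            seq.append(1)
        elif state == "A":
            seq.append(0); state = "B"
        else:
            seq.append(1); state = "A"
    return seq

def cond_entropy(seq, L, with_future):
    """H[X_0 | L past symbols (and optionally L future symbols)], in bits."""
    joint, cond = Counter(), Counter()
    for t in range(L, len(seq) - L):
        ctx = tuple(seq[t - L:t]) + (tuple(seq[t + 1:t + 1 + L]) if with_future else ())
        joint[ctx + (seq[t],)] += 1
        cond[ctx] += 1
    total = sum(joint.values())
    Hj = -sum(c / total * log2(c / total) for c in joint.values())
    Hc = -sum(c / total * log2(c / total) for c in cond.values())
    return Hj - Hc

L, n = 3, 200_000
print(" p   h_mu(analytic)  h_mu(est)  r_mu(est)  b_mu(est)")
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    hb = -(p * log2(p) + (1 - p) * log2(1 - p))     # binary entropy H(p)
    h_analytic = hb / (2 - p)
    seq = parametrized_golden_mean(p, n)
    h_est = cond_entropy(seq, L, with_future=False)  # H[X_0 | past]
    r_est = cond_entropy(seq, L, with_future=True)   # H[X_0 | past, future]
    print(f"{p:.1f}      {h_analytic:.3f}         {h_est:.3f}"
          f"      {r_est:.3f}      {h_est - r_est:.3f}")
# As p -> 0 (nearly periodic 0101...) r_mu -> 0 and h_mu ~ b_mu; as p -> 1
# (nearly all 1s) b_mu -> 0 and h_mu ~ r_mu, matching the figure.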

Entropy-rate decomposition: the Logistic Map. (Figure.)

Entropy-rate decomposition: the Tent Map, for which $h_\mu = \log_2 a$. (Figure.)
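
A sketch of how one might check this numerically (my own code; I take $a$ to be the tent-map slope, use the assumed value a = 1.8 with the generating partition at x = 1/2, and pick the sample size and block lengths arbitrarily):

# Sketch, not lecture code: symbolic dynamics of the tent map x -> a*min(x, 1-x).
from collections import Counter
from math import log2

def tent_symbols(a, n, x0=0.2345, burn=1000):
    """Record 0/1 according to x < 1/2, then iterate the tent map."""
    x, out = x0, []
    for i in range(n + burn):
        if i >= burn:
            out.append(0 if x < 0.5 else 1)
        x = a * min(x, 1.0 - x)
    return out
    # (a = 2.0 would collapse to 0 in double precision; keep a < 2 here)

def block_entropy(seq, L):
    counts = Counter(tuple(seq[i:i + L]) for i in range(len(seq) - L + 1))
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

a = 1.8
seq = tent_symbols(a, 200_000)
print(f"log2(a) = {log2(a):.3f}")
Hs = {L: block_entropy(seq, L) for L in range(1, 11)}
for L in range(2, 11):
    print(f"L={L:2d}  H(L) - H(L-1) = {Hs[L] - Hs[L - 1]:.3f}")
# The finite-block estimates approach h_mu = log2(a) ~ 0.848 bits per symbol.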

What is information? It depends on the question!
Uncertainty, surprise, randomness, ...: $H(X)$, $h_\mu$.
Compressibility: the redundancy $R = \log_2 |A| - h_\mu$.
Transmission rate: $I[X; Y]$.
Memory, apparent stored information, ...: $E$.
Synchronization: $T$.
Ephemeral: $r_\mu$.
Bound: $b_\mu$.
...

Analysis pipeline: System → Instrument → Process → Modeller.
(Figure: a dynamical system, the measuring instrument (partition), the resulting symbol sequence ...001011101000..., and the modeller's reconstructed machine with states α, β, γ, δ.)

1. An information source:
   a. Dynamical system: deterministic or stochastic; low-dimensional or high-dimensional (spatial?).
   b. Design the instrument (partition).
2. Information-theoretic analysis:
   a. How much information is produced?
   b. How much information is stored?
   c. How does the observer synchronize?
   Calculate or estimate: $H(L)$, $h_\mu$, $E$, $T$, $\Pr(s^L)$.

Course summary.
Dynamical systems as sources of complexity:
1. Chaotic attractors (state)
2. Basins of attraction (initial conditions)
3. Bifurcation sequences (parameters)
Dynamical systems as information processors:
1. Randomness: the entropy hierarchy: block entropy, entropy convergence, entropy rate, ...
2. Information storage: total predictability, excess entropy, transient information.

Preview of 256B: Physics of Computation. It answers a number of questions 256A posed: What is a model? What are the hidden states and equations of motion? Are these always subjective, depending on the observer, or is there a principled way to model processes and to discover states? What are cause and effect? To what can we apply these ideas? How much can we calculate analytically, how much numerically, and how much from data?

Preview of 256B: Physics of Computation. Punch line: how nature is organized is how nature computes.

Reading for next lecture: CMR articles RURO (Introduction), CAO, and ROIC ... next quarter!