Anatomy of a Bit

Reading for this lecture: CMR articles "Yeung" and "Anatomy".
Information in a single measurement: an alternative rationale for entropy and the entropy rate.
- $k$ measurement outcomes: stored in $\log_{10} k$ decimal digits or $\log_2 k$ bits.
- A series of $n$ measurements: stored in $n \log_2 k$ bits.
- Number $M$ of possible sequences of $n$ measurements with counts $n_1, n_2, \ldots, n_k$:
$$M = \binom{n}{n_1\, n_2\, \cdots\, n_k} = \frac{n!}{n_1!\, n_2!\, \cdots\, n_k!}$$
To specify (code) which particular sequence occurred, store $k \log_2 n + \log_2 M + \log_2 n$ bits:
- $k \log_2 n$: maximum number of bits to store the count $n_i$ of each of the $k$ output values.
- $\log_2 M$: number of bits to specify the particular observed sequence within the class of sequences that have counts $n_1, n_2, \ldots, n_k$.
- $\log_2 n$: bits to specify the number of bits in $n$ itself.
Apply Stirling's approximation $n! \approx \sqrt{2\pi n}\,(n/e)^n$ to $M$:
$$\log_2 M \approx -n \sum_{i=1}^{k} \frac{n_i}{n} \log_2 \frac{n_i}{n} = n\,H\!\left[\tfrac{n_1}{n}, \tfrac{n_2}{n}, \ldots, \tfrac{n_k}{n}\right] = n\,H[X]$$
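A minimal numerical check of this approximation (an assumed illustration, not from the lecture): compare the exact $\log_2 M$ of a multinomial coefficient against the Stirling-based bound $n\,H[X]$.

```python
# Sketch: exact log2(M) for M = n!/(n1! ... nk!) versus the Stirling-based
# approximation n * H[n1/n, ..., nk/n]; they agree up to O(log n) terms.
from math import factorial, log2

def log2_multinomial(counts):
    """Exact log2 of n! / (n1! n2! ... nk!), via integer arithmetic."""
    m = factorial(sum(counts))
    for c in counts:
        m //= factorial(c)   # each division is exact
    return log2(m)

def entropy_approximation(counts):
    """The approximation n * H[n1/n, ..., nk/n], in bits."""
    n = sum(counts)
    return -n * sum((c / n) * log2(c / n) for c in counts if c > 0)

counts = [700, 200, 100]              # n = 1000 measurements, k = 3 outcomes
print(log2_multinomial(counts))       # bits to index within the sequence class
print(entropy_approximation(counts))  # ~ n H[X]
```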
Can compress if $H[X] < \log_2 k$.
1-symbol redundancy: $R_1 = \log_2 k - H[X]$.
[Figure: bar diagram splitting $\log_2 k$ into $R_1$ and $H[X]$; panels (a)-(c).]
Typical sources are not IID, therefore use the entropy rate:
$$h_\mu = \lim_{L \to \infty} \frac{H(L)}{L}$$
Intrinsic redundancy (total predictability): $\log_2 k - h_\mu$.
Intrinsic randomness: $h_\mu$.
[Figure: bar diagram splitting $\log_2 k$ into the intrinsic redundancy and $h_\mu$; panels (a)-(c).]
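A minimal estimator sketch (my illustration, with an assumed IID example source, not the lecture's code): estimate $H(L)$ from empirical block counts and watch $H(L)/L$ approach $h_\mu$.

```python
# Sketch: empirical block entropy H(L) and the rate estimate H(L)/L.
from collections import Counter
from math import log2
import random

def block_entropy(seq, L):
    """Shannon entropy (bits) of the empirical length-L block distribution."""
    counts = Counter(tuple(seq[i:i + L]) for i in range(len(seq) - L + 1))
    n = sum(counts.values())
    return -sum((c / n) * log2(c / n) for c in counts.values())

random.seed(0)
seq = [1 if random.random() < 0.3 else 0 for _ in range(100_000)]  # biased coin
for L in range(1, 8):
    print(L, block_entropy(seq, L) / L)   # -> H[X] = 0.881... bits, since IID
```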
In addition, measurements occur in the context of past and future. A semantically rich decomposition of a single measurement:
[Figure: bar diagrams decomposing $H[X]$ into $\rho_\mu$, $h_\mu$, $b_\mu$, $q_\mu$, $r_\mu$, and $w_\mu$; panels (a)-(d).]
A semantically rich decomposition of a single measurement...
- $\rho_\mu$: anticipated from prior observations.
- $h_\mu$: random component.
[Figure: panels (a)-(d), highlighting the split $H[X] = \rho_\mu + h_\mu$.]
A semantically rich decomposition of a single measurement...
- $r_\mu$: ephemeral; exists only for the present.
- $b_\mu$: relevant for predicting the future. Equals the information shared between the past and the current observation whose relevance stops there (stationary process only).
- $q_\mu$: anticipated by the past, present currently, and plays a role in future behavior. Can be negative!
[Figure: panels (a)-(d), annotating $r_\mu$, $b_\mu$, and $q_\mu$.]
A semantically rich decomposition of a single measurement...
- $w_\mu$: structural information.
- $r_\mu$: ephemeral; exists only for the present.
[Figure: panels (a)-(d), highlighting the split $H[X] = w_\mu + r_\mu$.]
Settings:
Before: predict the present from the past, $h_\mu = H[X_0 \mid X_{:0}]$.
[Timeline: $\ldots X_{-3} X_{-2} X_{-1}\, X_0\, X_1 X_2 X_3 \ldots$ with past $X_{:0}$, present $X_0$, and future $X_{1:}$ marked.]
Today, prediction from the omniscient view: the present in the context of the past and the future.
[Timeline: the same sequence, with $X_0$ viewed jointly with both $X_{:0}$ and $X_{1:}$.]
Notation:
- $\ell$-blocks: $X_{t:t+\ell} = X_t X_{t+1} \ldots X_{t+\ell-1}$
- Past: $X_{:0} = \ldots X_{-2} X_{-1}$
- Present: $X_0$
- Future: $X_{1:} = X_1 X_2 \ldots$
- Any information measure $F$: $F(\ell) = F[X_{0:\ell}]$
Recall block entropy: $H(\ell) \approx E + h_\mu \ell$, where $E = I[X_{:0}; X_{0:}]$ is the excess entropy.
Recall total correlation:
$$T[X_{0:N}] = \sum_{\substack{A \in \mathcal{P}(N) \\ |A| = 1}} H[X_A] - H[X_{0:N}]$$
For a stationary time series, get the block total correlation:
$$T(\ell) = \ell\,H[X_0] - H(\ell), \qquad T(0) = T(1) = 0$$
Compares the process's block entropy to the case of IID random variables.
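A small sketch (assumed illustration; the Golden Mean generator is my stand-in example source) of computing $T(\ell) = \ell\,H[X_0] - H(\ell)$ from empirical block counts.

```python
# Sketch: block total correlation T(L) = L*H(1) - H(L); it is zero when the
# L-blocks look IID and grows where symbols are correlated.
from collections import Counter
from math import log2
import random

def block_entropy(seq, L):
    counts = Counter(tuple(seq[i:i + L]) for i in range(len(seq) - L + 1))
    n = sum(counts.values())
    return -sum((c / n) * log2(c / n) for c in counts.values())

def total_correlation(seq, L):
    return L * block_entropy(seq, 1) - block_entropy(seq, L)

# Example source: Golden Mean process (a 0 is always followed by a 1).
random.seed(0)
seq, prev = [], 1
for _ in range(200_000):
    prev = 1 if prev == 0 else random.randint(0, 1)
    seq.append(prev)
for L in range(1, 7):
    print(L, total_correlation(seq, L))   # T(1) = 0, then near-linear growth
```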
Block entropy versus total correlation...
$T(\ell)$ is monotone increasing and convex up, with linear asymptote $T(\ell) \approx -E + \rho_\mu \ell$.
[Figure: $H(\ell)$ and $T(\ell)$ versus block length $\ell$ (symbols), with intercepts at $E$ and $-E$.]
Block entropy versus total correlation...
Note: $H(\ell) + T(\ell) = \ell\,H(1)$.
Plugging in the asymptotes gives the first decomposition: $H(1) = h_\mu + \rho_\mu$.
[Figure: decomposition bar diagram, panels (a)-(d): $\rho_\mu$ anticipated from prior observations, $h_\mu$ the random component.]
Scaling of other block informations (where $R(\ell)$ is the block ephemeral, or residual, information):
- Bound information: $B(\ell) = H(\ell) - R(\ell)$
- Enigmatic information: $Q(\ell) = T(\ell) - B(\ell)$
- Local exogenous information: $W(\ell) = B(\ell) + T(\ell)$
Asymptotes:
$$R(\ell) \approx E_R + \ell\,r_\mu, \quad B(\ell) \approx E_B + \ell\,b_\mu, \quad Q(\ell) \approx E_Q + \ell\,q_\mu, \quad W(\ell) \approx E_W + \ell\,w_\mu$$
[Figure: $R(\ell)$, $B(\ell)$, $Q(\ell)$, $W(\ell)$ versus block length $\ell$ (symbols), with intercepts $E_R$, $E_B$, $E_Q$, $E_W$.]
Scaling of other block informations:
Block entropy: $H(\ell) = B(\ell) + R(\ell) = (E_B + E_R) + \ell\,(b_\mu + r_\mu)$.
Thus, $h_\mu = b_\mu + r_\mu$ and $E = E_B + E_R$.
Block total correlation: $T(\ell) = B(\ell) + Q(\ell) = (E_B + E_Q) + \ell\,(b_\mu + q_\mu)$.
Thus, $\rho_\mu = b_\mu + q_\mu$ and $-E = E_B + E_Q$.
Gives the second decomposition of $H[X]$.
[Figure: decomposition bar diagram, panels (a)-(d), annotating $b_\mu$, $q_\mu$, and $r_\mu$ as before.]
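A rough finite-window estimator (my sketch; using $L$ symbols on each side as a stand-in for the infinite past and future is an assumption): $r_\mu \approx H[\text{window}] - H[\text{window without its middle symbol}]$, then $b_\mu = h_\mu - r_\mu$.

```python
# Sketch: finite-window estimates of r_mu and b_mu = h_mu - r_mu.
from collections import Counter
from math import log2
import random

def entropy(counter):
    n = sum(counter.values())
    return -sum((c / n) * log2(c / n) for c in counter.values())

def block_counts(seq, L):
    return Counter(tuple(seq[i:i + L]) for i in range(len(seq) - L + 1))

def r_mu_estimate(seq, L):
    """r_mu ~ H[window] - H[window with its middle symbol removed]."""
    full = block_counts(seq, 2 * L + 1)
    punctured = Counter()
    for block, c in full.items():
        punctured[block[:L] + block[L + 1:]] += c
    return entropy(full) - entropy(punctured)

def h_mu_estimate(seq, L):
    """h_mu ~ H(L+1) - H(L)."""
    return entropy(block_counts(seq, L + 1)) - entropy(block_counts(seq, L))

# Example source: Golden Mean process (a 0 is always followed by a 1).
random.seed(0)
seq, prev = [], 1
for _ in range(500_000):
    prev = 1 if prev == 0 else random.randint(0, 1)
    seq.append(prev)
h, r = h_mu_estimate(seq, 4), r_mu_estimate(seq, 4)
print("h_mu ~", h, "  r_mu ~", r, "  b_mu ~", h - r)
```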
Scaling of other block informations:
Block local exogenous information: $W(\ell) = B(\ell) + T(\ell) = (E_B - E) + \ell\,(b_\mu + \rho_\mu)$.
Thus, $w_\mu = b_\mu + \rho_\mu$.
From the definitions: $R(\ell) + W(\ell) = \ell\,H(1)$.
Plugging in the asymptotes gives the third decomposition of $H[X]$: $H(1) = w_\mu + r_\mu$.
[Figure: decomposition bar diagram, panels (a)-(d): $w_\mu$ structural information, $r_\mu$ ephemeral.]
Block multivariate mutual information:
$$I(\ell) = H(\ell) - \sum_{\substack{A \in \mathcal{P}(\ell) \\ 0 < |A| < \ell}} I[X_A \mid X_{\bar A}]$$
Asymptote: $I(\ell) \approx I + \ell\, i_\mu$?
Observations of a finite-state process tend to independence:
$$h_\mu > 0: \quad \lim_{\ell \to \infty} I(\ell) = 0 \;\Rightarrow\; i_\mu = 0,\; I = 0$$
[Figure: $I(\ell)$ versus block length $\ell$ (symbols).]
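For intuition, a concrete small case (my sketch; I use the inclusion-exclusion form of the three-variable mutual information, assumed here to match the block definition above for $\ell = 3$): $I[X_0; X_1; X_2]$ from empirical block entropies.

```python
# Sketch: l = 3 multivariate mutual information via inclusion-exclusion.
from collections import Counter
from math import log2
import random

def H(seq, idxs, W=3):
    """Entropy (bits) of the symbols at positions idxs inside W-windows."""
    counts = Counter(tuple(seq[i + j] for j in idxs)
                     for i in range(len(seq) - W + 1))
    n = sum(counts.values())
    return -sum((c / n) * log2(c / n) for c in counts.values())

def co_information3(seq):
    return (H(seq, [0]) + H(seq, [1]) + H(seq, [2])
            - H(seq, [0, 1]) - H(seq, [0, 2]) - H(seq, [1, 2])
            + H(seq, [0, 1, 2]))

random.seed(0)
seq = [random.randint(0, 1) for _ in range(100_000)]
print(co_information3(seq))   # ~ 0 for an IID source
```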
Measurement Decomposition in the Process I-diagram:
[I-diagram: circles $H[X_0]$, $H[X_{:0}]$ (past), and $H[X_{1:}]$ (future), with atoms $r_\mu$, $b_\mu$, $q_\mu$, and $\sigma_\mu$.]
Predictive decomposition:
[I-diagram: $H[X_0] = h_\mu + \rho_\mu$.]
Atomic decomposition:
[I-diagram: the atoms of $H[X_0]$: $r_\mu$, $q_\mu$, and the two $b_\mu$ atoms shared with past and future alone, so $H[X_0] = r_\mu + 2b_\mu + q_\mu$ for a stationary process.]
Structural decomposition:
[I-diagram: $H[X_0] = r_\mu + w_\mu$.]
Measurement Decomposition in the Process I-diagram...
What is $\sigma_\mu$?
- Not a component of the information in a single observation.
- Information shared between past and future, not in the present!
- Hidden state information!
[I-diagram as before, highlighting the elusive information $\sigma_\mu = I[X_{:0}; X_{1:} \mid X_0]$.]
Measurement Decomposition in the Process I-diagram:
Decomposition of the excess entropy:
$$E = b_\mu + q_\mu + \sigma_\mu$$
[I-diagram as before.]
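A finite-window sketch (my illustration; the window length $L$ on each side is an assumed approximation to the infinite past and future) of $\sigma_\mu = I[X_{:0}; X_{1:} \mid X_0]$, checked on the Even Process, where the present symbol alone cannot reveal the hidden phase.

```python
# Sketch: sigma_mu ~ I[past; future | present] with L symbols per side.
from collections import Counter
from math import log2
import random

def H(seq, idxs, W):
    counts = Counter(tuple(seq[i + j] for j in idxs)
                     for i in range(len(seq) - W + 1))
    n = sum(counts.values())
    return -sum((c / n) * log2(c / n) for c in counts.values())

def sigma_mu_estimate(seq, L):
    W = 2 * L + 1
    past = list(range(L))            # X_{-L:0}
    present = [L]                    # X_0
    future = list(range(L + 1, W))   # X_{1:L+1}
    # I[A; B | C] = H[A,C] + H[C,B] - H[C] - H[A,C,B]
    return (H(seq, past + present, W) + H(seq, present + future, W)
            - H(seq, present, W) - H(seq, past + present + future, W))

# Even Process: 1s occur in even-length blocks; past and future share
# phase information that the present symbol misses, so the estimate is > 0.
random.seed(0)
seq, state = [], "A"
for _ in range(500_000):
    if state == "A" and random.random() < 0.5:
        seq.append(0)
    elif state == "A":
        seq.append(1); state = "B"
    else:
        seq.append(1); state = "A"
print(sigma_mu_estimate(seq, 4))
```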
Example Processes:
[Figure: state-transition diagrams for the Even Process (states A, B), the Golden Mean Process (states A, B), and the Noisy Random Phase Slip Process (states A-E), with edges labeled probability | symbol.]
Example: Parametrized Golden Mean Process: state A self-loops on symbol 1 with probability $p$ and emits 0 to state B with probability $1 - p$; B emits 1 back to A.
[Figure: entropy-rate components (bits/symbol) versus self-loop probability $p$, showing $h_\mu$ and $r_\mu$.]
- $p \to 0$ (~period 2): randomness affects the phase and so is remembered: $r_\mu \to 0$.
- $p \to 1$ (~period 1): randomness disrupts the run of 1s, but is not remembered: $b_\mu \to 0$, $h_\mu \approx r_\mu$.
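A simulation sketch (my code; sample length and window sizes are arbitrary choices) sweeping the self-loop probability $p$ and estimating $h_\mu$ and $r_\mu$, which should reproduce the two limits noted above.

```python
# Sketch: parametrized Golden Mean sweep; estimate h_mu and r_mu vs p.
from collections import Counter
from math import log2
import random

def entropy(counter):
    n = sum(counter.values())
    return -sum((c / n) * log2(c / n) for c in counter.values())

def blocks(seq, L):
    return Counter(tuple(seq[i:i + L]) for i in range(len(seq) - L + 1))

def generate(p, n, seed=0):
    rng, seq, state = random.Random(seed), [], "A"
    for _ in range(n):
        if state == "A" and rng.random() < p:
            seq.append(1)                  # self-loop: emit 1, stay in A
        elif state == "A":
            seq.append(0); state = "B"     # emit 0, go to B
        else:
            seq.append(1); state = "A"     # B always emits 1 back to A
    return seq

L = 4
for p in [0.1, 0.3, 0.5, 0.7, 0.9]:
    seq = generate(p, 300_000)
    h = entropy(blocks(seq, L + 1)) - entropy(blocks(seq, L))
    full, punct = blocks(seq, 2 * L + 1), Counter()
    for b, c in full.items():
        punct[b[:L] + b[L + 1:]] += c
    r = entropy(full) - entropy(punct)
    print(p, round(h, 3), round(r, 3))   # r -> 0 as p -> 0 (phase memory)
```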
Entropy-rate Decomposition: Logistic Map
[Figure: entropy-rate decomposition for the logistic map.]
Entropy-rate decomposition: Tent Map, where $h_\mu = \log_2 a$.
[Figure: entropy-rate decomposition for the tent map with slope $a$.]
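A numerical check (my sketch; the partition at $x = 1/2$, the slope $a = 1.8$, the initial condition, and the block length are assumed choices) that the tent map's symbolic entropy rate approaches $h_\mu = \log_2 a$.

```python
# Sketch: tent map symbolic dynamics; estimated h_mu versus log2(a).
from collections import Counter
from math import log2

def tent_symbols(a, n, x0=0.3):
    x, out = x0, []
    for _ in range(n):
        out.append(0 if x < 0.5 else 1)   # instrument: binary partition
        x = a * x if x < 0.5 else a * (1.0 - x)
    return out

def block_entropy(seq, L):
    counts = Counter(tuple(seq[i:i + L]) for i in range(len(seq) - L + 1))
    n = sum(counts.values())
    return -sum((c / n) * log2(c / n) for c in counts.values())

a, L = 1.8, 8
seq = tent_symbols(a, 200_000)
print(block_entropy(seq, L + 1) - block_entropy(seq, L))  # estimated h_mu
print(log2(a))                                            # theory
```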
What is information? Depends on the question!
- Uncertainty, surprise, randomness, ...: $H(X)$, $h_\mu$
- Compressibility: $R = \log_2 |\mathcal{A}| - h_\mu$
- Transmission rate: $I[X; Y]$
- Memory, apparent stored information, ...: $E$
- Synchronization: $T$
- Ephemeral: $r_\mu$
- Bound: $b_\mu$
- ...
Analysis Pipeline:
[Figure: System → Instrument → Process → Modeller; a chaotic system is measured through a partition, producing the symbol sequence ...001011101000..., from which the modeller infers state machines.]
1. An information source:
   a. Dynamical system: deterministic or stochastic; low-dimensional or high-dimensional (spatial?)
   b. Design instrument (partition).
2. Information-theoretic analysis (see the sketch after this list):
   a. How much information produced?
   b. How much stored information?
   c. How does the observer synchronize?
   Calculate or estimate $H(L)$, $h_\mu$, $E$, $T$ from $\Pr(s^L)$.
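A compact end-to-end sketch of this pipeline (my illustration; the logistic map at $r = 3.9$, the binary partition at $x = 1/2$, and the block length are assumed choices): simulate a system, apply an instrument, then estimate $H(L)$, $h_\mu$, and $E$.

```python
# Sketch: system -> instrument -> process -> information-theoretic analysis.
from collections import Counter
from math import log2

def block_entropy(seq, L):
    counts = Counter(tuple(seq[i:i + L]) for i in range(len(seq) - L + 1))
    n = sum(counts.values())
    return -sum((c / n) * log2(c / n) for c in counts.values())

# 1a. System: logistic map x -> r x (1 - x).
r, x, seq = 3.9, 0.4, []
for _ in range(300_000):
    x = r * x * (1.0 - x)
    seq.append(0 if x < 0.5 else 1)   # 1b. Instrument: binary partition.

# 2. Analysis: information produced and apparently stored.
L = 10
H_L, H_L1 = block_entropy(seq, L), block_entropy(seq, L + 1)
h_mu = H_L1 - H_L                     # 2a. production rate h_mu
E = H_L - L * h_mu                    # 2b. stored information (excess entropy)
print("h_mu ~", h_mu, "  E ~", E)
```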
Course summary:
Dynamical systems as sources of complexity:
1. Chaotic attractors (state)
2. Basins of attraction (initial conditions)
3. Bifurcation sequences (parameters)
Dynamical systems as information processors:
1. Randomness: entropy hierarchy: block entropy, entropy convergence, entropy rate, ...
2. Information storage:
   a. Total predictability
   b. Excess entropy
   c. Transient information
Preview of 256B: Physics of Computation
Answers a number of questions 256A posed:
- What is a model? What are the hidden states and equations of motion?
- Are these always subjective, depending on the observer? Or is there a principled way to model processes? To discover states?
- What are cause and effect?
- To what can we apply these ideas?
- How much can we calculate analytically? How much numerically? How much from data?
Preview of 256B: Physics of Computation
Punch line: how nature is organized is how nature computes.
Reading for next lecture: CMR articles RURO (Introduction), CAO, and ROIC... next quarter!