Complex Systems Methods 3. Statistical complexity of temporal sequences


Eckehard Olbrich, MPI MiS Leipzig
Potsdam, WS 2007/08

Overview

1. Summary: Kolmogorov sufficient statistic
2. Intuitive notions of complexity
3. Statistical complexity
   - Entropy convergence and excess entropy
   - Predictive information
4. Entropy estimation
   - Entropy of a discrete random variable: finite sample corrections

Summary: Kolmogorov sufficient statistic

Kolmogorov complexity:
\[ K_U(x \mid l(x)) = \min_{p:\, U(p,\, l(x)) = x} l(p) \]
Kolmogorov structure function (x^n denotes a string of length n):
\[ K_k(x^n \mid n) = \min_{\substack{p:\ l(p) \le k \\ U(p, n) = S \ni x^n \\ S \subseteq \{0,1\}^n}} \log |S| \]
Kolmogorov sufficient statistic: the least k such that
\[ K_k(x^n \mid n) + k \le K(x^n \mid n) + c. \]
Regularities in x are described by describing the set S. Given S, the string x is random; algorithmic complexity thus quantifies randomness. But intuitively a complexity measure should quantify structure, not randomness. It should therefore be related to an ensemble and not to a single object.

Dynamical systems

- all continuous: partial differential equations (PDEs)
- discretized space: coupled ordinary differential equations (ODEs)
- discretized time: coupled map lattices (CML)
- discretized state: cellular automata (CA)

All of these dynamical systems can be either deterministic or stochastic.
Digital computer: a finite state automaton, i.e. a finite number of discrete states.

Intuitive notions of complexity: Stephen Wolfram's classification of cellular automata

Elementary cellular automata: binary states {0, 1}, nearest-neighbour interaction. The rules can be coded by the numbers 0, ..., 255, e.g. Rule 30:

neighbourhood: 111 110 101 100 011 010 001 000
new state:       0   0   0   1   1   1   1   0

Class 1: Evolution leads to a homogeneous state.
Class 2: Evolution leads to a set of separated, simple stable or periodic structures.
Class 3: Evolution leads to a chaotic pattern.
Class 4: Evolution leads to complex localized structures, sometimes long-lived.
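To make the rule coding concrete, here is a minimal simulation sketch (an addition, not part of the slides). It assumes binary states and periodic boundary conditions, and reads bit k of the rule number as the new state of a cell whose three-cell neighbourhood encodes the integer k, exactly as in the table above.

```python
import numpy as np

def eca_step(state, rule):
    """One update of an elementary cellular automaton with periodic boundaries.

    Bit k of the rule number (0..255) is the new state of a cell whose
    neighbourhood (left, centre, right) encodes the integer k.
    """
    left = np.roll(state, 1)
    right = np.roll(state, -1)
    neighbourhood = 4 * left + 2 * state + right   # values 0..7
    rule_table = (rule >> np.arange(8)) & 1        # bit k -> output for neighbourhood k
    return rule_table[neighbourhood]

# Example: Rule 30 (class 3) started from a single seeded cell.
state = np.zeros(201, dtype=int)
state[100] = 1
history = [state]
for _ in range(400):
    state = eca_step(state, 30)
    history.append(state)
# np.array(history) is the space-time diagram of the kind shown on the next slides.
```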

Intuitive notions of complexity: Stephen Wolfram's classification of cellular automata

[Three space-time plots of elementary cellular automata illustrating class 2, class 4 and class 3 behaviour.]

Intuitive notions of complexity: Stephen Wolfram's classification of cellular automata

Propagation of a single perturbation.
[Space-time difference plots for rule 14 (class 2, ordered), rule 110 (class 4, complex) and rule 30 (class 3, chaotic).]

Statistical complexity: excess entropy

K(x | l(x)): the minimal length of a program that produces exactly the string x. The most complex strings are algorithmically random.
Kolmogorov sufficient statistic: a program that describes all regularities in x and computes the set S, which contains all strings with the same regularities as x but which are otherwise random.
Algorithmic complexity can thus be divided into two parts: regularities and randomness.
The same idea applies to a time series (an infinite sequence) equipped with a stationary distribution:
- The randomness per symbol is given by the entropy rate
\[ h = \lim_{n \to \infty} h_n, \qquad h_n = H(X_0 \mid X_{-1}, \ldots, X_{-n}). \]
- The regularities are quantified by the excess entropy (Crutchfield) or effective measure complexity (Grassberger)
\[ E = \lim_{n \to \infty} E_n \quad \text{with} \quad E_n = H_n - n\, h_n \quad \text{and} \quad H_n = H(X_1, \ldots, X_n). \]

Entropy convergence and excess entropy

The idea that originally led to this complexity measure was to use the convergence rate of the conditional entropy to quantify the complexity of a time series: fast convergence, short memory, low complexity; slow convergence, long memory, high complexity.
The conditional entropies
\[ h_n = H(X_0 \mid X_{-1}, \ldots, X_{-n}) = H(X_0, X_{-1}, \ldots, X_{-n}) - H(X_{-1}, \ldots, X_{-n}) \]
decrease monotonically. Their decay is quantified by the conditional mutual information
\[ \delta h_n := h_{n-1} - h_n = MI(X_0 : X_{-n} \mid X_{-1}, \ldots, X_{-n+1}). \]

Entropy convergence and excess entropy

Excess entropy:
\[
\begin{aligned}
E_N &= H(X_1, \ldots, X_N) - N h_N \\
    &= \sum_{n=0}^{N-1} (h_n - h_N) && \text{using } h_0 = H(X_1) \\
    &= \sum_{n=0}^{N-1} \sum_{k=n+1}^{N} \delta h_k = \sum_{k=1}^{N} k\, \delta h_k && \text{using } h_{n-1} = \delta h_n + h_n
\end{aligned}
\]
The slower the convergence of the entropy, the larger the excess entropy. If the sequence is Markov of order m, then δh_n = 0 for n > m; thus E = E_m.
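As an added illustration (not from the slides), the following sketch estimates block entropies and the finite-N excess entropy E_N = H_N − N h_N directly from a long symbol sequence with plug-in probabilities; the equivalent form Σ_k k δh_k above gives the same numbers. The transition probabilities of the example chain are arbitrary, and the finite-sample bias discussed at the end of this lecture is ignored here.

```python
import numpy as np
from collections import Counter

def block_entropy(seq, n):
    """Plug-in estimate of the block entropy H_n = H(X_1,...,X_n) in bits."""
    counts = Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
    p = np.array(list(counts.values()), dtype=float)
    p /= p.sum()
    return -np.sum(p * np.log2(p))

def excess_entropy(seq, N):
    """E_N = H_N - N * h_N with h_N = H(X_0 | X_{-1},...,X_{-N}) = H_{N+1} - H_N."""
    H = [block_entropy(seq, n) for n in range(1, N + 2)]   # H[0]=H_1, ..., H[N]=H_{N+1}
    h_N = H[N] - H[N - 1]
    return H[N - 1] - N * h_N

# Example: a binary first-order Markov chain (illustrative transition probabilities);
# for such a chain E_N should be roughly constant for N >= 1.
rng = np.random.default_rng(0)
seq, x = [], 0
for _ in range(100_000):
    x = int(rng.choice(2, p=[0.9, 0.1] if x == 0 else [0.5, 0.5]))
    seq.append(x)
print([round(excess_entropy(seq, N), 3) for N in (1, 2, 3)])
```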

Predictive information

Bialek et al. (2001) proposed to quantify the complexity of a time series by the amount of information that the past provides about the future,
\[ I_{\mathrm{pred}} = MI(\text{past} : \text{future}) = \lim_{n_p, n_f \to \infty} MI(X_{-n_p}, \ldots, X_0 : X_1, \ldots, X_{n_f}), \]
and called this quantity the predictive information. The predictive information is equal to the excess entropy if the corresponding limits exist:
\[ I_{\mathrm{pred}} = E. \]

Predictive information

\[
\begin{aligned}
I_{\mathrm{pred}} &= \lim_{n_p, n_f \to \infty} MI(X_{-n_p}, \ldots, X_{-1}, X_0 : X_1, X_2, \ldots, X_{n_f}) \\
&= \lim_{n_p, n_f \to \infty} \Big\{ H(X_1, \ldots, X_{n_f}) + H(X_{-n_p}, \ldots, X_0) - H(X_{-n_p}, \ldots, X_0, \ldots, X_{n_f}) \Big\} \\
&= \lim_{n_p, n_f \to \infty} \Big\{ E_{n_f} + n_f h_{n_f} + E_{n_p+1} + (n_p + 1) h_{n_p+1} - E_{n_p+n_f+1} - (n_p + n_f + 1) h_{n_p+n_f+1} \Big\} \\
&= E
\end{aligned}
\]
The excess entropy measures the amount of information that is available from the past for predicting the time series.

Causal states

Causal equivalence: two semi-infinite past sequences \( \bar{x}_0 = (x_0, x_{-1}, \ldots) \) and \( \bar{x}'_0 = (x'_0, x'_{-1}, \ldots) \) are causally equivalent if they predict the same future, i.e.
\[ p(x_1 \mid \bar{x}_0) = p(x_1 \mid \bar{x}'_0). \]
The equivalence classes are called causal states. This notion was introduced by James P. Crutchfield et al.; they called the transition graph between the causal states an ε-machine. In general it appears to be a subclass of hidden Markov models.
Crutchfield and Young (1989) defined the statistical complexity C_µ as the entropy of the stationary distribution over the causal states. In general C_µ ≥ E. Grassberger (1986) called the corresponding quantity "set complexity" in the context of regular languages.
In general: the excess entropy provides a lower bound on the amount of information necessary for an optimal prediction.

Excess entropy: some examples

- Sequence with period n: h_m = 0 for m ≥ n, in particular h = 0. The block entropy H_m = log n remains constant for m ≥ n, thus E = log n. All periodic sequences with the same period length have the same complexity according to this measure. (Alternative: the transient information introduced by Crutchfield and Feldman.)
- Markov chain: h = h_1 and E = δh_1 = h_0 − h_1 = MI(X_{n+1} : X_n).
- Chaotic maps: usually exponential decay of the h_n, hence finite E.
- Feigenbaum point: h = 0, but h_n ∼ 1/n, thus E diverges.
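As a worked illustration of the Markov chain case (an added sketch; the transition matrix below is only an example), E = MI(X_{n+1} : X_n) follows directly from the transition matrix and its stationary distribution:

```python
import numpy as np

def excess_entropy_markov(P):
    """E = MI(X_{n+1} : X_n) = H(X_n) - H(X_{n+1} | X_n) for a stationary
    Markov chain with transition matrix P (rows sum to 1); entropies in bits."""
    # Stationary distribution: left eigenvector of P for eigenvalue 1.
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1))])
    pi /= pi.sum()

    def H(p):
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    h0 = H(pi)                                         # H(X_n)
    h1 = np.sum(pi * np.array([H(row) for row in P]))  # H(X_{n+1} | X_n)
    return h0 - h1                                     # E = delta h_1

P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
print(excess_entropy_markov(P))
```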

Complexity for the period-doubling route to chaos

[Figure from Crutchfield and Young (1990), "Computation at the onset of chaos".]

Excess entropy and attractor dimension

Deterministic system with continuous state observables: the block entropy H_m(ε) = H(X_1, ..., X_m; ε) scales with the resolution ε as
\[ m < D: \quad H_m(\epsilon) = H^{c}_m - m \log\epsilon + O(\epsilon) \]
\[ m > D: \quad H_m(\epsilon) = \mathrm{const} - D \log\epsilon + O(\epsilon) \]
Because h = h_{KS} does not depend on ε, we have E ∼ −D log ε. The better one knows the initial conditions of the system, the better the system can be predicted.

References

Peter Grassberger, Toward a quantitative theory of self-generated complexity, International Journal of Theoretical Physics 25 (1986), 907-938.
James P. Crutchfield and David P. Feldman, Regularities unseen, randomness observed: Levels of entropy convergence, Chaos 13 (2003), 25-54.
William Bialek, Ilya Nemenman and Naftali Tishby, Predictability, complexity, and learning, Neural Computation 13 (2001), 2409-2463.
Remo Badii and Antonio Politi, Complexity: Hierarchical structures and scaling in physics, Cambridge University Press, 1997.

Entropy of a discrete random variable: finite sample corrections

Reference: P. Grassberger, Entropy Estimates from Insufficient Samplings, arXiv:physics/0307138v1.
N data points are randomly and independently distributed over M boxes. The number n_i of points in box i is a random variable with expectation value z_i := E[n_i] = p_i N. Its distribution is binomial,
\[ P(n_i; p_i, N) = \binom{N}{n_i}\, p_i^{n_i} (1 - p_i)^{N - n_i}. \]
For p_i ≪ 1 the n_i can be assumed to be Poisson distributed,
\[ P(n_i; z_i) = \frac{z_i^{n_i}}{n_i!}\, e^{-z_i}. \]

Entropy of a discrete random variable: finite sample corrections

Entropy:
\[ H = -\sum_{i=1}^{M} p_i \ln p_i = \ln N - \frac{1}{N} \sum_{i=1}^{M} z_i \ln z_i \]
Naive estimator:
\[ \hat{H}_{\mathrm{naive}} = \ln N - \frac{1}{N} \sum_{i=1}^{M} n_i \ln n_i \]
In general such an estimator is biased, i.e. ΔH := E[Ĥ] − H ≠ 0; for the naive estimator ΔH_naive < 0.
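A small simulation (an added sketch; the distribution and sample size are arbitrary) makes the negative bias visible by averaging the naive estimator over many independent samples:

```python
import numpy as np

rng = np.random.default_rng(0)

M, N = 50, 100                        # number of boxes, sample size
p = rng.dirichlet(np.ones(M))         # some distribution over the M boxes
H_true = -np.sum(p * np.log(p))       # entropy in nats

def H_naive(counts, N):
    """Naive plug-in estimator  ln N - (1/N) sum_i n_i ln n_i."""
    n = counts[counts > 0]
    return np.log(N) - np.sum(n * np.log(n)) / N

# Averaging over many samples exposes the bias E[H_hat] - H < 0.
estimates = [H_naive(rng.multinomial(N, p), N) for _ in range(2000)]
print(H_true, np.mean(estimates))     # the mean estimate lies below H_true
```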

Entropy of a discrete random variable: finite sample corrections

In the limit of large N and M, each contribution z_i ln z_i is statistically independent and can be estimated as a function of n_i:
\[ \widehat{z_i \ln z_i} = n_i\, \varphi(n_i), \qquad E\big[\widehat{z_i \ln z_i}\big] = \sum_{n_i = 1}^{\infty} n_i\, \varphi(n_i)\, P(n_i; z_i). \]
Implicit assumption: n_i = 0 gives no information about p_i.
Estimator:
\[ \hat{H}_\varphi = \ln N - \frac{1}{N} \sum_{i=1}^{M} n_i\, \varphi(n_i) \]

Finite sample corrections: Grassberger's result 1

Grassberger's result, with the digamma function ψ(x) = d ln Γ(x)/dx:
\[ E[n\, \psi(n)] = z \ln z + z E_1(z), \qquad E_1(x) = \Gamma(0, x) = \int_1^{\infty} \frac{e^{-xt}}{t}\, dt. \]
For large z, z E_1(z) ≈ e^{−z}; neglecting this term gives the estimator
\[ \hat{H}_\psi = \ln N - \frac{1}{N} \sum_{i=1}^{M} n_i\, \psi(n_i). \]
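A direct implementation of Ĥ_ψ (an added sketch, assuming SciPy is available for the digamma function):

```python
import numpy as np
from scipy.special import digamma

def entropy_psi(counts):
    """Grassberger's psi-estimator  H_psi = ln N - (1/N) sum_i n_i psi(n_i)
    for an array of box counts n_i; empty boxes drop out of the sum."""
    n = np.asarray(counts, dtype=float)
    n = n[n > 0]
    N = n.sum()
    return np.log(N) - np.sum(n * digamma(n)) / N

# e.g. np.mean([entropy_psi(rng.multinomial(N, p)) for _ in range(2000)])
# with rng, N, p from the previous sketch should lie closer to H_true.
```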

Finite sample corrections: comparison with other results

Approximations to the digamma function lead to known estimators:
1. ψ(x) ≈ ln x: naive estimator
2. ψ(x) ≈ ln x − 1/(2x): Miller correction
3. ψ(x) < ln x − 1/(2x) < ln x implies Ĥ_ψ > Ĥ_Miller > Ĥ_naive

Grassberger's best estimator:
\[ \hat{H}_G = \ln N - \frac{1}{N} \sum_{i=1}^{M} n_i\, G_{n_i} \quad \text{with} \quad G_n = \psi(n) + (-1)^n \sum_{k=0}^{\infty} \frac{1}{(n + 2k)(n + 2k + 1)}, \]
or equivalently
\[ G_1 = -\gamma - \ln 2, \quad G_2 = 2 - \gamma - \ln 2, \quad G_{2n+1} = G_{2n}, \quad G_{2n+2} = G_{2n} + \frac{2}{2n+1} \ \text{ for } n \ge 1. \]
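The recursion for G_n translates directly into code; this sketch (an addition, assuming integer box counts and using SciPy's digamma for the Euler-Mascheroni constant) implements Ĥ_G:

```python
import numpy as np
from scipy.special import digamma

def G_values(n_max):
    """G_n via the recursion above: G_1 = -gamma - ln 2, G_2 = 2 - gamma - ln 2,
    G_{2n+1} = G_{2n}, G_{2n+2} = G_{2n} + 2/(2n+1)."""
    gamma = -digamma(1.0)                 # Euler-Mascheroni constant
    G = np.zeros(n_max + 1)
    G[1] = -gamma - np.log(2)
    if n_max >= 2:
        G[2] = 2 - gamma - np.log(2)
    for n in range(3, n_max + 1):
        G[n] = G[n - 1] if n % 2 == 1 else G[n - 2] + 2.0 / (n - 1)
    return G

def entropy_G(counts):
    """Grassberger's best estimator  H_G = ln N - (1/N) sum_i n_i G_{n_i}."""
    n = np.asarray(counts, dtype=int)
    n = n[n > 0]
    N = n.sum()
    G = G_values(int(n.max()))
    return np.log(N) - np.sum(n * G[n]) / N
```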

Finite sample corrections: comparison between different corrections

[Plot of φ(n) versus n for the different corrections: Grassberger's G_n, ψ(n) + (−1)^n/(n(n+1)), ψ(n), log(n) − 1/(2n), and log(n).]

Note that ψ(n) + (−1)^n/(n(n+1)) corresponds to an approximation of
\[ G_n = \psi(n) + (-1)^n \sum_{k=0}^{\infty} \frac{1}{(n + 2k)(n + 2k + 1)} \]
that keeps only the k = 0 term of the sum.

Test: entropy of a Gaussian distribution

[Plot of H(ε) + log(ε) versus the resolution ε for the naive estimator (N = 10^4 and N = 10^6), the Miller correction, ψ(n) and ψ(n) + (−1)^n/(n(n+1)), compared with the value 1/2 (1 + log(2π)).]

Differential entropy of a Gaussian distribution:
\[ H^{C}_{\mathrm{Gauss}} = \frac{1}{2}\left(1 + \log(2\pi\sigma^2)\right) = H(\epsilon) + \log(\epsilon) + O(\epsilon) \]
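The test on this slide can be reproduced qualitatively with a short script (an added sketch; sample size and resolutions are arbitrary, and it uses the ψ-corrected estimator from above). It bins standard Gaussian samples at resolution ε and compares H(ε) + log ε with 1/2 (1 + log 2π) ≈ 1.419 nats:

```python
import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(1)

def discretized_entropy(samples, eps):
    """H(eps) of a continuous variable, estimated by binning with width eps and
    applying the psi-corrected estimator  ln N - (1/N) sum_i n_i psi(n_i)."""
    bins = np.floor(samples / eps).astype(int)
    counts = np.bincount(bins - bins.min()).astype(float)
    n = counts[counts > 0]
    N = n.sum()
    return np.log(N) - np.sum(n * digamma(n)) / N

x = rng.normal(size=10_000)                    # sigma = 1
target = 0.5 * (1.0 + np.log(2.0 * np.pi))     # ~ 1.419 nats
for eps in (0.01, 0.1, 1.0):
    print(eps, discretized_entropy(x, eps) + np.log(eps), target)
```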