Complex Systems Methods
3. Statistical complexity of temporal sequences
Eckehard Olbrich, MPI MiS Leipzig
Potsdam, WS 2007/08
Overview
1. Summary: Kolmogorov sufficient statistic
2. Intuitive notions of complexity
3. Statistical complexity
   - Entropy convergence and excess entropy
   - Predictive information
4. Entropy estimation
   - Entropy of a discrete random variable: finite sample corrections
Summary: Kolmogorov sufficient statistic
Kolmogorov complexity:
  K_U(x | l(x)) = \min_{p : U(p, l(x)) = x} l(p)
Kolmogorov structure function (x^n denotes a string of length n):
  K_k(x^n | n) = \min { \log |S| : l(p) \le k, U(p, n) = S, x^n \in S, S \subseteq {0,1}^n }
Kolmogorov sufficient statistic: the least k such that K_k(x^n | n) + k \le K(x^n | n) + c.
The regularities in x are described by describing the set S. Given S, the string x is random.
Algorithmic complexity thus measures randomness. But intuitively a complexity measure should quantify structure, not randomness; it should therefore refer to an ensemble and not to a single object.
Dynamical Systems
  all continuous:      partial differential equations (PDEs)
  discretized space:   coupled ordinary differential equations (ODEs)
  discretized time:    coupled map lattices (CML)
  discretized state:   cellular automata (CA)
  (each step discretizes one more ingredient)
All these dynamical systems can be either deterministic or stochastic.
Digital computer: finite state automaton, i.e. a finite number of discrete states.
Intuitive notions of complexity: Stephen Wolfram's classification of cellular automata
Elementary cellular automata: binary states {0, 1}, nearest-neighbour interaction. The rules can be coded by the numbers 0, ..., 255, e.g. Rule 30:
  neighbourhood: 111 110 101 100 011 010 001 000
  new state:       0   0   0   1   1   1   1   0
Class 1: Evolution leads to a homogeneous state.
Class 2: Evolution leads to a set of separated simple stable or periodic structures.
Class 3: Evolution leads to a chaotic pattern.
Class 4: Evolution leads to complex localized structures, sometimes long-lived.
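A minimal simulation sketch (not from the slides; the rule number, lattice width, number of steps and initial condition are arbitrary example choices) showing how an elementary CA is run from its Wolfram rule number:

```python
import numpy as np

def eca_step(state, rule):
    """One synchronous update of an elementary CA with periodic boundaries."""
    # Bit k of `rule` is the new cell state for the neighbourhood with value k = 4*left + 2*center + right.
    table = np.array([(rule >> k) & 1 for k in range(8)], dtype=np.uint8)
    left, right = np.roll(state, 1), np.roll(state, -1)
    return table[4 * left + 2 * state + right]

def run_eca(rule=30, width=201, steps=400):
    state = np.zeros(width, dtype=np.uint8)
    state[width // 2] = 1                      # single seed cell
    history = [state]
    for _ in range(steps):
        state = eca_step(state, rule)
        history.append(state)
    return np.array(history)                   # rows = time, columns = space

pattern = run_eca(rule=30)                     # Rule 30: class-3 (chaotic) behaviour
```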
Intuitive notions of complexity: Stephen Wolfram's classification of cellular automata
[Figure: three space-time plots of elementary cellular automata (time running downward over 400 steps, 200 cells), illustrating class 2, class 4 and class 3 behaviour.]
Intuitive notions of complexity: Stephen Wolfram's classification of cellular automata
Propagation of a single perturbation.
[Figure: space-time difference plots for rule 14 (class 2, ordered), rule 110 (class 4, complex) and rule 30 (class 3, chaotic).]
Statistical complexity: excess entropy
K(x | l(x)): the minimal length of a program that produces exactly the string x. The most complex strings are algorithmically random.
Kolmogorov sufficient statistic: a program that describes all regularities in x and computes the set S, which contains all strings with the same regularities as x but which are otherwise random. The algorithmic complexity can thus be divided into two parts: regularities and randomness.
The same idea applies to a time series (an infinite sequence) provided with a stationary distribution:
  Randomness per symbol is given by the entropy rate h = \lim_{n \to \infty} h_n with h_n = H(X_0 | X_{-1}, ..., X_{-n}).
  Regularities are quantified by the excess entropy (Crutchfield) or effective measure complexity (Grassberger):
    E = \lim_{n \to \infty} E_n   with   E_n = H_n - n h_n   and   H_n = H(X_1, ..., X_n).
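A rough numerical sketch (my own illustration, not part of the lecture): estimating the block entropies H_n, the conditional entropies h_n and the finite-n excess entropy E_n from a long realization. The two-state Markov chain and its transition probabilities are assumed example choices.

```python
import numpy as np
from collections import Counter

def block_entropy(seq, n):
    """Plug-in estimate of H(X_1, ..., X_n) in nats from overlapping length-n blocks."""
    counts = Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
    p = np.array(list(counts.values()), dtype=float)
    p /= p.sum()
    return -np.sum(p * np.log(p))

def markov_chain(length, p01=0.1, p10=0.4, rng=np.random.default_rng(0)):
    """Binary first-order Markov chain with P(1|0) = p01 and P(0|1) = p10."""
    x = [0]
    for _ in range(length - 1):
        p_one = p01 if x[-1] == 0 else 1.0 - p10
        x.append(int(rng.random() < p_one))
    return x

seq = markov_chain(200_000)
nmax = 8
H = [0.0] + [block_entropy(seq, n) for n in range(1, nmax + 1)]   # H[0] = 0, H[n] = H_n
h = [H[n + 1] - H[n] for n in range(nmax)]                        # h_n = H_{n+1} - H_n
E = [H[n] - n * h[n] for n in range(1, nmax)]                     # E_n = H_n - n*h_n
# For a first-order Markov chain, E_n should already saturate at n = 1 (up to estimation error).
```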
Entropy convergence and excess entropy
The idea that originally led to this complexity measure was to use the convergence rate of the conditional entropy to quantify the complexity of a time series:
  fast convergence -> short memory -> low complexity,
  slow convergence -> long memory -> high complexity.
The conditional entropies
  h_n = H(X_0 | X_{-1}, ..., X_{-n}) = H(X_0, X_{-1}, ..., X_{-n}) - H(X_{-1}, ..., X_{-n})
decrease monotonically. Their decay is quantified by the conditional mutual information
  \delta h_n := h_{n-1} - h_n = MI(X_0 : X_{-n} | X_{-1}, ..., X_{-n+1}).
Entropy convergence and excess entropy
Excess entropy:
  E_N = H(X_1, ..., X_N) - N h_N
      = \sum_{n=0}^{N-1} (h_n - h_N)                      (using h_0 = H(X_1))
      = \sum_{n=0}^{N-1} \sum_{k=n+1}^{N} \delta h_k      (using h_{n-1} = \delta h_n + h_n)
      = \sum_{k=1}^{N} k \delta h_k
The slower the convergence of the entropy, the larger the excess entropy. If the sequence is Markov of order m, then \delta h_n = 0 for n > m and thus E = E_m.
Predictive information
Bialek et al. (2000) proposed to quantify the complexity of a time series by the amount of information which the past tells us about the future,
  I_pred = MI(past : future) = \lim_{n_p, n_f \to \infty} MI(X_{-n_p}, ..., X_0 : X_1, ..., X_{n_f}),
and called this the predictive information.
The predictive information is equal to the excess entropy if the corresponding limits exist: I_pred = E.
Predictive information
  I_pred = \lim_{n_p, n_f \to \infty} MI(X_{-n_p}, ..., X_{-1}, X_0 : X_1, X_2, ..., X_{n_f})
         = \lim_{n_p, n_f \to \infty} { H(X_1, ..., X_{n_f}) + H(X_{-n_p}, ..., X_0) - H(X_{-n_p}, ..., X_0, ..., X_{n_f}) }
         = \lim_{n_p, n_f \to \infty} { E_{n_f} + n_f h_{n_f} + E_{n_p+1} + (n_p + 1) h_{n_p+1} - E_{n_p+n_f+1} - (n_f + n_p + 1) h_{n_f+n_p+1} }
         = E
The excess entropy measures the amount of information which is available from the past for predicting the time series.
Causal states
Causal equivalence: two past sequences (..., x_{-1}, x_0) and (..., x'_{-1}, x'_0) are causally equivalent if they give the same conditional distribution over futures, i.e.
  p(x_1, x_2, ... | ..., x_{-1}, x_0) = p(x_1, x_2, ... | ..., x'_{-1}, x'_0).
The equivalence classes are called causal states. This notion was introduced by James P. Crutchfield et al.; they called the transition graph between the causal states an ε-machine. In general it seems to be a subclass of hidden Markov models.
Crutchfield and Young (1989): statistical complexity C_μ, the entropy of the stationary distribution over the causal states. In general C_μ >= E.
Grassberger (1986) called it the set complexity in the context of regular languages.
In general: the excess entropy provides a lower bound on the amount of information necessary for an optimal prediction.
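A crude reconstruction sketch (my own illustration; the "golden mean" process, the past length L and the tolerance are assumed choices): group finite pasts by their estimated conditional next-symbol distribution. The resulting groups approximate the causal states.

```python
import numpy as np
from collections import defaultdict

def golden_mean(length, p=0.5, rng=np.random.default_rng(1)):
    """Binary process with no two consecutive 1s: after a 1 the next symbol is forced to 0."""
    x = [0]
    for _ in range(length - 1):
        x.append(0 if x[-1] == 1 else int(rng.random() < p))
    return x

def causal_groups(seq, L=3, tol=0.05):
    # Next-symbol statistics for every length-L past that occurs in the sequence.
    counts = defaultdict(lambda: np.zeros(2))
    for i in range(L, len(seq)):
        counts[tuple(seq[i - L:i])][seq[i]] += 1
    # Merge pasts whose conditional distributions p(next | past) agree within `tol`.
    groups = []                                  # list of (representative distribution, member pasts)
    for past, c in counts.items():
        p_next = c / c.sum()
        for rep, members in groups:
            if np.max(np.abs(rep - p_next)) < tol:
                members.append(past)
                break
        else:
            groups.append((p_next, [past]))
    return groups

states = causal_groups(golden_mean(100_000))
print(len(states))   # typically 2: pasts ending in 1 (next symbol forced to 0) vs. pasts ending in 0
```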
Excess entropy: some examples
Sequence with period n: h_m = 0 for m >= n, in particular h = 0. The block entropy H_m = \log n remains constant for m >= n, thus E = \log n. All periodic sequences with the same period have the same complexity according to this measure. (Alternative: the transient information introduced by Crutchfield and Feldman.)
Markov chain: h = h_1 and E = \delta h_1 = h_0 - h_1 = MI(X_{n+1} : X_n).
Chaotic maps: usually exponential decay of h_n, hence finite E.
Feigenbaum point: h = 0, but h_n ~ 1/n, thus E diverges.
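A quick numerical check of the first example (my own sketch; the period-3 sequence is an arbitrary choice): the block entropy saturates at log 3 and the conditional entropy drops to zero, so E = log 3.

```python
import numpy as np
from collections import Counter

def block_entropy(seq, n):
    counts = Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
    p = np.array(list(counts.values()), dtype=float)
    p /= p.sum()
    return -np.sum(p * np.log(p))

seq = [0, 1, 1] * 10_000                           # period-3 sequence
H = [block_entropy(seq, n) for n in range(1, 6)]   # H_1, ..., H_5
h = [H[n + 1] - H[n] for n in range(4)]            # h_1, ..., h_4
print(H)   # saturates at log(3) ~ 1.0986 from n = 2 on
print(h)   # h_1 > 0, h_n = 0 for n >= 2, hence E_n = H_n - n*h_n = log(3)
```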
Complexity for the period doubling route to chaos
[Figure from Crutchfield and Young (1990), "Computation at the onset of chaos".]
Excess entropy and attractor dimension
Deterministic system with continuous state observables: the block entropy H_m(ε) = H(X_1, ..., X_m; ε) scales with the resolution ε as
  m < D:  H_m(ε) = H^c_m - m \log ε + O(ε)
  m > D:  H_m(ε) = const - D \log ε + O(ε)
Because h = h_KS does not depend on ε, we have E ≈ -D \log ε.
The better one knows the initial conditions of the system, the better the system can be predicted.
References
Peter Grassberger, Toward a quantitative theory of self-generated complexity, International Journal of Theoretical Physics 25 (1986), 907-938.
James P. Crutchfield and David P. Feldman, Regularities unseen, randomness observed: Levels of entropy convergence, Chaos 13 (2003), 25-54.
William Bialek, Ilya Nemenman and Naftali Tishby, Predictability, complexity, and learning, Neural Computation 13 (2001), 2409-2463.
Remo Badii and Antonio Politi, Complexity: Hierarchical structures and scaling in physics, Cambridge University Press, 1997.
Entropy of a discrete random variable: finite sample corrections
Reference: P. Grassberger, Entropy Estimates from Insufficient Samplings, arXiv:physics/0307138v1.
N data points randomly and independently distributed over M boxes. The number n_i of points in box i is a random variable with expectation value z_i := E[n_i] = p_i N. Its distribution is binomial,
  P(n_i; p_i, N) = \binom{N}{n_i} p_i^{n_i} (1 - p_i)^{N - n_i}.
For p_i << 1 the n_i can be assumed to be Poisson distributed,
  P(n_i; z_i) = \frac{z_i^{n_i}}{n_i!} e^{-z_i}.
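A small numerical check of the Poisson approximation (my own sketch; N, p and hence z are arbitrary example values):

```python
from scipy.stats import binom, poisson

N, p = 10_000, 3e-4            # z = N*p = 3
z = N * p
for n in range(8):
    print(n, binom.pmf(n, N, p), poisson.pmf(n, z))
# For p << 1 the two probability columns agree closely.
```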
Entropy of a discrete random variable: finite sample corrections
Entropy:
  H = - \sum_{i=1}^{M} p_i \log p_i = \ln N - \frac{1}{N} \sum_{i=1}^{M} z_i \ln z_i
Naive estimator:
  \hat{H}_{naive} = \ln N - \frac{1}{N} \sum_{i=1}^{M} n_i \ln n_i
In general the estimator is biased, i.e. \Delta H := E[\hat{H}] - H \ne 0; for the naive estimator \Delta H_{naive} < 0.
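A sketch of the naive estimator and its negative bias (my own illustration; the numbers of boxes, samples and trials are assumed values, chosen so that the boxes are undersampled):

```python
import numpy as np

def entropy_naive(counts):
    counts = np.asarray([c for c in counts if c > 0], dtype=float)
    N = counts.sum()
    return np.log(N) - np.sum(counts * np.log(counts)) / N

rng = np.random.default_rng(0)
M, N, trials = 100, 200, 500                 # M equiprobable boxes, true entropy ln(M)
estimates = [entropy_naive(np.bincount(rng.integers(0, M, size=N), minlength=M))
             for _ in range(trials)]
print(np.mean(estimates), np.log(M))         # the mean estimate falls clearly below ln(M)
```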
Entropy of a discrete random variable: finite sample corrections
In the limit of large N and M each contribution z_i \ln z_i becomes statistically independent and can be estimated as a function of n_i:
  z_i \ln z_i  ->  \widehat{z_i \ln z_i} = n_i \phi(n_i),
  E[\widehat{z_i \ln z_i}] = \sum_{n_i=1}^{\infty} n_i \phi(n_i) P(n_i; z_i).
Implicit assumption: n_i = 0 gives no information about p_i.
Estimator:
  \hat{H}_\phi = \ln N - \frac{1}{N} \sum_{i=1}^{M} n_i \phi(n_i)
Finite sample corrections: Grassberger's result 1
Grassberger's result:
  E[n \psi(n)] = z \ln z + z E_1(z)
with the digamma function \psi(x) = \frac{d \ln \Gamma(x)}{dx} and the exponential integral E_1(x) = \Gamma(0, x) = \int_1^{\infty} \frac{e^{-xt}}{t} dt.
For large z, z E_1(z) ≈ e^{-z}, so neglecting this term gives the estimator
  \hat{H}_\psi = \ln N - \frac{1}{N} \sum_{i=1}^{M} n_i \psi(n_i).
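A minimal implementation sketch of the psi-based estimator (the helper name and toy counts are my own, not from the paper):

```python
import numpy as np
from scipy.special import digamma

def entropy_psi(counts):
    """H_psi = ln N - (1/N) * sum_i n_i * psi(n_i), summed over occupied boxes only."""
    counts = np.asarray([c for c in counts if c > 0], dtype=float)
    N = counts.sum()
    return np.log(N) - np.sum(counts * digamma(counts)) / N

print(entropy_psi([3, 1, 0, 2, 1, 0, 5]))    # toy box counts
```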
Finite sample corrections: comparison with other results
Approximations of the digamma function lead to known estimators:
1. \psi(x) ≈ \ln x: naive estimator
2. \psi(x) ≈ \ln x - 1/(2x): Miller correction
3. \psi(x) < \ln x - 1/(2x) < \ln x leads to \hat{H}_\psi > \hat{H}_{Miller} > \hat{H}_{naive}
Grassberger's best estimator:
  \hat{H}_G = \ln N - \frac{1}{N} \sum_{i=1}^{M} n_i G_{n_i}
with
  G_n = \psi(n) + (-1)^n \sum_{k=0}^{\infty} \frac{1}{(n + 2k)(n + 2k + 1)}
or, equivalently,
  G_1 = -\gamma - \ln 2,  G_2 = 2 - \gamma - \ln 2,
  G_{2n+1} = G_{2n}  and  G_{2n+2} = G_{2n} + \frac{2}{2n+1}  for n >= 1.
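A sketch of the G_n recursion and the corresponding estimator (my own implementation of the formulas above; the toy counts are an arbitrary example):

```python
import numpy as np

def G_values(nmax):
    """G_1, ..., G_nmax via G_1 = -gamma - ln 2, G_2 = 2 - gamma - ln 2,
    G_{2n+1} = G_{2n} and G_{2n+2} = G_{2n} + 2/(2n+1)."""
    gamma = 0.5772156649015329
    G = np.zeros(nmax + 1)
    G[1] = -gamma - np.log(2.0)
    if nmax >= 2:
        G[2] = 2.0 - gamma - np.log(2.0)
    for n in range(3, nmax + 1):
        G[n] = G[n - 1] if n % 2 == 1 else G[n - 2] + 2.0 / (n - 1)
    return G

def entropy_grassberger(counts):
    counts = np.asarray([c for c in counts if c > 0], dtype=int)
    N = counts.sum()
    G = G_values(int(counts.max()))
    return np.log(N) - np.sum(counts * G[counts]) / N

print(entropy_grassberger([3, 1, 0, 2, 1, 0, 5]))   # toy box counts
```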
Finite sample corrections: comparison between different corrections
[Figure: the correction functions \phi(n) for Grassberger's G_n, \psi(n) + (-1)^n/(n(n+1)), \psi(n), \log(n) - 1/(2n) and \log(n), plotted against n.]
Note that \psi(n) + (-1)^n / (n(n+1)) corresponds to an approximation of
  G_n = \psi(n) + (-1)^n \sum_{k=0}^{\infty} \frac{1}{(n + 2k)(n + 2k + 1)}
in which only the k = 0 term of the sum is kept.
Test: entropy of a Gaussian distribution
[Figure: H(ε) + \log(ε) versus the resolution ε for the naive estimator (N = 10^4 and N = 10^6), the Miller correction, \psi(n) and \psi(n) + (-1)^n/(n(n+1)), compared with the exact value \frac{1}{2}(1 + \log(2\pi)).]
Differential entropy of a Gaussian distribution:
  H_C^{Gauss} = \frac{1}{2} (1 + \log(2\pi\sigma^2)) = H(ε) + \log(ε) + O(ε)
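A sketch of this test (my own re-implementation with assumed parameters: unit Gaussian, N = 10^4 samples, psi-based estimator): bin the samples at resolution ε, estimate the discrete entropy, add log ε and compare with the exact differential entropy.

```python
import numpy as np
from scipy.special import digamma

def entropy_psi(counts):
    counts = np.asarray([c for c in counts if c > 0], dtype=float)
    N = counts.sum()
    return np.log(N) - np.sum(counts * digamma(counts)) / N

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)
print("exact:", 0.5 * (1.0 + np.log(2.0 * np.pi)))           # ~1.4189 nats
for eps in [1.0, 0.3, 0.1, 0.03, 0.01]:
    bins = np.floor(x / eps).astype(int)
    counts = np.bincount(bins - bins.min())                  # shift so that bin indices are >= 0
    print(eps, entropy_psi(counts) + np.log(eps))            # should stay close to the exact value
```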