Sharpening Occam's Razor with Quantum Mechanics


Mile Gu,1 Karoline Wiesner,2 Elisabeth Rieper,1 and Vlatko Vedral3,4

1 Centre for Quantum Technologies, National University of Singapore, Republic of Singapore
2 School of Mathematics, Centre for Complexity Sciences, University of Bristol, Bristol BS8 1TW, United Kingdom
3 Atomic and Laser Physics, Clarendon Laboratory, University of Oxford, Parks Road, Oxford OX1 3PU, United Kingdom
4 Department of Physics, National University of Singapore, Republic of Singapore

arXiv:1102.1994v4 [quant-ph] 9 Aug 2011

Mathematical models are an essential component of quantitative science. They generate predictions about the future based on information available in the present. In the spirit of Occam's razor, simpler is better: should two models make identical predictions, the one that requires less input is preferred. Yet for almost all stochastic processes, even the provably optimal classical models waste information. For each bit of predictive output, they generally require more than a single bit of input. We systematically construct quantum models that break this classical bound. This indicates that the system of minimal entropy that simulates such processes must necessarily feature quantum dynamics, and that many observed phenomena could be significantly simpler than classically possible should quantum effects be involved.

PACS numbers: 02.50.-r, 89.70.-a, 03.67.-a, 02.50.Ey, 03.67.Ac

Occam's razor, the principle that plurality is not to be posited without necessity, is an important heuristic that guides the development of theoretical models in quantitative science. In the words of Isaac Newton: "We are to admit no more causes of natural things than such as are both true and sufficient to explain their appearances." Take, for example, the application of Newton's laws to an apple in free fall. The future trajectory of the apple is entirely determined by a second-order differential equation that requires only its current location and velocity as input. We can certainly construct alternative models that predict identical behavior but demand the apple's color as input, or even postulate that its motion is dictated by a flying spaghetti monster. Such theories, however, are dismissed by Occam's razor, since they demand input information that is either unnecessary or redundant.

Generally, a mathematical model of a system of interest is an algorithmic abstraction of its observable output. Envision the given system encased within a black box, such that we observe only its output. Within a second box resides a computer that executes a model of this system with appropriate input. For the model to be accurate, we expect these boxes to be operationally indistinguishable: their outputs are statistically equivalent, such that no external observer can differentiate which box contains the original system.

There are numerous distinct models for any given system. Consider a system consisting of two binary switches. At each time-step, the system emits a 0 or 1 depending on whether the states of the two switches coincide, and one of the two switches is then chosen at random and flipped. The obvious model that simulates this system keeps track of both switches, and thus requires 2 bits of input. Yet the output is simply a sequence of alternating 0s and 1s, and can thus be modeled knowing only the value of the previous emission. Occam's razor stipulates that this alternative is more efficient and thus superior; it demands only a single bit of input, where the original model required two.

Preference for efficient models of physical phenomena is not mere aesthetics; it also carries significant physical consequence. Consider both aforementioned models embedded within separate black boxes. The first box stores two bits of information, while the second requires only one. Any physical realization of a model that demands C bits of input requires the capacity to store those bits (Fig. 1). The construction of simpler mathematical models for a given process thus lets us simulate that process with reduced information-storage requirements, and knowledge of the provably simplest model of a given observed process allows direct inference of its complexity.

FIG. 1: Any computation is physical. To implement the mathematical model of a particular observable phenomenon, we (a) encode the inputs of the model within a suitable physical system, (b) evolve the system according to some physical laws, and (c) retrieve the predictions of the model by appropriate measurement. Thus, if a model demands that a particular piece of data be input, the physical implementation of that model must have the capacity to store that data.
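Returning to the two-switch example, a minimal simulation sketch (our own, not from the paper; all function names are hypothetical) makes the operational indistinguishability of the two-bit and one-bit models concrete:

```python
import random

def two_switch_box(steps, seed=0):
    """Two bits of internal state: emit whether the switches coincide,
    then flip one switch at random."""
    rng = random.Random(seed)
    s = [rng.randint(0, 1), rng.randint(0, 1)]
    out = []
    for _ in range(steps):
        out.append(int(s[0] == s[1]))   # coincidence of the two switches
        s[rng.randint(0, 1)] ^= 1       # flipping either switch toggles it
    return out

def one_bit_box(steps, seed=0):
    """One bit of internal state: remember only the previous emission."""
    rng = random.Random(seed)
    prev = rng.randint(0, 1)
    out = []
    for _ in range(steps):
        out.append(prev)
        prev ^= 1                       # the output simply alternates
    return out

# Both boxes emit an alternating sequence whose first bit is uniformly
# random; no external observer can tell which box holds the original system.
print(two_switch_box(10))
print(one_bit_box(10))
```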

If the observed statistics of a process cannot be generated from fewer than C bits of input, then whatever the underlying mechanics of the observed process, any system that predicts its future statistics must have entropy of at least C bits. Occam's razor thus has a direct physical interpretation: the optimal simulator of a particular behavior is the one of least internal entropy.

These observations motivate the importance of maximally efficient models: models that generate the desired statistical behavior while requiring minimal input information. In this article we show that, even when such behavior arises from simple stochastic processes, these maximally efficient models are almost always quantum. The system of least internal memory that generates statistically identical simulations demands quantum resources. In fact, unless improvement over the optimal classical model would violate the second law of thermodynamics, a superior quantum model can always be constructed.

We can characterize the observable behavior of any dynamical process by a joint probability distribution $P(\overleftarrow{X}, \overrightarrow{X})$, where $\overleftarrow{X}$ and $\overrightarrow{X}$ are random variables that govern the system's observed behavior in the past and the future, respectively. Each particular realization of the process has a particular past $\overleftarrow{x}$, with probability $P(\overleftarrow{X} = \overleftarrow{x})$. Should there exist a system that simulates such behavior with entropy $C$, then there must exist a model that can generate random variables whose statistics obey $P(\overrightarrow{X} \mid \overleftarrow{X} = \overleftarrow{x})$ when given only $C$ bits of information about $\overleftarrow{x}$. We seek the maximally efficient model, such that $C$ is minimized. Since the past contains exactly $E = I(\overleftarrow{X} : \overrightarrow{X})$ bits (the mutual information between past and future) about the future, the system must have entropy of at least $E$ (this remains true for quantum systems [2]). On the other hand, there appears to be no obvious reason a model should require anything more than exactly these $E$ bits. We say that the resulting model, where $C = E = I(\overleftarrow{X} : \overrightarrow{X})$, is ideal. It turns out that for many systems such models do not exist.

Consider a dynamical system observed at discrete times $t \in \mathbb{Z}$, with possible discrete outcomes $x_t \in \Sigma$ dictated by random variables $X_t$. Such a system can be modeled by a stochastic process [3], where each realization is specified by a sequence of past outcomes $\overleftarrow{x} = \ldots x_{-3} x_{-2} x_{-1}$ and exhibits a particular future $\overrightarrow{x} = x_0 x_1 x_2 \ldots$ with probability $P(\overrightarrow{X} = \overrightarrow{x} \mid \overleftarrow{X} = \overleftarrow{x})$. Here $E = I(\overleftarrow{X} : \overrightarrow{X})$, referred to as the excess entropy [6, 7], is a quantity of relevance in diverse disciplines ranging from spin systems [8] to measures of brain complexity [9].

How can we construct the simplest simulator of such behavior, preferably exhibiting entropy of no more than $E$? The brute-force approach is to create an algorithm that samples from $P(\overrightarrow{X} \mid \overleftarrow{X} = \overleftarrow{x})$ given complete knowledge of $\overleftarrow{x}$. Such a construction accepts $\overleftarrow{x}$ directly as input, resulting in a required entropy of $C = H(\overleftarrow{X})$, where $H(\overleftarrow{X})$ denotes the Shannon entropy of the complete past. This is wasteful. Consider the output statistics resulting from a sequence of coin flips, such that $P(\overleftarrow{X}, \overrightarrow{X})$ is the uniform distribution over all binary strings: $E$ equals 0, and yet $C$ is infinite. It should not require infinite memory to mimic a single coin; better approaches exist.

$\epsilon$-machines are the provably optimal classical solution [4, 5]. They rest on the rationale that, to exhibit the desired future statistics, a system need not distinguish two differing pasts $\overleftarrow{x}$ and $\overleftarrow{x}'$ if their future statistics coincide. This motivates the equivalence relation $\sim$ on the set of all past output histories, such that $\overleftarrow{x} \sim \overleftarrow{x}'$ iff $P(\overrightarrow{X} \mid \overleftarrow{x}) = P(\overrightarrow{X} \mid \overleftarrow{x}')$. To sample from $P(\overrightarrow{X} \mid \overleftarrow{x})$ for a particular $\overleftarrow{x}$, an $\epsilon$-machine need not store $\overleftarrow{x}$, only the equivalence class $\epsilon(\overleftarrow{x}) \equiv \{\overleftarrow{x}' : \overleftarrow{x}' \sim \overleftarrow{x}\}$ to which $\overleftarrow{x}$ belongs. Each equivalence class is referred to as a causal state. For any stochastic process $P(\overleftarrow{X}, \overrightarrow{X})$ with emission alphabet $\Sigma$, we may deduce its causal states $\mathcal{S} = \{S_i\}_{i=1}^{N}$, which form the state space of its corresponding $\epsilon$-machine. At each time step, the machine operates according to a set of transition probabilities $T_{j,k}^{(r)}$: the probability that the machine will output $r \in \Sigma$ and transition to $S_k$, given that it is in state $S_j$. The resulting $\epsilon$-machine, when initially set to state $\epsilon(\overleftarrow{x})$, generates a sequence $\overrightarrow{x}$ according to the probability distribution $P(\overrightarrow{X} \mid \overleftarrow{X} = \overleftarrow{x})$ as it iterates through these transitions. The resulting $\epsilon$-machine thus has internal entropy

$C = H(S) = -\sum_{j} p_j \log p_j \equiv C_\mu$,   (1)

where $S$ is the random variable that governs $S_j = \epsilon(\overleftarrow{x})$ and $p_j$ is the probability that $\epsilon(\overleftarrow{x}) = S_j$.

The provable optimality of $\epsilon$-machines among all classical models motivates $C_\mu$ as an intrinsic property of a given stochastic process, rather than just a property of $\epsilon$-machines. Referred to in the literature as the statistical complexity [5, 10], its interpretation as the minimal amount of information storage required to simulate a given process has been applied to quantify self-organization [11], the onset of chaos [4], and the complexity of protein configuration space [12]. Such interpretations, however, implicitly assume that classical models are optimal. Should a quantum simulator be capable of exhibiting the same output statistics with reduced entropy, this fundamental interpretation of $C_\mu$ would require review.

There is certainly room for improvement. For many stochastic processes $C_\mu$ is strictly greater than $E$ [10]; the $\epsilon$-machine that models such processes is fundamentally irreversible. Even if the entire future output of such an $\epsilon$-machine were observed, we would still remain uncertain which causal state the machine was initialized in. Some of that information has been erased and thus, in principle, need never be stored. In this paper we show that for all such processes quantum processing helps: for any $\epsilon$-machine such that $C_\mu > E$, there exists a quantum $\epsilon$-machine with entropy $C_q$ such that $C_\mu > C_q \geq E$.
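Before turning to the quantum construction, here is an illustrative sketch (ours, not from the paper; base-2 logarithms assumed) of how $C_\mu$ in Eq. (1) follows from an $\epsilon$-machine's transition tensor $T^{(r)}_{j,k}$. The perturbed-coin process analyzed later in the text, whose two causal states track the coin's current face, serves as the example:

```python
import numpy as np

def stationary(T):
    """Stationary distribution over causal states.
    T has shape (|Sigma|, N, N): T[r, j, k] = P(emit r, go to S_k | S_j)."""
    M = T.sum(axis=0)                   # state-to-state transition matrix
    evals, evecs = np.linalg.eig(M.T)
    v = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
    return v / v.sum()

def C_mu(T):
    """Statistical complexity, Eq. (1): Shannon entropy of the causal states."""
    p = stationary(T)
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log2(p)))

# Perturbed coin with flip probability p: causal states S_0, S_1 = coin face.
p = 0.3
T = np.zeros((2, 2, 2))
T[0, 0, 0] = 1 - p   # in S_0: no flip, emit 0, stay in S_0
T[1, 0, 1] = p       # in S_0: flip, emit 1, go to S_1
T[1, 1, 1] = 1 - p   # in S_1: no flip, emit 1, stay in S_1
T[0, 1, 0] = p       # in S_1: flip, emit 0, go to S_0
print(C_mu(T))       # -> 1.0 bit for any 0 < p < 1, p != 0.5
```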

The key intuition for our construction lies in identifying the cause of irreversibility within classical $\epsilon$-machines and addressing it with quantum dynamics. An $\epsilon$-machine distinguishes two different causal states provided they have differing future statistics, but it makes no distinction based on how much these futures differ. Consider two causal states, $S_j$ and $S_k$, that both have the potential to emit output $r$ at the next time-step and transition to some coinciding causal state $S_l$. Should this occur, some of the information required to completely distinguish $S_j$ and $S_k$ has been irreversibly lost. We say that $S_j$ and $S_k$ share non-distinct futures. In fact, this is both necessary and sufficient for $C_\mu > E$.

Theorem 1. Consider a stochastic process $P(\overleftarrow{X}, \overrightarrow{X})$ with excess entropy $E$ and statistical complexity $C_\mu$, and let its corresponding $\epsilon$-machine have transition probabilities $T_{j,k}^{(r)}$. Then $C_\mu > E$ iff there exists a non-zero probability that two different causal states, $S_j$ and $S_k$, will both make a transition to a coinciding causal state $S_l$ upon emission of a coinciding output $r \in \Sigma$, i.e., $T_{j,l}^{(r)}, T_{k,l}^{(r)} \neq 0$.

Proof: See Appendix A.

This result highlights the fundamental limitation of any classical model. In order to generate the desired statistics, a classical model must record each binary property $A$ such that $P(\overrightarrow{X} \mid A = 0) \neq P(\overrightarrow{X} \mid A = 1)$, regardless of how much these distributions overlap. In contrast, quantum models are free of this restriction. A quantum system can store causal states as quantum states that are not mutually orthogonal. The resulting quantum $\epsilon$-machine differentiates causal states sufficiently to generate the correct statistical behavior. Essentially, it saves memory by partially discarding $A$, while retaining enough information to recover the statistical differences between $P(\overrightarrow{X} \mid A = 0)$ and $P(\overrightarrow{X} \mid A = 1)$.

Given an $\epsilon$-machine with causal states $S_j$ and transition probabilities $T_{j,k}^{(r)}$, we define quantum causal states

$|S_j\rangle = \sum_{k=1}^{N} \sum_{r \in \Sigma} \sqrt{T_{j,k}^{(r)}}\, |r\rangle |k\rangle$,   (2)

where $\{|r\rangle\}$ and $\{|k\rangle\}$ form orthogonal bases on Hilbert spaces of dimension $|\Sigma|$ and $N$, respectively. A quantum $\epsilon$-machine accepts a quantum state $|S_j\rangle$ as input in place of $S_j$. Such a system has an internal entropy of

$C_q = -\mathrm{Tr}\, \rho \log \rho$,   (3)

where $\rho = \sum_j p_j |S_j\rangle\langle S_j|$. $C_q$ is strictly less than $C_\mu$ provided not all $|S_j\rangle$ are mutually orthogonal [15].

$|S_j\rangle$ is sufficient for generating accurate future statistical behavior. A simple method is to (i) measure $|S_j\rangle$ in the basis $\{|r\rangle|k\rangle\}$, yielding measurement outcomes $r$ and $k$, and (ii) set $r$ as the output $x_0$ and prepare the quantum state $|S_k\rangle$. Repetition of this process generates a sequence of outputs $x_1, x_2, \ldots$ according to the same probability distribution as the original $\epsilon$-machine, and hence according to $P(\overrightarrow{X} \mid \overleftarrow{x})$. Thus a quantum $\epsilon$-machine initialized in state $|S_j\rangle$ has black-box behavior statistically identical to that of a classical $\epsilon$-machine initialized in state $S_j$. (We note that while the simplicity of the above method makes it easy to understand and amenable to experimental realization, there is room for improvement: the decoding process prepares $|S_k\rangle$ based on the value of $k$, and thus still requires $C_\mu$ bits of memory. However, there exist more sophisticated protocols without this limitation; see Appendix B.)
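The measure-and-reprepare protocol of steps (i) and (ii) is straightforward to simulate numerically. Below is a minimal sketch (our own construction, with hypothetical function names) that builds the quantum causal states of Eq. (2) from a transition tensor, runs the protocol, and checks that the emitted statistics match the classical machine's:

```python
import numpy as np

rng = np.random.default_rng(1)

def quantum_causal_states(T):
    """Eq. (2): |S_j> = sum_{k,r} sqrt(T[r,j,k]) |r>|k>, as flat vectors."""
    R, N, _ = T.shape
    return [np.sqrt(T[:, j, :]).reshape(R * N) for j in range(N)]

def run_protocol(T, j, steps):
    """Steps (i)-(ii): measure |S_j> in the |r>|k> basis, emit r,
    then re-prepare |S_k> and repeat."""
    R, N, _ = T.shape
    states = quantum_causal_states(T)
    outputs = []
    for _ in range(steps):
        probs = states[j] ** 2                   # Born rule in the |r>|k> basis
        idx = rng.choice(R * N, p=probs / probs.sum())
        r, k = divmod(idx, N)                    # flat index -> (r, k)
        outputs.append(r)                        # (ii): r becomes the output...
        j = k                                    # ...and we re-prepare |S_k>
    return outputs

# Perturbed coin with p = 0.3 (transition tensor as in the earlier sketch).
p = 0.3
T = np.zeros((2, 2, 2))
T[0, 0, 0], T[1, 0, 1], T[1, 1, 1], T[0, 1, 0] = 1 - p, p, 1 - p, p
out = run_protocol(T, 0, 100_000)
print(np.mean(np.abs(np.diff(out))))  # fraction of flips; should approach p
```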
The observation that quantum $\epsilon$-machines reproduce classical statistics, combined with Theorem 1, leads to the central result of our paper.

Theorem 2. Consider any stochastic process $P(\overleftarrow{X}, \overrightarrow{X})$ with excess entropy $E$ whose optimal classical simulator has internal entropy $C_\mu > E$. Then we may construct a quantum simulator that exhibits identical statistics with internal entropy $C_q < C_\mu$.

Proof: Let $\{S_j\}$ be the causal states of $P(\overleftarrow{X}, \overrightarrow{X})$. We construct a quantum $\epsilon$-machine with quantum causal states $\{|S_j\rangle\}$ and $C_q$ as defined in Eq. (3). Since $C_\mu > E$, we know from Theorem 1 that there exist two causal states, $S_j$ and $S_k$, that will both make a transition to a coinciding causal state $S_l$ upon emission of a coinciding output $r \in \Sigma$, i.e., $T_{j,l}^{(r)}, T_{k,l}^{(r)} \neq 0$. Consequently $\langle S_j | S_k \rangle \geq \sqrt{T_{j,l}^{(r)} T_{k,l}^{(r)}} > 0$. Thus not all quantum causal states are mutually orthogonal, and hence $C_q < C_\mu$.

Quantum systems can therefore always simulate the statistics of any stochastic process more efficiently than the provably optimal classical model, unless the optimal classical model is already ideal.

We briefly highlight these ideas with a concrete example of a perturbed coin. Consider a process $P(\overleftarrow{X}, \overrightarrow{X})$ realized by a box that contains a single coin. At each time step, the box is perturbed such that the coin flips with probability $0 < p < 1$, and the state of the coin is then observed. This results in a stochastic process where each $x_t \in \{0, 1\}$, governed by random variable $X_t$, represents the result of the observation at time $t$. For any $p \neq 0.5$, this system has two causal states, corresponding to the two possible states of the coin: the set of pasts ending in 0 and the set of pasts ending in 1. We call these $S_0$ and $S_1$. The perturbed coin is its own best classical model, requiring exactly $C_\mu = 1$ bit of information, namely the current state of the coin, to generate correct future statistics.

As $p \to 0.5$, the future statistics of $S_0$ and $S_1$ become increasingly similar; the stronger the perturbation, the less it matters what state the coin was in prior to perturbation. This is reflected by the observation that $E \to 0$ (in fact $E = 1 - H_s(p)$ [8], where $H_s(p) = -p \log p - (1-p) \log(1-p)$ is the Shannon entropy of a biased coin that outputs heads with probability $p$ [13]). Thus only a fraction $E/C_\mu = 1 - H_s(p)$ of every bit stored is useful, which tends to 0 as $p \to 0.5$.
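Anticipating the quantum encoding introduced in the next paragraph, the following sketch (ours, not from the paper; base-2 logarithms assumed) computes the three quantities plotted in Fig. 2 for the perturbed coin: the classical memory $C_\mu = 1$, the excess entropy $E(p) = 1 - H_s(p)$, and the qubit entropy $C_q(p)$ obtained from the non-orthogonal states $|S_0\rangle = \sqrt{1-p}|0\rangle + \sqrt{p}|1\rangle$ and $|S_1\rangle = \sqrt{p}|0\rangle + \sqrt{1-p}|1\rangle$:

```python
import numpy as np

def H_s(p):
    """Shannon entropy (bits) of a biased coin."""
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def C_q(p):
    """Von Neumann entropy of rho = (|S_0><S_0| + |S_1><S_1|)/2."""
    s0 = np.array([np.sqrt(1 - p), np.sqrt(p)])
    s1 = np.array([np.sqrt(p), np.sqrt(1 - p)])
    rho = 0.5 * (np.outer(s0, s0) + np.outer(s1, s1))
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-np.sum(lam * np.log2(lam)))

for p in (0.1, 0.3, 0.45):
    print(f"p={p}: C_mu=1.0, E={1 - H_s(p):.3f}, C_q={C_q(p):.3f}")
# E <= C_q < C_mu for all 0 < p < 1, p != 0.5, with C_q -> 0 as p -> 0.5
```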

Quantum $\epsilon$-machines offer dramatic improvement. We encode the quantum causal states $|S_0\rangle = \sqrt{1-p}\,|0\rangle + \sqrt{p}\,|1\rangle$ and $|S_1\rangle = \sqrt{p}\,|0\rangle + \sqrt{1-p}\,|1\rangle$ within a qubit, which results in entropy $C_q = -\mathrm{Tr}\,\rho \log \rho$, where $\rho = \frac{1}{2}(|S_0\rangle\langle S_0| + |S_1\rangle\langle S_1|)$. The non-orthogonality of $|S_0\rangle$ and $|S_1\rangle$ ensures that this is always less than $C_\mu$ [14]. As $p \to 0.5$, a quantum $\epsilon$-machine requires a negligible amount of memory to generate the same statistics as its classical counterpart (Fig. 2).

FIG. 2: While the excess entropy of the perturbed coin approaches zero as $p \to 0.5$ (c), generating such statistics classically requires an entire bit of memory (a). Encoding the past within a quantum system leads to significant improvement (b). Note, however, that even the quantum protocol still requires more input than $E$ (c). [Plot: memory in bits versus flip probability $p$.]

This improvement is readily apparent when we model a lattice of $K$ independent perturbed coins, which outputs a number $x \in \mathbb{Z}_{2^K}$ representing the state of the lattice after each perturbation. Any classical model must differentiate between $2^K$ equally likely causal states, and thus requires $K$ bits of input. A quantum $\epsilon$-machine reduces this to $K C_q$. For $p > 0.2$, $C_q < 0.5$: the initial condition of two perturbed coins may, in principle, be compressed within a single qubit (on average). For $p > 0.4$, $C_q < 0.1$: one qubit can encode the initial condition of 10 such coins.

This memory reduction has direct physical consequence. Encoding the initial conditions of a quantum $\epsilon$-machine in single photons of frequency $\omega$, for example, leads to an immediate reduction in expected energy cost. A unitary rotation allows us to encode $|S_0\rangle = |0\rangle$ and $|S_1\rangle = 2\sqrt{p(1-p)}\,|0\rangle + (1-2p)|1\rangle$. Such an encoding incurs an average energy cost of $\hbar\omega(1-2p)^2/2$, which tends to 0 as $p \to 0.5$. In comparison, if we require $|S_0\rangle$ and $|S_1\rangle$ to be orthogonal, an energy cost of $\hbar\omega/2$ is unavoidable.

I. DISCUSSION

In this article, we have demonstrated that any stochastic process with no reversible classical model can be further simplified by quantum processing. Such stochastic processes are almost ubiquitous; even the statistics of perturbed coins can be simulated by a quantum system of reduced entropy. In addition, the quantum reconstruction can be remarkably simple. The capacity to make single-photon measurements, for example, allows construction of a quantum $\epsilon$-machine that simulates a perturbed coin with reduced energy expenditure. This opens the potential for experimental validation with present-day technology.

This result has significant implications. Stochastic processes play a ubiquitous role in the modeling of dynamical systems that permeate quantitative science, from climate fluctuations to chemical reaction processes. Classically, the statistical complexity $C_\mu$ is employed as a measure of how much structure a given process exhibits, the rationale being that the optimal simulator of such a process requires at least this much memory. The fact that this memory can be reduced quantum mechanically implies the counterintuitive conclusion that quantizing such simulators can reduce their complexity beyond this classical bound, even if the process they are simulating is purely classical. Many organisms and devices operate based on the ability to predict, and thus react to, their environment; the fact that identical predictions can be made with less memory by exploiting quantum dynamics implies that such systems need not be as complex as once thought.

This improved efficiency has direct physical consequences.
Landauer's principle [17] implies that recording information incurs a fundamental energy cost, since this information must eventually be erased. In addition, we may exploit the non-orthogonality of quantum causal states to directly reduce the cost of encoding the past in certain physical implementations (such as the quantum-optical regime). Thus, the potential for quantum systems to simulate a given statistical output with reduced system entropy lowers the fundamental energy requirements of stochastic computation.

This leads to an open question: is it always possible to find an ideal predictive model? Certainly, Fig. 2 shows that our construction, while superior to any classical alternative, is still not wholly reversible. While this irreversibility may indicate that more efficient, ideal quantum models exist, it is also possible that ideal models remain forbidden within quantum theory. Both cases are interesting. The former would indicate that the notion of stochastic processes hiding information from the present [10] is merely a construct of inefficient classical probabilistic models, while the latter hints at a source of temporal asymmetry within the framework of quantum mechanics: that it is fundamentally impossible to simulate certain observable statistics reversibly.

Acknowledgments. M.G. would like to thank C. Weedbrook, H. Wiseman, M. Hayashi, W. Son and K. Modi for helpful discussions. M.G. and E.R. are supported by the National Research Foundation and Ministry of Education, Singapore. K.W. is funded through EPSRC grant EP/E501214/1. V.V. would like to thank the EPSRC, QIP IRC, the Royal Society, the Wolfson Foundation, the National Research Foundation (Singapore) and the Ministry of Education (Singapore) for financial support.

Appendix A: Proof of Theorem 1

Let the aforementioned $\epsilon$-machine have causal states $\mathcal{S} = \{S_i\}_{i=1}^{N}$ and emission alphabet $\Sigma$. Consider an instance of the $\epsilon$-machine at a particular time-step $t$, and let $S_t$ and $X_t$ be the random variables that respectively govern its causal state and observed output at time $t$, such that the transition probabilities that define the $\epsilon$-machine can be expressed as

$T_{j,k}^{(r)} = P(S_t = S_k, X_t = r \mid S_{t-1} = S_j)$.   (A1)

We say an ordered pair $(S_j \in \mathcal{S}, r \in \Sigma)$ is a valid emission configuration iff $T_{j,k}^{(r)} \neq 0$ for some $S_k \in \mathcal{S}$; that is, it is possible for an $\epsilon$-machine in state $S_j$ to emit $r$ and transit to some $S_k$. Denote the set of all valid emission configurations by $\Omega_E$. Similarly, we say an ordered pair $(S_k \in \mathcal{S}, r \in \Sigma)$ is a valid reception configuration iff $T_{j,k}^{(r)} \neq 0$ for some $S_j \in \mathcal{S}$, and denote the set of all valid reception configurations by $\Omega_R$. We define the transition function $f: \Omega_E \to \Omega_R$ such that $f(S_j, r) = (S_k, r)$ if the $\epsilon$-machine set to state $S_j$ transitions to state $S_k$ upon emission of $r$. We also introduce the shorthand $X_a^b$ to denote the list of random variables $X_a, X_{a+1}, \ldots, X_b$. We first prove the following observations.

1. $f$ is one-to-one iff there exist no distinct causal states $S_j$ and $S_k$ such that $T_{j,l}^{(r)}, T_{k,l}^{(r)} \neq 0$ for some $S_l$.
Proof: Suppose $f$ is one-to-one; then $f(S_j, r) = f(S_k, r)$ iff $S_j = S_k$, so there do not exist two distinct causal states $S_j$ and $S_k$ with $T_{j,l}^{(r)}, T_{k,l}^{(r)} \neq 0$ for some $S_l$. Conversely, if $f$ is not one-to-one, so that $f(S_j, r) = f(S_k, r)$ for some $S_j \neq S_k$, let $S_l$ be the state such that $f(S_j, r) = (S_l, r)$; then $T_{j,l}^{(r)}, T_{k,l}^{(r)} \neq 0$.

2. $H(S_{t-1} \mid X_t, S_t) = 0$ iff $f$ is one-to-one.
Proof: Suppose $f$ is one-to-one. Then for each $(S_j, r) \in \Omega_R$ there exists a unique $(S_k, r)$ such that $f(S_k, r) = (S_j, r)$. Thus, given $S_t = S_j$ and $X_t = r$, we may uniquely deduce $S_{t-1} = S_k$, and therefore $H(S_{t-1} \mid X_t, S_t) = 0$. Conversely, should $H(S_{t-1} \mid X_t, S_t) = 0$, then $S_{t-1}$ is uniquely determined by $(X_t, S_t)$, and thus $f$ is one-to-one.

3. $H(S_{t-1} \mid X_t, S_t) = 0$ implies $H(S_{t-1} \mid X_0^t) = H(S_t \mid X_0^t)$.
Proof: Note (i) that $H(S_{t-1} \mid X_0^t, S_t) = H(S_t \mid X_0^t, S_{t-1}) + H(X_0^t, S_{t-1}) - H(X_0^t, S_t)$, and (ii) that, since the output of $f$ is unique for a given $(S_j, r) \in \Omega_E$, $H(S_t \mid X_t, S_{t-1}) = 0$. (ii) implies that $H(S_t \mid X_0^t, S_{t-1}) = 0$, since uncertainty can only decrease with additional knowledge and is bounded below by 0. Substituting this into (i) gives $H(S_{t-1} \mid X_0^t, S_t) = H(X_0^t, S_{t-1}) - H(X_0^t, S_t)$. Since $H(S_{t-1} \mid X_t, S_t) = 0$ forces $H(S_{t-1} \mid X_0^t, S_t) = 0$ (further conditioning cannot increase entropy), it follows that $H(S_{t-1} \mid X_0^t) = H(S_t \mid X_0^t)$.

4. $H(S_{t-1} \mid X_0^t) = H(S_t \mid X_0^t)$ implies $C_\mu = E$.
Proof: The result follows from two known properties of $\epsilon$-machines: (i) $\lim_{t \to \infty} H(S_t \mid X_0^t) = 0$, and (ii) $C_\mu - E = H(S_{-1} \mid \overrightarrow{X})$ [5]. Now assume that $H(S_{t-1} \mid X_0^t) = H(S_t \mid X_0^t)$; recursive substitution implies that $H(S_{-1} \mid X_0^t) = H(S_t \mid X_0^t)$. In the limit $t \to \infty$, this equality implies $C_\mu - E = 0$.

5. $C_\mu = E$ implies $H(S_{t-1} \mid X_t, S_t) = 0$.
Proof: Since (i) $C_\mu - E = H(S_{-1} \mid \overrightarrow{X}) \geq H(S_{-1} \mid \overrightarrow{X}, S_0)$ and (ii) $H(S_{t-1} \mid X_t, S_t) = H(S_{-1} \mid X_0, S_0)$ by stationarity, it suffices to show that $H(S_{-1} \mid X_0, S_0) = H(S_{-1} \mid \overrightarrow{X}, S_0)$. Writing $\overrightarrow{X} = X_0 \overrightarrow{X}_1$, we have $H(S_{-1} \mid \overrightarrow{X}, S_0) = H(\overrightarrow{X}_1 \mid S_{-1}, X_0, S_0) + H(X_0, S_{-1}, S_0) - H(\overrightarrow{X}_1 \mid X_0, S_0) - H(X_0, S_0)$. But by the Markov property of causal states, $H(\overrightarrow{X}_1 \mid S_{-1}, X_0, S_0) = H(\overrightarrow{X}_1 \mid X_0, S_0)$; thus $H(S_{-1} \mid \overrightarrow{X}, S_0) = H(X_0, S_{-1}, S_0) - H(X_0, S_0) = H(S_{-1} \mid X_0, S_0)$, as required.
Combining (1), (2), (3) and (4): if no two distinct causal states $S_j$ and $S_k$ satisfy $T_{j,l}^{(r)}, T_{k,l}^{(r)} \neq 0$ for some $S_l$, then $C_\mu = E$. Meanwhile, (1), (2) and (5) imply that if $C_\mu = E$, then no two such causal states exist. Theorem 1 follows.

Appendix B: Memory-Conserving Prediction Protocol

Recall that in the simple prediction protocol, the preparation of the next quantum causal state was based on the result of a measurement in the basis $\{|k\rangle\}$. Thus, although we can encode the initial conditions of a stochastic process within a system of entropy $C_q$, the decoding process required $C_\mu$ bits. While this protocol establishes our main result that quantum models require less knowledge of the past, quantum systems implementing this specific prediction protocol may still need $C_\mu$ bits of memory at some stage during their evolution. This limitation is unnecessary. In this section, we present a more sophisticated protocol whose implementation requires no more than $C_q$ bits of memory.

Consider a quantum $\epsilon$-machine initialized in state $|S_j\rangle = \sum_{k=1}^{N} \sum_{r \in \Sigma} \sqrt{T_{j,k}^{(r)}}\, |r\rangle |k\rangle$. We refer to the subsystem spanned by $|r\rangle$ as $R_1$, and the subsystem spanned by $|k\rangle$ as $K$. To replicate the future statistics, we:

1. Apply a general quantum operation on $K$ that maps any given $|S_j\rangle$ to $|S_j'\rangle = \sum_{k=1}^{N} \sum_{r \in \Sigma} \sqrt{T_{j,k}^{(r)}}\, |r\rangle |S_k\rangle$ on $R_1 \otimes R_2 \otimes K$, where $R_2$ is a second Hilbert space of dimension $|\Sigma|$. Note that this operation always exists, since it is defined by Kraus operators $B_k = |S_k\rangle\langle k|$ that satisfy $\sum_k B_k^\dagger B_k = 1$.

2. Output $R_1$. Measurement of $R_1$ in the $\{|r\rangle\}$ basis leads to a classical output $r$ whose statistics coincide with those of its classical counterpart, $x_1$.

3. Retain the remaining subsystem $R_2 \otimes K$ as the initial condition of the quantum $\epsilon$-machine at the next time-step.

See Fig. 3 for a circuit representation of the protocol. [FIG. 3: Quantum circuit representation of the refined prediction protocol.] Step (1) conserves system entropy, since entropy is conserved under addition of a pure ancilla. Step (2) does not increase entropy, since $\langle S_j | S_k \rangle \leq \langle S_j' | S_k' \rangle$ for all $j, k$. Tracing out $R_1$ in step (3) leaves the $\epsilon$-machine in the state $\sum_j p_j |S_j\rangle\langle S_j|$, which has entropy $C_q$. Finally, the execution of the protocol does not require knowledge of the measurement result $r$ (in fact, the quantum $\epsilon$-machine executes correctly even if all outputs remain unmeasured, and is thus truly ignorant of which causal state it is in!). Thus, the physical application of the above protocol generates correct prediction statistics without requiring more than $C_q$ bits of memory.
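To make this concrete, here is a small numerical sketch (our own, not part of the paper; the perturbed coin is reused as the test process). It verifies the Kraus completeness relation guaranteeing that the operation in step (1) exists, builds the mapped states $|S_j'\rangle$, and checks the overlap inequality invoked in step (2); for the perturbed coin the two overlaps happen to coincide exactly:

```python
import numpy as np

p = 0.3
T = np.zeros((2, 2, 2))                      # T[r, j, k] for the perturbed coin
T[0, 0, 0], T[1, 0, 1], T[1, 1, 1], T[0, 1, 0] = 1 - p, p, 1 - p, p
R, N, _ = T.shape

# Quantum causal states |S_j> on R1 (x) K (Eq. (2)), as length-R*N vectors.
S = [np.sqrt(T[:, j, :]).reshape(R * N) for j in range(N)]

# Kraus operators B_k = |S_k><k| mapping K into R2 (x) K.
basis_K = np.eye(N)
B = [np.outer(S[k], basis_K[k]) for k in range(N)]

# Completeness: sum_k B_k^dag B_k = identity on K, so the operation exists.
print(np.allclose(sum(b.conj().T @ b for b in B), np.eye(N)))   # True

# Mapped states |S_j'> = sum_{r,k} sqrt(T[r,j,k]) |r> (x) |S_k>.
Sp = [sum(np.kron(np.sqrt(T[:, j, :])[:, k], S[k]) for k in range(N))
      for j in range(N)]

# Step (2) claim: overlaps never decrease, <S_j|S_k> <= <S_j'|S_k'>.
print(S[0] @ S[1], "<=", Sp[0] @ Sp[1])
```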

[1] Turing, A. M. On computable numbers, with an application to the Entscheidungsproblem. Proc. Lond. Math. Soc. s2-42, 230 (1936).
[2] Holevo, A. S. Statistical problems in quantum physics. In Maruyama, G. & Prokhorov, J. V. (eds.) Proceedings of the Second Japan-USSR Symposium on Probability Theory, Lecture Notes in Mathematics, vol. 330, 104-119 (Springer-Verlag, Berlin, 1973).
[3] Doob, J. L. Stochastic Processes (Wiley, New York, 1953).
[4] Crutchfield, J. P. & Young, K. Inferring statistical complexity. Phys. Rev. Lett. 63, 105-108 (1989).
[5] Shalizi, C. R. & Crutchfield, J. P. Computational mechanics: pattern and prediction, structure and simplicity. J. Stat. Phys. 104, 817-879 (2001). arXiv:cond-mat/9907176.
[6] Crutchfield, J. P. & Feldman, D. P. Regularities unseen, randomness observed: levels of entropy convergence. Chaos 13, 25-54 (2003). URL http://link.aip.org/link/?cha/13/25/1.
[7] Grassberger, P. Toward a quantitative theory of self-generated complexity. Int. J. Theor. Phys. 25, 907-938 (1986).
[8] Crutchfield, J. P. & Feldman, D. P. Statistical complexity of simple one-dimensional spin systems. Phys. Rev. E 55, R1239-R1242 (1997).
[9] Tononi, G., Sporns, O. & Edelman, G. M. A measure for brain complexity: relating functional segregation and integration in the nervous system. Proc. Natl. Acad. Sci. USA 91, 5033-5037 (1994).
[10] Crutchfield, J. P., Ellison, C. J. & Mahoney, J. R. Time's barbed arrow: irreversibility, crypticity, and stored information. Phys. Rev. Lett. 103, 094101 (2009).
[11] Shalizi, C. R., Shalizi, K. L. & Haslinger, R. Quantifying self-organization with optimal predictors. Phys. Rev. Lett. 93, 118701 (2004).
[12] Li, C.-B., Yang, H. & Komatsuzaki, T. Multiscale complex network of protein conformational fluctuations in single-molecule time series. Proc. Natl. Acad. Sci. USA 105, 536-541 (2008).
[13] Shannon, C. E. Prediction and entropy of printed English. Bell Syst. Tech. J. 30, 50-64 (1951).
[14] Benenti, G., Casati, G. & Strini, G. Principles of Quantum Computation and Information, Vol. II (World Scientific, 2007).
[15] Nielsen, M. A. & Chuang, I. L. Quantum Computation and Quantum Information (Cambridge University Press, Cambridge, 2000).
[16] Schumacher, B. Quantum coding. Phys. Rev. A 51, 2738-2747 (1995).
[17] Landauer, R. Irreversibility and heat generation in the computing process. IBM J. Res. Dev. 5, 183-191 (1961).
[18] Löhr, W. & Ay, N. On the generative nature of prediction. Adv. Complex Syst. 12, 169 (2009).