SUPPLEMENTARY INFORMATION

doi:10.1038/nature10123

I. MAIN RESULTS

Erasure, work-extraction and their relation to Maxwell's demon have spurred a rather extensive literature (for overviews see [1-4]) as well as debates (see, e.g., [5-8]). Several approaches have been proposed in the past to formalize the idea of a thermal process and to study erasure, work-extraction and their relation to Maxwell's demon [9-19]. Correlations and entanglement can affect erasure and work-extraction, as has been noted by several authors. For instance, in [20] the system to be erased is bipartite and the observer is restricted to local operations and classical communication (LOCC); the difference between quantum and classical demons is addressed in [21]; see also [22] for a discussion of local and global demons in the context of the thermodynamic arrow of time.

Here, we assume the setting described in the manuscript, where an observer with a quantum memory, Q, tries to erase a system, S, using a heat bath and a battery. Memory contents about other systems must be preserved. We introduce a reference system, R, that includes those other systems and purifies the initial global state; information-conservation then means that the joint state $\rho_{QR}$ must be preserved. The final Hamiltonian of S and Q should equal the initial one, which is totally degenerate.

A. General single-shot results

In general, the work required to erase a system is a random variable. We characterize a single instance of erasure with a probabilistic statement, and later we consider the average work cost of erasure in a thermodynamic limit. Supplementary Theorem I.1 guarantees that the cost of erasing a system does not exceed a bound given in terms of the entropy of S conditioned on Q, except with a small probability.

Supplementary Theorem I.1.
[Theorem from the Methods Summary] Given a system, S, and a memory, Q, in initial joint state $\rho$, there exists a process, P, to erase S that acts at temperature T, whose work cost, denoted $W_{P}(S|Q)^{T}_{\rho}$, satisfies

$$W_{P}(S|Q)^{T}_{\rho} \le \left[H^{\varepsilon}_{\max}(S|Q)_{\rho} + \Delta\right] kT \ln 2,$$

except with probability less than $\delta = \sqrt{2^{-\Delta/2} + 12\varepsilon}$, for all $\varepsilon \ge 0$.

The quantity $H^{\varepsilon}_{\max}(S|Q)$ denotes the $\varepsilon$-smooth max-entropy of system S conditioned on the quantum memory Q, a single-shot generalization of the von Neumann entropy [23] (definition and properties in Supplementary Section II). The term $\Delta$ can be chosen to be small and does not scale with the size of the systems involved, so it may be neglected in the limit of large systems. For instance, to allow a maximum probability of failure of only $\delta = 3\%$, one pays a price of approximately $20\, kT \ln 2$ in the work consumption of the process (in addition to the one dictated by the entropy), assuming that $\varepsilon$ is negligible.

In proving Supplementary Theorem I.1 we are faced with the problem that, in general, the joint state of S, Q, and R may not be as neat and clean as in the example of Quasimodo given in the manuscript, where we had a maximally entangled state between S and a subsystem of Quasimodo's memory. Nevertheless, this special example contains the essence of the general case.

Proof of Theorem I.1. The erasure process, P, that gives the bounds of Theorem I.1 can be described as follows. First we find a subsystem X of $S \otimes Q$ that is $\delta$-close to a pure state. Supplementary Theorem III.1 tells us that there is such a subsystem, of size $\log|X| \ge \log|S| - H^{\varepsilon}_{\max}(S|Q)_{\rho} + 2\log(\delta^{2} - 12\varepsilon)$. We can then apply the elementary work-extraction procedure on X to gain energy $\log|X|\, kT \ln 2$, leaving X in a fully mixed state (see Supplementary Theorem IV.1). To satisfy the information-preservation condition, the joint state of Q and R must be preserved when the pure state of X is replaced with a fully mixed state.
This is guaranteed by Supplementary Theorem III.1, because the state of X is maximally entangled between a subsystem of S and a subsystem of the combination of S and Q. Finally, we let S thermalize and apply the elementary erasure process on S, performing work $\log|S|\, kT \ln 2$ (Supplementary Corollary IV.2). The total work cost of erasing S using this process, P, is

$$W_{P}(S|Q)^{T}_{\rho} = \left[\log|S| - \log|X|\right] kT \ln 2 \le \left[H^{\varepsilon}_{\max}(S|Q)_{\rho} - 2\log(\delta^{2} - 12\varepsilon)\right] kT \ln 2.$$

WWW.NATURE.COM/NATURE
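The trade-off between the extra work $\Delta$ and the failure probability $\delta$ can be checked numerically. The following is a minimal sketch, assuming the failure probability has the form $\delta = \sqrt{2^{-\Delta/2} + 12\varepsilon}$ with $\Delta$ measured in units of $kT \ln 2$; the function names are illustrative and not part of the original text.

```python
import math


def failure_probability(extra_work: float, eps: float = 0.0) -> float:
    """delta = sqrt(2^(-Delta/2) + 12*eps), with Delta in units of kT ln 2.

    Assumed form of the bound; names are hypothetical helpers.
    """
    return math.sqrt(2.0 ** (-extra_work / 2.0) + 12.0 * eps)


def extra_work(delta: float, eps: float = 0.0) -> float:
    """Invert the relation above: Delta = -2 log2(delta^2 - 12*eps)."""
    margin = delta ** 2 - 12.0 * eps
    if margin <= 0.0:
        raise ValueError("need delta**2 > 12*eps for a finite extra work")
    return -2.0 * math.log2(margin)


if __name__ == "__main__":
    # Failure probability of 3% with negligible smoothing parameter.
    print(f"Delta = {extra_work(0.03):.2f} (units of kT ln 2)")
```

Under this assumption, a 3% failure probability with negligible $\varepsilon$ costs roughly 20 units of $kT \ln 2$, in line with the figure quoted after the theorem.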

The fact that the state of X is at most $\delta$-distant from the desired pure state bounds the probability of error of the procedure (see Supplementary Lemma IV.3).

As a byproduct of the above proof we find an analogous result for work-extraction. The goal of this process is to extract work from a system, S, under the assumption that the state $\rho_{QR}$ is kept intact (while the final state of S is arbitrary).

Supplementary Corollary I.2. Given an n-qubit system, S, and a memory, Q, in initial joint state $\rho$, there exists a work-extraction process, $P_{\mathrm{ex}}$, acting at temperature T, with work gain, $W_{P_{\mathrm{ex}}}$, that satisfies

$$W_{P_{\mathrm{ex}}}(S|Q)^{T}_{\rho} \ge \left[n - H^{\varepsilon}_{\max}(S|Q)_{\rho} - \Delta\right] kT \ln 2,$$

except with a probability of at most $\delta = \sqrt{2^{-\Delta/2} + 12\varepsilon}$, for all $\varepsilon \ge 0$.

B. Asymptotic limit

We typically expect thermal fluctuations to disappear in macroscopic systems. Theoretically, this is usually handled by taking a thermodynamic limit, where we in some sense increase the size of the system such that fluctuations are averaged away. In a similar spirit we imagine performing the erasure on a large collection of independent and identically distributed (iid) systems. We define the work cost rate of an erasure process as the average work cost of an optimal erasure process in this limit.

Supplementary Definition I.3 (work cost rate). Given a system, S, and a memory, Q, in initial joint state $\rho$, the work cost rate of erasure of S is defined as

$$w(S|Q)^{T}_{\rho} := \inf\left\{ w : \exists \{P_{n}\}_{n} : \liminf_{n \to \infty} \mathrm{Prob}\left[ W_{P_{n}}(S|Q)^{T}_{\rho^{\otimes n}} \le n w \right] = 1 \right\}.$$

Here, $P_{n}$ is a process that erases n iid copies of S given the knowledge of n copies of Q.

This definition gives the optimal average work cost of erasure in the limit of large n: we take the best process (i.e., the one with the least work cost) among those that always succeed in the asymptotic limit. Using the Asymptotic Equipartition Property (AEP; see Supplementary equation (II.3)), we can bound the work cost rate of erasure as follows.

Supplementary Corollary I.4.
Given a system, S, and a memory, Q, in initial joint state $\rho$, the work cost rate of S at temperature T is bounded as

$$w(S|Q)^{T}_{\rho} \le H(S|Q)_{\rho}\, kT \ln 2.$$

One can prove the converse bound under a natural assumption: Landauer's principle for a classical observer.

Supplementary Lemma I.5. If the erasure of any system, Z, correlated with a classical memory, C, in a quantum-classical state, $\rho$, at temperature T has work cost rate $w(Z|C)^{T}_{\rho} \ge H(Z|C)_{\rho}\, kT \ln 2$, then the work cost rate of erasing a system S conditioned on a quantum memory, Q, initially in an arbitrary bipartite quantum state, $\rho$, is bounded as

$$w(S|Q)^{T}_{\rho} \ge H(S|Q)_{\rho}\, kT \ln 2.$$

Proof. We can write

$$w(S|QC)^{T}_{\rho} + H(Q|C)_{\rho}\, kT \ln 2 \ge w(S|QC)^{T}_{\rho} + w(Q|C)^{T}_{\rho} \ge w(SQ|C)^{T}_{\rho} \ge H(SQ|C)_{\rho}\, kT \ln 2.$$

The first inequality comes from Supplementary Corollary I.4. The second inequality holds because erasing S given the knowledge of Q and later erasing Q is one particular way of erasing SQ. The last inequality comes from Landauer's principle for a classical observer (with Z = SQ). Rearranging the terms, we obtain

$$w(S|QC)^{T}_{\rho} \ge \left[H(SQ|C)_{\rho} - H(Q|C)_{\rho}\right] kT \ln 2 = H(S|QC)_{\rho}\, kT \ln 2,$$

which recovers the desired result.

The following lemma shows that local operations on the memory cannot decrease the work cost of erasure. This may be used to demonstrate the data processing inequality [24-26], as shown in the manuscript.

Supplementary Lemma I.6. Let $\rho$ be a state of $S \otimes Q \otimes R$, $\xi$ a trace-preserving completely positive map from the memory Q to a system $Q'$, and $\rho' = (\xi_{Q} \otimes I_{SR})(\rho)$. The work cost rate of erasing S cannot decrease after applying $\xi$,

$$w(S|Q)^{T}_{\rho} \le w(S|Q')^{T}_{\rho'}.$$

Proof. Using Stinespring dilation we find a system $Q''$ and an isometry $U : Q \to Q' \otimes Q''$ such that $\xi(\sigma) = \mathrm{Tr}_{Q''}\, U(\sigma)$. Since Q and $Q' \otimes Q''$ are related by an isometry, for every erasure process conditioned on Q there exists an equivalent process conditioned on $Q' \otimes Q''$, so $w(S|Q)^{T}_{\rho} = w(S|Q'Q'')^{T}_{\rho'}$.
Clearly $w(S|Q')^{T}_{\rho'} \ge w(S|Q'Q'')^{T}_{\rho'}$, because every erasure process conditioned on $Q'$ is a particular case of a process conditioned on $Q' \otimes Q''$ that does not use the information of $Q''$. Note that the information-preservation condition requires that the state of $Q'$ and $Q''$ be preserved even for processes conditioned only on $Q'$, because $Q''$ can be considered part of the reference.
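The entropic counterpart of this lemma, the data processing inequality, is easy to check in the special case of a classical memory, where the conditional von Neumann entropy reduces to the conditional Shannon entropy. The sketch below is an illustration under that classical assumption only; the function names and the example channel are hypothetical and do not appear in the original text.

```python
import math


def conditional_entropy(joint):
    """H(S|Q) in bits, for joint[s][q] = Prob(S = s, Q = q)."""
    n_q = len(joint[0])
    h = 0.0
    for q in range(n_q):
        p_q = sum(row[q] for row in joint)  # marginal probability of Q = q
        for row in joint:
            if row[q] > 0.0:
                h -= row[q] * math.log2(row[q] / p_q)
    return h


def process_memory(joint, channel):
    """Apply a stochastic map to Q: channel[q][q2] = Prob(Q' = q2 | Q = q)."""
    n_out = len(channel[0])
    return [
        [sum(row[q] * channel[q][q2] for q in range(len(row))) for q2 in range(n_out)]
        for row in joint
    ]


if __name__ == "__main__":
    # S and Q are perfectly correlated bits; Q then passes through a
    # binary symmetric channel that flips it with probability 0.1.
    joint = [[0.5, 0.0], [0.0, 0.5]]
    flip = [[0.9, 0.1], [0.1, 0.9]]
    print(conditional_entropy(joint))
    print(conditional_entropy(process_memory(joint, flip)))
```

Processing the memory locally degrades the correlations, so the conditional entropy, and with it the asymptotic work cost of erasing S, can only go up.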

For the above results on the work cost rate we have limited ourselves to the case of iid states. Needless to say, increasing the size of a system does not necessarily correspond to taking many iid copies. However, the iid assumption is important for the above derivations only insofar as it allows us to prove that the smooth max-entropy (per qubit, or some other elementary unit) converges to the corresponding von Neumann entropy, via the AEP. We could thus expect to obtain similar results for any family of states that satisfies this limit. That there exist reasonable non-iid generalizations of this type can be seen, for instance, from the Shannon-McMillan-Breiman theorem in classical information theory (see, e.g., [27]), and similar results in quantum information theory (see, e.g., [28, 29] and references therein).

We note that there is a similar consideration in statistical mechanics, concerning the relation between the Gibbs entropy and Boltzmann's entropy formula $S = k \ln W$, where W is interpreted as the effective phase space volume compatible with a given macrostate (see [30] for a discussion). The latter is akin to the unconditional smooth max-entropy, which is the logarithm of the effective support size of the density operator. Similarly, the Gibbs entropy is essentially the Shannon or von Neumann entropy.

II. SMOOTH ENTROPIES

Our main result, Supplementary Theorem I.1, relies on the smooth max-entropy, $H^{\varepsilon}_{\max}$, as a measure to quantify uncertainty [23]. Smooth entropies have, so far, mainly been used in information theory, where they proved to be the relevant quantities to characterize information-processing tasks such as randomness or entanglement distillation, channel coding, data compression, or key distribution. The formulation of the entropy-work relation in terms of the smooth max-entropy rather than the more standard von Neumann entropy has the advantage that the relation is valid independently of the structure of the underlying quantum states.
A work-entropy relation involving the von Neumann entropy (Corollary I.4) is obtained from this general result by introducing appropriate assumptions, as explained below. In the following, we briefly review the definition of smooth entropies and show how they are related to the von Neumann entropy. For a more detailed discussion of smooth entropies, their properties, and their information-theoretic significance, we refer to [23, 31-33].

A. Definition and properties

Let $\rho = \rho_{SQ}$ be the state of a bipartite system, consisting of subsystems S and Q. The $\varepsilon$-smooth max-entropy of S conditioned on Q can be expressed in terms of the fidelity, F, as (footnote 1)

$$H^{\varepsilon}_{\max}(S|Q)_{\rho} := \inf_{\bar{\rho}_{SQ}} \sup_{\sigma_{Q}} 2 \log F(\bar{\rho}_{SQ}, \mathbb{1}_{S} \otimes \sigma_{Q}).$$

The supremum ranges over all density operators $\sigma_{Q}$ on Q. The infimum is taken over all (subnormalized) density operators $\bar{\rho}_{SQ}$ that are $\varepsilon$-close to $\rho_{SQ}$ (footnote 2), where $\varepsilon \ge 0$ is the smoothness parameter, which is usually chosen to be small but nonzero.

Our technical arguments also involve the smooth min-entropy, which can be seen as the dual of the smooth max-entropy, in the following sense. Consider a purification $\rho_{SQR}$ of the given bipartite state $\rho_{SQ}$, with a purifying system R. The $\varepsilon$-smooth min-entropy of S conditioned on R then corresponds to the negative smooth max-entropy conditioned on Q [23, 34],

$$H^{\varepsilon}_{\min}(S|R)_{\rho} = -H^{\varepsilon}_{\max}(S|Q)_{\rho}. \qquad \text{(II.1)}$$

Smooth entropies have properties analogous to those of the conditional von Neumann entropy. For example, for $\varepsilon \to 0$, both $H^{\varepsilon}_{\min}(S|Q)_{\rho}$ and $H^{\varepsilon}_{\max}(S|Q)_{\rho}$ are 0 if the reduced state on S is pure, 1 for a qubit S that is fully mixed and uncorrelated with Q, and -1 for a qubit S that is maximally entangled with Q. Furthermore, they satisfy a data-processing inequality. It asserts that the entropy of S conditioned on Q can only increase if information is processed locally at Q. Formally,

$$H^{\varepsilon}_{\max}(S|Q')_{\rho'} \ge H^{\varepsilon}_{\max}(S|Q)_{\rho},$$

where $\rho' = \rho'_{SQ'}$ is the state obtained from $\rho_{SQ}$ after applying a trace-preserving completely positive map M on system Q.

Footnote 1: Note that the fidelity can be defined for arbitrary (not necessarily normalized) positive operators, R and S, by $F(R, S) := \|\sqrt{R}\sqrt{S}\|_{1}$, where $\|\cdot\|_{1}$ is the $L_{1}$-norm.

Footnote 2: Closeness is measured in terms of the purified distance [34].

B. Specialization to the von Neumann entropy

For a bipartite quantum state $\rho_{SQ}$, the von Neumann entropy of S conditioned on Q is defined by $H(S|Q)_{\rho} = H(\rho_{SQ}) - H(\rho_{Q})$, where $H(\sigma)$ denotes the usual (nonconditional) von Neumann entropy of $\sigma$, i.e., $H(\sigma) = -\mathrm{Tr}(\sigma \log \sigma)$. The conditional von Neumann entropy is always bounded by the smooth min- and max-entropies,

$$\lim_{\varepsilon \to 0} H^{\varepsilon}_{\min}(S|Q)_{\rho} \le H(S|Q)_{\rho} \le \lim_{\varepsilon \to 0} H^{\varepsilon}_{\max}(S|Q)_{\rho}. \qquad \text{(II.2)}$$

In particular, if the smooth min- and max-entropies coincide, they are automatically equal to the von Neumann entropy. Hence, under this condition, the smooth max-entropy occurring in Supplementary Theorem I.1 can be replaced by the von Neumann entropy (Supplementary Corollary I.4).

A typical situation where equation II.2 holds (approximately) with equality is that of a large n-partite system with weakly correlated parts. In the limit when the correlations disappear, the state of the system is of the form $\sigma^{\otimes n}$ (iid). Such states are common in information theory and physics; they arise, for instance, naturally for systems with sufficiently high symmetries (e.g., when a system is invariant under permutations of its n parts [35]). One can show that the smooth min- and max-entropies converge for states of the form $\rho_{S^{n}Q^{n}} = \sigma_{SQ}^{\otimes n}$ [36]. Hence, by virtue of equation II.2, and using the fact that the von Neumann entropy is additive, one has, for any $0 < \varepsilon < 1$,

$$\lim_{n \to \infty} \frac{1}{n} H^{\varepsilon}_{\max}(S^{n}|Q^{n})_{\sigma^{\otimes n}} = \lim_{n \to \infty} \frac{1}{n} H^{\varepsilon}_{\min}(S^{n}|Q^{n})_{\sigma^{\otimes n}} = H(S|Q)_{\sigma}. \qquad \text{(II.3)}$$

In other words, for iid states, the work-entropy relation of Supplementary Theorem I.1 asymptotically also holds for the von Neumann entropy.

We note that equation II.3 can be seen as a reformulation of the Asymptotic Equipartition Property, which plays a crucial role in the area of information theory. There, operational quantities (such as the compression rate of a random source or the amount of randomness that can be distilled from a given source) are usually related to either the smooth min- or the smooth max-entropy. The widespread use of the von Neumann entropy in (textbook) information theory is therefore mainly a consequence of the fact that one typically considers asymptotic iid situations, a limit where all these entropy measures are equal.

III. PURITY EXTRACTION

In this section we describe purity extraction, used in the first step of the erasure process of Supplementary Theorem I.1. In this procedure we use correlations between two systems, S and Q, to find a pure state in a subsystem of $S \otimes Q$.
We will often refer to the distance between two states, measured by the trace distance, $D(\rho, \sigma) = \frac{1}{2}\|\rho - \sigma\|_{1}$. We say that $\rho$ and $\sigma$ are $\delta$-close if $D(\rho, \sigma) \le \delta$.

Supplementary Theorem III.1. Consider a system, $S \otimes Q \otimes R$, in a pure state, $\rho$. There is an l-qubit system, $S_{1} \otimes P$, with $S_{1} \subseteq S$ and $P \subseteq S_{2} \otimes Q$ equally large systems, in a state $\delta$-close to a pure state maximally entangled between $S_{1}$ and P. The size of this system is bounded as

$$l \ge \log|S| - H^{\varepsilon}_{\max}(S|Q)_{\rho} + 2\log(\delta^{2} - 12\varepsilon), \quad \varepsilon \ge 0.$$

The proof of Supplementary Theorem III.1 relies on the notion of decoupling, first introduced in [37] and generalized in [38]. Starting from recent tight bounds on decoupling [38], our proof proceeds in two steps, corresponding to Subsections A and B below (see also Fig. 1). In the first step we show that a subsystem $S_{1} \subseteq S$, of $l/2$ qubits, can be decoupled from R. In the second step, we show that, since the global state is pure, $S_{1}$ is purified by a subsystem of $S_{2} \otimes Q$ of the same dimension. The pure state created has a total of l qubits.

A. Decoupling

In this first step, we give a bound on the size of the subsystem of S that can be decoupled from R, according to the following definition.

Supplementary Definition III.2 (Decoupling). A system X is $\delta'$-decoupled from another system, R, if their joint state is $\delta'$-close to a product state where the reduced state of X is fully mixed,

$$D\left(\rho_{XR}, \frac{\mathbb{1}_{X}}{|X|} \otimes \rho_{R}\right) \le \delta'.$$

Lemma III.3 shows that the size of the decoupled system depends on the correlations between S and Q, as measured by an entropy measure, the smooth conditional max-entropy, $H^{\varepsilon}_{\max}(S|Q)$.

Supplementary Lemma III.3. Let $\rho$ be a pure state of a system $S \otimes Q \otimes R$. There exists an $l/2$-qubit subsystem $S_{1} \subseteq S$ that is $\delta^{2}/2$-decoupled from R, with

$$l \ge \log|S| - H^{\varepsilon}_{\max}(S|Q)_{\rho} + 2\log(\delta^{2} - 12\varepsilon), \quad \varepsilon \ge 0.$$

Proof. Decoupling results [38, Cor. 6.] imply that the state obtained after applying a random unitary on S is close to the desired decoupled state if correlations between S and R are weak.
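We use a formulation proposed in [39], where those correlations are measured by the smooth conditional min-entropy, $H^{\varepsilon}_{\min}(S|R)_{\rho}$,

$$\int D\left( \mathrm{Tr}_{S_{2}}\left( [U_{S} \otimes \mathbb{1}_{R}]\, \rho_{SR}\, [U_{S} \otimes \mathbb{1}_{R}]^{\dagger} \right),\ \frac{\mathbb{1}_{S_{1}}}{2^{l/2}} \otimes \rho_{R} \right) dU_{S} \le 2^{-\frac{1}{2}(\log|S| - l + 2) - \frac{1}{2} H^{\varepsilon}_{\min}(S|R)_{\rho}} + 6\varepsilon, \qquad \text{(III.1)}$$

where $S = S_{1} \otimes S_{2}$. Here, the integral is taken over all unitary operations on system S, $U_{S}$, according to the Haar measure, and $H^{\varepsilon}_{\min}(S|R)_{\rho}$ is the smooth conditional min-entropy of S, given the information that R may provide about that system. The bound of equation III.1 applies to the average of a positive quantity over all unitaries. It follows that there is at least one fixed unitary, corresponding to a choice of basis of S, that respects the bound. If we fix an upper bound of $\delta^{2}/2$ on the distance between the desired and the obtained states, the right-hand side of equation III.1 stays within this bound for

$$l \le \log|S| + H^{\varepsilon}_{\min}(S|R)_{\rho} + 2\log(\delta^{2} - 12\varepsilon). \qquad \text{(III.2)}$$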

The global state is pure, so one may use the duality relation between entropy measures, introduced in equation II.1, $H^{\varepsilon}_{\min}(S|R)_{\rho} = -H^{\varepsilon}_{\max}(S|Q)_{\rho}$, where the latter is the smooth max-entropy of system S conditioned on the memory. Inserting this in equation III.2, we find that l can be chosen as large as $\log|S| - H^{\varepsilon}_{\max}(S|Q)_{\rho} + 2\log(\delta^{2} - 12\varepsilon)$, which is the size stated in the lemma.

[Figure 1: diagram of the Reference and the systems $S_{1}$, $S_{2}$, P and Q.]
FIG. 1: Purity extraction. A subsystem $S_{1}$ of S is decoupled from the reference. The size of $S_{1}$ decreases with the strength of correlations between S and the reference, and therefore increases with correlations between S and the memory, Q (see Supplementary Section II). Since the global state is pure, $S_{1}$ is purified by a system P of equal size that belongs to the remaining systems, $S_{2}$ and Q. The state of $S_{1} \otimes P$ is maximally entangled. The arrows symbolize correlations between systems.

B. Purification

To complete the proof of Supplementary Theorem III.1, it remains to show that, given an $l/2$-qubit system $S_{1}$ decoupled from R, it is possible to find an l-qubit subsystem of $S \otimes Q$ in a state that is approximately pure. Note that the global state of $S \otimes Q \otimes R$ is still pure.

Supplementary Lemma III.4. Consider a system, $(S_{1} \otimes S_{2}) \otimes Q \otimes R$, in a pure state, such that the $l/2$-qubit system $S_{1}$ is $\delta^{2}/2$-decoupled from R. There is an $l/2$-qubit subsystem $P \subseteq S_{2} \otimes Q$ such that the joint state of $S_{1} \otimes P$ is $\delta$-close to a fully entangled state.

Proof. In a first step we assume that the state of $S_{1}$ is perfectly decoupled from R ($\delta = 0$). We can expand it as

$$\frac{\mathbb{1}_{S_{1}}}{2^{l/2}} \otimes \rho_{R} = 2^{-l/2} \sum_{k} |k\rangle\langle k|_{S_{1}} \otimes \sum_{i} \lambda_{i} |i\rangle\langle i|_{R}.$$

We can find systems $A_{1}$ and $A_{2}$ that purify $\rho_{S_{1}}$ and $\rho_{R}$ respectively. The composite system $A_{1} \otimes A_{2}$ purifies $\rho_{S_{1}} \otimes \rho_{R}$,

$$|\phi\rangle = |\phi\rangle_{S_{1}A_{1}} \otimes |\phi\rangle_{RA_{2}} = 2^{-l/4} \sum_{k} |k\rangle_{S_{1}} |k\rangle_{A_{1}} \otimes \sum_{i} \sqrt{\lambda_{i}}\, |i\rangle_{R} |i\rangle_{A_{2}}.$$

The statement for $\delta = 0$ follows from the fact that any two purifications of the same state are related by a unitary transformation on the purifying system. In particular, P is given as the image of $A_{1}$ under this unitary. The claim for strictly positive $\delta$ follows similarly, using Uhlmann's theorem and properties of the trace distance [34, Lem. 6].

IV. WORK EXTRACTION

Here we describe a process, similar to [, 6], that allows us to extract energy from a heat bath, given a system in a pure state, as illustrated in Fig. 2. We assume a setting where we can raise or lower energy levels of the system arbitrarily, i.e., we can change the energy $E_{n}$ of each level, n, into a new value $E'_{n}$. If the system is in level n we define the work gain of this operation as the difference $E_{n} - E'_{n}$, where we implicitly assume that this energy is added to a system that stores useful energy, i.e., the battery. To model the thermalization by a heat bath at temperature T, we put the system in the Gibbs state corresponding to the given energies $E_{n}$.

In the context of the erasure process described in Supplementary Theorem I.1 we use this work-extraction scheme on system $S_{1} \otimes P$. This system is approximately in a pure state, as shown in Section III B. Note that since the Hamiltonian of $S \otimes Q$ is assumed to be fully degenerate, it follows that the subsystem $S_{1} \otimes P$ also has a well-defined and fully degenerate Hamiltonian. Hence, we can apply the following theorem.

Supplementary Theorem IV.1. Given an l-qubit system, with a fully degenerate Hamiltonian, in a pure state, there is a process at temperature T that extracts $l\, kT \ln 2$ of work. This process leaves the system in the maximally mixed state, and the final Hamiltonian is the same as the initial one.

Proof. If $|\phi_{0}\rangle$ is the initial state of the system, we extend it to an orthonormal basis $\{|\phi_{j}\rangle\}_{j=0}^{N-1}$, where $N = 2^{l}$. Since the initial Hamiltonian is fully degenerate, all the states $\{|\phi_{j}\rangle\}$ are energy eigenstates. We start by lifting all the energy levels $1, \ldots, N-1$ to a high value $E_{1}$ (Fig. 2a). This operation has no energy cost. Next we thermalize the system, which puts it in its Gibbs state at temperature T. After this, the probability to find the system in one of the raised levels $1, \ldots, N-1$ is

$$p(E_{1}) = \left[1 + \frac{e^{\beta(E_{1} - E_{0})}}{N - 1}\right]^{-1}.$$

We next lower the levels $1, \ldots, N-1$

by a small step, $\delta E$. With probability $p(E_{1})$ this yields a work gain of $\delta E$ (Fig. 2b). If we repeat this we gain work $\delta E$ with probability $p(E_{1} - \delta E)$. This procedure is repeated until all levels $1, \ldots, N-1$ are back at energy $E_{0}$. With a final thermalization, the state becomes maximally mixed (Fig. 2c). Furthermore, since all the levels are back at their original energy $E_{0}$, the final Hamiltonian is the same as the initial one. In the limit $\delta E \to 0$ and $E_{1} \to +\infty$, this process yields the average work gain

$$\lim_{E_{1} \to \infty} \int_{E_{0}}^{E_{1}} p(E)\, dE = \lim_{E_{1} \to \infty} \int_{E_{0}}^{E_{1}} \frac{dE}{1 + \frac{e^{\beta(E - E_{0})}}{N - 1}} = l\, kT \ln 2.$$

Assuming independence between the work gain in each small step, $\delta E$, one can show that the fluctuations around this average vanish in the limit $\delta E \to 0$.

[Figure 2: three panels (a, b, c), each showing the energy levels of the system with the lowest level at $E_{0}$, a heat bath at temperature T, and a battery; in panel b the battery receives $l\, kT \ln 2$.]
FIG. 2: Extracting work from an l-qubit system in a pure state. This process can be seen as the reverse of erasure. a) Only one state is occupied, at energy $E_{0}$; the energy of the empty levels is raised to a very high value at zero cost. b) We couple the system to the bath and slowly decrease the energy of the empty states. These will become gradually populated according to the Gibbs distribution. Lowering the partially occupied states results in an energy gain of $l\, kT \ln 2$ in total. This energy is stored in the battery. c) At the end of the procedure, the Hamiltonian of the system is fully degenerate and the state is maximally mixed.

The process described in Supplementary Theorem IV.1 takes a system from a pure to a fully mixed state, extracting work. By inverting this process, one can bring a system from a fully mixed to a pure state; in other words, erase the system.

Supplementary Corollary IV.2. Given an l-qubit system, with a fully degenerate Hamiltonian, in a fully mixed state, there exists a process at temperature T, which for a work cost of $l\, kT \ln 2$ takes the system to a pure state, with the final Hamiltonian the same as the initial one.
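The average work gain of the level-lowering process can be verified numerically. The sketch below assumes the occupation probability of the raised levels is $p(E) = [1 + e^{\beta(E - E_{0})}/(N-1)]^{-1}$ with $N = 2^{l}$, works in units of $kT$ with $x = (E - E_{0})/kT$, and truncates the integral at a finite cutoff; the function names and the midpoint rule are illustrative choices, not part of the original text.

```python
import math


def raised_level_probability(x: float, n_levels: int) -> float:
    """Gibbs probability of finding the system in any of the N-1 raised levels.

    x is the energy offset (E - E0) in units of kT.
    """
    return 1.0 / (1.0 + math.exp(x) / (n_levels - 1))


def average_work_gain(n_qubits: int, x_max: float = 60.0, steps: int = 200_000) -> float:
    """Midpoint-rule estimate of the integral of p(E) dE, in units of kT.

    x_max stands in for the limit E1 -> infinity; it is adequate for small n_qubits.
    """
    n_levels = 2 ** n_qubits
    h = x_max / steps
    return h * sum(
        raised_level_probability((i + 0.5) * h, n_levels) for i in range(steps)
    )


if __name__ == "__main__":
    for l in (1, 2, 3):
        print(l, average_work_gain(l), l * math.log(2))
```

For each l the numerical value agrees with $l \ln 2$ (in units of $kT$), since the integral evaluates in closed form to $kT \ln N$.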
When compressing information between the system and the memory, we allowed the state created to be at most $\delta$-distant from a pure state (Section III). The following lemma shows how that affects the probability of failure of the work-extraction procedure.

Supplementary Lemma IV.3. If the process described in Supplementary Theorem IV.1 is applied to a state that is $\delta$-close to a pure state, it succeeds with probability at least $1 - \delta$.

Proof. Consider the following setting: we are given one of two possible states, $\rho$ and $\sigma$, with equal probability, and have to guess which of them we have. We are allowed to apply any physical process to the state, such as a measurement after a unitary evolution. The probability of successfully distinguishing the states is upper bounded by $P(\rho, \sigma) = \frac{1}{2}[1 + D(\rho, \sigma)]$, where $D(\rho, \sigma)$ is the trace distance between those states [40, Theorem. 9.].

To prove Supplementary Lemma IV.3, let $\sigma$ be the pure state we expected, and $\rho$ be $\delta$-close to $\sigma$. An example of a process to distinguish two states is the work-extraction procedure described in Supplementary Theorem IV.1. If the process is applied to the expected pure state, $\sigma$, the probability of error is zero and the quantity of work extracted is $l\, kT \ln 2$. We denote the probability of failure of the work-extraction process for an arbitrary state, $\rho$, by $p_{\rho}$.

Our guessing strategy is the following. We apply the work-extraction process from Supplementary Theorem IV.1. If we obtain an amount of work less than $l\, kT \ln 2$, we know that the state was $\rho$, and say so. This happens with probability $p_{\rho}/2$. On the other hand, if we extract work $l\, kT \ln 2$, we should say we had state $\sigma$: the probability that we extract all the work and had $\rho$ is only $(1 - p_{\rho})/2$, while the probability of having $\sigma$ (and extracting the same amount) is $1/2$. In total, we will be right with probability $\frac{1}{2}[1 + p_{\rho}]$. This guessing probability is upper bounded by $P(\rho, \sigma)$, so $p_{\rho} \le D(\sigma, \rho)$.
Since we imposed a maximum distance $\delta$ between the pure state $\sigma$ and $\rho$, the probability of failure of the process is at most $\delta$.
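The chain of bounds in this proof can be illustrated for a single qubit, where the trace distance equals half the Euclidean distance between Bloch vectors. The following is a small sketch under that single-qubit assumption; the function names and the example states are hypothetical.

```python
import math


def trace_distance_qubit(r, s) -> float:
    """Trace distance between two qubit states given by their Bloch vectors."""
    return 0.5 * math.dist(r, s)


def max_guess_probability(distance: float) -> float:
    """Best probability of telling two equiprobable states apart: (1 + D)/2."""
    return 0.5 * (1.0 + distance)


def failure_probability_bound(r, s) -> float:
    """Lemma-style bound: the failure probability is at most D(sigma, rho)."""
    return trace_distance_qubit(r, s)


if __name__ == "__main__":
    delta = 0.03
    sigma = (0.0, 0.0, 1.0)                # expected pure state |0><0|
    rho = (0.0, 0.0, 1.0 - 2.0 * delta)    # (1 - delta)|0><0| + delta|1><1|
    print(trace_distance_qubit(sigma, rho))
    print(max_guess_probability(trace_distance_qubit(sigma, rho)))
```

In this example the state $\rho$ is exactly $\delta$-distant from $\sigma$, so the work-extraction process fails with probability at most $\delta$.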

V. ERASURE IN THE PERIOD-FINDING ALGORITHM

In this section we demonstrate with a simple toy example that certain quantum algorithms can be made more thermodynamically efficient by a judicious application of memory erasure.

A. The period-finding algorithm

Given a function f as an input, a quantum computer can efficiently find the period of the function, i.e., the smallest number r such that $f(x + r) = f(x)$ for all x (where x and r are assumed to be integers, and f to be integer valued). This seemingly innocent task has an important role, e.g., via order-finding in Shor's famous algorithm for efficient factoring [41]. In the following we give a brief account of a few pertinent aspects of the period-finding algorithm. For more details the reader is referred to [40].

In the quantum computer the number x, regarded as a bit-string $x = x_{n-1} \cdots x_{0}$, is represented as a computational basis state $|x\rangle = |x_{n-1}\rangle \cdots |x_{0}\rangle$ of a corresponding collection of qubits. Given a register X in state $|x\rangle_{X}$, and a register F in state $|0\rangle_{F}$, it is assumed that there exists a sub-routine (an "oracle") that can transform all states $|x\rangle_{X}|0\rangle_{F}$ into $|x\rangle_{X}|f(x)\rangle_{F}$ coherently. In other words, given x in register X, the value of the function, f(x), is put in register F. The first step of the period-finding algorithm is what we will refer to as a coherent evaluation of the function f, where the above operation is applied to a superposition of all possible inputs,

$$2^{-n/2} \sum_{x} |x\rangle_{X} |0\rangle_{F} \ \mapsto\ |\psi_{f}\rangle = 2^{-n/2} \sum_{x} |x\rangle_{X} |f(x)\rangle_{F}. \qquad \text{(V.1)}$$

The period-finding algorithm next proceeds by a clever choice of operations and measurements to extract the period of the function. We will refer to this as the period-extraction step of the protocol. The details of the period-extraction procedure are irrelevant for our purposes. We only use the fact that the register F is not needed in any of the subsequent steps of the protocol. All the remaining operations and measurements can be done on register X alone, and register F can be discarded right after state $|\psi_{f}\rangle$ has been formed.
This may at first sight seem paradoxical, but the role of register F is to group all x that are mapped to the same value f(x) into a coherent superposition, while the coherence is destroyed between those groups. This can be seen from

$$|\psi_{f}\rangle = 2^{-n/2} \sum_{y} \Big( \sum_{x : f(x) = y} |x\rangle_{X} \Big) |y\rangle_{F}.$$

According to our main result there exists a process that erases register F at the work cost $H(F|X)\, kT \ln 2$; this conditional entropy is negative since X and F are entangled, so we would gain work. However, if we want to make such an erasure process part of the algorithm, it must itself be computationally efficient, including an efficient description of the state to be erased. When thus restricting ourselves to efficient algorithms, we can reasonably only hope to achieve a part of the above work yield. In the following, we illustrate such partial work extraction with a toy example. The idea is to identify parts of the register F that are highly correlated with parts of register X, such that the state of these parts admits an efficient description, which in our case is even independent of the input, i.e., the function f.

B. A toy example

It is common to have some a priori information on the likelihood of the various inputs to a computational process (footnote 3). Suppose, e.g., that we have the promise that with probability p the function f has a special dependence on the parity of the input, while with probability $1 - p$ it is arbitrary (and in both cases we assume they are drawn from some suitable ensembles of periodic functions). To specify the above-mentioned promise on f, note that a bit string can be decomposed as $x = (\bar{x}, x_{0})$, where $\bar{x} = \lfloor x/2 \rfloor$, and where $x_{0} = x \bmod 2$ is the least significant bit, i.e., $x_{0}$ is 0 whenever x is even, and 1 when x is odd. We then assume that with probability p there exists a function $\bar{f}$ such that $f(\bar{x}, x_{0}) = (\bar{f}(\bar{x}), x_{0})$. In particular, f maps even numbers to even numbers, and odd to odd.
The coherent evaluation of such a function yields

\[
\begin{aligned}
|\psi_f\rangle
&= 2^{-n/2} \sum_{x} |x\rangle_X \, |f(x)\rangle_F \\
&= 2^{-n/2} \sum_{\bar{x}} \sum_{x_0 = 0,1} |\bar{x}\rangle_{\bar{X}} \, |x_0\rangle_{X_0} \, |\bar{f}(\bar{x})\rangle_{\bar{F}} \, |x_0\rangle_{F_0} \\
&= 2^{-n/2} \sum_{\bar{x}} |\bar{x}\rangle_{\bar{X}} \, |\bar{f}(\bar{x})\rangle_{\bar{F}} \left( |0\rangle_{X_0} |0\rangle_{F_0} + |1\rangle_{X_0} |1\rangle_{F_0} \right).
\end{aligned}
\]

We thus find that the least significant qubits $X_0$ and $F_0$ of registers $X$ and $F$ are in the maximally entangled state $|\chi\rangle = \frac{1}{\sqrt{2}} \left( |00\rangle + |11\rangle \right)$. However, we also have to take into account that we get an arbitrary periodic function with probability $1 - p$. As one may intuitively suspect, the reduced state of $X_0 F_0$ is, to a good approximation, maximally mixed when averaged over a sufficiently random ensemble of functions.$^{4}$ Hence, after the coherent evaluation, the reduced state of the two qubits $X_0 F_0$ is

\[
\rho_{X_0 F_0} = p \, |\chi\rangle\langle\chi| + (1 - p) \, \tfrac{1}{4} \mathbb{1}_{X_0 F_0}.
\]

We have thus found a correlated state with a computationally trivial description (it is even independent of the input $f$). If we now consider the work cost, in the iid limit,$^{5}$ of erasing qubit $F_0$ conditioned on $X_0$, we find it to be proportional to

\[
H(F_0|X_0) = 1 - \tfrac{1}{4} (1 + 3p) \log_2 (1 + 3p) - \tfrac{3}{4} (1 - p) \log_2 (1 - p).
\]

This is a monotonically decreasing function of $p$, which is negative for sufficiently large $p$ (e.g., for $p = 3/4$) and thus corresponds to a work gain in the erasure of $F_0$.

$^{3}$ Such a promise may be obtained simply by statistical analysis of the inputs, or by theoretical considerations. For example, if an overlying factoring algorithm is used to break cryptographic keys, the distribution of the inputs to the period finding algorithm can be inferred from the specification of the key generation scheme.

$^{4}$ We can decompose the value of an arbitrary function as $f(\bar{x}, x_0) = \left( \bar{f}(\bar{x}, x_0), f_0(\bar{x}, x_0) \right)$, where $f_0(\bar{x}, x_0)$ is the least significant bit of $f(\bar{x}, x_0)$. If the function $f$ is picked with probability $q_f$, the density operator of registers $XF$, after the coherent evaluation, is $\sigma_{XF} = \sum_f q_f |\psi_f\rangle\langle\psi_f|$. It can be shown that the reduced density operator $\sigma_{X_0 F_0} = \mathrm{Tr}_{\bar{X}\bar{F}} \, \sigma_{XF}$ is bounded as

\[
\left\| \sigma_{X_0 F_0} - \tfrac{1}{4} \mathbb{1} \right\|_1 \le \left| w_0 - \tfrac{1}{2} \right| + \left| w_1 - \tfrac{1}{2} \right| + q,
\]

where

\[
w_0 = 2^{-(n-1)} \sum_{\bar{x}} \langle f_0(\bar{x}, 0) \rangle_f, \qquad
w_1 = 2^{-(n-1)} \sum_{\bar{x}} \langle f_0(\bar{x}, 1) \rangle_f,
\]

and

\[
q = 2^{-(n-1)} \left\langle \left| \{ \bar{x} : \bar{f}(\bar{x}, 1) = \bar{f}(\bar{x}, 0) \} \right| \right\rangle_f,
\]

with $\langle \cdot \rangle_f$ denoting the average over $f$ with weights $q_f$. As an example, if we pick $f$ uniformly over all functions $f : \{0,1\}^n \to \{0,1\}^s$, then $w_0 = w_1 = 1/2$ and $q = 2^{-(s-1)}$, i.e., the distance to the maximally mixed state is exponentially small in the number of bits in the output.

$^{5}$ This application of the iid limit means that we run the computer on several independent inputs and apply the erasure jointly on all registers. While this iid consideration gives bounds that hold asymptotically, better estimates for a finite number of runs can be obtained using our one-shot bound.

C. The role of the information preservation condition

The information preservation condition is crucial for the erasure process to maintain the correlations between the inputs and outputs of the computation. To see what we mean by this, imagine that we run the period finding algorithm over an ensemble of input functions, each with probability $p_f$, storing the input $f$ in an extra memory $I$. Similarly, we can imagine an output register $O$, where the computer stores the output. The final state of these registers after the execution of the algorithm is

\[
\rho_{IO} = \sum_f p_f \, |f\rangle\langle f|_I \otimes |\mathrm{output}(f)\rangle\langle \mathrm{output}(f)|_O
\]

(ideally, $\mathrm{output}(f)$ is the period of $f$). We want to make sure that if we insert an erasure procedure in the algorithm, as described here, the final state is not affected. To see that this is indeed the case, consider the state of the input register $I$ and the two computer registers $XF$ right before the erasure procedure,

\[
\rho_{IXF} = \sum_f p_f \, |f\rangle\langle f|_I \otimes |\psi_f\rangle\langle\psi_f|_{XF}, \tag{V.2}
\]

where $|\psi_f\rangle$ is as defined in equation V.1. The period extraction procedure can be described as a channel $\Phi_X$ acting on register $X$ alone, with an output in $O$. The joint state of the input and output memories can thus be written

\[
\rho_{IO} = (\mathcal{I}_I \otimes \Phi_X)(\mathrm{Tr}_F \, \rho_{IXF}), \tag{V.3}
\]

where the partial trace $\mathrm{Tr}_F$ denotes that we have discarded register $F$ (or simply ignored it), and $\mathcal{I}_I$ denotes the identity channel on register $I$.

An erasure of $F$ (or a part of $F$) conditioned on $X$ (or a part of $X$) leads to a new state $\rho'_{IXF}$. However, by the information preservation condition, it follows that $\mathrm{Tr}_F \, \rho'_{IXF} = \mathrm{Tr}_F \, \rho_{IXF}$. Note that $I$, and whatever parts of $X$ and $F$ do not take part in the erasure, together with a purifying system, take the role of the reference. Hence, the state $\rho_{IO}$, generated by the period extraction, is left unaffected.
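As a numerical check of the conditional entropy $H(F_0|X_0)$ quoted above, the following minimal sketch builds the two-qubit state $p\,|\chi\rangle\langle\chi| + (1-p)\,\mathbb{1}/4$, computes $H(X_0F_0) - H(X_0)$ from its eigenvalues, and compares the result with the closed-form expression. The function names, the basis ordering $|x_0 f_0\rangle$, and the conversion to joules with the factor $kT \ln 2$ (taken from the main theorem, with the small additive term ignored) are assumptions made for illustration; they are not part of the original text.

```python
import numpy as np

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def rho_x0f0(p):
    """Return p|chi><chi| + (1-p) I/4 on qubits X0, F0 (basis order |x0 f0>)."""
    chi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2.0)
    return p * np.outer(chi, chi) + (1.0 - p) * np.eye(4) / 4.0


def entropy_bits(rho):
    """von Neumann entropy in bits, with the convention 0 log 0 = 0."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]  # drop numerically zero eigenvalues
    return float(-np.sum(evals * np.log2(evals)))


def cond_entropy_numeric(p):
    """H(F0|X0) = H(X0 F0) - H(X0), computed from the density matrix."""
    rho = rho_x0f0(p)
    # Partial trace over F0: indices are (x0, f0, x0', f0'); contract f0 with f0'.
    rho_x0 = np.trace(rho.reshape(2, 2, 2, 2), axis1=1, axis2=3)
    return entropy_bits(rho) - entropy_bits(rho_x0)


def _xlog2x(x):
    """x * log2(x), extended by continuity to x = 0."""
    return 0.0 if x <= 0.0 else float(x * np.log2(x))


def cond_entropy_closed_form(p):
    """1 - (1/4)(1+3p) log2(1+3p) - (3/4)(1-p) log2(1-p)."""
    return 1.0 - 0.25 * _xlog2x(1.0 + 3.0 * p) - 0.75 * _xlog2x(1.0 - p)


def erasure_work_joules(p, temperature_kelvin):
    """Hypothetical per-run work cost H(F0|X0) * k_B * T * ln 2 (iid limit)."""
    return cond_entropy_closed_form(p) * K_B * temperature_kelvin * np.log(2.0)


if __name__ == "__main__":
    for p in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(p, cond_entropy_numeric(p), cond_entropy_closed_form(p))
```

Under these assumptions the two evaluations agree, the value is 1 bit at $p = 0$ and $-1$ bit at $p = 1$, and it is already slightly negative at $p = 3/4$, so `erasure_work_joules` returns a negative number there, which corresponds to a work gain.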

[1] Leff, H. S. & Rex, A. F. Maxwell's demon: Entropy, information, computing (Taylor and Francis, 1990).
[2] Leff, H. S. & Rex, A. F. Maxwell's demon 2: Entropy, classical and quantum information, computing (Taylor and Francis, 2002).
[3] Plenio, M. B. & Vitelli, V. The physics of forgetting: Landauer's erasure principle and information theory. Contemporary Physics 42, 25–60 (2001).
[4] Maruyama, K., Nori, F. & Vedral, V. Colloquium: The physics of Maxwell's demon and information. Reviews of Modern Physics 81, 1–23 (2009).
[5] Earman, J. & Norton, J. D. Exorcist XIV: The wrath of Maxwell's demon. Part I. From Maxwell to Szilard. Studies in History and Philosophy of Modern Physics 29, 435–471 (1998).
[6] Earman, J. & Norton, J. D. Exorcist XIV: The wrath of Maxwell's demon. Part II. From Szilard to Landauer and beyond. Studies in History and Philosophy of Modern Physics 30, 1–40 (1999).
[7] Bub, J. Maxwell's demon and the thermodynamics of computation. Studies in History and Philosophy of Modern Physics 32, 569–579 (2001).
[8] Bennett, C. H. Notes on Landauer's principle, reversible computation and Maxwell's demon. Studies in History and Philosophy of Modern Physics 34, 501–510 (2003).
[9] Landauer, R. Dissipation and heat generation in the computing process. IBM Journal of Research and Development 5, 148–156 (1961).
[10] Szilard, L. Über die Entropieverminderung in einem thermodynamischen System bei Eingriffen intelligenter Wesen. Zeitschrift für Physik 53, 840–856 (1929).
[11] Shizume, K. Heat generation required by information erasure. Physical Review E 52, 3495–3499 (1995).
[12] Piechocinska, B. Information erasure. Physical Review A 61, 062314 (2000).
[13] Janzing, D., Wocjan, P., Zeier, R., Geiss, R. & Beth, T. Thermodynamic cost of reliability and low temperatures: Tightening Landauer's principle and the second law. International Journal of Theoretical Physics 39, 2717–2753 (2000).
[14] Parrondo, J. M. R. The Szilard engine revisited: Entropy, macroscopic randomness, and symmetry breaking phase transitions. Chaos 11, 725–736 (2001).
[15] Allahverdyan, A. E., Balian, R. & Nieuwenhuizen, T. M. Maximal work extraction from finite quantum systems. Europhysics Letters 67, 565–571 (2004).
[16] Alicki, R., Horodecki, M., Horodecki, P. & Horodecki, R. Thermodynamics of quantum information systems - Hamiltonian description. Open Systems & Information Dynamics 11, 205–217 (2004).
[17] Maroney, O. J. E. Generalizing Landauer's principle. Physical Review E 79, 031105 (2009).
[18] Hilt, S., Shabbir, S., Anders, J. & Lutz, E. Validity of Landauer's principle in the quantum regime (2010). URL http://arxiv.org/abs/1004.1599.
[19] Dahlsten, O., Renner, R., Rieper, E. & Vedral, V. The work value of information (2009). URL http://arxiv.org/abs/0908.0424.
[20] Oppenheim, J., Horodecki, M., Horodecki, P. & Horodecki, R. Thermodynamical approach to quantifying quantum correlations. Physical Review Letters 89, 180402 (2002).
[21] Zurek, W. H. Quantum discord and Maxwell's demons. Physical Review A 67, 012320 (2003).
[22] Jennings, D. & Rudolph, T. Entanglement and the thermodynamic arrow of time. Physical Review E 81, 061130 (2010).
[23] Renner, R. Security of quantum key distribution. Ph.D. thesis, ETH Zurich (2005). URL http://arxiv.org/abs/quant-ph/0512258.
[24] Lieb, E. H. & Ruskai, M. B. A fundamental property of quantum-mechanical entropy. Physical Review Letters 30, 434–436 (1973).
[25] Lieb, E. H. Convex trace functions and the Wigner-Yanase-Dyson conjecture. Advances in Mathematics 11, 267–288 (1973).
[26] Lieb, E. H. & Ruskai, M. B. Proof of the strong subadditivity of quantum-mechanical entropy. Journal of Mathematical Physics 14, 1938–1941 (1973).
[27] Cover, T. M. & Thomas, J. A. Elements of Information Theory. Wiley Series in Telecommunications (Wiley-Interscience, 00).
[28] Schoenmakers, B., Tjoelker, J., Tuyls, P. & Verbitskiy, E. Smooth Rényi entropy of ergodic quantum information sources (2007). URL http://arxiv.org/abs/0704.3504.
[29] Bjelaković, I., Krüger, T., Siegmund-Schultze, R. & Szkoła, A. The Shannon-McMillan theorem for ergodic quantum lattice systems. Inventiones Mathematicae 155, 203–222 (2004).
[30] Jaynes, E. T. Gibbs vs Boltzmann entropies. American Journal of Physics 33, 391–398 (1965).
[31] Renes, J. M. & Renner, R. One-shot classical data compression with quantum side information and the distillation of common randomness or secret keys (2010). URL http://arxiv.org/abs/1008.0452.
[32] König, R., Renner, R. & Schaffner, C. The operational meaning of min- and max-entropy. IEEE Transactions on Information Theory 55, 4337–4347 (2009).
[33] Datta, N. & Renner, R. Smooth Rényi entropies and the quantum information spectrum. IEEE Transactions on Information Theory 55, 2807–2815 (2009).
[34] Tomamichel, M., Colbeck, R. & Renner, R. Duality between smooth min- and max-entropies. IEEE Transactions on Information Theory 56, 4674–4681 (2010).
[35] Renner, R. Symmetry of large physical systems implies independence of subsystems. Nature Physics 3, 645–649 (2007).
[36] Tomamichel, M., Colbeck, R. & Renner, R. A fully quantum asymptotic equipartition property. IEEE Transactions on Information Theory 55, 5840–5847 (2009).
[37] Horodecki, M., Oppenheim, J. & Winter, A. Partial quantum information. Nature 436, 673–676 (2005).
[38] Dupuis, F. The decoupling approach to quantum information theory. Ph.D. thesis, Université de Montréal (2009). URL http://arxiv.org/abs/1004.1641.
[39] Dupuis, F., Berta, M., Wullschleger, J. & Renner, R. The decoupling theorem (2010). URL http://arxiv.org/abs/1012.6044.
[40] Nielsen, M. A. & Chuang, I. L. Quantum Computation and Quantum Information (Cambridge University Press, 2000).
[41] Shor, P. Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM Journal on Computing 26, 1484–1509 (1997).