Quantum Statistics - First Steps

Michael Nussbaum (Dept. of Mathematics, Cornell University)

November 30, 2007

Abstract

We will try an elementary introduction to quantum probability and statistics, bypassing the physics in a rapid first glance. From a formal point of view, classical probability distributions on finite sets are replaced by matrices, and certain operations on these matrices (called measurements) give rise to random variables. An important class of measurements are projection operators. It will be seen that quantum statistics has a distinct "linear algebra flavor". We will then describe the problem of quantum hypothesis testing and its relation to optimal measurements, and the quantum version of the Neyman-Pearson lemma. We will conclude with the quantum analog of a classical large sample result for Bayesian testing.

1 Mathematical setting

This discussion focuses on the aspect of generalizing classical probability. For an introduction with more physical background, see e.g. Gill [3].

A. Finite probability spaces and random variables. Let us first recall some basic and simple facts about classical probability. Consider a finite sample space $S$, which for convenience we take to be the numbers $S = \{1,\dots,k\}$. Assume a probability distribution $P$ given by $p_j$, $j = 1,\dots,k$ ($\sum_{j=1}^k p_j = 1$, $p_j \ge 0$). Together they form a finite probability space $(S, P)$. A random variable is a map $X : S \to \mathbb{R}$ taking real values $X(j) = x_j$, $j = 1,\dots,k$. If $X$ is one-to-one then
$$\Pr(X = x_j) = p_j; \tag{1}$$
in general $X$ takes the value $x_j$ with probability
$$\Pr(X = x_j) = \sum_{i:\,X(i) = x_j} p_i.$$
In any case, the expectation of $X$ under $P$ is
$$E_P X = \sum_{j=1}^k X(j)\, p_j.$$
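The classical setup can be sketched in a few lines of code (a minimal sketch; the distribution and the values $x_j$ are arbitrary choices of ours):

```python
import numpy as np

# A finite probability space (S, P) with S = {1, ..., k}, here k = 3.
p = np.array([0.5, 0.3, 0.2])          # p_j >= 0, summing to 1
assert np.isclose(p.sum(), 1.0)

# A random variable X : S -> R, given by its values x_j = X(j).
x = np.array([1.0, -1.0, 1.0])         # not one-to-one: X(1) = X(3)

# Pr(X = v) sums p_i over all i with X(i) = v.
def prob(v):
    return p[x == v].sum()

# Expectation E_P X = sum_j X(j) p_j.
E = np.dot(x, p)

print(prob(1.0))   # 0.5 + 0.2 = 0.7
print(E)           # ~ 0.4
```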
B. First generalization. Quantum probability uses complex numbers, but we start with a simplified version using only real numbers.

States. The state of a physical system is a symmetric nonnegative definite $k \times k$ matrix $\rho = (\rho_{ij})_{i,j=1}^k$ having trace 1:
$$\mathrm{Tr}[\rho] := \sum_{i=1}^k \rho_{ii} = 1.$$
Recall that $\rho$ is symmetric if $\rho = \rho^\top$ and nonnegative definite if for any $x \in \mathbb{R}^k$ we have $x^\top \rho x \ge 0$. For brevity we call such matrices positive (and strictly positive if $x^\top \rho x > 0$ for all $x \ne 0$). Recall the spectral decomposition of a symmetric matrix:
$$\rho = \sum_{j=1}^k \lambda_j e_j e_j^\top \tag{2}$$
where $e_1,\dots,e_k$ is an orthonormal basis of $\mathbb{R}^k$; the $e_j$ are the eigenvectors pertaining to the real eigenvalues $\lambda_j$. Another way of writing (2) is $\rho = C \Lambda C^\top$, where $\Lambda$ is the diagonal matrix with diagonal elements $\lambda_j$ and $C$ is the $k \times k$ matrix having $e_1,\dots,e_k$ as columns. Then $C$ is an orthogonal matrix, i.e. $C^\top C = I$. The eigenvectors $e_j$ are uniquely determined (up to sign) if all eigenvalues $\lambda_j$ are different. When all eigenvalues are the same, $\lambda_j = \lambda$, then $\rho = \lambda I$ and $e_1,\dots,e_k$ can be chosen as any orthonormal basis. If there are only two different eigenvalues, $\lambda_0$ and $\lambda_1$ say, then there are two eigenspaces (linear subspaces of $\mathbb{R}^k$) within which the two bases can be chosen arbitrarily.

Measurements. Suppose we have different mechanisms which from a given state $\rho$ generate different probability distributions. Such a mechanism is called a measurement. A measurement $M$ is defined to be a symmetric $k \times k$ matrix (note that it is not required to be positive). Let $M = \sum_{j=1}^k x_j m_j m_j^\top$ be the spectral decomposition of $M$; the eigenvalues $x_1,\dots,x_k$ can be any real numbers. We postulate that $M$ generates a random variable with values in $\{x_1,\dots,x_k\}$ in the following way.

Definition 1. The random variable $X_M$ generated by the measurement $M = \sum_{j=1}^k x_j m_j m_j^\top$ takes real values $x_1,\dots,x_k$. If the $x_j$ are all different then $\Pr(X_M = x_j) = m_j^\top \rho\, m_j$. In general
$$\Pr(X_M = x_j) = \sum_{i:\,x_i = x_j} m_i^\top \rho\, m_i. \tag{3}$$
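Definition 1 can be checked numerically (a sketch; the state and the measurement are random choices of ours). It also anticipates the trace rule $E_\rho X_M = \mathrm{Tr}[\rho M]$ derived next:

```python
import numpy as np

rng = np.random.default_rng(0)

# A state: symmetric, positive, trace 1.
A = rng.standard_normal((3, 3))
rho = A @ A.T
rho /= np.trace(rho)

# A measurement: any symmetric matrix, with eigenvalues x_j, eigenvectors m_j.
M = rng.standard_normal((3, 3))
M = (M + M.T) / 2
xs, m = np.linalg.eigh(M)              # columns m[:, j] are the m_j

# Definition 1: Pr(X_M = x_j) = m_j^T rho m_j (the x_j are distinct here).
probs = np.array([m[:, j] @ rho @ m[:, j] for j in range(3)])
print(probs)                           # nonnegative
print(probs.sum())                     # 1.0

# Trace rule: E_rho X_M = Tr[rho M].
E = probs @ xs
print(np.isclose(E, np.trace(rho @ M)))   # True
```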
Let us check that this indeed gives a probability distribution. First, $m_j^\top \rho\, m_j \ge 0$ since $\rho$ is positive. Furthermore,
$$\sum_{j=1}^k m_j^\top \rho\, m_j = 1.$$

Proof. To see this, note that for any $a, b \in \mathbb{R}^k$ we may write, for the $k \times k$ matrix $ab^\top$,
$$\mathrm{Tr}[ab^\top] = \sum_{j=1}^k a_j b_j = a^\top b.$$
Setting $a = m_j$, $b = \rho m_j$ we obtain
$$m_j^\top \rho\, m_j = \mathrm{Tr}\big[m_j (\rho m_j)^\top\big] = \mathrm{Tr}\big[\rho\, m_j m_j^\top\big].$$
Also, the trace operation is linear on matrices $A, B$: $\mathrm{Tr}[A] + \mathrm{Tr}[B] = \mathrm{Tr}[A + B]$. Hence
$$\sum_{j=1}^k \mathrm{Tr}\big[\rho\, m_j m_j^\top\big] = \mathrm{Tr}\Big[\rho \sum_{j=1}^k m_j m_j^\top\Big].$$
Now $\sum_{j=1}^k m_j m_j^\top = I$, since the left side is a spectral decomposition of the unit matrix $I$ (recall that any orthonormal system can be chosen for $I$ and the eigenvalues are all 1). Hence the right side above is $\mathrm{Tr}[\rho] = 1$ by the assumption on the state $\rho$. □

As a consequence we may write the expectation of $X_M$ under $\rho$ as
$$E_\rho X_M = \sum_{j=1}^k x_j\, m_j^\top \rho\, m_j.$$
Applying the same reasoning as in the proof above, now including the real numbers $x_j$, we may write
$$\sum_{j=1}^k x_j\, m_j^\top \rho\, m_j = \mathrm{Tr}\Big[\rho \sum_{j=1}^k x_j m_j m_j^\top\Big] = \mathrm{Tr}[\rho M].$$
Thus we have shown

Proposition 2 (Trace rule). The random variable $X_M$ defined above from the measurement $M$ has expectation
$$E_\rho X_M = \mathrm{Tr}[\rho M].$$

Note that we write $E_\rho$ for the expectation, i.e. "expectation under the state $\rho$". Different measurements $M$ give different distributions of the random variable $X_M$. In statistics, to
discriminate between two possible states, one now has to select a measurement first, from which one obtains two different distributions of the random variable $X_M$. Then one has to discriminate between these two distributions by means of a classical test.

Classical probability as a special case. Suppose our state $\rho$ is a diagonal matrix
$$\rho = \begin{pmatrix} \rho_{11} & & 0 \\ & \ddots & \\ 0 & & \rho_{kk} \end{pmatrix}.$$
Setting $p_j = \rho_{jj}$ we obtain $\sum_{j=1}^k p_j = \mathrm{Tr}[\rho] = 1$ and $p_j = \eta_j^\top \rho\, \eta_j \ge 0$, where $\eta_j^\top = (0,\dots,1,\dots,0)$ with the 1 at the $j$-th position. Thus a diagonal state $\rho$ gives a classical probability distribution $p_j$, $j = 1,\dots,k$. Suppose also that we admit only one kind of measurement: we fix the orthonormal basis $\eta_1,\dots,\eta_k$ and allow all measurements $M = \sum_{j=1}^k x_j \eta_j \eta_j^\top$ where the real values $x_1,\dots,x_k$ are arbitrary. Then $M$ is also a diagonal matrix
$$M = \begin{pmatrix} x_1 & & 0 \\ & \ddots & \\ 0 & & x_k \end{pmatrix}.$$
By Definition 1 we obtain a collection of random variables $X_M$ where
$$\Pr(X_M = x_j) = \eta_j^\top \rho\, \eta_j = p_j,$$
i.e. we have reproduced (1). That means we have obtained all random variables on a given probability space $(S, P)$, where $P$ is the measure $p_1,\dots,p_k$ and $S = \{1,\dots,k\}$.

It turns out that fixing the set of eigenvectors of the measurement $M$ reduces the setup to the classical one. It can be seen that we may fix any other set of eigenvectors $m_1,\dots,m_k$ of the measurement and fix an arbitrary state $\rho$, and in this way obtain all random variables on a given probability space $(S, P')$. Here $P'$ is the measure $p'_j = m_j^\top \rho\, m_j$, $j = 1,\dots,k$. In the physical context, the set of eigenvectors $m_1,\dots,m_k$ of the measurement determines "directions" or "angles" in which one measures. Changing the angle means changing the underlying probability space.

An example: qbits. A qbit (quantum bit) is a $2 \times 2$ state. (Recall however that we are not yet in "true" quantum probability, which requires use of complex numbers.) Such states generalize Bernoulli distributions: any random variable $X_M$ from a measurement $M$ can take at most two different values $x_1, x_2$.
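The change-of-basis construction can be checked numerically: fixing an orthonormal eigenbasis $m_1,\dots,m_k$ and a state $\rho$ yields the classical measure $p'_j = m_j^\top \rho\, m_j$ (a sketch; the state and the rotation angle are arbitrary choices of ours):

```python
import numpy as np

# Fix a state rho (not necessarily diagonal) ...
rho = np.array([[0.7, 0.2],
                [0.2, 0.3]])
assert np.isclose(np.trace(rho), 1.0)

# ... and an orthonormal system m_1, m_2 (columns: a rotation by phi).
phi = 0.3
m = np.array([[np.cos(phi), -np.sin(phi)],
              [np.sin(phi),  np.cos(phi)]])

# Every measurement with this eigenbasis yields the same classical
# measure p'_j = m_j^T rho m_j on S = {1, 2}; only the values x_j vary.
p_prime = np.array([m[:, j] @ rho @ m[:, j] for j in range(2)])
print(p_prime, p_prime.sum())   # a probability distribution, sum 1.0
```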
Consider the qbit
$$\rho = \begin{pmatrix} 1 - \varepsilon & 0 \\ 0 & \varepsilon \end{pmatrix},$$
which is diagonal.

a) If we admit only diagonal measurements then we obtain all functions of a Bernoulli random variable with distribution $\mathrm{Bern}(\varepsilon)$. Indeed, suppose we measure with the system of eigenvectors $(1, 0)$, $(0, 1)$. Then we obtain the probability space $(S, P)$ with $S = \{1, 2\}$ and $P = \mathrm{Bern}(\varepsilon)$ (identifying formally $S$ with $\{0, 1\}$).

b) If we admit only measurements with system of eigenvectors $m_1, m_2$ where $m_1^\top = (\cos\varphi, \sin\varphi)$,
$m_2^\top = (-\sin\varphi, \cos\varphi)$, then we obtain the probability space $(S, P')$ with $S = \{1, 2\}$ and $P'$ given by
$$p'_1 = m_1^\top \rho\, m_1 = (1 - \varepsilon)\cos^2\varphi + \varepsilon \sin^2\varphi,$$
$$p'_2 = m_2^\top \rho\, m_2 = (1 - \varepsilon)\sin^2\varphi + \varepsilon \cos^2\varphi.$$
We may choose $\varphi = \pi/4$, so that $\cos\varphi = \sin\varphi = 1/\sqrt{2}$; then
$$p'_1 = 1/2, \qquad p'_2 = 1/2,$$
which means we have the uniform distribution $\mathrm{Bern}(1/2)$.

Here is a small bit of physical background: the famous Stern-Gerlach experiment (1922) on the deflection of particles was used to show how the angle $\varphi$ in which a certain binary outcome (let us call it "spin up or down") is measured affects the probability distribution of the outcome. For one angle a uniform distribution resulted, for another angle a non-uniform distribution like $(1 - \varepsilon, \varepsilon)$. Observe that in our example it is possible that $\varepsilon = 0$: then the first measurement results in a probability distribution $(1, 0)$ on the set of outcomes $S$, while the second measurement still results in the uniform distribution $(1/2, 1/2)$. This is even more baffling: an outcome which appears deterministic when measured in one angle appears random with uniform distribution when measured in another angle.

Projection measurements and hypothesis testing. In the last example we have seen measurements with only two possible outcomes. Such measurements can be made on arbitrary states $\rho$, not just qbits. In that case one uses projection measurements. Recall that a projection matrix is a symmetric matrix $M$ which has only eigenvalues $x_j = 0$ or $x_j = 1$. Thus
$$M = \sum_{j=1}^k x_j m_j m_j^\top = \sum_{j \in \mu} m_j m_j^\top$$
where $\mu$ is a subset of the indices $\{1,\dots,k\}$. If $\mu = \{1,\dots,k\}$ then $M = I$, and if $\mu = \{j\}$ then $M = m_j m_j^\top$ is a projector of rank one. In every case we see that $M$ fulfills $MM = M$ and $Mx = x$ for all $x$ in the linear subspace spanned by $\{m_j,\ j \in \mu\}$. That subspace is called the eigenspace of $M$; within the eigenspace we can freely change the basis from $\{m_j,\ j \in \mu\}$ to $\{\tilde m_j,\ j \in \mu\}$ and still obtain
$$M = \sum_{j \in \mu} \tilde m_j \tilde m_j^\top.$$
Also, $I - M$ is again a projection, and projects onto the orthogonal complement of the eigenspace of $M$.

Projections are the measurements used in quantum hypothesis testing; they come up in the following way. Suppose we have two states $\rho, \sigma$ and we wish to measure and then decide which one of the two states is the true one. So one has to produce a random variable $X_M$ by measurement and then decide according to the outcome. We have to select $M$ not knowing which state is the true one. Suppose we have selected an arbitrary $M = \sum_{j=1}^k x_j m_j m_j^\top$; according to Definition 1 we obtain two probability distributions
$$\Pr(X_M = x_j \mid \rho) = m_j^\top \rho\, m_j =: p_j, \qquad \Pr(X_M = x_j \mid \sigma) = m_j^\top \sigma\, m_j =: q_j, \qquad j = 1,\dots,k$$
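The pair of distributions $(p_j)$, $(q_j)$ induced by one measurement on two states can be computed directly (a sketch; the states $\rho, \sigma$ and the measurement basis are arbitrary choices of ours, with distinct eigenvalues $x_j$ assumed):

```python
import numpy as np

rho   = np.diag([0.8, 0.2])                    # first candidate state
sigma = np.array([[0.5, 0.3],
                  [0.3, 0.5]])                 # second candidate state

# One measurement basis, used for both states (rotation by pi/4).
phi = np.pi / 4
m = np.array([[np.cos(phi), -np.sin(phi)],
              [np.sin(phi),  np.cos(phi)]])

p = np.array([m[:, j] @ rho   @ m[:, j] for j in range(2)])
q = np.array([m[:, j] @ sigma @ m[:, j] for j in range(2)])
print(p)   # distribution of X_M under rho:   [0.5 0.5]
print(q)   # distribution of X_M under sigma: [0.8 0.2]
```

Note that under this particular angle the state $\rho$ becomes indistinguishable from the uniform distribution, while $\sigma$ does not; the choice of measurement matters for how well the resulting classical test can discriminate.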
(assume that all $x_j$ are different; indeed we might want this in order to obtain maximal information). After this the problem becomes classical: find a test, i.e. a function $\varphi : \{x_1,\dots,x_k\} \to \{0, 1\}$, and decide "the true state is $\sigma$" if the random variable $\varphi(X_M)$ takes the value 1. Thus effectively we use the Bernoulli random variable $\varphi(X_M)$ for our decision. Its distribution is given by
$$\Pr(\varphi(X_M) = 1 \mid \rho) = \sum_{j:\,\varphi(x_j)=1} p_j = \sum_{j:\,\varphi(x_j)=1} m_j^\top \rho\, m_j = \mathrm{Tr}\Big[\rho \sum_{j:\,\varphi(x_j)=1} m_j m_j^\top\Big].$$
Now define the projection matrix
$$M_\varphi = \sum_{j:\,\varphi(x_j)=1} m_j m_j^\top;$$
then
$$\Pr(\varphi(X_M) = 1 \mid \rho) = \mathrm{Tr}[\rho M_\varphi], \qquad \Pr(\varphi(X_M) = 1 \mid \sigma) = \mathrm{Tr}[\sigma M_\varphi]$$
and as a consequence
$$\Pr(\varphi(X_M) = 0 \mid \rho) = \mathrm{Tr}[\rho (I - M_\varphi)], \qquad \Pr(\varphi(X_M) = 0 \mid \sigma) = \mathrm{Tr}[\sigma (I - M_\varphi)],$$
where $I - M_\varphi$ is also a projection. Thus we can identify a quantum test with a projection matrix $M_\varphi$. Here $M_\varphi$ can be of any rank and pertain to any eigenspace; we have not fixed the classical test $\varphi$ (we have even included the constants $\varphi = 1$ and $\varphi = 0$) nor the vector basis $m_1,\dots,m_k$. The last four displays determine the error probabilities: the error of the first kind is
$$\mathrm{Err}_1(M_\varphi) = \Pr(\varphi(X_M) = 1 \mid \rho) = \mathrm{Tr}[\rho M_\varphi]$$
and the error of the second kind is
$$\mathrm{Err}_2(M_\varphi) = \Pr(\varphi(X_M) = 0 \mid \sigma) = \mathrm{Tr}[\sigma (I - M_\varphi)].$$

C. Second generalization: actual quantum probability. Before we come to the problem of finding the best test between $\rho$ and $\sigma$, let us pay tribute to the fact that quantum probability uses complex numbers.

States. The state of a physical system is a self-adjoint positive $k \times k$ matrix $\rho = (\rho_{ij})_{i,j=1}^k$ with complex elements having trace 1:
$$\mathrm{Tr}[\rho] := \sum_{i=1}^k \rho_{ii} = 1.$$
Recall that $\rho$ is self-adjoint (or Hermitian) if $\rho = \rho^*$, where $\rho^*$ is the complex conjugate transpose: take the transpose $\rho^\top$ first and then the complex conjugates of all elements (or
the other way round). Also, $\rho$ positive means that for any complex vector $x \in \mathbb{C}^k$ we have $x^* \rho x \ge 0$. It is well known that Hermitian matrices have a spectral decomposition analogous to that of real symmetric ones:
$$\rho = \sum_{j=1}^k \lambda_j e_j e_j^* \tag{4}$$
where $e_1,\dots,e_k$ is an orthonormal basis of $\mathbb{C}^k$; the $e_j$ are the eigenvectors pertaining to the real eigenvalues $\lambda_j$. In fact it can be shown that all eigenvalues of a self-adjoint $\rho$ must be real, and the same is true for all diagonal elements $\rho_{jj}$: if $\bar\lambda_j$ is the complex conjugate of $\lambda_j$ and $e_j$ a corresponding unit eigenvector, then
$$\bar\lambda_j = \overline{e_j^* \rho\, e_j} = (e_j^* \rho\, e_j)^* = e_j^* \rho^* e_j = e_j^* \rho\, e_j = \lambda_j$$
(here we used $(AB)^* = B^* A^*$ and $(A^*)^* = A$ for any complex matrices, $\rho^* = \rho$ for self-adjoint $\rho$, and the fact that the adjoint of the scalar $e_j^* \rho\, e_j$ is its complex conjugate). Hence $\bar\lambda_j = \lambda_j$ and $\lambda_j$ is real. Thus $\mathrm{Tr}[\rho]$ is always real and can be set to 1. Basically everything is analogous to the real case if $\mathbb{R}^k$ is replaced by $\mathbb{C}^k$ and the transpose $A^\top$ of any matrix (and any vector) is replaced by the complex conjugate transpose (or adjoint) $A^*$.

Measurements. A measurement $M$ is defined to be a self-adjoint $k \times k$ matrix (again not required to be positive). Let $M = \sum_{j=1}^k x_j m_j m_j^*$ be the spectral decomposition of $M$; the eigenvalues $x_1,\dots,x_k$ can be any real numbers. We postulate that $M$ generates a real random variable $X_M$ with values in $\{x_1,\dots,x_k\}$ as before: if the $x_j$ are all different then
$$\Pr(X_M = x_j \mid \rho) = m_j^* \rho\, m_j.$$
As above it is seen that all $m_j^* \rho\, m_j$ are real, nonnegative, and sum to one. The trace rule then holds:
$$E_\rho X_M = \mathrm{Tr}[\rho M],$$
and again classical probability is a special case, since the whole "simplified" setup above with real states $\rho$ is a special case. But we may also fix any basis $m_1,\dots,m_k$ in $\mathbb{C}^k$ and limit ourselves to measurements $\sum_{j=1}^k x_j m_j m_j^*$; this also gives a classical probability space.

2 Quantum Neyman-Pearson lemma

Again consider hypothesis testing between states $\rho$ and $\sigma$, but now these are states with complex elements. As above it can be shown that the procedure of obtaining a real-valued r.v.
by measurement $M$ and then applying classical testing is equivalent to generating a Bernoulli random variable $\varphi(X_M) =: \varphi$ using a projection measurement $M$, and the error probabilities are
$$\mathrm{Err}_1(M) = \Pr(\varphi = 1 \mid \rho) = \mathrm{Tr}[\rho M], \qquad \mathrm{Err}_2(M) = \Pr(\varphi = 0 \mid \sigma) = \mathrm{Tr}[\sigma (I - M)].$$
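Everything carries over verbatim once transposes become conjugate transposes; a quick numerical check with complex entries (a sketch; the state and measurement are random choices of ours):

```python
import numpy as np

rng = np.random.default_rng(1)

# A random self-adjoint positive state with complex entries, trace 1.
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
rho = B @ B.conj().T
rho /= np.trace(rho).real

# A self-adjoint (Hermitian) measurement; eigh returns real eigenvalues x_j.
H = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
M = (H + H.conj().T) / 2
xs, m = np.linalg.eigh(M)

# The probabilities m_j^* rho m_j are real, nonnegative and sum to 1,
# and the trace rule E_rho X_M = Tr[rho M] still holds.
probs = np.array([(m[:, j].conj() @ rho @ m[:, j]).real for j in range(3)])
E = probs @ xs
print(probs.sum())                                  # 1.0
print(np.isclose(E, np.trace(rho @ M).real))        # True
```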
In what follows a quantum test is a complex projection matrix $M$, i.e. $M$ is self-adjoint and has only eigenvalues 0 and 1. Suppose we have prior probabilities $1 - \pi, \pi$ on the states $\rho, \sigma$, i.e. $\pi$ is the a priori probability that the true state is $\sigma$. The Bayesian error probability is
$$\mathrm{Err}(M) = (1 - \pi)\,\mathrm{Err}_1(M) + \pi\,\mathrm{Err}_2(M).$$
To find the best (Bayesian) test, let us define for any self-adjoint matrix $A$ the expression $\mathrm{supp}\, A_+$: if $A = \sum_{j=1}^k \alpha_j a_j a_j^*$ is a spectral decomposition, then
$$\mathrm{supp}\, A_+ = \sum_{j:\,\alpha_j > 0} a_j a_j^*.$$
It is obvious that $\mathrm{supp}\, A_+$ is always a projection (and independent of the choice of the basis $a_j$ if this choice is not unique). This projection is called the support projection for the positive part $A_+$ of $A$ (with the obvious definition $A_+ = \sum_{j:\,\alpha_j > 0} \alpha_j a_j a_j^*$). If $A$ is strictly positive then $\mathrm{supp}\, A_+$ is trivial: $\mathrm{supp}\, A_+ = I$.

Theorem 3 (Holevo-Helstrom). Suppose $0 < \pi < 1$. All tests $M$ fulfill
$$\mathrm{Err}(M) \ge \mathrm{Err}(R)$$
where $R$ is the test $R = \mathrm{supp}\,(\pi\sigma - (1 - \pi)\rho)_+$.

Note that the matrix $\pi\sigma - (1 - \pi)\rho$ is self-adjoint but not positive (a difference of two positive matrices). Thus $R$ is not trivial. However, if both $\rho, \sigma$ are diagonal then $R$ corresponds to the Bayesian likelihood ratio test: $R$ is diagonal with diagonal elements
$$r_{ii} = \mathbf{1}\{\pi\sigma_{ii} - (1 - \pi)\rho_{ii} > 0\}, \qquad i = 1,\dots,k,$$
which in the case $\pi = 1/2$ reduces to
$$r_{ii} = \mathbf{1}\{\sigma_{ii} > \rho_{ii}\}, \qquad i = 1,\dots,k.$$
However, in the general case, even when $\pi = 1/2$, the best test $R$ does not have such an explicit expression: then
$$R = \mathrm{supp}\,(\sigma - \rho)_+,$$
which cannot in general be expressed in terms of the two eigenbases involved (those of $\rho$ and of $\sigma$). In fact the eigenbasis of $\sigma - \rho$ is neither that of $\rho$ nor that of $\sigma$, and no explicit expression is known. Facts like this, i.e. problems associated with the behaviour of eigenbases of composed matrices, make up much of the challenge of quantum statistics.

Proof. Write the error probability
$$\mathrm{Err}(M) = (1 - \pi)\,\mathrm{Tr}[\rho M] + \pi\,\mathrm{Tr}[\sigma (I - M)] = \pi + \mathrm{Tr}[((1 - \pi)\rho - \pi\sigma) M] = \pi - \mathrm{Tr}[(\pi\sigma - (1 - \pi)\rho) M].$$
Thus to minimize $\mathrm{Err}(M)$ we have to maximize $\mathrm{Tr}[(\pi\sigma - (1 - \pi)\rho) M]$ over projections $M$. Let $A = \sum_{j=1}^k \alpha_j a_j a_j^*$ be a spectral decomposition of
$$A = \pi\sigma - (1 - \pi)\rho;$$
then
$$\mathrm{Tr}[AM] = \sum_{j=1}^k \alpha_j\, a_j^* M a_j.$$
Now clearly the numbers $a_j^* M a_j$ are between 0 and 1 (indeed $a_j^* M a_j \ge 0$ since $M$ is positive, and $a_j^* M a_j \le 1$ since the largest eigenvalue of $M$ is one). This implies the above sum cannot exceed the sum of all positive $\alpha_j$, i.e.
$$\mathrm{Tr}[AM] \le \sum_{j:\,\alpha_j > 0} \alpha_j.$$
This upper bound is attained for
$$M = R = \sum_{j:\,\alpha_j > 0} a_j a_j^*;$$
indeed
$$\mathrm{Tr}[AR] = \sum_{j=1}^k \alpha_j\, a_j^* R\, a_j = \sum_{j=1}^k \alpha_j\, a_j^* \Big(\sum_{i:\,\alpha_i > 0} a_i a_i^*\Big) a_j = \sum_{j:\,\alpha_j > 0} \alpha_j. \qquad \square$$

3 Asymptotics for quantum hypothesis testing

What is the analog of $n$ i.i.d. data in the quantum setting? Consider the $n$-fold tensor product of a density matrix (or state) $\rho$, i.e. $\rho^{\otimes n}$. Recall that the tensor product of two matrices $A, B$ is given by
$$A \otimes B = \begin{pmatrix} a_{11} B & \cdots & a_{1k} B \\ \vdots & \ddots & \vdots \\ a_{k1} B & \cdots & a_{kk} B \end{pmatrix},$$
which is a matrix of dimension $k^2 \times k^2$. It can be shown that $\rho \otimes \rho$ is again a state, i.e. it is positive and has trace one (for that, check $\mathrm{Tr}[A \otimes B] = \mathrm{Tr}[A]\,\mathrm{Tr}[B]$). It can also be verified that if $\rho$ is diagonal and we limit ourselves to diagonal measurements, then the classical notion of product measure is obtained. Now for a large sample asymptotics with $n \to \infty$, one assumes the state is
$$\rho^{\otimes n} = \rho \otimes \rho \otimes \cdots \otimes \rho.$$
This is a $k^n \times k^n$ matrix. In the testing problem, we have to discriminate between $\rho^{\otimes n}$ and $\sigma^{\otimes n}$, using a test measurement $M$ on the whole system, i.e. $M$ is a $k^n \times k^n$ projection matrix
(a projection in $\mathbb{C}^{k^n}$). The error criterion for symmetric Bayesian hypothesis testing (with $\pi = 1/2$) is
$$\mathrm{Err}_n(M) = \frac{1}{2}\Big(\mathrm{Tr}\big[\rho^{\otimes n} M\big] + \mathrm{Tr}\big[\sigma^{\otimes n} (I - M)\big]\Big)$$
where $I$ is the identity operator in $\mathbb{C}^{k^n}$. According to Theorem 3, the best test is the Holevo-Helstrom projection
$$R_n = \mathrm{supp}\,\big(\sigma^{\otimes n} - \rho^{\otimes n}\big)_+.$$
If we are interested in the asymptotics of the error probability as $n \to \infty$, we are faced with the fact that the Holevo-Helstrom projection uses the eigenbasis of $\sigma^{\otimes n} - \rho^{\otimes n}$. As noted already, $\sigma^{\otimes n} - \rho^{\otimes n}$ has a completely different eigenbasis than either $\sigma^{\otimes n}$ or $\rho^{\otimes n}$. Also, this is a computation in $k^n$-dimensional space. The following generalizes a classical result on the asymptotics of the Bayesian error probability for $\pi = 1/2$, the Chernoff bound.

Theorem 4 (Quantum Chernoff Lower Bound). Let $\rho, \sigma$ be two $k \times k$ density matrices representing quantum states. Then any sequence of $k^n \times k^n$ test projections $M_n$, $n \in \mathbb{N}$, satisfies
$$\liminf_{n \to \infty} \frac{1}{n} \log \mathrm{Err}_n(M_n) \ge \inf_{0 \le s \le 1} \log \mathrm{Tr}\big[\rho^{1-s} \sigma^s\big]. \tag{5}$$

For the proof and further references see [1]. The lower bound is attainable, cf. [2]. For quantum computing, entanglement and paradoxes cf. [4].

References

[1] Nussbaum, M. and Szkoła, A. (2007). The Chernoff lower bound for symmetric quantum hypothesis testing. To appear, The Annals of Statistics. Available at www.minu.de/math/papers.

[2] Audenaert, K. M. R., Nussbaum, M., Szkoła, A. and Verstraete, F. (2007). Asymptotic error rates in quantum hypothesis testing. arXiv:0708.4282v1 [quant-ph]. To appear, Commun. Math. Phys.

[3] Gill, R. (2001). Asymptotics in quantum statistics. In: State of the Art in Probability and Statistics (A. W. van der Vaart, M. de Gunst, C. A. J. Klaassen, Eds.), IMS Lecture Notes - Monograph Series 36, 255-285. Also at arXiv:math/0405571v1.

[4] Nielsen, M. and Chuang, I. (2000). Quantum Computation and Quantum Information. Cambridge University Press.
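Appendix: a numerical illustration of Theorems 3 and 4 (a sketch only; the states, the prior $\pi = 1/2$, the grid minimization over $s$, and the small values of $n$ are our choices, not from [1] or [2]). We build the Holevo-Helstrom projection $R_n = \mathrm{supp}(\sigma^{\otimes n} - \rho^{\otimes n})_+$ directly in the $2^n$-dimensional space and compare $\frac{1}{n}\log \mathrm{Err}_n$ with the Chernoff exponent $\inf_{0 \le s \le 1} \log \mathrm{Tr}[\rho^{1-s}\sigma^s]$:

```python
import numpy as np

def supp_plus(A):
    """Projection onto the span of eigenvectors of A with positive eigenvalue."""
    vals, vecs = np.linalg.eigh(A)
    cols = vecs[:, vals > 0]
    return cols @ cols.conj().T

def tensor_power(A, n):
    """n-fold tensor (Kronecker) product A x A x ... x A."""
    out = np.array([[1.0]])
    for _ in range(n):
        out = np.kron(out, A)
    return out

rho   = np.diag([0.9, 0.1])
sigma = np.array([[0.5, 0.2],
                  [0.2, 0.5]])              # eigenvalues 0.7 and 0.3

def err_n(n):
    """Symmetric (pi = 1/2) Bayes error of the Holevo-Helstrom test R_n."""
    rn, sn = tensor_power(rho, n), tensor_power(sigma, n)
    R = supp_plus(sn - rn)
    return 0.5 * (np.trace(rn @ R) + np.trace(sn @ (np.eye(2 ** n) - R))).real

def chernoff_exponent():
    """inf over s in [0,1] of log Tr[rho^(1-s) sigma^s], on a grid."""
    def mat_power(A, t):
        v, w = np.linalg.eigh(A)
        return w @ np.diag(v ** t) @ w.conj().T
    ss = np.linspace(0.0, 1.0, 201)
    return min(np.log(np.trace(mat_power(rho, 1 - s) @ mat_power(sigma, s)).real)
               for s in ss)

cb = chernoff_exponent()
for n in (1, 2, 4, 6):
    # (1/n) log Err_n tends to the Chernoff exponent as n grows.
    print(n, np.log(err_n(n)) / n, cb)
```

The finite-$n$ values of $\frac{1}{n}\log \mathrm{Err}_n$ lie slightly below the exponent, consistent with the attainability upper bound $\mathrm{Err}_n \le \frac{1}{2}\big(\inf_s \mathrm{Tr}[\rho^{1-s}\sigma^s]\big)^n$ of [2], and close up on it as $n$ grows; the exponential blow-up of $2^n \times 2^n$ matrices also makes plain why direct computation with $R_n$ is infeasible for large $n$.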