Discrimination of two channels by adaptive methods and its application to quantum system

Masahito Hayashi

arXiv:0804.0686v1 [quant-ph] 4 Apr 2008

Abstract: The optimal exponential error rate for adaptive discrimination of two channels is discussed. In this problem, adaptive choice of the input signal is allowed. The problem is treated in several settings, and it is proved that adaptive choice does not improve the exponential error rate in any of them. These results are applied to quantum state discrimination.

Index Terms: Simple hypothesis testing, channel, discrimination, quantum state, one-way LOCC, active learning, experimental design, Stein's lemma, Chernoff bound, Hoeffding bound, Han-Kobayashi bound

I. INTRODUCTION

Discriminating two distributions is a fundamental problem in statistical inference. It can be regarded as simple hypothesis testing because both hypotheses consist of a single distribution. Stein, Chernoff [3], Hoeffding [16], and Han-Kobayashi [10] studied the asymptotic behavior of this problem when the number $n$ of independent and identically distributed observations is sufficiently large. They formulated the discrimination of two distributions as an optimization problem and derived the respective optimum values, e.g., the optimal exponential error rates. We call these optimum values the Stein bound, the Chernoff bound, the Hoeffding bound, and the Han-Kobayashi bound, respectively. Han [8], [9] later extended these results to the discrimination of two general sequences of distributions, including the Markovian case. Nagaoka-Hayashi [21] simplified Han's discussion and generalized Han's extension of the Han-Kobayashi bound.

In the present paper, we consider another extension of the above results: the discrimination of two (classical) channels, i.e., two given probabilistic transition matrices. Such a problem appeared in Blahut [2]. In this problem, the number of applications of the channel is fixed to a given constant $n$, and we can choose appropriate inputs for this purpose. We assume that the given channel is memoryless. If we use the same input for all applications of the given channel, the $n$ output data obey an independent and identical distribution. This property holds even if we choose each input randomly, subject to the same distribution on input signals. This strategy is called the non-adaptive method. In particular, when the same input is applied to all channel uses, it is called the deterministic non-adaptive method; when the input is determined stochastically, it is called the stochastic non-adaptive method, which was treated by Blahut [2]. In the non-adaptive method, our task is to choose the input that distinguishes the two channels most efficiently.

In the present paper, we assume that we can choose the $k$-th input signal based on the preceding $k-1$ output data. This strategy is called the adaptive method, and it is the main focus of the present paper. In parameter estimation, such an adaptive method improves estimation performance. That is, in one-parameter estimation, the asymptotic estimation error is bounded by the inverse of the optimum Fisher information. If we do not apply the adaptive method, it is generally impossible to attain the optimum Fisher information at all points simultaneously; it is known that the adaptive method attains the optimum Fisher information at all points [13], [7].
Therefore, one may expect that the adaptive method also improves the performance of discriminating two channels. As our main result, we prove that the adaptive method cannot improve on the non-adaptive method in the sense of all of the above-mentioned bounds, i.e., the Stein bound, the Chernoff bound, the Hoeffding bound, and the Han-Kobayashi bound. That is, there is no difference between the non-adaptive method and the adaptive method in these asymptotic formulations. Indeed, as is proven herein, the deterministic non-adaptive method gives the optimum performance with respect to the Stein bound, the Chernoff bound, and the Hoeffding bound. However, in order to attain the Han-Kobayashi bound, we need, in general, the stochastic non-adaptive method.

On the other hand, the research field of quantum information has treated the discrimination of two quantum states. Hiai-Petz [15] and Ogawa-Nagaoka [18] proved the quantum version of Stein's lemma. Audenaert et al. [1] and Nussbaum-Szkoła [23], [24] obtained the quantum version of the Chernoff bound. Ogawa-Hayashi [17] derived a lower bound on the quantum version of the Hoeffding bound; later, Hayashi [12] and Nagaoka [20] obtained the tight bound based on the results of Audenaert et al. [1] and Nussbaum-Szkoła [23], [24]. Hayashi [11] (p. 90) obtained the quantum version of the Han-Kobayashi bound based on Nagaoka's [19] discussion. These discussions assume that any measurement on the $n$-tensor product system is allowed for testing the given state. Hence, the next goal is the derivation of these bounds under locality restrictions on the possible measurements on an $n$-partite system. One easy setting is to restrict the measurement to be identical in each subsystem; in this case, our task is the choice of the optimal measurement on the single system. By considering the measurement and the quantum state as the input and the channel, respectively, we can treat this problem by the non-adaptive method for a classical channel. Another setting restricts our measurements to one-way local operations and classical communications (one-way LOCC). In the above correspondence, the one-way LOCC setting can be regarded as the adaptive method for a classical channel. Hence, applying the above argument to the discrimination of two quantum states, we conclude that one-way communication does not improve the discrimination of two quantum states in the respective asymptotic formulations.

Furthermore, the same problem appears in adaptive experimental design and active learning. In learning theory, we identify the given system by using the obtained sequence of input and output pairs. In particular, in active learning, we can choose the inputs using the preceding data. Hence, the present result indicates that active learning does not improve the performance of learning when the candidates for the unknown system are only two classical channels. In experimental design, we choose a suitable design of our experiment for inferring the unknown parameter; adaptive improvement of the design is allowed in adaptive experimental design. When the candidates for the unknown parameter are only two values, the obtained result applies: adaptive improvement of the design does not help.

The remainder of the present paper is organized as follows. Section II reviews the Stein bound, the Chernoff bound, the Hoeffding bound, and the Han-Kobayashi bound in the discrimination of two probability distributions. In Section III, we present our formulation and notation for the adaptive method in the discrimination of two (classical) channels, and state the adaptive-method versions of the Stein bound, the Chernoff bound, the Hoeffding bound, and the Han-Kobayashi bound, respectively. In Section IV, we consider a simple example in which the stochastic non-adaptive method is required for attaining the Han-Kobayashi bound. In Section V, we apply the present result to the discrimination of two quantum states by one-way LOCC. In Sections VI, VII, and VIII, we prove the adaptive-method versions of the Stein bound, the Chernoff bound, the Hoeffding bound, and the Han-Kobayashi bound, respectively.

(M. Hayashi is with the Graduate School of Information Sciences, Tohoku University, Aoba-ku, Sendai, 980-8579, Japan; e-mail: hayashi@math.is.tohoku.ac.jp)

II. DISCRIMINATION/SIMPLE HYPOTHESIS TESTING BETWEEN TWO PROBABILITY DISTRIBUTIONS

In preparation for the main topic, we review the simple hypothesis testing problem for the null hypothesis $H_0: P^n$ versus the alternative hypothesis $H_1: \bar{P}^n$, where $P^n$ and $\bar{P}^n$ are the $n$-fold independent and identical distributions of $P$ and $\bar{P}$, respectively, on the probability space $\mathcal{Y}$. The problem is to decide which hypothesis is true based on the $n$ outputs $y_1, \dots, y_n$. In the following, randomized tests are allowed as our decision. Hence, our decision method is described by a $[0,1]$-valued function $f$ on $\mathcal{Y}^n$: when we observe the outputs $y_1, \dots, y_n$, we accept the alternative hypothesis $\bar{P}$ with probability $f(y_1, \dots, y_n)$. We have two types of errors.
In the first type of error, the null hypothesis $P$ is rejected despite being correct; in the second type, the alternative $\bar{P}$ is rejected despite being correct. Hence, the first type of error probability is given by $\mathrm{E}_{P^n} f$, and the second type of error probability by $\mathrm{E}_{\bar{P}^n}(1-f)$, where $\mathrm{E}_P$ denotes expectation under the distribution $P$. In the following, we assume that

$$\Phi(s|\bar{P}\|P) := \int_{\mathcal{Y}} \Big(\frac{d\bar{P}}{dP}(y)\Big)^s P(dy) < \infty, \qquad \phi(s|\bar{P}\|P) := \log \Phi(s|\bar{P}\|P),$$

and that $\phi(s|\bar{P}\|P)$ is $C^2$-continuous. In the present paper, we choose the base of the logarithm to be $e$.

In the discrimination of two distributions, we treat the two types of error probabilities equally, and simply minimize the sum $\mathrm{E}_{P^n} f + \mathrm{E}_{\bar{P}^n}(1-f)$. Its optimal rate of exponential decrease is characterized by the Chernoff bound [3]:

$$C(P,\bar{P}) := \lim_{n\to\infty} \frac{-1}{n} \log\Big(\min_{f_n}\, \mathrm{E}_{P^n} f_n + \mathrm{E}_{\bar{P}^n}(1-f_n)\Big) = -\min_{0\le s\le 1} \phi(s|\bar{P}\|P).$$

In order to treat the two error probabilities asymmetrically, we often restrict the first type of error probability $\mathrm{E}_{P^n} f$ to be below a particular threshold $\epsilon$, and minimize the second type of error probability $\mathrm{E}_{\bar{P}^n}(1-f)$:

$$\beta_n(\epsilon) := \min_{f_n} \{\mathrm{E}_{\bar{P}^n}(1-f_n) \mid \mathrm{E}_{P^n} f_n \le \epsilon\}.$$

Then Stein's lemma holds: for $0 < \epsilon < 1$,

$$\lim_{n\to\infty} \frac{-1}{n} \log \beta_n(\epsilon) = D(P\|\bar{P}), \qquad (1)$$

where the relative entropy $D(P\|\bar{P})$ is defined by

$$D(P\|\bar{P}) := \int_{\mathcal{Y}} \log\frac{dP}{d\bar{P}}(y)\, P(dy).$$
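To make these quantities concrete, the following minimal numeric sketch (ours, not from the paper; the distributions and function names are illustrative) evaluates $\phi(s|\bar{P}\|P)$, the relative entropy, and the Chernoff bound for two finite distributions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def phi(s, P, Pbar):
    # phi(s|Pbar||P) = log sum_y P(y)^{1-s} * Pbar(y)^s
    return np.log(np.sum(P ** (1 - s) * Pbar ** s))

def rel_entropy(P, Pbar):
    # D(P||Pbar) = sum_y P(y) log(P(y)/Pbar(y))
    return float(np.sum(P * np.log(P / Pbar)))

P    = np.array([0.7, 0.3])   # hypothetical null distribution
Pbar = np.array([0.4, 0.6])   # hypothetical alternative distribution

# Stein bound (1): optimal exponent of the second type of error probability
print("D(P||Pbar) =", rel_entropy(P, Pbar))

# Chernoff bound: C(P,Pbar) = -min_{0<=s<=1} phi(s|Pbar||P)
res = minimize_scalar(lambda s: phi(s, P, Pbar), bounds=(0, 1), method="bounded")
print("C(P,Pbar)  =", -res.fun)
```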

Indeed, this lemma has the following variant form. Define

$$B(\bar{P}\|P) := \sup_{\{f_n\}} \Big\{ \liminf_{n\to\infty} \frac{-1}{n}\log \mathrm{E}_{\bar{P}^n}(1-f_n) \,\Big|\, \lim_{n\to\infty} \mathrm{E}_{P^n} f_n = 0 \Big\}$$

$$B^{\dagger}(\bar{P}\|P) := \sup_{\{f_n\}} \Big\{ \liminf_{n\to\infty} \frac{-1}{n}\log \mathrm{E}_{\bar{P}^n}(1-f_n) \,\Big|\, \limsup_{n\to\infty} \mathrm{E}_{P^n} f_n < 1 \Big\}.$$

Then these two quantities satisfy the relations

$$B(\bar{P}\|P) = B^{\dagger}(\bar{P}\|P) = D(P\|\bar{P}).$$

As a further analysis, we focus on the decreasing exponent of the error probability of the first type under an exponential constraint on the error probability of the second type. When the required decreasing exponent of the error probability of the second type is greater than the relative entropy $D(P\|\bar{P})$, the error probability of the first type converges to 1. In this case, we focus on the decreasing exponent of the probability of correctly accepting the null hypothesis $P$. For this purpose, we define

$$B_e(r|\bar{P}\|P) := \sup_{\{f_n\}} \Big\{ \liminf_{n\to\infty} \frac{-1}{n}\log \mathrm{E}_{P^n} f_n \,\Big|\, \liminf_{n\to\infty} \frac{-1}{n}\log \mathrm{E}_{\bar{P}^n}(1-f_n) \ge r \Big\}$$

$$B_e^{*}(r|\bar{P}\|P) := \inf_{\{f_n\}} \Big\{ \limsup_{n\to\infty} \frac{-1}{n}\log \mathrm{E}_{P^n}(1-f_n) \,\Big|\, \liminf_{n\to\infty} \frac{-1}{n}\log \mathrm{E}_{\bar{P}^n}(1-f_n) \ge r \Big\}.$$

These two quantities are calculated as

$$B_e(r|\bar{P}\|P) = \min_{Q: D(Q\|\bar{P})\le r} D(Q\|P) = \sup_{0\le s\le 1} \frac{-sr - \phi(s|\bar{P}\|P)}{1-s} \qquad (2)$$

$$B_e^{*}(r|\bar{P}\|P) = \min_{Q: D(Q\|\bar{P})\le r} \big(D(Q\|P) + r - D(Q\|\bar{P})\big) = \sup_{s\le 0} \frac{-sr - \phi(s|\bar{P}\|P)}{1-s}. \qquad (3)$$

The first expressions of (2) and (3) are illustrated by Figs. 1 and 2.

Fig. 1. Geometric picture of $B_e(r|\bar{P}\|P)$: the minimization of $D(Q\|P)$ over the set $\{Q : D(Q\|\bar{P}) \le r\}$.

Fig. 2. Geometric picture of $B_e^{*}(r|\bar{P}\|P)$ when $r_0 \ge r \ge D(P\|\bar{P})$: the constraint is active on the surface $D(Q\|\bar{P}) = r$.
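As an illustration (ours, not the paper's), the variational expression in (2) can be evaluated directly on a grid over $s$; `phi`, `P`, and `Pbar` are the helpers from the previous sketch.

```python
import numpy as np

def hoeffding_exponent(r, P, Pbar, num=4000):
    # B_e(r) = sup_{0<=s<1} (-s*r - phi(s|Pbar||P)) / (1-s)   -- Eq. (2)
    grid = np.linspace(0.0, 0.999, num)
    return max((-s * r - phi(s, P, Pbar)) / (1 - s) for s in grid)

# for r below D(P||Pbar), this is the optimal exponent of the first type
# of error probability under the exponential constraint in (2)
print(hoeffding_exponent(0.05, P, Pbar))
```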

Now, we define the new function $B_e(r)$:

$$B_e(r) := \begin{cases} B_e(r|\bar{P}\|P) & r \le D(P\|\bar{P}) \\ B_e^{*}(r|\bar{P}\|P) & r > D(P\|\bar{P}). \end{cases}$$

Its graph is shown in Fig. 3.

Fig. 3. Graph of $B_e(r)$; the horizontal axis shows $r$ (with the values $C(P,\bar{P})$, $D(P\|\bar{P})$, and $r_0$ marked) and the vertical axis shows $B_e(r)$ (with the value $D(\bar{P}\|P)$ marked).

In order to give other characterizations of (2), we introduce the one-parameter family

$$P_{s,\bar{P},P}(dy) := \frac{1}{\Phi(s|\bar{P}\|P)} \Big(\frac{d\bar{P}}{dP}(y)\Big)^s P(dy),$$

which is abbreviated as $P_s$; note that $P_0 = P$ and $P_1 = \bar{P}$. Since $\phi(s)$ is $C^1$-continuous,

$$D(P_s\|P_1) = (s-1)\phi'(s) - \phi(s), \quad s \in (-\infty, 1] \qquad (4)$$

$$D(P_s\|P_0) = s\,\phi'(s) - \phi(s), \quad s \in [0, \infty). \qquad (5)$$

Since $\frac{d}{ds}\big[(s-1)\phi'(s) - \phi(s)\big] = (s-1)\phi''(s) \le 0$ for $s \le 1$, $D(P_s\|P_1)$ is monotonically decreasing with respect to $s$. As is mentioned in Theorem 4 of Blahut [2], when $r \le D(P\|\bar{P})$, there exists $s_r \in [0,1]$ such that

$$r = D(P_{s_r}\|P_1) = (s_r - 1)\phi'(s_r) - \phi(s_r).$$

Then (4) and (5) imply that

$$\min_{Q: D(Q\|\bar{P})\le r} D(Q\|P) = D(P_{s_r}\|P_0).$$

Thus, we obtain another expression:

$$\min_{Q: D(Q\|\bar{P})\le r} D(Q\|P) = \min_{s\in[0,1]: D(P_s\|\bar{P})\le r} D(P_s\|P). \qquad (6)$$

On the other hand,

$$\frac{d}{ds}\, \frac{-sr - \phi(s|\bar{P}\|P)}{1-s} = \frac{-r + (s-1)\phi'(s) - \phi(s)}{(1-s)^2} = \frac{D(P_s\|P_1) - r}{(1-s)^2}. \qquad (7)$$

Since $D(P_s\|P_1)$ is monotonically decreasing with respect to $s$, this derivative vanishes if and only if $s = s_r$. Hence the equation

$$\min_{Q: D(Q\|\bar{P})\le r} D(Q\|P) = \sup_{0\le s\le 1} \frac{-sr - \phi(s|\bar{P}\|P)}{1-s} \qquad (8)$$

can be checked.
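The identities (4) and (5) are easy to check numerically; the following sketch (ours, with a central-difference approximation of $\phi'$ and illustrative distributions) constructs $P_s$ and compares both sides.

```python
import numpy as np

def tilt(s, P, Pbar):
    # P_s(y) = P(y)^{1-s} Pbar(y)^s / Phi(s|Pbar||P): the family P_{s,Pbar,P}
    w = P ** (1 - s) * Pbar ** s
    return w / w.sum()

def D(Q, R):
    # relative entropy D(Q||R)
    return float(np.sum(Q * np.log(Q / R)))

P, Pbar = np.array([0.7, 0.3]), np.array([0.4, 0.6])
phi = lambda t: np.log(np.sum(P ** (1 - t) * Pbar ** t))

s, h = 0.3, 1e-6
dphi = (phi(s + h) - phi(s - h)) / (2 * h)   # numerical phi'(s)
Ps = tilt(s, P, Pbar)

print(D(Ps, Pbar), (s - 1) * dphi - phi(s))  # both sides of Eq. (4)
print(D(Ps, P),    s * dphi - phi(s))        # both sides of Eq. (5)
```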

In the following, we present some explanations concerning (3). As is mentioned by Han-Kobayashi [10] and Ogawa-Nagaoka [18], when $r_0 := \lim_{s\to-\infty} D(P_s\|P_1) \ge r \ge D(P\|\bar{P})$, the relation

$$B_e^{*}(r|\bar{P}\|P) = D(P_{s_r}\|P_0) \qquad (9)$$

holds, where $s_r \in (-\infty, 0]$ is defined by

$$r = D(P_{s_r}\|P_1) = (s_r - 1)\phi'(s_r) - \phi(s_r).$$

Thus, similarly to (6) and (8), the relation

$$\min_{Q: D(Q\|\bar{P})\le r} \big(D(Q\|P) + r - D(Q\|\bar{P})\big) = D(P_{s_r}\|P_0) = \sup_{s\le 0} \frac{-sr - \phi(s|\bar{P}\|P)}{1-s}$$

holds [18]. As mentioned by Nakagawa-Kanaya [22], when $r \ge r_0$, the relation

$$\min_{Q: D(Q\|\bar{P})\le r} \big(D(Q\|P) + r - D(Q\|\bar{P})\big) = \min_{Q: D(Q\|\bar{P})\le r_0} \big(D(Q\|P) + r_0 - D(Q\|\bar{P})\big) + r - r_0$$

holds. This bound is attained by a randomized test that accepts the hypothesis $\bar{P}$ with a suitable probability only when the logarithmic likelihood ratio takes its maximum value. Since $D(P_s\|P_1) \le r_0 < r$ for every $s \le 0$, (7) implies that the supremum is attained in the limit $s \to -\infty$:

$$\sup_{s\le 0} \frac{-sr - \phi(s|\bar{P}\|P)}{1-s} = \lim_{s\to-\infty} \frac{-sr - \phi(s|\bar{P}\|P)}{1-s} = \lim_{s\to-\infty} \frac{-sr_0 - \phi(s|\bar{P}\|P)}{1-s} + r - r_0 = \min_{Q: D(Q\|\bar{P})\le r_0} \big(D(Q\|P) + r_0 - D(Q\|\bar{P})\big) + r - r_0. \qquad (10)$$

Remark 1: The classical Hoeffding bound in information theory is due to Blahut [2] and Csiszár-Longo [4]. The corresponding ideas in statistics were first put forward by Hoeffding [16], from whom the bound received its name; some authors prefer to refer to this bound as the Hoeffding-Blahut-Csiszár-Longo bound. On the other hand, Han-Kobayashi [10] gave the first equation of (3), and proved this equation among non-randomized tests when $r_0 \ge r \ge D(P\|\bar{P})$. They pointed out that, in this case, the minimum $\min_{Q: D(Q\|\bar{P})\le r}\big(D(Q\|P) + r - D(Q\|\bar{P})\big)$ can be attained by $Q$ satisfying $D(Q\|\bar{P}) = r$. Ogawa-Nagaoka [18] showed the second equation of (3) for this case. Nakagawa-Kanaya [22] proved the first equation when $r > r_0$. Indeed, as pointed out by Nakagawa-Kanaya [22], when $r > r_0$, no non-randomized test attains the minimum $\min_{Q: D(Q\|\bar{P})\le r}\big(D(Q\|P) + r - D(Q\|\bar{P})\big)$; in this case, the minimum cannot be attained by $Q$ satisfying $D(Q\|\bar{P}) = r$.

III. MAIN RESULT: ADAPTIVE METHOD

Let us focus on two spaces: the set of input signals $\mathcal{X}$ and the set of outputs $\mathcal{Y}$. A channel from $\mathcal{X}$ to $\mathcal{Y}$ is described by a map from $\mathcal{X}$ to the set of probability distributions on $\mathcal{Y}$: given a channel $W$, $W_x$ represents the output distribution when the input is $x \in \mathcal{X}$. When $\mathcal{X}$ and $\mathcal{Y}$ have finitely many elements, the channel is given by a transition matrix.

The main topic is the discrimination of two classical channels $W$ and $\bar{W}$; in particular, we treat its asymptotic analysis when we can use the unknown channel only $n$ times. That is, we discriminate two hypotheses, the null hypothesis $H_0: W^n$ versus the alternative hypothesis $H_1: \bar{W}^n$, where $W^n$ and $\bar{W}^n$ denote $n$ uses of the channels $W$ and $\bar{W}$, respectively. Our problem is to decide which hypothesis is true based on the $n$ inputs $x_1, \dots, x_n$ and the $n$ outputs $y_1, \dots, y_n$.

In this setting, we are allowed to choose the $k$-th input adaptively, based on the previous $k-1$ outputs: we choose the $k$-th input $x_k$ subject to a conditional distribution $P^k_{(x_1,y_1),\dots,(x_{k-1},y_{k-1})}(x_k)$ on $\mathcal{X}$. Hence, our decision method is described by $n$ conditional distributions $\vec{P}^n = (P^1, P^2, \dots, P^n)$ and a $[0,1]$-valued function $f_n$ on $(\mathcal{X}\times\mathcal{Y})^n$: when we choose the inputs $x_1, \dots, x_n$ and observe the outputs $y_1, \dots, y_n$, we accept the alternative hypothesis $\bar{W}$ with probability $f_n(x_1, y_1, \dots, x_n, y_n)$. Our scheme is illustrated by Fig. 4. In order to treat this problem mathematically, we introduce the following notation.

For a channel $W$ from $\mathcal{X}$ to $\mathcal{Y}$ and a distribution $P$ on $\mathcal{X}$, we define the distribution $WP$ on $\mathcal{X}\times\mathcal{Y}$ and the distribution $W\cdot P$ on $\mathcal{Y}$ as

$$WP(x,y) := W_x(y)\, P(x), \qquad W\cdot P(y) := \int_{\mathcal{X}} W_x(y)\, P(dx).$$

Using the distribution $WP$, we define two quantities:

$$D(W\|\bar{W}|P) := D(WP\|\bar{W}P), \qquad \phi(s|\bar{W}\|W|P) := \phi(s|\bar{W}P\|WP).$$
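For finite alphabets these channel quantities reduce to sums; a minimal sketch (ours, with illustrative transition matrices) is:

```python
import numpy as np

# channels as |X| x |Y| row-stochastic matrices (rows are W_x and Wbar_x)
W    = np.array([[0.9, 0.1], [0.2, 0.8]])
Wbar = np.array([[0.6, 0.4], [0.5, 0.5]])
P    = np.array([0.5, 0.5])   # input distribution on X

def D_cond(W, Wbar, P):
    # D(W||Wbar|P) = D(WP||WbarP) = sum_x P(x) D(W_x||Wbar_x),
    # since the X-marginals of WP and WbarP coincide
    return sum(p * np.sum(w * np.log(w / wb)) for p, w, wb in zip(P, W, Wbar))

def phi_cond(s, W, Wbar, P):
    # phi(s|Wbar||W|P) = log sum_x P(x) sum_y W_x(y)^{1-s} Wbar_x(y)^s
    return np.log(sum(p * np.sum(w ** (1 - s) * wb ** s)
                      for p, w, wb in zip(P, W, Wbar)))

print(D_cond(W, Wbar, P), phi_cond(-0.5, W, Wbar, P))
```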

Fig. 4. The adaptive method: at each step $k$, the unknown channel ($W$ or $\bar{W}$) receives the input $x_k$ and returns the output $y_k$; adaptive choice of the next input is allowed.

Based on $k$ conditional distributions $\vec{P}^k = (P^1, P^2, \dots, P^k)$, we define the following distributions:

$$Q_{W,\vec{P}^n}(x_1,y_1,\dots,x_n,y_n) := \prod_{k=1}^n P^k_{(x_1,y_1),\dots,(x_{k-1},y_{k-1})}(x_k)\, W_{x_k}(y_k)$$

$$P_{W,\vec{P}^n}(x_n) := \mathrm{E}_{Q_{W,\vec{P}^{n-1}}}\, P^n_{(x_1,y_1),\dots,(x_{n-1},y_{n-1})}(x_n) \quad \text{(the distribution of the $n$-th input under $W$)}$$

$$Q_{s,\bar{W},W,\vec{P}^n} := P_{s,\,Q_{\bar{W},\vec{P}^n},\,Q_{W,\vec{P}^n}}$$

$$P_{s,\bar{W},W,\vec{P}^n}(x_n) := \mathrm{E}_{Q_{s,\bar{W},W,\vec{P}^{n-1}}}\, P^n_{(x_1,y_1),\dots,(x_{n-1},y_{n-1})}(x_n) \quad \text{(the distribution of the $n$-th input under the tilted distribution)}.$$

Then, the first type of error probability is given by $\mathrm{E}_{Q_{W,\vec{P}^n}} f_n$, and the second type of error probability by $\mathrm{E}_{Q_{\bar{W},\vec{P}^n}}(1-f_n)$. In order to treat this problem, we introduce the following quantities:

$$C(W,\bar{W}) := \lim_{n\to\infty} \frac{-1}{n} \log\Big( \min_{\vec{P}^n, f_n}\, \mathrm{E}_{Q_{W,\vec{P}^n}} f_n + \mathrm{E}_{Q_{\bar{W},\vec{P}^n}}(1-f_n) \Big)$$

$$\beta_n(\epsilon) := \min_{\vec{P}^n, f_n} \{ \mathrm{E}_{Q_{\bar{W},\vec{P}^n}}(1-f_n) \mid \mathrm{E}_{Q_{W,\vec{P}^n}} f_n \le \epsilon \}$$

and

$$B(\bar{W}\|W) := \sup_{\{(\vec{P}^n,f_n)\}} \Big\{ \liminf_{n\to\infty} \frac{-1}{n}\log \mathrm{E}_{Q_{\bar{W},\vec{P}^n}}(1-f_n) \,\Big|\, \lim_{n\to\infty} \mathrm{E}_{Q_{W,\vec{P}^n}} f_n = 0 \Big\}$$

$$B^{\dagger}(\bar{W}\|W) := \sup_{\{(\vec{P}^n,f_n)\}} \Big\{ \liminf_{n\to\infty} \frac{-1}{n}\log \mathrm{E}_{Q_{\bar{W},\vec{P}^n}}(1-f_n) \,\Big|\, \limsup_{n\to\infty} \mathrm{E}_{Q_{W,\vec{P}^n}} f_n < 1 \Big\}$$

$$B_e(r|\bar{W}\|W) := \sup_{\{(\vec{P}^n,f_n)\}} \Big\{ \liminf_{n\to\infty} \frac{-1}{n}\log \mathrm{E}_{Q_{W,\vec{P}^n}} f_n \,\Big|\, \liminf_{n\to\infty} \frac{-1}{n}\log \mathrm{E}_{Q_{\bar{W},\vec{P}^n}}(1-f_n) \ge r \Big\}$$

$$B_e^{*}(r|\bar{W}\|W) := \inf_{\{(\vec{P}^n,f_n)\}} \Big\{ \limsup_{n\to\infty} \frac{-1}{n}\log \mathrm{E}_{Q_{W,\vec{P}^n}}(1-f_n) \,\Big|\, \liminf_{n\to\infty} \frac{-1}{n}\log \mathrm{E}_{Q_{\bar{W},\vec{P}^n}}(1-f_n) \ge r \Big\}.$$

We obtain the following channel version of Stein's lemma.

Theorem 1: Assume that $\phi(s|\bar{W}_x\|W_x)$ is $C^1$-continuous and that

$$\lim_{\epsilon\to+0} \frac{\phi(-\epsilon|\bar{W}\|W)}{\epsilon} = \sup_x D(W_x\|\bar{W}_x), \qquad (11)$$

where $\phi(s|\bar{W}\|W) := \sup_x \phi(s|\bar{W}_x\|W_x) = \sup_{P\in\mathcal{P}(\mathcal{X})} \phi(s|\bar{W}\|W|P)$ and $\mathcal{P}(\mathcal{X})$ is the set of distributions on $\mathcal{X}$. Then,

$$B(\bar{W}\|W) = B^{\dagger}(\bar{W}\|W) = D := \sup_x D(W_x\|\bar{W}_x). \qquad (12)$$
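To make the distribution $Q_{W,\vec{P}^n}$ concrete, the following sketch (ours; the adaptive rule `P2` is an arbitrary illustration) builds the joint distribution of $(x_1,y_1,x_2,y_2)$ for a two-step adaptive strategy. The dictionaries `Q` and `Qbar` are exactly $Q_{W,\vec{P}^2}$ and $Q_{\bar{W},\vec{P}^2}$, and the same construction is reused in the sanity check after the weak converse in Section VI.

```python
import numpy as np

W    = np.array([[0.9, 0.1], [0.2, 0.8]])
Wbar = np.array([[0.6, 0.4], [0.5, 0.5]])

P1 = np.array([0.3, 0.7])                # P^1: distribution of the first input
def P2(x1, y1):                          # P^2: adaptive rule for the second input
    return np.array([1.0, 0.0]) if y1 == 0 else np.array([0.0, 1.0])

def joint(V):
    # Q_{V,P^2}(x1,y1,x2,y2) = P1(x1) V_{x1}(y1) P2(x2|x1,y1) V_{x2}(y2)
    return {(x1, y1, x2, y2): P1[x1] * V[x1, y1] * P2(x1, y1)[x2] * V[x2, y2]
            for x1 in range(2) for y1 in range(2)
            for x2 in range(2) for y2 in range(2)}

Q, Qbar = joint(W), joint(Wbar)   # joint distributions under W and under Wbar
```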

The following is another expression of Stein's lemma.

Corollary 1: Under the same assumption,

$$\lim_{n\to\infty} \frac{-1}{n} \log \beta_n(\epsilon) = \sup_x D(W_x\|\bar{W}_x).$$

Condition (11) can be replaced by another condition.

Lemma 1: If every element $x \in \mathcal{X}$ satisfies $-\phi'(0|\bar{W}_x\|W_x) = D(W_x\|\bar{W}_x)$, and there exists a real number $\epsilon > 0$ such that

$$C_1 := \sup_{s\in[-\epsilon,0]}\, \sup_x \frac{d^2\phi(s|\bar{W}_x\|W_x)}{ds^2} < \infty, \qquad (13)$$

then condition (11) holds.

In addition, we obtain a channel version of the Hoeffding bound.

Theorem 2: When

$$\sup_{s\in[0,1]}\, \sup_x \frac{d^2\phi(s|\bar{W}_x\|W_x)}{ds^2} < \infty \qquad (14)$$

and $\sup_x D(W_x\|\bar{W}_x) < \infty$, then

$$B_e(r|\bar{W}\|W) = \sup_x\, \sup_{0\le s\le 1} \frac{-sr - \phi(s|\bar{W}_x\|W_x)}{1-s} = \sup_x\, \min_{Q: D(Q\|\bar{W}_x)\le r} D(Q\|W_x). \qquad (15)$$

Corollary 2: Under the same assumption,

$$C(W,\bar{W}) = \sup_x \Big( -\min_{0\le s\le 1} \phi(s|\bar{W}_x\|W_x) \Big). \qquad (16)$$

These arguments imply that adaptive choice does not improve the performance in the above senses. For example, when we apply the best input $x_M := \mathop{\mathrm{argmax}}_x D(W_x\|\bar{W}_x)$ to all $n$ channel uses, we achieve the optimal performance in the sense of the Stein bound. The same holds for the Hoeffding bound and the Chernoff bound.

Proof: The relation

$$C(W,\bar{W}) = \sup_r\, \min\{B_e(r|\bar{W}\|W),\, r\}$$

holds. Since (15) implies

$$\sup_r\, \min\Big\{ \sup_x\, \sup_{0\le s\le 1} \frac{-sr - \phi(s|\bar{W}_x\|W_x)}{1-s},\; r \Big\} = \sup_x \Big( -\min_{0\le s\le 1} \phi(s|\bar{W}_x\|W_x) \Big),$$

the relation (16) holds.

The channel version of the Han-Kobayashi bound is given as follows.

Theorem 3: When $\phi(s|\bar{W}_x\|W_x)$ is $C^1$-continuous, then

$$B_e^{*}(r|\bar{W}\|W) = \sup_{s\le 0} \frac{-sr - \phi(s|\bar{W}\|W)}{1-s} = \inf_{P\in\mathcal{P}(\mathcal{X})}\, \sup_{s\le 0} \frac{-sr - \phi(s|\bar{W}\|W|P)}{1-s} = \inf_{P\in\mathcal{P}_2(\mathcal{X})}\, \sup_{s\le 0} \frac{-sr - \phi(s|\bar{W}\|W|P)}{1-s}, \qquad (17)$$

where $\mathcal{P}_2(\mathcal{X})$ is the set of distributions on $\mathcal{X}$ that take positive probability on at most two elements.

As shown in Section IV, the equality

$$\sup_{s\le 0} \frac{-sr - \phi(s|\bar{W}\|W)}{1-s} = \inf_x\, \sup_{s\le 0} \frac{-sr - \phi(s|\bar{W}_x\|W_x)}{1-s} \qquad (18)$$

does not necessarily hold. In order to understand the meaning of this fact, assume that equation (18) does not hold. Then, when we apply the same input $x$ to all channel uses, the best performance cannot be achieved. However, the best performance can be achieved by the following method. Assume that the best input distribution

$$\mathop{\mathrm{arginf}}_{P\in\mathcal{P}_2(\mathcal{X})}\, \sup_{s\le 0} \frac{-sr - \phi(s|\bar{W}\|W|P)}{1-s}$$

has the support $\{x, x'\}$ with probabilities $\lambda$ and $1-\lambda$. Then, choosing each input to be $x$ or $x'$ independently with probabilities $\lambda$ and $1-\lambda$, we can achieve the best performance in the sense of the Han-Kobayashi bound. That is, the structure of the optimal strategy for the Han-Kobayashi bound is more complex than in the above cases.
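A numeric sketch of these channel bounds (ours; `W` and `Wbar` as in the earlier sketch): the Stein bound is a maximum over deterministic inputs, and each per-input Chernoff exponent in (16) is a one-dimensional minimization.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def stein_exponent(W, Wbar):
    # D = sup_x D(W_x||Wbar_x): attained by the deterministic input x_M  -- Eq. (12)
    return max(np.sum(w * np.log(w / wb)) for w, wb in zip(W, Wbar))

def chernoff_exponent(W, Wbar):
    # C(W,Wbar) = sup_x ( -min_{0<=s<=1} phi(s|Wbar_x||W_x) )  -- Eq. (16)
    best = -np.inf
    for w, wb in zip(W, Wbar):
        res = minimize_scalar(lambda s: np.log(np.sum(w ** (1 - s) * wb ** s)),
                              bounds=(0, 1), method="bounded")
        best = max(best, -res.fun)
    return best

print(stein_exponent(W, Wbar), chernoff_exponent(W, Wbar))
```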

IV. SIMPLE EXAMPLE

In this section, we treat a simple example that does not satisfy (18). For four given parameters $p, q$ and $a > 1$, $b > 1$, we define the channels $W$ and $\bar{W}$ with binary input and output alphabets:

$$W_0(0) := ap, \quad W_0(1) := 1-ap, \qquad \bar{W}_0(0) := p, \quad \bar{W}_0(1) := 1-p,$$

$$W_1(0) := bq, \quad W_1(1) := 1-bq, \qquad \bar{W}_1(0) := q, \quad \bar{W}_1(1) := 1-q.$$

Then, we obtain

$$\phi(s|\bar{W}_0\|W_0) = \log\big((ap)^{1-s} p^s + (1-ap)^{1-s}(1-p)^s\big)$$

$$\phi(s|\bar{W}_1\|W_1) = \log\big((bq)^{1-s} q^s + (1-bq)^{1-s}(1-q)^s\big).$$

In this case,

$$D(W_0\|\bar{W}_0) = ap\log a + (1-ap)\log\frac{1-ap}{1-p}$$

$$D(W_1\|\bar{W}_1) = bq\log b + (1-bq)\log\frac{1-bq}{1-q}.$$

When $a > b$ and $D(W_0\|\bar{W}_0) < D(W_1\|\bar{W}_1)$, the magnitude relation between $\phi(s|\bar{W}_0\|W_0)$ and $\phi(s|\bar{W}_1\|W_1)$ on $(-\infty, 0)$ depends on $s$. For example, the case of $a = 100$, $b = 1.5$, $p = 0.0001$, $q = 0.65$ is shown in Fig. 5. In this case, $B_e^{*}(r|\bar{W}_0\|W_0)$, $B_e^{*}(r|\bar{W}_1\|W_1)$, and $B_e^{*}(r|\bar{W}\|W)$ are plotted in Fig. 6, and the equality (18) does not hold.

Fig. 5. Magnitude relation between $\phi(s|\bar{W}_0\|W_0)$ and $\phi(s|\bar{W}_1\|W_1)$ on $(-1,0)$. The upper solid line indicates $\phi(s|\bar{W}_0\|W_0)$; the dotted line indicates $\phi(s|\bar{W}_1\|W_1)$.

Fig. 6. Magnitude relation between $B_e^{*}(r|\bar{W}_0\|W_0)$, $B_e^{*}(r|\bar{W}_1\|W_1)$, and $B_e^{*}(r|\bar{W}\|W)$. The upper solid line indicates $B_e^{*}(r|\bar{W}_0\|W_0)$, the dotted line indicates $B_e^{*}(r|\bar{W}_1\|W_1)$, and the lower solid line indicates $B_e^{*}(r|\bar{W}\|W)$.
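The crossing in Fig. 5 can be reproduced numerically; a short check (ours) with the stated parameters shows that neither input dominates for all $s < 0$, which is why equality (18) fails here.

```python
import numpy as np

a, b, p, q = 100.0, 1.5, 0.0001, 0.65
W0, W0bar = np.array([a * p, 1 - a * p]), np.array([p, 1 - p])
W1, W1bar = np.array([b * q, 1 - b * q]), np.array([q, 1 - q])

phi = lambda s, P, Pbar: np.log(np.sum(P ** (1 - s) * Pbar ** s))

for s in (-0.8, -0.5, -0.2):
    print(s, phi(s, W0, W0bar), phi(s, W1, W1bar))
# the sign of phi(s|W0bar||W0) - phi(s|W1bar||W1) changes on (-1, 0)
```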

V. APPLICATION TO ADAPTIVE QUANTUM STATE DISCRIMINATION

Quantum state discrimination between two states $\rho$ and $\sigma$ on a $d$-dimensional system $\mathcal{H}$ with $n$ copies by one-way LOCC is formulated as follows. We choose the first POVM $M^1$ and obtain the datum $y_1$ through the measurement $M^1$. In the $k$-th step, we choose the $k$-th POVM $M^k((M^1,y_1),\dots,(M^{k-1},y_{k-1}))$ depending on $(M^1,y_1),\dots,(M^{k-1},y_{k-1})$, and obtain the $k$-th datum $y_k$ through this POVM. Therefore, this problem can be regarded as classical channel discrimination with the correspondence

$$W_M(y) = \mathrm{Tr}\, M(y)\rho, \qquad \bar{W}_M(y) = \mathrm{Tr}\, M(y)\sigma.$$

That is, in this case, the set of input signals corresponds to the set of extremal points of the set of POVMs on the given system $\mathcal{H}$. The proposed scheme is illustrated in Fig. 7.

Fig. 7. Adaptive quantum state discrimination: at each step, the unknown state ($\rho$ or $\sigma$) is measured by an adaptively chosen POVM $M^k$, which yields the datum $y_k$; one-way adaptive improvement is allowed.

Now, we assume that $\rho > 0$ and $\sigma > 0$. In this case, $\mathcal{X}$ is compact, and the map $(s, M) \mapsto \frac{d^2\phi(s|\bar{W}_M\|W_M)}{ds^2}$ is continuous. Then, condition (13) holds. Therefore, one-way adaptation does not improve the performance in the sense of the Stein bound, the Chernoff bound, the Hoeffding bound, or the Han-Kobayashi bound. That is, we obtain

$$B(\bar{W}\|W) = B^{\dagger}(\bar{W}\|W) = \max_{M:\mathrm{POVM}} D(P_\rho^M\|P_\sigma^M)$$

$$B_e(r|\bar{W}\|W) = \max_{M:\mathrm{POVM}}\, \sup_{0\le s\le 1} \frac{-sr - \phi(s|P_\sigma^M\|P_\rho^M)}{1-s}$$

$$B_e^{*}(r|\bar{W}\|W) = \sup_{s\le 0} \frac{-sr - \max_{M:\mathrm{POVM}} \phi(s|P_\sigma^M\|P_\rho^M)}{1-s}.$$

Since these single-measurement quantities are in general strictly smaller than the corresponding collective-measurement bounds (e.g., $D(\rho\|\sigma)$ for the Stein bound), there exists a difference between one-way LOCC and collective measurements.
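The correspondence $W_M(y) = \mathrm{Tr}\, M(y)\rho$ is easy to instantiate; the following sketch (ours; the states and the projective measurement are illustrative) computes the induced output distributions and $D(P_\rho^M\|P_\sigma^M)$ for one measurement.

```python
import numpy as np

rho   = np.array([[0.8, 0.2], [0.2, 0.2]])   # illustrative qubit density matrices
sigma = np.array([[0.5, 0.0], [0.0, 0.5]])

def induced_dist(state, M):
    # W_M(y) = Tr M(y) state : the classical channel induced by the POVM M
    return np.array([np.trace(E @ state).real for E in M])

theta = 0.3                                  # illustrative projective POVM
v = np.array([np.cos(theta), np.sin(theta)])
M = [np.outer(v, v), np.eye(2) - np.outer(v, v)]

P_rho, P_sigma = induced_dist(rho, M), induced_dist(sigma, M)
print(np.sum(P_rho * np.log(P_rho / P_sigma)))   # D(P_rho^M || P_sigma^M)
# maximizing this over all (extremal) POVMs gives the one-way LOCC Stein
# bound, which in general lies below the collective bound D(rho||sigma)
```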

VI. PROOF OF THE STEIN BOUND: (12)

Now, we prove the Stein bound (12). For any $x \in \mathcal{X}$, choosing the input $x$ for all $n$ channel uses yields $B(\bar{W}\|W) \ge D(W_x\|\bar{W}_x)$. Taking the supremum over $x$, we have

$$B(\bar{W}\|W) \ge \sup_x D(W_x\|\bar{W}_x).$$

Furthermore, from the definitions it is trivial that $B(\bar{W}\|W) \le B^{\dagger}(\bar{W}\|W)$. Therefore, it is sufficient to show the strong converse part:

$$B^{\dagger}(\bar{W}\|W) \le D. \qquad (19)$$

However, in preparation for the proof of (15), we first present a proof of the weak converse part:

$$B(\bar{W}\|W) \le D, \qquad (20)$$

which is a weaker statement than (19) and is valid without assumption (11). In the following proof, it is essential to evaluate the KL divergence of the observed data. In order to prove (20), we show that

$$\limsup_{n\to\infty} \frac{-1}{n} \log \mathrm{E}_{Q_{\bar{W},\vec{P}^n}}(1-f_n) \le D \qquad (21)$$

whenever

$$\mathrm{E}_{Q_{W,\vec{P}^n}} f_n \to 0. \qquad (22)$$

It follows from the definitions of $Q_{W,\vec{P}^n}$ and $Q_{\bar{W},\vec{P}^n}$ that

$$D(Q_{W,\vec{P}^n}\|Q_{\bar{W},\vec{P}^n}) = \sum_{k=1}^n D(W\|\bar{W}|P_{W,\vec{P}^k}).$$

Since $\mathrm{E}_{Q_{W,\vec{P}^n}} f_n \log \mathrm{E}_{Q_{\bar{W},\vec{P}^n}} f_n \le 0$, the information processing inequality for the KL divergence yields

$$-h(\mathrm{E}_{Q_{W,\vec{P}^n}} f_n) - \mathrm{E}_{Q_{W,\vec{P}^n}}(1-f_n) \log \mathrm{E}_{Q_{\bar{W},\vec{P}^n}}(1-f_n) \le \mathrm{E}_{Q_{W,\vec{P}^n}}(1-f_n)\big(\log \mathrm{E}_{Q_{W,\vec{P}^n}}(1-f_n) - \log \mathrm{E}_{Q_{\bar{W},\vec{P}^n}}(1-f_n)\big) + \mathrm{E}_{Q_{W,\vec{P}^n}} f_n\big(\log \mathrm{E}_{Q_{W,\vec{P}^n}} f_n - \log \mathrm{E}_{Q_{\bar{W},\vec{P}^n}} f_n\big) \le D(Q_{W,\vec{P}^n}\|Q_{\bar{W},\vec{P}^n}) = \sum_{k=1}^n D(W\|\bar{W}|P_{W,\vec{P}^k}) \le nD, \qquad (23)$$

where $h(t) := -t\log t - (1-t)\log(1-t)$ is the binary entropy. That is,

$$\frac{-1}{n} \log \mathrm{E}_{Q_{\bar{W},\vec{P}^n}}(1-f_n) \le \frac{D + \frac{1}{n}\, h(\mathrm{E}_{Q_{W,\vec{P}^n}} f_n)}{\mathrm{E}_{Q_{W,\vec{P}^n}}(1-f_n)}. \qquad (24)$$

Therefore, (22) yields (21).
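As a sanity check (ours), the data-processing step behind (23) can be verified numerically for any test on the two-step adaptive strategy constructed in Section III's sketch (which provides `Q` and `Qbar`).

```python
import numpy as np

def binary_kl(u, v):
    # d(u||v) = u log(u/v) + (1-u) log((1-u)/(1-v))
    return u * np.log(u / v) + (1 - u) * np.log((1 - u) / (1 - v))

f = {k: float(k[3] == 1) for k in Q}              # an arbitrary deterministic test

alpha = sum(Q[k] * f[k] for k in Q)               # E_Q f_n   (first type)
beta  = sum(Qbar[k] * (1 - f[k]) for k in Qbar)   # E_Qbar (1 - f_n)
kl    = sum(Q[k] * np.log(Q[k] / Qbar[k]) for k in Q if Q[k] > 0)

# the information processing inequality behind (23):
print(binary_kl(1 - alpha, beta), "<=", kl)
```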

Next, we prove the strong converse part; i.e., we show that

$$\mathrm{E}_{Q_{W,\vec{P}^n}}(1-f_n) \to 0 \qquad (25)$$

when

$$r := \liminf_{n\to\infty} \frac{-1}{n} \log \mathrm{E}_{Q_{\bar{W},\vec{P}^n}}(1-f_n) > D. \qquad (26)$$

Since

$$\Phi(s|Q_{\bar{W},\vec{P}^n}\|Q_{W,\vec{P}^n}) = \Phi(s|Q_{\bar{W},\vec{P}^{n-1}}\|Q_{W,\vec{P}^{n-1}}) \int_{\mathcal{X}} \Big(\int_{\mathcal{Y}} \Big(\frac{d\bar{W}_{x_n}}{dW_{x_n}}(y_n)\Big)^s W_{x_n}(dy_n)\Big) P_{s,\bar{W},W,\vec{P}^n}(dx_n),$$

we obtain

$$\phi(s|Q_{\bar{W},\vec{P}^n}\|Q_{W,\vec{P}^n}) = \phi(s|Q_{\bar{W},\vec{P}^{n-1}}\|Q_{W,\vec{P}^{n-1}}) + \phi(s|\bar{W}\|W|P_{s,\bar{W},W,\vec{P}^n}). \qquad (27)$$

Applying (27) inductively, we obtain, for $s \le 0$,

$$\phi(s|Q_{\bar{W},\vec{P}^n}\|Q_{W,\vec{P}^n}) = \sum_{k=1}^n \phi(s|\bar{W}\|W|P_{s,\bar{W},W,\vec{P}^k}) \le n\,\phi(s|\bar{W}\|W). \qquad (28)$$

Since the information quantity $\phi(s|\bar{P}\|P)$ satisfies the information processing inequality for $s \le 0$, we have

$$\big(\mathrm{E}_{Q_{W,\vec{P}^n}}(1-f_n)\big)^{1-s}\big(\mathrm{E}_{Q_{\bar{W},\vec{P}^n}}(1-f_n)\big)^{s} \le \big(\mathrm{E}_{Q_{W,\vec{P}^n}}(1-f_n)\big)^{1-s}\big(\mathrm{E}_{Q_{\bar{W},\vec{P}^n}}(1-f_n)\big)^{s} + \big(\mathrm{E}_{Q_{W,\vec{P}^n}} f_n\big)^{1-s}\big(\mathrm{E}_{Q_{\bar{W},\vec{P}^n}} f_n\big)^{s} \le e^{\phi(s|Q_{\bar{W},\vec{P}^n}\|Q_{W,\vec{P}^n})} \le e^{n\phi(s|\bar{W}\|W)}.$$

Taking the logarithm, we obtain

$$(1-s)\log \mathrm{E}_{Q_{W,\vec{P}^n}}(1-f_n) \le -s\,\log \mathrm{E}_{Q_{\bar{W},\vec{P}^n}}(1-f_n) + n\,\phi(s|\bar{W}\|W). \qquad (29)$$

That is,

$$\frac{-1}{n}\log \mathrm{E}_{Q_{W,\vec{P}^n}}(1-f_n) \ge \frac{(-s)\frac{-1}{n}\log \mathrm{E}_{Q_{\bar{W},\vec{P}^n}}(1-f_n) - \phi(s|\bar{W}\|W)}{1-s}.$$

When $\liminf_{n\to\infty} \frac{-1}{n}\log \mathrm{E}_{Q_{\bar{W},\vec{P}^n}}(1-f_n) \ge r$, the inequality

$$\liminf_{n\to\infty} \frac{-1}{n}\log \mathrm{E}_{Q_{W,\vec{P}^n}}(1-f_n) \ge \frac{-sr - \phi(s|\bar{W}\|W)}{1-s}$$

holds. Taking the supremum over $s \le 0$, we obtain

$$B_e^{*}(r|\bar{W}\|W) \ge \sup_{s\le 0} \frac{-sr - \phi(s|\bar{W}\|W)}{1-s}. \qquad (32)$$

From conditions (11) and (26), there exists a small real number $\epsilon > 0$ such that $r > \frac{\phi(-\epsilon|\bar{W}\|W)}{\epsilon}$. Thus,

$$\sup_{s\le 0} \frac{-sr - \phi(s|\bar{W}\|W)}{1-s} \ge \frac{\epsilon r - \phi(-\epsilon|\bar{W}\|W)}{1+\epsilon} > 0.$$

Therefore, the exponent of $\mathrm{E}_{Q_{W,\vec{P}^n}}(1-f_n)$ is bounded below by a positive constant, and we obtain (25).

Remark 2: The technique of the strong converse part, except for (28), was developed by Nagaoka [19]. Hence, the derivation of (28) can be regarded as the main contribution of this section.

Proof of Lemma 1: For a proof of (11), it is sufficient to show the uniformity, with respect to $x \in \mathcal{X}$, of the convergence $\frac{\phi(-\epsilon|\bar{W}_x\|W_x)}{\epsilon} \to D(W_x\|\bar{W}_x)$. Now, we choose $\epsilon > 0$ satisfying condition (13). Then, by the Taylor expansion, for each $x$ there exists $s \in [-\epsilon, 0]$ such that

$$\Big| \frac{\phi(-\epsilon|\bar{W}_x\|W_x)}{\epsilon} - D(W_x\|\bar{W}_x) \Big| \le \frac{1}{2}\epsilon\, \phi''(s|\bar{W}_x\|W_x) \le \frac{C_1}{2}\epsilon.$$

Therefore, condition (11) holds.

VII. PROOF OF THE HOEFFDING BOUND: (15)

In this section, we prove the Hoeffding bound (15). Since the inequality

$$B_e(r|\bar{W}\|W) \ge \sup_x\, \sup_{0\le s\le 1} \frac{-sr - \phi(s|\bar{W}_x\|W_x)}{1-s} = \sup_x\, \min_{Q: D(Q\|\bar{W}_x)\le r} D(Q\|W_x)$$

is trivial (it is attained by the deterministic non-adaptive method), we prove the opposite inequality. In the following proof, the geometric characterization of Fig. 1 and the weak and strong converse parts are essential. Equation (6) guarantees that

$$\min_{Q: D(Q\|\bar{W}_x)\le r} D(Q\|W_x) = \min_{s\in[0,1]: D(P_{s,\bar{W}_x,W_x}\|\bar{W}_x)\le r} D(P_{s,\bar{W}_x,W_x}\|W_x).$$

For this purpose, for an arbitrary $\epsilon > 0$, we choose the channel $V$ with $V_x = P_{s(x),\bar{W}_x,W_x}$, where

$$s(x) := \mathop{\mathrm{argmin}}_{s\in[0,1]:\, D(P_{s,\bar{W}_x,W_x}\|\bar{W}_x)\le r-\epsilon} D(P_{s,\bar{W}_x,W_x}\|W_x).$$

Assume that a sequence $\{(\vec{P}^n, f_n)\}$ satisfies

$$\liminf_{n\to\infty} \frac{-1}{n}\log \mathrm{E}_{Q_{\bar{W},\vec{P}^n}}(1-f_n) \ge r.$$

By substituting $V$ for $W$, the strong converse part of the Stein bound (25) implies that

$$\mathrm{E}_{Q_{V,\vec{P}^n}}(1-f_n) \to 0.$$

Condition (13) for the pair $(V, \bar{W})$ can be checked by the following relations:

$$\frac{d\phi(t|\bar{W}_x\|P_{s(x),\bar{W}_x,W_x})}{dt} = (1-s(x))\,\phi'\big(s(x)(1-t)+t\,\big|\,\bar{W}_x\|W_x\big) + \phi(s(x)|\bar{W}_x\|W_x) \qquad (30)$$

$$\frac{d^2\phi(t|\bar{W}_x\|P_{s(x),\bar{W}_x,W_x})}{dt^2} = (1-s(x))^2\,\phi''\big(s(x)(1-t)+t\,\big|\,\bar{W}_x\|W_x\big). \qquad (31)$$

Thus, substituting $V$ and $W$ for $W$ and $\bar{W}$ in (24) (with $1-f_n$ in place of $f_n$), and checking condition (13) similarly to (30) and (31), we obtain

$$\limsup_{n\to\infty} \frac{-1}{n}\log \mathrm{E}_{Q_{W,\vec{P}^n}} f_n \le \sup_x D(V_x\|W_x).$$
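The intermediate channel $V$ can be computed by a direct scan over $s \in [0,1]$; a rough sketch (ours, reusing `tilt` and `D` from Section II's sketch):

```python
import numpy as np

def s_of_x(Wx, Wbarx, r_eps, num=2001):
    # s(x) = argmin { D(P_s||W_x) : s in [0,1], D(P_s||Wbar_x) <= r - eps };
    # pass r_eps = r - eps; tilt() and D() are the helpers from Section II
    best_s, best_val = None, np.inf
    for s in np.linspace(0.0, 1.0, num):
        Ps = tilt(s, Wx, Wbarx)
        if D(Ps, Wbarx) <= r_eps and D(Ps, Wx) < best_val:
            best_s, best_val = s, D(Ps, Wx)
    return best_s

# the intermediate channel: V_x = tilt(s_of_x(W[x], Wbar[x], r_eps), W[x], Wbar[x])
```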

From the construction of $V$, we obtain

$$\limsup_{n\to\infty} \frac{-1}{n}\log \mathrm{E}_{Q_{W,\vec{P}^n}} f_n \le \sup_x\, \min_{Q: D(Q\|\bar{W}_x)\le r-\epsilon} D(Q\|W_x).$$

The uniform continuity shown below then guarantees, in the limit $\epsilon \to 0$, that

$$\limsup_{n\to\infty} \frac{-1}{n}\log \mathrm{E}_{Q_{W,\vec{P}^n}} f_n \le \sup_x\, \min_{Q: D(Q\|\bar{W}_x)\le r} D(Q\|W_x).$$

Now, we show that the function $r \mapsto \sup_{0\le s\le 1} \frac{-sr - \phi(s|\bar{W}_x\|W_x)}{1-s}$ is continuous in $r$ uniformly with respect to $x$. As mentioned on p. 82 of Hayashi [11], the relation

$$\frac{d}{dr}\, \sup_{0\le s\le 1} \frac{-sr - \phi(s|\bar{W}_x\|W_x)}{1-s} = \frac{-s_r}{1-s_r}$$

holds, where

$$s_r := \mathop{\mathrm{argmax}}_{0\le s\le 1} \frac{-sr - \phi(s|\bar{W}_x\|W_x)}{1-s}.$$

Since $\frac{d}{ds}\frac{-sr - \phi(s|\bar{W}_x\|W_x)}{1-s}\big|_{s=s_r} = 0$, we have

$$r = (s_r - 1)\phi'(s_r|\bar{W}_x\|W_x) - \phi(s_r|\bar{W}_x\|W_x).$$

Since $\phi(s_r|\bar{W}_x\|W_x) \le 0$, $(s_r - 1) \le 0$, and $\phi''(s|\bar{W}_x\|W_x) \ge 0$,

$$r \le \frac{-\phi(s_r|\bar{W}_x\|W_x)}{s_r} \le \frac{(1-s_r)\,\phi'(1|\bar{W}_x\|W_x)}{s_r} = \frac{(1-s_r)\, D(\bar{W}_x\|W_x)}{s_r}.$$

Thus,

$$\frac{s_r}{1-s_r} \le \frac{D(\bar{W}_x\|W_x)}{r} \le \frac{\sup_x D(\bar{W}_x\|W_x)}{r}.$$

Therefore, the function $r \mapsto \sup_{0\le s\le 1} \frac{-sr - \phi(s|\bar{W}_x\|W_x)}{1-s}$ is Lipschitz continuous in $r$ uniformly with respect to $x$.

VIII. PROOF OF THE HAN-KOBAYASHI BOUND: (17)

The inequality

$$B_e^{*}(r|\bar{W}\|W) \ge \sup_{s\le 0} \frac{-sr - \phi(s|\bar{W}\|W)}{1-s} \qquad (32)$$

has been shown in Section VI, and the inequality

$$B_e^{*}(r|\bar{W}\|W) \le \inf_{P\in\mathcal{P}_2(\mathcal{X})}\, \sup_{s\le 0} \frac{-sr - \phi(s|\bar{W}\|W|P)}{1-s}$$

can be easily checked by considering the stochastic non-adaptive method with input distribution $P$. Therefore, it is sufficient to show the inequality

$$\inf_{P\in\mathcal{P}_2(\mathcal{X})}\, \sup_{s\le 0} \frac{-sr - \phi(s|\bar{W}\|W|P)}{1-s} \le \sup_{s\le 0} \frac{-sr - \phi(s|\bar{W}\|W)}{1-s} = \sup_{s\le 0}\, \inf_{P\in\mathcal{P}_2(\mathcal{X})} \frac{-sr - \phi(s|\bar{W}\|W|P)}{1-s}. \qquad (33)$$

This relation seems to be guaranteed by the mini-max theorem (Chap. VI, Prop. 2.3 of [5]). However, the function $\frac{-sr - \phi(s|\bar{W}\|W|P)}{1-s}$ is not necessarily concave with respect to $s$, while it is convex with respect to $P$. Hence, this relation cannot be guaranteed by the mini-max theorem.

Now, we prove this inequality when the maximum $\max_{s\le 0}\frac{-sr-\phi(s|\bar{W}\|W)}{1-s}$ exists. Since each $\phi(s|\bar{W}_x\|W_x)$ is convex with respect to $s$, $\phi(s|\bar{W}\|W)$ is also convex with respect to $s$. Then, we can define the one-sided derivatives

$$\partial^{+}\phi(s|\bar{W}\|W) := \lim_{\epsilon\to+0} \frac{\phi(s+\epsilon|\bar{W}\|W) - \phi(s|\bar{W}\|W)}{\epsilon}, \qquad \partial^{-}\phi(s|\bar{W}\|W) := \lim_{\epsilon\to+0} \frac{\phi(s|\bar{W}\|W) - \phi(s-\epsilon|\bar{W}\|W)}{\epsilon}.$$

Hence, the real number $s_r := \mathop{\mathrm{argmax}}_{s\le 0} \frac{-sr-\phi(s|\bar{W}\|W)}{1-s}$ satisfies

$$(s_r-1)\,\partial^{+}\phi(s_r|\bar{W}\|W) - \phi(s_r|\bar{W}\|W) \le r \le (s_r-1)\,\partial^{-}\phi(s_r|\bar{W}\|W) - \phi(s_r|\bar{W}\|W).$$

That is, there exists $\lambda \in [0,1]$ such that

$$r = (s_r-1)\big(\lambda\,\partial^{+}\phi(s_r|\bar{W}\|W) + (1-\lambda)\,\partial^{-}\phi(s_r|\bar{W}\|W)\big) - \phi(s_r|\bar{W}\|W). \qquad (34)$$

For an arbitrary real number $1 > \epsilon > 0$, there exists $1 > \delta > 0$ such that

$$\frac{\phi(s_r+\delta|\bar{W}\|W) - \phi(s_r|\bar{W}\|W)}{\delta} \le \partial^{+}\phi(s_r|\bar{W}\|W) + \epsilon \qquad (35)$$

$$\frac{\phi(s_r|\bar{W}\|W) - \phi(s_r-\delta|\bar{W}\|W)}{\delta} \ge \partial^{-}\phi(s_r|\bar{W}\|W) - \epsilon. \qquad (36)$$

Then, we choose $x^{+}, x^{-} \in \mathcal{X}$ such that

$$\phi(s_r+\lambda\delta|\bar{W}\|W) - \delta\epsilon \le \phi(s_r+\lambda\delta|\bar{W}_{x^{+}}\|W_{x^{+}}) \le \phi(s_r+\lambda\delta|\bar{W}\|W) \qquad (37)$$

$$\phi(s_r-(1-\lambda)\delta|\bar{W}\|W) - \delta\epsilon \le \phi(s_r-(1-\lambda)\delta|\bar{W}_{x^{-}}\|W_{x^{-}}) \le \phi(s_r-(1-\lambda)\delta|\bar{W}\|W). \qquad (38)$$

Thus, (37) implies that

$$\frac{\phi(s_r+\lambda\delta|\bar{W}_{x^{+}}\|W_{x^{+}}) - \phi(s_r-(1-\lambda)\delta|\bar{W}_{x^{+}}\|W_{x^{+}})}{\delta} \ge \frac{\phi(s_r+\lambda\delta|\bar{W}\|W) - \delta\epsilon - \phi(s_r-(1-\lambda)\delta|\bar{W}\|W)}{\delta} \ge \lambda\,\partial^{+}\phi(s_r|\bar{W}\|W) + (1-\lambda)\,\partial^{-}\phi(s_r|\bar{W}\|W) - \epsilon. \qquad (39)$$

Similarly, (38) implies that

$$\frac{\phi(s_r+\lambda\delta|\bar{W}_{x^{-}}\|W_{x^{-}}) - \phi(s_r-(1-\lambda)\delta|\bar{W}_{x^{-}}\|W_{x^{-}})}{\delta} \le \lambda\,\partial^{+}\phi(s_r|\bar{W}\|W) + (1-\lambda)\,\partial^{-}\phi(s_r|\bar{W}\|W) + \epsilon. \qquad (40)$$

Therefore, there exists a real number $\lambda' \in [0,1]$ such that

$$\Big| \frac{\varphi(s_r+\lambda\delta|\lambda') - \varphi(s_r-(1-\lambda)\delta|\lambda')}{\delta} - \big(\lambda\,\partial^{+}\phi(s_r|\bar{W}\|W) + (1-\lambda)\,\partial^{-}\phi(s_r|\bar{W}\|W)\big) \Big| \le \epsilon, \qquad (41)$$

where $\varphi(s|\lambda') := \lambda'\,\phi(s|\bar{W}_{x^{+}}\|W_{x^{+}}) + (1-\lambda')\,\phi(s|\bar{W}_{x^{-}}\|W_{x^{-}})$. Thus, there exists $s_r' \in [s_r-(1-\lambda)\delta,\, s_r+\lambda\delta]$ such that

$$\big| \varphi'(s_r'|\lambda') - \big(\lambda\,\partial^{+}\phi(s_r|\bar{W}\|W) + (1-\lambda)\,\partial^{-}\phi(s_r|\bar{W}\|W)\big) \big| \le \epsilon. \qquad (42)$$

The relation (41) also implies that

$$0 \le \varphi(s_r-(1-\lambda)\delta|\lambda') - \varphi(s_r'|\lambda') \le \varphi(s_r-(1-\lambda)\delta|\lambda') - \varphi(s_r+\lambda\delta|\lambda') \le \big(\epsilon - \lambda\,\partial^{+}\phi(s_r|\bar{W}\|W) - (1-\lambda)\,\partial^{-}\phi(s_r|\bar{W}\|W)\big)\delta \le \big(\epsilon - \partial^{-}\phi(s_r|\bar{W}\|W)\big)\delta. \qquad (43)$$

Since relations (36) and (37) guarantee that

$$\phi(s_r-(1-\lambda)\delta|\bar{W}_{x^{+}}\|W_{x^{+}}) \ge \phi(s_r+\lambda\delta|\bar{W}_{x^{+}}\|W_{x^{+}}),$$

we have

$$0 \le \phi(s_r-(1-\lambda)\delta|\bar{W}\|W) - \phi(s_r-(1-\lambda)\delta|\bar{W}_{x^{+}}\|W_{x^{+}}) \le \phi(s_r-(1-\lambda)\delta|\bar{W}\|W) - \phi(s_r+\lambda\delta|\bar{W}\|W) + \phi(s_r+\lambda\delta|\bar{W}\|W) - \phi(s_r+\lambda\delta|\bar{W}_{x^{+}}\|W_{x^{+}}) \le \big(\epsilon - \partial^{-}\phi(s_r|\bar{W}\|W)\big)\delta + \delta\epsilon = \big(2\epsilon - \partial^{-}\phi(s_r|\bar{W}\|W)\big)\delta.$$

Therefore,

$$0 \le \phi(s_r-(1-\lambda)\delta|\bar{W}\|W) - \varphi(s_r-(1-\lambda)\delta|\lambda') \le \lambda'\big(\phi(s_r-(1-\lambda)\delta|\bar{W}\|W) - \phi(s_r-(1-\lambda)\delta|\bar{W}_{x^{+}}\|W_{x^{+}})\big) + (1-\lambda')\big(\phi(s_r-(1-\lambda)\delta|\bar{W}\|W) - \phi(s_r-(1-\lambda)\delta|\bar{W}_{x^{-}}\|W_{x^{-}})\big) \le \lambda'\big(2\epsilon - \partial^{-}\phi(s_r|\bar{W}\|W)\big)\delta + (1-\lambda')\delta\epsilon \le \big(2\epsilon - \partial^{-}\phi(s_r|\bar{W}\|W)\big)\delta. \qquad (44)$$

Since (36) implies that

$$\big|\phi(s_r-(1-\lambda)\delta|\bar{W}\|W) - \phi(s_r|\bar{W}\|W)\big| \le \big(\epsilon - \partial^{-}\phi(s_r|\bar{W}\|W)\big)\delta,$$

relations (43) and (44) guarantee that

$$\big|\varphi(s_r'|\lambda') - \phi(s_r|\bar{W}\|W)\big| \le \big(4\epsilon - 3\,\partial^{-}\phi(s_r|\bar{W}\|W)\big)\delta \le C_2\delta, \qquad (45)$$

where $C_2 := 4 - 3\,\partial^{-}\phi(s_r|\bar{W}\|W) \ge 4\epsilon - 3\,\partial^{-}\phi(s_r|\bar{W}\|W)$. Note that the constant $C_2$ does not depend on $\epsilon$ or $\delta$. We choose the real number

$$r' := (s_r'-1)\,\varphi'(s_r'|\lambda') - \varphi(s_r'|\lambda').$$

Then, (45), (42), and the inequality $|s_r' - s_r| \le \delta$ imply that

$$|r' - r| \le (1-s_r')\,\big|\varphi'(s_r'|\lambda') - \big(\lambda\,\partial^{+}\phi(s_r|\bar{W}\|W)+(1-\lambda)\,\partial^{-}\phi(s_r|\bar{W}\|W)\big)\big| + |s_r'-s_r|\,\big|\partial^{-}\phi(s_r|\bar{W}\|W)\big| + \big|\varphi(s_r'|\lambda') - \phi(s_r|\bar{W}\|W)\big| \le C_3\delta + (2-s_r)\epsilon, \qquad (46)$$

where $C_3 := C_2 + |\partial^{-}\phi(s_r|\bar{W}\|W)|$. Note that the constant $C_3$ does not depend on $\epsilon$ or $\delta$. By the choice of $r'$, the function $s \mapsto \frac{-sr' - \varphi(s|\lambda')}{1-s}$ takes its maximum at $s = s_r'$ (cf. (7)). Using (45) and (46), we can check that this maximum is approximated by the target value $\frac{-s_r r - \phi(s_r|\bar{W}\|W)}{1-s_r}$:

$$\Big| \frac{-s_r' r' - \varphi(s_r'|\lambda')}{1-s_r'} - \frac{-s_r r - \phi(s_r|\bar{W}\|W)}{1-s_r} \Big| \le C_4\epsilon + C_5\delta, \qquad (47)$$

where the constants $C_4$ and $C_5$ are chosen, depending only on $s_r$, $r$, $\phi(s_r|\bar{W}\|W)$, $C_2$, and $C_3$, so that they do not depend on $\delta$ or $\epsilon$; (47) then follows from (45), (46), and $|s_r'-s_r| \le \delta$ by a routine estimate.

Since $0 \le \frac{-s}{1-s} < 1$ for $s \le 0$, the relation (46) implies that

$$\Big| \max_{s\le 0} \frac{-sr - \varphi(s|\lambda')}{1-s} - \max_{s\le 0} \frac{-sr' - \varphi(s|\lambda')}{1-s} \Big| \le |r - r'| \le C_3\delta + (2-s_r)\epsilon. \qquad (48)$$

Since $\varphi(s|\lambda') \le \phi(s|\bar{W}\|W)$, (48) and (47) guarantee that

$$0 \le \max_{s\le 0} \frac{-sr - \varphi(s|\lambda')}{1-s} - \frac{-s_r r - \phi(s_r|\bar{W}\|W)}{1-s_r} \le (C_4+1)\epsilon + (C_3+C_5)\delta. \qquad (49)$$

We define the distribution $P_{\lambda'} \in \mathcal{P}_2(\mathcal{X})$ by

$$P_{\lambda'}(x^{+}) = \lambda', \qquad P_{\lambda'}(x^{-}) = 1-\lambda'.$$

Since the function $x \mapsto \log x$ is concave, the inequality

$$\varphi(s|\lambda') \le \phi(s|\bar{W}\|W|P_{\lambda'}) \qquad (50)$$

holds. Hence, (49) and (50) imply that

$$0 \le \inf_{P\in\mathcal{P}_2(\mathcal{X})}\, \max_{s\le 0} \frac{-sr - \phi(s|\bar{W}\|W|P)}{1-s} - \frac{-s_r r - \phi(s_r|\bar{W}\|W)}{1-s_r} \le \max_{s\le 0} \frac{-sr - \phi(s|\bar{W}\|W|P_{\lambda'})}{1-s} - \frac{-s_r r - \phi(s_r|\bar{W}\|W)}{1-s_r} \le (C_4+1)\epsilon + (C_3+C_5)\delta.$$

We take the limit $\delta \to +0$; after this limit, we take the limit $\epsilon \to +0$. Then, we obtain (33).

Next, we prove the inequality (33) when the maximum $\max_{s\le 0}\frac{-sr-\phi(s|\bar{W}\|W)}{1-s}$ does not exist. In this case, the real number $R := \lim_{s\to-\infty} \frac{\phi(s|\bar{W}\|W)}{s}$ satisfies

$$\sup_{s\le 0} \frac{-sr - \phi(s|\bar{W}\|W)}{1-s} = r + R.$$

For any $\epsilon > 0$, there exists $s_0 < 0$ such that any $s < s_0$ satisfies

$$R \le \frac{\phi(s_0|\bar{W}\|W) - \phi(s|\bar{W}\|W)}{s_0 - s} \le R + \epsilon.$$

We choose $x_0$ such that

$$\phi(s_0-1|\bar{W}\|W) - \epsilon \le \phi(s_0-1|\bar{W}_{x_0}\|W_{x_0}) \le \phi(s_0-1|\bar{W}\|W).$$

Thus,

$$\phi(s_0|\bar{W}_{x_0}\|W_{x_0}) - \phi(s_0-1|\bar{W}_{x_0}\|W_{x_0}) \le \phi(s_0|\bar{W}\|W) - \phi(s_0-1|\bar{W}\|W) + \epsilon \le R + 2\epsilon.$$

Hence, by the convexity of $\phi(s|\bar{W}_{x_0}\|W_{x_0})$, for any $s \le s_0 - 1$,

$$\frac{\phi(s_0|\bar{W}_{x_0}\|W_{x_0}) - \phi(s|\bar{W}_{x_0}\|W_{x_0})}{s_0 - s} \le \phi(s_0|\bar{W}_{x_0}\|W_{x_0}) - \phi(s_0-1|\bar{W}_{x_0}\|W_{x_0}) \le R + 2\epsilon.$$

Thus,

$$R \le \lim_{s\to-\infty} \frac{\phi(s|\bar{W}_{x_0}\|W_{x_0})}{s} \le R + 2\epsilon.$$

Therefore,

$$\sup_{s\le 0} \frac{-sr - \phi(s|\bar{W}_{x_0}\|W_{x_0})}{1-s} \le r + R + 2\epsilon.$$

Taking $\epsilon \to 0$, we obtain (33).

IX. CONCLUDING REMARKS AND FUTURE STUDY

We have obtained a general asymptotic formula for the discrimination of two classical channels with adaptive choice of inputs, in several asymptotic formulations. We have proved that no adaptive method improves the asymptotic performance; that is, the non-adaptive method attains the optimum performance in these asymptotic formulations. Applying the obtained result to the discrimination of two quantum states by one-way LOCC, we have shown that one-way communication does not improve the asymptotic performance in these senses. On the other hand, as shown in Section 3.5 of Hayashi [11], we cannot improve the asymptotic performance of the Stein bound even if we extend the class of measurements to separable POVMs on the n-partite system. Hence, two-way LOCC does not improve the Stein bound. However, the other asymptotic performances under two-way LOCC and separable POVMs remain unsolved. Therefore, it is an interesting problem to determine whether two-way LOCC improves the asymptotic performance for bounds other than the Stein bound.

Furthermore, the discrimination of two quantum channels (TP-CP maps) is an interesting related topic. It remains open whether choosing input quantum states adaptively improves the discrimination performance in an asymptotic framework. The solution to this problem will be sought in a future study.

ACKNOWLEDGMENTS

The author would like to thank Professor Emilio Bagan, Professor Ramon Munoz-Tapia, and Dr. John Calsamiglia for interesting discussions. The present study was supported in part by MEXT through a Grant-in-Aid for Scientific Research on Priority Area "Deepening and Expansion of Statistical Mechanical Informatics (DEX-SMI)", No. 18079014.

REFERENCES

[1] K. M. R. Audenaert, J. Calsamiglia, R. Munoz-Tapia, E. Bagan, Ll. Masanes, A. Acin, and F. Verstraete, "Discriminating states: The quantum Chernoff bound," Phys. Rev. Lett., 98, 160501 (2007).
[2] R. E. Blahut, "Hypothesis testing and information theory," IEEE Trans. Inform. Theory, 20, 405-417 (1974).
[3] H. Chernoff, "A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations," Ann. Math. Stat., 23, 493-507 (1952).
[4] I. Csiszár and G. Longo, "On the error exponent for source coding and testing simple hypotheses," Studia Sci. Math. Hungarica, 6, 181-191 (1971).
[5] I. Ekeland and R. Témam, Convex Analysis and Variational Problems (North-Holland, 1976); (SIAM, 1999).
[6] V. V. Fedorov, Theory of Optimal Experiments (Academic Press, 1972).
[7] A. Fujiwara, "Strong consistency and asymptotic efficiency for adaptive quantum estimation problems," J. Phys. A: Math. Gen., 39, no. 40, 12489-12504 (2006).
[8] T. S. Han, "Hypothesis testing with the general source," IEEE Trans. Inform. Theory, 46, 2415-2427 (2000).
[9] T. S. Han, Information-Spectrum Methods in Information Theory (Springer-Verlag, New York, 2002). (Originally published in Japanese by Baifukan, 1998.)
[10] T. S. Han and K. Kobayashi, "The strong converse theorem for hypothesis testing," IEEE Trans. Inform. Theory, 35, 178-180 (1989).
[11] M. Hayashi, Quantum Information: An Introduction (Springer, Berlin, 2006). (Originally published in Japanese by Saiensu-sha, 2004.)
[12] M. Hayashi, "Error exponent in asymmetric quantum hypothesis testing and its application to classical-quantum channel coding," Phys. Rev. A, 76, 062301 (2007).
[13] M. Hayashi and K. Matsumoto, "Statistical model with measurement degree of freedom and quantum physics," Surikaiseki Kenkyusho Kokyuroku, 1055, 96-110 (1998). (In Japanese; an English translation appears as Chapter 13 of Asymptotic Theory of Quantum Statistical Inference, M. Hayashi, ed.)
[14] M. Hayashi and K. Matsumoto, "Two kinds of Bahadur-type bound in adaptive experimental design," IEICE Trans., J83-A, 629-638 (2000). (In Japanese.)
[15] F. Hiai and D. Petz, "The proper formula for relative entropy and its asymptotics in quantum probability," Comm. Math. Phys., 143, 99-114 (1991).
[16] W. Hoeffding, "Asymptotically optimal tests for multinomial distributions," Ann. Math. Stat., 36, 369-401 (1965).
[17] T. Ogawa and M. Hayashi, "On error exponents in quantum hypothesis testing," IEEE Trans. Inform. Theory, 50, 1368-1372 (2004).
[18] T. Ogawa and H. Nagaoka, "Strong converse and Stein's lemma in quantum hypothesis testing," IEEE Trans. Inform. Theory, 46, 2428-2433 (2000).
[19] H. Nagaoka, "Strong converse theorems in quantum information theory," Proc. ERATO Conference on Quantum Information Science (EQIS) 2001, 33 (2001). (Also appears as Chapter 4 of Asymptotic Theory of Quantum Statistical Inference, M. Hayashi, ed.)
[20] H. Nagaoka, "The converse part of the theorem for quantum Hoeffding bound," arXiv e-print quant-ph/0611289 (2006).
[21] H. Nagaoka and M. Hayashi, "An information-spectrum approach to classical and quantum hypothesis testing," IEEE Trans. Inform. Theory, 53, 534-549 (2007).
[22] K. Nakagawa and F. Kanaya, "On the converse theorem in statistical hypothesis testing," IEEE Trans. Inform. Theory, 39, no. 2, 623-628 (1993).
[23] M. Nussbaum and A. Szkoła, "A lower bound of Chernoff type in quantum hypothesis testing," arXiv e-print quant-ph/0607216 (2006).
[24] M. Nussbaum and A. Szkoła, "The Chernoff lower bound in quantum hypothesis testing," Preprint No. 69/2006, MPI MiS Leipzig.