Correlation Detection and an Operational Interpretation of the Rényi Mutual Information

Correlation Detection and an Operational Interpretation of the Rényi Mutual Information
Masahito Hayashi (1), Marco Tomamichel (2)
(1) Graduate School of Mathematics, Nagoya University, and Centre for Quantum Technologies, National University of Singapore
(2) School of Physics, The University of Sydney
ISIT 2015, Hong Kong (arXiv:1408.6894)

Outline and Motivation
Rényi entropy and divergence (Rényi '61) have found various applications in information theory: e.g. error exponents for hypothesis testing and channel coding, cryptography, the Honey-Do problem, etc.
The conditional Rényi entropy and the Rényi mutual information are less understood. The mathematical properties of different proposed definitions have recently been investigated; see, e.g., Fehr-Berens (TIT '14) or Verdú (ITA '15), and many works in the quantum setting.
We want to find an operational interpretation of these measures.

Mutual Information
Two discrete random variables (X, Y) ~ P_{XY}. Many expressions for the mutual information are available:
I(X : Y) = H(X) + H(Y) - H(XY)                          (1)
         = H(X) - H(X|Y)                                (2)
         = D(P_{XY} \| P_X \times P_Y)                  (3)
         = \min_{Q_Y} D(P_{XY} \| P_X \times Q_Y)       (4)
         = \min_{Q_X, Q_Y} D(P_{XY} \| Q_X \times Q_Y). (5)
Which one should we generalize?
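As a quick numerical sanity check (a minimal Python sketch of my own, not part of the slides; the 2x2 joint pmf and helper names are illustrative), the five expressions above coincide, with (4) and (5) evaluated by a crude grid search over product distributions:

```python
# Verify numerically that expressions (1)-(5) for I(X:Y) agree on a toy example.
import itertools
import numpy as np

P = np.array([[0.30, 0.10],
              [0.05, 0.55]])          # joint pmf P_XY on a 2x2 alphabet
Px, Py = P.sum(axis=1), P.sum(axis=0)

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def D(p, q):
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

I1 = H(Px) + H(Py) - H(P.ravel())                     # (1)
I2 = H(Px) - (H(P.ravel()) - H(Py))                   # (2): H(X|Y) = H(XY) - H(Y)
I3 = D(P.ravel(), np.outer(Px, Py).ravel())           # (3)

# (4) and (5): grid search over product distributions Q_Y and Q_X x Q_Y.
grid = np.linspace(0.01, 0.99, 99)
I4 = min(D(P.ravel(), np.outer(Px, [q, 1 - q]).ravel()) for q in grid)
I5 = min(D(P.ravel(), np.outer([p, 1 - p], [q, 1 - q]).ravel())
         for p, q in itertools.product(grid, grid))

print(I1, I2, I3, I4, I5)   # all approximately equal to I(X:Y)
```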

Rényi Mutual Information
Two discrete random variables (X, Y) ~ P_{XY}. The candidate generalizations are:
1. I_\alpha(X : Y) = H_\alpha(X) + H_\alpha(Y) - H_\alpha(XY)          (1)
2. I_\alpha(X : Y) = H_\alpha(X) - H_\alpha(X|Y)                       (2)
3. I_\alpha(X : Y) = D_\alpha(P_{XY} \| P_X \times P_Y)                (3)
4. I_\alpha(X : Y) = \min_{Q_Y} D_\alpha(P_{XY} \| P_X \times Q_Y)     (4)
5. I_\alpha(X : Y) = \min_{Q_X, Q_Y} D_\alpha(P_{XY} \| Q_X \times Q_Y) (5)
We want the mutual information to be non-negative! We want it to be non-increasing under local processing!
Definition (4) is Sibson's proposal.

Rényi Entropy and Divergence
For two pmfs P_X, Q_X, the Rényi divergence is defined as
D_\alpha(P_X \| Q_X) = \frac{1}{\alpha - 1} \log \sum_x P_X(x)^\alpha Q_X(x)^{1-\alpha}
for any \alpha \in (0, 1) \cup (1, \infty), and as a limit for \alpha \in \{0, 1, \infty\}.
Monotonicity: for \alpha \le \beta, we have D_\alpha(P_X \| Q_X) \le D_\beta(P_X \| Q_X).
Kullback-Leibler divergence: \lim_{\alpha \to 1} D_\alpha(P_X \| Q_X) = D(P_X \| Q_X) = \sum_x P_X(x) \log \frac{P_X(x)}{Q_X(x)}.
Data-processing inequality (DPI): for any channel W, we have D_\alpha(P_X \| Q_X) \ge D_\alpha(P_X W \| Q_X W).
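The three listed properties are easy to check numerically. A small sketch (my own illustration, assuming full-support pmfs on a finite alphabet; the distributions and channel are randomly generated):

```python
# Renyi divergence on a finite alphabet: monotonicity in alpha, the alpha -> 1
# limit (Kullback-Leibler), and the data-processing inequality under a channel W.
import numpy as np

rng = np.random.default_rng(0)

def renyi_div(p, q, a):
    """D_alpha(p || q) for full-support pmfs p, q and alpha != 1."""
    return np.log(np.sum(p**a * q**(1 - a))) / (a - 1)

def kl_div(p, q):
    return np.sum(p * np.log(p / q))

p = rng.dirichlet(np.ones(4))
q = rng.dirichlet(np.ones(4))

# Monotonicity: D_alpha is non-decreasing in alpha.
alphas = [0.3, 0.7, 0.999, 1.5, 3.0]
vals = [renyi_div(p, q, a) for a in alphas]
assert all(x <= y + 1e-9 for x, y in zip(vals, vals[1:]))

# alpha -> 1 recovers the Kullback-Leibler divergence.
assert abs(renyi_div(p, q, 1.0001) - kl_div(p, q)) < 1e-3

# Data-processing inequality: push p and q through a random channel W.
W = rng.dirichlet(np.ones(5), size=4)          # rows W[x, :] = W(y|x)
for a in alphas:
    assert renyi_div(p, q, a) >= renyi_div(p @ W, q @ W, a) - 1e-9
print("monotonicity, KL limit and DPI verified numerically")
```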

Rényi Mutual Information
Recall: I_\alpha(X : Y) = \min_{Q_Y} D_\alpha(P_{XY} \| P_X \times Q_Y).
It inherits monotonicity and the DPI from the divergence, and \lim_{\alpha \to 1} I_\alpha(X : Y) = I(X : Y).
Sibson's identity (Sibson '69): the minimizer satisfies Q^*_Y(y)^\alpha \propto \sum_x P_X(x) P_{Y|X}(y|x)^\alpha, and
I_\alpha(X : Y) = \frac{\alpha}{\alpha - 1} \log \sum_y \Big( \sum_x P_X(x) P_{Y|X}(y|x)^\alpha \Big)^{1/\alpha}.
Additivity: for (X_1, X_2, Y_1, Y_2) ~ P_{X_1 Y_1} \times P_{X_2 Y_2} independent:
I_\alpha(X_1 X_2 : Y_1 Y_2) = I_\alpha(X_1 : Y_1) + I_\alpha(X_2 : Y_2).
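A numerical sketch of Sibson's identity and of additivity (my own illustration on randomly drawn joint pmfs; the helper names are not from the talk). The closed form is checked against the defining minimization by plugging in the stated optimizer Q*_Y:

```python
# Sibson's closed form for I_alpha, its optimizer Q*_Y, and additivity.
import numpy as np

rng = np.random.default_rng(1)

def sibson_mi(P, a):
    """I_alpha(X:Y) = a/(a-1) * log sum_y ( sum_x P_X(x) P_{Y|X}(y|x)^a )^{1/a}."""
    Px = P.sum(axis=1, keepdims=True)
    inner = (Px * (P / Px)**a).sum(axis=0)          # sum_x P_X(x) P(y|x)^a
    return a / (a - 1) * np.log(np.sum(inner**(1 / a)))

def renyi_div(p, q, a):
    return np.log(np.sum(p**a * q**(1 - a))) / (a - 1)

P = rng.dirichlet(np.ones(6)).reshape(2, 3)         # joint pmf P_XY
a = 1.7

# Plug the optimizer Q*_Y(y)^a proportional to sum_x P_X(x) P(y|x)^a into
# D_alpha(P_XY || P_X x Q_Y) and compare with the closed form.
Px = P.sum(axis=1, keepdims=True)
f = (Px * (P / Px)**a).sum(axis=0)
Qstar = f**(1 / a) / np.sum(f**(1 / a))
direct = renyi_div(P.ravel(), (Px * Qstar[None, :]).ravel(), a)
assert abs(direct - sibson_mi(P, a)) < 1e-9

# Additivity for independent pairs.
P2 = rng.dirichlet(np.ones(4)).reshape(2, 2)
P12 = np.einsum('ij,kl->ikjl', P, P2).reshape(4, 6)  # joint of (X1 X2, Y1 Y2)
assert abs(sibson_mi(P12, a) - sibson_mi(P, a) - sibson_mi(P2, a)) < 1e-9
print("Sibson identity and additivity verified for alpha =", a)
```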

Correlation Detection and One-Shot Converse
Correlation detection: given a pmf P_{XY}, consider
Null Hypothesis: (X, Y) ~ P_{XY}
Alternative Hypothesis: X ~ P_X independent of Y
For a test T_{Z|XY} with Z \in \{0, 1\}, define the errors
\alpha(T) = \Pr[Z = 1],   (X, Y, Z) ~ P_{XY} \times T_{Z|XY},
\beta(T) = \max_{Q_Y} \Pr[Z = 0],   (X, Y, Z) ~ P_X \times Q_Y \times T_{Z|XY}.
The one-shot (meta-)converse can be stated in terms of this composite hypothesis testing problem (Polyanskiy '13). Any code on W_{Y|X} with input distribution P_X using M codewords and average error \varepsilon satisfies (with P_{XY} = P_X \times W_{Y|X}):
M \le \frac{1}{\hat\beta(\varepsilon)},   \hat\beta(\varepsilon) = \min\{ \beta(T) : T_{Z|XY} \text{ s.t. } \alpha(T) \le \varepsilon \}.
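For finite alphabets, \hat\beta(\varepsilon) is a linear program over randomized tests. Below is an illustrative sketch (my own formulation, not from the talk; assumes scipy, and the toy P_XY is arbitrary) that computes \hat\beta(\varepsilon) and the resulting bound M \le 1/\hat\beta(\varepsilon):

```python
# beta_hat(eps) = min{ beta(T) : alpha(T) <= eps } as a linear program.
import numpy as np
from scipy.optimize import linprog

P = np.array([[0.30, 0.10],
              [0.05, 0.55]])                 # joint pmf P_XY = P_X x W_{Y|X}
Px = P.sum(axis=1)
eps = 0.1
nx, ny = P.shape

# Variables: t(x, y) = Pr[Z = 0 | x, y] (flattened), plus a slack u = beta(T).
# beta(T) = max_{Q_Y} sum_{x,y} P_X(x) Q_Y(y) t(x,y) = max_y sum_x P_X(x) t(x,y) <= u,
# alpha(T) = 1 - sum_{x,y} P_XY(x,y) t(x,y) <= eps.
c = np.zeros(nx * ny + 1); c[-1] = 1.0       # minimize u
A_ub, b_ub = [], []
for y in range(ny):                          # sum_x P_X(x) t(x,y) - u <= 0
    row = np.zeros(nx * ny + 1)
    row[y::ny][:nx] = Px                     # entries t(x, y) for all x
    row[-1] = -1.0
    A_ub.append(row); b_ub.append(0.0)
row = np.zeros(nx * ny + 1)                  # -sum P_XY t <= -(1 - eps)
row[:nx * ny] = -P.ravel()
A_ub.append(row); b_ub.append(-(1 - eps))

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(0, 1)] * (nx * ny) + [(0, 1)], method="highs")
beta_hat = res.x[-1]
print("beta_hat(eps) =", beta_hat, " =>  M <= 1/beta_hat =", 1 / beta_hat)
```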

Asymptotic Correlation Detection
Consider the asymptotics n \to \infty for the sequence of problems
Null Hypothesis: (X^n, Y^n) ~ P_{XY}^{\times n}
Alternative Hypothesis: X^n ~ P_X^{\times n} independent of Y^n
For a test T^n_{Z|X^n Y^n} with Z \in \{0, 1\}, define the errors
\alpha(T_n) = \Pr[Z = 1],   (X^n, Y^n, Z) ~ P_{XY}^{\times n} \times T^n_{Z|X^n Y^n},
\beta(T_n) = \max_{Q_{Y^n}} \Pr[Z = 0],   (X^n, Y^n, Z) ~ P_X^{\times n} \times Q_{Y^n} \times T^n_{Z|X^n Y^n}.
Define the minimal error for a fixed rate R > 0:
\hat\alpha(R; n) = \min\{ \alpha(T_n) : T^n_{Z|X^n Y^n} \text{ s.t. } \beta(T_n) \le \exp(-nR) \}.

Error Exponents (Hoeffding)
Recall: I_s(X : Y) = \min_{Q_Y} D_s(P_{XY} \| P_X \times Q_Y) and \hat\alpha(R; n) = \min\{ \alpha(T_n) : \beta(T_n) \le \exp(-nR) \}.
Result (Error Exponent): For any R > 0, we have
\lim_{n \to \infty} \Big\{ -\frac{1}{n} \log \hat\alpha(R; n) \Big\} = \sup_{s \in (0,1)} \Big\{ \frac{1-s}{s} \big( I_s(X : Y) - R \big) \Big\}.
If R \ge I(X : Y) it evaluates to 0, else it is positive. I(X : Y) is the critical rate (cf. Stein's Lemma).
If R < I_0(X : Y) it diverges to +\infty. This is the zero-error regime.

Strong Converse Exponents (Han-Kobayashi)
Recall: I_s(X : Y) = \min_{Q_Y} D_s(P_{XY} \| P_X \times Q_Y) and \hat\alpha(R; n) = \min\{ \alpha(T_n) : \beta(T_n) \le \exp(-nR) \}.
Result (Strong Converse Exponent): For any 0 < R < I_\infty(X : Y), we have
\lim_{n \to \infty} \Big\{ -\frac{1}{n} \log\big( 1 - \hat\alpha(R; n) \big) \Big\} = \sup_{s > 1} \Big\{ \frac{s-1}{s} \big( R - I_s(X : Y) \big) \Big\}.
If R \le I(X : Y) it evaluates to 0, otherwise it is positive. This implies the strong converse to Stein's Lemma.
What if R = I(X : Y)?
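Both exponent formulas are easy to evaluate from Sibson's closed form for I_s. A numerical sketch (my own, on an arbitrary toy pmf) showing the expected behaviour below and above the critical rate I(X:Y):

```python
# Evaluate the Hoeffding error exponent and the strong converse exponent
# from Sibson's I_s for a toy joint pmf.
import numpy as np

P = np.array([[0.30, 0.10],
              [0.05, 0.55]])                         # joint pmf P_XY

def sibson_mi(P, s):
    Px = P.sum(axis=1, keepdims=True)
    if abs(s - 1) < 1e-12:                           # s -> 1 gives I(X:Y)
        PxPy = Px * P.sum(axis=0, keepdims=True)
        return np.sum(P * np.log(P / PxPy))
    inner = (Px * (P / Px)**s).sum(axis=0)
    return s / (s - 1) * np.log(np.sum(inner**(1 / s)))

I = sibson_mi(P, 1.0)

def hoeffding_exponent(R):
    # sup_{s in (0,1)} (1-s)/s (I_s - R); approaches 0 as s -> 1 when R >= I
    s = np.linspace(1e-3, 1 - 1e-3, 2000)
    return max(0.0, np.max((1 - s) / s * (np.array([sibson_mi(P, x) for x in s]) - R)))

def strong_converse_exponent(R):
    # sup_{s > 1} (s-1)/s (R - I_s); approaches 0 as s -> 1 when R <= I
    s = np.linspace(1 + 1e-3, 50, 2000)
    return max(0.0, np.max((s - 1) / s * (R - np.array([sibson_mi(P, x) for x in s]))))

print("I(X:Y) =", I)
print("R = I/2 :", hoeffding_exponent(I / 2), strong_converse_exponent(I / 2))  # (>0, 0)
print("R = 2*I :", hoeffding_exponent(2 * I), strong_converse_exponent(2 * I))  # (0, >0)
```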

Second Order Expansion
For small deviations r from the rate R, define
\hat\alpha(R, r; n) = \min\{ \alpha(T_n) : T^n_{Z|X^n Y^n} \text{ s.t. } \beta(T_n) \le \exp(-nR - \sqrt{n}\, r) \}.
Result (Second Order Expansion): For any r \in \mathbb{R}, we have
\lim_{n \to \infty} \hat\alpha\big( I(X : Y), r; n \big) = \Phi\Big( \frac{r}{\sqrt{V(X : Y)}} \Big).
\Phi is the cumulative distribution function of the standard Gaussian.
V(X : Y) = V(P_{XY} \| P_X \times P_Y), where V(\cdot\|\cdot) is the divergence variance, and
\frac{d}{ds} I_s(X : Y) \Big|_{s=1} = \frac{1}{2} V(X : Y).
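A small sketch of the second-order quantities (my own illustration; the pmf is arbitrary and the derivative identity is checked by finite differences):

```python
# Divergence variance V(X:Y), the limiting value Phi(r / sqrt(V)), and a
# finite-difference check of d/ds I_s(X:Y)|_{s=1} = V/2.
import numpy as np
from scipy.stats import norm

P = np.array([[0.30, 0.10],
              [0.05, 0.55]])                     # joint pmf P_XY
Px, Py = P.sum(axis=1, keepdims=True), P.sum(axis=0, keepdims=True)

llr = np.log(P / (Px * Py))                      # log-likelihood ratio per symbol
I = np.sum(P * llr)                              # mutual information
V = np.sum(P * (llr - I)**2)                     # divergence variance V(X:Y)

def sibson_mi(P, s):
    inner = (Px * (P / Px)**s).sum(axis=0)
    return s / (s - 1) * np.log(np.sum(inner**(1 / s)))

# Central finite difference of I_s at s = 1 should be close to V / 2.
h = 1e-4
slope = (sibson_mi(P, 1 + h) - sibson_mi(P, 1 - h)) / (2 * h)
print("V =", V, "  2 * dI_s/ds|_{s=1} =", 2 * slope)

# Limiting type-I error at rate R = I(X:Y) with second-order deviation r:
for r in (-0.5, 0.0, 0.5):
    print("r =", r, " lim alpha_hat(I, r; n) =", norm.cdf(r / np.sqrt(V)))
```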

Universal Distribution
For every n, consider the universal pmf (Hayashi '09)
T^n_{Y^n}(y^n) = \frac{1}{|P_n(\mathcal{Y})|} \sum_{\lambda \in P_n(\mathcal{Y})} U_\lambda(y^n),
where U_\lambda is the uniform distribution over the type class \lambda and P_n(\mathcal{Y}) is the set of types.
Every S_n-invariant pmf Q_{Y^n} satisfies Q_{Y^n}(y^n) \le |P_n(\mathcal{Y})| \, T^n_{Y^n}(y^n) for all y^n.
Main idea: test P_{XY}^{\times n} vs. P_X^{\times n} \times T^n_{Y^n}.
Lemma: For any joint pmf P_{XY}, the universal pmf satisfies
D_\alpha\big( P_{XY}^{\times n} \| P_X^{\times n} \times T^n_{Y^n} \big) = n I_\alpha(X : Y) + O(\log n).
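A toy construction of the universal pmf (my own illustration for a binary alphabet and n = 6), together with a check of the bound against an S_n-invariant competitor, here an i.i.d. pmf:

```python
# Universal pmf T^n_{Y^n}: uniform mixture over type classes, and the bound
# Q(y^n) <= |P_n(Y)| * T^n(y^n) for an S_n-invariant (i.i.d.) pmf Q.
import itertools
import numpy as np

n = 6
seqs = list(itertools.product([0, 1], repeat=n))
types = {}                                        # type (= number of ones) -> sequences
for y in seqs:
    types.setdefault(sum(y), []).append(y)
num_types = len(types)                            # |P_n(Y)| = n + 1 for a binary alphabet

# Universal pmf: average of the uniform distributions over each type class.
T = {y: (1.0 / len(types[sum(y)])) / num_types for y in seqs}

# An S_n-invariant competitor: the i.i.d. pmf Q(y^n) = prod_i q(y_i).
q = np.array([0.3, 0.7])
Q = {y: float(np.prod(q[list(y)])) for y in seqs}

assert abs(sum(T.values()) - 1) < 1e-12 and abs(sum(Q.values()) - 1) < 1e-12
assert all(Q[y] <= num_types * T[y] + 1e-12 for y in seqs)
print("Q(y^n) <= |P_n(Y)| T^n(y^n) holds for all", len(seqs), "sequences")
```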

Error Exponent: Achievability (1)
Fix s \in (0, 1) and a sequence \{\lambda_n\}_n to be chosen later. We use Neyman-Pearson tests for P_{XY}^{\times n} vs. P_X^{\times n} \times T^n_{Y^n}:
Z(x^n, y^n) = 1\Big\{ \log \frac{P_{XY}^{\times n}(x^n, y^n)}{P_X^{\times n}(x^n) T^n_{Y^n}(y^n)} < \lambda_n \Big\}.
Then, with (X^n, Y^n) ~ P_{XY}^{\times n}, we have
\Pr[Z = 1] = \sum_{x^n, y^n} P_{XY}^{\times n}(x^n, y^n) \, 1\Big\{ \log \frac{P_{XY}^{\times n}(x^n, y^n)}{P_X^{\times n}(x^n) T^n_{Y^n}(y^n)} < \lambda_n \Big\}
\le \sum_{x^n, y^n} \exp\big( (1-s)\lambda_n \big) \big( P_{XY}^{\times n}(x^n, y^n) \big)^s \big( P_X^{\times n}(x^n) T^n_{Y^n}(y^n) \big)^{1-s}
= \exp\Big( (1-s) \big( \lambda_n - D_s(P_{XY}^{\times n} \| P_X^{\times n} \times T^n_{Y^n}) \big) \Big).

Error Exponent: Achievability (2)
And, with (X^n, Y^n) ~ P_X^{\times n} \times Q_{Y^n}, we have
\Pr[Z = 0] = \sum_{x^n, y^n} P_X^{\times n}(x^n) Q_{Y^n}(y^n) \, 1\Big\{ \log \frac{P_{XY}^{\times n}(x^n, y^n)}{P_X^{\times n}(x^n) T^n_{Y^n}(y^n)} \ge \lambda_n \Big\}
= \sum_{x^n, y^n} P_X^{\times n}(x^n) \bar{Q}_{Y^n}(y^n) \, 1\Big\{ \log \frac{P_{XY}^{\times n}(x^n, y^n)}{P_X^{\times n}(x^n) T^n_{Y^n}(y^n)} \ge \lambda_n \Big\},
where \bar{Q}_{Y^n}(y^n) = \frac{1}{|S_n|} \sum_{\pi \in S_n} Q_{Y^n}(\pi(y^n)) is S_n-invariant (the equality holds because the test and P_X^{\times n} are permutation invariant).
Now we can bring in the universal pmf again:
\Pr[Z = 0] \le |P_n(\mathcal{Y})| \sum_{x^n, y^n} P_X^{\times n}(x^n) T^n_{Y^n}(y^n) \, 1\Big\{ \log \frac{P_{XY}^{\times n}(x^n, y^n)}{P_X^{\times n}(x^n) T^n_{Y^n}(y^n)} \ge \lambda_n \Big\}
\le |P_n(\mathcal{Y})| \exp\Big( -s\lambda_n - (1-s) D_s(P_{XY}^{\times n} \| P_X^{\times n} \times T^n_{Y^n}) \Big).
Choose \{\lambda_n\} such that \Pr[Z = 0] \le \exp(-nR).

Second Order: Achievability
There exists \{\lambda_n\}_n such that
\Pr[Z = 0] \le \exp\big( -n I(X : Y) - \sqrt{n}\, r \big)   for (X^n, Y^n) ~ P_X^{\times n} \times Q_{Y^n},
\Pr[Z = 1] = \Pr[F_n(X^n, Y^n) < r]   for (X^n, Y^n) ~ P_{XY}^{\times n},
with a new sequence of random variables
F_n(X^n, Y^n) = \frac{1}{\sqrt{n}} \Big( \log \frac{P_{XY}^{\times n}(X^n, Y^n)}{P_X^{\times n}(X^n) T^n_{Y^n}(Y^n)} - n I(X : Y) - \log |P_n(\mathcal{Y})| \Big).
Asymptotic cumulant generating function:
\Lambda_F(t) = \lim_{n \to \infty} \log E[\exp(t F_n)] = \lim_{n \to \infty} \frac{t}{\sqrt{n}} \Big( D_{1 + t/\sqrt{n}}\big( P_{XY}^{\times n} \| P_X^{\times n} \times T^n_{Y^n} \big) - n I(X : Y) \Big) = \frac{t^2}{2} V(P_{XY} \| P_X \times P_Y).
Hence F_n converges in distribution to a Gaussian F with variance V (by a variation of Lévy's continuity theorem).

Quantum Hypothesis Testing
Given a bipartite quantum state \rho_{AB}, consider
Null Hypothesis: the state is \rho_{AB}
Alternative Hypothesis: the state is \rho_A \otimes \sigma_B for some state \sigma_B
Using the same notation:
\lim_{n \to \infty} \Big\{ -\frac{1}{n} \log \hat\alpha(R; n) \Big\} = \sup_{s \in (0,1)} \Big\{ \frac{1-s}{s} \big( \bar{I}_s(A : B) - R \big) \Big\},
\lim_{n \to \infty} \Big\{ -\frac{1}{n} \log\big( 1 - \hat\alpha(R; n) \big) \Big\} = \sup_{s > 1} \Big\{ \frac{s-1}{s} \big( R - \tilde{I}_s(A : B) \big) \Big\}.
The definitions are similar,
\bar{I}_s(A : B) = \min_{\sigma_B} \bar{D}_s(\rho_{AB} \| \rho_A \otimes \sigma_B),   \tilde{I}_s(A : B) = \min_{\sigma_B} \tilde{D}_s(\rho_{AB} \| \rho_A \otimes \sigma_B),
but \bar{D}_s and \tilde{D}_s are different!

Two Quantum Rényi Divergences
[Figure: \bar{D}_s(\rho \| \sigma) and \tilde{D}_s(\rho \| \sigma) as functions of s \in [0, 3], together with D(\rho \| \sigma).]
They agree with the classical quantity for commuting states.
\bar{D}_s(\rho \| \sigma) = \frac{1}{s-1} \log \mathrm{tr}\big( \rho^s \sigma^{1-s} \big),   \tilde{D}_s(\rho \| \sigma) = \frac{1}{s-1} \log \mathrm{tr}\Big( \big( \sigma^{\frac{1-s}{2s}} \rho\, \sigma^{\frac{1-s}{2s}} \big)^s \Big).
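A numerical sketch of the two quantum Rényi divergences (my own illustration; matrix powers are taken via eigendecompositions, and the random states are for demonstration only). For commuting, i.e. diagonal, states both reduce to the classical D_s of the eigenvalue distributions:

```python
# Petz and sandwiched Renyi divergences for small density matrices.
import numpy as np

rng = np.random.default_rng(2)

def mpow(A, p):
    """Power of a positive definite Hermitian matrix via eigendecomposition."""
    w, U = np.linalg.eigh(A)
    return (U * w**p) @ U.conj().T

def petz_renyi(rho, sigma, s):        # \bar D_s = 1/(s-1) log tr(rho^s sigma^{1-s})
    return np.log(np.trace(mpow(rho, s) @ mpow(sigma, 1 - s)).real) / (s - 1)

def sandwiched_renyi(rho, sigma, s):  # \tilde D_s = 1/(s-1) log tr[(sigma^{(1-s)/2s} rho sigma^{(1-s)/2s})^s]
    G = mpow(sigma, (1 - s) / (2 * s))
    return np.log(np.trace(mpow(G @ rho @ G, s)).real) / (s - 1)

def random_state(d):
    X = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = X @ X.conj().T
    return rho / np.trace(rho).real

rho, sigma = random_state(3), random_state(3)
for s in (0.5, 2.0, 3.0):
    # The two quantities generally differ for non-commuting states.
    print(s, petz_renyi(rho, sigma, s), sandwiched_renyi(rho, sigma, s))

# Commuting (diagonal) states: both coincide with the classical Renyi divergence.
p, q = rng.dirichlet(np.ones(3)), rng.dirichlet(np.ones(3))
s = 2.0
classical = np.log(np.sum(p**s * q**(1 - s))) / (s - 1)
assert abs(petz_renyi(np.diag(p), np.diag(q), s) - classical) < 1e-9
assert abs(sandwiched_renyi(np.diag(p), np.diag(q), s) - classical) < 1e-9
```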

Summary and Outlook
Correlation detection gives an operational meaning to I_\alpha(X : Y) = \min_{Q_Y} D_\alpha(P_{XY} \| P_X \times Q_Y).
Similarly, Arimoto's conditional Rényi entropy
H_\alpha(X|Y) = \log|\mathcal{X}| - \min_{Q_Y} D_\alpha(P_{XY} \| U_X \times Q_Y)
has an operational interpretation:
Null Hypothesis: (X, Y) ~ P_{XY}
Alternative Hypothesis: X ~ U_X uniform and independent of Y
Does the symmetric mutual information \min_{Q_X, Q_Y} D_\alpha(P_{XY} \| Q_X \times Q_Y) have a natural operational interpretation?
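A closing numerical sketch (my own illustration on a random joint pmf) comparing Arimoto's conditional Rényi entropy in its closed form with the divergence form used above, where the minimization over Q_Y is solved by the same Sibson-type optimizer:

```python
# Arimoto's conditional Renyi entropy two ways:
#   closed form  H_a(X|Y) = a/(1-a) log sum_y (sum_x P_XY(x,y)^a)^{1/a}
#   divergence form  log|X| - min_{Q_Y} D_a(P_XY || U_X x Q_Y)
import numpy as np

rng = np.random.default_rng(3)
P = rng.dirichlet(np.ones(12)).reshape(3, 4)      # joint pmf P_XY, |X| = 3
a = 2.5
nx = P.shape[0]

def renyi_div(p, q, a):
    return np.log(np.sum(p**a * q**(1 - a))) / (a - 1)

# Closed form.
f = (P**a).sum(axis=0)                            # f(y) = sum_x P_XY(x,y)^a
H_closed = a / (1 - a) * np.log(np.sum(f**(1 / a)))

# Divergence form: the minimizer satisfies Q*_Y(y)^a proportional to f(y).
Ux = np.full((nx, 1), 1.0 / nx)
Qstar = f**(1 / a) / np.sum(f**(1 / a))
H_via_div = np.log(nx) - renyi_div(P.ravel(), (Ux * Qstar[None, :]).ravel(), a)

print(H_closed, H_via_div)                        # the two expressions agree
```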