The Method of Types and Its Application to Information Hiding

The Method of Types and Its Application to Information Hiding
Pierre Moulin, University of Illinois at Urbana-Champaign
www.ifp.uiuc.edu/~moulin/talks/eusipco05-slides.pdf
EUSIPCO, Antalya, September 7, 2005

Outline

Part I: General Concepts
  Introduction
  Definitions
  What is it useful for?

Part II: Application to Information Hiding
  Performance guarantees against an omnipotent attacker?
  Steganography, Watermarking, Fingerprinting

Part I: General Concepts

Reference Materials

I. Csiszar, "The Method of Types," IEEE Trans. Information Theory, Oct. 1998 (commemorative Shannon issue)
A. Lapidoth and P. Narayan, "Reliable Communication under Channel Uncertainty," same issue

Application areas:
  capacity analyses
  computation of error probabilities (exponential behavior)
  universal coding/decoding
  hypothesis testing

Basic Notation

Discrete alphabets $\mathcal{X}$ and $\mathcal{Y}$.
Random variables $X, Y$ with joint pmf $p(x, y)$.

The entropy of $X$ is
  $H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x)$
(will sometimes be denoted by $H(p_X)$).

Joint entropy:
  $H(X, Y) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(x, y)$

The conditional entropy of $Y$ given $X$ is
  $H(Y|X) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(y|x) = H(X, Y) - H(X)$

The mutual information between $X$ and $Y$ is
  $I(X; Y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log \frac{p(x, y)}{p(x) p(y)} = H(Y) - H(Y|X)$

The Kullback-Leibler divergence between pmf's $p$ and $q$ is
  $D(p \| q) = \sum_{x \in \mathcal{X}} p(x) \log \frac{p(x)}{q(x)}$
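
As an illustrative aside (not from the slides), here is a minimal Python sketch of these measures computed from an explicit joint pmf; the function names and the example pmf are ad hoc choices, and natural logarithms (nats) are assumed.

```python
import numpy as np

def entropy(p):
    """H(p) = -sum p log p (in nats), with the convention 0 log 0 = 0."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def mutual_information(p_xy):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), with p_xy a 2-D joint pmf array."""
    p_x = p_xy.sum(axis=1)
    p_y = p_xy.sum(axis=0)
    return entropy(p_x) + entropy(p_y) - entropy(p_xy)

def kl_divergence(p, q):
    """D(p||q) = sum p log(p/q); assumes q > 0 wherever p > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Small example joint pmf on {0,1} x {0,1}.
p_xy = np.array([[0.3, 0.1],
                 [0.2, 0.4]])
print(mutual_information(p_xy))                   # I(X;Y)
print(entropy(p_xy) - entropy(p_xy.sum(axis=1)))  # H(Y|X) = H(X,Y) - H(X)
print(kl_divergence([0.5, 0.5], [0.3, 0.7]))      # D(p||q)
```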

Types

Deterministic notion. Given a length-n sequence $x \in \mathcal{X}^n$, count the frequency of occurrence of each letter of the alphabet $\mathcal{X}$.

Example: $\mathcal{X} = \{0, 1\}$, $n = 12$, $x = 110100101110$ contains 5 zeroes and 7 ones, so the sequence $x$ has type $\hat{p}_x = (\frac{5}{12}, \frac{7}{12})$.

$\hat{p}_x$ is also called the empirical pmf. It may be viewed as a pmf over $\mathcal{X}$. Each $\hat{p}_x(x)$ is a multiple of $\frac{1}{n}$.
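
A quick sketch (not part of the slides) of computing the type of a sequence; `empirical_type` is a hypothetical helper name, and the example reproduces the slide's sequence.

```python
from collections import Counter
from fractions import Fraction

def empirical_type(x, alphabet):
    """Return the type (empirical pmf) of x as exact fractions k/n."""
    n = len(x)
    counts = Counter(x)
    return {a: Fraction(counts.get(a, 0), n) for a in alphabet}

x = "110100101110"                       # the slide's example, n = 12
print(empirical_type(x, alphabet="01"))  # type p_hat_x = (5/12, 7/12)
```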

Joint Types

Given two length-n sequences $x \in \mathcal{X}^n$ and $y \in \mathcal{Y}^n$, count the frequency of occurrence of each pair $(x, y) \in \mathcal{X} \times \mathcal{Y}$.

Example: $x = 110100101110$, $y = 111100101110$.
$(x, y)$ have joint type
  $\hat{p}_{xy} = \begin{pmatrix} 4/12 & 1/12 \\ 0 & 7/12 \end{pmatrix}$
an empirical pmf over $\mathcal{X} \times \mathcal{Y}$ (rows indexed by $x$, columns by $y$).

Conditional Types

By analogy with Bayes' rule, define the conditional type of $y$ given $x$ as
  $\hat{p}_{y|x}(y|x) = \frac{\hat{p}_{xy}(x, y)}{\hat{p}_x(x)}$
which is an empirical conditional pmf.

Example: $x = 110100101110$, $y = 111100101110$:
  $\hat{p}_{y|x} = \begin{pmatrix} 4/5 & 1/5 \\ 0 & 1 \end{pmatrix}$
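
Continuing the same toy example, a sketch (with hypothetical helper names) of the joint type and the conditional type obtained from it via the empirical Bayes rule above.

```python
from collections import Counter
from fractions import Fraction

def joint_type(x, y):
    """Empirical pmf of the pairs (x_i, y_i); pairs that never occur are omitted."""
    n = len(x)
    return {pair: Fraction(c, n) for pair, c in Counter(zip(x, y)).items()}

def conditional_type(x, y):
    """p_hat_{y|x}(b|a) = p_hat_{xy}(a, b) / p_hat_x(a)."""
    n = len(x)
    p_x = Counter(x)
    return {(a, b): p_ab / Fraction(p_x[a], n)
            for (a, b), p_ab in joint_type(x, y).items()}

x = "110100101110"
y = "111100101110"
print(joint_type(x, y))        # matches the slide's table; the zero (1,0) entry is omitted
print(conditional_type(x, y))  # 4/5, 1/5, and 1, as above
```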

Type Classes

The type class $T_x$ is the set of all sequences that have the same type as $x$. Example: all sequences with 5 zeroes and 7 ones.

The joint type class $T_{xy}$ is the set of all sequence pairs that have the same joint type as $(x, y)$.

The conditional type class $T_{y|x}$ is the set of all sequences $y'$ that have the same conditional type given $x$ as $y$.

Information Measures

Any type may be represented by a dummy sequence. Can define empirical information measures:
  $H(x) \triangleq H(\hat{p}_x)$
  $H(y|x) \triangleq H(\hat{p}_{y|x})$
  $I(x; y) \triangleq I(X; Y)$ for $(X, Y) \sim \hat{p}_{xy}$

Will be useful to design universal decoders.

Typicality

Consider a pmf $p$ over $\mathcal{X}$ and a length-n sequence $x$ drawn i.i.d. from $p$. Notation: $x \sim p^n$.

Example: $\mathcal{X} = \{0, 1\}$, $n = 12$, $x = 110100101110$.

For large $n$, all typical sequences have approximately composition $p$. This can be measured in various ways:
  Entropy $\epsilon$-typicality: $\left| -\frac{1}{n} \log p^n(x) - H(X) \right| < \epsilon$
  Strong $\epsilon$-typicality: $\max_{x \in \mathcal{X}} \left| \hat{p}_x(x) - p(x) \right| < \epsilon$
Both define sets of typical sequences.
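
A small numerical sketch (not from the slides; natural logs assumed) that checks both notions of typicality for an i.i.d. sample; function names are ad hoc.

```python
import math, random
from collections import Counter

def is_entropy_typical(x, p, eps):
    """| -(1/n) log p^n(x) - H(X) | < eps."""
    n = len(x)
    log_pn_x = sum(math.log(p[a]) for a in x)
    H = -sum(q * math.log(q) for q in p.values() if q > 0)
    return abs(-log_pn_x / n - H) < eps

def is_strongly_typical(x, p, eps):
    """max_a | p_hat_x(a) - p(a) | < eps."""
    n = len(x)
    counts = Counter(x)
    return max(abs(counts.get(a, 0) / n - q) for a, q in p.items()) < eps

p = {'0': 0.4, '1': 0.6}
x = ''.join(random.choices('01', weights=[0.4, 0.6], k=10_000))
print(is_entropy_typical(x, p, eps=0.02), is_strongly_typical(x, p, eps=0.02))
```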

Application to Channel Coding

Channel input $x = (x_1, \ldots, x_n) \in \mathcal{X}^n$, output $y = (y_1, \ldots, y_n) \in \mathcal{Y}^n$.

Discrete Memoryless Channel (DMC): $p^n(y|x) = \prod_{i=1}^n p(y_i|x_i)$

Many fundamental coding theorems can be proven using the concept of entropy typicality. Examples:
  Shannon's coding theorem (capacity of a DMC)
  Rate-distortion bound for memoryless sources

Many fundamental coding theorems cannot be proved using the concept of entropy typicality. Examples:
  precise calculations of error log-probability
  various kinds of unknown channels

So let's derive some useful facts about types.

Number of types: at most $(n + 1)^{|\mathcal{X}|}$ (polynomial in $n$).

Size of the type class $T_x$:
  $(n + 1)^{-|\mathcal{X}|} e^{nH(\hat{p}_x)} \le |T_x| \le e^{nH(\hat{p}_x)}$
Ignoring polynomial terms, we write $|T_x| \doteq e^{nH(\hat{p}_x)}$.
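
A numerical check (not in the slides) of the sandwich bound: for a binary alphabet the exact size of $T_x$ is a multinomial coefficient, which indeed sits between the two exponential bounds.

```python
import math

def type_class_size(counts):
    """Exact |T_x| = n! / (k_1! k_2! ...) for letter counts (k_1, k_2, ...)."""
    n = sum(counts)
    size = math.factorial(n)
    for k in counts:
        size //= math.factorial(k)
    return size

counts = [5, 7]                       # the slides' example: 5 zeroes, 7 ones
n = sum(counts)
H = -sum(k / n * math.log(k / n) for k in counts if k > 0)   # H(p_hat_x) in nats
exact = type_class_size(counts)       # = C(12, 5) = 792
upper = math.exp(n * H)
lower = upper / (n + 1) ** len(counts)
print(lower <= exact <= upper)        # True
```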

Probability of $x$ under distribution $p^n$:
  $p^n(x) = \prod_{x \in \mathcal{X}} p(x)^{n \hat{p}_x(x)} = e^{n \sum_{x \in \mathcal{X}} \hat{p}_x(x) \log p(x)} = e^{-n [H(\hat{p}_x) + D(\hat{p}_x \| p)]}$
(the same for all $x$ in the same type class).

Probability of the type class $T_x$ under distribution $p^n$:
  $P^n(T_x) = |T_x| \, p^n(x) \doteq e^{-n D(\hat{p}_x \| p)}$

Similarly:
  $|T_{y|x}| \doteq e^{nH(\hat{p}_{y|x})}$
  $P^n_{Y|X}(T_{y|x} \mid x) \doteq e^{-n D(\hat{p}_{xy} \| p_{Y|X} \hat{p}_x)}$
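
Another quick check (assumed example pmf $p$, not from the slides) of the identities above: the per-sequence probability formula is exact, and the type-class probability matches $e^{-nD}$ up to the usual polynomial factor.

```python
import math

def H(q):    return -sum(v * math.log(v) for v in q.values() if v > 0)
def D(q, p): return sum(v * math.log(v / p[a]) for a, v in q.items() if v > 0)

p = {'0': 0.3, '1': 0.7}                     # true pmf (illustrative choice)
counts = {'0': 5, '1': 7}; n = 12            # type of the slides' sequence x
q = {a: k / n for a, k in counts.items()}    # p_hat_x = (5/12, 7/12)

log_pn_x = sum(k * math.log(p[a]) for a, k in counts.items())  # log p^n(x)
print(math.isclose(log_pn_x, -n * (H(q) + D(q, p))))           # True (exact identity)

size_Tx = math.comb(n, counts['1'])          # |T_x| for the binary alphabet
log_Pn_Tx = math.log(size_Tx) + log_pn_x     # log P^n(T_x)
print(log_Pn_Tx / n, -D(q, p))               # equal up to an O(log(n)/n) term
```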

Constant-Composition Codes

All codewords have the same type $\hat{p}_x$.

Random coding: generate codewords $x_m$, $m \in \mathcal{M}$, randomly and independently from the uniform pmf on the type class $T_x$.

Note that channel outputs have different types in general.
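
One simple way (a sketch under my own assumptions, not necessarily the construction used in the talk) to draw codewords uniformly from a type class: independently permute a fixed-composition reference sequence.

```python
import random

def constant_composition_codebook(num_codewords, counts, seed=0):
    """Each codeword is a uniform random permutation of a sequence with the
    given letter counts, hence uniformly distributed over the type class T_x."""
    rng = random.Random(seed)
    reference = [a for a, k in counts.items() for _ in range(k)]
    codebook = []
    for _ in range(num_codewords):
        word = reference[:]
        rng.shuffle(word)
        codebook.append(''.join(word))
    return codebook

print(constant_composition_codebook(4, {'0': 5, '1': 7}))  # all have 5 zeroes, 7 ones
```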

Unknown DMCs: Universal Codes

The channel $p_{Y|X}$ is revealed neither to the encoder nor to the decoder: neither the encoding rule nor the decoding rule may depend on $p_{Y|X}$.
  $C = \max_{p_X} \min_{p_{Y|X}} I(X; Y)$

Universal codes: same error exponent as in the known-$p_{Y|X}$ case (existence?)
  Encoder: select $T_x$, use constant-composition codes.
  Decoder: uses the Maximum Mutual Information (MMI) rule
    $\hat{m} = \arg\max_{m \in \mathcal{M}} I(x_m; y) = \arg\min_{m \in \mathcal{M}} H(y|x_m)$

Note: the GLRT decoder is in general not universal (GLRT: first estimate $p_{Y|X}$, then plug into the ML decoding rule).
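
A sketch of the MMI rule itself (helper names are mine): compute the empirical mutual information between $y$ and each codeword from their joint type, and pick the maximizer; no knowledge of $p_{Y|X}$ is needed.

```python
import math
from collections import Counter

def empirical_mi(x, y):
    """I(x; y) computed from the joint type of (x, y), in nats."""
    n = len(x)
    p_xy, p_x, p_y = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum((c / n) * math.log((c / n) / ((p_x[a] / n) * (p_y[b] / n)))
               for (a, b), c in p_xy.items())

def mmi_decode(codebook, y):
    """Return the index of the codeword with maximal empirical mutual information."""
    return max(range(len(codebook)), key=lambda m: empirical_mi(codebook[m], y))

# Toy usage: y is codeword 1 with two bits flipped.
codebook = ["000011111111", "110100101110", "101010101010"]
y = "110100101011"
print(mmi_decode(codebook, y))   # 1
```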

Key Idea in the Proof

Denote by $D_m \subseteq \mathcal{Y}^n$ the decoding region for message $m$.

There is a polynomial number of conditional type classes $T_{y|x_m}$, forming a partition of $\mathcal{Y}^n$. Given that $m$ was transmitted, partition the error event $\{y \in \mathcal{Y}^n \setminus D_m\}$ into a union over conditional type classes:
  $\{y \in \mathcal{Y}^n \setminus D_m\} = \bigcup_{T_{y|x_m}} \left( T_{y|x_m} \setminus D_m \right)$

The probability of the error event is therefore given by
  $\Pr[\text{error} \mid m] = \Pr\Big[ \bigcup_{T_{y|x_m}} \big( T_{y|x_m} \setminus D_m \big) \Big] \le \sum_{T_{y|x_m}} \Pr\big[ T_{y|x_m} \setminus D_m \big] \doteq \max_{T_{y|x_m}} \Pr\big[ T_{y|x_m} \setminus D_m \big]$
  $= \max_{T_{y|x_m}} \Pr\big[ T_{y|x_m} \big] \, \frac{|T_{y|x_m} \setminus D_m|}{|T_{y|x_m}|} \doteq \max_{T_{y|x_m}} e^{-n D(\hat{p}_{x_m y} \| p_{Y|X} \hat{p}_{x_m})} \, \frac{|T_{y|x_m} \setminus D_m|}{|T_{y|x_m}|}$
(the equality uses the fact that all sequences in a conditional type class are equiprobable under the DMC).

The worst conditional type class dominates the error probability. The calculation mostly involves combinatorics: finding out $|T_{y|x_m} \setminus D_m|$.

Extensions

  Channels with memory
  Arbitrarily Varying Channels (randomized codes)
  Continuous alphabets (difficult!)

Part II: Applications to Watermarking

Reference Materials

[SM 03] A. Somekh-Baruch and N. Merhav, "On the Error Exponent and Capacity Games of Private Watermarking Systems," IEEE Trans. Information Theory, March 2003
[SM 04] A. Somekh-Baruch and N. Merhav, "On the Capacity Game of Public Watermarking Systems," IEEE Trans. Information Theory, March 2004
[MO 03] P. Moulin and J. O'Sullivan, "Information-Theoretic Analysis of Information Hiding," IEEE Trans. Information Theory, March 2003
[MW 04] P. Moulin and Y. Wang, "Error Exponents for Channel Coding with Side Information," preprint, Sep. 2004

Communication Model for Data Hiding

[Block diagram: message $M$ and host $s$ enter the encoder $f(s, m, k)$; the encoder output $x$ passes through the attack channel $p(y|x)$; the decoder $g(y, k)$ produces $\hat{M}$ from $y$; the secret key $k$ is shared by encoder and decoder.]

  Memoryless host sequence $s$
  Message $M$ uniformly distributed over $\{1, 2, \ldots, 2^{nR}\}$
  Unknown attack channel $p(y|x)$
  Randomization via a secret key sequence $k$, arbitrary alphabet $\mathcal{K}$

Attack Channel Model

The first information-theoretic formulations of this problem assumed a fixed attack channel (e.g., AWGN) or a family of memoryless channels (1998-1999). The memoryless assumption was later relaxed (2001).

We'll just require the following distortion constraint:
  $d^n(x, y) \triangleq \frac{1}{n} \sum_{i=1}^n d(x_i, y_i) \le D_2 \quad \forall x, y$ (w.p. 1)
i.e., an unknown channel with arbitrary memory.

Similarly, the following embedding constraint will be assumed:
  $d^n(s, x) \le D_1 \quad \forall s, k, m, x$ (w.p. 1)

Data-Hiding Capacity [SM 04]

Single-letter formula:
  $C(D_1, D_2) = \sup_{p(x,u|s) \in \mathcal{Q}(D_1)} \; \min_{p(y|x) \in \mathcal{A}(D_2)} \; [I(U; Y) - I(U; S)]$
where $U$ is an auxiliary random variable,
  $\mathcal{Q}(D_1) = \{ p_{XU|S} : \sum_{x,u,s} p(x, u|s) p(s) d(s, x) \le D_1 \}$
  $\mathcal{A}(D_2) = \{ p_{Y|X} : \sum_{x,y} p(y|x) p(x) d(x, y) \le D_2 \}$

This is the same capacity formula as in [MO 03], where $p(y|x)$ was constrained to belong to the family $\mathcal{A}^n(D_2)$ of memoryless channels. Why?

Achievability: Sketch of the Proof

Random binning construction: a randomly-permuted, constant-composition code.

  Given $n$, solve the minmax problem over types $\hat{p}(x, u|s)$ and $\hat{p}(y|x)$; solution $(\hat{p}_{xu|s}, \hat{p}_{y|x})$.
  All codewords $u(l, m)$ are drawn uniformly from the type class $T_u$.
  Given $m, s$, select a codeword $u(l, m)$ such that $(u(l, m), s) \in T_{us}$.
  Then generate $x$ uniformly from the conditional type class $T_{x|u(l,m), s}$.
  Maximum mutual information decoder: $\hat{m} = \arg\max_{l,m} I(u(l, m); y)$

Achievability: What Is the Worst Attack?

The worst $p(y|x)$ is uniform over a single conditional type.

Example: $x = 110100101110$, $y = 111100101110$. Then all sequences $y$ that differ from $x$ by exactly one bit are equally likely.

For a memoryless attack, $p(y|x)$ is uniform over multiple conditional types.

Memory does not help the attacker! Capacity is the same as in the memoryless case.

Converse

For any code with rate $R > C(D_1, D_2)$, there exists an attack $p(y|x)$ such that reliable decoding is impossible.

The claim is proven by restricting the search to attack channels that are uniform over a single conditional type. The proof is similar to the memoryless case, using Fano's inequality and Marton's telescoping technique.

Error Exponents [MW 04]

Obtain $E_r(R) \le E(R) \le E_{sp}(R)$ for all $R < C$, where
  $E(R) \triangleq \limsup_{n \to \infty} \left[ -\frac{1}{n} \log P_{e,n} \right]$
and
  $P_{e,n} = \min_{f_n, g_n} \max_{p_{Y|X}} P_e(f_n, g_n, p_{Y|X})$
is the minmax probability of error.

The random-coding exponent $E_r(R)$ is obtained using a modification of the binning method above (still randomly-permuted, constant-composition codes).

The sphere-packing exponent $E_{sp}(R)$ is obtained by restricting the search to attack channels that are uniform over a single conditional type.

Communication Model for Steganography

[Block diagram: a source $p_S$ produces the host $S$; the message $M$, uniform over $\{1, \ldots, 2^{nR}\}$, and $S$ enter the encoder $f_n$ (distortion $D_1$), producing $X \sim p_X$; a steganalyzer tests whether $p_S = p_X$; the attack channel $A(y|x)$ (distortion $D_2$) outputs $Y$, and the decoder $\phi_n$ produces $\hat{M}$; encoder and decoder share a secret key $K$.]

We would like $p_S = p_X$ for perfect security. This is hard: it requires matching n-dimensional pmf's. It can be done using the randomization techniques described earlier (make $x$ uniform over type classes).

The capacity formula is still of the form
  $C(D_1, D_2) = \sup_{p(x,u|s)} \min_{p(y|x)} [I(U; Y) - I(U; S)]$

Conclusion

  Method of types is based on combinatorics
  Polynomial number of types
  Useful to determine capacity and error exponents
  Randomized codes, universal decoders: natural concepts in the presence of an adversary