Coding into a source: an inverse rate-distortion theorem

Coding into a source: an inverse rate-distortion theorem
Anant Sahai, joint work with Mukul Agarwal and Sanjoy K. Mitter
Wireless Foundations, Department of Electrical Engineering and Computer Sciences, University of California at Berkeley
LIDS, Department of Electrical Engineering and Computer Sciences, Massachusetts Institute of Technology
2006 Allerton Conference on Communication, Control, and Computing

Suppose the aliens landed...
Your mission: reverse-engineer their communications technology.
What assumptions should you make?
They are already here!

Does information theory determine architecture?
Channel coding is an optimal interface. Is it the unique interface?

Outline
1 Motivation and introduction
2 The equivalence perspective
   Communication problems
   Separation as equivalence
3 Main result: a direct converse for rate-distortion
   Basic: finite-alphabet sources
   Extension: real alphabets with difference distortion
   Conditional rate-distortion: steganography with a public cover-story
4 Application: understanding information within unstable processes
5 Conclusions and open problems

Abstract model of communication problems
[Block diagram: a specified source S_t, a designed encoder E_t with access to common randomness R and to feedback/memory as allowed by information pattern I, a noisy channel f_c followed by a 1-step delay, and a designed decoder D_t; signals X_t, Y_t, Z_t, U_t, V_t, W_t flow between the blocks.]
Problem: source S, information pattern I, and objective V.
Constrained resource: noisy channel f_c.
Designed solution: encoder E, decoder D.

Focus: which channels are good enough
Channel f_c solves the problem if there exist E, D such that the system satisfies V.
Problem A is harder than problem B if any f_c that solves A also solves B.
Information theory is an asymptotic theory:
   Pick a family of objectives V with an appropriate slack parameter.
   Consider the set of channels that solve the problem.
   Take the union over slack-parameter choices.

Shannon bit-transfer problems A_{R,ε,d}
Source: noninteractive X_i (R bits each): fair coin tosses.
Information pattern: D_i has access to Z_1^i; E_i has access to X_1^i.
Performance objective: V(ε, d) is satisfied if P(X_i ≠ U_{i+d}) ≤ ε for every i ≥ 0.
Slack parameter: permitted delay d.
Natural orderings: larger ε, d is easier, but larger R is harder.
Classical capacity:
   C_R = ∩_{ε>0} ∩_{R'<R} ∪_{d>0} { f_c : f_c solves A_{R',ε,d} }
   C_Shannon(f_c) = sup{ R > 0 : f_c ∈ C_R }

Estimation to a distortion constraint: A_{(F_X,ρ,D,ε,d)}
Source: noninteractive X_i drawn iid from F_X. Same information pattern.
Performance objective: V(ρ, D, ε, d) is satisfied if for all τ,
   lim_{n→∞} P( (1/n) Σ_{i=τ}^{τ+n−1} ρ(X_i, U_{i+d}) > D ) ≤ ε.
Slack parameter: permitted delay d.
Natural orderings: larger D, d, ε is easier.
Channels that are good enough:
   C_{e,(F_X,ρ,D)} = ∩_{ε>0} ∩_{D'>D} ∪_{d>0} { f_c : f_c solves A_{(F_X,ρ,D',ε,d)} }
Separation Theorem (if ρ is finite): (C_{R(D)} ∩ C_m) = (C_{e,(F_X,ρ,D)} ∩ C_m).
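For concreteness (an illustrative example, not on the slide): if the X_i are fair coin flips and ρ is Hamming distortion, then R(D) = 1 − h_2(D) for 0 ≤ D ≤ 1/2, where h_2 is the binary entropy function. The separation statement then reads: within the class C_m, the channels that solve the estimation problem to distortion D are exactly the channels that can carry bits reliably at every rate below 1 − h_2(D).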

Classical separation revisited
[Diagram relating source codes and Shannon bit transfer.]
Equivalence proved using mutual-information characterizations of R(D) and C_Shannon.
Bidirectional reductions at the problem level.

Main Result
Suppose we have a family of black-box systems, indexed by ε, that can communicate streams from input alphabet X satisfying, for all τ,
   lim_{n→∞} P( (1/n) Σ_{i=τ}^{τ+n−1} ρ(X_i, X̂_i) > D ) ≤ ε.
The distortion measure ρ is non-negative and additive.
The probability measure P is for {X_i} iid according to P(x).
Then, assuming access to common randomness at both the encoder and decoder, it is possible to reliably communicate bits at any rate R < R(D) bits per source symbol over the black-box system, with probability of bit error as small as desired.
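The threshold in the theorem is the ordinary rate-distortion function of the iid source. As a side note (not part of the talk), one way to compute that threshold numerically for a finite alphabet is the standard Blahut–Arimoto iteration; a minimal sketch follows, with an illustrative Bernoulli(1/2)/Hamming example whose closed form is R(D) = 1 − h_2(D).

```python
import numpy as np

def blahut_arimoto_rd(p, rho, beta, n_iter=500):
    """One point on the R(D) curve of an iid source (rate in bits per symbol).

    p    : source distribution over the finite alphabet, length |X|
    rho  : |X| x |Xhat| matrix of distortions rho(x, xhat)
    beta : slope parameter; sweeping beta traces out the R(D) curve
    """
    q = np.full(rho.shape[1], 1.0 / rho.shape[1])   # output marginal q(xhat)
    for _ in range(n_iter):
        w = q * np.exp(-beta * rho)                 # unnormalized q(xhat | x)
        w /= w.sum(axis=1, keepdims=True)
        q = p @ w                                   # re-estimate the output marginal
    D = float(np.sum(p[:, None] * w * rho))
    R = float(np.sum(p[:, None] * w * np.log2(w / q[None, :])))
    return D, R

# Illustrative check: Bernoulli(1/2) source with Hamming distortion,
# where the closed form is R(D) = 1 - h2(D).
p = np.array([0.5, 0.5])
rho = np.array([[0.0, 1.0], [1.0, 0.0]])
for beta in (1.0, 2.0, 4.0):
    D, R = blahut_arimoto_rd(p, rho, beta)
    print(f"beta={beta}: D={D:.3f}, R={R:.3f} bits/symbol")
```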

Relationships to existing perspectives
Coding theory:
   Faith in distance rather than a probabilistic model for the channel.
   Generalizes Hamming distance to arbitrary additive distortion measures.
Arbitrarily varying channels:
   The attacker is not limited to a set of attack channels.
   The attacker is not forced to be causal.
   The attacker is only constrained on long blocks, not individual letters.
   The attacker only promises low distortion for inputs that look iid P(x).
Steganography/watermarking:
   No cover-text; in the conditional case, a cover-story instead.
   Otherwise, Merhav and Somekh-Baruch is the work closest to this one.

Finite alphabet case
Random encoder:
   Randomly draw 2^{nR} length-n codewords iid according to P_X using common randomness.
   Transmit the one corresponding to the message.
Nearest typical neighbor decoder:
   Define the typical codeword set C_R to be the codewords with type P_X ± ε for small ε.
   Let y_1^n be the received string.
   Decode to the X̂_1^n ∈ C_R closest to y_1^n in a ρ-distortion sense.
Error events:
   Atypical true codeword X_1^n: probability tends to 0.
   Excess average distortion (> D) on the true codeword: probability tends to 0 by assumption.
   Some false codeword has average distortion ≤ D.
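As an aside (not from the talk), here is a toy simulation of this random-codebook / nearest-typical-neighbor scheme. The binary alphabet, Bernoulli(1/2) P_X, Hamming ρ, the tiny rate, and the BSC(0.05) standing in for the distortion-constrained black box are all illustrative assumptions; the construction itself only needs the black box to keep the average distortion below D.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumptions (not from the talk): binary alphabet, Hamming
# distortion, P_X = Bernoulli(1/2), and a BSC(0.05) standing in for the
# distortion-constrained black box.  Rate kept tiny so the codebook is small.
n, R = 200, 0.06                     # block length, rate in bits per symbol
p_X = np.array([0.5, 0.5])
flip = 0.05                          # black box flips each symbol w.p. 0.05
eps_type = 0.02                      # typicality slack on the empirical type
M = 2 ** int(n * R)                  # number of messages / codewords

# Random encoder: common randomness = this shared codebook of iid P_X codewords.
codebook = rng.choice(2, size=(M, n), p=p_X)

def decode(y):
    """Nearest typical neighbor: minimum Hamming distortion over typical codewords."""
    typical = np.abs(codebook.mean(axis=1) - p_X[1]) <= eps_type   # type close to P_X
    dists = (codebook != y).mean(axis=1)                           # per-symbol distortion
    dists[~typical] = np.inf
    return int(np.argmin(dists))

errors, trials = 0, 200
for _ in range(trials):
    m = rng.integers(M)
    y = codebook[m] ^ (rng.random(n) < flip)     # pass the codeword through the black box
    errors += decode(y) != m
print(f"empirical block error rate: {errors / trials:.3f}")
```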

Probability of a false codeword being close
Suppose y_1^n has type q_Y. Consider a false z_1^n ∈ C_R, and let q_XY be the resulting joint type.
   q_X = p_X ± ε
   E_{q_XY}[ρ(X, Y)] ≤ D if there is an error.
[Diagram: sort y_1^n by symbol value and correspondingly shuffle the codeword z_1^n; the nq_Y(j) positions with y = j "blow up" into sub-groups of sizes nq_Y(j) q_{X|Y}(i|j).]
P(Z_1^n collides) ≤ ∏_j 2^{−n q_Y(j) D(q_{X|Y=j} ‖ p_X)} = 2^{−n D(q_XY ‖ p_X q_Y)}

Bounding the probability of collision
Union bound over 2^{nR} codewords and (n + 1)^{|X||Y|} joint types q_XY.
Minimize the divergence D(q_XY ‖ p_X q_Y) over the set of joint types q_XY satisfying:
   E_{q_XY}[ρ(X, Y)] ≤ D
   q_X = p_X ± ε
Key step:
   D(q_XY ‖ p_X q_Y) = D(q_X ‖ p_X) + D(q_XY ‖ q_X q_Y)
   D(q_XY ‖ q_X q_Y) = I_q(X; Y)
Minimizing I_q(X; Y) gives R(D) as ε → 0.
If R < R(D), the collision probability → 0 as n → ∞.
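A quick numeric sanity check of the key step (an illustrative sketch, assuming binary alphabets and Hamming distortion, which are not specified on the slide): the chain-rule identity holds for any joint type, and the constrained minimization of I_q(X; Y) lands on R(D) = 1 − h_2(D).

```python
import numpy as np

def kl(p, q):
    """KL divergence in bits, with the convention 0*log(0) = 0."""
    p, q = np.asarray(p, float).ravel(), np.asarray(q, float).ravel()
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

# 1) The key identity, checked on an arbitrary joint type q_XY
#    (binary alphabets here purely for illustration).
p_X = np.array([0.5, 0.5])
q_XY = np.array([[0.40, 0.15],
                 [0.10, 0.35]])
q_X, q_Y = q_XY.sum(axis=1), q_XY.sum(axis=0)
lhs = kl(q_XY, np.outer(p_X, q_Y))
rhs = kl(q_X, p_X) + kl(q_XY, np.outer(q_X, q_Y))
print(f"D(q_XY||p_X q_Y) = {lhs:.6f}   D(q_X||p_X) + I_q(X;Y) = {rhs:.6f}")

# 2) Minimizing I_q(X;Y) over joints with q_X = p_X and E_q[rho] <= D
#    should give R(D); for Hamming distortion, R(D) = 1 - h2(D).
D = 0.11
best = np.inf
for a in np.linspace(0, 0.5, 201):        # q(0,0); rows then sum to 0.5 each
    for b in np.linspace(0, 0.5, 201):    # q(1,0)
        q = np.array([[a, 0.5 - a], [b, 0.5 - b]])
        if q[0, 1] + q[1, 0] <= D:        # E_q[Hamming] <= D
            qX, qY = q.sum(axis=1), q.sum(axis=0)
            best = min(best, kl(q, np.outer(qX, qY)))

def h2(x):
    return -x * np.log2(x) - (1 - x) * np.log2(1 - x)

print(f"min I_q = {best:.4f}   vs   R(D) = 1 - h2(D) = {1 - h2(D):.4f}")
```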

Extending to real vector alphabets
Compact support and difference distortion:
   Same codebook construction.
   Pick a fine enough quantization.
   Reduces to the finite-alphabet case at the decoder, with a small factor increase in distortion.
Unbounded support:
   New codebook construction: view F_X(x) = (1 − δ) F_X̃(x) + δ F_X̄(x), where X̃ has compact support.
   First flip n iid unfair coins with P(head) = δ; mark positions as dirty or clean.
   Draw codewords from X̃ in the clean positions and from X̄ in the dirty ones.
   Modified decoding rule: declare an error if there are fewer than (1 − 2δ)n clean positions; restrict the codebook to the clean positions for decoding purposes.
   This increases the distortion by a factor of (1 + 2δ)/(1 − 2δ).
   lim_{Δ→0} lim_{δ→0} R_{Δ,δ}( D (1 + 2δ)/(1 − 2δ) ) = R(D), where Δ is the quantization step.
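A sketch of the clean/dirty codeword-drawing step for the unbounded-support case (the standard Gaussian source, the truncation point T, and the particular δ below are assumptions made for illustration; only the coin-flipping and decoding-rule mechanics follow the slide):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative assumptions: the source X is standard Gaussian, truncated at
# T = 2.5, so F_X = (1 - delta) F_Xtilde + delta F_Xbar, with Xtilde = X | |X| <= T
# (compact support) and Xbar = X | |X| > T (the tail).
T, n = 2.5, 1000
delta = 0.0124                      # approx P(|X| > 2.5) for a standard Gaussian

def draw_conditional(size, tail):
    """Rejection-sample X | |X| > T (tail=True) or X | |X| <= T (tail=False)."""
    out = np.empty(0)
    while out.size < size:
        x = rng.standard_normal(4 * size + 64)
        out = np.concatenate([out, x[np.abs(x) > T] if tail else x[np.abs(x) <= T]])
    return out[:size]

def draw_codeword(n):
    dirty = rng.random(n) < delta                                    # iid unfair coins
    cw = np.empty(n)
    cw[~dirty] = draw_conditional(int((~dirty).sum()), tail=False)   # clean: Xtilde
    cw[dirty] = draw_conditional(int(dirty.sum()), tail=True)        # dirty: Xbar
    return cw, dirty

cw, dirty = draw_codeword(n)
# Modified decoding rule from the slide: give up if too few clean positions,
# otherwise decode using the clean positions only.
clean = int((~dirty).sum())
if clean < (1 - 2 * delta) * n:
    print("declare error: too few clean positions")
else:
    print(f"{clean} clean positions; decode on the clean positions only")
```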

Conditional rate-distortion
Assume a cover-story V_1^n, drawn according to P(V), is known to all parties: encoder, decoder, and attacker.
All R < R_{X|V}(D) are achievable.
   The encoder draws codewords conditionally using P(X|V).
   Codeword typicality is defined similarly (now including V_1^n).
   Nearest-conditionally-typical codeword decoding.
   Parallel proof.
[Diagram: as before, but sorting on (v, y) pairs; the nq_V(s) positions with v = s split into sub-groups of sizes nq_V(s) q_{Y|V}(j|s), which in turn split into sub-groups of sizes nq_V(s) q_{Y|V}(j|s) q_{X|VY}(i|s,j).]
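For reference (standard definition, not restated on the slide), the conditional rate-distortion function appearing here is R_{X|V}(D) = min I(X; X̂ | V), where the minimum is over test channels P(X̂ | X, V) satisfying E[ρ(X, X̂)] ≤ D; it plays the role of R(D) once the cover-story V is known at both ends.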

Unstable Markov processes: R(D)
X_{t+1} = A X_t + W_t, where A > 1.
[Plot: squared-error distortion versus rate (in bits), comparing the "sequential" rate-distortion curve (obeys causality), the rate-distortion curve (non-causal), and the stable counterpart (non-causal).]

Unstable Markov processes: two kinds of information
Accumulation: look at the checkpoints {X_{kn}}.
   Can embed R_1 < n log_2 A bits per checkpoint (i.e., log_2 A bits per source symbol).
   These bits are recovered with anytime reliability if the black box has finite error moments.
Dissipation: look at the in-between samples X_{(k−1)n+1}, ..., X_{kn−1} given the checkpoint X_{kn}.
   They can be transformed to look iid, so they fall under our results.
   Can embed R_2 < R(D) − log_2 A bits per symbol.
Two-tiered nature of the information flow, proved by direct reduction.
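A small simulation sketch of this two-part decomposition (the Gaussian W_t, the value A = 1.5, and the block length n = 8 are illustrative assumptions, not from the talk): the checkpoints X_{kn} gain roughly log_2 A bits of magnitude per time step, while the checkpoint-to-checkpoint residuals X_{(k+1)n} − A^n X_{kn}, used here as a simpler stand-in for the in-between blocks on the slide, are iid across blocks.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative unstable process X_{t+1} = A*X_t + W_t with A > 1
# (Gaussian W_t, A = 1.5, block length n = 8 are assumptions for this sketch).
A, n_blocks, n = 1.5, 50, 8
T = n_blocks * n
W = rng.standard_normal(T)

X = np.zeros(T + 1)
for t in range(T):
    X[t + 1] = A * X[t] + W[t]

# Accumulation: the checkpoints {X_{kn}} blow up like A^{kn}, i.e. they gain
# roughly n*log2(A) bits of magnitude per block -- the embeddable R_1 part.
checkpoints = X[::n]
growth = np.log2(abs(checkpoints[-1])) / T
print(f"empirical growth {growth:.3f} bits/symbol vs log2(A) = {np.log2(A):.3f}")

# Dissipation: conditioned on the previous checkpoint,
#   X_{(k+1)n} = A^n X_{kn} + sum_{j<n} A^j W_{(k+1)n-1-j},
# so the residuals below are iid across blocks (the part handled by the
# iid embedding result).
residuals = checkpoints[1:] - A**n * checkpoints[:-1]
theory_std = np.sqrt((A**(2 * n) - 1) / (A**2 - 1))
print(f"residual std {residuals.std():.2f} vs theory {theory_std:.2f}")
```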

Conclusions and open problems
[Diagram: embedding codes, source codes, and Shannon bit transfer.]
Traditional point-to-point source-channel separation is a consequence of a problem-level equivalence that can be proved using direct reductions in both directions.
The results already extend to stationary-ergodic processes that mix.
Open problems:
   Can the gap between R_seq(D) and R(D) be used to carry information?
   Apply the equivalence perspective to better understand multiterminal problems.
   Is there another reason to suspect that channel coding is fundamental to communication?