( 1 k "information" I(X;Y) given by Y about X)

Size: px
Start display at page:

Download "( 1 k "information" I(X;Y) given by Y about X)"

Transcription

SUMMARY OF SHANNON DISTORTION-RATE THEORY

Consider a stationary source X with f_k(x) as its kth-order pdf. Recall the following OPTA function definitions:

δ(k,R) = least dist'n of k-dim'l fixed-rate VQ's w. rate R
δ(k,n,R) = least dist'n of k-dim'l VQ's w. nth-order block lossless coding and rate R
δ(R) = inf_k δ(k,R) = inf_{k,n} δ(k,n,R) = least dist'n of VQ's with rate R, any dimension and fixed- or variable-rate coding

These functions describe the best possible performance of VQ's. High-resolution theory enabled us to find concrete formulas for them (the Zador-Gersho formulas) for the case that R is large. Shannon's distortion-rate theory enables one to find δ(R) for ANY value of R. However, it does not allow us to find δ(k,R) or δ(k,n,R), not even for some R's. The key result is the following.

Shannon's Distortion-Rate Theorem

For a stationary, ergodic source with finite variance,

δ(R) = D(R)    (OPTA function = Shannon's DRF)

where

D(R) = \lim_{k \to \infty} D(k,R) = Shannon's "distortion-rate function"

D(k,R) = \inf_{q \in Q_k(R)} \frac{1}{k} E\|X - Y\|^2

X = (X_1, ..., X_k) random variables from source
Y = (Y_1, ..., Y_k) random variables from test channel q with X as input
Q_k(R) = set of conditional probability densities, called "test channels"

Q_k(R) = \Big\{ q(y|x) : \frac{1}{k} \int\!\!\int f_k(x)\, q(y|x) \log_2 \frac{q(y|x)}{f(y)}\, dx\, dy \le R \Big\}

(the constraint is (1/k) times the "information" I(X;Y) given by Y about X)

E\|X - Y\|^2 is computed w.r.t. the joint density f(x,y) = f_k(x) q(y|x).
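To make the definition concrete, here is a minimal sketch (my construction, not from the notes): for a linear-Gaussian test channel Y = aX + N with N ~ N(0,b), both the mutual information and the MSE have closed forms, and any channel whose information is at most R certifies an upper bound on D(1,R). The values of σ², a, and b below are arbitrary illustrative choices; the optimal (a,b) appears later in Property 7.

```python
# Sketch: any single test channel q in Q_1(R) certifies D(1,R) <= E(X-Y)^2.
# For Y = a*X + N, N ~ N(0,b), X ~ N(0,sigma2), everything is closed form.
import math

def linear_gaussian_test_channel(sigma2, a, b):
    """Return (I(X;Y) in bits, E(X-Y)^2) for the channel Y = a*X + N."""
    var_y = a * a * sigma2 + b                   # variance of the output
    rho2 = (a * sigma2) ** 2 / (sigma2 * var_y)  # squared correlation of (X,Y)
    info = -0.5 * math.log2(1.0 - rho2)          # jointly Gaussian I(X;Y)
    mse = (1.0 - a) ** 2 * sigma2 + b            # E(X-Y)^2 = E((1-a)X - N)^2
    return info, mse

I, D = linear_gaussian_test_channel(sigma2=1.0, a=0.5, b=0.2)
print(f"I(X;Y) = {I:.3f} bits  =>  D(1,{I:.3f}) <= {D:.3f}")
```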

δ(R) is defined by a minimum over actual quantizers. D(R) is defined by a minimum over hypothetical conditional probability distributions. There is no straightforward connection between δ(R) and D(R).

This theorem is one of the deep and central results of information theory. Its proof can be found in information theory textbooks. As with most of information theory, the proof uses the asymptotic equipartition property, which in turn derives from the law of large numbers. We'll sketch some ideas of the proof later.

The theorem says two things:

Positive statement: For any R, there exist VQ's with rate R or less having MSE arbitrarily close to D(R). (The proof shows there exist fixed-rate codes.)

Negative statement: For any R, every VQ with rate R or less (fixed- or variable-rate) has MSE greater than or equal to D(R).

Unfortunately, this theorem does not indicate how large the dimension k needs to be in order to attain distortion close to D(R). Fortunately, Zador's theorem does enable us to learn how large the dimension needs to be, at least for large R, which is why we have focused in this course on Zador's rather than Shannon's theorem.

The test channels introduced in the definition of D(R) are not to be considered codes or any other part of an actual physical system.

Although the definition of D(R) is quite complex, there are cases, such as Gaussian sources, where it can be reduced to a closed-form or parametric expression. In other cases, the "Blahut algorithm" can at least be used to compute D(k,R), and if k is large, D(k,R) ≈ D(R) (a small numerical sketch of the algorithm appears below). Unfortunately, the Blahut algorithm becomes very complex for large k. So in practice it is extremely difficult to compute D(R), except in special cases such as IID or Gaussian sources. Because D(R) can be so difficult to compute, upper and lower bounds to it have been developed, which can serve as approximations.

Shannon's theorem is often stated in the following equivalent form:

γ(D) = R(D)

where γ(D) is the rate vs. distortion OPTA function, defined as the least rate of any lossy source code with distortion D or less (it is the inverse of δ(R)), and R(D) is the "Shannon rate-distortion function", which is the inverse of D(R). In fact, Shannon originally stated the theorem in this form, and the subject is usually called "rate-distortion theory".
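The following is a minimal sketch of the Blahut algorithm for a discretized unit-variance Gaussian source with squared-error distortion. The grid ranges, grid sizes, iteration count, and the multipliers lam are my illustrative choices, not part of the notes; each value of lam traces one (R,D) point, and for the Gaussian the points should approximately satisfy D ≈ 2^{-2R}.

```python
# Sketch of the Blahut algorithm: one (R, D) point per Lagrange multiplier lam,
# for a discretized scalar source with squared-error distortion.
import numpy as np

def blahut(p_x, x, y, lam, n_iter=500):
    d = (x[:, None] - y[None, :]) ** 2            # per-letter distortion d(x,y)
    q_y = np.full(len(y), 1.0 / len(y))           # initial output distribution
    for _ in range(n_iter):
        A = q_y[None, :] * np.exp(-lam * d)       # unnormalized test channel
        Q = A / A.sum(axis=1, keepdims=True)      # Q(y|x); rows sum to 1
        q_y = p_x @ Q                             # induced output marginal
    D = np.sum(p_x[:, None] * Q * d)
    with np.errstate(divide="ignore", invalid="ignore"):
        term = p_x[:, None] * Q * np.log2(Q / q_y[None, :])
    R = np.nansum(term)                           # drop 0*log(0) terms
    return R, D

x = np.linspace(-4, 4, 200)                       # discretized Gaussian source
p_x = np.exp(-x ** 2 / 2)
p_x /= p_x.sum()
y = x.copy()                                      # reproduction grid

for lam in [1.0, 2.0, 8.0]:
    R, D = blahut(p_x, x, y, lam)
    print(f"lam={lam:4.1f}: R={R:.3f} bits, D={D:.4f}, 2^(-2R)={2**(-2*R):.4f}")
```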

The theorem generalizes to other measures of distortion between vectors of the form

d_k(x,y) = \sum_{i=1}^{k} d(x_i, y_i)

where d(x,y) is some distortion measure between individual samples; d(x,y) is called a per-letter distortion measure.

THE COMPLEMENTARY NATURE OF SHANNON DISTORTION-RATE THEORY AND ZADOR'S HIGH-RESOLUTION THEORY

Consider fixed-rate coding.

Shannon theory: For large k and any R: δ(k,R) ≈ δ(R) = D(R)

High-resolution theory: For large R and any k: δ(k,R) ≈ Z(k,R)

For large k and large R, they agree: δ(R) = D(R) ≈ δ(k,R) ≈ Z(k,R)

Important note: δ(k,R) ≉ D(k,R). All we can say is

δ(k,R) > D(k,R), for all k, R
δ(k,R) ≈ Z(k,R), when k, R are large
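The strict inequality δ(k,R) > D(k,R) can be seen numerically even at k = 1. The sketch below (my construction; sample size, seed, and iteration count are arbitrary) trains a fixed-rate scalar quantizer with Lloyd's algorithm on unit-variance Gaussian samples and compares its MSE to D(1,R) = 2^{-2R}.

```python
# Sketch: empirical delta(1,R) for a Lloyd-trained scalar quantizer on a
# unit-variance Gaussian, compared with D(1,R) = 2^(-2R).
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)                   # training samples

for rate in [1, 2, 3]:
    n_levels = 2 ** rate
    # initialize levels at sample quantiles, then iterate Lloyd's two steps
    c = np.quantile(x, (np.arange(n_levels) + 0.5) / n_levels)
    for _ in range(100):
        idx = np.argmin((x[:, None] - c[None, :]) ** 2, axis=1)  # nearest level
        for j in range(n_levels):                                # centroid step
            cell = x[idx == j]
            if cell.size:
                c[j] = cell.mean()
    idx = np.argmin((x[:, None] - c[None, :]) ** 2, axis=1)
    mse = np.mean((x - c[idx]) ** 2)
    print(f"R={rate}: Lloyd MSE = {mse:.4f}   D(1,R) = {2.0 ** (-2 * rate):.4f}")
```

At R = 1 the trained quantizer's MSE comes out near 0.36, well above D(1,1) = 0.25, consistent with δ(1,R) > D(1,R).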

RELATIONSHIPS BETWEEN THE DISTORTION-RATE FUNCTION AND THE ZADOR FUNCTION

The following can be shown directly from the definitions (they also follow from what we know about the operational significance of D(k,R) and Z(k,R)):

D(k,R) \ge \frac{1}{2\pi e\, m_k^*} Z(k,R)

The ratio of the left and right sides goes to one as R → ∞.

D(R) \ge Z(R)

The ratio of the left and right sides goes to one as R → ∞. Sometimes they are equal for sufficiently large values of R.

The above inequalities are called Shannon lower bounds. They are restated and proved later in Property 12; a one-line numerical evaluation of the k = 1 constant appears just after Property 6 below.

PROPERTIES OF THE DISTORTION-RATE FUNCTION

These properties are derived by directly using and manipulating the definitions of D(k,R) and D(R).

1. D(0) = D(k,0) = σ².

2. D(R) > 0 and D(k,R) > 0 for all R ≥ 0.

(Figure: sketch of D(1,R), D(2,R), D(3,R), ... and D(R) versus R, each curve lying below the last.)

3. D(R) and D(k,R) decrease monotonically to zero as R increases.

4. D(R) and D(k,R) are convex (and consequently continuous) functions of R.

5. The D(k,R)'s are subadditive. That is, for any k, m, R,

D(k+m,R) \le \frac{k}{k+m} D(k,R) + \frac{m}{k+m} D(m,R)

from which it follows that

D(R) \le D(nk,R) \le D(k,R) \le D(1,R) for all k and n, and D(R) = \inf_k D(k,R).

Thus, the D(k,R)'s tend to decrease with k, but not necessarily monotonically.

6. D(R) = D(1,R) when the source is IID.
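To put a number on the k = 1 Shannon lower bound stated above: using the standard value m_1^* = 1/12 for the minimum one-dimensional inertial profile (this value is not stated in these notes, so treat it as an outside fact), the constant evaluates as follows.

```python
# Evaluate the k=1 constant in D(1,R) >= (1/(2 pi e m_1*)) Z(1,R), assuming
# the standard 1-D value m_1* = 1/12.
import math

factor = 1.0 / (2 * math.pi * math.e * (1.0 / 12.0))   # = 6/(pi e) ~ 0.7026
print(f"D(1,R) >= {factor:.4f} * Z(1,R); equivalently Z(1,R) <= "
      f"{1/factor:.4f} * D(1,R)  (~{10 * math.log10(1/factor):.2f} dB gap)")
```

This is the familiar 1.53 dB high-rate space-filling loss of scalar quantization; for an IID Gaussian source the two sides of the bound are equal.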

7. For an IID Gaussian source,

D(R) = D(1,R) = σ² 2^{-2R}.

Derivation: First recall that for an IID Gaussian source the first-order differential entropy is h_1 = \frac{1}{2} \log_2 2\pi e \sigma^2. Therefore, the Shannon lower bound gives

D(1,R) \ge \frac{1}{2\pi e\, m_1^*} Z(1,R) = \frac{1}{2\pi e\, m_1^*}\, m_1^*\, 2^{2h_1 - 2R} = \sigma^2 2^{-2R}.

The derivation is completed by showing D(1,R) ≤ σ² 2^{-2R}. This is accomplished by verifying that the test channel

q(y|x) = \frac{1}{\sqrt{2\pi b}} \exp\Big\{ -\frac{(y - ax)^2}{2b} \Big\},

with a = 1 - 2^{-2R} and b = 2^{-2R}(1 - 2^{-2R})\sigma^2, has I(X;Y) ≤ R and E(X-Y)² = σ² 2^{-2R}. It then follows from the definition of D(1,R) that D(1,R) ≤ E(X-Y)² = σ² 2^{-2R}. (A Monte Carlo check of this channel is sketched after Property 9 below.)

8. For a first-order AR Gaussian source with correlation coefficient ρ,

D(R) = Z(R) = \sigma^2 (1 - \rho^2)\, 2^{-2R} for R \ge R_o = \frac{1}{2} \log_2 (1 + |\rho|)^2.

There is no closed-form expression for other R's. This property follows from the next.

9. For a stationary Gaussian source with power spectral density S(ω),

D(R) = Z(R) = Q\, 2^{-2R}, for R \ge R_o = \frac{1}{2} \log_2 \frac{Q}{S_{\min}}

where S_min is the minimum value of S(ω), and Q is the mean-squared error of the best linear prediction of X_i based on all past values of X:

Q = \exp\Big\{ \frac{1}{2\pi} \int_{-\pi}^{\pi} \ln S(\omega)\, d\omega \Big\}

For R ≤ R_o, there is no closed-form expression for D(R). However, the following parametric expression applies for all values of R. For any θ, 0 ≤ θ ≤ S_max, where S_max is the maximum value of S(ω),

D_\theta = D(R_\theta), where

R_\theta = \frac{1}{2\pi} \int_{-\pi}^{\pi} \max\Big\{ 0,\ \frac{1}{2} \log_2 \frac{S(\omega)}{\theta} \Big\}\, d\omega

D_\theta = \frac{1}{2\pi} \int_{-\pi}^{\pi} \min\{\theta, S(\omega)\}\, d\omega

Interpretation: For a given θ, all frequencies ω for which S(ω) ≥ θ contribute \frac{1}{2}\log_2 \frac{S(\omega)}{\theta} to the rate and θ to the distortion. All frequencies ω for which S(ω) < θ are discarded (e.g. filtered out); they contribute 0 to the rate and S(ω) to the distortion.
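Here is the promised Monte Carlo check of the Property 7 test channel (my sketch; sample size and seed are arbitrary). Since (X,Y) are jointly Gaussian, I(X;Y) can be recovered from the sample correlation.

```python
# Sketch: check that the Property-7 channel Y = a*X + N, with a = 1 - 2^(-2R)
# and b = 2^(-2R)(1 - 2^(-2R)) sigma^2, gives I(X;Y) = R bits and
# E(X-Y)^2 = sigma^2 2^(-2R) for an IID Gaussian source.
import numpy as np

rng = np.random.default_rng(1)
sigma2, R, n = 1.0, 1.5, 1_000_000
d = 2.0 ** (-2 * R)
a, b = 1.0 - d, d * (1.0 - d) * sigma2

x = rng.normal(0.0, np.sqrt(sigma2), n)
y = a * x + rng.normal(0.0, np.sqrt(b), n)

rho2 = np.corrcoef(x, y)[0, 1] ** 2
I_hat = -0.5 * np.log2(1.0 - rho2)       # jointly Gaussian mutual information
mse = np.mean((x - y) ** 2)
print(f"I(X;Y) ~ {I_hat:.4f} bits (target {R}), "
      f"E(X-Y)^2 ~ {mse:.4f} (target {sigma2 * d:.4f})")
```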

Special cases:

θ = 0: R_θ = ∞, D_θ = 0.

θ = S_max: R_θ = 0, D_θ = σ².

θ ≤ S_min: D_θ = θ, and

R(D) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{1}{2} \log_2 \frac{S(\omega)}{D}\, d\omega = \frac{1}{2} \log_2 \frac{Q}{D}, \qquad D \le S_{\min},

equivalently, D(R) = Q\, 2^{-2R} for R \ge \frac{1}{2} \log_2 \frac{Q}{S_{\min}} = R_o.

(Figure: sketch of D(R) versus R, starting at (0, σ²) where θ = S_max, passing through the point R = R_o where θ = S_min, and thereafter following Q 2^{-2R} with θ → 0 as R → ∞.)

For the AR source, Property 8 follows from Property 9:

S(\omega) = \frac{\sigma^2 (1 - \rho^2)}{1 - 2\rho \cos\omega + \rho^2}, \qquad S_{\min} = \sigma^2\, \frac{1 - |\rho|}{1 + |\rho|}, \qquad S_{\max} = \sigma^2\, \frac{1 + |\rho|}{1 - |\rho|},

Q = \sigma^2 (1 - \rho^2), \qquad R_o = \frac{1}{2} \log_2 \frac{Q}{S_{\min}} = \frac{1}{2} \log_2 (1 + |\rho|)^2.

(Figure: sketch of S(ω) on [-π, π] with the extremes S_max and S_min marked.)

The parametric (R_θ, D_θ) curve is evaluated numerically for this AR source in a sketch after Property 11 below.

10. There are a few other sources for which D(R) can be computed analytically. For other sources, D(R) must be computed numerically. The most well-known algorithm is that of Blahut for computing D(k,R). Because D(R) is hard to compute, various upper and lower bounds have been found for D(k,R) and D(R). Two are given below.

11. An upper bound: For any source, D(R) and D(k,R) are bounded from above by the corresponding functions for a Gaussian source with the same autocorrelation function (equivalently, the same power spectral density). It follows that Gaussian sources are the hardest to compress among all sources with a given autocorrelation function.
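The sketch below (my construction; ρ, the grid resolution, and the θ values are arbitrary choices) integrates the Property 9 parametric formulas numerically for the AR(1) spectrum above and checks that for θ ≤ S_min the resulting (R,D) points satisfy D = Q 2^{-2R}.

```python
# Sketch: reverse water-filling for the AR(1) spectrum of Property 8, via
# direct numerical integration of the Property-9 parametric formulas.
import numpy as np

sigma2, rho = 1.0, 0.9
w = np.linspace(-np.pi, np.pi, 20_001)
dw = w[1] - w[0]
S = sigma2 * (1 - rho ** 2) / (1 - 2 * rho * np.cos(w) + rho ** 2)

Q = sigma2 * (1 - rho ** 2)              # one-step prediction error
R_o = 0.5 * np.log2((1 + abs(rho)) ** 2)

for theta in [S.min() / 2, S.min(), 0.5 * sigma2]:
    R_t = (dw / (2 * np.pi)) * np.sum(np.maximum(0.0, 0.5 * np.log2(S / theta)))
    D_t = (dw / (2 * np.pi)) * np.sum(np.minimum(theta, S))
    print(f"theta={theta:.4f}: R={R_t:.4f}, D={D_t:.4f}, "
          f"Q*2^(-2R)={Q * 2 ** (-2 * R_t):.4f}")
print(f"R_o = {R_o:.4f}")
```

The first two θ values lie at or below S_min, so D matches Q 2^{-2R}; the third lies in the water-filling region R < R_o, where D(R) exceeds Q 2^{-2R}.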

12. Shannon lower bounds: Let X be a stationary source. For any k and R,

D(k,R) \ge \frac{1}{2\pi e}\, 2^{2h_k - 2R} = \frac{1}{2\pi e\, m_k^*} Z(k,R)

D(R) \ge \frac{1}{2\pi e}\, 2^{2\bar{h} - 2R} = Z(R)

where

h_k = \frac{1}{k} h(X_1, ..., X_k) is the kth-order differential entropy of X,
\bar{h} = \lim_{k \to \infty} h_k is the differential entropy rate of X,
m_k^* is the minimum value of any valid inertial profile, and
Z(k,R) and Z(R) = \lim_{k \to \infty} Z(k,R) are Zador functions.

Note: The ratio of the left and right sides of each bound can be shown to go to one as R → ∞. Sometimes equality holds for large R.

Derivation: To derive the lower bound to D(k,R), consider any test channel q that is allowed in the definition of D(k,R), i.e. any q such that

\frac{1}{k} I(X;Y) = \frac{1}{k} \int\!\!\int f_k(x)\, q(y|x) \log_2 \frac{q(y|x)}{f(y)}\, dx\, dy \le R.

We will show:

\frac{1}{k} E\|X - Y\|^2 \ge \frac{1}{2\pi e}\, 2^{2h_k - 2R}.   (*)

Since this holds for any valid q, it holds for the q that minimizes E\|X - Y\|^2. Therefore, for the minimizing q,

D(k,R) = \frac{1}{k} E\|X - Y\|^2 \ge \frac{1}{2\pi e}\, 2^{2h_k - 2R} = \frac{1}{2\pi e\, m_k^*} Z(k,R),

where the last equality comes from the definition of Z(k,R). This derives the Shannon lower bound to D(k,R). The Shannon lower bound to D(R) follows by taking the limit as k grows to infinity.

It remains only to derive (*), which we do using the following lemma from information theory.

Fano's Lemma for MSE: If X and Y are k-dimensional random vectors, then

\frac{1}{k} E\|X - Y\|^2 \ge \frac{1}{2\pi e}\, 2^{\frac{2}{k} h(X|Y)}.

By manipulating the defining formula for I(X;Y), one may straightforwardly show

h(X|Y) = h(X) - I(X;Y) = k h_k - I(X;Y)   (by the definition of h_k)
\ge k h_k - kR   (by the choice of q).

Substituting this into the lower bound given in Fano's Lemma yields (*) and finishes the derivation.
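As an illustration of how Properties 11 and 12 bracket D(R) for a non-Gaussian source (my example, not from the notes): for an IID unit-variance Laplacian source the differential entropy is h = log₂(√2 e σ), so the Shannon lower bound evaluates to (e/π) σ² 2^{-2R}, while the Gaussian source of the same variance supplies the upper bound σ² 2^{-2R}.

```python
# Sketch: bracket D(R) for an IID unit-variance Laplacian source using the
# Shannon lower bound (Property 12) and the Gaussian upper bound (Property 11).
import math

sigma2 = 1.0
h = math.log2(math.sqrt(2.0) * math.e * math.sqrt(sigma2))  # Laplacian h, bits

for R in [0.5, 1.0, 2.0]:
    slb = (1.0 / (2 * math.pi * math.e)) * 2 ** (2 * h - 2 * R)  # = (e/pi) 2^(-2R)
    gauss_ub = sigma2 * 2 ** (-2 * R)
    print(f"R={R}: {slb:.4f} <= D(R) <= {gauss_ub:.4f}")
```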
