On MMSE Estimation from Quantized Observations in the Nonasymptotic Regime

Jaeho Lee, Maxim Raginsky, and Pierre Moulin

Abstract—This paper studies MMSE estimation on the basis of quantized noisy observations. It presents nonasymptotic bounds on the MMSE regret due to quantization for two settings: (1) estimation of a scalar random variable given a quantized vector of $n$ conditionally independent observations, and (2) estimation of a $p$-dimensional random vector given a quantized vector of $n$ observations (not necessarily independent) when the full MMSE estimator has a subgaussian concentration property.

Index Terms—MMSE estimation, vector quantization, indirect rate distortion problems, nonasymptotic bounds.

I. INTRODUCTION

Minimum mean-square error (MMSE) estimation is a fundamental primitive in communications, signal processing, and data analytics. In today's applications, the estimation task is often performed on high-dimensional data collected, and possibly preprocessed, at multiple remote locations. Consequently, attention must be paid to communication constraints and their impact on the MMSE.

One strategy for reducing the communication burden is to compress the observations using vector quantization (VQ). The idea of quantization for estimation and control can be traced back to the work of Curry [1], who considered in detail the jointly Gaussian case, derived a modification of the Kalman filter for use with quantized inputs, and developed approximation-based schemes for nonlinear systems.

Because VQ inevitably introduces loss, it is of interest to characterize the resulting MMSE regret, i.e., the difference between the optimal performance achievable with quantized observations and the optimal performance that can be attained without quantization. The problem of designing an optimal vector quantizer to minimize the MMSE regret is equivalent to a noisy source coding problem with the quadratic fidelity criterion, where the compressor acts on the vector of observations and the decompressor generates an estimate of the target random vector. This equivalence was systematically studied by Wolf and Ziv [2], who showed that there is no loss of optimality if we first compute the full MMSE estimate and then compress it using an optimal quantizer. Ephraim and Gray [3] later extended this result to a more general class of weighted quadratic distortion functions and gave conditions for convergence of the Lloyd-type iterative algorithm for quantizer design.

The authors are with the Department of Electrical and Computer Engineering and the Coordinated Science Laboratory, University of Illinois, Urbana, IL, USA. E-mail: {jlee60, maxim, pmoulin}@illinois.edu. Research supported in part by DARPA and in part by NSF under CAREER award no. CCF-5404.

The main message of the above works is that the problem of quantizer design for minimum MMSE regret is a functional compression problem: the compressed representation of $X$ should retain as much information as possible about the conditional mean $\eta(X) \triangleq \mathbb{E}[Y \mid X]$ of the target $Y$ given $X$, also known as the regression function. However, there is an interesting tension between the information-theoretic and the statistical aspects of the problem: if the MMSE without quantization is sufficiently small, then the dominant contribution to the quantized MMSE should come from the quantization error; in the opposite regime of high-rate quantization, the quantized MMSE should not differ much from its unquantized counterpart. In this paper, we study both the statistical and the functional-compression aspects of MMSE estimation with quantized observations.
In particular, we obtain sharp upper bounds on the MMSE regret in two scenarios:

1) MMSE estimation of a scalar random variable $Y$, where the vector $X = (X_1, \dots, X_n)$ of conditionally i.i.d. observations is passed through a $k$-ary vector quantizer. In this setting, under mild regularity conditions on the conditional distribution $P_{X|Y}$, we obtain nonasymptotic bounds on the MMSE regret. These bounds exhibit two distinct behaviors depending on whether the number of observations $n$ is larger than the square of the codebook size $k$. In the asymptotic regime of $n, k \to \infty$, we recover an existing result of Samarov and Has'minskii [6], who were the first to address this problem.

2) MMSE estimation of a $p$-dimensional random vector $Y$ when the $n$-dimensional vector of observations $X$ (not necessarily independent) is passed through a $k$-ary VQ. In this setting, we derive an upper bound on the MMSE regret under the assumption that the $\ell_2$ norm of the regression function exhibits subgaussian concentration around its expected value $\mathbb{E}\|\eta(X)\|$. Unlike some of the existing literature on high-resolution quantization for functional compression (e.g., [4]), we do not require smoothness of $\eta(X)$.

Notation: We will always denote by $\|\cdot\|$ the $\ell_2$ (Euclidean) norm. A $k$-ary quantizer on the Euclidean space $\mathbb{R}^d$ is a Borel-measurable mapping $q : \mathbb{R}^d \to [k]$, where $[k]$ is shorthand for the set $\{1, \dots, k\}$. Any such $q$ is characterized by its cells or bins, i.e., the Borel sets $C_j = q^{-1}(\{j\}) = \{v \in \mathbb{R}^d : q(v) = j\}$, $j \in [k]$. The set of all $k$-ary quantizers on $\mathbb{R}^d$ will be denoted by $\mathcal{Q}^d_k$. A $k$-ary reconstruction function on $\mathbb{R}^d$ is a mapping $f : [k] \to \mathbb{R}^d$. Any such $f$ is characterized by its reconstruction points $c_j = f(j) \in \mathbb{R}^d$, $j \in [k]$. The set of all $k$-ary reconstruction functions on $\mathbb{R}^d$ will be denoted by $\mathcal{R}^d_k$. Any finite set of points $C = \{c_1, \dots, c_k\} \subset \mathbb{R}^d$ defines a quantizer $q_C \in \mathcal{Q}^d_k$ and a reconstruction function $f_C \in \mathcal{R}^d_k$ by

$$q_C(v) = \arg\min_{j \in [k]} \|v - c_j\|, \quad v \in \mathbb{R}^d, \qquad \text{and} \qquad f_C(j) = c_j \ \text{ for all } j \in [k].$$

The composite mapping $\bar q_C \triangleq f_C \circ q_C : \mathbb{R}^d \to \mathbb{R}^d$ is called a $k$-point nearest-neighbor quantizer with codebook $C$. A well-known result is that, for any random vector $V$ with $\mathbb{E}\|V\|^2 < \infty$,

$$\inf_{f \in \mathcal{R}^d_k}\ \inf_{q \in \mathcal{Q}^d_k} \mathbb{E}\|V - f(q(V))\|^2 = \inf_{C \subset \mathbb{R}^d :\, |C| = k} \mathbb{E}\|V - \bar q_C(V)\|^2,$$

and the infimum is actually a minimum. We refer the reader to the survey article by Gray and Neuhoff [5] for more details. We use the following asymptotic order notation: for two sequences $\{a_m\}$ and $\{b_m\}$, we write $a_m \lesssim b_m$ if $a_m = O(b_m)$, and $a_m \asymp b_m$ if $a_m \lesssim b_m$ and $b_m \lesssim a_m$.
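To make the notation concrete, the following minimal numerical sketch (an illustration added here, not part of the original text) implements a $k$-point nearest-neighbor quantizer $\bar q_C = f_C \circ q_C$ and its empirical distortion $\mathbb{E}\|V - \bar q_C(V)\|^2$; the codebook and the distribution of $V$ are arbitrary choices.

```python
import numpy as np

def nearest_neighbor_quantizer(C):
    """Return q_C (cell index) and qbar_C = f_C o q_C (reconstruction) for a codebook C,
    given as a (k, d) array whose rows are the reconstruction points c_1, ..., c_k."""
    C = np.atleast_2d(np.asarray(C, dtype=float))

    def q(V):
        V = np.atleast_2d(V)                                   # (m, d) batch of inputs
        d2 = ((V[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        return d2.argmin(axis=1)                               # cell index in {0, ..., k-1}

    def qbar(V):
        return C[q(V)]                                         # nearest reconstruction point

    return q, qbar

# Empirical distortion E||V - qbar_C(V)||^2 for a standard Gaussian V in R^2
rng = np.random.default_rng(0)
V = rng.standard_normal((10000, 2))
C = rng.standard_normal((8, 2))                                # an arbitrary 8-point codebook
_, qbar = nearest_neighbor_quantizer(C)
print(np.mean(np.sum((V - qbar(V)) ** 2, axis=1)))
```

Replacing the arbitrary codebook by one fitted to samples of $V$ (for instance by Lloyd iterations) would approximate the minimum on the right-hand side of the display above.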

II. PERFORMANCE CRITERIA AND SOME BASIC RESULTS

Suppose two random vectors $X$ in $\mathbb{R}^n$ and $Y$ in $\mathbb{R}^p$ are jointly distributed according to a given probability law $P_{XY}$. The MMSE in estimating $Y$ as a function of $X$ is

$$\mathrm{mmse}(P_{XY}) \triangleq \inf_f \mathbb{E}\|Y - f(X)\|^2,$$

where the infimum over all Borel-measurable functions $f : \mathbb{R}^n \to \mathbb{R}^p$ is achieved by the regression function $\eta(x) = \mathbb{E}[Y \mid X = x]$. We consider the problem of MMSE estimation of $Y$ under the circumstances where only $q(X)$, a quantized version of $X$, is accessible. Thus, for each $k \in \mathbb{Z}_+$, we are interested in the MMSE functional

$$\mathrm{mmse}_k(P_{XY}) \triangleq \inf_{f \in \mathcal{R}^p_k}\ \inf_{q \in \mathcal{Q}^n_k} \mathbb{E}\|Y - f(q(X))\|^2.$$

This problem can be cast as one-shot fixed-rate lossy coding of $Y$ with squared-error distortion when the encoder only has access to $X$ (see, e.g., [2], [3]). We recall some known results that explicitly involve the regression function $\eta(X)$. The first one, due to Wolf and Ziv [2], is a useful decomposition of $\mathrm{mmse}_k(P_{XY})$:

Proposition 1: For every $k \in \mathbb{Z}_+$,

$$\mathrm{mmse}_k(P_{XY}) = \mathrm{mmse}(P_{XY}) + \mathrm{reg}_k(P_{XY}), \qquad (1)$$

where

$$\mathrm{reg}_k(P_{XY}) \triangleq \inf_{q \in \mathcal{Q}^n_k} \mathbb{E}\bigl\|\mathbb{E}[Y \mid X] - \mathbb{E}[Y \mid q(X)]\bigr\|^2$$

is the MMSE regret due to quantization.

Remark 1: A special case of this result for jointly Gaussian $X$ and $Y$ was obtained by Curry [1, Sec. 4].

The second result, which was proved by Ephraim and Gray [3] for a more general class of weighted quadratic distortion functions, shows that there is no loss of optimality if we restrict our attention to schemes of the following type: given $X$, we first compute the regression function $\eta(X)$, quantize it using a $k$-ary quantizer, and then estimate $Y$ by its conditional mean given the cell index of $X$.

Proposition 2:

$$\mathrm{reg}_k(P_{XY}) \overset{\text{(a)}}{=} \inf_{f \in \mathcal{R}^p_k}\ \inf_{q \in \mathcal{Q}^p_k} \mathbb{E}\bigl\|\eta(X) - f(q(\eta(X)))\bigr\|^2 \overset{\text{(b)}}{=} \inf_{C \subset \mathbb{R}^p :\, |C| = k} \mathbb{E}\bigl\|\eta(X) - \bar q_C(\eta(X))\bigr\|^2. \qquad (2)$$
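As an aside (our own illustration, not part of the original development), the scheme behind Proposition 2 can be mimicked numerically: fit a $k$-point codebook to samples of the regression function by Lloyd iterations, reproduce each cell by its conditional mean, and read off the empirical regret. The regression function $\eta(x) = \tanh(x)$ used below is an arbitrary stand-in.

```python
import numpy as np

def lloyd_regret(H, k, iters=50, seed=0):
    """Fit a k-point codebook to samples H of eta(X) by Lloyd iterations
    (nearest-neighbor partition + centroid update) and return the codebook
    together with the empirical regret E||eta(X) - qbar_C(eta(X))||^2."""
    rng = np.random.default_rng(seed)
    H = np.atleast_2d(H)
    C = H[rng.choice(len(H), size=k, replace=False)].copy()    # initialize from the data
    for _ in range(iters):
        d2 = ((H[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        idx = d2.argmin(axis=1)                                # nearest-neighbor cells
        for j in range(k):
            if np.any(idx == j):
                C[j] = H[idx == j].mean(axis=0)                # centroid = E[eta(X) | cell j]
    d2 = ((H[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return C, d2.min(axis=1).mean()

rng = np.random.default_rng(1)
X = rng.standard_normal((5000, 2))
H = np.tanh(X)                                                 # stand-in samples of eta(X)
for k in (2, 4, 8, 16):
    print(k, lloyd_regret(H, k)[1])                            # regret shrinks as k grows
```

Lloyd iterations only find a locally optimal codebook, so the printed values are upper bounds on the infimum in (2).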
III. THE CASE OF n CONDITIONALLY INDEPENDENT OBSERVATIONS

We now consider the situation when the coordinates $X_1, \dots, X_n$ of $X$ are independent and identically distributed (i.i.d.) conditionally on $Y$. To keep things simple, we consider the case when $Y$ is a scalar random variable (i.e., $p = 1$), and its marginal distribution $P_Y$ has a probability density function $f_Y$ supported on a compact interval $\mathcal{Y} = [-A, A]$. In the asymptotic regime as $k \to \infty$ and $n \to \infty$, this setting was investigated by Samarov and Has'minskii [6], who showed the following under mild regularity conditions:

• If $n/k^2 \to \infty$, then the dominant contribution to $\mathrm{mmse}_k(P_{XY})$ comes from the minimum expected distortion

$$\min_{C :\, |C| = k} \mathbb{E}\,|Y - \bar q_C(Y)|^2 \approx \frac{1}{12 k^2}\Bigl(\int_{\mathcal{Y}} f_Y(y)^{1/3}\, dy\Bigr)^{3}$$

incurred on $Y$ by a $k$-point quantizer [7], [8].

• If $n/k^2 \to 0$, then the dominant contribution to $\mathrm{mmse}_k(P_{XY})$ comes from

$$\mathrm{mmse}(P_{XY}) \approx \frac{1}{n} \int_{\mathcal{Y}} \frac{f_Y(y)}{I(y)}\, dy,$$

where $I(y)$ is the Fisher information in the conditional distribution $P_{X_1|Y=y}$ [9] (see below for definitions).

The proofs in [6] rely on the deep result, commonly referred to as the Bernstein–von Mises theorem, which says that the posterior distribution of the normalized error $\sqrt{n I(Y)}\,(\eta(X) - Y)$ is asymptotically standard normal (see, e.g., [10, Sec. 10]). In this section, we establish a finite-$n$, finite-$k$ version of the results of Samarov and Has'minskii using a recent nonasymptotic generalization of the Bernstein–von Mises theorem due to Spokoiny [11].

Our result requires a number of regularity assumptions. The first two are standard:

(C1) The conditional distributions $P_{X_1|Y=y}$, $y \in \mathcal{Y}$, are dominated by a common $\sigma$-finite measure $\mu$ on the real line. The corresponding log-density

$$\ell(u, y) \triangleq \log \frac{dP_{X_1|Y=y}}{d\mu}(u)$$

is twice differentiable in $y$ for every $u$. We will denote the first and second derivatives with respect to $y$ by $\ell'(u, y)$ and $\ell''(u, y)$.

(C2) For each $y \in \mathcal{Y}$, the Fisher information

$$I(y) \triangleq \mathbb{E}\bigl[\ell'(X_1, y)^2 \,\big|\, Y = y\bigr]$$

exists and is positive.

The remaining assumptions (see [13, Sec. 5]) are slightly stronger than the classical ones. For each $r > 0$, define the local neighborhoods

$$N_r(y) \triangleq \Bigl\{ y' \in \mathcal{Y} : \sqrt{I(y)}\,|y - y'| \le r \Bigr\}.$$

The regularity assumptions can be split into two groups: identifiability conditions and exponential moment conditions. We start with the former:

(I1) There are positive constants $r_0 > 0$ and $\delta > 0$, such that, for every $r \le r_0$,

$$\sup_{y \in \mathcal{Y}}\ \sup_{y' \in N_r(y)} \left| \frac{2\,D(P_{X_1|Y=y} \,\|\, P_{X_1|Y=y'})}{I(y)(y - y')^2} - 1 \right| \le \delta\, r.$$

(I2) There is a constant $b > 0$, such that, for every $r > 0$,

$$\sup_{y \in \mathcal{Y}}\ \sup_{y' \in N_r(y)} \frac{D(P_{X_1|Y=y} \,\|\, P_{X_1|Y=y'})}{I(y)(y - y')^2} \le b.$$

The exponential moment assumptions pertain to the random variables

$$\zeta_y(X_1, y') \triangleq \ell(X_1, y') - \mathbb{E}\bigl[\ell(X_1, y') \,\big|\, Y = y\bigr], \qquad y, y' \in \mathcal{Y}$$

(note that $\zeta_y(X_1, y) = \ell(X_1, y) - \mathbb{E}[\ell(X_1, y) \mid Y = y]$). Let

$$\xi_y(X_1, y') \triangleq \frac{\partial}{\partial w}\,\zeta_y(X_1, w)\Big|_{w = y'}, \qquad y, y' \in \mathcal{Y}.$$

(E1) There exist constants $g > 0$ and $\nu_0 > 0$, such that

$$\sup_{y \in \mathcal{Y}} \log \mathbb{E}\!\left[\exp\!\left(\frac{\lambda\, \xi_y(X_1, y)}{\sqrt{I(y)}}\right) \,\Big|\, Y = y\right] \le \frac{\nu_0^2 \lambda^2}{2} \qquad \text{for all } |\lambda| \le g.$$

(E2) There exists a constant $\omega > 0$, such that, for every $r \le r_0$,

$$\sup_{y \in \mathcal{Y}}\ \sup_{y' \in N_r(y)} \log \mathbb{E}\!\left[\exp\!\left(\frac{\lambda\,[\xi_y(X_1, y') - \xi_y(X_1, y)]}{\omega\, r\, \sqrt{I(y)}}\right) \,\Big|\, Y = y\right] \le \frac{\nu_0^2 \lambda^2}{2} \qquad \text{for all } |\lambda| \le g.$$

(E3) For every $r > 0$ there exists $g_1(r) > 0$, such that

$$\sup_{y \in \mathcal{Y}}\ \sup_{y' \in N_r(y)} \log \mathbb{E}\!\left[\exp\!\left(\frac{\lambda\, \xi_y(X_1, y')}{\sqrt{I(y)}}\right) \,\Big|\, Y = y\right] \le \frac{\nu_0^2 \lambda^2}{2} \qquad \text{for all } |\lambda| \le g_1(r).$$

For example, in the case of additive Gaussian noise, i.e., when $P_{X_1|Y=y} = N(y, \sigma^2)$ for all $y \in \mathcal{Y}$, it is easy to verify that all of these assumptions are met: in that case $\ell'(u, y) = (u - y)/\sigma^2$, $I(y) = 1/\sigma^2$, and $D(N(y, \sigma^2) \,\|\, N(y', \sigma^2)) = (y - y')^2/2\sigma^2$, so the Kullback–Leibler divergence is exactly quadratic and the normalized scores are Gaussian.

We are now ready to state and prove the main result of this section. Since both $Y$ and $\eta(X)$ are supported on the bounded interval $[-A, A]$, there is no loss of generality in restricting our attention only to nearest-neighbor quantizers $q_C$ with codebooks $C = \{y_1, \dots, y_k\} \subset [-A, A]$. Moreover, we can assume that $C$ is ordered in such a way that

$$-A \equiv y_0 \le y_1 < y_2 < \dots < y_k \le y_{k+1} \equiv A.$$

Given such an ordered $C$, we define $|C|_\infty \triangleq \max_{0 \le j \le k} (y_{j+1} - y_j)$.

Theorem 1: Suppose that Assumptions (C1)–(C2), (I1)–(I2), and (E1)–(E3) hold. Suppose also that $\log f_Y$ is Lipschitz on $[-A, A]$. Then there exists a constant $L > 0$ that depends only on the constants in the above assumptions, such that, for any $k$-point nearest-neighbor quantizer $q_C$ with $C \subset \mathcal{Y}$, we have

$$\Bigl| \mathbb{E}\,|\eta(X) - \bar q_C(\eta(X))|^2 - \mathbb{E}\,|Y - \bar q_C(Y)|^2 \Bigr| \le L\, |C|_\infty \min\left\{ |C|_\infty,\ \mathbb{E}\!\left[\frac{1}{\sqrt{n I(Y)}}\right] + \sqrt{\frac{\mathrm{mmse}(P_{XY})}{n}} \right\}. \qquad (3)$$

In the additive Gaussian noise case, the bound (3) becomes

$$\Bigl| \mathbb{E}\,|\eta(X) - \bar q_C(\eta(X))|^2 - \mathbb{E}\,|Y - \bar q_C(Y)|^2 \Bigr| \le L\, |C|_\infty \min\left\{ \frac{\sigma}{\sqrt{n}},\ |C|_\infty \right\},$$

where $\sigma^2$ is the noise variance.
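Before turning to the proof, here is a small Monte Carlo sketch (our own, not from the original text) of the two distortions compared in Theorem 1, in the additive Gaussian model with a uniform prior on $[-A, A]$; the uniform codebook, the parameter values, and the quantity printed next to the gap (the expression $|C|_\infty \min\{|C|_\infty, \sigma/\sqrt{n}\}$ that controls the Gaussian bound up to the constant $L$) are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
A, sigma, n, k, trials = 1.0, 0.5, 100, 8, 10000

C = np.linspace(-A, A, k)                            # an arbitrary k-point codebook on [-A, A]
qbar = lambda v: C[np.abs(v[:, None] - C[None, :]).argmin(axis=1)]

Y = rng.uniform(-A, A, trials)                       # prior of Y: uniform on [-A, A]
X = Y[:, None] + sigma * rng.standard_normal((trials, n))

# With this prior the posterior of Y given X is N(xbar, sigma^2/n) truncated to [-A, A];
# compute its mean eta(X) = E[Y | X] numerically on a grid.
grid = np.linspace(-A, A, 801)
logw = -0.5 * n * (grid[None, :] - X.mean(axis=1)[:, None]) ** 2 / sigma ** 2
w = np.exp(logw - logw.max(axis=1, keepdims=True))
eta = (w * grid).sum(axis=1) / w.sum(axis=1)

d_eta = np.mean((eta - qbar(eta)) ** 2)              # E|eta(X) - qbar_C(eta(X))|^2
d_Y = np.mean((Y - qbar(Y)) ** 2)                    # E|Y - qbar_C(Y)|^2
c_inf = 2 * A / (k - 1)                              # |C|_inf for this uniform codebook
print(abs(d_eta - d_Y), c_inf * min(c_inf, sigma / np.sqrt(n)))
```

Increasing $n$ or refining the codebook shrinks the gap, in line with the two regimes of (3).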

Proof: For any collection $C = \{y_1, \dots, y_k\}$ of $k$ reconstruction points, define the function $e_C : \mathcal{Y} \to \mathbb{R}$ by $e_C(y) \triangleq \min_{j \in [k]} (y - y_j)^2$. Then a simple calculation shows that

$$|e_C(y) - e_C(y')| \le \min\bigl\{ |C|_\infty^2,\ 2\,|C|_\infty\, |y - y'| \bigr\} \qquad (4)$$

for all $y, y' \in \mathcal{Y}$. The expected reconstruction error of the nearest-neighbor quantizer $\bar q_C$ can be written as

$$\mathbb{E}\,|\eta(X) - \bar q_C(\eta(X))|^2 = \mathbb{E}\bigl[e_C(\eta(X))\bigr].$$

Using the law of iterated expectation and the smoothness estimate (4), we obtain

$$\mathbb{E}\bigl[e_C(\eta(X))\bigr] - \mathbb{E}\bigl[e_C(Y)\bigr] \le \mathbb{E}\,\Bigl|\mathbb{E}\bigl[e_C(\eta(X)) - e_C(Y) \,\big|\, Y\bigr]\Bigr| \le 2\,|C|_\infty\, \mathbb{E}\Bigl[\min\Bigl\{ |C|_\infty,\ \mathbb{E}\bigl[|\eta(X) - Y| \,\big|\, Y\bigr] \Bigr\}\Bigr]. \qquad (5)$$

We now invoke Spokoiny's nonasymptotic Bernstein–von Mises theorem [11]. Let

$$Z_n \triangleq \sqrt{n I(Y)}\left( \eta(X) - Y - \frac{G_n(X, Y)}{n I(Y)} \right), \qquad G_n(X, Y) \triangleq \sum_{i=1}^n \ell'(X_i, Y),$$

and for any $L > 0$ consider the event

$$A_n^L(Y) \triangleq \left\{ |Z_n| \le L\, \bigl(n I(Y)\bigr)^{1/4} \sqrt{\frac{\log n}{n}} \right\}.$$

Then there exists a choice $L = L_0$ that depends only on the constants in the regularity conditions, such that

$$\mathbb{P}\bigl[A_n(Y)^{\mathrm{c}}\bigr] \le \frac{C_1}{n}, \qquad (6)$$

where $A_n(Y) \triangleq A_n^{L_0}(Y)$, and $C_1 > 0$ is an absolute constant [11, Sec. 4.3]. Therefore,

$$\begin{aligned}
\mathbb{E}\bigl[|\eta(X) - Y| \,\big|\, Y\bigr]
&\overset{\text{(a)}}{\le} \frac{\mathbb{E}[|Z_n| \mid Y]}{\sqrt{n I(Y)}} + \frac{\mathbb{E}[|G_n(X, Y)| \mid Y]}{n I(Y)} \\
&= \frac{\mathbb{E}\bigl[\mathbf{1}\{A_n(Y)\}\,|Z_n| \mid Y\bigr] + \mathbb{E}\bigl[\mathbf{1}\{A_n^{\mathrm{c}}(Y)\}\,|Z_n| \mid Y\bigr]}{\sqrt{n I(Y)}} + \frac{\mathbb{E}[|G_n(X, Y)| \mid Y]}{n I(Y)} \\
&\overset{\text{(b)}}{\le} \frac{L_0\, (n I(Y))^{1/4} \sqrt{(\log n)/n}}{\sqrt{n I(Y)}} + \sqrt{\frac{C_1}{n}}\, \frac{\sqrt{\mathbb{E}[Z_n^2 \mid Y]}}{\sqrt{n I(Y)}} + \frac{\mathbb{E}[|G_n(X, Y)| \mid Y]}{n I(Y)} \\
&\overset{\text{(c)}}{\le} \frac{L_0 \sqrt{\log n}}{n^{3/4}\, I(Y)^{1/4}} + \sqrt{\frac{C_1}{n}} \left( \sqrt{\mathbb{E}\bigl[(\eta(X) - Y)^2 \mid Y\bigr]} + \frac{\sqrt{\mathbb{E}[G_n^2(X, Y) \mid Y]}}{n I(Y)} \right) + \frac{\mathbb{E}[|G_n(X, Y)| \mid Y]}{n I(Y)},
\end{aligned} \qquad (7)$$

where (a) follows from the triangle inequality, (b) from (6) and the Cauchy–Schwarz inequality, and (c) again from the triangle inequality. Now, since $\mathbb{E}[\ell'(X_1, Y) \mid Y] = 0$ and $\mathrm{Var}[\ell'(X_1, Y) \mid Y] = I(Y)$ [12], we have

$$\mathbb{E}\bigl[G_n^2(X, Y) \mid Y\bigr] = \mathrm{Var}\bigl[G_n(X, Y) \mid Y\bigr] = n I(Y) \qquad \text{and} \qquad \mathbb{E}\bigl[|G_n(X, Y)| \mid Y\bigr] \le \sqrt{n I(Y)}.$$

Substituting these estimates into (7) gives

$$\mathbb{E}\bigl[|\eta(X) - Y| \,\big|\, Y\bigr] \le \frac{L_0 \sqrt{\log n}}{n^{3/4}\, I(Y)^{1/4}} + \sqrt{\frac{C_1}{n}}\, \sqrt{\mathbb{E}\bigl[(\eta(X) - Y)^2 \mid Y\bigr]} + \left(1 + \sqrt{\frac{C_1}{n}}\right) \frac{1}{\sqrt{n I(Y)}}.$$

Plugging this bound into (5), using Jensen's inequality, and simplifying (absorbing the absolute constants into $L$), we get (3).

Corollary 1: Under the same assumptions as in Theorem 1,

$$\mathrm{reg}_k(P_{XY}) - \inf_{C :\, |C| \le k} \mathbb{E}\,|Y - \bar q_C(Y)|^2 \ \lesssim\ \frac{1}{k} \min\left\{ \frac{1}{k},\ \mathbb{E}\!\left[\frac{1}{\sqrt{n I(Y)}}\right] + \sqrt{\frac{\mathrm{mmse}(P_{XY})}{n}} \right\}. \qquad (8)$$

Remark 2: Using the information inequality [9]

$$\mathrm{mmse}(P_{XY}) \gtrsim \frac{1}{n}\, \mathbb{E}\!\left[\frac{1}{I(Y)}\right]$$

and Jensen's inequality, the bound (8) can be weakened to

$$\mathrm{reg}_k(P_{XY}) - \inf_{C :\, |C| \le k} \mathbb{E}\,|Y - \bar q_C(Y)|^2 \ \lesssim\ \frac{1}{k} \min\left\{ \frac{1}{k},\ \sqrt{\mathrm{mmse}(P_{XY})} \right\}.$$

Proof: Let $C^* = \{y^*_1, \dots, y^*_k\}$ be the reconstruction points of an optimal $k$-point quantizer for $Y$, arranged in increasing order (with $y^*_0 \triangleq -A$ and $y^*_{k+1} \triangleq A$). From the work of Panter and Dite [7], we know that these points should be chosen in such a way that

$$\int_{y^*_j}^{y^*_{j+1}} f_Y(y)^{1/3}\, dy = \frac{c_j}{k} \int_{\mathcal{Y}} f_Y(y)^{1/3}\, dy$$

for all $j = 0, 1, \dots, k$, where $c_j = 1/2$ for $j \in \{0, k\}$ and $c_j = 1$ otherwise. Since $\log f_Y$ is Lipschitz on $[-A, A]$, the density $f_Y$ is bounded away from zero and infinity there, and therefore $|C^*|_\infty \lesssim 1/k$. Using this in (3), together with the characterization of the MMSE regret in Proposition 2, we get (8).

Observe that the value of the right-hand side of (8) is determined by whether the number of observations $n$ is larger or smaller than $k^2$. This agrees with the asymptotic results of Samarov and Has'minskii [6].
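The Panter–Dite placement rule lends itself to a direct numerical implementation. The sketch below (our illustration, with an arbitrary density) places the $k$ points so that consecutive points enclose equal shares of $\int f_Y^{1/3}$, with half-shares in the two boundary cells, and then reports the resulting $|C|_\infty$, which indeed scales roughly like $1/k$.

```python
import numpy as np

def panter_dite_points(f, a, b, k, grid_size=100001):
    """Place k reconstruction points on [a, b] by the Panter-Dite companding rule:
    the j-th point sits at the (2j-1)/(2k) quantile of the distribution proportional to f^(1/3)."""
    y = np.linspace(a, b, grid_size)
    g = np.cumsum(f(y) ** (1.0 / 3.0))
    g = g / g[-1]                                    # normalized cumulative of f^{1/3}
    targets = (2.0 * np.arange(1, k + 1) - 1.0) / (2.0 * k)
    return np.interp(targets, g, y)                  # invert the cumulative on the grid

# Example: an (unnormalized) tent-shaped density on [-1, 1]; any positive f works
f = lambda y: 1.0 - 0.9 * np.abs(y)
for k in (4, 8, 16, 32):
    C = panter_dite_points(f, -1.0, 1.0, k)
    gaps = np.diff(np.concatenate(([-1.0], C, [1.0])))
    print(k, float(gaps.max()))                      # |C|_inf shrinks roughly like 1/k
```

The points crowd where $f_Y$ is larger, which is exactly the behavior the companding rule is designed to produce.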

IV. A HIGH-RESOLUTION BOUND FOR FUNCTIONAL COMPRESSION

We now consider a general setting of $p$-dimensional $Y$ and $n$-dimensional $X$ without any conditional independence assumptions. Instead, we focus on the scaling of the MMSE regret $\mathrm{reg}_k(P_{XY})$ with $k$, while the values of $n$ and $p$ stay fixed. Proposition 2 shows that the MMSE regret is precisely the minimum expected distortion attainable by $k$-ary quantization of $X$ in the problem of functional compression of the regression function $\eta(X)$. The results we present in this section require only some regularity assumptions on the $\ell_2$ norm $\|\eta(X)\|$:

Assumption 1: The random variable $\|\eta(X)\|$ has a finite fourth moment: $\mathbb{E}\|\eta(X)\|^4 < \infty$.

Assumption 2: The random variable $\|\eta(X)\|$ is subgaussian: there exists a positive constant $v > 0$, such that

$$\log \mathbb{E}\bigl[e^{\lambda(\|\eta(X)\| - \mathbb{E}\|\eta(X)\|)}\bigr] \le v \lambda^2, \qquad \forall \lambda \in \mathbb{R}.$$

One sufficient condition for Assumption 1 to hold is for $Y$ to have a finite fourth moment. Indeed, in that case, using Jensen's inequality, we have

$$\mathbb{E}\|\eta(X)\|^4 = \mathbb{E}\bigl\|\mathbb{E}[Y \mid X]\bigr\|^4 \le \mathbb{E}\|Y\|^4 < \infty.$$

As for Assumption 2, it will be met, for example, if the regression function $\eta$ is Lipschitz, i.e., if there exists some finite constant $L > 0$, such that

$$\|\eta(x) - \eta(x')\| \le L\,\|x - x'\|, \qquad \forall x, x' \in \mathbb{R}^n,$$

and if $X$ is a Gaussian random vector with a nonsingular covariance matrix [14].

Theorem 2: Suppose Assumption 1 holds. Then

$$\mathrm{reg}_k(P_{XY}) \lesssim \bigl( \mathbb{E}\|\eta(X)\|^2\, \mathbb{E}\|\eta(X)\|^4 \bigr)^{1/3}\, k^{-2/(3p)}. \qquad (9)$$

If Assumption 2 also holds, then

$$\mathrm{reg}_k(P_{XY}) \lesssim \inf_{r > \mathbb{E}\|\eta(X)\|} \left\{ r^2 k^{-2/p} + \sqrt{\mathbb{E}\|\eta(X)\|^4\, e^{-(r - \mathbb{E}\|\eta(X)\|)^2 / 4v}} \right\}. \qquad (10)$$

Remark 3: Here, $p$ can be replaced by a suitable intrinsic dimension of the support of $\eta(X)$ (e.g., the rate-distortion dimension [15]). For example, if $\eta(X)$ is linear, i.e., $\mathbb{E}[Y \mid X] = AX$ for some deterministic matrix $A \in \mathbb{R}^{p \times n}$, then we can replace $p$ by $\mathrm{rank}(A)$. Moreover, using a suboptimal value of $r$, we can weaken the bound in (10) to

$$\mathrm{reg}_k(P_{XY}) \lesssim (\log k)\, k^{-2/p},$$

where the hidden constant depends on $p$, on the first and fourth moments of $\|\eta(X)\|$, and on the subgaussian constant $v$. Apart from the logarithmic factor, this scaling of the MMSE regret agrees with the high-resolution approximation for VQ [5] and with the Shannon lower bound [15], [16].

Proof: From Assumption 1 and from Jensen's inequality, it follows that the first and second moments of $\|\eta(X)\|$ are also finite. Fix a positive real constant $r > 0$, which will be optimized later. A simple volumetric estimate shows that the $\ell_2$ ball of radius $r$ in $\mathbb{R}^p$ can be covered by at most $(cr/\epsilon)^p$ balls of radius $\epsilon$, where $c$ is an absolute constant. So, for a given $k \in \mathbb{Z}_+$, we can cover the radius-$r$ ball by $k - 1$ balls of radius $\epsilon \asymp r k^{-1/p}$. Let $\{y_j\}_{j=1}^{k-1}$ be the centers of these balls. We now construct a quantizer $q^{(r)} \in \mathcal{Q}^n_k$ as follows:

$$q^{(r)}(x) = \begin{cases} \arg\min_{j \in [k-1]} \|\eta(x) - y_j\|, & \text{if } \|\eta(x)\| \le r \\ k, & \text{otherwise.} \end{cases}$$

For this quantizer, we have

$$\mathbb{E}\bigl\|\eta(X) - \mathbb{E}[\eta(X) \mid q^{(r)}(X)]\bigr\|^2 = T_1 + T_2,$$

where

$$T_1 \triangleq \mathbb{E}\Bigl[\mathbf{1}\{\|\eta(X)\| \le r\}\, \bigl\|\eta(X) - \mathbb{E}[\eta(X) \mid q^{(r)}(X)]\bigr\|^2\Bigr], \qquad T_2 \triangleq \mathbb{E}\Bigl[\mathbf{1}\{\|\eta(X)\| > r\}\, \bigl\|\eta(X) - \mathbb{E}[\eta(X) \mid q^{(r)}(X)]\bigr\|^2\Bigr].$$

For the first term, we have

$$T_1 \le 4\epsilon^2\, \mathbb{P}\bigl[\|\eta(X)\| \le r\bigr] \le 4\epsilon^2 \lesssim r^2 k^{-2/p},$$

where the first inequality is true since $\eta(X)$ and $\mathbb{E}[\eta(X) \mid q^{(r)}(X)]$ are inside the same $\epsilon$-ball of the covering for all $X$ such that $q^{(r)}(X) \in [k-1]$. For the second term,

$$T_2 \le \sqrt{\mathbb{P}\bigl[\|\eta(X)\| > r\bigr]}\, \sqrt{\mathbb{E}\bigl\|\eta(X) - \mathbb{E}[\eta(X) \mid q^{(r)}(X)]\bigr\|^4} \le \sqrt{\mathbb{P}\bigl[\|\eta(X)\| > r\bigr]}\, \sqrt{8\,\mathbb{E}\|\eta(X)\|^4 + 8\,\mathbb{E}\bigl\|\mathbb{E}[\eta(X) \mid q^{(r)}(X)]\bigr\|^4} \le 4 \sqrt{\mathbb{P}\bigl[\|\eta(X)\| > r\bigr]\, \mathbb{E}\|\eta(X)\|^4},$$

where the first step is by the Cauchy–Schwarz inequality, and the remaining steps follow from monotonicity and convexity. By Markov's inequality,

$$\mathbb{P}\bigl[\|\eta(X)\| > r\bigr] = \mathbb{P}\bigl[\|\eta(X)\|^2 > r^2\bigr] \le \frac{\mathbb{E}\|\eta(X)\|^2}{r^2}.$$

Therefore,

$$\mathrm{reg}_k(P_{XY}) \le \mathbb{E}\bigl\|\eta(X) - \mathbb{E}[\eta(X) \mid q^{(r)}(X)]\bigr\|^2 \lesssim r^2 k^{-2/p} + \frac{\sqrt{\mathbb{E}\|\eta(X)\|^2\, \mathbb{E}\|\eta(X)\|^4}}{r}.$$

Optimizing over all $r > 0$, we get (9).

Now suppose Assumption 2 also holds. Let $r = \mathbb{E}\|\eta(X)\| + t$ for some $t > 0$. Since $\|\eta(X)\|$ is subgaussian, the Chernoff bounding technique gives

$$\mathbb{P}\bigl[\|\eta(X)\| > r\bigr] = \mathbb{P}\bigl[\|\eta(X)\| - \mathbb{E}\|\eta(X)\| > t\bigr] \le e^{-t^2/4v} = e^{-(r - \mathbb{E}\|\eta(X)\|)^2/4v}$$

(see, e.g., [14, Chap. 3]). Thus, we have

$$\mathrm{reg}_k(P_{XY}) \le \mathbb{E}\bigl\|\eta(X) - \mathbb{E}[\eta(X) \mid q^{(r)}(X)]\bigr\|^2 \lesssim r^2 k^{-2/p} + \sqrt{\mathbb{E}\|\eta(X)\|^4\, e^{-(r - \mathbb{E}\|\eta(X)\|)^2/4v}}$$

for all $r > \mathbb{E}\|\eta(X)\|$. Optimizing over $r$, we get (10).
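As a rough numerical companion to this construction (entirely our own; the random centers below are only a crude stand-in for the $\epsilon$-net used in the proof, so the resulting cells need not cover the ball as tightly), one can simulate the overflow quantizer and the empirical analogues of $T_1$ and $T_2$:

```python
import numpy as np

rng = np.random.default_rng(0)
p, k, m, r = 2, 65, 50000, 2.5

H = rng.standard_normal((m, p))                     # stand-in samples of eta(X)

# k-1 "centers" inside the radius-r ball (crude proxy for a genuine epsilon-covering)
U = rng.standard_normal((k - 1, p))
U /= np.linalg.norm(U, axis=1, keepdims=True)
centers = U * r * rng.uniform(0.0, 1.0, size=(k - 1, 1)) ** (1.0 / p)

norms = np.linalg.norm(H, axis=1)
inside = norms <= r
idx = np.full(m, k - 1)                             # overflow cell, 0-based index k-1
d2 = ((H[inside][:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
idx[inside] = d2.argmin(axis=1)

recon = np.zeros_like(H)                            # reproduce each cell by its conditional mean
for j in range(k):
    if np.any(idx == j):
        recon[idx == j] = H[idx == j].mean(axis=0)

err = ((H - recon) ** 2).sum(axis=1)
T1 = err[inside].mean() * inside.mean()             # empirical analogue of T1
T2 = err[~inside].mean() * (~inside).mean()         # empirical analogue of T2
print(T1, T2, err.mean())                           # err.mean() plays the role of the regret
```

Shrinking or growing $r$ trades the two terms off against each other, which is exactly the optimization carried out at the end of the proof.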
ACKNOWLEDGMENTS

The authors would like to thank Tamás Linder and Vladimir Spokoiny for helpful discussions.

REFERENCES

[1] R. E. Curry, Estimation and Control with Quantized Measurements. Cambridge, MA: MIT Press, 1970.
[2] J. K. Wolf and J. Ziv, "Transmission of noisy information to a noisy receiver with minimum distortion," IEEE Trans. Inform. Theory, vol. 16, no. 4, pp. 406–411, July 1970.
[3] Y. Ephraim and R. M. Gray, "A unified approach for encoding clean and noisy sources by means of waveform and autoregressive model vector quantization," IEEE Trans. Inform. Theory, vol. 34, no. 4, July 1988.
[4] V. Misra, V. K. Goyal, and L. R. Varshney, "Distributed scalar quantization for computing: High-resolution analysis and extensions," IEEE Trans. Inform. Theory, vol. 57, no. 8, August 2011.
[5] R. M. Gray and D. L. Neuhoff, "Quantization," IEEE Trans. Inform. Theory, vol. 44, no. 6, October 1998.
[6] A. M. Samarov and R. Z. Has'minskii, "Estimation by means of statistics that take a finite number of values," Problems of Inform. Transmission, vol. 13, no. 4, 1977.
[7] P. F. Panter and W. Dite, "Quantization distortion in pulse-count modulation with nonuniform spacing of levels," Proc. IRE, vol. 39, pp. 44–48, January 1951.
[8] P. Elias, "Bounds on performance of optimum quantizers," IEEE Trans. Inform. Theory, vol. 16, no. 2, pp. 172–184, March 1970.
[9] I. A. Ibragimov and R. Z. Has'minskii, "On information inequalities and superefficient estimates," Problems of Inform. Transmission, vol. 9, no. 3, 1973.
[10] A. W. van der Vaart, Asymptotic Statistics. Cambridge, UK: Cambridge University Press, 1998.
[11] V. Spokoiny, "Bernstein–von Mises theorem for growing parameter dimension," arXiv preprint, 2013.
[12] I. A. Ibragimov and R. Z. Has'minskii, Statistical Estimation: Asymptotic Theory. New York: Springer, 1981.
[13] V. Spokoiny, "Parametric estimation. Finite sample theory," Ann. Statist., vol. 40, no. 6, 2012.
[14] M. Raginsky and I. Sason, Concentration of Measure Inequalities in Information Theory, Communications, and Coding, 2nd ed. Now Publishers, 2014.
[15] T. Kawabata and A. Dembo, "The rate-distortion dimension of sets and measures," IEEE Trans. Inform. Theory, vol. 40, no. 5, September 1994.
[16] T. Linder and R. Zamir, "On the asymptotic tightness of the Shannon lower bound," IEEE Trans. Inform. Theory, vol. 40, no. 6, pp. 2026–2031, November 1994.
