A Recursive Algorithm for Computing CR-Type Bounds on Estimator Covariance*

Alfred Hero (Corresponding Author), Dept. of EECS, The University of Michigan, Ann Arbor, MI 48109-2122. Tel: (313)-763-0564.
Jeffrey A. Fessler, Division of Nuclear Medicine, The University of Michigan, Ann Arbor, MI 48109-0028. Tel: (313)-763-1434.

KEYWORDS: Multi-dimensional parameter estimation, estimator covariance bounds, complete-incomplete data problem, image reconstruction.

Abstract

We derive an iterative algorithm that calculates submatrices of the Cramer-Rao (CR) matrix bound on the covariance of any unbiased estimator of a vector parameter. Our algorithm computes a sequence of lower bounds that converges monotonically to the CR bound with exponential speed of convergence. The recursive algorithm uses an invertible "splitting matrix," and we present a statistical approach to selecting this matrix based on a "complete data - incomplete data" formulation similar to that of the well-known EM parameter estimation algorithm. As a concrete illustration we consider image reconstruction from projections for emission computed tomography.

* This research was supported in part by the National Science Foundation under grant BCS-9024370, a DOE Alexander Hollaender Postdoctoral Fellowship, and DOE Grant DE-FG02-87ER65061.

1 Introduction

The Cramer-Rao (CR) bound on estimator covariance is an important tool for predicting fundamental limits on best achievable parameter estimation performance. For a vector parameter $\theta \in \mathbb{R}^n$, an observation $Y$, and a p.d.f. $f_Y(y;\theta)$, one seeks a lower bound on the minimum achievable variance of an unbiased estimator $\hat\theta_1 = \hat\theta_1(Y)$ of a scalar parameter $\theta_1$ of interest. More generally, if, without loss of generality, the $p$ parameters $\theta_1,\ldots,\theta_p$ are of interest, $p \le n$, one may want to specify a $p\times p$ matrix which lower bounds the error covariance matrix for the unbiased estimators $\hat\theta_1,\ldots,\hat\theta_p$. The upper left hand $p\times p$ submatrix of the $n\times n$ inverse Fisher information matrix $F_Y^{-1}$ provides the CR lower bound for these parameter estimates. Equivalently, the first $p$ columns of $F_Y^{-1}$ provide this CR bound. The method of sequential partitioning [11] for computing the upper left $p\times p$ submatrix of $F_Y^{-1}$, and Cholesky based Gaussian elimination techniques [3] for computing the first $p$ columns of $F_Y^{-1}$, are efficient direct methods for obtaining the CR bound, but they require $O(n^3)$ floating point operations. Unfortunately, in many practical cases of interest, e.g. when there are a large number of nuisance parameters, high computation and memory requirements make direct implementation of the CR bound impractical.

In this correspondence we give an iterative algorithm for computing columns of the CR bound which requires only $O(pn^2)$ floating point operations per iteration. This algorithm falls into the class of "splitting matrix iterations" [3] with the imposition of an additional requirement: the splitting matrix must be chosen to ensure that a valid lower bound results at each iteration of the algorithm. While a purely algebraic approach to this restricted class of iterations can easily be adopted [16], the CR bound setting allows us to exploit additional properties of Fisher information matrices arising from the statistical model. Specifically, we formulate the parameter estimation problem in a complete data - incomplete data setting and apply a version of the "data processing theorem" [1] for Fisher information matrices. This setting is similar to that which underlies the classical formulation of the Maximum Likelihood Expectation Maximization (ML-EM) parameter estimation algorithm. The ML-EM algorithm generates a sequence of estimates $\{\hat\theta^k\}_k$ which successively increase the likelihood function and converge to the maximum likelihood estimator. In a similar manner, our algorithm generates a sequence of tighter and tighter lower bounds on estimator covariance which converge to the actual CR matrix bound.

The algorithms given here converge monotonically with exponential rate, where the speed of convergence increases as the spectral radius $\rho(I - F_X^{-1}F_Y)$ decreases. Here $I$ is the $n\times n$ identity matrix and $F_X$ and $F_Y$ are the complete and incomplete data Fisher information matrices, respectively. Thus when the complete data is only moderately more informative than the incomplete data, $F_Y$ is close to $F_X$, so that $\rho(I - F_X^{-1}F_Y)$ is close to 0 and the algorithm converges very quickly. To implement the algorithm, one must 1) precompute the first $p$ columns of $F_X^{-1}$, and 2) provide a subroutine that can multiply $F_X^{-1}F_Y$ or $F_X^{-1}E_\theta[\nabla^{11}Q(\theta;\theta)]$ by a column vector (see (17)). By appropriately choosing the complete data space, this precomputation can be quite simple, e.g. $X$ can be chosen to make $F_X$ sparse or even diagonal. If the complete data space is chosen intelligently, only a few iterations may be required to produce a bound which closely approximates the CR bound. In this case the proposed algorithm gives an order of magnitude computational savings as compared to conventional exact methods of computing the CR bound.

The paper concludes with an implementation of the recursive algorithm for bounding the minimum achievable error of reconstruction for a small region of interest (ROI) in an image reconstruction problem arising in emission computed tomography. By using the complete data of the standard EM algorithm for PET reconstruction [15], $F_X$ is diagonal and the implementation of the CR bound algorithm is very simple. As in the PET reconstruction algorithm, the rate of convergence of the iterative CR bound algorithm depends on the image intensity and the tomographic system response matrix.
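For comparison, the direct approach mentioned above amounts to solving $F_Y U = E$ for the first $p$ columns of $F_Y^{-1}$, for example via a Cholesky factorization. The following sketch is our illustration, not part of the original correspondence; it makes the $O(n^3)$ baseline concrete, a cost incurred even when $p$ is tiny.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def cr_bound_direct(F_Y, p):
    """Direct computation of the p x p CR bound E^T F_Y^{-1} E:
    solve F_Y U = E for the first p columns of F_Y^{-1} using a Cholesky factor."""
    n = F_Y.shape[0]
    E = np.zeros((n, p))
    E[:p, :] = np.eye(p)                 # E = [e_1, ..., e_p]
    c, low = cho_factor(F_Y)             # O(n^3) factorization of the Fisher matrix
    U = cho_solve((c, low), E)           # n x p solution U = F_Y^{-1} E
    return U[:p, :]                      # upper-left p x p block of F_Y^{-1}
```

The recursive algorithm developed in Section 2 produces the same block iteratively, without factoring or storing the full $n\times n$ inverse.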

2 CR Bound and Iterative Algorithm

2.1 Background and General Assumptions

Let $\Theta_i$ be an open subset of the real line $\mathbb{R}$. Define $\theta = [\theta_1,\ldots,\theta_n]^T$, a real, non-random parameter vector residing in $\Theta = \Theta_1\times\cdots\times\Theta_n$. Let $\{P_\theta\}_{\theta\in\Theta}$ be a family of probability measures for a certain random variable $Y$ taking values in a set $\mathcal{Y}$. Assume that for each $\theta\in\Theta$, $P_\theta$ is absolutely continuous with respect to a dominating measure $\mu$, so that for each $\theta$ there exists a density function $f_Y(y;\theta) = dP_\theta(y)/d\mu$ for $Y$. The family of densities $\{f_Y(y;\theta)\}_{\theta\in\Theta}$ is said to be a regular family [9] if $\Theta$ is an open subset of $\mathbb{R}^n$ and: 1) $f_Y(y;\theta)$ is a continuous function on $\Theta$ for $\mu$-almost all $y$; 2) $\ln f_Y(Y;\theta)$ is mean-square differentiable in $\theta$; and 3) $\nabla_\theta \ln f_Y(Y;\theta)$ is mean-square continuous in $\theta$. These three conditions guarantee that the Fisher information matrix $F_Y(\theta) = E_\theta[\nabla_\theta^T \ln f_Y(Y;\theta)\,\nabla_\theta \ln f_Y(Y;\theta)]$ exists and is finite, where $\nabla_\theta = [\partial/\partial\theta_1,\ldots,\partial/\partial\theta_n]$ is the (row) gradient operator.

Finally we recall convergence results for linear recursions of the form $v^{i+1} = A v^i$, $i = 1,2,\ldots$, where $v^i$ is a vector and $A$ is a matrix. Let $\rho(A)$ denote the spectral radius, i.e. the maximum magnitude eigenvalue, of $A$. If $\rho(A) < 1$ then $v^i$ converges to zero and the asymptotic rate of convergence increases as the root convergence factor $\rho(A)$ decreases [14].

2.2 The CR Lower Bound

Let $\hat\theta = \hat\theta(Y)$ be an unbiased estimator of $\theta\in\Theta$, and assume that the densities $\{f_Y(y;\theta)\}_{\theta\in\Theta}$ are a regular family. Then the covariance matrix of $\hat\theta$ satisfies the matrix CR lower bound [9]:
$$\mathrm{cov}_\theta(\hat\theta) \;\succeq\; B(\theta) = F_Y^{-1}(\theta). \qquad (1)$$
In (1), $F_Y(\theta)$ is the assumed non-singular $n\times n$ Fisher information matrix associated with the measurements $Y$:
$$F_Y(\theta) \;\overset{\mathrm{def}}{=}\; E_\theta\!\left\{\left[\nabla_u \ln f_Y(Y;u)\big|_{u=\theta}\right]^T \left[\nabla_u \ln f_Y(Y;u)\big|_{u=\theta}\right]\right\}. \qquad (2)$$
Under the additional assumption that the mixed partials $\partial^2 f_Y(y;\theta)/\partial\theta_i\partial\theta_j$, $i,j = 1,\ldots,n$, exist, are continuous in $\theta$, and are absolutely integrable in $y$, the Fisher information matrix is equivalent to the Hessian, or "curvature matrix," of the mean $E_\theta[\ln f_Y(Y;u)]$:
$$F_Y(\theta) = -E_\theta\!\left[\nabla_u^T\nabla_u \ln f_Y(Y;u)\big|_{u=\theta}\right] = -\nabla_u^T\nabla_u\, E_\theta\!\left[\ln f_Y(Y;u)\right]\Big|_{u=\theta}. \qquad (3)$$
Assume that among the $n$ unknown quantities $\theta = [\theta_1,\ldots,\theta_n]^T$ only a small number $p \ll n$ of parameters $\theta_I = [\theta_1,\ldots,\theta_p]^T$ are directly of interest, the remaining $n-p$ parameters being considered "nuisance parameters." Partition the Fisher information matrix $F_Y$ as:
$$F_Y = \begin{bmatrix} F_{11} & F_{12}^T \\ F_{12} & F_{22} \end{bmatrix}, \qquad (4)$$
where $F_{11}$ is the $p\times p$ Fisher information matrix for the parameters $\theta_I$ of interest, $F_{22}$ is the $(n-p)\times(n-p)$ Fisher information matrix for the nuisance parameters, and $F_{12}$ is the $(n-p)\times p$ information coupling matrix. The CR bound on the covariance of any unbiased estimator $\hat\theta_I = [\hat\theta_1,\ldots,\hat\theta_p]^T$ of the parameters of interest is simply the $p\times p$ submatrix in the upper left hand corner of $F_Y^{-1}$:
$$\mathrm{cov}_\theta(\hat\theta_I) \;\succeq\; E^T F_Y^{-1} E, \qquad (5)$$
where $E$ is the $n\times p$ matrix consisting of the first $p$ columns of the $n\times n$ identity matrix, i.e. $E = [e_1,\ldots,e_p]$ and $e_j$ is the $j$-th unit column vector in $\mathbb{R}^n$. Using a standard identity for the partitioned matrix inverse [3], this submatrix can be expressed in terms of the partition elements of $F_Y$, yielding the following equivalent form for the unbiased CR bound:
$$\mathrm{cov}_\theta(\hat\theta_I) \;\succeq\; \left[F_{11} - F_{12}^T F_{22}^{-1} F_{12}\right]^{-1}. \qquad (6)$$
By using the method of sequential partitioning [11], the right hand side of (6) could be computed with $O(n^3)$ floating point operations. Alternatively, the CR bound (5) is specified by the first $p$ columns $F_Y^{-1}E$ of $F_Y^{-1}$. These $p$ columns are given by the columns of the $n\times p$ matrix solution $U$ to $F_Y U = E$. The topmost $p\times p$ block $E^T U$ of $U$ is equal to the right hand side of the CR bound inequality (5). By using the Cholesky decomposition of $F_Y$ and Gaussian elimination [3], the solution $U$ to $F_Y U = E$ could be computed with $O(n^3)$ floating point operations. Even if the number of parameters of interest is small, for large $n$ the feasibility of directly computing the CR bound (5) is limited by the high number $O(n^3)$ of floating point operations. For example, in the case of image reconstruction for a moderate sized $256\times 256$ pixelated image, $F_Y$ is $256^2\times 256^2$, so that direct computation of the CR bound on estimation errors in a small region of the image requires on the order of $256^6 \approx 3\times 10^{14}$ floating point operations!

2.3 A Recursive CR Bound Algorithm

The basic idea of the algorithm is to replace the difficult inversion of $F_Y$ with an easily inverted matrix $F_X$. To simplify notation, we drop the dependence on $\theta$. Let $F_X$ be an $n\times n$ matrix. Assume that $F_Y$ is positive definite and that $F_X \succeq F_Y$, i.e. $F_X - F_Y$ is nonnegative definite. It follows that $F_X$ is positive definite, so let $F_X^{1/2}$ be the positive definite matrix square root factor of $F_X$. Then $F_X^{-1/2} F_Y F_X^{-1/2}$ is positive definite, $F_X^{-1/2}(F_X - F_Y)F_X^{-1/2}$ is non-negative definite, and therefore:
$$F_X^{-1/2} F_Y F_X^{-1/2} = I - F_X^{-1/2}(F_X - F_Y)F_X^{-1/2} \succ 0.$$
Hence $0 \preceq I - F_X^{-1/2} F_Y F_X^{-1/2} \prec I$, so that all of the eigenvalues of $I - F_X^{-1/2} F_Y F_X^{-1/2}$ are non-negative and strictly less than one. Since $I - F_X^{-1/2} F_Y F_X^{-1/2}$ is similar to $I - F_X^{-1} F_Y$, it follows that the eigenvalues of $I - F_X^{-1} F_Y$ lie in $[0,1)$ [8, Corollary 1.3.4]. Thus, applying the matrix form of the geometric series [8, Corollary 5.6.16]:
$$B = [F_Y]^{-1} = [F_X - (F_X - F_Y)]^{-1} = [I - F_X^{-1}(F_X - F_Y)]^{-1} F_X^{-1} = \left(\sum_{k=0}^{\infty} [I - F_X^{-1} F_Y]^k\right) F_X^{-1}. \qquad (7)$$
This infinite series expression for the unbiased $n\times n$ CR bound $B$ is the basis for the matrix recursion given in the following theorem.

Theorem 1 Assume that $F_Y$ is positive definite and $F_X \succeq F_Y$. When initialized with the $n\times n$ matrix of zeros, $B_0 = 0$, the following recursion yields a sequence of matrix lower bounds $B_k = B_k(\theta)$ on the $n\times n$ covariance of unbiased estimators $\hat\theta$ of $\theta$. This sequence asymptotically converges to the $n\times n$ unbiased CR bound $F_Y^{-1}$ with root convergence factor $\rho(A)$.

Recursive Algorithm: For $k = 0, 1, 2, \ldots$
$$B_{k+1} = A\,B_k + F_X^{-1}, \qquad (8)$$
where $A = I - F_X^{-1}F_Y$ has eigenvalues in $[0,1)$. Furthermore, the convergence is monotone in the sense that $B_k \preceq B_{k+1} \preceq B = F_Y^{-1}$, for $k = 0, 1, 2, \ldots$.

Proof: Since all eigenvalues of $I - F_X^{-1}F_Y$ are in the range $[0,1)$ we obviously have $\rho(I - F_X^{-1}F_Y) < 1$. Now consider:
$$B_{k+1} - F_Y^{-1} = (I - F_X^{-1}F_Y)B_k + F_X^{-1} - F_Y^{-1} = (I - F_X^{-1}F_Y)(B_k - F_Y^{-1}). \qquad (9)$$
Since the eigenvalues of $I - F_X^{-1}F_Y$ are in $[0,1)$, this establishes that $B_{k+1} \to F_Y^{-1}$ as $k\to\infty$ with root convergence factor $\rho(I - F_X^{-1}F_Y)$. Similarly:
$$B_{k+1} - B_k = (I - F_X^{-1}F_Y)[B_k - B_{k-1}], \qquad k = 1, 2, \ldots,$$
with initial condition $B_1 - B_0 = F_X^{-1}$. By induction we have
$$B_{k+1} - B_k = F_X^{-1/2}\left[F_X^{-1/2}(F_X - F_Y)\,F_X^{-1/2}\right]^k F_X^{-1/2},$$
which is non-negative definite for all $k \ge 0$. Hence the convergence is monotone. □

By right multiplying each side of the equality (8) by the matrix $E = [e_1,\ldots,e_p]$, where $e_j$ is the $j$-th unit vector in $\mathbb{R}^n$, we obtain a recursion for the first $p$ columns $B_k E = [b_1^k,\ldots,b_p^k]$ of $B_k$. Furthermore, the first $p$ rows $E^T B_k E$ of $B_k E$ correspond to the upper left hand corner $p\times p$ submatrix of $B_k$ and, since $E^T[B_{k+1} - B_k]E$ is non-negative definite, by Theorem 1 $E^T B_k E$ converges monotonically to $E^T F_Y^{-1}E$. Thus we have the following Corollary to Theorem 1.

Corollary 1 Assume that $F_Y$ is positive definite and $F_X \succeq F_Y$, and let $E = [e_1,\ldots,e_p]$ be the $n\times p$ matrix whose columns are the first $p$ unit vectors in $\mathbb{R}^n$. When initialized with the $n\times p$ matrix of zeros, $\Phi^{(0)} = 0$, the top $p\times p$ block $E^T\Phi^{(k)}$ of $\Phi^{(k)}$ in the following recursive algorithm yields a sequence of lower bounds on the covariance of any unbiased estimator of $\theta_I = [\theta_1,\ldots,\theta_p]^T$ which asymptotically converges to the $p\times p$ CR bound $E^T F_Y^{-1}E$ with root convergence factor $\rho(A)$:

Recursive Algorithm: For $k = 0, 1, 2, \ldots$
$$\Phi^{(k+1)} = A\,\Phi^{(k)} + F_X^{-1}E, \qquad (10)$$
where $A = I - F_X^{-1}F_Y$ has eigenvalues in $[0,1)$ and $F_X^{-1}E$ is the $n\times p$ matrix consisting of the first $p$ columns of $F_X^{-1}$. Furthermore, the convergence is monotone in the sense that $E^T\Phi^{(k)} \preceq E^T\Phi^{(k+1)} \preceq E^T F_Y^{-1}E$, for $k = 0, 1, 2, \ldots$.

The $n\times n$ times $n\times p$ matrix multiplication $A\,\Phi^{(k)}$ requires only $O(pn^2)$ floating point operations. Hence, for $p \ll n$ the recursion (10) requires only $O(n^2)$ floating point operations per iteration.
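The recursion (10) is simple to implement. The following NumPy sketch is ours, not part of the original correspondence; for concreteness it assumes a dense incomplete-data Fisher matrix and a diagonal splitting matrix (passed as its diagonal) with $F_X \succeq F_Y$, so each iteration costs one $n\times n$ by $n\times p$ product.

```python
import numpy as np

def cr_bound_recursion(F_Y, F_X_diag, p, n_iter=100):
    """Column recursion (10): Phi^{(k+1)} = A Phi^{(k)} + F_X^{-1} E,
    with A = I - F_X^{-1} F_Y and E = [e_1, ..., e_p].
    Assumes a diagonal splitting matrix F_X (given by its diagonal) with F_X >= F_Y.
    Returns E^T Phi^{(n_iter)}, a valid p x p lower bound at every iteration."""
    n = F_Y.shape[0]
    FXinv = 1.0 / F_X_diag                     # F_X^{-1} for a diagonal splitting matrix
    FXinv_E = np.zeros((n, p))
    FXinv_E[:p, :] = np.diag(FXinv[:p])        # first p columns of F_X^{-1}
    Phi = np.zeros((n, p))                     # Phi^{(0)} = 0
    for _ in range(n_iter):
        # A Phi = Phi - F_X^{-1}(F_Y Phi): O(p n^2) work, no n x n inverse is formed
        Phi = Phi - FXinv[:, None] * (F_Y @ Phi) + FXinv_E
    return Phi[:p, :]

# Toy check against the exact bound (small n only):
rng = np.random.default_rng(0)
G = rng.standard_normal((200, 8))
F_Y = G.T @ G / 200 + 0.5 * np.eye(8)          # a positive definite "Fisher matrix"
F_X_diag = np.abs(F_Y).sum(axis=1)             # diagonally dominant choice, so F_X >= F_Y
B_pp = cr_bound_recursion(F_Y, F_X_diag, p=2, n_iter=200)
print(np.max(np.abs(B_pp - np.linalg.inv(F_Y)[:2, :2])))   # -> approximately 0
```

Because the iterates increase monotonically, stopping early still returns a valid, if looser, lower bound.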

2.4 Discussion

We make the following comments on the recursive algorithms of Theorem 1 and Corollary 1.

1. For the algorithm (10) for computing columns of $F_Y^{-1}$ to have significant computational advantages relative to the direct approaches discussed in Section 2, the precomputation of the matrix inverse $F_X^{-1}$ and multiplication by the matrix product $A = I - F_X^{-1}F_Y$ must be simple, and the iterations must converge reasonably quickly. By choosing an $F_X$ that is sparse or diagonal, the computation of $F_X^{-1}$ and $A$ requires only $O(n^2)$ floating point operations. If in addition $F_X$ is chosen so that $\rho(I - F_X^{-1}F_Y)$ is small, then the algorithm (10) will converge to within a small fraction of the corresponding column of $F_Y^{-1}$ in only a few iterations, and thus will be an order of magnitude less costly than direct methods requiring $O(n^3)$ operations.

2. One can use the recursion $I - F_Y B_{k+1} = (I - F_Y F_X^{-1})\,[I - F_Y B_k]$, obtained in a manner similar to (9) in the proof of Theorem 1, to monitor the progress of the $j$-th column $e_j - F_Y b_j^k$ of $I - F_Y B_k$ towards zero. This recursion can be implemented alongside the bound recursion (10); a minimal sketch of such a monitor is given after these remarks.

3. For $p = 1$ the iteration of Corollary 1 is related to the "matrix splitting" method [3] for iteratively approximating the solution $u$ to a linear equation $Cu = c$. In this method, a decomposition $C = M - N$ is found for the non-singular matrix $C$ such that $M$ is non-singular and $\rho(M^{-1}N) < 1$. Once this decomposition is found, the algorithm below produces a sequence of vectors $u^k$ which converge to the solution $u = C^{-1}c$ as $k\to\infty$:
$$u^{k+1} = M^{-1}N u^k + M^{-1}c. \qquad (11)$$
Identifying $C$ as the incomplete data Fisher information $F_Y$, $M$ as the splitting matrix $F_X$, $N$ as the difference $F_X - F_Y$, $u$ as the $j$-th column of $F_Y^{-1}$, and $c$ as the $j$-th unit vector $e_j$ in $\mathbb{R}^n$, the splitting algorithm (11) is equivalent to the column recursion of Corollary 1. The novelty of the recursion of Corollary 1 is that, based on the statistical considerations presented in the following section, we have identified a particular class of matrix decompositions for $F_Y$ that guarantee monotone convergence of the $j$-th component of $b_j^k$ to the scalar CR bound on $\mathrm{var}_\theta(\hat\theta_j)$. Moreover, for general $p \ge 1$, the recursion of Corollary 1 implies that when $p$ parallel versions of (11) are implemented with $c = e_j$ and $u^k = u_j^k$, $j = 1,\ldots,p$, respectively, the first $p$ rows of the concatenated sequence $[u_1^k,\ldots,u_p^k]$ converge monotonically to the $p\times p$ CR bound on $\mathrm{cov}_\theta(\hat\theta_I)$, $\hat\theta_I = [\hat\theta_1,\ldots,\hat\theta_p]^T$. Monotone convergence is important in the statistical estimation context since it ensures that no matter when the iterative algorithm is stopped, a valid lower bound is obtained.

4. The basis for the matrix recursion of Theorem 1 is the geometric series (7). A geometric series approach was also employed in [12, Section 5] to develop a method to speed up the asymptotic convergence of the EM parameter estimation algorithm. This method is a special case of Aitken's acceleration, which requires computation of the inverse of the observed Fisher information $\hat F_Y(\theta) = -\nabla_\theta^2 \ln f_Y(Y;\theta)$ evaluated at successive EM iterates $\theta = \theta^k$, $k = m, m+1, m+2, \ldots$, where $m$ is a large positive integer. If $\hat F_Y(\theta^k)$ is positive definite then Theorem 1 of this paper can be applied to iteratively compute this inverse. Unfortunately, $\hat F_Y(\theta)$ is not guaranteed to be positive definite except within a small neighborhood of the MLE $\hat\theta$, so that in practice such an approach may fail to produce a converging algorithm.
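As a sketch of the monitor mentioned in remark 2 (ours, under the same assumptions as the earlier sketch of (10): a diagonal splitting matrix passed as its diagonal), one can track the residual columns $e_j - F_Y b_j^k$ and stop once they are small:

```python
import numpy as np

def cr_bound_with_monitor(F_Y, F_X_diag, p, tol=1e-8, max_iter=1000):
    """Recursion (10) with the residual monitor of remark 2: stop when the
    columns of E - F_Y B_k E are small, i.e. when B_k E is close to F_Y^{-1} E."""
    n = F_Y.shape[0]
    FXinv = 1.0 / F_X_diag
    E = np.zeros((n, p))
    E[:p, :] = np.eye(p)
    Phi = np.zeros((n, p))
    for k in range(max_iter):
        Phi = Phi - FXinv[:, None] * (F_Y @ Phi) + FXinv[:, None] * E
        resid = E - F_Y @ Phi                       # j-th column is e_j - F_Y b_j^k
        if np.linalg.norm(resid) < tol:             # Frobenius norm of the residual block
            break
    return Phi[:p, :], k + 1                        # bound and number of iterations used
```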

3 Statistical Choice for Splitting Matrix

The matrix $F_X$ must satisfy $F_X \succeq F_Y$ and must also be easily inverted. For an arbitrary matrix $F_X$, verifying that $F_X \succeq F_Y$ could be quite difficult. In this section we present a statistical approach to choosing the matrix $F_X$; the matrix $F_X$ is chosen to be the Fisher information matrix of the complete data that is intrinsic to an EM algorithm. This approach guarantees that $F_X \succeq F_Y$ through a Fisher information version of the data-processing inequality.

3.1 Incomplete Data Formulation

Many estimation problems can be conveniently formulated as an incomplete data - complete data problem. The setup is the following. Imagine that there exists a different set of measurements $X$ taking values in a set $\mathcal{X}$ whose probability density $f_X(x;\theta)$ is also a function of $\theta$. Further assume that this hypothetical set of measurements $X$ is larger and more informative than $Y$ in the sense that the conditional distribution of $Y$ given $X$ is functionally independent of $\theta$. $X$ and $\mathcal{X}$ are called the complete data and complete data space, while $Y$ and $\mathcal{Y}$ are called the incomplete data and incomplete data space, respectively. This definition of incomplete - complete data is equivalent to defining $Y$ as the output of a $\theta$-independent, possibly noisy, channel having input $X$. Note that our definition contains as a special case the standard definition [2] whereby $X$ and $Y$ must be related via a deterministic functional transformation $Y = h(X)$, where $h : \mathcal{X}\to\mathcal{Y}$ is many-to-one.

Assume that a complete data set $X$ has been specified. For regular probability densities $f_X(x;\theta)$, $f_Y(y;\theta)$, $f_{X|Y}(x|y;\theta)$ we define the associated Fisher information matrices $F_X(\theta) = -E_\theta[\nabla_\theta\nabla_\theta^T \ln f_X(X;\theta)]$, $F_Y(\theta) = -E_\theta[\nabla_\theta\nabla_\theta^T \ln f_Y(Y;\theta)]$, and $F_{X|Y}(\theta) = -E_\theta[\nabla_\theta\nabla_\theta^T \ln f_{X|Y}(X|Y;\theta)]$, respectively. First we give a decomposition for $F_Y(\theta)$ in terms of $F_X(\theta)$ and $F_{X|Y}(\theta)$.

Lemma 1 Let $X$ and $Y$ be random variables which have a joint probability density $f_{X,Y}(x,y;\theta)$ relative to some product measure $\mu_X\times\mu_Y$. Assume that $X$ is more informative than $Y$ in the sense that the conditional distribution of $Y$ given $X$ is functionally independent of $\theta$. Assume also that $\{f_X(x;\theta)\}_{\theta\in\Theta}$ is a regular family of densities with mixed partials $\partial^2 f_X(x;\theta)/\partial\theta_i\partial\theta_j$ which are continuous in $\theta$ and absolutely integrable in $x$. Then $\{f_Y(y;\theta)\}_{\theta\in\Theta}$ is a regular family of densities with continuous and absolutely integrable mixed partials, the above defined Fisher information matrices $F_X(\theta)$, $F_Y(\theta)$, and $F_{X|Y}(\theta)$ exist and are finite, and
$$F_Y(\theta) = F_X(\theta) - F_{X|Y}(\theta). \qquad (12)$$

Proof of Lemma 1: Since $X, Y$ have the joint density $f_{X,Y}(x,y;\theta)$ with respect to the measure $\mu_X\times\mu_Y$, there exist versions $f_{Y|X}(y|x;\theta)$ and $f_{X|Y}(x|y;\theta)$ of the conditional densities. Furthermore, by assumption $f_{Y|X}(y|x;\theta) = f_{Y|X}(y|x)$ does not depend on $\theta$. Since $f_Y(y;\theta) = \int_{\mathcal{X}} f_{Y|X}(y|x)\,f_X(x;\theta)\,d\mu_X$, it is straightforward to show that the family $\{f_Y(y;\theta)\}_{\theta\in\Theta}$ inherits the regularity properties of the family $\{f_X(x;\theta)\}_{\theta\in\Theta}$. Now for any $y$ such that $f_Y(y;\theta) > 0$ we have from Bayes' rule
$$f_{X|Y}(x|y;\theta) = \frac{f_{Y|X}(y|x)\,f_X(x;\theta)}{f_Y(y;\theta)}. \qquad (13)$$
Note that $f_{X,Y}(x,y;\theta) > 0$ implies that $f_Y(y;\theta) > 0$, $f_X(x;\theta) > 0$, $f_{Y|X}(y|x) > 0$ and $f_{X|Y}(x|y;\theta) > 0$. Hence, we can use (13) to express:
$$\log f_{X|Y}(x|y;\theta) = \log f_X(x;\theta) - \log f_Y(y;\theta) + \log f_{Y|X}(y|x), \qquad (14)$$
whenever $f_{X,Y}(x,y;\theta) > 0$. From this relation it is seen that $f_{X|Y}(x|y;\theta)$ inherits the regularity properties of the $X$ and $Y$ densities. Therefore, since the set $\{(x,y) : f_{X,Y}(x,y;\theta) > 0\}$ has probability one, we obtain

from (14):
$$E_\theta\!\left[-\nabla_\theta^2 \log f_{X|Y}(X|Y;\theta)\right] = E_\theta\!\left[-\nabla_\theta^2 \log f_X(X;\theta)\right] - E_\theta\!\left[-\nabla_\theta^2 \log f_Y(Y;\theta)\right].$$
This establishes the lemma. □

Since the Fisher information matrix $F_{X|Y}$ is non-negative definite, an important consequence of the decomposition of Lemma 1 is the matrix inequality:
$$F_X(\theta) \;\succeq\; F_Y(\theta). \qquad (15)$$
The inequality (15) can be interpreted as a version of the "data processing theorem" of information theory [1], which asserts that any irreversible processing of the data $X$ entails a loss in information in the resulting data $Y$.

3.2 Remarks

1. The inequality (15) is precisely the condition required of the splitting matrix $F_X$ by the recursive CR bound algorithm (10). Furthermore, in many applications of the EM algorithm, the complete-data space is chosen such that the dependence of $X$ on $\theta$ is "uncoupled," so that $F_X$ is diagonal or very sparse. Since many of the problems in which $F_Y$ is difficult to invert are problems for which the EM algorithm has been applied, the Fisher information of the corresponding complete-data space is thus a natural choice for $F_X$.

2. If the incomplete data Fisher matrix $F_Y$ is available, the matrix $A$ in the recursion (8) can be precomputed as:
$$A = I - F_X^{-1}F_Y. \qquad (16)$$
On the other hand, if the Fisher matrix $F_Y$ is not available, the matrix $A$ in the recursion (8) can be computed directly from $Q(u;v) = E_v\{\log f_X(X;u)\,|\,Y\}$ arising from the E step of the EM parameter estimation algorithm [2]. Note that, under the assumption that exchange of order of differentiation and expectation is justified [10, Sec. 2.6]:
$$F_{X|Y}(\theta) = E_\theta\!\left[-\nabla_u^2\, E\{\ln f_{X|Y}(X|Y;u)\,|\,Y;\theta\}\big|_{u=\theta}\right] = E_\theta\!\left[-\nabla^{20}H(\theta;\theta)\right],$$
where $H(u;v) \overset{\mathrm{def}}{=} E_v\{\log f_{X|Y}(X|Y;u)\,|\,Y=y\}$. We can make use of an identity [2, Lemma 2]:
$$\nabla^{20}H(\theta;\theta) = -\nabla^{11}H(\theta;\theta).$$
Furthermore,
$$\nabla^{11}H(\theta;\theta) = \nabla^{11}Q(\theta;\theta).$$
This gives the identity:
$$F_{X|Y}(\theta) = E_\theta\!\left[\nabla^{11}Q(\theta;\theta)\right],$$
giving the following alternative to (16) for precomputing $A$:
$$A = F_X^{-1}\,E_\theta\!\left[\nabla^{11}Q(\theta;\theta)\right]. \qquad (17)$$

3. The form $\rho(I - F_X^{-1}F_Y)$ for the rate of convergence of the algorithms (8) and (10) implies that, for rapid convergence, the complete data space $\mathcal{X}$ should be chosen such that $X$ is not significantly more informative than $Y$ relative to the parameter $\theta$.

4. The matrix recursion of Theorem 1 can be related to the following Frobenius normalization method for inverting a sparse matrix $C$:
$$B_{k+1} = B_k[I - \alpha C] + \alpha I, \qquad (18)$$
where $\alpha = 1/\|C\|_F$ is the inverse of the Frobenius norm of $C$. When initialized with $B_0 = \alpha I$, the above algorithm converges to $C^{-1}$ as $k\to\infty$. For the case that $C$ is the Fisher matrix $F_Y$, the matrix recursion (18) can be interpreted as a special case of the algorithm of Theorem 1 for a particular choice of complete data $X$. Specifically, let the complete data be defined as the concatenation $X = [Y^T, S^T]^T$ of the incomplete data $Y$ and a hypothetical data set $S = [S_1,\ldots,S_m]^T$ defined by the following:
$$S = c(\theta) + W, \qquad (19)$$
where $W = [W_1,\ldots,W_m]^T$ are i.i.d. standard Gaussian random variables independent of $Y$, and $c = [c_1,\ldots,c_m]^T$ is a vector function of $\theta$. It is readily verified that the Fisher matrix $F_S$ for $\theta$ based on observing $S$ is of the form $F_S = \sum_{j=1}^m \nabla^T c_j(\theta)\,\nabla c_j(\theta)$, where $\nabla = [\partial/\partial\theta_1,\ldots,\partial/\partial\theta_n]$. Now since $S$ and $Y$ are independent, $F_X = F_S + F_Y$, so that if we could choose $c(\theta)$ such that $F_S = \|F_Y\|_F I - F_Y$, the recursion of Theorem 1 would be equivalent to (18) with $F_Y = C$, $F_X^{-1} = \alpha I$, and $A = I - \alpha F_Y$. In particular, for the special case that $F_Y$ is functionally independent of $\theta$, we can take $m$ equal to $n$ and take the hypothetical data $S = [S_1,\ldots,S_n]^T$ as the $n$-dimensional linear Gaussian model:
$$S_j = c_j^T\theta + W_j, \qquad j = 1,\ldots,n,$$
where
$$c_j = \sqrt{\|F_Y\|_F - \xi_j}\;\nu_j, \qquad j = 1,\ldots,n,$$
and $\{\nu_1,\ldots,\nu_n\}$ are the eigenvectors and $\{\xi_1,\ldots,\xi_n\}$ are the eigenvalues of $F_Y$. With this definition of $S$, $F_S = \sum_{j=1}^n c_j c_j^T$ is simply the eigendecomposition of the matrix $\|F_Y\|_F I - F_Y$, so that $F_X = F_S + F_Y = \|F_Y\|_F I = \alpha^{-1} I$ as required.
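To make the decomposition (12), the inequality (15), and remark 3 concrete, here is a simple scalar example of our own, not from the original text. Let the complete data be $X = \theta + W_1$ and the incomplete data be $Y = X + W_2$, with $W_1 \sim \mathcal{N}(0,\sigma_1^2)$ and $W_2 \sim \mathcal{N}(0,\sigma_2^2)$ independent, so that $Y$ is the output of a $\theta$-independent noisy channel with input $X$. Then
$$F_X = \frac{1}{\sigma_1^2}, \qquad F_Y = \frac{1}{\sigma_1^2+\sigma_2^2}, \qquad F_{X|Y} = F_X - F_Y = \frac{\sigma_2^2}{\sigma_1^2(\sigma_1^2+\sigma_2^2)} \;\ge\; 0,$$
so (12) and (15) hold, and the convergence factor of the recursion is
$$\rho(A) = 1 - F_X^{-1}F_Y = \frac{\sigma_2^2}{\sigma_1^2+\sigma_2^2}.$$
The less information the channel $X \to Y$ destroys, the faster the CR bound recursion converges, in agreement with remark 3.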

4 Application to ECT Image Reconstruction

We consider the case of positron emission tomography (PET), where a set of $m$ detectors is placed about an object to measure positions of emitted gamma-rays. The mathematical formulation of PET is as follows. Over a specified time interval a number $N_b$ of gamma-rays are randomly emitted from pixel $b$, $b = 1,\ldots,n$, and a number $Y_d$ of these gamma-rays are detected at detector $d$, $d = 1,\ldots,m$. The average number of emissions at pixels $1,\ldots,n$ is an unknown vector $\lambda = [\lambda_1,\ldots,\lambda_n]^T$, called the object intensity. It is assumed that the $N_b$'s are independent Poisson random variables with rates $\lambda_b$, $b = 1,\ldots,n$, and the $Y_d$'s are independent Poisson distributed with rates $\mu_d = \sum_{b=1}^n P_{d|b}\lambda_b$, where $P_{d|b}$ is the transition probability corresponding to emitter location $b$ and detector location $d$. For simplicity we assume that $\mu_d > 0$ for all $d$. The objective is to estimate a subset $[\lambda_1,\ldots,\lambda_p]^T$, $p \le n$, of the object intensities within a $p$-pixel region of interest (ROI). In this section we develop the recursive CR bound for this estimation problem.

The log-likelihood function for $\lambda$ based on $Y = [Y_1,\ldots,Y_m]^T$ is simply
$$\ln f_Y(Y;\lambda) = \ln \prod_{d=1}^m \frac{[\mu_d]^{Y_d}\,e^{-\mu_d}}{Y_d!} \qquad (20)$$
$$= -\sum_{d=1}^m \mu_d + \sum_{d=1}^m Y_d\ln\mu_d + \text{constant}. \qquad (21)$$
From this the Hessian matrix with respect to $\lambda$ is simply calculated and, using the fact that $E_\lambda[Y_d] = \mu_d$, the $n\times n$ Fisher information matrix $F_Y$ is obtained:
$$F_Y = \sum_{d=1}^m \frac{1}{\mu_d}\,P_{d|\cdot}^T P_{d|\cdot} = \left( \sum_{d=1}^m \frac{P_{d|i}\,P_{d|j}}{\mu_d} \right)_{i,j=1,\ldots,n}, \qquad (22)$$
where $P_{d|\cdot} = [P_{d|1},\ldots,P_{d|n}]$ is the $d$-th row of the $m\times n$ system matrix $((P_{d|b}))$. If $m \ge n$ and the linear span of $\{P_{d|\cdot}\}_{d=1}^m$ is $\mathbb{R}^n$, then $F_Y$ is invertible and the CR bound exists. However, even for an imaging system of moderate resolution, e.g. a $256\times 256$ pixel plane, direct computation of the $p\times p$ ROI submatrix $E^T F_Y^{-1}E$, $E = [e_1,\ldots,e_p]$, of the $(256)^2\times(256)^2$ Fisher matrix $F_Y$ is impractical.

The standard choice of complete data for estimation of $\lambda$ via the EM algorithm is the set $\{N_{db}\}$, $d = 1,\ldots,m$, $b = 1,\ldots,n$, where $N_{db}$ denotes the number of emissions in pixel $b$ which are detected at detector $d$. The $\{N_{db}\}$ are independent Poisson random variables with intensity $E_\lambda[N_{db}] = P_{d|b}\lambda_b$, $d = 1,\ldots,m$, $b = 1,\ldots,n$. By Lemma 1 we know that, with $F_X$ the Fisher information matrix associated with the complete data, $F_X - F_Y$ is non-negative definite. Thus $F_X$ can be used in Theorem 1 to obtain a monotonically convergent CR bound recursion. The log-likelihood function associated with the complete data set $X = [N_{db}]_{d=1,b=1}^{m,n}$ is of similar form to (21):
$$\ln f_X(X;\lambda) = -\sum_{d=1}^m\sum_{b=1}^n P_{d|b}\lambda_b + \sum_{d=1}^m\sum_{b=1}^n N_{db}\ln\lambda_b + \text{constant}.$$
The Hessian $\nabla_\lambda^2 \ln f_X(X;\lambda)$ is easily calculated and, assuming $\lambda_b > 0$ for all $b$, the Fisher information matrix $F_X$ is obtained as:
$$F_X = \mathrm{diag}_b\!\left(\frac{\sum_{d=1}^m P_{d|b}}{\lambda_b}\right), \qquad (23)$$
where $\mathrm{diag}_b(a_b)$ denotes a diagonal $n\times n$ matrix with the $a_b$'s indexed successively along the diagonal. Using the results (23) and (22) above we obtain:
$$A = I - F_X^{-1}F_Y = I - \left( \frac{\lambda_i}{\sum_{d=1}^m P_{d|i}} \sum_{l=1}^m \frac{P_{l|i}\,P_{l|j}}{\mu_l} \right)_{i,j=1,\ldots,n}. \qquad (24)$$
The recursive CR bound algorithm of Corollary 1 can now be implemented using the matrices (23) and (24).
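The following NumPy sketch of this construction is ours, not from the paper; the system matrix, sizes, and intensities are synthetic placeholders chosen only to illustrate (22), (23), and the ROI recursion (10).

```python
import numpy as np

def pet_cr_bound(P, lam, p, n_iter=100):
    """ROI CR bound recursion for the PET model (sketch).
    P    : (m, n) system matrix, P[d, b] = P_{d|b}
    lam  : (n,) emission intensities lambda_b > 0
    p    : number of ROI pixels (taken to be the first p pixels)
    Returns the p x p lower bound E^T Phi^{(n_iter)} on cov(lambda_I_hat)."""
    m, n = P.shape
    mu = P @ lam                                   # mu_d = sum_b P_{d|b} lambda_b
    F_Y = (P / mu[:, None]).T @ P                  # eq. (22): sum_d P_d^T P_d / mu_d
    F_X_diag = P.sum(axis=0) / lam                 # eq. (23): diagonal complete-data Fisher matrix
    FXinv = 1.0 / F_X_diag
    FXinv_E = np.zeros((n, p))
    FXinv_E[:p, :] = np.diag(FXinv[:p])
    Phi = np.zeros((n, p))
    for _ in range(n_iter):                        # recursion (10) with A = I - F_X^{-1} F_Y
        Phi = Phi - FXinv[:, None] * (F_Y @ Phi) + FXinv_E
    return Phi[:p, :]

# Toy usage: a small random "system matrix" with columns summing to one
rng = np.random.default_rng(1)
n, m, p = 30, 60, 3
P = rng.random((m, n))
P /= P.sum(axis=0)                                 # sum_d P_{d|b} = 1: every emission is detected
lam = 1.0 + rng.random(n)
bound = pet_cr_bound(P, lam, p, n_iter=2000)
print(np.diag(bound))                              # lower bounds on var(lambda_b_hat) for b in the ROI
```

Since the iterates are monotone, the printed diagonal entries are valid lower bounds on the ROI variances even if the recursion is stopped before convergence; the convergence-rate analysis that follows indicates when many iterations are needed.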

The rate of convergence of the recursive CR bound algorithm is determined by the maximum eigenvalue $\rho(A)$ of $A$ specified by (24). For a fixed system matrix $((P_{d|b}))$ the magnitude of this eigenvalue will depend on the image intensity $\lambda$. Assume for simplicity that with probability one any emitted gamma-ray is detected at some detector, i.e. $\sum_{d=1}^m P_{d|b} = 1$ for all $b$. Since $\mathrm{trace}(A) = \sum_{i=1}^n \gamma_i$, where $\{\gamma_i\}_{i=1}^n$ are the eigenvalues of $A$, using (24) it is seen that the maximum eigenvalue $\rho(A)$ must satisfy:
$$\frac{1}{n}\,\mathrm{trace}(A) = 1 - \frac{1}{n}\sum_{i=1}^n\sum_{l=1}^m \frac{P_{l|i}^2\,\lambda_i}{\sum_{j=1}^n P_{l|j}\lambda_j} \;\le\; \rho(A) \;<\; 1. \qquad (25)$$
Now applying Jensen's inequality to $\sum_{i=1}^n\sum_{l=1}^m P_{l|i}^2\lambda_i\big/\sum_{j=1}^n P_{l|j}\lambda_j$ it can be shown that
$$\frac{1}{n}\,\mathrm{trace}(A) \;\le\; 1 - \frac{1}{n}, \qquad (26)$$
where equality occurs if $P_{l|j}$ is independent of $j$. On the other hand, it can easily be seen that equality is approached in (26) as the intensity $\lambda$ concentrates an increasing proportion of its mass on a single pixel $k_o$, e.g.:
$$\lambda_i = \begin{cases} (1-\epsilon)\,\displaystyle\sum_{b=1}^n\lambda_b, & i = k_o, \\[4pt] \dfrac{\epsilon}{\,n-1\,}\,\displaystyle\sum_{b=1}^n\lambda_b, & i \ne k_o, \end{cases}$$
as $\epsilon$ approaches zero. Thus for this case we have from (25): $1 - 1/n < \rho(A) < 1$. Since $n$ is typically very large, this implies that the asymptotic convergence rate of the recursive algorithm will suffer for image intensities which approach that of an ideal point source, at least for this particular splitting matrix.

5 Conclusion and Future Work

We have given a recursive algorithm which can be used to compute submatrices of the CR lower bound $F_Y^{-1}$ on unbiased multi-dimensional parameter estimation error covariance. The algorithm successively approximates the inverse Fisher information matrix $F_Y^{-1}$ via a monotonically convergent splitting matrix iteration. We have given a statistical methodology for selecting an appropriate splitting matrix $F_X$ which involves application of a data processing theorem to a complete-data incomplete-data formulation of the estimation problem. We are currently investigating a purely algebraic methodology in which a splitting matrix $F_X$ is selected from sets of diagonal, tridiagonal, or circulant matrices to optimize the norm difference $\|F_X - F_Y\|$ subject to $F_X \succeq F_Y$. We are also developing analogous recursive algorithms to successively approximate generalized matrix CR bounds, such as those developed in [7], for biased estimation.

References

[1] R. Blahut, Principles and Practice of Information Theory, Addison-Wesley, 1987.
[2] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," J. Royal Statistical Society, Ser. B, vol. 39, pp. 1-38, 1977.
[3] G. H. Golub and C. F. Van Loan, Matrix Computations (2nd Edition), The Johns Hopkins University Press, Baltimore, 1989.
[4] J. Gorman and A. O. Hero, "Lower bounds for parametric estimation with constraints," IEEE Trans. on Inform. Theory, Nov. 1990.
[5] A. O. Hero and L. Shao, "Information analysis of single photon emission computed tomography with count losses," IEEE Trans. on Medical Imaging, vol. 9, no. 2, pp. 117-127, June 1990.
[6] A. O. Hero and J. A. Fessler, "On the convergence of the EM algorithm," Technical Report in preparation, Comm. and Sig. Proc. Lab. (CSPL), Dept. EECS, University of Michigan, Ann Arbor, April 1992.
[7] A. O. Hero, "A Cramer-Rao type lower bound for essentially unbiased parameter estimation," MIT Lincoln Laboratory, Lexington, Mass., Technical Rep. 890, 3 January 1992. DTIC AD-A246666.
[8] R. A. Horn and C. R. Johnson, Matrix Analysis, Cambridge University Press, 1985.
[9] I. A. Ibragimov and R. Z. Has'minskii, Statistical Estimation: Asymptotic Theory, Springer-Verlag, New York, 1981.
[10] S. Kullback, Information Theory and Statistics, Dover, 1978.
[11] A. R. Kuruc, "Lower bounds on multiple-source direction finding in the presence of direction-dependent antenna-array-calibration errors," Technical Report 799, M.I.T. Lincoln Laboratory, Oct. 1989.
[12] T. A. Louis, "Finding the observed information matrix when using the EM algorithm," J. Royal Statistical Society, Ser. B, vol. 44, no. 2, pp. 226-233, 1982.
[13] M. I. Miller, D. L. Snyder, and T. R. Miller, "Maximum-likelihood reconstruction for single photon emission computed tomography," IEEE Trans. Nuclear Science, vol. 32, pp. 769-778, Feb. 1985.
[14] J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, 1970.
[15] L. A. Shepp and Y. Vardi, "Maximum likelihood reconstruction for emission tomography," IEEE Trans. on Medical Imaging, vol. MI-1, no. 2, pp. 113-122, Oct. 1982.
[16] M. Usman and A. O. Hero, "Algebraic versus statistical optimization of convergence rates for recursive CR bound algorithms," Technical Report 298, Communications and Signal Processing Laboratory, Dept. of EECS, The University of Michigan, Ann Arbor, MI 48109-2122, Jan. 1993.