ALGORITHMS FOR TRACKING ON THE MANIFOLD OF SYMMETRIC POSITIVE DEFINITE MATRICES


ALGORITHMS FOR TRACKING ON THE MANIFOLD OF SYMMETRIC POSITIVE DEFINITE MATRICES By GUANG CHENG A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORIDA 2012

© 2012 Guang Cheng

To my wife Yandi, my daughter Lerong, and my parents

ACKNOWLEDGMENTS

I would like to gratefully thank Dr. Baba Vemuri, my major advisor, for his insightful guidance, unflinching patience and encouragement throughout my PhD study. This dissertation would not have been written without his guidance and support. I would also like to thank my committee members, Dr. Jeffrey Ho, Dr. Anand Rangarajan, Dr. Arunava Banerjee and Dr. Brett Presnell, not only for agreeing to serve on my committee, but also for being supportive throughout the entire academic program and for the broad exposure I gained through their course offerings. I thank the generous research support provided by the NIH grants NS and EB to my advisor, Dr. Vemuri, which made it possible for me to have an uninterrupted RAship during the course of my PhD. I also received travel grants from the CISE department at the University of Florida. I thank Dr. Dena Howland, Dr. John Forder, Dr. Min-Sig Hwang and Dr. Sarah E. Mondello for providing the data and sharing their knowledge. I thank Dr. Bing Jian, Dr. Angelos Barmpoutis and Dr. Santhosh Kodipaka for the productive discussions and their help on the project. I thank all my lab-mates, Yuchen Xie, Ting Chen, Meizhu Liu, Wenxing Ye, Dohyung Seo, Sile Hu, Yuanxiang Wang, Yan Deng, Hesamodin Salehian, Qi Deng and Theodore Ha, for all the help they gave me. Special thanks are extended to Hesamodin Salehian for his great work in our collaboration, especially the experiments in the chapter on the recursive Karcher expectation estimator. I would like to thank my wife Yandi for her patience, support and love. I also thank my parents, Kexin and Qixin, for their faith in me and for allowing me to be what I want to be.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
ABSTRACT

CHAPTER
1 INTRODUCTION
  1.1 Motivation
  1.2 Main Contributions
    1.2.1 Recursive Karcher Expectation Estimator
    1.2.2 Intrinsic Recursive Filter
    1.2.3 Intrinsic Unscented Kalman Filter
  1.3 Outline
2 RIEMANNIAN GEOMETRY ON P_n
  2.1 GL-invariant metric vs. Euclidean metric on P_n
  2.2 Log-Euclidean vs GL-invariance
  2.3 Algorithms on the Field of SPD Matrices
3 RECURSIVE KARCHER EXPECTATION ESTIMATION
  3.1 Background and Previous Work
  3.2 Methods
    3.2.1 The Recursive Karcher Expectation Estimator
    3.2.2 Recursive form of the symmetrized KL-divergence mean
    3.2.3 Recursive mean for the Log-Euclidean Metric
  3.3 Experiments
    3.3.1 Performance of the Recursive Estimators
    3.3.2 Application to DTI Segmentation
4 INTRINSIC RECURSIVE FILTER ON P_n
  4.1 Background and Previous Work
  4.2 IRF: A New Dynamic Tracking Model on P_n
    4.2.1 Generalization of the Normal Distribution to P_n
    4.2.2 The mean and the variance of the generalized normal distribution
    4.2.3 The Probabilistic Dynamic Model on P_n
  4.3 IRF-based Tracking Algorithm on P_n
    4.3.1 The Bayesian Tracking Framework
    4.3.2 The Tracking Algorithm
  4.4 Experiments
    4.4.1 The Synthetic Data Experiment
    4.4.2 The Real Data Experiment
5 INTRINSIC UNSCENTED KALMAN FILTER
  5.1 Background and Previous Work
  5.2 Intrinsic Unscented Kalman Filter for Diffusion Tensors
    5.2.1 The State Transition and Observation Models
    5.2.2 The Intrinsic Unscented Kalman Filter
  5.3 Experiments
6 ATLAS CONSTRUCTION FOR HARDI DATASET REPRESENTED BY GAUSSIAN MIXTURE FIELDS
  6.1 Background and Previous Work
  6.2 Methods
    6.2.1 Image Atlas Construction Framework
    6.2.2 L2 Distance and Re-orientation for GMs
    6.2.3 Mean GMF Computation
  6.3 Experiments
    6.3.1 Synthetic Data Experiments
    6.3.2 Real Data Experiments
7 DISCUSSION AND CONCLUSIONS

REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

3-1 Time (in seconds) for mean computation in the DTI segmentation on synthetic dataset
3-2 Timing in seconds for segmentation of grey matter in a rat spinal cord
4-1 Tracking result for the real data experiment

LIST OF FIGURES

3-1 Accuracy and speed comparisons of the recursive versus non-recursive mean computation algorithms for data on P_3
3-2 Results for the DTI segmentation experiments on the synthetic dataset
3-3 Segmentation results of grey matter in a rat spinal cord for 6 different methods
3-4 Segmentation results of the molecular layer in a rat hippocampus for 3 different methods
4-1 Mean estimation error from 20 trials for the synthetic data experiment
4-2 Head tracking result for video sequences with moving camera
5-1 Fiber tracking results on real datasets from rat spinal cords. © [2012] IEEE
5-2 Biomarkers captured by computing density map for each fiber bundle. © [2012] IEEE
6-1 Image registration results on synthetic dataset. © [2011] IEEE
6-2 Registration results for real dataset from rat spinal cord. © [2011] IEEE

LIST OF ABBREVIATIONS

DTI     diffusion tensor imaging
DWMRI   diffusion weighted magnetic resonance imaging
GMF     Gaussian mixture field
HARDI   high angular resolution diffusion imaging
RKEE    recursive Karcher expectation estimator
IRF     intrinsic recursive filter
IUKF    intrinsic unscented Kalman filter
MRI     magnetic resonance imaging
SPD     symmetric positive definite

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

ALGORITHMS FOR TRACKING ON THE MANIFOLD OF SYMMETRIC POSITIVE DEFINITE MATRICES

Chair: Baba C. Vemuri
Major: Computer Engineering
By Guang Cheng
May 2012

Tracking on the manifold of n × n symmetric positive definite (SPD) matrices, denoted P_n, is an important problem with many applications in areas such as computer vision and medical imaging. The aim of this dissertation is to develop novel tracking algorithms on P_n for several different applications. One of the basic tracking problems on P_n is to recursively estimate the Karcher expectation, a generalization of the expectation to the Riemannian manifold, which can be viewed as tracking a static system. In this dissertation, we propose a novel recursive Karcher expectation estimator (RKEE), and we prove its unbiasedness and L2-convergence to the Karcher expectation under symmetric distributions on P_n. Synthetic experiments show that RKEE has accuracy similar to the Karcher mean but is more efficient for sequential data. We then develop a fast DTI (diffusion tensor imaging) segmentation algorithm based on RKEE. Experiments on real data from rat spinal cords and rat brains, with comparisons to the Karcher mean and algorithms based on other types of centers, demonstrate the accuracy and efficiency of RKEE. To further tackle dynamic system tracking on P_n, we study several properties of the generalized Gaussian distribution on P_n, based on which a novel probabilistic dynamic model is proposed in conjunction with an intrinsic recursive filter for tracking a time sequence of SPD matrix measurements in a Bayesian framework. This newly developed filtering method can then be used for the covariance

descriptor updating problem in covariance tracking, leading to new efficient video tracking algorithms. To show the accuracy and efficiency of our covariance tracker in comparison to the state of the art, we present synthetic experiments on P_n and real data experiments for tracking in video sequences. To handle non-P_n inputs and a non-linear observation model, a novel intrinsic unscented Kalman filter for tracking points on P_n is presented. Combined with a streamline tracking strategy, this yields an efficient fiber tracking method for tracking white matter fibers from diffusion weighted (DW) MR images of mammalian brains, specifically human and rat. Different from the first method, the input of the filter can be the diffusion weighted MR signal itself, which makes it possible to track fibers directly without the preprocessing step commonly required by existing methods. Real data experiments on datasets of human brains and rat spinal cords are presented and demonstrate the accuracy and efficiency of the method. For group-wise analysis of the white matter fiber bundles produced by our tracking algorithm, a novel group-wise registration and atlas construction algorithm for DW MR datasets represented by Gaussian mixture fields is proposed and applied to the spinal cord dataset. The group-wise analysis of the spinal cord fiber bundles in this dissertation shows a significant difference between injured and healthy rats.

CHAPTER 1
INTRODUCTION

1.1 Motivation

Tracking, in general, is the task of recursively estimating the current system state based on a sequential dataset. It is a very important task in both computer vision and medical imaging. In computer vision, tracking is crucial for video surveillance, augmented reality, human-computer interaction, etc. It is also a necessary preprocessing step for high-level computer vision tasks such as visual scene analysis. In medical imaging, tracking is very useful not only in analysing time sequences, such as the cardiac cycle, but also in neural fiber tractography from diffusion weighted (DW) MRI datasets. Tracking is traditionally a time series analysis problem, and it is closely related to prediction. Prediction has wide clinical applications, such as disease prediction. Both tracking and prediction take time sequences as input, and both are usually based on dynamic models. Many tracking methods, such as the well known Kalman filter, are based on the predict-update framework, where prediction is a crucial part of the tracking algorithm. The main difference between tracking and prediction is that in tracking we estimate the current state of a certain process based on the current and previous observations, while in prediction the estimation is for a future state where no direct observation is available. Most classical tracking techniques are set in Euclidean space. However, in certain applications the problems might not naturally live in a Euclidean space; instead, they usually lie on a Riemannian manifold that is not a vector space. Also, the input data dimension in modern problems is usually huge. Many linear and non-linear dimensionality reduction techniques are hence used to find meaningful lower dimensional representations of the data, and these representations might not be in a Euclidean space. Therefore, tracking and prediction algorithms on Riemannian manifolds can be applied to many practical problems.

This dissertation focuses on the tracking problem in the space of n × n symmetric positive definite (SPD) matrices, denoted P_n. Many feature descriptors, such as covariance matrices, Cauchy deformation tensors, diffusion tensors, metric tensors, etc., can be represented in P_n. Thus algorithms on P_n can be widely applied to practical problems in different areas such as computer vision and medical imaging. P_n is known to be a Riemannian space with non-positive curvature. Much research has been reported on different problems on P_n, such as the computation of intrinsic/extrinsic means, linear/non-linear dimensionality reduction, statistics, etc., with applications in many different areas. This dissertation is primarily motivated by practical tracking problems such as video tracking, and by other problems where tracking algorithms can be applied, including segmentation of DTI (diffusion tensor imaging) datasets and fiber tractography.

1.2 Main Contributions

1.2.1 Recursive Karcher Expectation Estimator

Finding the mean of a population of SPD matrices/tensors is an often encountered problem in medical image analysis and computer vision, specifically in diffusion MRI processing, tensor-based morphometry, texture analysis using the structure tensor, etc. The mean tensor can be used to represent a population of structure tensors in texture analysis, diffusion tensors in diffusion tensor image (DTI) segmentation, for interpolation of diffusion tensors, or in clustering applications. A mean is usually used as a good estimator of the expectation. If the data samples are given sequentially, the mean finding problem can also be viewed as a tracking problem for a static process. It is well known that computation of the mean can be posed as a minimization problem in which one minimizes the sum of squared distances between the unknown mean and the members of the set whose mean is being sought. Mathematically speaking, we want to find \(\mu = \arg\min_{\mu}\sum_i d^2(x_i, \mu)\), where d is the chosen distance, the x_i are the data samples whose mean is being sought and µ is the mean. Depending on the definition of the distance d, one gets different kinds of means. For example, if we choose

the Euclidean distance for d, we get the arithmetic mean, whereas if we choose the L1-norm instead of the L2-norm in the above formula, we get the median. If we choose the geodesic distance in the domain of the x_i, we get the Karcher mean [38]. Currently, there is no closed form solution for the Karcher mean on P_n for more than two sample points [54]. A gradient based optimization algorithm [54] is used in practice, which is known to be inefficient. In this dissertation, we propose a novel recursive Karcher expectation estimator (RKEE), an algorithm that recursively estimates the Karcher expectation. A proof of the unbiasedness and L2-convergence of RKEE under any symmetric distribution on P_n is also presented. Synthetic data experiments show that RKEE and the Karcher mean have similar accuracy as estimators of the Karcher expectation, but RKEE is more efficient, especially for sequential datasets. Further, we apply RKEE to the DTI segmentation problem and compare it with the Karcher mean and other centers on real datasets of rat spinal cords and rat brains.

1.2.2 Intrinsic Recursive Filter

In recent years, the covariance region descriptor, the covariance matrix of the feature vectors at the pixels inside a region, has been shown to be robust and efficient in video tracking and detection [61]. Several works [43, 75, 76, 83, 84] have been reported that address the problem of updating the covariance descriptor in video tracking. Here, a novel probabilistic dynamic model on P_n based on geometry and probability theory is presented. The noisy state and observations are described by matrix-variate random variables whose distribution is a generalized normal distribution on P_n based on the GL-invariant measure. A novel intrinsic recursive filter (IRF) on P_n is then developed based on this dynamic model and applied to covariance tracking, which yields a real time video tracking algorithm. Synthetic and real data experiments are presented to support the effectiveness and efficiency of the proposed algorithm.

1.2.3 Intrinsic Unscented Kalman Filter

Diffusion-Weighted MR Imaging (DW-MRI) is a unique non-invasive technique that can locally infer the imaged tissue structure in vivo from an MR signal that is sensitive to the diffusion of water molecules. The DW-MRI dataset is a 3D image that contains tissue (such as brain white matter) directional information at each of its voxels. This directional information at each voxel, in the single fiber case, can be modelled by a 2nd order positive definite tensor (SPD matrix), which is the classical diffusion tensor image (DTI). Higher order models, such as the multi-tensor, high order tensor, etc., have been reported in order to handle more complex cases, e.g. fiber crossings. However, the 3 × 3 SPD matrix, which can be viewed as a point in P_3, is still a very useful descriptor for representing local fiber information. To further visualize and analyse the tissue structure, fiber tracking techniques are needed to reconstruct the imaged tissue, which is very important in both research and clinical applications in neuroscience. Fiber tractography is formulated here as a tracking problem in P_n. We present a novel intrinsic unscented Kalman filter (IUKF) on P_n, which to the best of our knowledge is the first extension of the unscented Kalman filter to P_n. We apply this filter to both estimate and track the tensors in a multi-tensor model using the intrinsic formulation, and we perform real data experiments that demonstrate the accuracy and efficiency of our method. Also, a group-wise registration and atlas construction method developed to register DW-MR datasets represented by Gaussian Mixture Fields is proposed for group fiber analysis.

1.3 Outline

The remaining chapters are organized as follows. The basic properties of P_n and the relevant Riemannian geometry can be found in Chapter 2. The RKEE and its application to DTI segmentation are introduced in Chapter 3, followed by the IRF on the space of SPD matrices and its application to covariance tracking in Chapter 4. Fiber tracking with the intrinsic unscented Kalman filter is presented in Chapter 5. This

is followed by the atlas construction method developed for group fiber analysis in Chapter 6. Finally, the conclusions can be found in Chapter 7. A large part of this thesis has been published in several papers [18, 20].

CHAPTER 2
RIEMANNIAN GEOMETRY ON P_n

In this chapter we introduce the basic concepts of Riemannian geometry on P_n, and refer the reader to [34, 53, 70] for details. P_n is the space of n × n symmetric positive definite (SPD) matrices, which is a Riemannian manifold. It can be identified with the quotient space O(n) \ GL(n) [70], where GL(n) denotes the general linear group (the group of n × n non-singular matrices) and O(n) is the orthogonal group (the group of n × n orthogonal matrices). This makes P_n a homogeneous space with GL(n) as the group that acts on it, the group action being defined for any X ∈ P_n by X[g] = gXg^t. One can now define GL-invariant quantities such as the GL-invariant inner product based on the group action defined above. We begin with the inner product on the tangent space of P_n. For tangent vectors U, V ∈ T_X P_n (the tangent space at the point X, which is the space of symmetric matrices of dimension (n + 1)n/2 and hence a vector space), a GL-invariant inner product satisfies, for all g ∈ GL(n), \(\langle U, V\rangle_X = \langle gUg^t, gVg^t\rangle_{gXg^t}\). On P_n this GL-invariant inner product takes the form
\[
\langle U, V\rangle_X = \mathrm{tr}\left(X^{-1/2}\, U\, X^{-1}\, V\, X^{-1/2}\right). \tag{2-1}
\]
With the metric/inner product defined on the manifold, the length of a curve \(\gamma : [0, 1] \to P_n\) is defined as \(\mathrm{length}(\gamma) = \int_0^1 \sqrt{\langle\dot\gamma, \dot\gamma\rangle_{\gamma(t)}}\, dt\). The distance between X, Y ∈ P_n is defined as the length of the shortest curve between X and Y (the geodesic distance). With the GL-invariant metric, the distance between X, Y ∈ P_n is
\[
\mathrm{dist}(X, Y)^2 = \mathrm{tr}\left(\log^2(X^{-1}Y)\right) \tag{2-2}
\]
where log is the matrix log operator. Since this distance is induced from the GL-invariant metric in Equation 2-1, it is naturally GL-invariant, i.e.
\[
\mathrm{dist}^2(X, Y) = \mathrm{dist}^2(gXg^t, gYg^t). \tag{2-3}
\]
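As a concrete illustration of Equation 2-2, the following is a minimal sketch (ours, not from the dissertation) of the GL-invariant distance, together with a numerical check of the invariance property in Equation 2-3; the function names are our own.

```python
import numpy as np
from scipy.linalg import logm, sqrtm

def gl_invariant_dist(X, Y):
    """dist(X, Y) = sqrt(tr(log^2(X^{-1/2} Y X^{-1/2}))), Equation 2-2."""
    Xis = np.linalg.inv(sqrtm(X))          # X^{-1/2}
    L = logm(Xis @ Y @ Xis)                # symmetric, so tr(L^2) = ||L||_F^2
    return np.sqrt(np.trace(L @ L).real)

# GL-invariance check (Equation 2-3): conjugating both arguments by any
# non-singular g should leave the distance unchanged, up to round-off.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3)); X = A @ A.T + 3 * np.eye(3)
B = rng.standard_normal((3, 3)); Y = B @ B.T + 3 * np.eye(3)
g = rng.standard_normal((3, 3))            # almost surely in GL(3)
assert abs(gl_invariant_dist(X, Y)
           - gl_invariant_dist(g @ X @ g.T, g @ Y @ g.T)) < 1e-6
```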

With the GL-invariant metric defined on P_n, the intrinsic or Karcher mean of a set of elements X_i ∈ P_n can be computed by performing the minimization
\[
\mu = \arg\min_{\mu}\sum_i \mathrm{dist}^2(X_i, \mu) \tag{2-4}
\]
using a gradient based technique, where the update equation at each iteration is
\[
\mu_{new} = \mathrm{Exp}_{\mu_{old}}\!\left(\frac{\alpha}{N}\sum_i \mathrm{Log}_{\mu_{old}}(X_i)\right) \tag{2-5}
\]
where α is the step size, and Exp_{µold}(·) and Log_{µold}(·) are the Exponential and Log maps at the point µ_old ∈ P_n. The Log and Exponential maps [34] are very useful tools on a Riemannian manifold. The Exponential map, denoted Exp_X(·) with X ∈ P_n, maps a vector rooted at the origin of the tangent space T_X P_n to a geodesic emanating from X. The Log map, Log_X(·), is the inverse of the Exponential map. The Exponential and Log maps on P_n are given by
\[
\mathrm{Exp}_X(V) = X^{1/2}\exp\!\left(X^{-1/2}VX^{-1/2}\right)X^{1/2}, \qquad \mathrm{Log}_X(Y) = X^{1/2}\log\!\left(X^{-1/2}YX^{-1/2}\right)X^{1/2} \tag{2-6}
\]
where X, Y ∈ P_n, V ∈ T_X P_n, and exp and log denote the matrix exp and log operators. The Karcher mean can be viewed as an extension of the arithmetic mean from Euclidean space to the Riemannian manifold. Similarly, the expectation and the variance can also be extended. Given a random variable M ∈ P_n with a probability density P(M),
\[
E(M) = \arg\min_{\mu}\int_{P_n}\mathrm{dist}(\mu, X)^2\, P(X)[dX] \tag{2-7}
\]
\[
\mathrm{Var}(M) = \int_{P_n}\mathrm{dist}(E(M), X)^2\, P(X)[dX]. \tag{2-8}
\]
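The maps in Equation 2-6 and the iteration in Equation 2-5 translate directly into code. Below is a hedged sketch of one possible implementation, assuming SPD inputs; the step size α = 1 and the fixed iteration count are our illustrative choices, not the dissertation's.

```python
import numpy as np
from scipy.linalg import expm, logm, sqrtm

def exp_map(X, V):
    """Exp_X(V) = X^{1/2} exp(X^{-1/2} V X^{-1/2}) X^{1/2} (Equation 2-6)."""
    Xs = sqrtm(X); Xis = np.linalg.inv(Xs)
    return Xs @ expm(Xis @ V @ Xis) @ Xs

def log_map(X, Y):
    """Log_X(Y) = X^{1/2} log(X^{-1/2} Y X^{-1/2}) X^{1/2} (Equation 2-6)."""
    Xs = sqrtm(X); Xis = np.linalg.inv(Xs)
    return Xs @ logm(Xis @ Y @ Xis) @ Xs

def karcher_mean(samples, alpha=1.0, iters=30):
    """Gradient iteration of Equation 2-5, started from the first sample."""
    mu = samples[0]
    for _ in range(iters):
        grad = sum(log_map(mu, X) for X in samples) / len(samples)
        mu = exp_map(mu, alpha * grad)     # move along the mean log vector
    return mu
```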

2.1 GL-invariant metric vs. Euclidean metric on P_n

There are two primary theoretical reasons for choosing a GL-invariant metric over the conventional Euclidean metric when performing operations on P_n; since our dynamic model is on P_n, this is highly relevant. Firstly, P_n is an open subset of the corresponding Euclidean space R^{(n+1)n/2}, which implies that P_n would be incomplete with a Euclidean metric, since it is possible to find a Cauchy sequence in P_n that does not converge in P_n. This implies that for some optimization problems set in P_n, the optimum cannot be achieved inside P_n. This in turn means that covariance updates could lead to matrices that are not covariance matrices, an unacceptable situation in practice. This problem does not arise when using the GL-invariant metric, since P_n is geodesically complete with a GL-invariant metric [70]. Secondly, in general, the feature vectors might contain variables from different sources, e.g. object position, object color, etc. In this case, a normalization of the (in general) unknown scales of the different variables would be necessary when using the Euclidean distance, which is non-trivial and may lead to the use of ad hoc methods. With a GL-invariant metric, however, this scaling issue does not arise: the presence of different scales for the elements of the feature vector from which the covariance matrix is constructed is equivalent to multiplication of the covariance matrix by a positive definite diagonal matrix. This operation is a GL group operation, and since GL-invariance implies invariance to GL group operations, the scaling issue is a non-issue when using a GL-invariant metric.

2.2 Log-Euclidean vs GL-invariance

The Log-Euclidean framework defined in [3] induces a metric from the Euclidean space onto the Riemannian manifold (called the Log-Euclidean metric) through the Log map at an arbitrarily chosen point on the manifold. From this definition we can say that, in general for P_n (n > 1), the Log-Euclidean metric is not intrinsic. Moreover,

it is not GL-invariant and depends on the aforementioned arbitrarily chosen point. Hence, it is not a natural metric. A typical Log-Euclidean operation is a three step procedure. In the first step, all the data points on the manifold are projected to the tangent space at an arbitrarily chosen point, usually the identity, through the Log map. Then, standard vector space operations can be applied in the tangent space. In the last step, the results of the vector space operations, which lie in the tangent space, are projected back to the manifold via the Exponential map. If one were to use Log-Euclidean operations to compute the intrinsic/Karcher mean of a given population of data on P_n, the result would not be the true Karcher mean. Log-Euclidean operations have been used for covariance tracking in [75], wherein the base point is arbitrarily chosen in the first frame and iteratively updated for subsequent frames using a predefined and constant state-transition matrix. Hence, the base point will never converge to any meaningful statistic of the dataset. Because the Log-Euclidean framework approximates the GL-invariant metric by the Log-Euclidean metric, the approximation error will affect the tracking result, as shown in the experiments.

2.3 Algorithms on the Field of SPD Matrices

A field of SPD matrices is a map from a 2D or 3D Euclidean space to P_n. In the discrete case, it can be viewed as an image (volume) where the image value at each pixel (voxel) is an SPD matrix. Such a field of SPD matrices is also referred to as a tensor field. The GL-invariance property is usually required for algorithms on tensor fields. This is because in many applications, e.g. metric tensor fields, deformation tensor fields, etc., the tensor value at each pixel (voxel) is directly related to the local coordinate system of the image lattice, in such a way that whenever the image is deformed, the tensor values should change linearly according to the Jacobian of the transformation. Assume the transformation on the image lattice is T, and the tensor value is I(x) = D at the point x

before the transformation. After the transformation, the tensor value becomes
\[
I(T(x)) = J_T(x)\, D\, J_T(x)^t \tag{2-9}
\]
where J_T(x) is the Jacobian of T at the point x. (Note that in DTI (diffusion tensor image) registration, Equation 2-9 is called re-transformation [78], which is one of the re-orientation strategies in DTI registration [19].) Operations such as interpolation and dissimilarity measurement on these tensors are therefore required to have the GL-invariance property, so that they are preserved before and after the deformation. One example is DTI segmentation: the segmentation result cannot be guaranteed to be the same before and after an affine transformation if the distance used is not GL-invariant [78].
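To make Equation 2-9 concrete, here is a small illustrative sketch (ours, not the dissertation's) of the re-transformation for the simple case of an affine map, whose Jacobian is constant; the matrices below are hypothetical examples.

```python
import numpy as np

def retransform(D, J):
    """Equation 2-9: an SPD tensor D maps to J D J^t under Jacobian J."""
    return J @ D @ J.T

# For an affine transformation T(x) = Ax + b the Jacobian is J_T = A
# everywhere, so every tensor in the field is conjugated by the same A.
A = np.diag([2.0, 1.0, 0.5])     # hypothetical anisotropic scaling
D = np.diag([3.0, 2.0, 1.0])     # a diffusion tensor at one voxel
print(retransform(D, A))         # still symmetric positive definite
```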

CHAPTER 3
RECURSIVE KARCHER EXPECTATION ESTIMATION

3.1 Background and Previous Work

Tomes of research have been published on finding the mean tensor using different kinds of distances/divergences, with applications to DTI as well as structure tensor field segmentation, interpolation and clustering. In [79], the authors generalized the geometric active contour model-based piecewise constant segmentation [17, 72] to the segmentation of DTIs, using the Euclidean distance to measure the distance between two SPD tensors. The authors in [27] present a geometric active contour [16, 49] based approach for tensor field segmentation that uses information from the diffusion tensors to construct a so-called structure tensor, which is a sum of the structure tensors formed from each component of the diffusion tensor. A Riemannian metric on the manifold of SPD matrices was used in [32, 40, 77] and in [11, 54, 59] for DTI segmentation and for computing the mean interpolant of diffusion tensors, respectively. In [78, 80, 91] and [54] the symmetrized KL-divergence was used for DTI segmentation and interpolation, respectively. The Log-Euclidean distance was introduced to simplify the computations on the manifold of SPD matrices; this was achieved by using the principal Log-map from the manifold to its tangent space at the identity and then using the Euclidean metric on the Log-mapped matrices in that tangent space [4]. More recently, in [77], a statistically robust measure called the total Bregman divergence (tBD) family was introduced and used for interpolation as well as DTI segmentation. None of the above methods for computing the mean of SPD matrices, which are used within the segmentation algorithms or in their own right for interpolation purposes, are in recursive form. A recursive formulation would be more desirable, as it would yield a computationally efficient algorithm for computing the means of regions in the segmentation application. Also, in many applications such as DTI segmentation, clustering and atlas construction, data are incrementally supplied to the algorithm for

classification or assimilation, requiring an update of the mean, and an algorithm that recursively updates the mean rather than recomputing it in batch mode would be much more efficient and desirable. In this dissertation, we pursue this very task of recursive mean computation. The key contributions are: (i) we present novel theoretical results proving the L2-convergence of the recursive intrinsic Karcher expectation estimator for a set of SPD matrices to the true Karcher expectation; (ii) additionally, we present recursive formulations for computing the mean using the commonly used distance/divergence measures mentioned above, together with experiments that depict significant gains in compute time over their non-recursive counterparts; (iii) we present synthetic and real data experiments depicting gains in compute time for DTI segmentation using these recursive algorithms. The rest of this chapter is organized as follows: in Section 3.2 we present the novel theoretical results leading to the recursive Karcher expectation estimation algorithm. In addition, we present recursive formulations for the commonly used symmetrized KL-divergence based mean as well as the Log-Euclidean distance based mean. Section 3.3 contains synthetic and real data experiments depicting the improvements in computation time for the DTI segmentation task.

3.2 Methods

3.2.1 The Recursive Karcher Expectation Estimator

We now develop an estimator of the intrinsic (Karcher) expectation that can be used to represent a set of data points in P_n (the space of diffusion tensors) and can be computed recursively. This recursive computation property is very important, especially for online problems where the data points are provided sequentially. This is very pertinent to applications such as DTI and structure tensor field segmentation, diffusion/structure tensor clustering, etc.

Let X_k ∈ P_n, k = 1, 2, ... be i.i.d. samples in P_n from a probability measure P(M). The recursive Karcher expectation estimator is defined as
\[
M_1 = X_1 \tag{3-1}
\]
\[
M_{k+1}(w_{k+1}) = M_k^{1/2}\left(M_k^{-1/2}\, X_{k+1}\, M_k^{-1/2}\right)^{w_{k+1}} M_k^{1/2} \tag{3-2}
\]
Here we set \(w_{k+1} = \frac{1}{k+1}\). We now prove the following properties of the recursive Karcher expectation estimator, presented in the form of theorems with their proofs.

Theorem 1. Let i.i.d. samples X_k be generated from a density P(X; µ) that is symmetric w.r.t. its expectation µ; then M_k is an unbiased estimator. By symmetry we mean that \(\forall X \in P_n,\ P(X; \mu) = P(\mu X^{-1}\mu; \mu)\); note that X, µ and µX^{-1}µ are on the same geodesic and dist(µ, X) = dist(µ, µX^{-1}µ).

Proof. Without loss of generality we assume that µ = I, where I is the identity matrix. We prove the theorem by induction. For k = 1, E(M_1) = E(X_1) = I, where E denotes the Karcher expectation, and P_{m_1}(M_1; I) is obviously symmetric. Assume E(M_k) = I and that the density P_{m_k}(M_k) is symmetric. Then
\[
P_{m_{k+1}}(M_{k+1}) = \int_{P_n} P_x(X_{k+1})\, P_{m_k}\!\left(X_{k+1}^{1/2}\left(X_{k+1}^{-1/2} M_{k+1} X_{k+1}^{-1/2}\right)^{\frac{1}{1-w_{k+1}}} X_{k+1}^{1/2}\right)[dX_{k+1}] = P_{m_{k+1}}\!\left(M_{k+1}^{-1}\right)
\]
since P_x and P_{m_k} are symmetric and
\[
\left(X_{k+1}^{1/2}\left(X_{k+1}^{-1/2} M_{k+1} X_{k+1}^{-1/2}\right)^{\frac{1}{1-w_{k+1}}} X_{k+1}^{1/2}\right)^{-1} = X_{k+1}^{-1/2}\left(X_{k+1}^{1/2} M_{k+1}^{-1} X_{k+1}^{1/2}\right)^{\frac{1}{1-w_{k+1}}} X_{k+1}^{-1/2}.
\]
Thus P_{m_{k+1}} is symmetric with respect to I, and E(M_{k+1}) = I = µ, since
\[
\int_{P_n}\log(M)\, P_{m_{k+1}}(M)[dM] = \int_{P_n}\log\!\left(N^{-1}\right) P_{m_{k+1}}\!\left(N^{-1}\right)[dN^{-1}] = 0.
\]
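Operationally, Equation 3-2 simply walks a fraction w_{k+1} along the geodesic from the current estimate M_k toward the new sample X_{k+1}. The following is a minimal numpy/scipy sketch of the recursion (our illustration, not the dissertation's code), assuming SPD inputs:

```python
import numpy as np
from scipy.linalg import sqrtm, fractional_matrix_power

def rkee(samples):
    """Recursive Karcher expectation estimator, Equations 3-1 and 3-2."""
    M = samples[0]                                  # M_1 = X_1
    for k, X in enumerate(samples[1:], start=1):
        w = 1.0 / (k + 1)                           # w_{k+1} = 1/(k+1)
        Ms = sqrtm(M); Mis = np.linalg.inv(Ms)
        # walk a fraction w along the geodesic from M toward X:
        M = Ms @ fractional_matrix_power(Mis @ X @ Mis, w) @ Ms
    return M
```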

Theorem 2. \(\forall A, B \in P_n\) and \(w \in [0, 1]\),
\[
\left\|\log\!\left(A^{1/2}\left(A^{-1/2} B A^{-1/2}\right)^{w} A^{1/2}\right)\right\|^2 \le \mathrm{tr}\left(\left((1-w)\log(A) + w\log(B)\right)^2\right) \tag{3-3}
\]
Note that the left side of the inequality is the squared distance between the identity matrix and a geodesic interpolation between A and B, and the right side is the squared distance between the identity and the log-linear interpolation. The inequality holds because P_n is a space with non-positive sectional curvature.

Proof. Let \(\gamma(w) = A^{1/2}(A^{-1/2} B A^{-1/2})^{w} A^{1/2}\); then γ(w) is a geodesic between A and B. Since P_n is a Hadamard space, based on Lurie's notes on Hadamard spaces [46] we know that
\[
\mathrm{lhs} = \mathrm{dist}(I, \gamma(w))^2 \le (1-w)\,\mathrm{dist}(I, A)^2 + w\,\mathrm{dist}(I, B)^2 - w(1-w)\,\mathrm{dist}(A, B)^2 \tag{3-4}
\]
Also,
\[
\mathrm{rhs} - \left((1-w)\,\mathrm{dist}(I, A)^2 + w\,\mathrm{dist}(I, B)^2 - w(1-w)\,\mathrm{dist}(A, B)^2\right) = w(1-w)\left(\mathrm{dist}(A, B)^2 - \mathrm{tr}\left((\log A - \log B)^2\right)\right) \ge 0 \tag{3-5}
\]
where the last inequality is based on the cosine inequality in [6]. Thus we have proved that lhs ≤ rhs.

Theorem 3. Let \(w_k = \frac{1}{k}\); then \(\mathrm{Var}(M_k) \le \frac{1}{k}u^2\), where \(u^2 = \mathrm{Var}(X_i)\), i = 1, 2, ...

Proof. We prove this theorem also by induction, still assuming that E(X_k) = I. When k = 1, Var(M_1) = Var(X_1) = u².

Assume the claim is true for k = i, that is, \(\mathrm{Var}(M_i) \le \frac{1}{i}u^2\). For k = i + 1, using Theorem 2 above, we know that
\[
\mathrm{Var}(M_{i+1}(w)) \le \int_{P_n}\int_{P_n}\left\|(1-w)\log(M_i) + w\log(X_{i+1})\right\|^2 P(M_i)P(X_{i+1})[dM_i][dX_{i+1}] = (1-w)^2\,\mathrm{Var}(M_i) + w^2\,\mathrm{Var}(X_{i+1}) \tag{3-6}
\]
where the cross term vanishes because M_i and X_{i+1} are independent and both have zero log-expectation at I. Hence
\[
\mathrm{Var}(M_{i+1}(w)) \le (1-w)^2\frac{1}{i}u^2 + w^2 u^2 = \frac{1}{i+1}u^2 \quad\text{when } w = \frac{1}{i+1}.
\]
From the theorems above, we find that the recursive Karcher expectation estimator is an unbiased estimator of the Karcher expectation when the samples are drawn from a symmetric distribution on P_n, and that it converges in the L2 sense to the expectation. Of course, this recursive Karcher expectation estimator can be viewed as an approximation of the Karcher sample mean. However, in our experiments we find that it actually has accuracy similar to the Karcher sample mean. Also, because it is a recursive estimator, it is far more computationally efficient to use our estimator than the standard non-recursive Karcher mean algorithm when the diffusion tensors are input sequentially to the estimation algorithm, as in all the aforementioned applications.

3.2.2 Recursive form of the symmetrized KL-divergence mean

We now present a recursive formulation for computing the symmetrized KL-divergence based mean. Let us recall that the symmetrized KL divergence, also called the J-divergence, is defined by \(J(p, q) = \frac{1}{2}\left(KL(p\|q) + KL(q\|p)\right)\). Using the square root of J, one can define a divergence between two given positive definite tensors. The symmetrized KL (KL_s) divergence based mean of a set of SPD tensors is the minimizer of the sum of squared KL_s divergences. This minimization problem has a closed form solution, as shown in Wang et al. [78] and repeated here for convenience:
\[
M_{KL} = \sqrt{B}^{-1}\left[\sqrt{\sqrt{B}\,A\,\sqrt{B}}\right]\sqrt{B}^{-1} \tag{3-7}
\]

where \(A = \frac{1}{N}\sum_i T_i\) is the arithmetic mean, \(B = \frac{1}{N}\sum_i T_i^{-1}\) is the harmonic mean, T = {T_i} is the given tensor field and N is the total number of tensors. The closed form Equation 3-7 can be computed in a recursive manner as follows. Let the arithmetic and harmonic means at iteration n be denoted by A_n and B_n, respectively. When a new (n + 1)-st tensor T_{n+1} augments the data set, the quantities A_n and B_n are recursively updated via the following simple equations:
\[
A_{n+1} = \frac{n}{n+1}A_n + \frac{1}{n+1}T_{n+1} \tag{3-8}
\]
\[
B_{n+1} = \frac{n}{n+1}B_n + \frac{1}{n+1}T_{n+1}^{-1}. \tag{3-9}
\]
Using the above recursive form of the arithmetic and harmonic means of a set of tensors and the closed form expression (Equation 3-7), we can recursively compute the KL_s mean of a set of SPD tensors.

3.2.3 Recursive mean for the Log-Euclidean Metric

We now formulate the recursive form of the Log-Euclidean (LE) based mean. It is well known that P_n can be diffeomorphically mapped to a Euclidean space using the matrix Log function, which makes it possible to directly induce the Euclidean metric on P_n, called the Log-Euclidean metric [4]. The Log-Euclidean distance is defined as
\[
D_{LE}(T_1, T_2) = \left\|\log(T_1) - \log(T_2)\right\| \tag{3-10}
\]
where ‖·‖ is the Euclidean norm. The LE mean of a set of SPD matrices is obtained by minimizing the sum of the squared LE distances, which leads to the closed form solution
\[
M_{LE} = \exp\left(\frac{1}{n}\sum_{i=1}^{n}\log(T_i)\right) \tag{3-11}
\]
This closed form expression can be rewritten in a recursive form for more efficient computation. Let M_n be the Log-Euclidean mean at the n-th iteration. When the (n + 1)-st tensor T_{n+1} is added, the current mean can be recursively updated using the

following equation:
\[
M_{n+1} = \exp\left(\frac{n}{n+1}\log(M_n) + \frac{1}{n+1}\log(T_{n+1})\right). \tag{3-12}
\]

3.3 Experiments

3.3.1 Performance of the Recursive Estimators

To assess the performance of the recursive estimators, we first generate i.i.d. samples from the Log-normal distribution [64] on P_3 with the expectation at the identity matrix. Then, we input 100 random samples sequentially to all the estimators: the recursive Karcher expectation estimator (RKEE), the Karcher mean (KM), the recursive KL_s mean (RKLS), the non-recursive KL_s mean (KLS), the recursive Log-Euclidean mean (RLEM) and the non-recursive Log-Euclidean mean (LEM). To compare the accuracy of RKEE and KM, we evaluate the error of each estimator using the squared distance in Equation 2-2 between the ground truth (the identity matrix) and the computed estimate. An accuracy test of the remaining algorithms is not included because, for the KL_s and Log-Euclidean metrics, the recursive and non-recursive algorithms generate exactly the same results. The computation time for each step (each sample) is also recorded. For a fair comparison, we use the same settings for all the mean computation algorithms. We run the experiment 20 times and plot the average error and the average computation time at each step in Figure 3-1. In Figure 3-1(a), we see that the accuracy of the computed mean is nearly the same for both the non-recursive Karcher mean and the recursive Karcher expectation estimator after they are given 10 samples. The computation time (in CPU seconds on an Intel Core i7, 2.8 GHz processor) for the Karcher mean, however, increases linearly with the number of steps, while that for the recursive Karcher expectation estimator is nearly constant and far less than in the non-recursive case. This means that the recursive Karcher expectation estimator is computationally far superior, especially for large problems where data are input incrementally, for example in algorithms for segmentation, clustering, classification and atlas construction.
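For reference, the recursive updates evaluated here (RKLS via Equations 3-8, 3-9 and 3-7, and RLEM via Equation 3-12) can be sketched as follows; this is our illustrative implementation, not the code used for the experiments.

```python
import numpy as np
from scipy.linalg import expm, logm, sqrtm

def kls_mean_update(A, B, T, n):
    """Recursive KL_s mean: fold tensor T_{n+1} into running A_n, B_n."""
    A = (n * A + T) / (n + 1)                    # Equation 3-8
    B = (n * B + np.linalg.inv(T)) / (n + 1)     # Equation 3-9
    Bs = sqrtm(B); Bis = np.linalg.inv(Bs)
    M = Bis @ sqrtm(Bs @ A @ Bs) @ Bis           # closed form, Equation 3-7
    return A, B, M

def le_mean_update(M, T, n):
    """Recursive Log-Euclidean mean (Equation 3-12)."""
    return expm((n * logm(M) + logm(T)) / (n + 1))
```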

Figure 3-1. Accuracy and speed comparisons of the recursive versus non-recursive mean computation algorithms for data on P_3. Figure (a) shows the mean error of the Karcher mean (red dashed line) and the recursive Karcher expectation estimator (blue solid line) at each step. Figures (b), (c) and (d) compare the computation time (in seconds) between the recursive (red dashed line) and non-recursive (blue solid line) mean computation algorithms for the different metrics: the Riemannian metric (Karcher mean) in Figure (b), KL_s in Figure (c), and Log-Euclidean in Figure (d).

Similar conclusions can be drawn from Figures 3-1(c) and (d): for sequentially input data, the recursive mean algorithms for the KL_s divergence and the Log-Euclidean mean are much more efficient than their batch versions.

3.3.2 Application to DTI Segmentation

In this section, we present results of applying our recursive algorithms to the DTI segmentation problem. In [78], the classical level set based segmentation algorithm [17] was extended to fields of diffusion tensors. In this algorithm, based on a piecewise constant model, the segmentation procedure becomes an EM-like algorithm, where at each iteration the mean tensor is computed over each region and the region boundary is then evolved based on the mean tensors. In this section, we use this algorithm to segment DTIs, plugging in the different tensor field mean computation techniques for comparison. First, experiments on DTI segmentation of a synthetic dataset are presented. We manually generated an image region of size 64 × 64 which contains two different kinds of tensors (differing in orientation, one vertical and the other horizontal). The DW MR signal was then generated based on [66], with 5 different levels of Rician noise added to

the DW MR signal: σ = 0.1, 0.15, 0.2, 0.25, 0.3, where σ² is the variance of the Gaussian noise added to the real and imaginary parts of the DW MR signal. DTIs were constructed using the technique in [9]. Exactly the same dataset and settings were used for all six methods. The initialization curve overlaid on a noisy dataset is depicted in Figure 3-2(a). To evaluate the segmentation results, the Dice coefficient between the ground truth segmentation and the estimated segmentation is computed. These results are shown in Figure 3-2, with figure (b) depicting the Dice coefficients and figure (c) showing the comparison of running times. From figure (b) we can see that the segmentation accuracies are very similar for the recursive and non-recursive methods with the same distance metric. Among the different distance metrics, the result for the Riemannian (GL-invariant) metric is the most accurate, since the GL-invariant metric is the natural metric on P_n. In Figure 3-2 we also find that segmentation using the KM takes significantly longer than the other methods; this is because there is no closed form formula for the Karcher mean on P_n, and hence the Karcher mean computation is very time consuming, as can also be seen in Table 3-1. The recursive Karcher expectation estimator is about 2 times faster and has similar accuracy. For the KL_s and Log-Euclidean metrics, the time saved by the recursive methods is not as significant as for the GL-invariant metric. This is because, although the mean computation time for the recursive method is at most one tenth of that of the non-recursive method (0.01 versus 0.1 in Table 3-1), the time used for curve evolution is about 1-4 seconds, which makes the savings in total segmentation time less significant. From these results we find that the recursive Karcher expectation estimator is the most attractive from an accuracy and efficiency viewpoint. For the real data experiment, the DTI was estimated [78] from a DW-MR scan of a rat spinal cord. The DW MR data were acquired using a PGSE sequence with TR = 1.5 s, TE = 28.3 ms, bandwidth = 35 kHz; 21 diffusion weighted images with a b-value of 1250 s/mm² were collected. The image size is ... We used the same initialization for each segmentation.

We applied all six methods (recursive and non-recursive for each of the three distance measures) in this experiment. In order to compare time efficiency, we report the whole segmentation running time, including the total time required to compute the means. Table 3-2 shows the result of this comparison, from which we find that it is much more efficient to use the recursive mean estimators in the segmentation than the batch mean estimators. This is especially true in the case of the Karcher mean, which has no closed form formula and takes nearly half of the total reported segmentation time; using the recursive Karcher expectation estimator makes the mean computation much faster and also significantly reduces the total segmentation time. The segmentation results for each method are depicted in Figure 3-3. Each 3 × 3 diffusion tensor in the DTI data is illustrated as an ellipsoid whose axis directions and lengths correspond to the eigenvectors and eigenvalues, respectively.

Figure 3-2. Results for the DTI segmentation experiments on the synthetic dataset. Figure (a) is the initialization curve overlaid on the synthetic dataset at one of the noise levels used in the experiments. Figure (b) is the segmentation accuracy, evaluated by the Dice coefficient, for all methods at all noise levels. Figure (c) is the total segmentation time (in seconds) for all segmentation methods at all noise levels.

Table 3-1. Time (in seconds) for mean computation in the DTI segmentation on the synthetic dataset.
Noise Level | RKEE | KM | RKLS | KLS | RLEM | LEM

Table 3-2. Timing in seconds for segmentation of grey matter in a rat spinal cord.
Segmentation Method | RKEE | KM | RKLS | KLS | RLEM | LEM
Mean computation time
Total segmentation time

Figure 3-3. Segmentation results of grey matter in a rat spinal cord for 6 different methods. Figure (a) is the RKEE based segmentation. Figure (b) is the segmentation using the Karcher mean (KM). Figures (c) and (d) are the results for the recursive and non-recursive KL_s mean estimators, respectively. Figures (e) and (f) are the results for the recursive and non-recursive Log-Euclidean mean, respectively.

From the figure we can see that the segmentation results are visually similar to each other, while our recursive Karcher expectation based method takes much less time, which is very useful in practice.

Figure 3-4. Segmentation results of the molecular layer in a rat hippocampus for 3 different methods. Figure (a) is the RKEE based segmentation, (b) the recursive KL_s based segmentation, and (c) the recursive Log-Euclidean based segmentation.

A second real data set, from an isolated rat hippocampus, was used to test the segmentation algorithms. Figure 3-4 depicts the segmentation of the molecular layer in the rat hippocampus. For the sake of space, we present only the segmentation results from the recursive algorithms and not their non-recursive counterparts, as the results are visually similar and the key difference is in the time savings.

CHAPTER 4
INTRINSIC RECURSIVE FILTER ON P_n

4.1 Background and Previous Work

Since P_n is a Riemannian manifold but not a vector space, many operations and algorithms in Euclidean space cannot be applied directly to P_n, and this has led to a flurry of research activity in the recent past. Several operations in Euclidean space have been extended to Riemannian manifolds. For example, the extension of the arithmetic mean to a Riemannian manifold is the Karcher mean [38]; the extension of Principal Component Analysis (PCA) is Principal Geodesic Analysis [28, 29]; and mean shift [23] has also been extended to Riemannian manifolds [69]. However, for filtering operations in dynamic scenes, such as the popular Kalman filter [67], an intrinsic extension does not exist in the literature to date. Recursive filtering is a technique to reduce the noise in measurements by applying the theory of recursion to filtering. It is often used in time sequence data analysis, especially in tracking problems where the model of the target needs to be updated based on the measurements and the previous tracking results. Many recursive filtering techniques have been developed in Euclidean space, such as the Kalman filter, the extended Kalman filter, etc., where the inputs and outputs of the filter are all vectors [37, 67]. However, several tracking problems are naturally set in P_n, a Riemannian symmetric space [34]. Recent work reported in [61] on covariance tracking uses a covariance matrix (constructed from pixel-wise features inside the object region) that belongs to P_n in order to describe the appearance of the target being tracked. This covariance descriptor has proved to be robust in both video detection [71, 74] and tracking [21, 39, 43, 44, 60, 61, 75, 83]. The covariance descriptor is a compact feature representation of the object with relatively low dimension compared to other appearance models, such as the histogram model in [24]. In [73] an efficient algorithm for generating covariance descriptors from feature vectors was reported, based on the

integral image technique, which makes it possible to use covariance descriptors in real time video tracking and surveillance. One major challenge in covariance tracking is how to recursively estimate the covariance template (a covariance descriptor that serves as the target appearance template) from the input video frames. In [61] and also in [44, 60], the Karcher mean of sample covariance descriptors from a fixed number of video frames is used as the covariance template. This method is based on the natural Riemannian distance, the GL-invariant distance on P_n. Currently, this Karcher mean cannot be computed in closed form, and the computation is achieved using a gradient based optimization technique which is inefficient, especially when the input contains a large number of samples. To address this problem, a Log-Euclidean metric was used in [39, 43], an arithmetic-mean-like method was used in [83], and an extension of the optimal filter to P_n was developed in [75]. However, none of these are intrinsic, due to the use of approaches that are extrinsic to P_n. Recently, some methods were reported that address the recursive filtering problem on Riemannian manifolds other than P_n. For example, a geometric particle filter for handling 2D affine motions (2 × 2 non-singular matrices) was reported in [39, 42], and an extension to Riemannian manifolds was developed in [65]. However, since the covariance descriptor is usually a high dimensional descriptor (e.g. a 5 × 5 covariance matrix has 15 degrees of freedom), the number of samples required by a particle filter would be quite large in this case. Additionally, computing the intrinsic (Karcher) mean on P_n is computationally expensive for large sample sizes. Thus, using an intrinsic particle filter to update the covariance descriptor would be computationally expensive for the tracking problem. There are also existing tracking methods on Grassmann manifolds [22, 68]; however, it is non-trivial to extend these to P_n, since Grassmann manifolds and P_n have very different geometric properties, e.g. Grassmann manifolds are compact and have non-negative sectional curvature

when using an invariant Riemannian metric [81], while P_n is non-compact and has non-positive sectional curvature when using a (general linear group, GL) invariant Riemannian metric [34]. In this dissertation, we focus on the problem of developing an intrinsic recursive filter, abbreviated IRF for the rest of this chapter, on P_n. A novel probabilistic dynamic model on P_n based on Riemannian geometry and probability theory is presented. Here, the noisy state and observations are described by matrix-variate random variables whose distribution is a normal distribution generalized to P_n based on the GL-invariant measure. In [41, 58] the authors provide a linear approximation of this distribution for cases when the variance of the distribution is very small. In contrast, in this dissertation we explore several properties of this distribution for the arbitrary variance case. We then develop the IRF based on this dynamic model and the Bayesian framework described in [22]. By applying this recursive filter to covariance tracking in conjunction with a particle position tracker [5], we obtain a new efficient real time video tracking algorithm, described in Section 4.3. We present experiments with comparisons to existing state-of-the-art methods and quantitative analyses that support the effectiveness and efficiency of the proposed algorithm. The remainder of this chapter is organized as follows: in Section 4.2 we introduce the probabilistic dynamic model on P_n. The IRF and the tracking algorithms are presented in Section 4.3, followed by the experiments in Section 4.4.

4.2 IRF: A New Dynamic Tracking Model on P_n

4.2.1 Generalization of the Normal Distribution to P_n

To define a probability distribution on a manifold, we first need to define a measure on the manifold. Here we use the GL-invariant measure [dX] on P_n. GL-invariance here means that \(\forall g \in GL(n)\) and \(X \in P_n\), \([d(gXg^t)] = [dX]\). From [70], we know that \([dX] = |X|^{-(n+1)/2}\prod_{1\le i\le j\le n} dx_{ij}\), where x_{ij} is the element in the i-th row and j-th column of the

SPD matrix X. This measure is consistent with the GL-invariant metric on P_n defined earlier and also presented in [58]. Similar to the Karcher mean, the Karcher expectation of a random variable X on any Riemannian manifold M can be defined as the result of the following minimization problem:
\[
E(X) = \arg\min_{Y\in M}\int_{M}\mathrm{dist}^2(X, Y)\, d\mu(X) \tag{4-1}
\]
where µ(X) is the probability measure defined on M. Similarly, the variance can be defined based on this expectation by
\[
\mathrm{Var}(X) = \int_{M}\mathrm{dist}^2(X, E(X))\, d\mu(X) \tag{4-2}
\]
Note that in the Euclidean space R^m, which is also a Riemannian manifold, the Karcher expectation is equivalent to the traditional definition of expectation, and the variance in Equation 4-2 is the trace of the covariance matrix. On P_n, by taking the gradient of the energy function in Equation 4-1 and setting it to zero, we find that the expectation of the random variable satisfies the following equation:
\[
\int_{P_n}\log\left(E(X)^{-1/2}\, X\, E(X)^{-1/2}\right) p(X)[dX] = 0 \tag{4-3}
\]
The generalization of the normal distribution to P_n used here is defined as
\[
dP(X; M, \omega^2) = p(X; M, \omega^2)[dX] = \frac{1}{Z}\exp\left(-\frac{\mathrm{dist}(X, M)^2}{2\omega^2}\right)[dX] \tag{4-4}
\]
where P(·) and p(·) are the probability distribution and density, respectively, of the random variable X ∈ P_n, with the two parameters M ∈ P_n and ω² ∈ R^+, and Z is the scalar normalization factor; dist(·) is defined in Equation 2-2. As shown in [58], this distribution has minimum information given the Karcher mean and variance. That is, in the absence of any other information this distribution is the best possible assumption from an information theoretic viewpoint.
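As a small illustration (ours, not the dissertation's), the density in Equation 4-4 can be evaluated up to the normalization factor Z, which depends only on ω (as Theorem 4.1 below establishes) and has no closed form for general n:

```python
import numpy as np
from scipy.linalg import logm, sqrtm

def gauss_pn_unnormalized(X, M, omega):
    """exp(-dist(X, M)^2 / (2 omega^2)) from Equation 4-4, without 1/Z."""
    Mis = np.linalg.inv(sqrtm(M))
    L = logm(Mis @ X @ Mis)              # Log of the whitened matrix
    d2 = np.trace(L @ L).real            # dist^2(X, M) per Equation 2-2
    return np.exp(-d2 / (2.0 * omega**2))
```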

Also, this distribution is different from the Log-normal distribution which was used in [64, 75]. Actually, the two distributions have very similar densities, but the density used here is based on the GL-invariant measure, while the Log-normal density is based on the Lebesgue measure of the Euclidean space. A very important property of the above generalized normal distribution is summarized in the following theorem.

Theorem 4.1. The normalization factor Z in Equation 4-4 is a finite constant with respect to the parameter M ∈ P_n.

The consequence of Theorem 4.1 is that, if the prior and the likelihood are both based on the generalized normal distribution defined using the GL-invariant measure, computing the mode of the posterior density can be achieved by minimizing the sum of squared GL-invariant distances from the unknown expectation to the given samples. To prove Theorem 4.1, we first need the following lemma.

Lemma 1. \(W = \int_{P_n}\exp\left(-\frac{\mathrm{tr}(\log X \log X^t)}{2\omega^2}\right)[dX] < \infty\)

Proof. This lemma indicates that the normalization factor Z is a finite constant, and hence p(X; M, ω²) is a probability density function on P_n. To prove it, we first represent X in polar coordinates {λ_i}, R based on the eigen decomposition X = RΛR^t, where Λ = diag(λ_1, ..., λ_n) and RR^t = I_{n×n}. From [70] we know that
\[
[dX] = c_n \prod_{j=1}^{n}\lambda_j^{-(n-1)/2}\prod_{1\le i<j\le n}\left|\lambda_i - \lambda_j\right| \prod_{j=1}^{n}\frac{d\lambda_j}{\lambda_j}\, dR \tag{4-5}
\]
where dR is the invariant measure on the orthogonal group O(n) with \(\int_{O(n)} dR = 1\), c_n is a constant depending on n, and dλ_j is the Lebesgue measure on R. With the following

change of variables, y_i = log(λ_i), we get
\[
\begin{aligned}
W &= c_n\int_{\mathbb{R}^n}\exp\left(\sum_{i=1}^n\left(-\frac{1}{2\omega^2}y_i^2 - \frac{n-1}{2}y_i\right)\right)\prod_{1\le i<j\le n}\left|e^{y_i} - e^{y_j}\right|\, dy\\
&\le c_n\sum_{\gamma\in S_n}\int_{\mathbb{R}^n}\exp\left(-\frac{1}{2}\sum_{i=1}^n\left(\frac{y_i^2}{\omega^2} + (n-1-2\gamma(i))\,y_i\right)\right)dy\\
&= c_n\,(2\pi\omega^2)^{\frac{n}{2}}\sum_{\gamma\in S_n}\exp\left(\frac{\omega^2\sum_{i=1}^n\gamma(i)^2 - \omega^2\, n(n-1)^2/4}{2}\right) < \infty
\end{aligned} \tag{4-6}
\]
where γ is an element of S_n, the set of all permutations of {0, 1, ..., n−1}, and sgn(γ) denotes the signature of γ, which is +1 or −1 depending on the permutation; the inequality follows by expanding the Vandermonde-type product into a signed sum over permutations and bounding each signature by 1.

We are now ready to present the proof of Theorem 4.1.

Proof. Viewing Z as a function of M ∈ P_n, denoted Z(M), we have
\[
Z(M) = \int_{P_n}\exp\left(-\frac{\mathrm{dist}^2(X, M)}{2\omega^2}\right)[dX] \tag{4-7}
\]
Since the GL group action is transitive on P_n, for any N ∈ P_n there exists g ∈ GL(n) such that N = gMg^t. Thus
\[
Z(N) = \int_{P_n}\exp\left(-\frac{\mathrm{dist}^2(X, gMg^t)}{2\omega^2}\right)[dX] = \int_{P_n}\exp\left(-\frac{\mathrm{dist}^2(g^{-1}Xg^{-t}, M)}{2\omega^2}\right)[dX]
\]
Let Y = g^{-1}Xg^{-t}, so X = gYg^t. Substituting this into the above equation, we get
\[
Z(N) = \int_{P_n}\exp\left(-\frac{\mathrm{dist}^2(Y, M)}{2\omega^2}\right)[d(gYg^t)] = \int_{P_n}\exp\left(-\frac{\mathrm{dist}^2(Y, M)}{2\omega^2}\right)[dY] = Z(M)
\]
Thus, \(\forall M, N \in P_n\), Z(M) = Z(N). From Lemma 1 we know that Z(I) < ∞; by the substitution above, we obtain the result that Z is finite and constant with respect to M.

One direct consequence of Theorem 4.1 is the following corollary.

Corollary 1. Given a set of i.i.d. samples {X_i} drawn from the distribution dP(X; M, ω²), the MLE of the parameter M is the Karcher mean of the samples.

40 Proof. log(p(x 1, X 2,...X m ; M, ω 2 )) = i = nlogz + log(p(x i ; M, ω 2 )) i dist 2 (X i, M) 2ω 2 Since Z is constant with respect to M as proved the Theorem 4.1, we have argmax M p(x 1, X 2,...X m ; M, ω 2 ) = argmin M dist 2 (X i, M) Thus, MLE of the parameter M of the distribution dp(x; M) equals to the Karcher mean of samples. From Theorem 4.1 we know that the normalization factor Z in Equation 4 4 is a function of ω. The integral in Equation 4 6 is non-trivial, and currently no exact solution is available for arbitrary n. For n = 2 we can have, i Z 2 (ω) = 2c 2 = y1 exp( 2πc 2 ωexp( 1 4 ω2 ) 2 i=1 ( 1 2ω 2 y 2 i y i ))(exp(y 1 ) exp(y 2 ))dy 2 dy 1 (exp( (y 1 0.5ω 2 ) 2 2ω 2 )(1 + erf ( y ω 2 2ω 2 )) exp( (y ω 2 ) 2 2ω 2 )(1 + erf ( y 1 0.5ω 2 2ω 2 ))))dy 1 (4 8) where erf (x) = 2 π x 0 exp( t2 )dt is the error function. = 4πc 2 ω 2 exp( 1 4 ω2 )erf ( ω 2 ) The mean and the variance of the generalized normal distribution Similar to the normal distribution in Euclidean space, the mean and the variance of the generalized normal distribution on P n in Equation 4 4 are controlled by the parameters M and ω 2 respectively. The relation between M and dp(x; M, ω 2 ) is given by the following theorem. Theorem 4.2. M is the Karcher Expectation of the generalized normal distribution dp(x; M, ω 2 ). 40

Proof. To prove this, we need to show that dP(X; M, ω²) satisfies Equation 4-3. Let
\[
\Delta = \int_{P_n}\log\left(M^{-1/2}\, X\, M^{-1/2}\right) dP(X; M, \omega^2)
\]
and in the integral use the change of variable from X to Y = MX^{-1}M (X = MY^{-1}M). Since P_n is a symmetric space and the metric/measure is GL-invariant, we know that [dX] = [dY] and dist(X, M) = dist(Y, M). Thus we have
\[
\Delta = \int_{P_n}\log\left(M^{-1/2} X M^{-1/2}\right)\frac{1}{Z}\exp\left(-\frac{\mathrm{dist}^2(X, M)}{2\omega^2}\right)[dX] = \int_{P_n}\log\left(M^{1/2} Y^{-1} M^{1/2}\right)\frac{1}{Z}\exp\left(-\frac{\mathrm{dist}^2(Y, M)}{2\omega^2}\right)[dY] = -\Delta
\]
and hence Δ = 0, where we used \(\log(M^{1/2}Y^{-1}M^{1/2}) = -\log(M^{-1/2}YM^{-1/2})\). Since P_n has non-positive curvature, the solution of Equation 4-1 is unique [38]. Thus M is the Karcher expectation of dP(X; M, ω²).

The variance of dP(X; M, ω²) is controlled by the parameter ω². Unlike the multivariate normal distribution in Euclidean space, where the Karcher variance (Equation 4-2) is equal to nω², the relation between the variance and the ω² of the generalized normal distribution is much more complex. Without loss of generality, assume X ∈ P_n is a matrix-valued random variable from dP(X; I, ω²). The variance is \(\mathrm{Var}(X) = \frac{1}{Z}\int_{P_n}\|\log(X)\|^2\exp\left(-\frac{\|\log(X)\|^2}{2\omega^2}\right)[dX]\). As in Equation 4-6, by using polar coordinates and taking the log of the eigenvalues, we get
\[
\mathrm{Var}(X) = \omega^2\,\mathrm{Var}_q(y) \tag{4-9}
\]
where y is a random variable in R^n having the distribution with density function
\[
q(y) = \frac{1}{z(\omega)}\exp\left(-\frac{1}{2}\sum_i y_i^2\right)\prod_{1\le i<j\le n} 2\sinh\left(\frac{\omega\,|y_i - y_j|}{2}\right), \tag{4-10}
\]
where z(ω) is the normalization factor. Currently there is no analytic solution for Var_q(y) for arbitrary n. When n = 2 we can compute Var_q(y) using a similar technique

as in Equation 4-8:

\mathrm{Var}_q(y) = \frac{\omega}{\sqrt{\pi}\, \exp(\tfrac{1}{4}\omega^2)\, \mathrm{erf}(\tfrac{\omega}{2})} + 2\Big(1 + \frac{\omega^2}{4}\Big)    (4-11)

From Equation 4-11 we find that in $P_2$, when $\omega$ is close to zero, $\mathrm{Var}(X) \approx 3\omega^2$, and when $\omega$ is large, $\mathrm{Var}(X) \approx \omega^4/2$. This is because $P_n$ can be locally approximated by a Euclidean space. When $\omega$ is close to zero, the major portion of the distribution lies in a small region of $P_n$, where the Euclidean approximation is relatively accurate; hence $\mathrm{Var}(X)$ is proportional to $\omega^2$, as for the normal distribution in Euclidean space. When $\omega^2$ is not close to zero, the Euclidean approximation is no longer accurate, and $\mathrm{Var}(X)$ becomes a complicated function of $\omega^2$. This property has been used to obtain an approximation of the generalized normal distribution with small $\omega^2$ in [41, 58]. The following two theorems show that the above approximations still hold for $n > 2$.

Theorem 4.3.

\lim_{\omega \to 0} \frac{\mathrm{Var}(X)}{\omega^2} = \frac{n(n+1)}{2}    (4-12)

Proof. Let

v(y, \omega) = \sum_{\gamma \in S_n} \mathrm{sgn}(\gamma) \exp\Big(-\frac{1}{2}\sum_{i=1}^n \big(y_i^2 + \omega(n-1-2\gamma(i))\, y_i\big)\Big)    (4-13)

where $\gamma$, $S_n$ and $\mathrm{sgn}(\gamma)$ refer to the permutations of $(0, 1, \ldots, n-1)$, defined as in Equation 4-6. One can check that $q(y) = v(y, \omega)/z(\omega)$ and $z(\omega) = \int_{\mathbb{R}^n} v(y, \omega)\, dy$. The Taylor expansion of $v(y, \omega)$ up to order $\tfrac{n(n-1)}{2}$ in $\omega$ around zero is

v(y, \omega) = \sum_{\gamma \in S_n} \mathrm{sgn}(\gamma) \sum_{k=0}^{n(n-1)/2} \frac{(-\omega)^k}{k!} \exp\Big(-\sum_{i=1}^n \frac{y_i^2}{2}\Big) \Big(\sum_{i=1}^n \big(\tfrac{n-1}{2} - \gamma(i)\big)\, y_i\Big)^k + O\big(\omega^{(n^2-n+2)/2}\big)
  = C\, (-\omega)^{n(n-1)/2} \exp\Big(-\sum_{i=1}^n \frac{y_i^2}{2}\Big) \prod_{1\le i<j\le n} (y_i - y_j) + O\big(\omega^{(n^2-n+2)/2}\big)    (4-14)

where $C$ is a constant. Equation 4-14 uses the fact that, given $n$ non-negative integers $\kappa_i$ with $\sum_{i=1}^n \kappa_i \le \tfrac{n(n-1)}{2}$,

\sum_{\gamma \in S_n} \mathrm{sgn}(\gamma) \prod_{i=1}^n \gamma(i)^{\kappa_i} = 0 \quad \text{if } (\kappa_1, \ldots, \kappa_n) \text{ is not a permutation of } (0, 1, \ldots, n-1)    (4-15)

So in the Taylor expansion all terms of degree less than $\tfrac{n(n-1)}{2}$ vanish, and among the $\tfrac{n(n-1)}{2}$-th order terms only those whose powers form a permutation in $S_n$ are non-zero.

Let the density $\hat{q}(y) = \frac{1}{\hat{z}} \exp\big(-\sum_i \tfrac{y_i^2}{2}\big) \prod_{1\le i<j\le n} |y_i - y_j|$, which is exactly the joint distribution of the eigenvalues of a Gaussian Orthogonal Ensemble [52], i.e. of a symmetric random matrix whose elements are independent zero-mean Gaussian random variables, with variance $1$ for the diagonal elements and $\tfrac{1}{2}$ for the off-diagonal elements. Recall that we are now in polar coordinates. Transforming $\hat{q}$ to the Cartesian coordinates of the space of symmetric matrices, we get

\mathrm{Var}_{\hat{q}}(y) = \frac{1}{\hat{z}} \int_{\mathbb{R}^n} y^t y\, \hat{q}(y)\, dy = c \int_{\mathrm{Sym}(n)} \mathrm{tr}(A^2)\, \exp\Big(-\frac{\mathrm{tr}(A^2)}{2}\Big)\, dA = \frac{n(n+1)}{2}    (4-16)

where $\mathrm{Sym}(n)$ is the space of $n \times n$ symmetric matrices, $dA$ is the Lebesgue measure on $\mathrm{Sym}(n)$, and $c$ is the corresponding normalizing constant. From the above we conclude

\lim_{\omega \to 0} \frac{\mathrm{Var}(X)}{\omega^2} = \lim_{\omega \to 0} \mathrm{Var}_q(y) = \mathrm{Var}_{\hat{q}}(y) = \frac{n(n+1)}{2}    (4-17)

Note that this theorem could also be obtained from the small-$\omega^2$ approximation of the generalized normal distribution in [41, 58]. Furthermore, from the proof above, since the log-normal distribution is the projection of a normal distribution from the tangent space (which can be identified with $\mathrm{Sym}(n)$) to $P_n$, and the random variable $y$ here is the normalized log of the eigenvalues of $X$, we can see that when $\omega$ is close to zero the generalized normal distribution can be approximated by a log-normal distribution.

Theorem 4.4.

\lim_{\omega \to \infty} \frac{\mathrm{Var}(X)}{\omega^4} = \frac{n^3 - n}{12}

Proof. We first define an upper and a lower bound on $q(y)$:

q_u(y) = \frac{1}{z_u(\omega)} \sum_{\gamma \in S_n} \exp\Big(-\frac{1}{2}\sum_{i=1}^n \big(y_i^2 + \omega(n-1-2\gamma(i))\, y_i\big)\Big)    (4-18)

q_\iota(y) = \frac{1}{z_\iota(\omega)} \exp\Big(-\frac{1}{2}\sum_i y_i^2\Big) \prod_{1\le i<j\le n} 2\Big(\cosh\Big(\frac{\omega(y_i - y_j)}{2}\Big) - 1\Big)    (4-19)

with $z_u$ and $z_\iota$ the respective normalization factors. Note that $q_u$ and $q_\iota$ are both Gaussian mixtures; in $q_u$ all mixing weights are positive, while in $q_\iota$ some weights are negative. After expansion we have

q_\iota(y) = \frac{1}{z_\iota(\omega)} \sum_{\beta \in B_n} w_\beta \exp\Big(-\frac{1}{2}\sum_{i=1}^n \big(y_i + \omega(n-1-2\beta(i))/2\big)^2\Big)    (4-20)

w_\beta = \alpha_\beta \exp\Big(\frac{\omega^2}{2}\Big(\sum_{i=1}^n \beta(i)^2 - \frac{n(n-1)^2}{4}\Big)\Big)

where $B_n$ is the set of all exponent combinations in the polynomial expansion

\sum_{\beta \in B_n} \alpha_\beta \prod_{i=1}^n x_i^{\beta(i)} = \prod_{1\le i<j\le n} \big(x_i + x_j - 2\sqrt{x_i x_j}\big)    (4-21)

and the $\alpha_\beta$ are the corresponding coefficients. We can prove that

\max_{\beta \in B_n} \sum_{i=1}^n \beta(i)^2 = \sum_{i=1}^n \gamma(i)^2 = \frac{(2n-1)(n^2-n)}{6}    (4-22)

where the maximum is achieved only for $\beta \in S_n$, and $\alpha_\beta = 1$ for all $\beta \in S_n$.

From the definitions we can compute the normalization constants and the variances of $q_\iota$ and $q_u$ in closed form:

z_u = (2\pi)^{n/2} \sum_{\gamma \in S_n} \exp\Big(\frac{\omega^2}{2}\Big(\sum_{i=1}^n \gamma(i)^2 - \frac{n(n-1)^2}{4}\Big)\Big) = (2\pi)^{n/2}\, n!\, \exp\Big(\frac{\omega^2\, n(n^2-1)}{24}\Big)

z_\iota = (2\pi)^{n/2} \sum_{\beta \in B_n} \alpha_\beta \exp\Big(\frac{\omega^2}{2}\Big(\sum_{i=1}^n \beta(i)^2 - \frac{n(n-1)^2}{4}\Big)\Big) = z_u + (2\pi)^{n/2} \sum_{\beta \in B_n \setminus S_n} \alpha_\beta \exp\Big(\frac{\omega^2}{2}\Big(\sum_{i=1}^n \beta(i)^2 - \frac{n(n-1)^2}{4}\Big)\Big)

\mathrm{Var}_{q_u}(y) = n + \frac{n^3 - n}{12}\, \omega^2

\mathrm{Var}_{q_\iota}(y) = n + \frac{\omega^2 (2\pi)^{n/2}}{z_\iota} \sum_{\beta \in B_n} \alpha_\beta \exp\Big(\frac{\omega^2}{2}\Big(\sum_{i=1}^n \beta(i)^2 - \frac{n(n-1)^2}{4}\Big)\Big)\Big(\sum_{i=1}^n \beta(i)^2 - \frac{n(n-1)^2}{4}\Big)    (4-23)

Since $0 \le \cosh(x) - 1 \le |\sinh(x)|$ and $|\prod_i x_i| \le \prod_i |x_i|$, we have, for all $y \in \mathbb{R}^n$, $z_\iota q_\iota(y) \le z q(y) < z_u q_u(y)$, and also $z_\iota \le z < z_u$. We then get the following bounds on $\mathrm{Var}_q(y)$:

\frac{z_\iota}{z_u}\, \mathrm{Var}_{q_\iota}(y) \le \mathrm{Var}_q(y) \le \frac{z_u}{z_\iota}\, \mathrm{Var}_{q_u}(y)    (4-24)

From Equation 4-23 we can show that

\lim_{\omega \to \infty} \frac{z_\iota}{z_u} = \lim_{\omega \to \infty} \Big(1 + \sum_{\beta \in B_n \setminus S_n} \alpha_\beta \exp\Big(\frac{\omega^2}{2}\Big(\sum_{i=1}^n \beta(i)^2 - \frac{(2n-1)(n^2-n)}{6}\Big)\Big)\Big) = 1    (4-25)

because for every $\beta \in B_n \setminus S_n$, $\sum_i \beta(i)^2 < \frac{(2n-1)(n^2-n)}{6}$. Similarly,

\lim_{\omega \to \infty} \frac{\mathrm{Var}_{q_\iota}(y)}{\omega^2} = \frac{n^3 - n}{12} + \lim_{\omega \to \infty} \sum_{\beta \in B_n \setminus S_n} \alpha_\beta \exp\Big(\frac{\omega^2}{2}\Big(\sum_{i=1}^n \beta(i)^2 - \frac{(2n-1)(n^2-n)}{6}\Big)\Big)\Big(\sum_{i=1}^n \beta(i)^2 - \frac{n(n-1)^2}{4}\Big)
  = \frac{n^3 - n}{12} = \lim_{\omega \to \infty} \frac{\mathrm{Var}_{q_u}(y)}{\omega^2} = \lim_{\omega \to \infty} \frac{\mathrm{Var}_q(y)}{\omega^2} = \lim_{\omega \to \infty} \frac{\mathrm{Var}(X)}{\omega^4}    (4-26)
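As a quick numerical illustration of Theorem 4.3 (a sketch of ours, not part of the original experiments), one can sample from the small-$\omega$ log-normal approximation discussed above, i.e. $X = \exp(A)$ with $A$ a symmetric Gaussian matrix scaled as in the Gaussian Orthogonal Ensemble of Equation 4-16, and check that $\mathrm{Var}(X)/\omega^2$ is close to $n(n+1)/2$:

```python
import numpy as np
from scipy.linalg import expm, logm

# Monte Carlo check of the small-omega limit in Theorem 4.3, under the
# log-normal approximation: A is symmetric Gaussian with Var(A_ii) = w^2 and
# Var(A_ij) = w^2/2 (the GOE scaling of Equation 4-16), and X = expm(A).
rng = np.random.default_rng(0)
n, w, trials = 3, 0.05, 5000
acc = 0.0
for _ in range(trials):
    b = rng.normal(scale=w / np.sqrt(2.0), size=(n, n))
    a = (b + b.T) / np.sqrt(2.0)           # A_ii ~ N(0, w^2), A_ij ~ N(0, w^2/2)
    x = expm(a)                            # SPD sample centered at the identity
    acc += np.sum(np.real(logm(x)) ** 2)   # squared geodesic distance to I
print(acc / (trials * w ** 2))             # ~ n(n+1)/2 = 6.0 for n = 3
```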

4.2.2 The Probabilistic Dynamic Model on P_n

To perform tracking on $P_n$, the observation $Y_k$ and the state $X_k$ at time $k$ both lie in $P_n$. The state transition model and the observation model can then be defined as

p(X_k \mid X_{k-1}) = \frac{1}{Z_s} \exp\Big(-\frac{\mathrm{dist}^2(X_k,\, gX_{k-1}g^t)}{2\omega^2}\Big)

p(Y_k \mid X_k) = \frac{1}{Z_o} \exp\Big(-\frac{\mathrm{dist}^2(Y_k,\, hX_k h^t)}{2\phi^2}\Big)

where $g, h \in GL(n)$, and $\omega^2, \phi^2 > 0$ are the parameters that control the variance of the state transition and observation noise. The above two densities are both defined with respect to the GL-invariant measure on $P_n$, unlike in [64, 75], where they are based on the Lebesgue measure. What does this imply? The key implication is that the normalization factor in each density is a constant for the GL-invariant measure, but not in the Lebesgue-measure case; if the normalization factor were not a constant, one would not have a valid density.

4.3 IRF-based Tracking Algorithm on P_n

4.3.1 The Bayesian Tracking Framework

For simplicity we use the Bayesian tracking framework described in [22]. The tracking problem can be stated as: given a time sequence of observations $\mathcal{Y}^s = \{Y_1, Y_2, \ldots, Y_s\}$ from time 1 to time $s$, how can one compute the state $X_s$ at time $s$? To solve this problem, we first make two assumptions: (1) the state transition is Markovian, i.e. the state $X_s$ depends only on $X_{s-1}$:

p(X_s \mid X_{s-1}, \mathcal{Y}^{s-1}) = p(X_s \mid X_{s-1})    (4-27)

(2) the observation $Y_s$ depends only on the state $X_s$ at the current time point $s$, in other words,

p(Y_s \mid X_s, \mathcal{Y}^{s-1}) = p(Y_s \mid X_s)    (4-28)

Hence $p(X_s \mid X_{s-1})$ is called the state transition model and $p(Y_s \mid X_s)$ the observation model.
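Up to the constant factors $Z_s$ and $Z_o$, both model densities are simple functions of the GL-invariant distance. The following sketch (an illustration of ours, assuming the affine-invariant distance of Chapter 2) shows how the unnormalized log-densities can be evaluated:

```python
import numpy as np
from scipy.linalg import logm, fractional_matrix_power

def gl_dist(x, m):
    """GL-invariant distance on P_n: ||log(M^{-1/2} X M^{-1/2})||_F."""
    m_is = fractional_matrix_power(m, -0.5)
    return np.linalg.norm(np.real(logm(m_is @ x @ m_is)), 'fro')

def log_transition(x_k, x_prev, g, omega2):
    """log p(X_k | X_{k-1}) up to the constant -log Z_s."""
    return -gl_dist(x_k, g @ x_prev @ g.T) ** 2 / (2.0 * omega2)

def log_observation(y_k, x_k, h, phi2):
    """log p(Y_k | X_k) up to the constant -log Z_o."""
    return -gl_dist(y_k, h @ x_k @ h.T) ** 2 / (2.0 * phi2)
```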

The goal of tracking can thus be viewed as computing the posterior $p(X_s \mid \mathcal{Y}^s)$. First we have

p(X_s \mid \mathcal{Y}^s) = p(X_s, \mathcal{Y}^s)/p(\mathcal{Y}^s) \propto p(X_s, \mathcal{Y}^s)    (4-29)

and also

p(X_s, \mathcal{Y}^s) = p(Y_s \mid X_s)\, p(X_s \mid X_{s-1})\, p(X_{s-1}, \mathcal{Y}^{s-1}) = \prod_{k=1}^{s} p(Y_k \mid X_k)\, p(X_k \mid X_{k-1})

So, if $X_{k-2}, X_{k-3}, \ldots, X_0$ have already been computed, we can compute $\hat{X}_k, \hat{X}_{k-1}$ by solving the following optimization problem:

\hat{X}_k, \hat{X}_{k-1} = \mathrm{argmax}_{X_k, X_{k-1}} \prod_{j=k-1}^{k} p(Y_j \mid X_j)\, p(X_j \mid X_{j-1}) = \mathrm{argmin}_{X_k, X_{k-1}} E_k(X_k, X_{k-1})

where

E_k(X_k, X_{k-1}) = \phi^{-2}\, \mathrm{dist}^2(h^{-1} Y_k h^{-t}, X_k) + \omega^{-2}\, \mathrm{dist}^2(gX_{k-1}g^t, X_k) + \phi^{-2}\, \mathrm{dist}^2(h^{-1} Y_{k-1} h^{-t}, X_{k-1}) + \omega^{-2}\, \mathrm{dist}^2(X_{k-1}, gX_{k-2}g^t)

This problem can be solved by gradient descent on $P_n$: at each step we compute the gradient, which lies in the tangent space, and obtain the new state by moving along the geodesic in the corresponding direction. At the $i$-th iteration step, $\nabla_{X_{k-1}^{(i)}} E_k \in T_{X_{k-1}^{(i)}} P_n$ and $\nabla_{X_k^{(i)}} E_k \in T_{X_k^{(i)}} P_n$, and

X_{k-1}^{(i+1)} = \mathrm{Exp}_{X_{k-1}^{(i)}}\big(-\delta\, \nabla_{X_{k-1}^{(i)}} E_k\big), \qquad X_k^{(i+1)} = \mathrm{Exp}_{X_k^{(i)}}\big(-\delta\, \nabla_{X_k^{(i)}} E_k\big)

where $\delta$ is the step size and $\mathrm{Exp}(\cdot)$ is the exponential map as defined in Chapter 2; the gradient expressions themselves are given after the following sketch.
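The exponential and log maps used in these updates have closed forms on $P_n$. A minimal sketch of one descent step (ours, assuming the standard affine-invariant Exp/Log maps):

```python
import numpy as np
from scipy.linalg import expm, logm, fractional_matrix_power

def exp_map(m, v):
    """Exp_M(V) on P_n: M^{1/2} expm(M^{-1/2} V M^{-1/2}) M^{1/2}."""
    s = fractional_matrix_power(m, 0.5)
    s_inv = fractional_matrix_power(m, -0.5)
    return s @ expm(s_inv @ v @ s_inv) @ s

def log_map(m, x):
    """Log_M(X), the inverse of Exp_M."""
    s = fractional_matrix_power(m, 0.5)
    s_inv = fractional_matrix_power(m, -0.5)
    return s @ np.real(logm(s_inv @ x @ s_inv)) @ s

def descent_step(x, grad, delta):
    """One geodesic gradient-descent step: X <- Exp_X(-delta * grad)."""
    return exp_map(x, -delta * grad)
```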

The gradient is given by

\nabla_{X_k^{(i)}} E_k = -\phi^{-2}\, \mathrm{Log}_{X_k}\big(h^{-1} Y_k h^{-t}\big) - \omega^{-2}\, \mathrm{Log}_{X_k}\big(gX_{k-1}g^t\big)

\nabla_{X_{k-1}^{(i)}} E_k = -\phi^{-2}\, \mathrm{Log}_{X_{k-1}}\big(h^{-1} Y_{k-1} h^{-t}\big) - \omega^{-2}\, \mathrm{Log}_{X_{k-1}}\big(g^{-1} X_k g^{-t}\big) - \omega^{-2}\, \mathrm{Log}_{X_{k-1}}\big(gX_{k-2}g^t\big)

It is easy to show that this state update estimates the mode of the posterior $p(X_s \mid \mathcal{Y}^s)$, which is different from the usual Kalman filter and particle filter methods, where the state update is the mean of the posterior. In the proposed update process, the covariance of the posterior is not needed for updating the state. We do not provide an update of the covariance here, partly because the covariance update is hard to compute for this distribution on $P_n$; in fact, there is no known closed-form solution for the covariance even for the distribution $p(X_k \mid X_{k-1}) = \frac{1}{Z_s}\exp\big(-\frac{\mathrm{dist}^2(X_k, gX_{k-1}g^t)}{2\omega^2}\big)$.

4.3.2 The Tracking Algorithm

The recursive filter for covariance matrices (descriptors) on $P_n$ presented above can be used in combination with many existing tracking techniques: algorithms based on covariance descriptors, such as those in [61, 75], can use our IRF as the model-updating method for the covariance descriptors. Here we combine the IRF with a particle position tracker to obtain a real-time video tracking algorithm.

Feature Extraction: Assume we have a rectangular region $R$, of width $W$ and height $H$, which represents the target object in an image $I$ of the video sequence. A feature vector $f(x, y)$, $(x, y) \in R$, is extracted to encode appearance, position and other information at the point $(x, y)$. In [61] the feature vector was chosen to be $f = [x, y, I(x, y), |I_x(x, y)|, |I_y(x, y)|]$, where $I_x$ and $I_y$ are the components of the gradient $\nabla I$; for color images, $I(x, y) = [R, G, B]$ is a vector. With the feature vectors at each point in the region of the object, the covariance matrix can be computed as

C_R = \frac{1}{WH} \sum_{k \in R} (f_k - \mu_R)(f_k - \mu_R)^t
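For a grayscale image, the direct (non-integral-image) computation of $C_R$ from these features can be sketched as follows (an illustration of ours):

```python
import numpy as np

def region_covariance(image, x0, y0, w, h):
    """Covariance descriptor C_R of f = [x, y, I, |I_x|, |I_y|] over a
    rectangular region; a direct O(WH) computation ([61, 73] use integral
    images to evaluate it in constant time per region)."""
    patch = image[y0:y0 + h, x0:x0 + w].astype(float)
    gy, gx = np.gradient(patch)                      # image derivatives
    ys, xs = np.mgrid[0:h, 0:w]
    feats = np.stack([xs.ravel(), ys.ravel(), patch.ravel(),
                      np.abs(gx).ravel(), np.abs(gy).ravel()])
    mu = feats.mean(axis=1, keepdims=True)
    d = feats - mu
    return d @ d.T / feats.shape[1], mu.ravel()      # (C_R, mu_R)
```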

The covariance matrix $C_R$ can be computed in constant time with respect to the size of the region $R$ by using the integral image technique, as was done in [73]. We can also add the mean $\mu_R$ into the covariance descriptor and still obtain a symmetric positive definite matrix, in the following manner:

\hat{C}_R = \begin{pmatrix} C_R + \lambda^2 \mu\mu^t & \lambda\mu \\ \lambda\mu^t & 1 \end{pmatrix}    (4-30)

where $\lambda$ is a parameter used to balance the effect of the mean and the variance in the descriptor (in the experiments $\lambda = 0.001$). As in [73], we use several covariance descriptors for each object in the scene. Very briefly, each region enclosing an object is divided into 5 sub-regions, and in each of these a covariance descriptor is computed and tracked individually. In the template matching stage described below, a matching score (likelihood) is computed using the 4 descriptors with relatively small distances to their corresponding templates. This approach is used in order to increase the robustness of our algorithm.

Tracking and Template Matching: We use a sampling importance re-sampling (SIR) particle filter [5] as a position and velocity tracker. The state vector of the particle filter is $(x, y, v_x, v_y, \log(s))^t$, where $x, y, v_x, v_y$ denote the position and velocity of the object in the 2D image, and $\log(s)$ is the log of the scale. The state transition matrix $F$ (Equation 4-31) encodes a constant-velocity model based on Newton's first law. The variance of the state transition is a diagonal matrix; in the work reported here it was set to $(4^2, 4^2, 20^2, 20^2, \ldots)$. These state transition parameters are dependent on the videos being tracked, and could be learned from manually labeled training sets.
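Two concrete pieces of this stage, sketched under stated assumptions: the augmented descriptor of Equation 4-30, and a constant-velocity form of the transition matrix $F$. The matrix printed as Equation 4-31 is not recoverable from this copy, so the $F$ below (with a unit time step) is our assumption, as is the placeholder log-scale noise value:

```python
import numpy as np

def augmented_descriptor(c_r, mu, lam=0.001):
    """Equation 4-30: fold the feature mean mu into the SPD descriptor."""
    d = len(mu)
    c_hat = np.empty((d + 1, d + 1))
    c_hat[:d, :d] = c_r + lam ** 2 * np.outer(mu, mu)
    c_hat[:d, d] = lam * mu
    c_hat[d, :d] = lam * mu
    c_hat[d, d] = 1.0
    return c_hat

# Assumed constant-velocity transition for the state (x, y, vx, vy, log s);
# our reading of the Newton's-first-law model named in Equation 4-31.
F = np.array([[1., 0., 1., 0., 0.],
              [0., 1., 0., 1., 0.],
              [0., 0., 1., 0., 0.],
              [0., 0., 0., 1., 0.],
              [0., 0., 0., 0., 1.]])

def propagate(particles, rng, noise_std=(4., 4., 20., 20., 1e-3)):
    """One particle-prediction step; the log-scale noise (1e-3) is a
    placeholder, as its value is elided in the text."""
    noise = rng.normal(size=particles.shape) * np.asarray(noise_std)
    return particles @ F.T + noise
```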

The likelihood for the particle filter is based on the generalized normal distribution given in Equation 4-4. At step $k$, we first compute the prediction of the object covariance template as $\hat{Y}_k = hgX_{k-1}g^t h^t$, and the prediction of the position and scale of the object, represented by the set of particles, based on the state transition matrix (Equation 4-31). Covariance descriptors are then computed for each predicted particle state at the corresponding object region. The likelihood of each covariance descriptor is computed from the generalized normal distribution centered at the predicted covariance template, and the likelihood of each particle state is the product of the likelihoods of the covariance descriptors that are closest to their corresponding templates, as mentioned above. After multiplying the likelihood with the weight of each particle, the mean of the particle set is computed, followed by computation of the covariance descriptors at the location of this mean; these covariance descriptors then form the input to our IRF. In our experiments, we use 300 particles. Our tracking algorithm runs at around 15 Hz on the video sequences used here, on a desktop with a 2.8 GHz CPU.

4.4 Experiments

In this section, we present the results of applying our intrinsic recursive filter to both synthetic and real data sets. The real data sets were taken from standard video sequences used in the computer vision community for testing tracking algorithms. First we present the synthetic data experiments and then the real data experiments.

4.4.1 The Synthetic Data Experiment

To validate the proposed filtering technique, we first performed synthetic data experiments on $P_3$, the space of $3 \times 3$ SPD matrices. A time sequence of i.i.d. samples of SPD matrices was randomly drawn from the log-normal distribution [64] centered at the identity matrix. This was done by first drawing samples $\{v_i\}$ in $\mathbb{R}^6$ (isomorphic to $\mathrm{Sym}(3)$) from the normal distribution $N(0, \sigma^2 I_6)$, and then projecting these samples to $P_3$ (denoted $\{X_i\}$) using the exponential map at the point $I_3$ (the identity matrix).
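This data generation is straightforward to reproduce; a sketch (ours; the scaling used in the $\mathrm{Sym}(3) \cong \mathbb{R}^6$ identification is our assumption):

```python
import numpy as np
from scipy.linalg import expm

def sample_sequence(length, sigma, rng):
    """Draw v_i ~ N(0, sigma^2 I_6), identify v_i with a symmetric 3x3
    matrix, and map it to P_3 with the exponential map at the identity."""
    idx = np.triu_indices(3)
    seq = []
    for _ in range(length):
        v = rng.normal(scale=sigma, size=6)
        a = np.zeros((3, 3))
        a[idx] = v
        a = a + a.T - np.diag(np.diag(a))   # symmetric matrix built from v
        seq.append(expm(a))                 # X_i = Exp_{I_3}(v_i)
    return seq

samples = sample_sequence(100, sigma=1.0, rng=np.random.default_rng(0))
```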

Thus, $\{X_i\}$ can be viewed as a time sequence of random measurements of the identity matrix, and our recursive filter can be used as an estimator of this random process. The estimation error at time point $k$ is computed as the Riemannian distance (Equation 2-2) between the estimate $\hat{X}_k$ and the ground truth (the identity matrix). We evaluated our recursive filter (IRF) by comparing its performance with the optimal recursive filter for linear systems on $P_n$ (denoted ORF) reported in [75]. The parameters of ORF were set exactly as in [75], except for the initial base point $X_b$, at which all the samples are projected to the tangent space $T_{X_b} P_n$ and then processed with ORF; we set $X_b$ to be the observation at the first step. In this problem, setting $X_b$ to the ground truth would give the best result for ORF, because ORF would then reduce to the Kalman filter on the tangent space. Since in practice the ground truth is unknown, we set $X_b$ to the observation at the first step, which is the best information available about the data sequence before tracking. We also tried setting $X_b$ randomly, and this did not lead to any observable differences. For the proposed method, the GL actions $g, h$ were both set to the identity, and $\phi^2/\omega^2 = 200$. We performed experiments at three different noise levels, $\sigma^2 = 0.1$, $1$ and $2$. At each noise level we ran the whole process 20 times and computed the mean error at each time step. The results are summarized in Figure 4-1. From the figure, we can see that ORF performs better when $\sigma^2 = 0.1$, and our method (IRF) performs better when the data is noisier ($\sigma^2 = 1, 2$). The reason is that ORF uses several Log-Euclidean operations, which are in fact approximations. For low noise levels, the data points lie in a relatively small region around the ground truth (the identity), where the Log-Euclidean approximation is quite accurate; for higher noise levels, the region becomes larger and the approximation becomes inaccurate, which leads to large estimation errors.

Figure 4-1. Mean estimation error from 20 trials for the synthetic data experiment. The x-axis denotes the time step; the y-axis denotes the estimation error, measured as the Riemannian distance between the estimates and the ground truth. In all three sub-figures the red curves denote the estimation error of our IRF and the blue curves that of ORF with $X_b$ set to the observation at the first step.

In contrast, our filtering method is fully based on the Riemannian geometry, without any Log-Euclidean approximation, so it performs consistently and converges correctly at all three noise levels. In conclusion, although our recursive filter may converge slightly more slowly than ORF, it is more robust to the larger amounts of noise that are common in real tracking situations.

4.4.2 The Real Data Experiment

For the real data experiment, we applied our IRF to more than 3000 frames from different video sequences. Two other covariance descriptor updating methods were also applied to these sequences for comparison, namely: (1) the ORF reported in [75]; and (2) the updating method using the Karcher mean of the tracking results in previous frames (KM), reported in [61]. The image feature vectors for the target region were computed as reported in [61]. The buffer size $T$ in the KM method was set to 20, which means the Karcher mean of the covariance descriptors in the 20 previous frames is used to predict the covariance descriptor in the current frame. The parameters for ORF were set to the values given in [75], and the parameters $\omega^2$ and $\phi^2$ controlling the state transition and observation noise in our IRF were set to fixed values. Since our IRF is combined with a particle filter as a position tracker, for the purpose of comparison, KM and ORF were combined with exactly the same particle filter.

Table 4-1. Tracking results for the real data experiment: per-object start and end frames and average tracking errors Err(IRF), Err(ORF) and Err(KM) for the objects tracked in the C3ps1, C3ps2 and Cosow2 sequences.

First, we used three video sequences from the CAVIAR dataset: 1. ThreePastShop1cor (C3ps1); 2. ThreePastShop2cor (C3ps2); 3. OneShopOneWait2cor (Cosow2). All three sequences are from a fixed camera, and several objects were tracked separately. The given ground truth was used to quantitatively evaluate the tracking results; the error measure is the distance between the center of the estimated region and the ground truth. With all three methods identically initialized, the tracking results are shown in Table 4-1, where all the errors shown are averages over all tracked frames. From the table we can see that ORF is more accurate than the KM-based method in most of the results, and our IRF outperforms both. KM drifts away from the target because it is based on a sliding-window approach: if the number of consecutive noisy frames is close to the window size, the tracker tends to track the noisy features. ORF, being a non-intrinsic approach, approximates the GL-invariant metric, which introduces errors that accumulate over time across the frames, causing it to drift away. Since IRF is an intrinsic recursive filter using the GL-invariant metric, less error is introduced in the covariance tracker updates, which in turn leads to the higher accuracy in the experiments above.

In the second experiment, we performed head tracking in video sequences from a moving camera. Two video sequences were used: (i) the Seq mb sequence (tracking a face)

and (ii) Seq sb. Each sequence has 500 frames. Both sequences are challenging because of complex backgrounds, fast appearance changes and occlusions. The results are summarized in Figure 4-2. In Seq mb, KM fails at frame 450, where an occlusion occurs, while ORF and IRF do not lose track. Both KM and ORF produce relatively large errors in capturing the position of the girl's face after the girl turns around for the first time, between frames 100 and 150, due to the complete change in appearance of the target (the girl's face). ORF produces a relatively large error in estimating the scale of the face (compared to the initialization) between frames 400 and 500, which can be seen in the snapshots included in Figure 4-2. The result of our method (IRF) shows relatively large errors around frames 100 and 180, because at those times the camera is tracking the hair of the girl, where no feature can be used to locate the position of the face; for the other frames, IRF tracks the face quite accurately. In Seq sb, both KM and ORF fail at frame 200, while IRF successfully tracked the whole sequence with relatively high accuracy, even with fast appearance changes and occlusions, as shown in the quantitative analysis in Figure 4-2. These experiments demonstrate the accuracy of our method in both the moving camera and fixed camera cases.

Figure 4-2. Head tracking results for video sequences with a moving camera. The top and bottom rows depict snapshots and quantitative evaluations of the results from Seq mb and Seq sb, respectively. The tracking error is measured as the distance between the estimated object center and the ground truth. Tracking results from the three methods are shown as different colored boxes superposed on the images and as different colored lines in the plots: results from our method (IRF) in red, ORF in green and KM in blue.

CHAPTER 5
INTRINSIC UNSCENTED KALMAN FILTER

5.1 Background and Previous Work

Diffusion Weighted MR Imaging (DW-MRI) is a technique that measures, through the MR signal, the locally constrained water diffusion properties along different spatial directions, and thus infers the underlying tissue structure. It is a unique non-invasive technique that can reveal neural fiber structures in vivo. The local water diffusion property can be described either by a diffusivity function or by a diffusion propagator function. The diffusivity function can be estimated from the DW-MR signals and represented by a 2nd-order tensor at each image voxel, yielding the so-called Diffusion Tensor Imaging (DTI), pioneered in [12]. It is now well known that DTI fails to accurately represent locations in the data volume containing complex tissue structure, such as fiber crossings. To solve this problem, several higher-order models were proposed, such as [1, 26, 51]. To further reveal fibrous structures such as the brain white matter, fiber tracking methods were proposed to analyze the connectivity between different regions of the brain. Existing fiber tracking methods fall mainly into two categories, deterministic and probabilistic. One popular deterministic tracking method is the streamline method [13, 50], where the tracking problem is tackled using line integration. Deterministic tracking can also be based on the (Riemannian or Finsler) geometry imposed by the diffusivity function [62], where the tracking problem is posed as a shortest path problem. In probabilistic fiber tracking methods [14, 63, 89], a probabilistic dynamic model is first built and a filtering technique, such as a particle filter, is then applied. Most of the existing fiber tracking methods are based on two stages, namely first estimating the tensors from DWI and then tracking using these estimated tensors. Recently, in [48], a filtered multi-tensor tractography method was proposed in which the fiber tracking and the multi-tensor reconstruction are performed simultaneously. There are two main advantages to this approach: (1) the reconstruction is performed

only at locations where it is necessary, which significantly reduces the computational complexity compared to approaches that first reconstruct the whole tensor field and then apply tractography; (2) fiber tracking is used as a regularization of the reconstruction, i.e., the smoothness of the fiber path is used to regularize the reconstruction. However, in [48] the filtering is applied only to tensor features (major eigenvectors etc.), all of which have strict mathematical constraints that ought to be satisfied, but not all of the constraints were enforced; for example, the constraint that the eigenvectors lie on the unit sphere was not enforced. In general it would be preferable to track the full tensor and enforce the necessary constraints. It is known that diffusion tensors lie in the space of symmetric positive definite (SPD) matrices, denoted $P_n$, which is not a Euclidean space but a Riemannian manifold. Vector operations are not available on $P_n$, so algorithms based on vector operations cannot be applied directly to this space, and non-trivial extensions are needed. In this dissertation, we propose a novel intrinsic unscented Kalman filter on $P_n$ which, to the best of our knowledge, is the first extension of the unscented Kalman filter to $P_n$. We apply this filter to both estimate and track the tensors in the multi-tensor model using an intrinsic formulation, achieving better accuracy as demonstrated through experiments; we also perform real data experiments to demonstrate the accuracy and efficiency of our method. The rest of the chapter is organized as follows: the intrinsic unscented Kalman filter is described in Section 5.2, where a novel dynamic model is defined for the multi-tensor model; we then present the intrinsic unscented Kalman filter algorithm, and finally the experiments are presented in Section 5.3.

5.2 Intrinsic Unscented Kalman Filter for Diffusion Tensors

5.2.1 The State Transition and Observation Models

The state transition model on $P_n$ is based on the GL group operation and the log-normal distribution. For the bi-tensor (sum of two Gaussians) model, the state

transition model at step $k$ is given by

D^{(1)}_{k+1} = \mathrm{Exp}_{F D^{(1)}_k F^t}\big(v^{(1)}_k\big), \qquad D^{(2)}_{k+1} = \mathrm{Exp}_{F D^{(2)}_k F^t}\big(v^{(2)}_k\big)    (5-1)

where $D^{(1)}_k, D^{(2)}_k$ are the two tensor states at step $k$, $F$ is the GL-based state transition operation, and $v^{(1)}_k$ and $v^{(2)}_k$ are the Gaussian-distributed state transition noise vectors for $D^{(1)}_k$ and $D^{(2)}_k$ in the tangent spaces $T_{D^{(1)}_k} P_3$ and $T_{D^{(2)}_k} P_3$, respectively. Here we assume that the two state transition noise models are independent of each other and of the previous states. The covariance matrices of the two state transition noise models are $Q^{(1)}_k$ and $Q^{(2)}_k$, respectively; $Q^{(i)}_k$, $i = 1, 2$, is a $6 \times 6$ matrix defined for tangent vectors in $T_{D^{(i)}_k} P_3$. Note that $Q^{(i)}_k$ is not invariant to a GL coordinate transform on $P_n$. Assume a random variable $X = \mathrm{Exp}_\mu(v)$ in $P_n$, where $v$ is a random vector from a zero-mean Gaussian with covariance matrix $Q$. Then, after a GL coordinate transform $g \in GL(n)$, the new random variable is $Y = gXg^t = \mathrm{Exp}_{g\mu g^t}(u)$, and the covariance matrix of $u$ is

Q(g) = (g \otimes g)\, Q\, (g \otimes g)^t    (5-2)

where $\otimes$ denotes the Kronecker product. Here we first define the covariance matrix at the identity, $Q_{I_{3\times 3}} = q\, I_{6\times 6}$, where $q$ is a positive scalar; the covariance matrix at a point $X$ can then be computed using Equation 5-2 by setting $g = X^{1/2}$. With this definition the state transition noise is independent of the system state.

The observation model is based on the bi-tensor diffusion model:

S^{(n)}_k = S_0 \big(e^{-b_n g_n^t D^{(1)}_k g_n} + e^{-b_n g_n^t D^{(2)}_k g_n}\big)    (5-3)

where $g_n$ denotes the direction of the $n$-th magnetic gradient, $b_n$ is the corresponding b-value, and $S^{(n)}_k$ is the MR signal for the $n$-th gradient at iteration step $k$. The covariance matrix of the observation model over all the magnetic gradients is a diagonal matrix, denoted $R$; this assumes that the measurements from distinct gradient directions are independent.
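Equation 5-3 is easy to evaluate for a whole gradient scheme at once; a direct transcription of the formula (a sketch of ours):

```python
import numpy as np

def predict_signals(d1, d2, gradients, bvals, s0=1.0):
    """Bi-tensor observation model (Equation 5-3): for each unit gradient
    direction g_n with b-value b_n,
    S_n = S0 * (exp(-b_n g_n^t D1 g_n) + exp(-b_n g_n^t D2 g_n))."""
    g = np.asarray(gradients, dtype=float)     # shape (N, 3)
    b = np.asarray(bvals, dtype=float)         # shape (N,)
    q1 = np.einsum('ni,ij,nj->n', g, d1, g)    # g_n^t D1 g_n
    q2 = np.einsum('ni,ij,nj->n', g, d2, g)
    return s0 * (np.exp(-b * q1) + np.exp(-b * q2))
```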

5.2.2 The Intrinsic Unscented Kalman Filter

Just as in the standard Kalman filter, each iteration step of the unscented Kalman filter [48] has two stages, prediction and update. In the prediction stage, the state of the filter at the current iteration is predicted based on the result from the previous step and the state transition model. In the update stage, the information from the observation at the current iteration is used, in the form of the likelihood, to correct the prediction. Since the states are diffusion tensors lying in the space $P_n$, where no vector operations are available, we need a non-trivial extension of the unscented Kalman filter, especially for the prediction stage, to be valid on $P_n$.

To begin with, we define the augmented state for the bi-(diffusion)tensor state at iteration step $k$ to be

\mathcal{X}_k = \big[u^{(1),t}_k,\, u^{(2),t}_k,\, v^{(1),t}_k,\, v^{(2),t}_k\big]^t    (5-4)

where $v^{(i)}_k$, $i = 1, 2$, is the state transition noise vector for the diffusion tensor state $D^{(i)}_k$, and $u^{(i)}_k = \mathrm{Log}_{E_K(D^{(i)}_k)}(D^{(i)}_k)$ is the representation of the state random variable in the tangent plane at its Karcher expectation $E_K(\cdot)$. $\mathcal{X}_k$ is zero-mean, with covariance matrix denoted $P^a_k$; the covariance matrix of the state part $[u^{(1),t}_k, u^{(2),t}_k]^t$ is denoted $P_{k,DD}$. Note that $P^a_k$ is a block-diagonal matrix composed of $P_{k,DD}$, $Q^{(1)}_k$ and $Q^{(2)}_k$.

In the prediction stage, $2L+1$ weighted samples from the distribution of $\mathcal{X}_k$ are first computed by the deterministic sampling scheme given below, where $L = 24$ denotes the dimension of $\mathcal{X}_k$:

\mathcal{X}_{k,0} = 0, \quad w_0 = \kappa/(L+\kappa)    (5-5)

\mathcal{X}_{k,j} = \big(\sqrt{(L+\kappa)P^a_k}\big)_j, \quad w_j = 1/\big(2(L+\kappa)\big)    (5-6)

\mathcal{X}_{k,j+L} = -\big(\sqrt{(L+\kappa)P^a_k}\big)_j, \quad w_{j+L} = 1/\big(2(L+\kappa)\big)    (5-7)

where $w_j$ is the weight of the corresponding sample, $\kappa \in \mathbb{R}$ is a parameter that controls the scatter of the samples, and $(\sqrt{(L+\kappa)P^a_k})_j$ is the $j$-th column vector of a square root of the matrix $(L+\kappa)P^a_k$.
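A sketch of this sampling scheme (ours; using a Cholesky factor as the matrix square root is an implementation choice):

```python
import numpy as np
from scipy.linalg import cholesky

def sigma_points(p_a, kappa=0.5):
    """Sigma points and weights of Equations 5-5 to 5-7 for the zero-mean
    augmented state with covariance p_a (here L = 24)."""
    L = p_a.shape[0]
    s = cholesky((L + kappa) * p_a, lower=True)   # square root of (L+k)P^a
    pts, wts = [np.zeros(L)], [kappa / (L + kappa)]
    for j in range(L):
        pts.extend([s[:, j], -s[:, j]])
        wts.extend([1.0 / (2 * (L + kappa))] * 2)
    return np.array(pts), np.array(wts)
```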

Since the samples $\mathcal{X}_{k,j} = [u^{(1),t}_{k,j}, u^{(2),t}_{k,j}, v^{(1),t}_{k,j}, v^{(2),t}_{k,j}]^t$ are generated from the joint distribution of the posterior and the state transition at step $k$, we can obtain the samples from the predictive distribution at step $k+1$ from the $\mathcal{X}_{k,j}$ through a two-step procedure. First we obtain the samples from the posterior,

D^{(i)}_{k,j} = \mathrm{Exp}_{\hat{D}^{(i)}_k}\big(u^{(i)}_{k,j}\big)    (5-8)

where $\hat{D}^{(i)}_k$ is the state estimate from the last iteration (the estimator of $E_K(D^{(i)}_k)$). Then the samples from the predictive distribution are generated from $D^{(i)}_{k,j}$ and $v^{(i)}_{k,j}$:

D^{(i)}_{k+1,j} = \mathrm{Exp}_{D^{(i)}_{k,j}}\big(v^{(i)}_{k,j}\big)    (5-9)

where $D^{(i)}_{k+1,j}$ denotes the $j$-th sample from the predictive distribution. The predicted mean is computed as the weighted Karcher mean,

\hat{D}^{(i)}_{k+1} = \mathrm{argmin}_{D \in P_3} \sum_j w_j\, \mathrm{dist}^2\big(D, D^{(i)}_{k+1,j}\big)    (5-10)

The predicted covariance of the states is computed in the product space $T_{\hat{D}^{(1)}_{k+1}} P_3 \times T_{\hat{D}^{(2)}_{k+1}} P_3$:

P_{k+1,DD} = \sum_j w_j\, U_j U_j^t    (5-11)

where $U_j^t = [\mathrm{Log}_{\hat{D}^{(1)}_{k+1}}(D^{(1)}_{k+1,j})^t,\, \mathrm{Log}_{\hat{D}^{(2)}_{k+1}}(D^{(2)}_{k+1,j})^t]$ is the concatenation of the two vectors obtained from the Log-map of each predicted sample.
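The weighted Karcher mean in Equation 5-10 has no closed form but can be computed by a standard fixed-point iteration; a self-contained sketch (ours):

```python
import numpy as np
from scipy.linalg import expm, logm, fractional_matrix_power

def weighted_karcher_mean(mats, weights, iters=20, tol=1e-10):
    """Weighted Karcher mean on P_n (Equation 5-10): repeatedly average the
    Log-mapped samples in the tangent space at the current estimate and move
    back along the exponential map. Weights are assumed to sum to one."""
    m = mats[0]
    for _ in range(iters):
        s = fractional_matrix_power(m, 0.5)
        s_inv = fractional_matrix_power(m, -0.5)
        t = sum(w * np.real(logm(s_inv @ x @ s_inv))
                for w, x in zip(weights, mats))
        m = s @ expm(t) @ s                  # geodesic step toward the mean
        if np.linalg.norm(t, 'fro') < tol:
            break
    return m
```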

Applying the observation model defined in Equation 5-3 to the predicted state samples, we get the predicted vectors of MR signals for the different magnetic gradients, denoted $S_{k+1,j}$. Because these lie in a vector space, we can use standard vector operations: the predicted mean $\hat{S}_{k+1}$ is computed as the weighted average of the $S_{k+1,j}$, and, using the observation noise covariance $R$, the predicted observation covariance is

P_{k+1,SS} = R + \sum_j w_j \big(S_{k+1,j} - \hat{S}_{k+1}\big)\big(S_{k+1,j} - \hat{S}_{k+1}\big)^t    (5-12)

Also, the cross-correlation matrix between the observations and the states is given by

P_{k+1,DS} = \sum_j w_j\, U_j \big(S_{k+1,j} - \hat{S}_{k+1}\big)^t    (5-13)

In the update step, the Kalman gain is computed as $K_{k+1} = P_{k+1,DS}\, P_{k+1,SS}^{-1}$. Knowing the Kalman gain, we can update the states and the covariance:

\hat{D}^{(i)}_{k+1} \leftarrow \mathrm{Exp}_{\hat{D}^{(i)}_{k+1}}\big(z^{(i)}_{k+1}\big), \qquad P_{k+1,DD} \leftarrow P_{k+1,DD} - K_{k+1}\, P_{k+1,SS}\, K_{k+1}^t    (5-14)

where $[z^{(1),t}_{k+1}, z^{(2),t}_{k+1}]^t = K_{k+1}(S_{k+1} - \hat{S}_{k+1})$, and $S_{k+1}$ is the observation (MR signal vector) at step $k+1$.

5.3 Experiments

To validate our tractography, we applied the IUKF to HARDI scans of rat cervical spinal cords at C3–C5. In this experiment, 8 different rats were included, 6 of them healthy and 2 injured, with the injury in the thoracic spinal cord. The HARDI scan of each rat was acquired with one $S_0$ image (taken with $b$ close to zero) and 21 different diffusion gradients with $b = 1000\,\mathrm{s/mm^2}$, $\Delta = 13.4\,\mathrm{ms}$ and $\delta = 1.8\,\mathrm{ms}$. The voxel size of the scan is $35\,\mu\mathrm{m} \times 35\,\mu\mathrm{m} \times 300\,\mu\mathrm{m}$; the image resolution is $128 \times 128$ in the x-y plane, and the number of slices in the z-direction varies from 24 to 34. All HARDI datasets were aligned into the same coordinate system by a similarity transform before tracking. To initialize the algorithm, for each scan we first placed a seed point at each voxel of the grey matter, and a 2nd-order tensor estimation was employed as the initialization. In the experiment, the parameters were set as follows: the state transition noise variance in Equation 5-1, $Q_1 = Q_2 = 0.1\,I$; the observation noise variance, $R = 0.03\,I$; and the size of each tracking step, $\delta t = 0.01\,\mathrm{mm}$. The algorithm stops if the angle between two consecutive tangent vectors becomes larger than 60 degrees or the fiber tract arrives at the boundary of the spinal cord.

Figure 5-1. Fiber tracking results on real datasets. Figure (a) is the region of interest overlaid on the $S_0$ image. Figures (b) & (c) are the fiber tracking results of a healthy and an injured rat, respectively, overlaid on $S_0$, where the fibers are colored by their local direction, with x, y, z encoded by R, G, B. © [2011] IEEE

The fiber bundle of interest is the motoneuron bundle, which starts from the gray matter and ends at the boundary of the spinal cord. To visualize the motoneuron fiber bundle, we chose corresponding ROIs such that only the fibers passing through the ROIs are displayed. The results are shown in Figure 5-1, where we can see fiber bundles starting from the gray matter and ending at the boundary of the spinal cord. The differences between the injured and control rats are not easily seen directly. To visualize the difference between the healthy and injured rats, we first computed an axonal fiber density map for each rat by counting the number of fibers passing through the 3-by-3 neighborhood of each voxel. We then non-linearly deformed the density maps to a spinal cord atlas derived from HARDI data [20] and performed a voxel-wise t-test analysis. The results are shown in Figure 5-2, where we find significant differences between the healthy and the injured rats in the motoneuron region, which demonstrates the effectiveness of our tracking method.

Figure 5-2. Biomarkers captured by computing a density map for each fiber bundle. Figures (a) & (b) show a sample slice of the fiber density maps obtained for a control and an injured rat, respectively. Figure (c) is the region in which the p-value is less than 0.005, overlaid on the $S_0$ image. © [2012] IEEE

CHAPTER 6
ATLAS CONSTRUCTION FOR HARDI DATASET REPRESENTED BY GAUSSIAN MIXTURE FIELDS

6.1 Background and Previous Work

Groupwise image registration and image atlas construction are important and challenging tasks in medical image analysis, with many applications in image segmentation and in the statistical analysis of groups of subject images. Several research groups have tackled variations of this problem and reported their results in the literature [36, 47, 57, 82]; most of these works address groupwise registration and atlas construction from scalar images or segmented shapes. Diffusion-Weighted MR Imaging (DW-MRI) is a powerful noninvasive technique that captures information about water diffusion in tissue and thus infers its structure in vivo. Several methods have been reported in the literature to model and estimate the diffusivity functions from the MRI signals. One popular method is the so-called Diffusion Tensor Imaging (DTI) [12], which approximates the diffusivity function at a voxel by a positive definite matrix. A DTI-based atlas obviously provides more information than a conventional scalar image based atlas [55], since DTI contains both scalar and directional information. Atlas construction requires the DTI data to be groupwise registered, and in this regard, until recently, most of the DTI registration techniques reported in the literature were pairwise registration methods [10, 15, 86, 88]. Some of the existing DTI-based atlases are built by co-registration techniques, as in [45], but a DTI-based groupwise registration and atlas construction method was reported in [90]. It is, however, well known that the DTI model cannot resolve complex tissue structure such as fiber crossings. To handle this problem, several higher-order models [7, 26, 35, 56] based on High Angular Resolution Diffusion Imaging (HARDI) datasets were reported in the literature. Several recent works address HARDI pairwise registration [8, 10, 19, 30] and have been shown to outperform DTI-based registration, especially in aligning fiber-crossing regions [10]. But very few works have been reported on

groupwise registration for HARDI datasets, except for the 4th-order tensor field based groupwise registration reported in [8], which extended the unbiased atlas construction technique of [36] to handle 4th-order tensor fields using novel distances. In this dissertation, we present a novel atlas construction method for HARDI datasets represented by Gaussian Mixture Fields (GMFs), generated by the algorithm described in [35]. A GMF is a field with a zero-mean 3D Gaussian mixture model at each lattice point. We use the L² distance between GMFs to measure the dissimilarity between two Gaussian mixture fields, as defined in [19], and we significantly extend the framework of [36] to construct the atlas from a set of GMFs. A novel mean-GMF computation method is also presented as part of the groupwise registration process. The key contributions are: 1. a GMF-based groupwise registration, which is the first of its kind; 2. an objective function involving the L² distance between Gaussian mixtures, which leads to closed-form expressions for the distance and the gradient computation; 3. a minimal-distance projection, defined and used to obtain a sharp (non-blurry) atlas, which is useful in atlas-based segmentation. Experiments along with comparisons are presented to demonstrate the performance of our algorithm. The rest of the chapter is organized as follows: the method for atlas construction on GMFs is presented in Section 6.2, with the atlas construction framework in Section 6.2.1, the distance metric in Section 6.2.2, and the implementation and mean-GMF computation in Section 6.2.3. In Section 6.3, we present synthetic and real data experiments along with comparisons to other existing methods.

6.2 Methods

6.2.1 Image Atlas Construction Framework

An atlas of a set of images/shapes is commonly defined as an average over the set, taken to be a representative of the set. The problem with simply taking an average as the atlas is that the average tends to be rather blurred and is not effective

for use in tasks such as atlas-based segmentation or atlas-based registration. Mathematically speaking, this can be caused by the fact that the average may not necessarily belong to the abstract space (e.g., the space of brain images) defined by the original data set. For instance, the technique described in [36] searches for the average in the image space without a constraint on the space of images, which can lead to a blurred atlas image. To solve this problem, in [47] the atlas is constrained to be deformed diffeomorphically from a super-template which needs to be pre-selected, and in [31, 57, 85] the structure of the subject image space is learned from the dataset and used to compute the atlas. These methods need registrations between all the image pairs in the dataset (O(N²) registrations), which makes the approach computationally expensive for large datasets.

Here we define the space of images of interest to us (spinal cord images) to be spanned by a set of GMFs $\{I_n\}_{n=1}^N$ and denoted $S = \cup_n O(I_n)$, where $O(I_n) = \{J : J = I_n \circ T,\ T \in \mathrm{Diff}\}$ is the orbit spanned by the image $I_n$ under all diffeomorphic deformations $T : \Omega \to \Omega$, with $\Omega$ denoting the domain of the image. Thus, finding the atlas $\hat{I}$ can be viewed as solving the following problem:

m^*, T_1, \ldots = \mathrm{argmin}_{m, T_1, \ldots} \sum_n E(I_n \circ T_n,\ I_m \circ T_m) + \phi(T_n)

and the final atlas could be defined as $\hat{I} = \nabla_m^{-1}[I_m \circ T_m]$, where $\nabla_m$ is the Jacobian of the deformation $T_m$ and $\nabla_m^{-1}[\cdot]$ denotes the re-orientation operation discussed in Section 6.2.2. Solving this problem directly would have a computational complexity similar to O(N²) pairwise registrations. What we would like to do is to achieve an approximate solution using a two-step procedure.

In the first step, we find an intermediate atlas in the space of all images, which can be viewed as solving the optimization problem (similar to [36], but generalized to GMFs)

I^*, T_1, \ldots = \mathrm{argmin}_{I, T_1, \ldots} \sum_n E(I_n \circ T_n, I) + \phi(T_n)    (6-1)

where the data term energy function $E(\cdot, \cdot)$ is defined as a sum of squared voxel-wise distances (details in Section 6.2.2),

E(I_n \circ T_n, I) = \int_\Omega \mathrm{dist}^2\big(I_n \circ T_n(x),\ I(x)\big)\, dx    (6-2)

and $\phi$ is the penalty term used to enforce the smoothness constraint on the deformation. The deformation is modeled as a diffeomorphism, parametrized by a velocity field $\partial T_n/\partial t = v_n(T_n(x, t), t)$; the deformation can thus be computed as $T_n(x) = x + d_n(x) = x + \int_0^1 v_n(x(t), t)\, dt$, where $d_n$ represents the displacement field. The smoothness constraint we use here is given by

\phi(T_n) = \lambda \int_\Omega \log\big(\det(\nabla_n)\big)\big(1 - \det(\nabla_n)\big)\, dx + \int_0^1 \|L v_n(x, t)\|^2\, dt    (6-3)

where $L$ is a linear operator and the first term in Equation 6-3 imposes additional smoothness as in [87].

In the second step, we project the intermediate atlas onto the space $S$ by solving another distance minimization:

\hat{I} = \nabla_m^{-1}[I_m \circ T^*], \qquad (m, T^*) = \mathrm{argmin}_{n, T}\ E(I_n \circ T,\ I^*)    (6-4)

and the projection result $\hat{I}$ is our final atlas.

6.2.2 L² Distance and Re-orientation for GMs

We use the L² distance as a dissimilarity measure between two Gaussian mixture densities (GMs), which can be computed in closed form [19]. Let $f(r) = \sum_{i=1}^M \eta_i\, G(r; 0, \Sigma_i)$ and $g(r) = \sum_{j=1}^N \rho_j\, G(r; 0, \Sigma_j)$ be two Gaussian mixture density functions, where $r \in \mathbb{R}^3$ is the displacement vector and $\eta_i, \rho_j$ denote the mixture weights of the corresponding Gaussian components $G(r; 0, \Sigma_i)$ and $G(r; 0, \Sigma_j)$ with covariance matrices $\Sigma_i$ and $\Sigma_j$, respectively. The L² distance between $f$ and $g$ can be written as a quadratic function of the mixture weights, $\mathrm{dist}^2(f, g) = \eta^t A \eta + \rho^t B \rho - 2\eta^t C \rho$, where $\eta = (\eta_1, \ldots, \eta_M)^t$, $\rho = (\rho_1, \ldots, \rho_N)^t$, and $A$ ($M \times M$), $B$ ($N \times N$) and $C$ ($M \times N$) are the matrices generated by the Gaussian components; see [19] for details.
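The entries of $A$, $B$ and $C$ are pairwise L² inner products of zero-mean Gaussians, which are available in closed form. A sketch of the resulting distance computation (ours, without the re-orientation terms discussed next):

```python
import numpy as np

def gauss_inner(p, q):
    """L2 inner product of two zero-mean Gaussians in R^d:
    <G(.;0,P), G(.;0,Q)> = G(0; 0, P + Q)."""
    d = p.shape[0]
    return 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(p + q))

def gm_l2_dist2(eta, sigmas_f, rho, sigmas_g):
    """dist^2(f, g) = eta^t A eta + rho^t B rho - 2 eta^t C rho."""
    A = np.array([[gauss_inner(p, q) for q in sigmas_f] for p in sigmas_f])
    B = np.array([[gauss_inner(p, q) for q in sigmas_g] for p in sigmas_g])
    C = np.array([[gauss_inner(p, q) for q in sigmas_g] for p in sigmas_f])
    return eta @ A @ eta + rho @ B @ rho - 2 * eta @ C @ rho
```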

A re-orientation operation is needed for image transformations when the image value at each pixel/voxel is related to the image coordinates and is not rotationally invariant [2]. In this case, the image value should change according to the image coordinates; otherwise, artifacts might be introduced by the transformation. As in [19], the Preservation of Principal Direction (PPD) re-orientation is extended to GMs; it is the only re-orientation strategy that can capture the change of the angle between crossing fibers under a non-rigid transformation. In this dissertation, we adopt this re-orientation strategy, and the energy function with re-orientation becomes $\mathrm{dist}^2(f, g) = \eta^t A_1 \eta + \rho^t B \rho - 2\eta^t C_1 \rho$, where the computation of $A_1$ and $C_1$ can be found in [19].

6.2.3 Mean GMF Computation

We employ an iterative greedy algorithm to solve the problem in Equation 6-1. Given an initialization of the atlas $I$ and the deformations $\{T_n\}$, in each iteration step we first fix the deformations and update the atlas by optimizing with respect to $I$:

I_{\mathrm{new}} = \mathrm{argmin}_I \sum_n \int_\Omega \mathrm{dist}^2\big(I_n \circ T_n(x),\ I(x)\big)\, dx

Since we use an L² distance, the global minimum is the voxel-wise average $I_{\mathrm{new}} = \frac{1}{N}\sum_n \nabla_n^{-1}[I_n \circ T_n]$. However, with this formula, $I_{\mathrm{new}}(x)$ would have many more components than each $I_n(x)$, which would make the algorithm computationally expensive. To solve this problem, we fix the set of mixture components of $I_{\mathrm{new}}$: since each mixture component is zero-mean and cylindrically symmetric with fixed eigenvalues, all we need to decide are the eigenvectors. In this dissertation, we discretize the hemisphere into 46 different directions and use them as the eigenvectors (the same approach has been used in the reconstruction method [35]). The only remaining unknowns are the mixing weights, which are obtained by solving, at each voxel, the linear system $A\rho = \frac{1}{N}\sum_n C_{1,n}^t \eta_n$, with $A$ the Gram matrix of the fixed atlas components. This is easy to solve since $A$ is low-dimensional and full rank.
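A sketch of this per-voxel weight update (ours; it reuses gauss_inner from the previous listing and omits the re-orientation, i.e. it uses $C$ in place of $C_1$):

```python
import numpy as np

def update_atlas_weights(atlas_sigmas, etas, subj_sigmas):
    """Solve A rho = (1/N) sum_n C_n^t eta_n at one voxel, with A the Gram
    matrix of the 46 fixed atlas components and C_n the cross matrix with
    the n-th deformed subject mixture at this voxel."""
    A = np.array([[gauss_inner(p, q) for q in atlas_sigmas]
                  for p in atlas_sigmas])
    rhs = np.zeros(len(atlas_sigmas))
    for eta_n, sig_n in zip(etas, subj_sigmas):
        C = np.array([[gauss_inner(p, q) for q in atlas_sigmas]
                      for p in sig_n])
        rhs += C.T @ eta_n
    rhs /= len(etas)
    return np.linalg.solve(A, rhs)        # mixture weights rho at this voxel
```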

After updating the atlas, the force field is computed as the first-order variation of the data term of the objective function plus the first term on the right (enforcing additional smoothness) in Equation 6-3. The velocity field is then updated as a Gaussian-kernel-smoothed version of the force field [8], and the deformation field $T(x)$ is updated using the following update equation: $T_{\mathrm{new}}(x) = T_{\mathrm{old}}(x + \epsilon v)$. The derivative of the objective function is computed by applying the derivative chain rule as in [19], and the derivative of the regularization term in Equation 6-3 is computed directly using the derivative of the determinant. We employ a coarse-to-fine strategy in our registration algorithm for an efficient implementation. With the deformations initialized to the identity, the algorithm yields satisfactory results within 200 steps. After we get the atlas $I^*$, we can project it onto $S$ to get $\hat{I}$ by using Equation 6-4.

6.3 Experiments

6.3.1 Synthetic Data Experiments

To validate the registration framework used in our atlas construction method, we first applied our method to the pairwise registration problem. For the two-image case, Equation 6-1 reduces to

T_1, T_2 = \mathrm{argmin}_{T_1, T_2}\ E(I_1 \circ T_1,\ I_2 \circ T_2) + \phi(T_1) + \phi(T_2)    (6-5)

By using the displacement field inversion method [25], we can get $T = T_1 \circ T_2^{-1} : I_2 \to I_1$; thus we have a pairwise registration algorithm. We applied this algorithm to the synthetic dataset and compared it to a GA-based registration algorithm (using an SSD cost) and the DTI-based registration algorithm of [88] on the same dataset. To generate the synthetic dataset, a 3D synthetic image with two crossing fiber bundles was generated, and 20 randomly deformed images were synthesized from it by using a B-spline based non-rigid deformation. The method described in [66] was used to generate the simulated MR signals from the fiber bundles.

Figure 6-1. Experimental results on the synthetic dataset. Figures (a) and (b) are the mean and standard deviation of the error over the 20 registrations for all three methods, at different noise levels, in the two different ROIs. © [2011] IEEE

Rician noise was added to simulate data at 4 different noise levels, with SNR = 50, 20, 10 and 5. The method in [35] was used to generate the GMFs from the MR signals, with 46 Gaussian components at each voxel. After data generation, we registered each of the randomly deformed images (source image) to the original image (target image) separately. To evaluate the registration, the resulting deformation was applied to the noise-free source image, and the dissimilarity between the deformed source and target images was computed as the registration error. The dissimilarity measure used here is the Hellinger distance between the displacement probability profiles (represented in spherical harmonic coefficients) at corresponding lattice points,

r = \int_{S^2} \big(\sqrt{f(x)} - \sqrt{g(x)}\big)^2\, dx

We also computed the registration error in two different regions: (1) the whole image, and (2) the region that contains only the crossing fibers. The datasets and results are displayed in Figure 6-1. Figures (a) and (b) show that our method yields a slightly lower mean and standard deviation of the registration errors over the whole image, and a much lower error in the fiber-crossing region, at all four noise levels. This demonstrates the accuracy of our HARDI registration method.
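This error measure can be approximated by quadrature over a set of unit directions; a minimal sketch (ours, assuming the two profiles have already been evaluated at the same directions; the spherical-harmonic evaluation itself is omitted):

```python
import numpy as np

def hellinger_error(f_vals, g_vals):
    """Approximate r = int_{S^2} (sqrt(f) - sqrt(g))^2 dx by averaging over
    sampled directions and scaling by the surface area of S^2."""
    diff2 = (np.sqrt(f_vals) - np.sqrt(g_vals)) ** 2
    return 4.0 * np.pi * diff2.mean()
```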

6.3.2 Real Data Experiments

For the real data experiments, we applied our method to HARDI scans of the rat spinal cord at C3–C5 for 7 different rats. In each scan, 21 diffusion gradients were used (with $\Delta$ and $\delta$ set to 13.4 ms and 1.8 ms) at $b = 1000\,\mathrm{s/mm^2}$, and one $S_0$ image was taken with $b$ close to zero. The number of slices varies from 25 to 34, and the voxel size is $35\,\mu\mathrm{m} \times 35\,\mu\mathrm{m} \times 300\,\mu\mathrm{m}$. The method of [35] was used to generate the GMFs from the MR signals, with 46 Gaussian components at each voxel. We first applied a similarity registration to quotient out the translation, rotation and scaling factors, and then applied our groupwise registration algorithm to the dataset to get an atlas $I^*$. Following this, we projected $I^*$ onto the space spanned by the given data samples using Equation 6-4 to get the sharp (non-blurry) atlas $\hat{I}$. The results of our atlas construction method are depicted in Figure 6-2 (via $S_0$ images, even though the algorithm was applied to the GMF representation of the HARDI data), where (a) is the voxel-wise mean of the $S_0$ images before registration, and (b) after registration. We can see that (a) is fuzzy, because the structures are not well aligned, while (b) is not; this indicates the effectiveness of our groupwise registration method. (c) is the $I_m$ in Equation 6-4, and (d) is the final atlas $\hat{I}$. We can see that (b) is much more blurry than (d), while the shape of the boundary between white and grey matter is nearly the same in (b) and (d). This indicates that $\hat{I}$ is a good representative of the whole dataset, and thus justifies the effectiveness of our method.

Figure 6-2. Experimental results on real datasets depicted using $S_0$ images. Figures (a)–(d) are all $S_0$ images, with (a) the voxel-wise mean before groupwise registration, and (b) after registration. (c) is the $S_0$ image for $I_m$ and (d) is the $S_0$ image for the final atlas $\hat{I}$. Figure (f) is the diffusion profile of $\hat{I}$, colored by the maximum direction of the diffusion profile, with the x, y, z directions mapped to R, G, B. © [2011] IEEE


More information

MATHEMATICAL entities that do not form Euclidean

MATHEMATICAL entities that do not form Euclidean Kernel Methods on Riemannian Manifolds with Gaussian RBF Kernels Sadeep Jayasumana, Student Member, IEEE, Richard Hartley, Fellow, IEEE, Mathieu Salzmann, Member, IEEE, Hongdong Li, Member, IEEE, and Mehrtash

More information

Fast and Accurate HARDI and its Application to Neurological Diagnosis

Fast and Accurate HARDI and its Application to Neurological Diagnosis Fast and Accurate HARDI and its Application to Neurological Diagnosis Dr. Oleg Michailovich Department of Electrical and Computer Engineering University of Waterloo June 21, 2011 Outline 1 Diffusion imaging

More information

Bayesian Principal Geodesic Analysis in Diffeomorphic Image Registration

Bayesian Principal Geodesic Analysis in Diffeomorphic Image Registration Bayesian Principal Geodesic Analysis in Diffeomorphic Image Registration Miaomiao Zhang and P. Thomas Fletcher School of Computing, University of Utah, Salt Lake City, USA Abstract. Computing a concise

More information

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning Christoph Lampert Spring Semester 2015/2016 // Lecture 12 1 / 36 Unsupervised Learning Dimensionality Reduction 2 / 36 Dimensionality Reduction Given: data X = {x 1,..., x

More information

Extending a two-variable mean to a multi-variable mean

Extending a two-variable mean to a multi-variable mean Extending a two-variable mean to a multi-variable mean Estelle M. Massart, Julien M. Hendrickx, P.-A. Absil Universit e catholique de Louvain - ICTEAM Institute B-348 Louvain-la-Neuve - Belgium Abstract.

More information

The nonsmooth Newton method on Riemannian manifolds

The nonsmooth Newton method on Riemannian manifolds The nonsmooth Newton method on Riemannian manifolds C. Lageman, U. Helmke, J.H. Manton 1 Introduction Solving nonlinear equations in Euclidean space is a frequently occurring problem in optimization and

More information

Multivariate General Linear Models (MGLM) on Riemannian Manifolds

Multivariate General Linear Models (MGLM) on Riemannian Manifolds Multivariate General Linear Models (MGLM) on Riemannian Manifolds Hyunwoo J. Kim, Nagesh Adluru, Maxwell D. Collins, Moo K. Chung,! Barbara B. Bendlin, Sterling C. Johnson, Richard J. Davidson, Vikas Singh

More information

Anisotropy of HARDI Diffusion Profiles Based on the L 2 -Norm

Anisotropy of HARDI Diffusion Profiles Based on the L 2 -Norm Anisotropy of HARDI Diffusion Profiles Based on the L 2 -Norm Philipp Landgraf 1, Dorit Merhof 1, Mirco Richter 1 1 Institute of Computer Science, Visual Computing Group, University of Konstanz philipp.landgraf@uni-konstanz.de

More information

A Recursive Filter For Linear Systems on Riemannian Manifolds

A Recursive Filter For Linear Systems on Riemannian Manifolds A Recursive Filter For Linear Systems on Riemannian Manifolds Ambrish Tyagi James W. Davis Dept. of Computer Science and Engineering The Ohio State University Columbus OH 43210 {tyagia,jwdavis}@cse.ohio-state.edu

More information

Clustering. CSL465/603 - Fall 2016 Narayanan C Krishnan

Clustering. CSL465/603 - Fall 2016 Narayanan C Krishnan Clustering CSL465/603 - Fall 2016 Narayanan C Krishnan ckn@iitrpr.ac.in Supervised vs Unsupervised Learning Supervised learning Given x ", y " "%& ', learn a function f: X Y Categorical output classification

More information

CIND Pre-Processing Pipeline For Diffusion Tensor Imaging. Overview

CIND Pre-Processing Pipeline For Diffusion Tensor Imaging. Overview CIND Pre-Processing Pipeline For Diffusion Tensor Imaging Overview The preprocessing pipeline of the Center for Imaging of Neurodegenerative Diseases (CIND) prepares diffusion weighted images (DWI) and

More information

Consensus Algorithms for Camera Sensor Networks. Roberto Tron Vision, Dynamics and Learning Lab Johns Hopkins University

Consensus Algorithms for Camera Sensor Networks. Roberto Tron Vision, Dynamics and Learning Lab Johns Hopkins University Consensus Algorithms for Camera Sensor Networks Roberto Tron Vision, Dynamics and Learning Lab Johns Hopkins University Camera Sensor Networks Motes Small, battery powered Embedded camera Wireless interface

More information

2D Image Processing (Extended) Kalman and particle filter

2D Image Processing (Extended) Kalman and particle filter 2D Image Processing (Extended) Kalman and particle filter Prof. Didier Stricker Dr. Gabriele Bleser Kaiserlautern University http://ags.cs.uni-kl.de/ DFKI Deutsches Forschungszentrum für Künstliche Intelligenz

More information

From Diffusion Data to Bundle Analysis

From Diffusion Data to Bundle Analysis From Diffusion Data to Bundle Analysis Gabriel Girard gabriel.girard@epfl.ch Computational Brain Connectivity Mapping Juan-les-Pins, France 20 November 2017 Gabriel Girard gabriel.girard@epfl.ch CoBCoM2017

More information

The Square Root Velocity Framework for Curves in a Homogeneous Space

The Square Root Velocity Framework for Curves in a Homogeneous Space The Square Root Velocity Framework for Curves in a Homogeneous Space Zhe Su 1 Eric Klassen 1 Martin Bauer 1 1 Florida State University October 8, 2017 Zhe Su (FSU) SRVF for Curves in a Homogeneous Space

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

Human Pose Tracking I: Basics. David Fleet University of Toronto

Human Pose Tracking I: Basics. David Fleet University of Toronto Human Pose Tracking I: Basics David Fleet University of Toronto CIFAR Summer School, 2009 Looking at People Challenges: Complex pose / motion People have many degrees of freedom, comprising an articulated

More information

Focus was on solving matrix inversion problems Now we look at other properties of matrices Useful when A represents a transformations.

Focus was on solving matrix inversion problems Now we look at other properties of matrices Useful when A represents a transformations. Previously Focus was on solving matrix inversion problems Now we look at other properties of matrices Useful when A represents a transformations y = Ax Or A simply represents data Notion of eigenvectors,

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 218 Outlines Overview Introduction Linear Algebra Probability Linear Regression 1

More information

Symmetric Positive 4 th Order Tensors & Their Estimation from Diffusion Weighted MRI

Symmetric Positive 4 th Order Tensors & Their Estimation from Diffusion Weighted MRI Symmetric Positive 4 th Order Tensors & Their Estimation from Diffusion Weighted MRI Angelos Barmpoutis 1, Bing Jian 1,BabaC.Vemuri 1, and Timothy M. Shepherd 2 1 Computer and Information Science and Engineering,

More information

Introduction to Machine Learning

Introduction to Machine Learning 10-701 Introduction to Machine Learning PCA Slides based on 18-661 Fall 2018 PCA Raw data can be Complex, High-dimensional To understand a phenomenon we measure various related quantities If we knew what

More information

Neuroimage Processing

Neuroimage Processing Neuroimage Processing Instructor: Moo K. Chung mkchung@wisc.edu Lecture 10-11. Deformation-based morphometry (DBM) Tensor-based morphometry (TBM) November 13, 2009 Image Registration Process of transforming

More information

Ordinary Least Squares and its applications

Ordinary Least Squares and its applications Ordinary Least Squares and its applications Dr. Mauro Zucchelli University Of Verona December 5, 2016 Dr. Mauro Zucchelli Ordinary Least Squares and its applications December 5, 2016 1 / 48 Contents 1

More information

ROBUST STATISTICS OVER RIEMANNIAN MANIFOLDS FOR COMPUTER VISION

ROBUST STATISTICS OVER RIEMANNIAN MANIFOLDS FOR COMPUTER VISION ROBUST STATISTICS OVER RIEMANNIAN MANIFOLDS FOR COMPUTER VISION BY RAGHAV SUBBARAO A dissertation submitted to the Graduate School New Brunswick Rutgers, The State University of New Jersey in partial fulfillment

More information

Generative Models and Stochastic Algorithms for Population Average Estimation and Image Analysis

Generative Models and Stochastic Algorithms for Population Average Estimation and Image Analysis Generative Models and Stochastic Algorithms for Population Average Estimation and Image Analysis Stéphanie Allassonnière CIS, JHU July, 15th 28 Context : Computational Anatomy Context and motivations :

More information

Fast Geodesic Regression for Population-Based Image Analysis

Fast Geodesic Regression for Population-Based Image Analysis Fast Geodesic Regression for Population-Based Image Analysis Yi Hong 1, Polina Golland 2, and Miaomiao Zhang 2 1 Computer Science Department, University of Georgia 2 Computer Science and Artificial Intelligence

More information

Methods in Computer Vision: Introduction to Matrix Lie Groups

Methods in Computer Vision: Introduction to Matrix Lie Groups Methods in Computer Vision: Introduction to Matrix Lie Groups Oren Freifeld Computer Science, Ben-Gurion University June 14, 2017 June 14, 2017 1 / 46 Definition and Basic Properties Definition (Matrix

More information

Riemannian geometry of surfaces

Riemannian geometry of surfaces Riemannian geometry of surfaces In this note, we will learn how to make sense of the concepts of differential geometry on a surface M, which is not necessarily situated in R 3. This intrinsic approach

More information

Tract-Specific Analysis for DTI of Brain White Matter

Tract-Specific Analysis for DTI of Brain White Matter Tract-Specific Analysis for DTI of Brain White Matter Paul Yushkevich, Hui Zhang, James Gee Penn Image Computing & Science Lab Department of Radiology University of Pennsylvania IPAM Summer School July

More information

Grassmann Averages for Scalable Robust PCA Supplementary Material

Grassmann Averages for Scalable Robust PCA Supplementary Material Grassmann Averages for Scalable Robust PCA Supplementary Material Søren Hauberg DTU Compute Lyngby, Denmark sohau@dtu.dk Aasa Feragen DIKU and MPIs Tübingen Denmark and Germany aasa@diku.dk Michael J.

More information

Gaussian Mixture Distance for Information Retrieval

Gaussian Mixture Distance for Information Retrieval Gaussian Mixture Distance for Information Retrieval X.Q. Li and I. King fxqli, ingg@cse.cuh.edu.h Department of omputer Science & Engineering The hinese University of Hong Kong Shatin, New Territories,

More information

DIFFUSION MAGNETIC RESONANCE IMAGING

DIFFUSION MAGNETIC RESONANCE IMAGING DIFFUSION MAGNETIC RESONANCE IMAGING from spectroscopy to imaging apparent diffusion coefficient ADC-Map anisotropy diffusion tensor (imaging) DIFFUSION NMR - FROM SPECTROSCOPY TO IMAGING Combining Diffusion

More information

Linear Dynamical Systems

Linear Dynamical Systems Linear Dynamical Systems Sargur N. srihari@cedar.buffalo.edu Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/cse574/index.html Two Models Described by Same Graph Latent variables Observations

More information

Differential Geometry for Image Processing

Differential Geometry for Image Processing MSc TU/e Course: Differential Geometry for Image Processing Teachers: R. Duits MF 7.072 (responsible teacher, lecturer) E.J. Bekkers MF 7.074 (teacher, instructor) Course Description: This MSc course aims

More information

The Log-Euclidean Framework Applied to SPD Matrices and Polyaffine Transformations

The Log-Euclidean Framework Applied to SPD Matrices and Polyaffine Transformations Chapter 8 The Log-Euclidean Framework Applied to SPD Matrices and Polyaffine Transformations 8.1 Introduction In this Chapter, we use what we have learned in previous chapters to describe an approach due

More information

Unsupervised Learning Techniques Class 07, 1 March 2006 Andrea Caponnetto

Unsupervised Learning Techniques Class 07, 1 March 2006 Andrea Caponnetto Unsupervised Learning Techniques 9.520 Class 07, 1 March 2006 Andrea Caponnetto About this class Goal To introduce some methods for unsupervised learning: Gaussian Mixtures, K-Means, ISOMAP, HLLE, Laplacian

More information

Choice of Riemannian Metrics for Rigid Body Kinematics

Choice of Riemannian Metrics for Rigid Body Kinematics Choice of Riemannian Metrics for Rigid Body Kinematics Miloš Žefran1, Vijay Kumar 1 and Christopher Croke 2 1 General Robotics and Active Sensory Perception (GRASP) Laboratory 2 Department of Mathematics

More information

Distances, volumes, and integration

Distances, volumes, and integration Distances, volumes, and integration Scribe: Aric Bartle 1 Local Shape of a Surface A question that we may ask ourselves is what significance does the second fundamental form play in the geometric characteristics

More information

ECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction

ECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction ECE 521 Lecture 11 (not on midterm material) 13 February 2017 K-means clustering, Dimensionality reduction With thanks to Ruslan Salakhutdinov for an earlier version of the slides Overview K-means clustering

More information

Quantitative Neuro-Anatomic and Functional Image Assessment Recent progress on image registration and its applications

Quantitative Neuro-Anatomic and Functional Image Assessment Recent progress on image registration and its applications Quantitative Neuro-Anatomic and Functional Image Assessment Recent progress on image registration and its applications Guido Gerig Sarang Joshi Tom Fletcher Applications of image registration in neuroimaging

More information

Loos Symmetric Cones. Jimmie Lawson Louisiana State University. July, 2018

Loos Symmetric Cones. Jimmie Lawson Louisiana State University. July, 2018 Louisiana State University July, 2018 Dedication I would like to dedicate this talk to Joachim Hilgert, whose 60th birthday we celebrate at this conference and with whom I researched and wrote a big blue

More information

Tensor fields. Tensor fields: Outline. Chantal Oberson Ausoni

Tensor fields. Tensor fields: Outline. Chantal Oberson Ausoni Tensor fields Chantal Oberson Ausoni 7.8.2014 ICS Summer school Roscoff - Visualization at the interfaces 28.7-8.8, 2014 1 Tensor fields: Outline 1. TENSOR FIELDS: DEFINITION 2. PROPERTIES OF SECOND-ORDER

More information

Unsupervised Learning with Permuted Data

Unsupervised Learning with Permuted Data Unsupervised Learning with Permuted Data Sergey Kirshner skirshne@ics.uci.edu Sridevi Parise sparise@ics.uci.edu Padhraic Smyth smyth@ics.uci.edu School of Information and Computer Science, University

More information

Adaptive Covariance Tracking with Clustering-based Model Update

Adaptive Covariance Tracking with Clustering-based Model Update Adaptive Covariance Tracking with Clustering-based Model Update Lei Qin 1, Fahed Abdallah 2, and Hichem Snoussi 1 1 ICD/LM2S, UMR CNRS 6279, Université de Technologie de Troyes, Troyes, France 2 HEUDIASYC,

More information

Motion Estimation (I) Ce Liu Microsoft Research New England

Motion Estimation (I) Ce Liu Microsoft Research New England Motion Estimation (I) Ce Liu celiu@microsoft.com Microsoft Research New England We live in a moving world Perceiving, understanding and predicting motion is an important part of our daily lives Motion

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

More information

Multi-Frame Factorization Techniques

Multi-Frame Factorization Techniques Multi-Frame Factorization Techniques Suppose { x j,n } J,N j=1,n=1 is a set of corresponding image coordinates, where the index n = 1,...,N refers to the n th scene point and j = 1,..., J refers to the

More information

Covariance Tracking using Model Update Based on Means on Riemannian Manifolds

Covariance Tracking using Model Update Based on Means on Riemannian Manifolds Covariance Tracking using Model Update Based on Means on Riemannian Manifolds Fatih Porikli Oncel Tuzel Peter Meer Mitsubishi Electric Research Laboratories CS Department & ECE Department Cambridge, MA

More information

Automated Segmentation of Low Light Level Imagery using Poisson MAP- MRF Labelling

Automated Segmentation of Low Light Level Imagery using Poisson MAP- MRF Labelling Automated Segmentation of Low Light Level Imagery using Poisson MAP- MRF Labelling Abstract An automated unsupervised technique, based upon a Bayesian framework, for the segmentation of low light level

More information

Bayesian Registration of Functions with a Gaussian Process Prior

Bayesian Registration of Functions with a Gaussian Process Prior Bayesian Registration of Functions with a Gaussian Process Prior RELATED PROBLEMS Functions Curves Images Surfaces FUNCTIONAL DATA Electrocardiogram Signals (Heart Disease) Gait Pressure Measurements (Parkinson

More information

Information geometry for bivariate distribution control

Information geometry for bivariate distribution control Information geometry for bivariate distribution control C.T.J.Dodson + Hong Wang Mathematics + Control Systems Centre, University of Manchester Institute of Science and Technology Optimal control of stochastic

More information

9 Multi-Model State Estimation

9 Multi-Model State Estimation Technion Israel Institute of Technology, Department of Electrical Engineering Estimation and Identification in Dynamical Systems (048825) Lecture Notes, Fall 2009, Prof. N. Shimkin 9 Multi-Model State

More information

STA 414/2104: Lecture 8

STA 414/2104: Lecture 8 STA 414/2104: Lecture 8 6-7 March 2017: Continuous Latent Variable Models, Neural networks With thanks to Russ Salakhutdinov, Jimmy Ba and others Outline Continuous latent variable models Background PCA

More information

Rician Noise Removal in Diffusion Tensor MRI

Rician Noise Removal in Diffusion Tensor MRI Rician Noise Removal in Diffusion Tensor MRI Saurav Basu, Thomas Fletcher, and Ross Whitaker University of Utah, School of Computing, Salt Lake City, UT 84112, USA Abstract. Rician noise introduces a bias

More information

The line, the circle, and the ray. R + x r. Science is linear, is nt? But behaviors take place in nonlinear spaces. The line The circle The ray

The line, the circle, and the ray. R + x r. Science is linear, is nt? But behaviors take place in nonlinear spaces. The line The circle The ray Science is linear, is nt The line, the circle, and the ray Nonlinear spaces with efficient linearizations R. Sepulchre -- University of Cambridge Francqui Chair UCL, 05 Page rank algorithm Consensus algorithms

More information

IDENTIFICATION AND ANALYSIS OF TIME-VARYING MODAL PARAMETERS

IDENTIFICATION AND ANALYSIS OF TIME-VARYING MODAL PARAMETERS IDENTIFICATION AND ANALYSIS OF TIME-VARYING MODAL PARAMETERS By STEPHEN L. SORLEY A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project

More information

1 Kalman Filter Introduction

1 Kalman Filter Introduction 1 Kalman Filter Introduction You should first read Chapter 1 of Stochastic models, estimation, and control: Volume 1 by Peter S. Maybec (available here). 1.1 Explanation of Equations (1-3) and (1-4) Equation

More information

Feature Extraction and Image Processing

Feature Extraction and Image Processing Feature Extraction and Image Processing Second edition Mark S. Nixon Alberto S. Aguado :*авш JBK IIP AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE SYDNEY TOKYO

More information

Principal Component Analysis

Principal Component Analysis B: Chapter 1 HTF: Chapter 1.5 Principal Component Analysis Barnabás Póczos University of Alberta Nov, 009 Contents Motivation PCA algorithms Applications Face recognition Facial expression recognition

More information

Fantope Regularization in Metric Learning

Fantope Regularization in Metric Learning Fantope Regularization in Metric Learning CVPR 2014 Marc T. Law (LIP6, UPMC), Nicolas Thome (LIP6 - UPMC Sorbonne Universités), Matthieu Cord (LIP6 - UPMC Sorbonne Universités), Paris, France Introduction

More information

Multivariate models of inter-subject anatomical variability

Multivariate models of inter-subject anatomical variability Multivariate models of inter-subject anatomical variability Wellcome Trust Centre for Neuroimaging, UCL Institute of Neurology, 12 Queen Square, London WC1N 3BG, UK. Prediction Binary Classification Curse

More information

Variational methods for restoration of phase or orientation data

Variational methods for restoration of phase or orientation data Variational methods for restoration of phase or orientation data Martin Storath joint works with Laurent Demaret, Michael Unser, Andreas Weinmann Image Analysis and Learning Group Universität Heidelberg

More information

Improved Correspondence for DTI Population Studies Via Unbiased Atlas Building

Improved Correspondence for DTI Population Studies Via Unbiased Atlas Building Improved Correspondence for DTI Population Studies Via Unbiased Atlas Building Casey Goodlett 1,BradDavis 1,2,RemiJean 3, John Gilmore 3, and Guido Gerig 1,3 1 Department of Computer Science, University

More information

Spectral Algorithms I. Slides based on Spectral Mesh Processing Siggraph 2010 course

Spectral Algorithms I. Slides based on Spectral Mesh Processing Siggraph 2010 course Spectral Algorithms I Slides based on Spectral Mesh Processing Siggraph 2010 course Why Spectral? A different way to look at functions on a domain Why Spectral? Better representations lead to simpler solutions

More information

Dimension Reduction Techniques. Presented by Jie (Jerry) Yu

Dimension Reduction Techniques. Presented by Jie (Jerry) Yu Dimension Reduction Techniques Presented by Jie (Jerry) Yu Outline Problem Modeling Review of PCA and MDS Isomap Local Linear Embedding (LLE) Charting Background Advances in data collection and storage

More information

Regularization of Diffusion Tensor Field Using Coupled Robust Anisotropic Diffusion Filters

Regularization of Diffusion Tensor Field Using Coupled Robust Anisotropic Diffusion Filters Regularization of Diffusion Tensor Field Using Coupled Robust Anisotropic Diffusion Filters Songyuan Tang a, Yong Fan a, Hongtu Zhu b, Pew-Thian Yap a Wei Gao a, Weili Lin a, and Dinggang Shen a a Department

More information

Bregman Divergences for Data Mining Meta-Algorithms

Bregman Divergences for Data Mining Meta-Algorithms p.1/?? Bregman Divergences for Data Mining Meta-Algorithms Joydeep Ghosh University of Texas at Austin ghosh@ece.utexas.edu Reflects joint work with Arindam Banerjee, Srujana Merugu, Inderjit Dhillon,

More information

Duke University, Department of Electrical and Computer Engineering Optimization for Scientists and Engineers c Alex Bronstein, 2014

Duke University, Department of Electrical and Computer Engineering Optimization for Scientists and Engineers c Alex Bronstein, 2014 Duke University, Department of Electrical and Computer Engineering Optimization for Scientists and Engineers c Alex Bronstein, 2014 Linear Algebra A Brief Reminder Purpose. The purpose of this document

More information

Mixtures of Gaussians. Sargur Srihari

Mixtures of Gaussians. Sargur Srihari Mixtures of Gaussians Sargur srihari@cedar.buffalo.edu 1 9. Mixture Models and EM 0. Mixture Models Overview 1. K-Means Clustering 2. Mixtures of Gaussians 3. An Alternative View of EM 4. The EM Algorithm

More information

Latent Variable Models and EM algorithm

Latent Variable Models and EM algorithm Latent Variable Models and EM algorithm SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic 3.1 Clustering and Mixture Modelling K-means and hierarchical clustering are non-probabilistic

More information