Bayesian Inference via Approximation of Log-likelihood for Priors in Exponential Family


Tohid Ardeshiri, Umut Orguner, and Fredrik Gustafsson
arXiv v1 [cs.LG], 5 Oct 2015

Abstract - In this paper, a Bayesian inference technique based on Taylor series approximation of the logarithm of the likelihood function is presented. The proposed approximation is devised for the case where the prior distribution belongs to the exponential family of distributions. The logarithm of the likelihood function is linearized with respect to the sufficient statistic of the prior distribution in the exponential family such that the posterior obtains the same exponential family form as the prior. Similarities between the proposed method and the extended Kalman filter for nonlinear filtering are illustrated. Furthermore, an extended target measurement update is derived for target models where the target extent is represented by a random matrix having an inverse Wishart distribution. The approximate update covers the important case where the spread of the measurements is due to the target extent as well as the measurement noise in the sensor.

Index Terms - Approximate Bayesian inference, exponential family, Bayesian graphical models, extended Kalman filter, extended target tracking, group target tracking, random matrices, inverse Wishart.

Tohid Ardeshiri and Fredrik Gustafsson are with the Department of Electrical Engineering, Linköping University, Linköping, Sweden, tohid@isy.liu.se, fredrik@isy.liu.se. The authors gratefully acknowledge funding from the Swedish Research Council (VR) for the project Scalable Kalman Filters. Umut Orguner is with the Department of Electrical and Electronics Engineering, Middle East Technical University, Ankara, Turkey, umut@metu.edu.tr.

I. INTRODUCTION

Determination of the posterior distribution of a latent variable x given the measurements (observed data) y is at the core of Bayesian inference using probabilistic models. These probabilistic models describe the relation between the random latent variables, the deterministic parameters, and the measurements. Such relations are specified by prior distributions of the latent variables p(x) and the likelihood function p(y|x), which gives a probabilistic description of the measurements given some of the latent variables. Using the probabilistic model and the measurements, the exact posterior can be expressed in functional form using Bayes' rule,

p(x|y) = \frac{p(x)\, p(y|x)}{\int p(x)\, p(y|x)\, dx}.   (1)

The exact posterior distribution can be analytical. A subclass of cases where the posterior is analytical is when the posterior belongs to the same family of distributions as the prior distribution. In such cases, the prior distribution is called a conjugate prior for the likelihood function. A well-known example where an analytical posterior is obtained using conjugate priors is when the latent variable is a priori normal-distributed and the likelihood function given the latent variable as its mean is again normal. Conjugacy is especially useful when measurements are processed sequentially, as they appear in the filtering task for stochastic dynamical systems known as hidden Markov models (HMMs), whose probabilistic graphical model is presented in Fig. 1. In such filtering problems, the posterior after the last processed measurement is in the same form as the prior distribution before the measurement update. Thus, the same inference algorithm can be used in a recursive manner.

[Figure 1: A probabilistic graphical model for a stochastic dynamical system with latent states x_1, ..., x_T and measurements y_1, ..., y_T.]
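As a concrete illustration of conjugacy and of the recursive use of Bayes' rule (our own sketch, not from the paper), the following Python snippet performs repeated conjugate measurement updates for a scalar Gaussian prior and a Gaussian likelihood whose mean is the latent variable; because the posterior stays Gaussian, the output of one update feeds the next one.

```python
def gaussian_conjugate_update(prior_mean, prior_var, y, meas_var):
    """One conjugate update: prior N(prior_mean, prior_var), likelihood N(y; x, meas_var).
    The posterior is Gaussian again, so it can serve as the prior for the next measurement."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / meas_var)
    post_mean = post_var * (prior_mean / prior_var + y / meas_var)
    return post_mean, post_var

# process three measurements recursively, as in the filtering setting of Fig. 1
mean, var = 0.0, 10.0
for y in [1.2, 0.8, 1.1]:
    mean, var = gaussian_conjugate_update(mean, var, y, meas_var=0.5)
print(mean, var)
```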
The exact posterior distribution of a latent variable cannot always be given a compact analytical expression, since the integral in the denominator of (1) may not be available in analytical form. Consequently, the number of parameters needed to express the posterior distribution will increase every time Bayes' rule (1) is used for inference. Several methods for approximate inference over probabilistic models have been proposed in the literature, such as variational Bayes [1], [2], expectation propagation [3], integrated nested Laplace approximation (INLA) [4], generalized linear models (GLMs) [5], and Monte Carlo (MC) sampling methods [6], [7]. Variational Bayes (VB) and expectation propagation (EP) are two analytical optimization-based solutions for approximate Bayesian inference [8]. In these two approaches, the Kullback-Leibler divergence [9] between the true posterior distribution and an approximate posterior is minimized. INLA is a technique to perform approximate Bayesian inference in latent Gaussian models [10] using the Laplace approximation. GLMs are an extension of ordinary linear regression where the errors belong to the exponential family. Sampling methods such as particle filters and Markov chain Monte Carlo (MCMC) methods provide a general class of numerical solutions to the approximate Bayesian inference problem. However, in this paper, our focus is on fast analytical approximations which are applicable to large-scale inference problems. The proposed solution is built on properties of the exponential family of distributions and earlier work on the extended Kalman filter [11].

In this paper, a Bayesian inference technique based on Taylor series approximation of the logarithm of the likelihood function is presented. The proposed approximation is derived for the case where the prior distribution belongs to the exponential family of distributions. The rest of this paper is organized as follows. In Section II an introduction to the exponential family of distributions is provided. In Section III a general algorithm for approximate inference in the exponential family of distributions is suggested. We show how the

proposed technique is related to the extended Kalman filter for nonlinear filtering in Section III-B. A novel measurement update for tracking extended targets using random matrices is given in Section IV. The new measurement update for tracking extended targets is evaluated in numerical simulations in Section V. Concluding remarks are given in Section VI.

II. THE EXPONENTIAL FAMILY

The exponential family of distributions [8] includes many common distributions such as the Gaussian, beta, gamma and Wishart. For $x \in \mathcal{X}$ the exponential family in its natural form can be represented by

p(x; \eta) = h(x) \exp\left( \eta \cdot T(x) - A(\eta) \right),   (2)

where $\eta$ is the natural parameter, $T(x)$ is the sufficient statistic, $A(\eta)$ is the log-partition function and $h(x)$ is the base measure. Both $\eta$ and $T(x)$ may be vector-valued. Here $a \cdot b$ denotes the inner product of $a$ and $b$.

(Footnote 1: In the rest of this paper we may use $a^T b$ in place of $a \cdot b$ when $a$ and $b$ are vector-valued. When $a$ and $b$ are matrix-valued we may equivalently write $\operatorname{Tr}(a^T b)$ instead of $\operatorname{vec}(a) \cdot \operatorname{vec}(b)$, where Tr is the trace operator and the operator vec vectorizes its argument, i.e., lays it out in a vector.)

The log-partition function is defined by the integral

A(\eta) \triangleq \log \int_{\mathcal{X}} h(x) \exp\left( \eta \cdot T(x) \right) dx.   (3)

Also, $\eta \in \Omega = \{\eta \in \mathbb{R}^m \mid A(\eta) < +\infty\}$, where $\Omega$ is the natural parameter space. Moreover, $\Omega$ is a convex set and $A$ is a convex function on $\Omega$. When a probability density function (PDF) in the exponential family is parametrized by a non-canonical parameter $\theta$, the PDF in (2) can be written as

p(x; \theta) = h(x) \exp\left( \eta(\theta) \cdot T(x) - A(\eta(\theta)) \right),   (4)

where $\theta$ is the non-canonical parameter vector of the PDF.

Example 1. Consider the normal distribution $p(x) = \mathcal{N}(x; \mu, \Sigma)$ with mean $\mu \in \mathbb{R}^d$ and covariance matrix $\Sigma$. Its PDF can be written in the exponential family form given in (2) with

h(x) = (2\pi)^{-d/2},   (5a)
T(x) = \left( x,\ \operatorname{vec}(x x^T) \right),   (5b)
\eta(\mu, \Sigma) = \left( \Sigma^{-1}\mu,\ \operatorname{vec}\!\left(-\tfrac{1}{2}\Sigma^{-1}\right) \right),   (5c)
A(\eta(\mu, \Sigma)) = \tfrac{1}{2}\mu^T \Sigma^{-1}\mu + \tfrac{1}{2}\log|\Sigma|.   (5d)

Remark 1. In the rest of this document, the vec operators will be dropped to keep the notation less cluttered. That is, we will use the simplified notation $T(x) = (x, xx^T)$ instead of (5b) and $\eta(\mu, \Sigma) = (\Sigma^{-1}\mu, -\tfrac{1}{2}\Sigma^{-1})$ instead of (5c).

A. Conjugate Priors in the Exponential Family

In this section we study the conjugate prior family for a likelihood function belonging to the exponential family. Consider $m$ independent and identically distributed (IID) measurements $Y \triangleq \{ y^j \in \mathbb{R}^d \mid 1 \le j \le m \}$ and the likelihood function $p(y|x, \lambda)$ in the exponential family, where $\lambda$ is a set of hyperparameters of the likelihood and $x$ is the latent variable. The likelihood of the $m$ IID measurements can be written as

p(Y|x, \lambda) = \Big[ \prod_{j=1}^{m} h(y^j) \Big] \exp\Big( \eta(x,\lambda) \cdot \sum_{j=1}^{m} T(y^j) - m A(\eta(x,\lambda)) \Big).   (6)

We seek a conjugate prior distribution for $x$, denoted by $p(x)$. The prior distribution $p(x)$ is a conjugate prior if the posterior distribution

p(x|Y) \propto p(Y|x, \lambda)\, p(x)   (7)

is also in the same exponential family as the prior. Let us assume that the prior is of the form

p(x) \propto \exp\left( F(x) \right)   (8)

for some function $F$. Hence,

p(x|Y) \propto p(Y|x, \lambda) \exp\left( F(x) \right)   (9)
\propto \exp\Big( \eta(x,\lambda) \cdot \sum_{j=1}^{m} T(y^j) - m A(\eta(x,\lambda)) + F(x) \Big).   (10)

For the posterior (10) to be in the same exponential family as the prior [12], $F$ needs to be of the form

F(x) = \rho_1 \cdot \eta(x, \lambda) - \rho_0 A(\eta(x, \lambda))   (11)

for some $\rho \triangleq (\rho_0, \rho_1)$, such that

p(x|Y) \propto \exp\Big( \big(\rho_1 + \sum_{j=1}^{m} T(y^j)\big) \cdot \eta(x,\lambda) - (m + \rho_0)\, A(\eta(x,\lambda)) \Big).   (12)

Hence, the conjugate prior for the likelihood (6) is parametrized by $\rho$ and is given by

p(x; \rho) = \frac{1}{Z} \exp\left( \rho_1 \cdot \eta(x, \lambda) - \rho_0 A(\eta(x, \lambda)) \right),   (13)

where

Z = \int \exp\left( \rho_1 \cdot \eta(x, \lambda) - \rho_0 A(\eta(x, \lambda)) \right) dx.   (14)

In Example 2, the conjugate prior for the normal likelihood is derived using the exponential family form for the normal distribution given in Example 1. In Example 3, a conjugate prior for a more complicated likelihood function is derived.
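As a small sanity check of Example 1 (our own sketch, assuming NumPy and SciPy are available; the function names are ours), the snippet below evaluates the Gaussian log-density through the exponential family form $\log h(x) + \eta \cdot T(x) - A(\eta)$ and compares it with a reference implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_natural_params(mu, Sigma):
    """Natural parameters and log-partition of N(x; mu, Sigma), cf. (5a)-(5d)."""
    Sigma_inv = np.linalg.inv(Sigma)
    eta1 = Sigma_inv @ mu                   # first natural parameter
    eta2 = -0.5 * Sigma_inv                 # second natural parameter (matrix-valued)
    A = 0.5 * mu @ Sigma_inv @ mu + 0.5 * np.linalg.slogdet(Sigma)[1]
    return eta1, eta2, A

def log_pdf_exp_family(x, mu, Sigma):
    """log N(x; mu, Sigma) evaluated as log h(x) + eta . T(x) - A(eta)."""
    d = x.size
    eta1, eta2, A = gaussian_natural_params(mu, Sigma)
    log_h = -0.5 * d * np.log(2 * np.pi)
    # inner product with T(x) = (x, x x^T); the matrix part is Tr(eta2 x x^T)
    return log_h + eta1 @ x + np.trace(eta2 @ np.outer(x, x)) - A

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
x = np.array([0.5, 0.5])
print(log_pdf_exp_family(x, mu, Sigma))           # exponential family form
print(multivariate_normal(mu, Sigma).logpdf(x))   # reference value (equal)
```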
Example 2. Consider the measurement $y$ with the likelihood $p(y|x) = \mathcal{N}(y; Cx, R)$. The likelihood can be written in exponential family form as in (2),

p(y|x) = (2\pi)^{-d/2} \exp\left( \eta(Cx, R) \cdot T(y) - A(\eta(Cx, R)) \right),   (15)

where $T(y) = (y, yy^T)$, $\eta(Cx, R) = (R^{-1}Cx, -\tfrac{1}{2}R^{-1})$ and

A(\eta(Cx, R)) = \tfrac{1}{2} x^T C^T R^{-1} C x + \tfrac{1}{2}\log|R|   (16a)
             = \tfrac{1}{2} \left( C^T R^{-1} C \right) \cdot x x^T + \tfrac{1}{2}\log|R|.   (16b)

Therefore, the likelihood function can be written in the form

p(y|x) \propto \exp\left( \left( C^T R^{-1} y,\ -\tfrac{1}{2} C^T R^{-1} C \right) \cdot T(x) \right),   (17)

where $T(x) = (x, xx^T)$. Hence, the conjugate prior for the likelihood function of this example should have $T$ as its sufficient statistic, i.e., the conjugate prior is normal. When the prior distribution is normal, i.e., $p(x) = \mathcal{N}(x; \mu, \Sigma)$, the posterior distribution obeys

p(x|y) \propto \exp\left( \left( C^T R^{-1} y + \Sigma^{-1}\mu,\ -\tfrac{1}{2} C^T R^{-1} C - \tfrac{1}{2}\Sigma^{-1} \right) \cdot T(x) \right),   (18)

which is the information filter form of the Kalman filter's measurement update [13].

Example 3. Consider the measurement $y$ with the likelihood

p(y|x) \propto \exp\left( -\tfrac{1}{4}(y - x)^2 + \cos(y - x) \right).   (19)

This is a multi-modal likelihood function and is illustrated in Fig. 2 for $y = 3$. The likelihood can be written in the form $p(y|x) \propto \exp\left( \lambda \cdot T(x) \right)$, where

\lambda = \left[ \tfrac{1}{2}y,\ -\tfrac{1}{4},\ \cos y,\ \sin y \right]^T \quad \text{and} \quad T(x) = \left[ x,\ x^2,\ \cos x,\ \sin x \right]^T.

Hence, a conjugate prior for the likelihood function of this example should have $T$ as its sufficient statistic, such as

p(x) \propto \exp\left( \lambda_0 \cdot T(x) \right)   (20)

for some parameter vector $\lambda_0$. Thus, the posterior for the likelihood function (19) and the prior distribution (20) obeys

p(x|y) \propto \exp\left( \left( \lambda_0 + \lambda \right) \cdot T(x) \right).   (21)

Although such posteriors can be computed analytically up to a normalization factor, the density may not be available in analytical form. The numerically normalized prior and posterior are illustrated in Fig. 2.

[Figure 2: The likelihood function (19), the prior distribution (20) and the posterior distribution (21) in Example 3 are plotted against x.]

Similar reasoning to that presented in this section can be applied to the case where the prior is fixed and the likelihood function is to be selected; the concept of conjugate likelihoods is considered in the sequel.

Table I: Some continuous exponential family distributions and their sufficient statistics T.
    Exponential distribution ......................... x
    Normal distribution with known variance sigma^2 .. x/sigma
    Normal distribution ............................... x, xx^T
    Pareto distribution with known minimum x_m ........ log x
    Weibull distribution with known shape k ........... x^k
    Chi-squared distribution .......................... log x
    Dirichlet distribution ............................ log x_1, ..., log x_n
    Laplace distribution with known mean mu ........... |x - mu|
    Inverse Gaussian distribution ..................... x, 1/x
    Scaled inverse chi-squared distribution ........... log x, 1/x
    Beta distribution ................................. log x, log(1 - x)
    Lognormal distribution ............................ log x, (log x)^2
    Gamma distribution ................................ log x, x
    Inverse gamma distribution ........................ log x, 1/x
    Gaussian-Gamma distribution ....................... log tau, tau, tau x, tau x^2
    Wishart distribution .............................. log|X|, X
    Inverse Wishart distribution ...................... log|X|, X^{-1}

B. Conjugate Likelihoods in the Exponential Family

First, we define conjugate likelihood functions.

Definition 1. A likelihood function $p(y|x)$ is called a conjugate likelihood for the prior distribution $p(x)$ in a family of distributions if the posterior distribution $p(x|y)$ belongs to the same family of distributions as the prior distribution.

Now, consider the prior distribution $p(x; \rho)$ in exponential family form on the latent variable $x \in \mathcal{X}$, where $\rho$ is a set of hyper-parameters for the prior,

p(x; \rho) = h(x) \exp\left( \eta(\rho) \cdot T(x) - A(\eta(\rho)) \right).   (22)

We will follow the treatment given for the characterization of conjugate priors in Section II-A, as well as [12], for the conjugate likelihood functions. Assume now that we have observed $m$ IID measurements $Y \triangleq \{ y^j \in \mathbb{R}^d \mid 1 \le j \le m \}$. Let the likelihood of the measurements be written as

p(Y|x) \propto \exp\left( L(x, Y) \right)   (23)

for some function $L$. We seek a conjugate likelihood function in the exponential family. Hence, the posterior distribution has to be in the exponential family,

p(x|Y) \propto h(x) \exp\left( \eta(\rho) \cdot T(x) - A(\eta(\rho)) + L(x, Y) \right).   (24)
For the posterior (24) to be in the same exponential family, $L$ needs to be of the form

L(x, y) \overset{+}{=} \lambda(y) \cdot T(x),   (25)

where $\overset{+}{=}$ means equality up to an additive constant with respect to the latent variable $x$, such that

p(x|Y) \propto h(x) \exp\left( \left( \eta(\rho) + \lambda(y) \right) \cdot T(x) \right).   (26)

Hence, the conjugate likelihood for the prior distribution family is parametrized by $\lambda(y)$ and is given by

p(y|x) \propto \exp\left( \lambda(y) \cdot T(x) \right).   (27)

The property expressed in (27) is the necessary condition which should hold for a conjugate likelihood for a given prior distribution family. Another condition which should hold is that the log-partition function of the posterior (26) should obey

A = \log \int_{\mathcal{X}} h(x) \exp\left( \left( \eta(\rho) + \lambda(y) \right) \cdot T(x) \right) dx < \infty,   (28)

such that the natural parameter of the posterior belongs to the natural parameter space $\Omega$. These properties of the conjugate likelihood are summarized in Theorem 1.

Theorem 1. The likelihood function $p(y|x)$ is a conjugate likelihood for the prior distribution in an exponential family $p(x) = h(x)\exp\left( \eta \cdot T(x) - A(\eta) \right)$ with $x \in \mathcal{X}$ if and only if
1) $p(y|x) \propto \exp\left( \lambda \cdot T(x) \right)$ for some $\lambda$, and
2) the likelihood function $p(y|x)$ is integrable with respect to $y$, and
3) $\log \int_{\mathcal{X}} h(x) \exp\left( (\eta + \lambda) \cdot T(x) \right) dx < \infty$ for $\eta \in \Omega$.

In Examples 4 and 5, the conjugate likelihood functions for two prior distributions are derived using the exponential family form.

Example 4. Consider the normal distribution $p(x) = \mathcal{N}(x; \mu, \Sigma)$ whose PDF is written in exponential family form in Example 1. The conjugate likelihood function has to be of the form

p(y|x) \propto \exp\left( \lambda_1(y) \cdot x + \lambda_2(y) \cdot \operatorname{vec}(x x^T) \right)   (29)

for some permissible $\lambda \triangleq [\lambda_1, \lambda_2]$, which includes $p(y|x) = \mathcal{N}(y; Cx, R)$ for a matrix $C$ with appropriate dimensions and symmetric $R \succ 0$.

Example 5. Consider the prior distribution

p(x) \propto \exp\left( \lambda \cdot T(x) \right),   (30)

where $T(x) = [x,\ x^2,\ \cos x,\ \sin x]^T$. The family of conjugate likelihood density functions $p(y|x)$ is parametrized by $\psi \in \mathbb{R}^4$ such that
1) $\log p(y|x) \overset{+}{=} \psi \cdot T(x)$,
2) $\int p(y|x)\, dy = 1$, i.e., the likelihood is integrable with respect to $y$,
3) $\int \exp\left( (\lambda + \psi) \cdot T(x) \right) dx < \infty$, i.e., the posterior is integrable with respect to $x$.

A member of this family is

p(y|x) \propto \exp\left( -\alpha (y - x)^2 + \cos(\beta y - x) + \gamma \right)   (31)

for $[\alpha, \beta, \gamma]^T \in \mathbb{R}^3$. In Table I the sufficient statistics for some continuous members of the exponential family are given. Some members of the continuous exponential family have conjugate likelihoods, which are summarized in Table II.

III. MEASUREMENT UPDATE VIA APPROXIMATION OF THE LOG-LIKELIHOOD

In this section a method is proposed for the analytical approximation of complex likelihood functions based on linearization of the log-likelihood with respect to the sufficient statistic function $T$. We will use the property of conjugate likelihood functions, i.e., linearity of the log-likelihood function with respect to the sufficient statistic of the prior distribution, to come up with an approximation of the likelihood function for which an analytical posterior distribution exists. In the proposed method, first the log-likelihood function is derived in analytical form. Second, the likelihood function is approximated by a dot product of a statistic which depends on the measurement and the sufficient statistic of the prior distribution. For example, consider a likelihood function

p(Y|x) \propto \exp\left( L(x, Y) \right),   (32)

where the log-likelihood function is given by $L$, and the prior distribution is given by

p(x) = h(x) \exp\left( \eta \cdot T(x) - A(\eta) \right),   (33)

where $\eta \in \Omega$. If we approximate $L$ such that

L(x, Y) \approx \widehat{L}(x, Y) \overset{+}{=} \lambda(Y) \cdot T(x)   (34)

for some $\lambda$ such that, for any measurement $Y$ belonging to the support of the likelihood function,

\eta + \lambda(Y) \in \Omega   (35)

and

\int \exp\left( \widehat{L}(x, Y) \right) dY < \infty,   (36)

then the approximate likelihood will be a conjugate likelihood for the prior distribution with sufficient statistic $T(x)$.

When the prior distribution is normal, the proposed approximation can be obtained using the well-known Taylor series approximation at the global maximum of the log-likelihood function. This is due to the fact that $T(x) = (x, xx^T)$ for the normal prior, and a second order Taylor series approximation of the log-likelihood approximates the log-likelihood with a function linear in $T(x)$. This mathematical convenience is used in the well-known approximate Bayesian inference technique INLA [4]. Similarly, when the prior distribution is the exponential distribution, a first order Taylor series approximation can be used to approximate the log-likelihood with a function linear in $T(x) = x$.
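To illustrate the remark about normal priors, here is a minimal sketch under assumptions of our own (a scalar latent variable, a made-up nonlinear measurement model $y = \exp(x) + \text{noise}$, and numerical derivatives): a second order Taylor expansion of the log-likelihood at its maximum is linear in $T(x) = (x, x^2)$, so adding the resulting statistic to the natural parameter of a Gaussian prior gives a Gaussian approximate posterior.

```python
import numpy as np

# Toy model (not from the paper): y = exp(x) + noise, noise ~ N(0, r),
# Gaussian prior N(m0, p0) on the scalar latent variable x.
y, r = 2.0, 0.1
m0, p0 = 0.5, 1.0

def log_lik(x):
    return -0.5 * (y - np.exp(x)) ** 2 / r

# Locate the likelihood maximum: coarse grid search plus Newton refinement
# using numerically evaluated derivatives.
xs = np.linspace(-2.0, 2.0, 2001)
x_hat = xs[np.argmax(log_lik(xs))]
eps = 1e-5
for _ in range(20):
    g = (log_lik(x_hat + eps) - log_lik(x_hat - eps)) / (2 * eps)
    H = (log_lik(x_hat + eps) - 2 * log_lik(x_hat) + log_lik(x_hat - eps)) / eps ** 2
    x_hat -= g / H
g = (log_lik(x_hat + eps) - log_lik(x_hat - eps)) / (2 * eps)
H = (log_lik(x_hat + eps) - 2 * log_lik(x_hat) + log_lik(x_hat - eps)) / eps ** 2

# Second order expansion about x_hat is linear in T(x) = (x, x^2):
# L(x) ~ const + (g - H*x_hat) x + (H/2) x^2, with g ~ 0 at the maximum.
lam = np.array([g - H * x_hat, 0.5 * H])

# Add the statistic to the natural parameter of the Gaussian prior.
eta_post = np.array([m0 / p0, -0.5 / p0]) + lam
p_post = -0.5 / eta_post[1]     # approximate posterior variance
m_post = p_post * eta_post[0]   # approximate posterior mean
print(m_post, p_post)
```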
Taylor series expansions of functions will be used in the proposed approximations in the following. Hence, we establish the notation used in this paper for the Taylor series expansion of arbitrary functions before we proceed.

A. Taylor series expansion

The Taylor series expansion of a real-valued function $f(x)$ that is infinitely differentiable at a real vector $\bar{x}$ is given by the series

f(x) = f(\bar{x}) + \frac{1}{1!} \nabla_x f(\bar{x}) \cdot (x - \bar{x}) + \frac{1}{2!} \nabla_x^2 f(\bar{x}) \cdot (x - \bar{x})(x - \bar{x})^T + \cdots,   (37)

where $\nabla_x f(\bar{x})$ is the gradient vector composed of the partial derivatives of $f(x)$ with respect to the elements of $x$, evaluated at $\bar{x}$. Similarly, $\nabla_x^2 f(\bar{x})$ is the Hessian matrix of $f(x)$ containing the second order partial derivatives, evaluated at $\bar{x}$. When $f(x)$ is a vector-valued function, i.e., $f(x) \in \mathbb{R}^d$ with $i$th element denoted by $f_i(x)$, first the Taylor series expansion is derived for each element separately. Then, the expansion of $f(x)$ at $\bar{x}$ is constructed by rearranging the terms in a matrix,

f(x) = f(\bar{x}) + \frac{1}{1!} \nabla_x f(\bar{x})^T (x - \bar{x}) + \frac{1}{2!} \sum_{i=1}^{d} e_i\, (x - \bar{x})^T \nabla_x^2 f_i(\bar{x}) (x - \bar{x}) + \cdots.   (38)

In (38), $\nabla_x f(\bar{x})$ is the Jacobian matrix of $f(x)$ evaluated at $\bar{x}$ and $e_i$ is a unit vector with its $i$th element equal to 1.

Table II: Some prior distributions and their conjugate likelihood functions. The argument of each prior PDF (left) is the latent random variable, the middle entry is the sufficient statistic T of the prior, and the right-hand entry is a conjugate likelihood p(y|x), where y denotes the measurement. The logarithm of each conjugate likelihood is linear in the sufficient statistic to its left.
    N(x; mu, Sigma),  T = (x, xx^T):            N(y; Cx, R) propto exp(-1/2 Tr[R^{-1}(y - Cx)(y - Cx)^T])
    N(x; mu, Sigma),  T = (x, xx^T):            logN(y; x, sigma^2) propto exp(-(log y - x)^2 / (2 sigma^2))
    Gamma(x; alpha, beta),  T = (log x, x):     Exp(y; x) = x exp(-x y)
    Gamma(x; alpha, beta),  T = (log x, x):     IGamma(y; alpha, x) propto x^alpha exp(-x / y)
    Gamma(x; alpha, beta),  T = (log x, x):     Gamma(y; alpha, x) propto x^alpha exp(-x y)
    Gamma(x; alpha, beta),  T = (log x, x):     N(y; mu, x^{-1}) propto x^{1/2} exp(-x (y - mu)^2 / 2)
    IGamma(x; alpha, beta), T = (log x, 1/x):   N(y; mu, x) propto x^{-1/2} exp(-(y - mu)^2 / (2 x))
    IGamma(x; alpha, beta), T = (log x, 1/x):   Weibull(y; x, k) propto x^{-1} y^{k-1} exp(-y^k / x)
    GaussianGamma(x, tau; mu, lambda, alpha, beta), T = (log tau, tau, tau x, tau x^2):   N(y; x, tau^{-1}) propto tau^{1/2} exp(-tau (y - x)^2 / 2)
    W(X; n, V),   T = (log|X|, X):              N(y; mu, X^{-1}) propto |X|^{1/2} exp(-1/2 tr[X (y - mu)(y - mu)^T])
    IW(X; nu, Psi), T = (log|X|, X^{-1}):       N(y; mu, X) propto |X|^{-1/2} exp(-1/2 tr[X^{-1}(y - mu)(y - mu)^T])

B. The extended Kalman filter

A well-known problem where a special case of the proposed inference technique is used is the filtering problem for nonlinear state-space models in the presence of additive Gaussian noise, which is described here. Consider the discrete-time stochastic nonlinear dynamical system

y_k \mid x_k \sim \mathcal{N}\left( y_k;\ c(x_k),\ R \right),   (39a)
x_{k+1} \mid x_k \sim \mathcal{N}\left( x_{k+1};\ f(x_k),\ Q \right),   (39b)

where $x_k$ and $y_k$ are the latent variable and the measurement at time index $k$, and $c(\cdot)$ and $f(\cdot)$ are nonlinear functions. A common solution to the inference problem is the extended Kalman filter (EKF) [11]. In the following example, we show that the measurement update of the EKF is a special case of the proposed algorithm.

Example 6. Consider the latent variable $x$ with a priori PDF $p(x) = \mathcal{N}(x; \mu, \Sigma)$ and the measurement $y$ with the likelihood function $p(y|x) = \mathcal{N}(y; c(x), R)$. The likelihood function can be written in exponential family form as in (2),

p(y|x) = (2\pi)^{-d/2} \exp\left( \eta(x, R) \cdot T(y) - A(\eta(x, R)) \right),   (40)

where $T(y) = (y, yy^T)$, $\eta(x, R) = (R^{-1}c(x), -\tfrac{1}{2}R^{-1})$ and

A(\eta(x, R)) = \tfrac{1}{2}\operatorname{Tr}\left( R^{-1} c(x) c(x)^T \right) + \tfrac{1}{2}\log|R|.   (41)

Thus,

p(y|x) \propto \exp\left( R^{-1}c(x) \cdot y - \tfrac{1}{2} c(x)^T R^{-1} c(x) \right)   (42)

and the log-likelihood function can be written as

L(x) \overset{+}{=} y^T R^{-1} c(x) - \tfrac{1}{2} c(x)^T R^{-1} c(x),   (43)

where $\overset{+}{=}$ means equality up to an additive constant with respect to the latent variable $x$. A second order Taylor series approximation of any function is linear in the sufficient statistic $T(x) = (x, xx^T)$. However, such an approximation does not guarantee the integrability of the approximate likelihood and the corresponding approximate posterior, due to the dependence of the second element of the posterior's natural parameter on the measurement $y$. A common solution to the log-likelihood linearization problem has been the first order Taylor series approximation of $c(x)$ about $x = \mu$, instead of a second order Taylor series approximation of $L(x)$, i.e.,

c(x) \approx c(\mu) + \dot{c}(\mu)^T (x - \mu),   (44)

where

\dot{c}(\mu) \triangleq \nabla_x c(x)\big|_{x=\mu}.   (45)

With this approximation the approximate log-likelihood becomes

L(x) \overset{+}{\approx} \left( \dot{c}(\mu) R^{-1}\left( y - c(\mu) + \dot{c}(\mu)^T \mu \right) \right) \cdot x - \tfrac{1}{2}\left( \dot{c}(\mu) R^{-1} \dot{c}(\mu)^T \right) \cdot x x^T.   (46)

Hence, a conjugate likelihood function for the prior distribution is obtained by approximating the log-likelihood function by a function that is linear in the sufficient statistic of the prior distribution.
Also note that the posterior distribution obeys

p(x|y) \propto \exp\left( \phi \cdot T(x) \right),   (47)

where

\phi = \left( \dot{c}(\mu)R^{-1}\left( y - c(\mu) + \dot{c}(\mu)^T \mu \right) + \Sigma^{-1}\mu,\ \ -\tfrac{1}{2}\dot{c}(\mu)R^{-1}\dot{c}(\mu)^T - \tfrac{1}{2}\Sigma^{-1} \right),   (48)

which is the extended information filter form of the extended Kalman filter [14]. An advantage of this solution over a Taylor series approximation of the log-likelihood is that the natural parameter of the posterior will lie in the natural parameter space by construction, i.e., $-\tfrac{1}{2}\dot{c}(\mu)R^{-1}\dot{c}(\mu)^T - \tfrac{1}{2}\Sigma^{-1} \prec 0$.
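For concreteness, the update of Example 6 can be written in natural-parameter (information) form, cf. (47)-(48), as in the NumPy sketch below; the range-measurement model in the last lines is our own toy example and the function name is hypothetical.

```python
import numpy as np

def ekf_information_update(mu, Sigma, y, c, c_jac, R):
    """EKF measurement update as a natural-parameter update (Example 6):
    linearize c(x) about the prior mean and add the conjugate-likelihood
    statistic to the natural parameter of the Gaussian prior."""
    C = c_jac(mu)                                   # Jacobian of c at mu (d x n)
    Sigma_inv = np.linalg.inv(Sigma)
    eta1, eta2 = Sigma_inv @ mu, -0.5 * Sigma_inv   # prior natural parameter
    # statistic of the linearized (conjugate) likelihood, cf. (46)
    lam1 = C.T @ np.linalg.solve(R, y - c(mu) + C @ mu)
    lam2 = -0.5 * C.T @ np.linalg.solve(R, C)
    # posterior natural parameter and its moment form
    phi1, phi2 = eta1 + lam1, eta2 + lam2
    Sigma_post = np.linalg.inv(-2.0 * phi2)
    mu_post = Sigma_post @ phi1
    return mu_post, Sigma_post

# toy usage: scalar range measurement of a 2-D position
c = lambda x: np.array([np.hypot(x[0], x[1])])
c_jac = lambda x: np.array([[x[0], x[1]]]) / np.hypot(x[0], x[1])
mu, Sigma, R = np.array([1.0, 1.0]), np.eye(2), np.array([[0.01]])
print(ekf_information_update(mu, Sigma, np.array([1.5]), c, c_jac, R))
```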

C. A general linearization guideline

The approximation of a general scalar function $L$ by another function $\widehat{L}$ that is linear in a general vector-valued function $t$, i.e., $\widehat{L} = \lambda \cdot t$, can be expressed as an optimization problem where various cost functions are suggested and minimized. The study of such procedures is considered outside the scope of this paper and a subject of future research. However, a few tricks are introduced here which can be used for the purpose of linearization.

Lemma 1. Let $L(x)$ be an arbitrary scalar-valued continuous and differentiable function and $t(x)$ be a vector-valued invertible function with independent continuous elements such that $x = t^{-1}(z)$ and $L(t^{-1}(z))$ is differentiable with respect to $z$. Then, $L(x)$ can be linearized with respect to $t(x)$ about $t(\bar{x})$ such that

L(x) \approx L(\bar{x}) + \Phi \cdot \left( t(x) - t(\bar{x}) \right),   (49)

where $\Phi = \nabla_z L(t^{-1}(z))\big|_{z = t(\bar{x})}$.

Proof. Let $z = t(x)$ and $Q(z) \triangleq L(t^{-1}(z))$. Then, $x = t^{-1}(z)$ and $Q(z) = L(x)$ for $z = t(x)$. Using the first-order Taylor series approximation of the scalar function $Q(z)$ about $\hat{z} \triangleq t(\bar{x})$ we obtain

Q(z) \approx Q(\hat{z}) + \nabla_z Q(z)\big|_{z=\hat{z}} \cdot (z - \hat{z}).   (50)

Hence, the proof follows.

Remark 2. The variable $z$ and the function $t(x)$ can be scalar-valued, vector-valued or even matrix-valued. Further, $\nabla_z Q(z)$ can be computed using the chain rule and the fact that $Q(z) = L(t^{-1}(z))$.

The elements of the sufficient statistic of most continuous members of the exponential family are dependent, such as $x$ and $xx^T$ for the normal distribution. Furthermore, some of them are not invertible, such as $\log|X|$ for the Wishart distribution. However, the linearization method given in Lemma 1 can be applied to individual summand terms of the log-likelihood function and individual elements of the sufficient statistic. Due to the freedom in choosing the element of the sufficient statistic with respect to which the log-likelihood is linearized, there is no unique solution to the linearization problem. In Example 7 we use Lemma 1 to linearize a log-likelihood function in various ways and describe the properties of the resulting approximations.

Example 7. Consider the scalar latent variable $x$ with a priori PDF $p(x) = \mathcal{IG}(x; \alpha, \beta)$ (inverse gamma) and the measurement $y$ with the likelihood function $p(y|x) = \mathcal{N}(y; 0, x + \sigma^2)$. Since the exact posterior density is not analytically tractable, an approximation is needed. The prior distribution can be written in exponential family form where the natural parameter is $\eta = (-\alpha - 1, -\beta)$ and the sufficient statistic is $T(x) = (\log x, x^{-1})$. The logarithm of the prior distribution is given by

\log p(x) \overset{+}{=} -(\alpha + 1)\log x - \beta x^{-1}.   (51)

In order to compute the posterior using the proposed linearization approach, first the log-likelihood function is computed,

L(x, y) \overset{+}{=} -\tfrac{1}{2}\log(x + \sigma^2) - \frac{y^2}{2(x + \sigma^2)}.   (52)

The linearization of the log-likelihood with respect to the sufficient statistic of the prior distribution can be carried out in several ways, four of which are listed in the following.

Solution 1: We can linearize the right-hand side of (52) with respect to $\log x$ using Lemma 1 by letting $z \triangleq \log x$ and subsequently linearizing with respect to $z$ about $\hat{z} \triangleq \log\bar{x}$, which gives

L(x, y) \overset{+}{\approx} \left( -\frac{\bar{x}}{2(\bar{x} + \sigma^2)} + \frac{y^2\bar{x}}{2(\bar{x} + \sigma^2)^2} \right)\log x,   (53)

where $\overset{+}{\approx}$ means approximation up to an additive constant with respect to all sufficient statistics, i.e., the elements of $T(x)$. The approximate likelihood can be found as

\hat{p}_1(y|x) \propto x^{-\frac{\bar{x}}{2(\bar{x}+\sigma^2)} + \frac{y^2\bar{x}}{2(\bar{x}+\sigma^2)^2}},   (54)

which is not integrable with respect to $y$, since $x > 0$ and $\hat{p}_1(y|x)$ goes to infinity as $y$ goes to infinity.

Solution 2: We can linearize the right-hand side of (52) with respect to $x^{-1}$ by letting $z \triangleq x^{-1}$ and subsequently linearizing with respect to $z$ about $\hat{z} \triangleq \bar{x}^{-1}$ using Lemma 1, which gives

L(x, y) \overset{+}{\approx} \left( \frac{\bar{x}^2}{2(\bar{x} + \sigma^2)} - \frac{y^2\bar{x}^2}{2(\bar{x} + \sigma^2)^2} \right) x^{-1}.   (55)

The approximate likelihood can be found as

\hat{p}_2(y|x) \propto \exp\left( -\frac{\bar{x}^2\left( y^2 - \bar{x} - \sigma^2 \right)}{2x(\bar{x} + \sigma^2)^2} \right),   (56)

which is integrable with respect to $y$. However, the approximate posterior obtains the form

p(x|y) \propto \exp\left( -\frac{\bar{x}^2\left( y^2 - \bar{x} - \sigma^2 \right)}{2x(\bar{x} + \sigma^2)^2} - (\alpha + 1)\log x - \beta x^{-1} \right),   (57)

which is not integrable with respect to $x$ for small $y$ such that $\frac{\bar{x}^2\left( y^2 - \bar{x} - \sigma^2 \right)}{2(\bar{x} + \sigma^2)^2} + \beta < 0$.
Solution 3: We can linearize the first term on the right-hand side of (52) with respect to $\log x$ and the second term with respect to $x^{-1}$, about $\log\bar{x}$ and $\bar{x}^{-1}$ respectively, using Lemma 1, which gives

L(x, y) \overset{+}{\approx} -\frac{\bar{x}}{2(\bar{x} + \sigma^2)}\log x - \frac{y^2\bar{x}^2}{2(\bar{x} + \sigma^2)^2} x^{-1}.   (58)

The approximate likelihood can be found as

\hat{p}_3(y|x) \propto x^{-\frac{\bar{x}}{2(\bar{x}+\sigma^2)}}\exp\left( -\frac{y^2\bar{x}^2}{2(\bar{x} + \sigma^2)^2} x^{-1} \right),   (59)

which is integrable with respect to $y$. The approximate posterior remains integrable with respect to $x$ for $x > 0$.

Solution 4: We can first expand the log-likelihood as

L(x, y) \overset{+}{=} -\tfrac{1}{2}\log x - \tfrac{1}{2}\log\left( 1 + \frac{\sigma^2}{x} \right) - \frac{y^2}{2(x + \sigma^2)}   (60)

and then linearize the last two terms on the right-hand side of (60) with respect to $z \triangleq x^{-1}$ about the nominal point $\hat{z} = \bar{x}^{-1}$ using Lemma 1, which gives

L(x, y) \overset{+}{\approx} -\tfrac{1}{2}\log x - \left( \frac{\sigma^2\bar{x}}{2(\sigma^2 + \bar{x})} + \frac{y^2\bar{x}^2}{2(\sigma^2 + \bar{x})^2} \right) x^{-1}.   (61)

The approximate likelihood can be found as

\hat{p}_4(y|x) \propto x^{-\frac{1}{2}}\exp\left( -\frac{1}{x}\left( \frac{\sigma^2\bar{x}}{2(\sigma^2 + \bar{x})} + \frac{y^2\bar{x}^2}{2(\sigma^2 + \bar{x})^2} \right) \right),   (62)

which is integrable with respect to $y$. The posterior remains integrable with respect to $x$ for $x > 0$. Integrability of the approximate posterior is not guaranteed for the first two solutions, while the last two solutions result in an inverse gamma approximate posterior distribution.
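Solution 4 can be sketched in a few lines of Python (our own code, with hypothetical function names): the linearized statistic from (61) is added to the natural parameter of the inverse gamma prior, yielding the inverse gamma approximate posterior mentioned above.

```python
import numpy as np

def ig_posterior_solution4(alpha, beta, y, sigma2, x_bar):
    """Approximate update of Example 7, Solution 4: the log-likelihood of
    y ~ N(0, x + sigma2) is linearized w.r.t. (log x, 1/x) about x_bar, so the
    inverse gamma prior IG(alpha, beta) maps to an inverse gamma posterior."""
    s = x_bar + sigma2
    # coefficients multiplying log x and 1/x in the linearized log-likelihood (61)
    coef_logx = -0.5
    coef_invx = -(sigma2 * x_bar) / (2.0 * s) - (y**2 * x_bar**2) / (2.0 * s**2)
    # posterior parameters: natural parameter (-alpha-1, -beta) plus the statistic
    alpha_post = alpha - coef_logx          # = alpha + 1/2
    beta_post = beta - coef_invx            # > beta, since coef_invx < 0
    return alpha_post, beta_post

# usage: linearize about the prior mean beta / (alpha - 1)
alpha, beta, sigma2, y = 3.0, 4.0, 0.5, 1.7
x_bar = beta / (alpha - 1.0)
print(ig_posterior_solution4(alpha, beta, y, sigma2, x_bar))
```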

The parameters of the posterior depend on the linearization point $\bar{x}$, among other factors. A candidate point for linearization is the prior mean $\bar{x} = \beta/(\alpha - 1)$.

As pointed out in Theorem 1 and illustrated in Example 7, linearity of the log-likelihood with respect to the sufficient statistic of the prior is a necessary condition for conjugacy. Hence, even after succeeding in linearizing the log-likelihood with respect to $T(x)$, the integrability of the posterior distribution should be examined. Consequently, the linearization should be done skillfully such that the approximation error is minimized and the approximate posterior remains integrable. In the rest of this paper we focus on exemplifying the application of the linearization technique in a specific Bayesian inference problem in Section IV, with related numerical simulations in Section V. In this specific example, two of the sufficient statistics are not invertible as required in Lemma 1. Hence, a remedy is suggested and used in the example.

IV. EXTENDED TARGET TRACKING

A. The problem formulation

In the Bayesian extended target tracking (ETT) framework [15] the state of an extended target consists of a kinematic state and a symmetric positive definite matrix representing the extent state. Suppose that the extended target's kinematic state and the target's extent state at time $k$ are denoted by $x_k$ and $X_k$, respectively. Further, we assume that the measurements $Y_k \triangleq \{ y_k^j \in \mathbb{R}^d \}_{j=1}^{m_k}$ generated by the extended target are independent and identically distributed conditioned on $x_k$ and $X_k$ as $y_k^j \sim \mathcal{N}(y_k^j; Hx_k, sX_k + R)$; see Fig. 3. The following measurement likelihood, which is used in this paper, was first introduced in [16]:

p(Y_k \mid x_k, X_k) = \prod_{j=1}^{m_k} \mathcal{N}\left( y_k^j;\ Hx_k,\ sX_k + R \right),   (63)

where $m_k$ is the number of measurements at time $k$, $s$ is a real positive scalar constant, and $R$ is the measurement noise covariance. We assume the following prior form on the kinematic state and the target extent state,

p(x_k, X_k \mid Y_{0:k-1}) = \mathcal{N}\left( x_k;\ \hat{x}_{k|k-1},\ P_{k|k-1} \right)\mathcal{IW}\left( X_k;\ \nu_{k|k-1},\ V_{k|k-1} \right),   (64)

where $x_k \in \mathbb{R}^n$, $X_k \in \mathbb{R}^{d \times d}_{\succ 0}$, and the double time index $a|b$ is read "at time $a$ using measurements up to and including time $b$". The inverse Wishart distribution $\mathcal{IW}(X; \nu, V)$ we use in this work is given in the following form:

\mathcal{IW}(X; \nu, V) = \frac{ |V|^{\frac{\nu - d - 1}{2}} }{ 2^{\frac{(\nu - d - 1)d}{2}}\, \Gamma_d\!\left( \frac{\nu - d - 1}{2} \right) }\, |X|^{-\frac{\nu}{2}} \exp\left( -\tfrac{1}{2}\operatorname{tr}\left( V X^{-1} \right) \right),   (65)

where $X$ is a symmetric positive definite matrix of dimension $d \times d$, $\nu > 2d$ is the scalar degrees of freedom, and $V$ is a symmetric positive definite matrix of dimension $d \times d$ called the scale matrix. This form of the inverse Wishart distribution is the same as the one used in the well-known reference [17].

No analytical solution for the posterior exists. The reason for the lack of an analytical solution for the posterior corresponding to the likelihood (63) and the prior (64) is both the covariance addition in the Gaussian distributions on the right hand side of (63) and the intertwined nature of the kinematic state and the extent state in the likelihood. In order to get rid of the covariance addition, latent variables are defined in [18], and the problem of intertwined kinematic and extent states is solved using a variational approximation. In [16] the authors design an unbiased estimator. Both of these measurement updates can be used in a recursive estimation framework, which requires the posterior to be in the same form as the prior, as in

p(x_k, X_k \mid Y_{0:k}) \approx \mathcal{N}\left( x_k;\ \hat{x}_{k|k},\ P_{k|k} \right)\mathcal{IW}\left( X_k;\ \nu_{k|k},\ V_{k|k} \right).   (66)

[Figure 3: A probabilistic graphical model for extended targets with kinematic states x_1, ..., x_T, extent states X_1, ..., X_T and measurement sets Y_1, ..., Y_T with m_k > 0.]
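To make the measurement model (63) concrete, the sketch below (our own code, assuming NumPy; the state values are arbitrary) draws one scan of detections and forms the measurement centroid and the spread matrix used in the updates that follow.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ett_measurements(x, X, H, s, R, m):
    """Draw m detections from the single-scan model (63):
    y_j ~ N(H x, s X + R), i.i.d. given kinematic state x and extent X."""
    cov = s * X + R
    return rng.multivariate_normal(H @ x, cov, size=m)

# toy state: 2-D position + velocity, elliptical extent
x = np.array([0.0, 0.0, 10.0, 5.0])
X = np.array([[300.0, 50.0], [50.0, 120.0]])
H = np.hstack([np.eye(2), np.zeros((2, 2))])
Y = simulate_ett_measurements(x, X, H, s=0.25, R=np.eye(2), m=20)

y_bar = Y.mean(axis=0)                          # centroid, cf. (72)
# spread about the (here: true) predicted measurement H x, cf. (75)
spread = (Y - H @ x).T @ (Y - H @ x) / len(Y)
print(y_bar, spread, sep="\n")
```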
B. Solution proposed by Feldmann et al. [16]

In [16] the authors cleverly design a measurement update, based on unbiasedness properties for the measurement model (63), to calculate $\hat{x}_{k|k}$, $P_{k|k}$, $\nu_{k|k}$ and $V_{k|k}$ in the posterior (66). The kinematic state $x_k$ is updated as follows:

\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k\left( \bar{y}_k - H\hat{x}_{k|k-1} \right),   (67)
P_{k|k} = P_{k|k-1} - K_k S_k K_k^T,   (68)

where

K_k = P_{k|k-1} H^T S_k^{-1},   (69)
S_k = H P_{k|k-1} H^T + \frac{1}{m_k}\left( s\hat{X}_{k|k-1} + R \right),   (70)
\hat{X}_{k|k-1} = \frac{V_{k|k-1}}{\nu_{k|k-1} - 2d - 2},   (71)
\bar{y}_k = \frac{1}{m_k}\sum_{j=1}^{m_k} y_k^j   (72)

is the mean of the measurements at time $k$. The extent state update given in [16] is of the form

\nu_{k|k} = \nu_{k|k-1} + m_k,   (73)
V_{k|k} = V_{k|k-1} + M_k^{\mathrm{FFK}},   (74)

where $M_k^{\mathrm{FFK}}$ is a given positive definite update matrix and the superscript FFK, used to ease the presentation, consists of the initials of the authors of [16]. Before describing the construction of the matrix $M_k^{\mathrm{FFK}}$ of [16], let us define the measurement spread $\bar{Y}_k$ about the predicted measurement $H\hat{x}_{k|k-1}$ as follows:

\bar{Y}_k \triangleq \frac{1}{m_k}\sum_{j=1}^{m_k}\left( y_k^j - H\hat{x}_{k|k-1} \right)\left( y_k^j - H\hat{x}_{k|k-1} \right)^T.   (75)

Feldmann et al. first write the measurement spread $\bar{Y}_k$ as

\bar{Y}_k = \bar{Y}_k^1 + \bar{Y}_k^2.   (76)

The summands $\bar{Y}_k^1$ and $\bar{Y}_k^2$ are defined as

\bar{Y}_k^1 \triangleq \left( \bar{y}_k - H\hat{x}_{k|k-1} \right)\left( \bar{y}_k - H\hat{x}_{k|k-1} \right)^T,   (77)
\bar{Y}_k^2 \triangleq \frac{1}{m_k}\sum_{j=1}^{m_k}\left( y_k^j - \bar{y}_k \right)\left( y_k^j - \bar{y}_k \right)^T.   (78)

When we take the conditional expected values of $\bar{Y}_k^1$ and $\bar{Y}_k^2$ we obtain

\hat{Y}_k^1 \triangleq \mathrm{E}\left[ \bar{Y}_k^1 \mid X_k = \hat{X}_{k|k-1} \right] = H P_{k|k-1} H^T + \frac{s\hat{X}_{k|k-1} + R}{m_k},   (79)
\hat{Y}_k^2 \triangleq \mathrm{E}\left[ \bar{Y}_k^2 \mid X_k = \hat{X}_{k|k-1} \right] = \frac{m_k - 1}{m_k}\left( s\hat{X}_{k|k-1} + R \right).   (80)

The matrix $M_k^{\mathrm{FFK}}$ is then proposed to be

M_k^{\mathrm{FFK}} = \hat{X}_{k|k-1}^{1/2}\left( \hat{Y}_k^1 \right)^{-1/2}\bar{Y}_k^1\left( \hat{Y}_k^1 \right)^{-1/2}\hat{X}_{k|k-1}^{1/2} + (m_k - 1)\,\hat{X}_{k|k-1}^{1/2}\left( \hat{Y}_k^2 \right)^{-1/2}\bar{Y}_k^2\left( \hat{Y}_k^2 \right)^{-1/2}\hat{X}_{k|k-1}^{1/2}.   (81)

The conditional expected value of $M_k^{\mathrm{FFK}}$ is

\mathrm{E}\left[ M_k^{\mathrm{FFK}} \mid X_k = \hat{X}_{k|k-1} \right] = m_k \hat{X}_{k|k-1},   (82)

which results in an unbiased estimator.

C. ETT via log-likelihood linearization

A new measurement update is obtained by linearizing the logarithm of the likelihood function (63) with respect to the sufficient statistic of the prior distribution (64). The sufficient statistics of the prior distribution (64) are given as follows:

T(x_k, X_k) = \left( x_k,\ x_k x_k^T,\ X_k^{-1},\ \log|X_k| \right).   (83)

As seen above, the second and the fourth elements of $T$ are not invertible. In the following lemma, using log-likelihood linearization (LLL) via a first order Taylor series expansion, we approximate the likelihood function (63) to obtain a conjugate likelihood with respect to the prior (64) with the sufficient statistics given in (83).

Lemma 2. The likelihood $\prod_{j=1}^{m_k}\mathcal{N}(y_k^j; Hx_k, sX_k + R)$ can be approximated, up to a multiplicative constant with respect to the variables $x_k$ and $X_k$, by a first order Taylor series approximation around the nominal points $\bar{X}$ for the variable $X_k$ and $\hat{x}$ for the variable $x_k$ as

\prod_{j=1}^{m_k}\mathcal{N}\left( y_k^j;\ Hx_k,\ sX_k + R \right) \approx \prod_{j=1}^{m_k}\mathcal{N}\left( y_k^j;\ Hx_k,\ s\bar{X} + R \right)\ \mathcal{IW}\left( X_k;\ m_k,\ M_k \right),   (84)

where

M_k = m_k\bar{X} + m_k s\,\bar{X}\left( s\bar{X} + R \right)^{-1}\left[ \bar{Y}_k - \left( s\bar{X} + R \right) \right]\left( s\bar{X} + R \right)^{-1}\bar{X}.   (85)

Proof. The proof is given in Appendix A for the sake of clarity.

Lemma 2 states that the likelihood (63) can be approximately factorized into two independent likelihood terms corresponding to the kinematic and extent states. This type of factorization gives independent measurement updates for the kinematic state and the extent state. For this purpose, one can set $\bar{X} = \frac{V_{k|k-1}}{\nu_{k|k-1} - 2d - 2}$ and $\hat{x} = \hat{x}_{k|k-1}$ in order to find the factors of Lemma 2. Using the conjugate likelihood factors in (84), the posterior density $p(x_k, X_k \mid Y_{0:k})$ is given in the form of (66) with the update parameters given below:

\hat{x}_{k|k} = P_{k|k}\left( P_{k|k-1}^{-1}\hat{x}_{k|k-1} + m_k H^T\left( s\hat{X}_{k|k-1} + R \right)^{-1}\bar{y}_k \right),   (86)
P_{k|k} = \left( P_{k|k-1}^{-1} + m_k H^T\left( s\hat{X}_{k|k-1} + R \right)^{-1} H \right)^{-1},   (87)
\hat{X}_{k|k-1} = \frac{V_{k|k-1}}{\nu_{k|k-1} - 2d - 2},   (88)
\bar{y}_k = \frac{1}{m_k}\sum_{j=1}^{m_k} y_k^j.   (89)

Note that the kinematic updates are the same as the kinematic updates proposed in [16]. The extent state is updated as given in

\nu_{k|k} = \nu_{k|k-1} + m_k,   (90)
V_{k|k} = V_{k|k-1} + M_k^{\mathrm{LLL}},   (91)

where

M_k^{\mathrm{LLL}} = m_k\hat{X}_{k|k-1} + m_k s\,\hat{X}_{k|k-1}\left( s\hat{X}_{k|k-1} + R \right)^{-1}\left[ \bar{Y}_k - \left( s\hat{X}_{k|k-1} + R \right) \right]\left( s\hat{X}_{k|k-1} + R \right)^{-1}\hat{X}_{k|k-1}.   (92)

This form of the update suggests that the matrix $\bar{Y}_k$ serves as a pseudo-measurement for the quantity $sX_k + R$. If $\bar{Y}_k$ is larger than the current predicted estimate of $sX_k + R$, i.e., $s\hat{X}_{k|k-1} + R$, then a positive matrix quantity is added to the statistic $V_{k|k-1}$, and vice versa. We note here that

\mathrm{E}\left[ \bar{Y}_k \mid X_k = \hat{X}_{k|k-1} \right] = H P_{k|k-1} H^T + s\hat{X}_{k|k-1} + R,   (93)

where the expectation is taken over all the measurements at time $k$ and the estimate $\hat{x}_{k|k-1}$. Hence we conclude that the expected value of the second term on the right hand side of (92) is always positive, which makes the current update biased. Another drawback of the update is that the uncertainty of the predicted estimate $\hat{x}_{k|k-1}$ does not affect the update. In order to solve both problems we modify the quantity $M_k^{\mathrm{LLL}}$ as follows:

M_k^{\mathrm{ULL}} \triangleq m_k\hat{X}_{k|k-1} + m_k s\,\hat{X}_{k|k-1}\left( H P_{k|k-1} H^T + s\hat{X}_{k|k-1} + R \right)^{-1}\left[ \bar{Y}_k - \left( H P_{k|k-1} H^T + s\hat{X}_{k|k-1} + R \right) \right]\left( H P_{k|k-1} H^T + s\hat{X}_{k|k-1} + R \right)^{-1}\hat{X}_{k|k-1}.   (94)

Note that in this modified update the difference

\bar{Y}_k - \left( H P_{k|k-1} H^T + s\hat{X}_{k|k-1} + R \right)   (95)

has zero conditional mean, which makes the update unbiased, and the terms $\left( H P_{k|k-1} H^T + s\hat{X}_{k|k-1} + R \right)^{-1}$ decrease with increasing uncertainty in the predicted estimate $\hat{x}_{k|k-1}$, which makes the update term smaller.
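A compact sketch of the extent part of the proposed ULL update, combining (88), (90), (91) and (94); this is our own NumPy code with toy numbers, and it assumes the inverse Wishart parametrization (65), so that the predicted extent is $V/(\nu - 2d - 2)$.

```python
import numpy as np

def ull_extent_update(nu_pred, V_pred, P_pred, Y_bar, m, H, s, R, d=2):
    """Extent-state part of the ULL measurement update, cf. (88), (90)-(91), (94):
    the inverse Wishart prior IW(nu_pred, V_pred) is updated with the spread Y_bar
    of the m detections about the predicted measurement."""
    X_pred = V_pred / (nu_pred - 2 * d - 2)          # predicted extent, eq. (88)
    S = H @ P_pred @ H.T + s * X_pred + R            # expected spread, eq. (93)
    S_inv = np.linalg.inv(S)
    M_ull = m * X_pred + m * s * X_pred @ S_inv @ (Y_bar - S) @ S_inv @ X_pred
    return nu_pred + m, V_pred + M_ull               # eqs. (90)-(91)

# toy numbers for a 2-D extent
H = np.hstack([np.eye(2), np.zeros((2, 2))])
P_pred = np.diag([50.0, 50.0, 10.0, 10.0])
nu_pred, V_pred = 20.0, 14.0 * np.array([[300.0, 0.0], [0.0, 200.0]])
Y_bar = np.array([[180.0, 10.0], [10.0, 120.0]])
print(ull_extent_update(nu_pred, V_pred, P_pred, Y_bar, m=10, H=H, s=0.25, R=np.eye(2)))
```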

In the following we refer to the proposed unbiased measurement update, based on linearization of the likelihood in the log domain, as ULL for ease of presentation.

V. NUMERICAL SIMULATION

In this section the newly proposed measurement update, ULL, is compared with the update based on variational Bayes [18] and the update proposed by Feldmann et al. in [16], which are referred to as VB and FFK, respectively. In Section V-A, we use a simulation scenario which is partly based on the simulation scenario described in [18, Section IV], which in turn is based on [19, Section 6.1]. In Section V-B an ETT scenario based on the numerical simulation described in [16, Sec. VI-B] is presented.

A. Monte Carlo simulations

A planar extended target in two dimensional space is considered, whose true kinematic state $x_k^0$ and extent state $X_k^0$ are

x_k^0 = [0\ \text{m},\ 0\ \text{m},\ 100\ \text{m/s},\ 100\ \text{m/s}]^T,   (96a)
X_k^0 = E\, \operatorname{Diag}([300\ \text{m},\ 200\ \text{m}])\, E^T.   (96b)

Here, $E \triangleq [e_1, e_2]$ is a matrix whose columns $e_1$ and $e_2$ are the normalized eigenvectors of $X_k^0$, which are $e_1 = \tfrac{1}{\sqrt{2}}[1, 1]^T$ and $e_2 = \tfrac{1}{\sqrt{2}}[1, -1]^T$. For the extended target with these true parameters, we conduct a Monte Carlo (MC) simulation study to quantify the differences between the measurement updates FFK, ULL and VB. For each MC run we assume that the initial predicted target density for all three methods is the same and has the following structure:

p(x_k, X_k \mid Y_{0:k-1}) = \mathcal{N}\left( x_k;\ \hat{x}_{k|k-1}^j,\ P_{k|k-1} \right)\mathcal{IW}\left( X_k;\ \nu_{k|k-1}^j,\ V_{k|k-1}^j \right),   (97)

where the superscript $j$ indicates the $j$th MC run. The quantities $\hat{x}_{k|k-1}^j$ and $P_{k|k-1}$ for the kinematic state are selected as

\hat{x}_{k|k-1}^j \sim \mathcal{N}\left( \cdot\,;\ x_k^0,\ \frac{P_{k|k-1}}{\alpha} \right),   (98a)
P_{k|k-1} = \operatorname{Diag}([50,\ 50,\ 10,\ 10]),   (98b)

where the scalar $\alpha$ is a scaling parameter for the covariance of the density in (98a), used to adjust the distribution of $\hat{x}_{k|k-1}^j$ in the MC simulation study. The quantities $\nu_{k|k-1}^j$ and $V_{k|k-1}^j$ are selected as

\nu_{k|k-1}^j = \max\left( 7,\ \nu^j \right), \quad \nu^j \sim \text{Poisson}(100),   (99a)
\frac{V_{k|k-1}^j}{\nu_{k|k-1}^j - 2d - 2} \sim \mathcal{W}\left( \cdot\,;\ \delta,\ \frac{X_k^0}{\delta} \right),   (99b)

where Poisson($\lambda$) and $\mathcal{W}(\cdot; n, \Psi)$ denote the Poisson distribution with expected value $\lambda$ and the Wishart distribution with degrees of freedom $n$ and scale matrix $\Psi$. The scalar $\delta$ controls the variance of the Wishart density in this study. For the $j$th MC run, $m^j$ measurements are generated according to

m^j = \max\left( 2,\ \tilde{m}^j \right), \quad \tilde{m}^j \sim \text{Poisson}(10),   (100a)
y^{j,i} \sim \mathcal{N}\left( \cdot\,;\ Hx_k^0,\ sX_k^0 + R \right),   (100b)

where $s = 0.5$, $R = 100\, I_2$ and $H = [I_2,\ 0_2]$. The matrices $I_2$ and $0_2$ are the identity matrix and the zero matrix of size $2 \times 2$, respectively.

The performance of the measurement updates is compared for various levels of accuracy of the initial predicted target density, in view of the optimal Bayesian solution. To this end, the optimal solution is numerically computed via importance sampling using the predicted density as the proposal density. A total of $10^5$ samples are generated from the predicted density and the normalized importance weights are calculated from the likelihood function given in (63). Using the importance weights, the posterior expected values of the kinematic state $x_k$ and the extent state $X_k$ are calculated. The resulting expected values are referred to as $\hat{x}_k^{\mathrm{opt}}$ and $\hat{X}_k^{\mathrm{opt}}$, respectively.

[Figure 4: Extent estimation error for FFK, VB and ULL with respect to the optimal Bayesian solution. Solid lines show the mean error E_X and shaded areas of respective colors illustrate the interval between the fifth and the ninety-fifth percentiles.]
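The importance-sampling reference solution described above might be sketched as follows (our own code, assuming SciPy is available; note that SciPy's invwishart uses a different degrees-of-freedom convention than (65), so a faithful implementation would convert the parameters first).

```python
import numpy as np
from scipy.stats import invwishart, multivariate_normal

rng = np.random.default_rng(1)

def optimal_estimate_is(x_pred, P_pred, nu_pred, V_pred, Y, H, s, R, n_samples=10_000):
    """Importance-sampling reference: sample (x, X) from the predicted density (64),
    weight by the likelihood (63), and return the weighted posterior means."""
    xs = rng.multivariate_normal(x_pred, P_pred, size=n_samples)
    Xs = invwishart(df=nu_pred, scale=V_pred).rvs(size=n_samples, random_state=rng)
    log_w = np.empty(n_samples)
    for i in range(n_samples):
        cov = s * Xs[i] + R
        log_w[i] = multivariate_normal(H @ xs[i], cov).logpdf(Y).sum()
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    x_opt = w @ xs                              # posterior mean of the kinematic state
    X_opt = np.einsum('i,ijk->jk', w, Xs)       # posterior mean of the extent state
    return x_opt, X_opt
```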
To change the level of accuracy of the predicted target density, the parameter $\alpha$ is selected on a linear grid of 40 values between 1 and 50, while the values of $\delta$ are selected on a logarithmic grid of 40 values reaching up to $10^5$. For each pair of $\alpha$ and $\delta$ out of the $40^2$ pairs, $N_{\mathrm{MC}} = 1000$ runs are made by selecting the parameters $\hat{x}_{k|k-1}^j$, $P_{k|k-1}$, $\nu_{k|k-1}^j$, $V_{k|k-1}^j$ and $Y_k^j$ as above, and the estimates $\hat{x}_k^{U,j}$ and $\hat{X}_k^{U,j}$ for $U \in \{\text{FFK}, \text{VB}, \text{ULL}\}$ are calculated. (In each MC run 20 iterations were performed for the VB solution, although the VB solution would usually converge within the first 5 iterates.) Using the results of the MC runs, the average error metrics $E_x^U$ and $E_X^U$ for $U \in \{\text{FFK}, \text{VB}, \text{ULL}\}$, for the kinematic state and the extent state respectively, are calculated as follows:

E_x^U \triangleq \frac{1}{d\, N_{\mathrm{MC}}}\sum_{j=1}^{N_{\mathrm{MC}}}\left\| H\hat{x}_k^{U,j} - H\hat{x}_k^{\mathrm{opt},j} \right\|^2,   (101)
E_X^U \triangleq \frac{1}{d\, N_{\mathrm{MC}}}\sum_{j=1}^{N_{\mathrm{MC}}}\operatorname{tr}\left( \left( \hat{X}_k^{U,j} - \hat{X}_k^{\mathrm{opt},j} \right)^2 \right).   (102)

[Figure 5: Kinematic state estimation error for FFK, VB and ULL with respect to the optimal Bayesian solution. Solid lines show the mean error E_x and shaded areas of respective colors illustrate the interval between the fifth and the ninety-fifth percentiles. Since FFK and ULL use an identical kinematic state update for identical X_{k|k-1}, their respective graphs coincide.]

[Figure 6: Trajectory of an extended target (x- and y-coordinates in meters). For every third scan, the true extended target and its center are shown.]

Table III: Comparison of measurement updates for ETT with respect to cycle time and estimation errors. The data are obtained from the single ETT simulation scenario. Since the VB update diverged several times, errors exceeding 4 m are set to 4 m to make the quantitative comparison meaningful.
    Update | E_x [m], mean ± st.dev. | E_X [m], mean ± st.dev. | Cycle time [s], mean ± st.dev.
    FFK    |           ±             |           ±             |           ±
    VB     |           ±             |           ±             |           ±
    ULL    |           ±             |           ±             |           ±

The errors are illustrated in Fig. 4 and Fig. 5. Since FFK and ULL use an identical kinematic state update, their respective graphs coincide. VB has the lowest extent state estimation error. ULL has a lower estimation error compared to FFK in the simulation scenario presented here. When the simulation is repeated with $R = 50\, I_2$, FFK performs better than ULL, while VB maintains its superior estimation performance in terms of $E_X$.

B. Single extended target tracking scenario

The performance of the measurement updates is evaluated in a single extended target tracking scenario where an extended target follows a trajectory in two dimensional space, illustrated in Fig. 6. In this simulation scenario, an elliptical extended target with diameters 340 and 80 meters moves with a constant velocity of about 50 km/h. The extended target's true initial kinematic state $x_1^0$ and true initial extent state $X_1^0$ are as follows:

x_1^0 = [0\ \text{m},\ 0\ \text{m},\ 9.8\ \text{m/s},\ 9.8\ \text{m/s}]^T,   (103a)
X_1^0 = E\, \operatorname{Diag}([170\ \text{m},\ 400\ \text{m}])\, E^T,   (103b)

where $E \triangleq [e_1, e_2]$ is the matrix whose columns are the normalized eigenvectors of $X_1^0$, which are $e_1 = \tfrac{1}{\sqrt{2}}[-1, 1]^T$ and $e_2 = \tfrac{1}{\sqrt{2}}[1, 1]^T$. We conduct an MC study to quantify the difference between the measurement updates in a single extended target tracking scenario. The measurement set $Y_k^j = \{ y_k^{j,i} \}_{i=1}^{m_k^j}$ for the $j$th MC run is generated according to (100) with $s = 0.5$, $R = 20\, I_2$ and $H = [I_2,\ 0_2]$.

[Figure 7: Histogram of the root mean square error of the kinematic state E_x^{U,j} for U in {FFK, VB, ULL} in a single ETT scenario.]

In the filters, the kinematic state vector consists of the position and the velocity, $x_k = [p_x, p_y, \dot{p}_x, \dot{p}_y]^T$, which evolves according to the discrete-time constant velocity model in two dimensional Cartesian coordinates [20],

x_{k+1} \sim \mathcal{N}\left( x_{k+1};\ A x_k,\ Q \right),   (104)

where

A = \begin{bmatrix} I_2 & \tau I_2 \\ 0_2 & I_2 \end{bmatrix}, \qquad Q = \sigma_\nu^2 \begin{bmatrix} \tfrac{\tau^4}{4} I_2 & \tfrac{\tau^3}{2} I_2 \\ \tfrac{\tau^3}{2} I_2 & \tau^2 I_2 \end{bmatrix},

with $\tau = 10\ \text{s}$ and $\sigma_\nu = 0.1\ \text{m/s}^2$. In the filters, the initial prior density for the kinematic state and extent state is

p(x_1, X_1) = \mathcal{N}\left( x_1;\ \hat{x}_{1|0}^j,\ P_{1|0} \right)\mathcal{IW}\left( X_1;\ \nu_{1|0}^j,\ V_{1|0}^j \right),
\hat{x}_{1|0}^j \sim \mathcal{N}\left( \cdot\,;\ x_1^0,\ \frac{P_{1|0}}{\alpha_0} \right),
P_{1|0} = \operatorname{Diag}([50,\ 50,\ 10,\ 10]),
\nu_{1|0}^j = \max\left( 7,\ \nu^j \right), \quad \nu^j \sim \text{Poisson}(10),
\frac{V_{1|0}^j}{\nu_{1|0}^j - 2d - 2} \sim \mathcal{W}\left( \cdot\,;\ \delta_0,\ \frac{X_1^0}{\delta_0} \right),   (105)

where $\alpha_0 = 10$ and $\delta_0 = 5$.

[Figure 8: Histogram of the average extent state estimation error E_X^{U,j} for U in {FFK, VB, ULL} in a single ETT scenario.]

1) Time update: The time update of the target's kinematic state in the filters follows the standard Kalman filter time update equations,

\hat{x}_{k+1|k}^j = A\,\hat{x}_{k|k}^j,   (106)
P_{k+1|k}^j = A\, P_{k|k}^j\, A^T + Q.   (107)

The extent state is assumed to be slowly varying. Hence, an exponential forgetting strategy [15], [21], [22] can be used to account for possible changes in the parameters over time. In [23] it is shown that using an exponential forgetting factor produces the maximum entropy distribution in the time update for processes which are slowly varying with unknown dynamics but bounded by a Kullback-Leibler divergence (KLD) constraint. The forgetting factor is applied to the parameters of the inverse Wishart distribution as follows:

\nu_{k+1|k}^j = \exp\left( -\tau/\tau_0 \right)\nu_{k|k}^j,   (108)
V_{k+1|k}^j = \frac{\nu_{k+1|k}^j - 2d - 2}{\nu_{k|k}^j - 2d - 2}\, V_{k|k}^j.   (109)

The exponential decay time constant was selected as $\tau_0 = 15$.

2) Evaluation of filters: MC runs are performed where the parameters $\hat{x}_{1|0}^j$, $P_{1|0}$, $\nu_{1|0}^j$, $V_{1|0}^j$ and $Y_k^j$ are selected as above. Using the results of the MC runs, the average error metrics $E_x^{U,j}$ and $E_X^{U,j}$ for $U \in \{\text{FFK}, \text{VB}, \text{ULL}\}$ and the $j$th MC run, for the target's kinematic state and the extent state respectively, are calculated as follows:

E_x^{U,j} \triangleq \frac{1}{K d}\sum_{k=1}^{K}\left\| H\hat{x}_k^{U,j} - H x_k^0 \right\|^2,   (110a)
E_X^{U,j} \triangleq \frac{1}{K d}\sum_{k=1}^{K}\operatorname{tr}\left( \left( \hat{X}_k^{U,j} - X_k^0 \right)^2 \right).   (110b)

The errors (mean ± standard deviation) and the cycle times (mean ± standard deviation) for the evaluated measurement update rules are given in Table III. The histograms of the distribution of the errors defined in (110) are given in Fig. 7 and Fig. 8. The estimation errors of FFK and ULL are comparable, but the proposed update, ULL, is less computationally expensive.

VI. CONCLUSION

A Bayesian inference technique based on Taylor series approximation of the logarithm of the likelihood function was presented. The proposed approximation is devised for the case where the prior distribution belongs to the exponential family of distributions and is continuous. The logarithm of the likelihood function is linearized with respect to the sufficient statistic of the prior distribution in the exponential family such that the posterior obtains the same exponential family form as the prior. A Taylor series approximation is used for the linearization with respect to the sufficient statistic. However, the Taylor series approximation is a local approximation which is used here to approximate the posterior over its entire support. In spite of this inherent weakness, the proposed algorithm performs well in the numerical simulations presented here for extended target tracking. Furthermore, there are numerous successful applications of the extended Kalman filter, which is a special case of the proposed algorithm, in theory and practice. When a Bayesian posterior estimate is needed with a limited computational budget which prohibits the use of Monte Carlo methods or even variational Bayes approximations, as in real-time filtering problems, the proposed algorithm offers a good alternative. The comparison of possible choices for the linearization point and of linearization methods with respect to the sufficient statistic is among the future research problems.

VII. ACKNOWLEDGMENTS

The authors would like to thank Henri Nurminen for proofreading and providing comments on an earlier version of the manuscript.

REFERENCES

[1] M. I. Jordan (Ed.), Learning in Graphical Models. MIT Press.
[2] M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul, "An introduction to variational methods for graphical models," Machine Learning, vol. 37, no. 2, pp. 183-233, Nov. 1999.
[3] T. Minka, "Expectation propagation for approximate Bayesian inference," in Proceedings of the Seventeenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-01). San Francisco, CA: Morgan Kaufmann, 2001, pp. 362-369.
[4] H. Rue, S. Martino, and N. Chopin, "Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 71, no. 2, pp. 319-392, 2009.
[5] J. A. Nelder and R. W. M. Wedderburn, "Generalized linear models," Journal of the Royal Statistical Society, Series A (General), vol. 135, no. 3, pp. 370-384, 1972.
[6] W. K. Hastings, "Monte Carlo sampling methods using Markov chains and their applications," Biometrika, vol. 57, pp. 97-109, 1970.
[7] S. Geman and D. Geman, "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 6, no. 6, pp. 721-741, 1984.
[8] M. Wainwright and M. Jordan, Graphical Models, Exponential Families, and Variational Inference, ser. Foundations and Trends in Machine Learning. Now Publishers, 2008.
[9] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley and Sons, 2006.
[10] W. Hennevogl, L. Fahrmeir, and G. Tutz, Multivariate Statistical Modelling Based on Generalized Linear Models, ser. Springer Series in Statistics. Springer New York, 2001.
[11] G. Smith, S. Schmidt, and L. McGee, Application of Statistical Filter Theory to the Optimal Estimation of Position and Velocity on Board a Circumlunar Vehicle, ser. NASA Technical Report. National Aeronautics and Space Administration, 1962.
[12] M. I. Jordan, lecture notes on conjugacy and the exponential family, 2014.
[13] R. E. Kalman, "A new approach to linear filtering and prediction problems," Transactions of the ASME, Journal of Basic Engineering, vol. 82, Series D, pp. 35-45, 1960.
[14] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics, ser. Intelligent Robotics and Autonomous Agents. MIT Press, 2005.

[15] J. W. Koch, "Bayesian approach to extended object and cluster tracking using random matrices," IEEE Trans. Aerosp. Electron. Syst., vol. 44, no. 3, pp. 1042-1059, Jul. 2008.
[16] M. Feldmann, D. Fränken, and W. Koch, "Tracking of extended objects and group targets using random matrices," IEEE Trans. Signal Process., vol. 59, no. 4, pp. 1409-1420, Apr. 2011.
[17] A. K. Gupta and D. K. Nagar, Matrix Variate Distributions. Boca Raton, FL: Chapman & Hall/CRC, 2000.
[18] U. Orguner, "A variational measurement update for extended target tracking with random matrices," IEEE Trans. Signal Process., vol. 60, no. 7, pp. 3827-3834, 2012.
[19] M. Baum, M. Feldmann, D. Fränken, U. D. Hanebeck, and W. Koch, "Extended object and group tracking: A comparison of random matrices and random hypersurface models," in GI Jahrestagung, ser. LNI, K.-P. Fähnrich and B. Franczyk, Eds. GI, 2010.
[20] X. R. Li and V. P. Jilkov, "Survey of maneuvering target tracking. Part I: Dynamic models," IEEE Trans. Aerosp. Electron. Syst., vol. 39, no. 4, pp. 1333-1364, Oct. 2003.
[21] R. Kulhavý and M. B. Zarrop, "On a general concept of forgetting," International Journal of Control, vol. 58, no. 4, pp. 905-924, 1993.
[22] C. M. Carvalho and M. West, "Dynamic matrix-variate graphical models," Bayesian Analysis, vol. 2, pp. 69-98, 2007.
[23] E. Özkan, V. Šmídl, S. Saha, C. Lundquist, and F. Gustafsson, "Marginalized adaptive particle filtering for nonlinear models with unknown time-varying noise parameters," to appear in Automatica.

APPENDIX

A. Proof of Lemma 2

The logarithm of the likelihood $\prod_{j=1}^{m_k}\mathcal{N}(y_k^j; Hx_k, sX_k + R)$ is given as

\sum_{j=1}^{m_k}\log\mathcal{N}\left( y_k^j;\ Hx_k,\ sX_k + R \right)
\overset{+}{=} -\frac{m_k}{2}\log\left| sX_k + R \right| - \frac{1}{2}\sum_{j=1}^{m_k}\left( y_k^j - Hx_k \right)^T\left( sX_k + R \right)^{-1}\left( y_k^j - Hx_k \right)   (111)
= -\frac{m_k}{2}\log\left| X_k \right| - \frac{m_k}{2}\log\left| sI + X_k^{-1/2} R X_k^{-1/2} \right| - \frac{1}{2}\sum_{j=1}^{m_k}\left( y_k^j - Hx_k \right)^T\left( sX_k + R \right)^{-1}\left( y_k^j - Hx_k \right)   (112)
= -\frac{m_k}{2}\log\left| X_k \right| - \frac{m_k}{2}\log\left| sI + X_k^{-1/2} R X_k^{-1/2} \right| - \frac{1}{2}\sum_{j=1}^{m_k}\operatorname{tr}\left[ \left( y_k^j - Hx_k \right)\left( y_k^j - Hx_k \right)^T\left( sX_k + R \right)^{-1} \right],   (113)

where $\overset{+}{=}$ denotes equality up to an additive constant. We approximate the right hand side of (113) by a function that is linear in the sufficient statistic

T(x_k, X_k) = \left( x_k,\ x_k x_k^T,\ \log|X_k|,\ X_k^{-1} \right).   (114)

As mentioned before, the statistics $x_k x_k^T$ and $\log|X_k|$ are not invertible. The remedy we suggest here is to make a first order Taylor series expansion of (113) with respect to the new variables

Z \triangleq X_k^{-1},   (115a)
N^j \triangleq \left( y_k^j - Hx_k \right)\left( y_k^j - Hx_k \right)^T, \quad j = 1, \ldots, m_k,   (115b)

around the corresponding nominal values

\hat{Z} \triangleq \bar{X}^{-1},   (116a)
\hat{N}^j \triangleq \left( y_k^j - H\hat{x} \right)\left( y_k^j - H\hat{x} \right)^T, \quad j = 1, \ldots, m_k.   (116b)

Writing (113) in terms of $Z$ and $N \triangleq [N^1, \ldots, N^{m_k}]$, we obtain

\sum_{j=1}^{m_k}\log\mathcal{N}\left( y_k^j;\ Hx_k,\ sX_k + R \right) \overset{+}{=} -\frac{m_k}{2}\log\left| Z^{-1} \right| - \frac{m_k}{2}\log\left| sI + Z^{1/2} R Z^{1/2} \right| - \frac{1}{2}\sum_{j=1}^{m_k}\operatorname{tr}\left[ N^j\left( sZ^{-1} + R \right)^{-1} \right].   (117)

If we make a first order Taylor series expansion of the second and third terms on the right hand side of (117) with respect to the variables $Z$ and $N$ around $\hat{Z}$ and $\hat{N}$, we obtain

\sum_{j=1}^{m_k}\log\mathcal{N}\left( y_k^j;\ Hx_k,\ sX_k + R \right) \overset{+}{\approx} -\frac{m_k}{2}\log\left| Z^{-1} \right| - \frac{m_k}{2}\operatorname{tr}\left[ \left( sR^{-1} + \hat{Z} \right)^{-1} Z \right] - \frac{1}{2}\sum_{j=1}^{m_k}\operatorname{tr}\left[ N^j\left( s\hat{Z}^{-1} + R \right)^{-1} \right] - \frac{s}{2}\sum_{j=1}^{m_k}\operatorname{tr}\left[ \hat{Z}^{-1}\left( s\hat{Z}^{-1} + R \right)^{-1}\hat{N}^j\left( s\hat{Z}^{-1} + R \right)^{-1}\hat{Z}^{-1} Z \right],   (118)

where $\overset{+}{\approx}$ represents an approximation up to an additive constant. The first order Taylor series approximations for the scalar-valued functions of matrix variables of interest are given in Appendix B. We now substitute the relations (115) and (116) back into (118) to obtain the approximation

\sum_{j=1}^{m_k}\log\mathcal{N}\left( y_k^j;\ Hx_k,\ sX_k + R \right) \overset{+}{\approx} -\frac{m_k}{2}\log\left| X_k \right| - \frac{m_k}{2}\operatorname{tr}\left[ \left( sR^{-1} + \bar{X}^{-1} \right)^{-1} X_k^{-1} \right] - \frac{1}{2}\sum_{j=1}^{m_k}\operatorname{tr}\left[ \left( y_k^j - Hx_k \right)\left( y_k^j - Hx_k \right)^T\left( s\bar{X} + R \right)^{-1} \right] - \frac{s}{2}\sum_{j=1}^{m_k}\operatorname{tr}\left[ \bar{X}\left( s\bar{X} + R \right)^{-1}\left( y_k^j - H\hat{x} \right)\left( y_k^j - H\hat{x} \right)^T\left( s\bar{X} + R \right)^{-1}\bar{X} X_k^{-1} \right]   (119)
= -\frac{m_k}{2}\log\left| X_k \right| - \frac{m_k}{2}\operatorname{tr}\left[ \left( sR^{-1} + \bar{X}^{-1} \right)^{-1} X_k^{-1} \right] - \frac{1}{2}\sum_{j=1}^{m_k}\left( y_k^j - Hx_k \right)^T\left( s\bar{X} + R \right)^{-1}\left( y_k^j - Hx_k \right) - \frac{s}{2}\operatorname{tr}\left[ \bar{X}\left( s\bar{X} + R \right)^{-1}\Big( \sum_{j=1}^{m_k}\left( y_k^j - H\hat{x} \right)\left( y_k^j - H\hat{x} \right)^T \Big)\left( s\bar{X} + R \right)^{-1}\bar{X} X_k^{-1} \right].   (120)

Rearranging the terms in (120) and taking the exponential of both sides, we can see that

\prod_{j=1}^{m_k}\mathcal{N}\left( y_k^j;\ Hx_k,\ sX_k + R \right) \approx \prod_{j=1}^{m_k}\mathcal{N}\left( y_k^j;\ Hx_k,\ s\bar{X} + R \right)\ \mathcal{IW}\left( X_k;\ m_k,\ M_k \right),   (121)

up to a multiplicative constant, with $M_k$ as given in (85), which completes the proof.


More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Brown University CSCI 2950-P, Spring 2013 Prof. Erik Sudderth Lecture 13: Learning in Gaussian Graphical Models, Non-Gaussian Inference, Monte Carlo Methods Some figures

More information

Properties and approximations of some matrix variate probability density functions

Properties and approximations of some matrix variate probability density functions Technical report from Automatic Control at Linköpings universitet Properties and approximations of some matrix variate probability density functions Karl Granström, Umut Orguner Division of Automatic Control

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Probabilistic Fundamentals in Robotics. DAUIN Politecnico di Torino July 2010

Probabilistic Fundamentals in Robotics. DAUIN Politecnico di Torino July 2010 Probabilistic Fundamentals in Robotics Gaussian Filters Basilio Bona DAUIN Politecnico di Torino July 2010 Course Outline Basic mathematical framework Probabilistic models of mobile robots Mobile robot

More information

Lecture 2: From Linear Regression to Kalman Filter and Beyond

Lecture 2: From Linear Regression to Kalman Filter and Beyond Lecture 2: From Linear Regression to Kalman Filter and Beyond January 18, 2017 Contents 1 Batch and Recursive Estimation 2 Towards Bayesian Filtering 3 Kalman Filter and Bayesian Filtering and Smoothing

More information

A Bayesian Approach to Jointly Estimate Tire Radii and Vehicle Trajectory

A Bayesian Approach to Jointly Estimate Tire Radii and Vehicle Trajectory A Bayesian Approach to Jointly Estimate Tire Radii and Vehicle Trajectory Emre Özan, Christian Lundquist and Fredri Gustafsson Abstract High-precision estimation of vehicle tire radii is considered, based

More information

Probabilistic Graphical Models for Image Analysis - Lecture 4

Probabilistic Graphical Models for Image Analysis - Lecture 4 Probabilistic Graphical Models for Image Analysis - Lecture 4 Stefan Bauer 12 October 2018 Max Planck ETH Center for Learning Systems Overview 1. Repetition 2. α-divergence 3. Variational Inference 4.

More information

Previously on TT, Target Tracking: Lecture 2 Single Target Tracking Issues. Lecture-2 Outline. Basic ideas on track life

Previously on TT, Target Tracking: Lecture 2 Single Target Tracking Issues. Lecture-2 Outline. Basic ideas on track life REGLERTEKNIK Previously on TT, AUTOMATIC CONTROL Target Tracing: Lecture 2 Single Target Tracing Issues Emre Özan emre@isy.liu.se Division of Automatic Control Department of Electrical Engineering Linöping

More information

A Comparison of Particle Filters for Personal Positioning

A Comparison of Particle Filters for Personal Positioning VI Hotine-Marussi Symposium of Theoretical and Computational Geodesy May 9-June 6. A Comparison of Particle Filters for Personal Positioning D. Petrovich and R. Piché Institute of Mathematics Tampere University

More information

CLOSE-TO-CLEAN REGULARIZATION RELATES

CLOSE-TO-CLEAN REGULARIZATION RELATES Worshop trac - ICLR 016 CLOSE-TO-CLEAN REGULARIZATION RELATES VIRTUAL ADVERSARIAL TRAINING, LADDER NETWORKS AND OTHERS Mudassar Abbas, Jyri Kivinen, Tapani Raio Department of Computer Science, School of

More information

Nonparametric Bayesian Methods (Gaussian Processes)

Nonparametric Bayesian Methods (Gaussian Processes) [70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent

More information

Dynamic System Identification using HDMR-Bayesian Technique

Dynamic System Identification using HDMR-Bayesian Technique Dynamic System Identification using HDMR-Bayesian Technique *Shereena O A 1) and Dr. B N Rao 2) 1), 2) Department of Civil Engineering, IIT Madras, Chennai 600036, Tamil Nadu, India 1) ce14d020@smail.iitm.ac.in

More information

Lecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions

Lecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions DD2431 Autumn, 2014 1 2 3 Classification with Probability Distributions Estimation Theory Classification in the last lecture we assumed we new: P(y) Prior P(x y) Lielihood x2 x features y {ω 1,..., ω K

More information

On the Reduction of Gaussian inverse Wishart Mixtures

On the Reduction of Gaussian inverse Wishart Mixtures On the Reduction of Gaussian inverse Wishart Mixtures Karl Granström Division of Automatic Control Department of Electrical Engineering Linköping University, SE-58 8, Linköping, Sweden Email: karl@isy.liu.se

More information

Lecture 6: Bayesian Inference in SDE Models

Lecture 6: Bayesian Inference in SDE Models Lecture 6: Bayesian Inference in SDE Models Bayesian Filtering and Smoothing Point of View Simo Särkkä Aalto University Simo Särkkä (Aalto) Lecture 6: Bayesian Inference in SDEs 1 / 45 Contents 1 SDEs

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Brown University CSCI 2950-P, Spring 2013 Prof. Erik Sudderth Lecture 12: Gaussian Belief Propagation, State Space Models and Kalman Filters Guest Kalman Filter Lecture by

More information

ECE276A: Sensing & Estimation in Robotics Lecture 10: Gaussian Mixture and Particle Filtering

ECE276A: Sensing & Estimation in Robotics Lecture 10: Gaussian Mixture and Particle Filtering ECE276A: Sensing & Estimation in Robotics Lecture 10: Gaussian Mixture and Particle Filtering Lecturer: Nikolay Atanasov: natanasov@ucsd.edu Teaching Assistants: Siwei Guo: s9guo@eng.ucsd.edu Anwesan Pal:

More information

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008 Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:

More information

Mobile Robot Localization

Mobile Robot Localization Mobile Robot Localization 1 The Problem of Robot Localization Given a map of the environment, how can a robot determine its pose (planar coordinates + orientation)? Two sources of uncertainty: - observations

More information

16 : Approximate Inference: Markov Chain Monte Carlo

16 : Approximate Inference: Markov Chain Monte Carlo 10-708: Probabilistic Graphical Models 10-708, Spring 2017 16 : Approximate Inference: Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Yuan Yang, Chao-Ming Yen 1 Introduction As the target distribution

More information

Extended Target Tracking Using a Gaussian- Mixture PHD Filter

Extended Target Tracking Using a Gaussian- Mixture PHD Filter Extended Target Tracing Using a Gaussian- Mixture PHD Filter Karl Granström, Christian Lundquist and Umut Orguner Linöping University Post Print N.B.: When citing this wor, cite the original article. IEEE.

More information

Machine Learning. Probability Basics. Marc Toussaint University of Stuttgart Summer 2014

Machine Learning. Probability Basics. Marc Toussaint University of Stuttgart Summer 2014 Machine Learning Probability Basics Basic definitions: Random variables, joint, conditional, marginal distribution, Bayes theorem & examples; Probability distributions: Binomial, Beta, Multinomial, Dirichlet,

More information

Particle Filters. Pieter Abbeel UC Berkeley EECS. Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics

Particle Filters. Pieter Abbeel UC Berkeley EECS. Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics Particle Filters Pieter Abbeel UC Berkeley EECS Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics Motivation For continuous spaces: often no analytical formulas for Bayes filter updates

More information

F denotes cumulative density. denotes probability density function; (.)

F denotes cumulative density. denotes probability density function; (.) BAYESIAN ANALYSIS: FOREWORDS Notation. System means the real thing and a model is an assumed mathematical form for the system.. he probability model class M contains the set of the all admissible models

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Lecture 12 Dynamical Models CS/CNS/EE 155 Andreas Krause Homework 3 out tonight Start early!! Announcements Project milestones due today Please email to TAs 2 Parameter learning

More information

Lecture 7 and 8: Markov Chain Monte Carlo

Lecture 7 and 8: Markov Chain Monte Carlo Lecture 7 and 8: Markov Chain Monte Carlo 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering University of Cambridge http://mlg.eng.cam.ac.uk/teaching/4f13/ Ghahramani

More information

Cross entropy-based importance sampling using Gaussian densities revisited

Cross entropy-based importance sampling using Gaussian densities revisited Cross entropy-based importance sampling using Gaussian densities revisited Sebastian Geyer a,, Iason Papaioannou a, Daniel Straub a a Engineering Ris Analysis Group, Technische Universität München, Arcisstraße

More information

Sensor Tasking and Control

Sensor Tasking and Control Sensor Tasking and Control Sensing Networking Leonidas Guibas Stanford University Computation CS428 Sensor systems are about sensing, after all... System State Continuous and Discrete Variables The quantities

More information

A Gaussian Mixture PHD Filter for Nonlinear Jump Markov Models

A Gaussian Mixture PHD Filter for Nonlinear Jump Markov Models A Gaussian Mixture PHD Filter for Nonlinear Jump Marov Models Ba-Ngu Vo Ahmed Pasha Hoang Duong Tuan Department of Electrical and Electronic Engineering The University of Melbourne Parville VIC 35 Australia

More information

Introduction: exponential family, conjugacy, and sufficiency (9/2/13)

Introduction: exponential family, conjugacy, and sufficiency (9/2/13) STA56: Probabilistic machine learning Introduction: exponential family, conjugacy, and sufficiency 9/2/3 Lecturer: Barbara Engelhardt Scribes: Melissa Dalis, Abhinandan Nath, Abhishek Dubey, Xin Zhou Review

More information

Bayesian Approach 2. CSC412 Probabilistic Learning & Reasoning

Bayesian Approach 2. CSC412 Probabilistic Learning & Reasoning CSC412 Probabilistic Learning & Reasoning Lecture 12: Bayesian Parameter Estimation February 27, 2006 Sam Roweis Bayesian Approach 2 The Bayesian programme (after Rev. Thomas Bayes) treats all unnown quantities

More information

Variational Principal Components

Variational Principal Components Variational Principal Components Christopher M. Bishop Microsoft Research 7 J. J. Thomson Avenue, Cambridge, CB3 0FB, U.K. cmbishop@microsoft.com http://research.microsoft.com/ cmbishop In Proceedings

More information

CSC2535: Computation in Neural Networks Lecture 7: Variational Bayesian Learning & Model Selection

CSC2535: Computation in Neural Networks Lecture 7: Variational Bayesian Learning & Model Selection CSC2535: Computation in Neural Networks Lecture 7: Variational Bayesian Learning & Model Selection (non-examinable material) Matthew J. Beal February 27, 2004 www.variational-bayes.org Bayesian Model Selection

More information

Mini-Course 07 Kalman Particle Filters. Henrique Massard da Fonseca Cesar Cunha Pacheco Wellington Bettencurte Julio Dutra

Mini-Course 07 Kalman Particle Filters. Henrique Massard da Fonseca Cesar Cunha Pacheco Wellington Bettencurte Julio Dutra Mini-Course 07 Kalman Particle Filters Henrique Massard da Fonseca Cesar Cunha Pacheco Wellington Bettencurte Julio Dutra Agenda State Estimation Problems & Kalman Filter Henrique Massard Steady State

More information

Sensor Fusion: Particle Filter

Sensor Fusion: Particle Filter Sensor Fusion: Particle Filter By: Gordana Stojceska stojcesk@in.tum.de Outline Motivation Applications Fundamentals Tracking People Advantages and disadvantages Summary June 05 JASS '05, St.Petersburg,

More information

13: Variational inference II

13: Variational inference II 10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational

More information

LECTURE 11: EXPONENTIAL FAMILY AND GENERALIZED LINEAR MODELS

LECTURE 11: EXPONENTIAL FAMILY AND GENERALIZED LINEAR MODELS LECTURE : EXPONENTIAL FAMILY AND GENERALIZED LINEAR MODELS HANI GOODARZI AND SINA JAFARPOUR. EXPONENTIAL FAMILY. Exponential family comprises a set of flexible distribution ranging both continuous and

More information

Estimating the Shape of Targets with a PHD Filter

Estimating the Shape of Targets with a PHD Filter Estimating the Shape of Targets with a PHD Filter Christian Lundquist, Karl Granström, Umut Orguner Department of Electrical Engineering Linöping University 583 33 Linöping, Sweden Email: {lundquist, arl,

More information

Two Useful Bounds for Variational Inference

Two Useful Bounds for Variational Inference Two Useful Bounds for Variational Inference John Paisley Department of Computer Science Princeton University, Princeton, NJ jpaisley@princeton.edu Abstract We review and derive two lower bounds on the

More information

Connections between score matching, contrastive divergence, and pseudolikelihood for continuous-valued variables. Revised submission to IEEE TNN

Connections between score matching, contrastive divergence, and pseudolikelihood for continuous-valued variables. Revised submission to IEEE TNN Connections between score matching, contrastive divergence, and pseudolikelihood for continuous-valued variables Revised submission to IEEE TNN Aapo Hyvärinen Dept of Computer Science and HIIT University

More information

Learning in graphical models & MC methods Fall Cours 8 November 18

Learning in graphical models & MC methods Fall Cours 8 November 18 Learning in graphical models & MC methods Fall 2015 Cours 8 November 18 Enseignant: Guillaume Obozinsi Scribe: Khalife Sammy, Maryan Morel 8.1 HMM (end) As a reminder, the message propagation algorithm

More information

Undirected Graphical Models

Undirected Graphical Models Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Properties Properties 3 Generative vs. Conditional

More information

A Sequential Monte Carlo Approach for Extended Object Tracking in the Presence of Clutter

A Sequential Monte Carlo Approach for Extended Object Tracking in the Presence of Clutter A Sequential Monte Carlo Approach for Extended Object Tracing in the Presence of Clutter Niolay Petrov 1, Lyudmila Mihaylova 1, Amadou Gning 1 and Dona Angelova 2 1 Lancaster University, School of Computing

More information

Probabilistic Graphical Models

Probabilistic Graphical Models School of Computer Science Probabilistic Graphical Models Variational Inference II: Mean Field Method and Variational Principle Junming Yin Lecture 15, March 7, 2012 X 1 X 1 X 1 X 1 X 2 X 3 X 2 X 2 X 3

More information

Bayesian Inference. Chapter 9. Linear models and regression

Bayesian Inference. Chapter 9. Linear models and regression Bayesian Inference Chapter 9. Linear models and regression M. Concepcion Ausin Universidad Carlos III de Madrid Master in Business Administration and Quantitative Methods Master in Mathematical Engineering

More information

A Matrix Theoretic Derivation of the Kalman Filter

A Matrix Theoretic Derivation of the Kalman Filter A Matrix Theoretic Derivation of the Kalman Filter 4 September 2008 Abstract This paper presents a matrix-theoretic derivation of the Kalman filter that is accessible to students with a strong grounding

More information

Linear Dynamical Systems

Linear Dynamical Systems Linear Dynamical Systems Sargur N. srihari@cedar.buffalo.edu Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/cse574/index.html Two Models Described by Same Graph Latent variables Observations

More information

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite

More information

17 : Markov Chain Monte Carlo

17 : Markov Chain Monte Carlo 10-708: Probabilistic Graphical Models, Spring 2015 17 : Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Heran Lin, Bin Deng, Yun Huang 1 Review of Monte Carlo Methods 1.1 Overview Monte Carlo

More information

Bayesian Image Segmentation Using MRF s Combined with Hierarchical Prior Models

Bayesian Image Segmentation Using MRF s Combined with Hierarchical Prior Models Bayesian Image Segmentation Using MRF s Combined with Hierarchical Prior Models Kohta Aoki 1 and Hiroshi Nagahashi 2 1 Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology

More information

Gaussian Mixture Model

Gaussian Mixture Model Case Study : Document Retrieval MAP EM, Latent Dirichlet Allocation, Gibbs Sampling Machine Learning/Statistics for Big Data CSE599C/STAT59, University of Washington Emily Fox 0 Emily Fox February 5 th,

More information

CS281A/Stat241A Lecture 17

CS281A/Stat241A Lecture 17 CS281A/Stat241A Lecture 17 p. 1/4 CS281A/Stat241A Lecture 17 Factor Analysis and State Space Models Peter Bartlett CS281A/Stat241A Lecture 17 p. 2/4 Key ideas of this lecture Factor Analysis. Recall: Gaussian

More information

Patterns of Scalable Bayesian Inference Background (Session 1)

Patterns of Scalable Bayesian Inference Background (Session 1) Patterns of Scalable Bayesian Inference Background (Session 1) Jerónimo Arenas-García Universidad Carlos III de Madrid jeronimo.arenas@gmail.com June 14, 2017 1 / 15 Motivation. Bayesian Learning principles

More information

Lecture 13 : Variational Inference: Mean Field Approximation

Lecture 13 : Variational Inference: Mean Field Approximation 10-708: Probabilistic Graphical Models 10-708, Spring 2017 Lecture 13 : Variational Inference: Mean Field Approximation Lecturer: Willie Neiswanger Scribes: Xupeng Tong, Minxing Liu 1 Problem Setup 1.1

More information

Deep Variational Inference. FLARE Reading Group Presentation Wesley Tansey 9/28/2016

Deep Variational Inference. FLARE Reading Group Presentation Wesley Tansey 9/28/2016 Deep Variational Inference FLARE Reading Group Presentation Wesley Tansey 9/28/2016 What is Variational Inference? What is Variational Inference? Want to estimate some distribution, p*(x) p*(x) What is

More information

2D Image Processing (Extended) Kalman and particle filter

2D Image Processing (Extended) Kalman and particle filter 2D Image Processing (Extended) Kalman and particle filter Prof. Didier Stricker Dr. Gabriele Bleser Kaiserlautern University http://ags.cs.uni-kl.de/ DFKI Deutsches Forschungszentrum für Künstliche Intelligenz

More information

Labor-Supply Shifts and Economic Fluctuations. Technical Appendix

Labor-Supply Shifts and Economic Fluctuations. Technical Appendix Labor-Supply Shifts and Economic Fluctuations Technical Appendix Yongsung Chang Department of Economics University of Pennsylvania Frank Schorfheide Department of Economics University of Pennsylvania January

More information

Probabilistic & Unsupervised Learning

Probabilistic & Unsupervised Learning Probabilistic & Unsupervised Learning Week 2: Latent Variable Models Maneesh Sahani maneesh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc ML/CSML, Dept Computer Science University College

More information

Mobile Robot Localization

Mobile Robot Localization Mobile Robot Localization 1 The Problem of Robot Localization Given a map of the environment, how can a robot determine its pose (planar coordinates + orientation)? Two sources of uncertainty: - observations

More information

Sparse Bayesian Logistic Regression with Hierarchical Prior and Variational Inference

Sparse Bayesian Logistic Regression with Hierarchical Prior and Variational Inference Sparse Bayesian Logistic Regression with Hierarchical Prior and Variational Inference Shunsuke Horii Waseda University s.horii@aoni.waseda.jp Abstract In this paper, we present a hierarchical model which

More information

Incorporating Track Uncertainty into the OSPA Metric

Incorporating Track Uncertainty into the OSPA Metric 14th International Conference on Information Fusion Chicago, Illinois, USA, July 5-8, 211 Incorporating Trac Uncertainty into the OSPA Metric Sharad Nagappa School of EPS Heriot Watt University Edinburgh,

More information

Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2

Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2 Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, 2010 Jeffreys priors Lecturer: Michael I. Jordan Scribe: Timothy Hunter 1 Priors for the multivariate Gaussian Consider a multivariate

More information

A short introduction to INLA and R-INLA

A short introduction to INLA and R-INLA A short introduction to INLA and R-INLA Integrated Nested Laplace Approximation Thomas Opitz, BioSP, INRA Avignon Workshop: Theory and practice of INLA and SPDE November 7, 2018 2/21 Plan for this talk

More information

STA 4273H: Sta-s-cal Machine Learning

STA 4273H: Sta-s-cal Machine Learning STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our

More information

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner Fundamentals CS 281A: Statistical Learning Theory Yangqing Jia Based on tutorial slides by Lester Mackey and Ariel Kleiner August, 2011 Outline 1 Probability 2 Statistics 3 Linear Algebra 4 Optimization

More information

Outline Lecture 2 2(32)

Outline Lecture 2 2(32) Outline Lecture (3), Lecture Linear Regression and Classification it is our firm belief that an understanding of linear models is essential for understanding nonlinear ones Thomas Schön Division of Automatic

More information

10-701/15-781, Machine Learning: Homework 4

10-701/15-781, Machine Learning: Homework 4 10-701/15-781, Machine Learning: Homewor 4 Aarti Singh Carnegie Mellon University ˆ The assignment is due at 10:30 am beginning of class on Mon, Nov 15, 2010. ˆ Separate you answers into five parts, one

More information

Black-box α-divergence Minimization

Black-box α-divergence Minimization Black-box α-divergence Minimization José Miguel Hernández-Lobato, Yingzhen Li, Daniel Hernández-Lobato, Thang Bui, Richard Turner, Harvard University, University of Cambridge, Universidad Autónoma de Madrid.

More information

On Spawning and Combination of Extended/Group Targets Modeled with Random Matrices

On Spawning and Combination of Extended/Group Targets Modeled with Random Matrices IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 6, NO. 3, FEBRUARY, 03 On Spawning and Combination of Extended/Group Targets Modeled with Random Matrices Karl Granström, Student Member, IEEE, and Umut Orguner,

More information

Artificial Intelligence

Artificial Intelligence Artificial Intelligence Probabilities Marc Toussaint University of Stuttgart Winter 2018/19 Motivation: AI systems need to reason about what they know, or not know. Uncertainty may have so many sources:

More information