Bayesian Inference via Approximation of Log-likelihood for Priors in Exponential Family


Tohid Ardeshiri, Umut Orguner, and Fredrik Gustafsson
arXiv v1 [cs.LG], 5 Oct 2015

Abstract - In this paper, a Bayesian inference technique based on Taylor series approximation of the logarithm of the likelihood function is presented. The proposed approximation is devised for the case where the prior distribution belongs to the exponential family of distributions. The logarithm of the likelihood function is linearized with respect to the sufficient statistic of the prior distribution in the exponential family such that the posterior obtains the same exponential family form as the prior. Similarities between the proposed method and the extended Kalman filter for nonlinear filtering are illustrated. Furthermore, an extended target measurement update is derived for target models where the target extent is represented by a random matrix having an inverse Wishart distribution. The approximate update covers the important case where the spread of the measurements is due to the target extent as well as the measurement noise in the sensor.

Index Terms - Approximate Bayesian inference, exponential family, Bayesian graphical models, extended Kalman filter, extended target tracking, group target tracking, random matrices, inverse Wishart.

Tohid Ardeshiri and Fredrik Gustafsson are with the Department of Electrical Engineering, Linköping University, Linköping, Sweden, tohid@isy.liu.se, fredrik@isy.liu.se. The authors gratefully acknowledge funding from the Swedish Research Council (VR) for the project Scalable Kalman Filters. Umut Orguner is with the Department of Electrical and Electronics Engineering, Middle East Technical University, Ankara, Turkey, umut@metu.edu.tr.

I. INTRODUCTION

Determination of the posterior distribution of a latent variable x given the measurements (observed data) y is at the core of Bayesian inference using probabilistic models. These probabilistic models describe the relation between the random latent variables, the deterministic parameters, and the measurements. Such relations are specified by prior distributions of the latent variables p(x) and the likelihood function p(y|x), which gives a probabilistic description of the measurements given some of the latent variables. Using the probabilistic model and the measurements, the exact posterior can be expressed in functional form using Bayes' rule,

p(x|y) = \frac{p(x)\, p(y|x)}{\int p(x)\, p(y|x)\, dx}.   (1)

The exact posterior distribution can be analytical. A subclass of cases where the posterior is analytical is when the posterior belongs to the same family of distributions as the prior distribution. In such cases, the prior distribution is called a conjugate prior for the likelihood function. A well-known example where an analytical posterior is obtained using conjugate priors is when the latent variable is a priori normal-distributed and the likelihood function given the latent variable as its mean is again normal. Conjugacy is especially useful when measurements are processed sequentially, as they appear in the filtering task for stochastic dynamical systems known as hidden Markov models (HMMs), whose probabilistic graphical model is presented in Fig. 1. In such filtering problems, the posterior after the last processed measurement is in the same form as the prior distribution before the measurement update. Thus, the same inference algorithm can be used in a recursive manner.

[Figure 1: A probabilistic graphical model for a stochastic dynamical system with latent states x_1, ..., x_T and measurements y_1, ..., y_T.]
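As a concrete illustration of conjugacy and of the recursive use of Bayes' rule (our own sketch, not from the paper), the following Python snippet performs repeated conjugate measurement updates for a scalar Gaussian prior and a Gaussian likelihood whose mean is the latent variable; because the posterior stays Gaussian, the output of one update feeds the next one.

```python
def gaussian_conjugate_update(prior_mean, prior_var, y, meas_var):
    """One conjugate update: prior N(prior_mean, prior_var), likelihood N(y; x, meas_var).
    The posterior is Gaussian again, so it can serve as the prior for the next measurement."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / meas_var)
    post_mean = post_var * (prior_mean / prior_var + y / meas_var)
    return post_mean, post_var

# process three measurements recursively, as in the filtering setting of Fig. 1
mean, var = 0.0, 10.0
for y in [1.2, 0.8, 1.1]:
    mean, var = gaussian_conjugate_update(mean, var, y, meas_var=0.5)
print(mean, var)
```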
The exact posterior distribution of a latent variable cannot always be given a compact analytical expression, since the integral in the denominator of (1) may not be available in analytical form. Consequently, the number of parameters needed to express the posterior distribution will increase every time Bayes' rule (1) is used for inference. Several methods for approximate inference over probabilistic models have been proposed in the literature, such as variational Bayes [1], [2], expectation propagation [3], integrated nested Laplace approximation (INLA) [4], generalized linear models (GLMs) [5], and Monte Carlo (MC) sampling methods [6], [7]. Variational Bayes (VB) and expectation propagation (EP) are two analytical optimization-based solutions for approximate Bayesian inference [8]. In these two approaches, the Kullback-Leibler divergence [9] between the true posterior distribution and an approximate posterior is minimized. INLA is a technique to perform approximate Bayesian inference in latent Gaussian models [10] using the Laplace approximation. GLMs are an extension of ordinary linear regression where the errors belong to the exponential family. Sampling methods such as particle filters and Markov chain Monte Carlo (MCMC) methods provide a general class of numerical solutions to the approximate Bayesian inference problem. However, in this paper, our focus is on fast analytical approximations which are applicable to large-scale inference problems. The proposed solution is built on properties of the exponential family of distributions and earlier work on the extended Kalman filter [11].

In this paper, a Bayesian inference technique based on Taylor series approximation of the logarithm of the likelihood function is presented. The proposed approximation is derived for the case where the prior distribution belongs to the exponential family of distributions. The rest of this paper is organized as follows. In Section II an introduction to the exponential family of distributions is provided. In Section III a general algorithm for approximate inference in the exponential family of distributions is suggested. We show how the

proposed technique is related to the extended Kalman filter for nonlinear filtering in Section III-B. A novel measurement update for tracking extended targets using random matrices is given in Section IV. The new measurement update for tracking extended targets is evaluated in numerical simulations in Section V. Concluding remarks are given in Section VI.

II. THE EXPONENTIAL FAMILY

The exponential family of distributions [8] includes many common distributions such as the Gaussian, beta, gamma and Wishart. For $x \in \mathcal{X}$ the exponential family in its natural form can be represented by

p(x; \eta) = h(x) \exp\left( \eta \cdot T(x) - A(\eta) \right),   (2)

where $\eta$ is the natural parameter, $T(x)$ is the sufficient statistic, $A(\eta)$ is the log-partition function and $h(x)$ is the base measure. Both $\eta$ and $T(x)$ may be vector-valued. Here $a \cdot b$ denotes the inner product of $a$ and $b$.

(Footnote 1: In the rest of this paper we may use $a^T b$ in place of $a \cdot b$ when $a$ and $b$ are vector-valued. When $a$ and $b$ are matrix-valued we may equivalently write $\operatorname{Tr}(a^T b)$ instead of $\operatorname{vec}(a) \cdot \operatorname{vec}(b)$, where Tr is the trace operator and the operator vec vectorizes its argument, i.e., lays it out in a vector.)

The log-partition function is defined by the integral

A(\eta) \triangleq \log \int_{\mathcal{X}} h(x) \exp\left( \eta \cdot T(x) \right) dx.   (3)

Also, $\eta \in \Omega = \{\eta \in \mathbb{R}^m \mid A(\eta) < +\infty\}$, where $\Omega$ is the natural parameter space. Moreover, $\Omega$ is a convex set and $A$ is a convex function on $\Omega$. When a probability density function (PDF) in the exponential family is parametrized by a non-canonical parameter $\theta$, the PDF in (2) can be written as

p(x; \theta) = h(x) \exp\left( \eta(\theta) \cdot T(x) - A(\eta(\theta)) \right),   (4)

where $\theta$ is the non-canonical parameter vector of the PDF.

Example 1. Consider the normal distribution $p(x) = \mathcal{N}(x; \mu, \Sigma)$ with mean $\mu \in \mathbb{R}^d$ and covariance matrix $\Sigma$. Its PDF can be written in the exponential family form given in (2) with

h(x) = (2\pi)^{-d/2},   (5a)
T(x) = \left( x,\ \operatorname{vec}(x x^T) \right),   (5b)
\eta(\mu, \Sigma) = \left( \Sigma^{-1}\mu,\ \operatorname{vec}\!\left(-\tfrac{1}{2}\Sigma^{-1}\right) \right),   (5c)
A(\eta(\mu, \Sigma)) = \tfrac{1}{2}\mu^T \Sigma^{-1}\mu + \tfrac{1}{2}\log|\Sigma|.   (5d)

Remark 1. In the rest of this document, the vec operators will be dropped to keep the notation less cluttered. That is, we will use the simplified notation $T(x) = (x, xx^T)$ instead of (5b) and $\eta(\mu, \Sigma) = (\Sigma^{-1}\mu, -\tfrac{1}{2}\Sigma^{-1})$ instead of (5c).

A. Conjugate Priors in the Exponential Family

In this section we study the conjugate prior family for a likelihood function belonging to the exponential family. Consider $m$ independent and identically distributed (IID) measurements $Y \triangleq \{ y^j \in \mathbb{R}^d \mid 1 \le j \le m \}$ and the likelihood function $p(y|x, \lambda)$ in the exponential family, where $\lambda$ is a set of hyperparameters of the likelihood and $x$ is the latent variable. The likelihood of the $m$ IID measurements can be written as

p(Y|x, \lambda) = \Big[ \prod_{j=1}^{m} h(y^j) \Big] \exp\Big( \eta(x,\lambda) \cdot \sum_{j=1}^{m} T(y^j) - m A(\eta(x,\lambda)) \Big).   (6)

We seek a conjugate prior distribution for $x$, denoted by $p(x)$. The prior distribution $p(x)$ is a conjugate prior if the posterior distribution

p(x|Y) \propto p(Y|x, \lambda)\, p(x)   (7)

is also in the same exponential family as the prior. Let us assume that the prior is of the form

p(x) \propto \exp\left( F(x) \right)   (8)

for some function $F$. Hence,

p(x|Y) \propto p(Y|x, \lambda) \exp\left( F(x) \right)   (9)
\propto \exp\Big( \eta(x,\lambda) \cdot \sum_{j=1}^{m} T(y^j) - m A(\eta(x,\lambda)) + F(x) \Big).   (10)

For the posterior (10) to be in the same exponential family as the prior [12], $F$ needs to be of the form

F(x) = \rho_1 \cdot \eta(x, \lambda) - \rho_0 A(\eta(x, \lambda))   (11)

for some $\rho \triangleq (\rho_0, \rho_1)$, such that

p(x|Y) \propto \exp\Big( \big(\rho_1 + \sum_{j=1}^{m} T(y^j)\big) \cdot \eta(x,\lambda) - (m + \rho_0)\, A(\eta(x,\lambda)) \Big).   (12)

Hence, the conjugate prior for the likelihood (6) is parametrized by $\rho$ and is given by

p(x; \rho) = \frac{1}{Z} \exp\left( \rho_1 \cdot \eta(x, \lambda) - \rho_0 A(\eta(x, \lambda)) \right),   (13)

where

Z = \int \exp\left( \rho_1 \cdot \eta(x, \lambda) - \rho_0 A(\eta(x, \lambda)) \right) dx.   (14)

In Example 2, the conjugate prior for the normal likelihood is derived using the exponential family form for the normal distribution given in Example 1. In Example 3, a conjugate prior for a more complicated likelihood function is derived.
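As a small sanity check of Example 1 (our own sketch, assuming NumPy and SciPy are available; the function names are ours), the snippet below evaluates the Gaussian log-density through the exponential family form $\log h(x) + \eta \cdot T(x) - A(\eta)$ and compares it with a reference implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_natural_params(mu, Sigma):
    """Natural parameters and log-partition of N(x; mu, Sigma), cf. (5a)-(5d)."""
    Sigma_inv = np.linalg.inv(Sigma)
    eta1 = Sigma_inv @ mu                   # first natural parameter
    eta2 = -0.5 * Sigma_inv                 # second natural parameter (matrix-valued)
    A = 0.5 * mu @ Sigma_inv @ mu + 0.5 * np.linalg.slogdet(Sigma)[1]
    return eta1, eta2, A

def log_pdf_exp_family(x, mu, Sigma):
    """log N(x; mu, Sigma) evaluated as log h(x) + eta . T(x) - A(eta)."""
    d = x.size
    eta1, eta2, A = gaussian_natural_params(mu, Sigma)
    log_h = -0.5 * d * np.log(2 * np.pi)
    # inner product with T(x) = (x, x x^T); the matrix part is Tr(eta2 x x^T)
    return log_h + eta1 @ x + np.trace(eta2 @ np.outer(x, x)) - A

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
x = np.array([0.5, 0.5])
print(log_pdf_exp_family(x, mu, Sigma))           # exponential family form
print(multivariate_normal(mu, Sigma).logpdf(x))   # reference value (equal)
```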
Example 2. Consider the measurement $y$ with the likelihood $p(y|x) = \mathcal{N}(y; Cx, R)$. The likelihood can be written in exponential family form as in (2),

p(y|x) = (2\pi)^{-d/2} \exp\left( \eta(Cx, R) \cdot T(y) - A(\eta(Cx, R)) \right),   (15)

where $T(y) = (y, yy^T)$, $\eta(Cx, R) = (R^{-1}Cx, -\tfrac{1}{2}R^{-1})$ and

A(\eta(Cx, R)) = \tfrac{1}{2} x^T C^T R^{-1} C x + \tfrac{1}{2}\log|R|   (16a)
             = \tfrac{1}{2} \left( C^T R^{-1} C \right) \cdot x x^T + \tfrac{1}{2}\log|R|.   (16b)

Therefore, the likelihood function can be written in the form

p(y|x) \propto \exp\left( \left( C^T R^{-1} y,\ -\tfrac{1}{2} C^T R^{-1} C \right) \cdot T(x) \right),   (17)

where $T(x) = (x, xx^T)$. Hence, the conjugate prior for the likelihood function of this example should have $T$ as its sufficient statistic, i.e., the conjugate prior is normal. When the prior distribution is normal, i.e., $p(x) = \mathcal{N}(x; \mu, \Sigma)$, the posterior distribution obeys

p(x|y) \propto \exp\left( \left( C^T R^{-1} y + \Sigma^{-1}\mu,\ -\tfrac{1}{2} C^T R^{-1} C - \tfrac{1}{2}\Sigma^{-1} \right) \cdot T(x) \right),   (18)

which is the information filter form of the Kalman filter's measurement update [13].

Example 3. Consider the measurement $y$ with the likelihood

p(y|x) \propto \exp\left( -\tfrac{1}{4}(y - x)^2 + \cos(y - x) \right).   (19)

This is a multi-modal likelihood function and is illustrated in Fig. 2 for $y = 3$. The likelihood can be written in the form $p(y|x) \propto \exp\left( \lambda \cdot T(x) \right)$, where

\lambda = \left[ \tfrac{1}{2}y,\ -\tfrac{1}{4},\ \cos y,\ \sin y \right]^T \quad \text{and} \quad T(x) = \left[ x,\ x^2,\ \cos x,\ \sin x \right]^T.

Hence, a conjugate prior for the likelihood function of this example should have $T$ as its sufficient statistic, such as

p(x) \propto \exp\left( \lambda_0 \cdot T(x) \right)   (20)

for some parameter vector $\lambda_0$. Thus, the posterior for the likelihood function (19) and the prior distribution (20) obeys

p(x|y) \propto \exp\left( \left( \lambda_0 + \lambda \right) \cdot T(x) \right).   (21)

Although such posteriors can be computed analytically up to a normalization factor, the density may not be available in analytical form. The numerically normalized prior and posterior are illustrated in Fig. 2.

[Figure 2: The likelihood function (19), the prior distribution (20) and the posterior distribution (21) in Example 3 are plotted against x.]

Similar reasoning to that presented in this section can be applied to the case where the prior is fixed and the likelihood function is to be selected; the concept of conjugate likelihoods is considered in the sequel.

Table I: Some continuous exponential family distributions and their sufficient statistics T.
    Exponential distribution ......................... x
    Normal distribution with known variance sigma^2 .. x/sigma
    Normal distribution ............................... x, xx^T
    Pareto distribution with known minimum x_m ........ log x
    Weibull distribution with known shape k ........... x^k
    Chi-squared distribution .......................... log x
    Dirichlet distribution ............................ log x_1, ..., log x_n
    Laplace distribution with known mean mu ........... |x - mu|
    Inverse Gaussian distribution ..................... x, 1/x
    Scaled inverse chi-squared distribution ........... log x, 1/x
    Beta distribution ................................. log x, log(1 - x)
    Lognormal distribution ............................ log x, (log x)^2
    Gamma distribution ................................ log x, x
    Inverse gamma distribution ........................ log x, 1/x
    Gaussian-Gamma distribution ....................... log tau, tau, tau x, tau x^2
    Wishart distribution .............................. log|X|, X
    Inverse Wishart distribution ...................... log|X|, X^{-1}

B. Conjugate Likelihoods in the Exponential Family

First, we define conjugate likelihood functions.

Definition 1. A likelihood function $p(y|x)$ is called a conjugate likelihood for the prior distribution $p(x)$ in a family of distributions if the posterior distribution $p(x|y)$ belongs to the same family of distributions as the prior distribution.

Now, consider the prior distribution $p(x; \rho)$ in exponential family form on the latent variable $x \in \mathcal{X}$, where $\rho$ is a set of hyper-parameters for the prior,

p(x; \rho) = h(x) \exp\left( \eta(\rho) \cdot T(x) - A(\eta(\rho)) \right).   (22)

We will follow the treatment given for the characterization of conjugate priors in Section II-A, as well as [12], for the conjugate likelihood functions. Assume now that we have observed $m$ IID measurements $Y \triangleq \{ y^j \in \mathbb{R}^d \mid 1 \le j \le m \}$. Let the likelihood of the measurements be written as

p(Y|x) \propto \exp\left( L(x, Y) \right)   (23)

for some function $L$. We seek a conjugate likelihood function in the exponential family. Hence, the posterior distribution has to be in the exponential family,

p(x|Y) \propto h(x) \exp\left( \eta(\rho) \cdot T(x) - A(\eta(\rho)) + L(x, Y) \right).   (24)
For the posterior (24) to be in the same exponential family, $L$ needs to be of the form

L(x, y) \overset{+}{=} \lambda(y) \cdot T(x),   (25)

where $\overset{+}{=}$ means equality up to an additive constant with respect to the latent variable $x$, such that

p(x|Y) \propto h(x) \exp\left( \left( \eta(\rho) + \lambda(y) \right) \cdot T(x) \right).   (26)

Hence, the conjugate likelihood for the prior distribution family is parametrized by $\lambda(y)$ and is given by

p(y|x) \propto \exp\left( \lambda(y) \cdot T(x) \right).   (27)

The property expressed in (27) is the necessary condition which should hold for a conjugate likelihood for a given prior distribution family. Another condition which should hold is that the log-partition function of the posterior (26) should obey

A = \log \int_{\mathcal{X}} h(x) \exp\left( \left( \eta(\rho) + \lambda(y) \right) \cdot T(x) \right) dx < \infty,   (28)

such that the natural parameter of the posterior belongs to the natural parameter space $\Omega$. These properties of the conjugate likelihood are summarized in Theorem 1.

Theorem 1. The likelihood function $p(y|x)$ is a conjugate likelihood for the prior distribution in an exponential family $p(x) = h(x)\exp\left( \eta \cdot T(x) - A(\eta) \right)$ with $x \in \mathcal{X}$ if and only if
1) $p(y|x) \propto \exp\left( \lambda \cdot T(x) \right)$ for some $\lambda$, and
2) the likelihood function $p(y|x)$ is integrable with respect to $y$, and
3) $\log \int_{\mathcal{X}} h(x) \exp\left( (\eta + \lambda) \cdot T(x) \right) dx < \infty$ for $\eta \in \Omega$.

In Examples 4 and 5, the conjugate likelihood functions for two prior distributions are derived using the exponential family form.

Example 4. Consider the normal distribution $p(x) = \mathcal{N}(x; \mu, \Sigma)$ whose PDF is written in exponential family form in Example 1. The conjugate likelihood function has to be of the form

p(y|x) \propto \exp\left( \lambda_1(y) \cdot x + \lambda_2(y) \cdot \operatorname{vec}(x x^T) \right)   (29)

for some permissible $\lambda \triangleq [\lambda_1, \lambda_2]$, which includes $p(y|x) = \mathcal{N}(y; Cx, R)$ for a matrix $C$ with appropriate dimensions and symmetric $R \succ 0$.

Example 5. Consider the prior distribution

p(x) \propto \exp\left( \lambda \cdot T(x) \right),   (30)

where $T(x) = [x,\ x^2,\ \cos x,\ \sin x]^T$. The family of conjugate likelihood density functions $p(y|x)$ is parametrized by $\psi \in \mathbb{R}^4$ such that
1) $\log p(y|x) \overset{+}{=} \psi \cdot T(x)$,
2) $\int p(y|x)\, dy = 1$, i.e., the likelihood is integrable with respect to $y$,
3) $\int \exp\left( (\lambda + \psi) \cdot T(x) \right) dx < \infty$, i.e., the posterior is integrable with respect to $x$.

A member of this family is

p(y|x) \propto \exp\left( -\alpha (y - x)^2 + \cos(\beta y - x) + \gamma \right)   (31)

for $[\alpha, \beta, \gamma]^T \in \mathbb{R}^3$. In Table I the sufficient statistics for some continuous members of the exponential family are given. Some members of the continuous exponential family have conjugate likelihoods, which are summarized in Table II.

III. MEASUREMENT UPDATE VIA APPROXIMATION OF THE LOG-LIKELIHOOD

In this section a method is proposed for the analytical approximation of complex likelihood functions based on linearization of the log-likelihood with respect to the sufficient statistic function $T$. We will use the property of conjugate likelihood functions, i.e., linearity of the log-likelihood function with respect to the sufficient statistic of the prior distribution, to come up with an approximation of the likelihood function for which an analytical posterior distribution exists. In the proposed method, first the log-likelihood function is derived in analytical form. Second, the likelihood function is approximated by a dot product of a statistic which depends on the measurement and the sufficient statistic of the prior distribution. For example, consider a likelihood function

p(Y|x) \propto \exp\left( L(x, Y) \right),   (32)

where the log-likelihood function is given by $L$, and the prior distribution is given by

p(x) = h(x) \exp\left( \eta \cdot T(x) - A(\eta) \right),   (33)

where $\eta \in \Omega$. If we approximate $L$ such that

L(x, Y) \approx \widehat{L}(x, Y) \overset{+}{=} \lambda(Y) \cdot T(x)   (34)

for some $\lambda$ such that, for any measurement $Y$ belonging to the support of the likelihood function,

\eta + \lambda(Y) \in \Omega   (35)

and

\int \exp\left( \widehat{L}(x, Y) \right) dY < \infty,   (36)

then the approximate likelihood will be a conjugate likelihood for the prior distribution with sufficient statistic $T(x)$.

When the prior distribution is normal, the proposed approximation can be obtained using the well-known Taylor series approximation at the global maximum of the log-likelihood function. This is due to the fact that $T(x) = (x, xx^T)$ for the normal prior, and a second order Taylor series approximation of the log-likelihood approximates the log-likelihood with a function linear in $T(x)$. This mathematical convenience is used in the well-known approximate Bayesian inference technique INLA [4]. Similarly, when the prior distribution is the exponential distribution, a first order Taylor series approximation can be used to approximate the log-likelihood with a function linear in $T(x) = x$.
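To illustrate the remark about normal priors, here is a minimal sketch under assumptions of our own (a scalar latent variable, a made-up nonlinear measurement model $y = \exp(x) + \text{noise}$, and numerical derivatives): a second order Taylor expansion of the log-likelihood at its maximum is linear in $T(x) = (x, x^2)$, so adding the resulting statistic to the natural parameter of a Gaussian prior gives a Gaussian approximate posterior.

```python
import numpy as np

# Toy model (not from the paper): y = exp(x) + noise, noise ~ N(0, r),
# Gaussian prior N(m0, p0) on the scalar latent variable x.
y, r = 2.0, 0.1
m0, p0 = 0.5, 1.0

def log_lik(x):
    return -0.5 * (y - np.exp(x)) ** 2 / r

# Locate the likelihood maximum: coarse grid search plus Newton refinement
# using numerically evaluated derivatives.
xs = np.linspace(-2.0, 2.0, 2001)
x_hat = xs[np.argmax(log_lik(xs))]
eps = 1e-5
for _ in range(20):
    g = (log_lik(x_hat + eps) - log_lik(x_hat - eps)) / (2 * eps)
    H = (log_lik(x_hat + eps) - 2 * log_lik(x_hat) + log_lik(x_hat - eps)) / eps ** 2
    x_hat -= g / H
g = (log_lik(x_hat + eps) - log_lik(x_hat - eps)) / (2 * eps)
H = (log_lik(x_hat + eps) - 2 * log_lik(x_hat) + log_lik(x_hat - eps)) / eps ** 2

# Second order expansion about x_hat is linear in T(x) = (x, x^2):
# L(x) ~ const + (g - H*x_hat) x + (H/2) x^2, with g ~ 0 at the maximum.
lam = np.array([g - H * x_hat, 0.5 * H])

# Add the statistic to the natural parameter of the Gaussian prior.
eta_post = np.array([m0 / p0, -0.5 / p0]) + lam
p_post = -0.5 / eta_post[1]     # approximate posterior variance
m_post = p_post * eta_post[0]   # approximate posterior mean
print(m_post, p_post)
```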
Taylor series expansions of functions will be used in the proposed approximations in the following. Hence, we establish the notation used in this paper for the Taylor series expansion of arbitrary functions before we proceed.

A. Taylor series expansion

The Taylor series expansion of a real-valued function $f(x)$ that is infinitely differentiable at a real vector $\bar{x}$ is given by the series

f(x) = f(\bar{x}) + \frac{1}{1!} \nabla_x f(\bar{x}) \cdot (x - \bar{x}) + \frac{1}{2!} \nabla_x^2 f(\bar{x}) \cdot (x - \bar{x})(x - \bar{x})^T + \cdots,   (37)

where $\nabla_x f(\bar{x})$ is the gradient vector composed of the partial derivatives of $f(x)$ with respect to the elements of $x$, evaluated at $\bar{x}$. Similarly, $\nabla_x^2 f(\bar{x})$ is the Hessian matrix of $f(x)$ containing the second order partial derivatives, evaluated at $\bar{x}$. When $f(x)$ is a vector-valued function, i.e., $f(x) \in \mathbb{R}^d$ with $i$th element denoted by $f_i(x)$, first the Taylor series expansion is derived for each element separately. Then, the expansion of $f(x)$ at $\bar{x}$ is constructed by rearranging the terms in a matrix,

f(x) = f(\bar{x}) + \frac{1}{1!} \nabla_x f(\bar{x})^T (x - \bar{x}) + \frac{1}{2!} \sum_{i=1}^{d} e_i\, (x - \bar{x})^T \nabla_x^2 f_i(\bar{x}) (x - \bar{x}) + \cdots.   (38)

In (38), $\nabla_x f(\bar{x})$ is the Jacobian matrix of $f(x)$ evaluated at $\bar{x}$ and $e_i$ is a unit vector with its $i$th element equal to 1.

Table II: Some prior distributions and their conjugate likelihood functions. The argument of each prior PDF (left) is the latent random variable, the middle entry is the sufficient statistic T of the prior, and the right-hand entry is a conjugate likelihood p(y|x), where y denotes the measurement. The logarithm of each conjugate likelihood is linear in the sufficient statistic to its left.
    N(x; mu, Sigma),  T = (x, xx^T):            N(y; Cx, R) propto exp(-1/2 Tr[R^{-1}(y - Cx)(y - Cx)^T])
    N(x; mu, Sigma),  T = (x, xx^T):            logN(y; x, sigma^2) propto exp(-(log y - x)^2 / (2 sigma^2))
    Gamma(x; alpha, beta),  T = (log x, x):     Exp(y; x) = x exp(-x y)
    Gamma(x; alpha, beta),  T = (log x, x):     IGamma(y; alpha, x) propto x^alpha exp(-x / y)
    Gamma(x; alpha, beta),  T = (log x, x):     Gamma(y; alpha, x) propto x^alpha exp(-x y)
    Gamma(x; alpha, beta),  T = (log x, x):     N(y; mu, x^{-1}) propto x^{1/2} exp(-x (y - mu)^2 / 2)
    IGamma(x; alpha, beta), T = (log x, 1/x):   N(y; mu, x) propto x^{-1/2} exp(-(y - mu)^2 / (2 x))
    IGamma(x; alpha, beta), T = (log x, 1/x):   Weibull(y; x, k) propto x^{-1} y^{k-1} exp(-y^k / x)
    GaussianGamma(x, tau; mu, lambda, alpha, beta), T = (log tau, tau, tau x, tau x^2):   N(y; x, tau^{-1}) propto tau^{1/2} exp(-tau (y - x)^2 / 2)
    W(X; n, V),   T = (log|X|, X):              N(y; mu, X^{-1}) propto |X|^{1/2} exp(-1/2 tr[X (y - mu)(y - mu)^T])
    IW(X; nu, Psi), T = (log|X|, X^{-1}):       N(y; mu, X) propto |X|^{-1/2} exp(-1/2 tr[X^{-1}(y - mu)(y - mu)^T])

B. The extended Kalman filter

A well-known problem where a special case of the proposed inference technique is used is the filtering problem for nonlinear state-space models in the presence of additive Gaussian noise, which is described here. Consider the discrete-time stochastic nonlinear dynamical system

y_k \mid x_k \sim \mathcal{N}\left( y_k;\ c(x_k),\ R \right),   (39a)
x_{k+1} \mid x_k \sim \mathcal{N}\left( x_{k+1};\ f(x_k),\ Q \right),   (39b)

where $x_k$ and $y_k$ are the latent variable and the measurement at time index $k$, and $c(\cdot)$ and $f(\cdot)$ are nonlinear functions. A common solution to the inference problem is the extended Kalman filter (EKF) [11]. In the following example, we show that the measurement update of the EKF is a special case of the proposed algorithm.

Example 6. Consider the latent variable $x$ with a priori PDF $p(x) = \mathcal{N}(x; \mu, \Sigma)$ and the measurement $y$ with the likelihood function $p(y|x) = \mathcal{N}(y; c(x), R)$. The likelihood function can be written in exponential family form as in (2),

p(y|x) = (2\pi)^{-d/2} \exp\left( \eta(x, R) \cdot T(y) - A(\eta(x, R)) \right),   (40)

where $T(y) = (y, yy^T)$, $\eta(x, R) = (R^{-1}c(x), -\tfrac{1}{2}R^{-1})$ and

A(\eta(x, R)) = \tfrac{1}{2}\operatorname{Tr}\left( R^{-1} c(x) c(x)^T \right) + \tfrac{1}{2}\log|R|.   (41)

Thus,

p(y|x) \propto \exp\left( R^{-1}c(x) \cdot y - \tfrac{1}{2} c(x)^T R^{-1} c(x) \right)   (42)

and the log-likelihood function can be written as

L(x) \overset{+}{=} y^T R^{-1} c(x) - \tfrac{1}{2} c(x)^T R^{-1} c(x),   (43)

where $\overset{+}{=}$ means equality up to an additive constant with respect to the latent variable $x$. A second order Taylor series approximation of any function is linear in the sufficient statistic $T(x) = (x, xx^T)$. However, such an approximation does not guarantee the integrability of the approximate likelihood and the corresponding approximate posterior, due to the dependence of the second element of the posterior's natural parameter on the measurement $y$. A common solution to the log-likelihood linearization problem has been the first order Taylor series approximation of $c(x)$ about $x = \mu$, instead of a second order Taylor series approximation of $L(x)$, i.e.,

c(x) \approx c(\mu) + \dot{c}(\mu)^T (x - \mu),   (44)

where

\dot{c}(\mu) \triangleq \nabla_x c(x)\big|_{x=\mu}.   (45)

With this approximation the approximate log-likelihood becomes

L(x) \overset{+}{\approx} \left( \dot{c}(\mu) R^{-1}\left( y - c(\mu) + \dot{c}(\mu)^T \mu \right) \right) \cdot x - \tfrac{1}{2}\left( \dot{c}(\mu) R^{-1} \dot{c}(\mu)^T \right) \cdot x x^T.   (46)

Hence, a conjugate likelihood function for the prior distribution is obtained by approximating the log-likelihood function by a function that is linear in the sufficient statistic of the prior distribution.
Also note that the posterior distribution obeys

p(x|y) \propto \exp\left( \phi \cdot T(x) \right),   (47)

where

\phi = \left( \dot{c}(\mu)R^{-1}\left( y - c(\mu) + \dot{c}(\mu)^T \mu \right) + \Sigma^{-1}\mu,\ \ -\tfrac{1}{2}\dot{c}(\mu)R^{-1}\dot{c}(\mu)^T - \tfrac{1}{2}\Sigma^{-1} \right),   (48)

which is the extended information filter form of the extended Kalman filter [14]. An advantage of this solution over a Taylor series approximation of the log-likelihood is that the natural parameter of the posterior will lie in the natural parameter space by construction, i.e., $-\tfrac{1}{2}\dot{c}(\mu)R^{-1}\dot{c}(\mu)^T - \tfrac{1}{2}\Sigma^{-1} \prec 0$.
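For concreteness, the update of Example 6 can be written in natural-parameter (information) form, cf. (47)-(48), as in the NumPy sketch below; the range-measurement model in the last lines is our own toy example and the function name is hypothetical.

```python
import numpy as np

def ekf_information_update(mu, Sigma, y, c, c_jac, R):
    """EKF measurement update as a natural-parameter update (Example 6):
    linearize c(x) about the prior mean and add the conjugate-likelihood
    statistic to the natural parameter of the Gaussian prior."""
    C = c_jac(mu)                                   # Jacobian of c at mu (d x n)
    Sigma_inv = np.linalg.inv(Sigma)
    eta1, eta2 = Sigma_inv @ mu, -0.5 * Sigma_inv   # prior natural parameter
    # statistic of the linearized (conjugate) likelihood, cf. (46)
    lam1 = C.T @ np.linalg.solve(R, y - c(mu) + C @ mu)
    lam2 = -0.5 * C.T @ np.linalg.solve(R, C)
    # posterior natural parameter and its moment form
    phi1, phi2 = eta1 + lam1, eta2 + lam2
    Sigma_post = np.linalg.inv(-2.0 * phi2)
    mu_post = Sigma_post @ phi1
    return mu_post, Sigma_post

# toy usage: scalar range measurement of a 2-D position
c = lambda x: np.array([np.hypot(x[0], x[1])])
c_jac = lambda x: np.array([[x[0], x[1]]]) / np.hypot(x[0], x[1])
mu, Sigma, R = np.array([1.0, 1.0]), np.eye(2), np.array([[0.01]])
print(ekf_information_update(mu, Sigma, np.array([1.5]), c, c_jac, R))
```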

C. A general linearization guideline

The approximation of a general scalar function $L$ by another function $\widehat{L}$ that is linear in a general vector-valued function $t$, i.e., $\widehat{L} = \lambda \cdot t$, can be expressed as an optimization problem where various cost functions are suggested and minimized. The study of such procedures is considered outside the scope of this paper and a subject of future research. However, a few tricks are introduced here which can be used for the purpose of linearization.

Lemma 1. Let $L(x)$ be an arbitrary scalar-valued continuous and differentiable function and $t(x)$ be a vector-valued invertible function with independent continuous elements such that $x = t^{-1}(z)$ and $L(t^{-1}(z))$ is differentiable with respect to $z$. Then, $L(x)$ can be linearized with respect to $t(x)$ about $t(\bar{x})$ such that

L(x) \approx L(\bar{x}) + \Phi \cdot \left( t(x) - t(\bar{x}) \right),   (49)

where $\Phi = \nabla_z L(t^{-1}(z))\big|_{z = t(\bar{x})}$.

Proof. Let $z = t(x)$ and $Q(z) \triangleq L(t^{-1}(z))$. Then, $x = t^{-1}(z)$ and $Q(z) = L(x)$ for $z = t(x)$. Using the first-order Taylor series approximation of the scalar function $Q(z)$ about $\hat{z} \triangleq t(\bar{x})$ we obtain

Q(z) \approx Q(\hat{z}) + \nabla_z Q(z)\big|_{z=\hat{z}} \cdot (z - \hat{z}).   (50)

Hence, the proof follows.

Remark 2. The variable $z$ and the function $t(x)$ can be scalar-valued, vector-valued or even matrix-valued. Further, $\nabla_z Q(z)$ can be computed using the chain rule and the fact that $Q(z) = L(t^{-1}(z))$.

The elements of the sufficient statistic of most continuous members of the exponential family are dependent, such as $x$ and $xx^T$ for the normal distribution. Furthermore, some of them are not invertible, such as $\log|X|$ for the Wishart distribution. However, the linearization method given in Lemma 1 can be applied to individual summand terms of the log-likelihood function and individual elements of the sufficient statistic. Due to the freedom in choosing the element of the sufficient statistic with respect to which the log-likelihood is linearized, there is no unique solution to the linearization problem. In Example 7 we use Lemma 1 to linearize a log-likelihood function in various ways and describe the properties of the resulting approximations.

Example 7. Consider the scalar latent variable $x$ with a priori PDF $p(x) = \mathcal{IG}(x; \alpha, \beta)$ (inverse gamma) and the measurement $y$ with the likelihood function $p(y|x) = \mathcal{N}(y; 0, x + \sigma^2)$. Since the exact posterior density is not analytically tractable, an approximation is needed. The prior distribution can be written in exponential family form where the natural parameter is $\eta = (-\alpha - 1, -\beta)$ and the sufficient statistic is $T(x) = (\log x, x^{-1})$. The logarithm of the prior distribution is given by

\log p(x) \overset{+}{=} -(\alpha + 1)\log x - \beta x^{-1}.   (51)

In order to compute the posterior using the proposed linearization approach, first the log-likelihood function is computed,

L(x, y) \overset{+}{=} -\tfrac{1}{2}\log(x + \sigma^2) - \frac{y^2}{2(x + \sigma^2)}.   (52)

The linearization of the log-likelihood with respect to the sufficient statistic of the prior distribution can be carried out in several ways, four of which are listed in the following.

Solution 1: We can linearize the right-hand side of (52) with respect to $\log x$ using Lemma 1 by letting $z \triangleq \log x$ and subsequently linearizing with respect to $z$ about $\hat{z} \triangleq \log\bar{x}$, which gives

L(x, y) \overset{+}{\approx} \left( -\frac{\bar{x}}{2(\bar{x} + \sigma^2)} + \frac{y^2\bar{x}}{2(\bar{x} + \sigma^2)^2} \right)\log x,   (53)

where $\overset{+}{\approx}$ means approximation up to an additive constant with respect to all sufficient statistics, i.e., the elements of $T(x)$. The approximate likelihood can be found as

\hat{p}_1(y|x) \propto x^{-\frac{\bar{x}}{2(\bar{x}+\sigma^2)} + \frac{y^2\bar{x}}{2(\bar{x}+\sigma^2)^2}},   (54)

which is not integrable with respect to $y$, since $x > 0$ and $\hat{p}_1(y|x)$ goes to infinity as $y$ goes to infinity.

Solution 2: We can linearize the right-hand side of (52) with respect to $x^{-1}$ by letting $z \triangleq x^{-1}$ and subsequently linearizing with respect to $z$ about $\hat{z} \triangleq \bar{x}^{-1}$ using Lemma 1, which gives

L(x, y) \overset{+}{\approx} \left( \frac{\bar{x}^2}{2(\bar{x} + \sigma^2)} - \frac{y^2\bar{x}^2}{2(\bar{x} + \sigma^2)^2} \right) x^{-1}.   (55)

The approximate likelihood can be found as

\hat{p}_2(y|x) \propto \exp\left( -\frac{\bar{x}^2\left( y^2 - \bar{x} - \sigma^2 \right)}{2x(\bar{x} + \sigma^2)^2} \right),   (56)

which is integrable with respect to $y$. However, the approximate posterior obtains the form

p(x|y) \propto \exp\left( -\frac{\bar{x}^2\left( y^2 - \bar{x} - \sigma^2 \right)}{2x(\bar{x} + \sigma^2)^2} - (\alpha + 1)\log x - \beta x^{-1} \right),   (57)

which is not integrable with respect to $x$ for small $y$ such that $\frac{\bar{x}^2\left( y^2 - \bar{x} - \sigma^2 \right)}{2(\bar{x} + \sigma^2)^2} + \beta < 0$.
Solution 3: We can linearize the first term on the right-hand side of (52) with respect to $\log x$ and the second term with respect to $x^{-1}$, about $\log\bar{x}$ and $\bar{x}^{-1}$ respectively, using Lemma 1, which gives

L(x, y) \overset{+}{\approx} -\frac{\bar{x}}{2(\bar{x} + \sigma^2)}\log x - \frac{y^2\bar{x}^2}{2(\bar{x} + \sigma^2)^2} x^{-1}.   (58)

The approximate likelihood can be found as

\hat{p}_3(y|x) \propto x^{-\frac{\bar{x}}{2(\bar{x}+\sigma^2)}}\exp\left( -\frac{y^2\bar{x}^2}{2(\bar{x} + \sigma^2)^2} x^{-1} \right),   (59)

which is integrable with respect to $y$. The approximate posterior remains integrable with respect to $x$ for $x > 0$.

Solution 4: We can first expand the log-likelihood as

L(x, y) \overset{+}{=} -\tfrac{1}{2}\log x - \tfrac{1}{2}\log\left( 1 + \frac{\sigma^2}{x} \right) - \frac{y^2}{2(x + \sigma^2)}   (60)

and then linearize the last two terms on the right-hand side of (60) with respect to $z \triangleq x^{-1}$ about the nominal point $\hat{z} = \bar{x}^{-1}$ using Lemma 1, which gives

L(x, y) \overset{+}{\approx} -\tfrac{1}{2}\log x - \left( \frac{\sigma^2\bar{x}}{2(\sigma^2 + \bar{x})} + \frac{y^2\bar{x}^2}{2(\sigma^2 + \bar{x})^2} \right) x^{-1}.   (61)

The approximate likelihood can be found as

\hat{p}_4(y|x) \propto x^{-\frac{1}{2}}\exp\left( -\frac{1}{x}\left( \frac{\sigma^2\bar{x}}{2(\sigma^2 + \bar{x})} + \frac{y^2\bar{x}^2}{2(\sigma^2 + \bar{x})^2} \right) \right),   (62)

which is integrable with respect to $y$. The posterior remains integrable with respect to $x$ for $x > 0$. Integrability of the approximate posterior is not guaranteed for the first two solutions, while the last two solutions result in an inverse gamma approximate posterior distribution.
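Solution 4 can be sketched in a few lines of Python (our own code, with hypothetical function names): the linearized statistic from (61) is added to the natural parameter of the inverse gamma prior, yielding the inverse gamma approximate posterior mentioned above.

```python
import numpy as np

def ig_posterior_solution4(alpha, beta, y, sigma2, x_bar):
    """Approximate update of Example 7, Solution 4: the log-likelihood of
    y ~ N(0, x + sigma2) is linearized w.r.t. (log x, 1/x) about x_bar, so the
    inverse gamma prior IG(alpha, beta) maps to an inverse gamma posterior."""
    s = x_bar + sigma2
    # coefficients multiplying log x and 1/x in the linearized log-likelihood (61)
    coef_logx = -0.5
    coef_invx = -(sigma2 * x_bar) / (2.0 * s) - (y**2 * x_bar**2) / (2.0 * s**2)
    # posterior parameters: natural parameter (-alpha-1, -beta) plus the statistic
    alpha_post = alpha - coef_logx          # = alpha + 1/2
    beta_post = beta - coef_invx            # > beta, since coef_invx < 0
    return alpha_post, beta_post

# usage: linearize about the prior mean beta / (alpha - 1)
alpha, beta, sigma2, y = 3.0, 4.0, 0.5, 1.7
x_bar = beta / (alpha - 1.0)
print(ig_posterior_solution4(alpha, beta, y, sigma2, x_bar))
```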

The parameters of the posterior depend on the linearization point $\bar{x}$, among other factors. A candidate point for linearization is the prior mean $\bar{x} = \beta/(\alpha - 1)$.

As pointed out in Theorem 1 and illustrated in Example 7, linearity of the log-likelihood with respect to the sufficient statistic of the prior is a necessary condition for conjugacy. Hence, even after succeeding in linearizing the log-likelihood with respect to $T(x)$, the integrability of the posterior distribution should be examined. Consequently, the linearization should be done skillfully such that the approximation error is minimized and the approximate posterior remains integrable. In the rest of this paper we focus on exemplifying the application of the linearization technique in a specific Bayesian inference problem in Section IV, with related numerical simulations in Section V. In this specific example, two of the sufficient statistics are not invertible as required in Lemma 1. Hence, a remedy is suggested and used in the example.

IV. EXTENDED TARGET TRACKING

A. The problem formulation

In the Bayesian extended target tracking (ETT) framework [15] the state of an extended target consists of a kinematic state and a symmetric positive definite matrix representing the extent state. Suppose that the extended target's kinematic state and the target's extent state at time $k$ are denoted by $x_k$ and $X_k$, respectively. Further, we assume that the measurements $Y_k \triangleq \{ y_k^j \in \mathbb{R}^d \}_{j=1}^{m_k}$ generated by the extended target are independent and identically distributed conditioned on $x_k$ and $X_k$ as $y_k^j \sim \mathcal{N}(y_k^j; Hx_k, sX_k + R)$; see Fig. 3. The following measurement likelihood, which is used in this paper, was first introduced in [16]:

p(Y_k \mid x_k, X_k) = \prod_{j=1}^{m_k} \mathcal{N}\left( y_k^j;\ Hx_k,\ sX_k + R \right),   (63)

where $m_k$ is the number of measurements at time $k$, $s$ is a real positive scalar constant, and $R$ is the measurement noise covariance. We assume the following prior form on the kinematic state and the target extent state,

p(x_k, X_k \mid Y_{0:k-1}) = \mathcal{N}\left( x_k;\ \hat{x}_{k|k-1},\ P_{k|k-1} \right)\mathcal{IW}\left( X_k;\ \nu_{k|k-1},\ V_{k|k-1} \right),   (64)

where $x_k \in \mathbb{R}^n$, $X_k \in \mathbb{R}^{d \times d}_{\succ 0}$, and the double time index $a|b$ is read "at time $a$ using measurements up to and including time $b$". The inverse Wishart distribution $\mathcal{IW}(X; \nu, V)$ we use in this work is given in the following form:

\mathcal{IW}(X; \nu, V) = \frac{ |V|^{\frac{\nu - d - 1}{2}} }{ 2^{\frac{(\nu - d - 1)d}{2}}\, \Gamma_d\!\left( \frac{\nu - d - 1}{2} \right) }\, |X|^{-\frac{\nu}{2}} \exp\left( -\tfrac{1}{2}\operatorname{tr}\left( V X^{-1} \right) \right),   (65)

where $X$ is a symmetric positive definite matrix of dimension $d \times d$, $\nu > 2d$ is the scalar degrees of freedom, and $V$ is a symmetric positive definite matrix of dimension $d \times d$ called the scale matrix. This form of the inverse Wishart distribution is the same as the one used in the well-known reference [17].

No analytical solution for the posterior exists. The reason for the lack of an analytical solution for the posterior corresponding to the likelihood (63) and the prior (64) is both the covariance addition in the Gaussian distributions on the right hand side of (63) and the intertwined nature of the kinematic state and the extent state in the likelihood. In order to get rid of the covariance addition, latent variables are defined in [18], and the problem of intertwined kinematic and extent states is solved using a variational approximation. In [16] the authors design an unbiased estimator. Both of these measurement updates can be used in a recursive estimation framework, which requires the posterior to be in the same form as the prior, as in

p(x_k, X_k \mid Y_{0:k}) \approx \mathcal{N}\left( x_k;\ \hat{x}_{k|k},\ P_{k|k} \right)\mathcal{IW}\left( X_k;\ \nu_{k|k},\ V_{k|k} \right).   (66)

[Figure 3: A probabilistic graphical model for extended targets with kinematic states x_1, ..., x_T, extent states X_1, ..., X_T and measurement sets Y_1, ..., Y_T with m_k > 0.]
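To make the measurement model (63) concrete, the sketch below (our own code, assuming NumPy; the state values are arbitrary) draws one scan of detections and forms the measurement centroid and the spread matrix used in the updates that follow.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ett_measurements(x, X, H, s, R, m):
    """Draw m detections from the single-scan model (63):
    y_j ~ N(H x, s X + R), i.i.d. given kinematic state x and extent X."""
    cov = s * X + R
    return rng.multivariate_normal(H @ x, cov, size=m)

# toy state: 2-D position + velocity, elliptical extent
x = np.array([0.0, 0.0, 10.0, 5.0])
X = np.array([[300.0, 50.0], [50.0, 120.0]])
H = np.hstack([np.eye(2), np.zeros((2, 2))])
Y = simulate_ett_measurements(x, X, H, s=0.25, R=np.eye(2), m=20)

y_bar = Y.mean(axis=0)                          # centroid, cf. (72)
# spread about the (here: true) predicted measurement H x, cf. (75)
spread = (Y - H @ x).T @ (Y - H @ x) / len(Y)
print(y_bar, spread, sep="\n")
```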
B. Solution proposed by Feldmann et al. [16]

In [16] the authors cleverly design a measurement update, based on unbiasedness properties for the measurement model (63), to calculate $\hat{x}_{k|k}$, $P_{k|k}$, $\nu_{k|k}$ and $V_{k|k}$ in the posterior (66). The kinematic state $x_k$ is updated as follows:

\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k\left( \bar{y}_k - H\hat{x}_{k|k-1} \right),   (67)
P_{k|k} = P_{k|k-1} - K_k S_k K_k^T,   (68)

where

K_k = P_{k|k-1} H^T S_k^{-1},   (69)
S_k = H P_{k|k-1} H^T + \frac{1}{m_k}\left( s\hat{X}_{k|k-1} + R \right),   (70)
\hat{X}_{k|k-1} = \frac{V_{k|k-1}}{\nu_{k|k-1} - 2d - 2},   (71)
\bar{y}_k = \frac{1}{m_k}\sum_{j=1}^{m_k} y_k^j   (72)

is the mean of the measurements at time $k$. The extent state update given in [16] is of the form

\nu_{k|k} = \nu_{k|k-1} + m_k,   (73)
V_{k|k} = V_{k|k-1} + M_k^{\mathrm{FFK}},   (74)

where $M_k^{\mathrm{FFK}}$ is a given positive definite update matrix and the superscript FFK, used to ease the presentation, consists of the initials of the authors of [16]. Before describing the construction of the matrix $M_k^{\mathrm{FFK}}$ of [16], let us define the measurement spread $\bar{Y}_k$ about the predicted measurement $H\hat{x}_{k|k-1}$ as follows:

\bar{Y}_k \triangleq \frac{1}{m_k}\sum_{j=1}^{m_k}\left( y_k^j - H\hat{x}_{k|k-1} \right)\left( y_k^j - H\hat{x}_{k|k-1} \right)^T.   (75)

Feldmann et al. first write the measurement spread $\bar{Y}_k$ as

\bar{Y}_k = \bar{Y}_k^1 + \bar{Y}_k^2.   (76)

The summands $\bar{Y}_k^1$ and $\bar{Y}_k^2$ are defined as

\bar{Y}_k^1 \triangleq \left( \bar{y}_k - H\hat{x}_{k|k-1} \right)\left( \bar{y}_k - H\hat{x}_{k|k-1} \right)^T,   (77)
\bar{Y}_k^2 \triangleq \frac{1}{m_k}\sum_{j=1}^{m_k}\left( y_k^j - \bar{y}_k \right)\left( y_k^j - \bar{y}_k \right)^T.   (78)

When we take the conditional expected values of $\bar{Y}_k^1$ and $\bar{Y}_k^2$ we obtain

\hat{Y}_k^1 \triangleq \mathrm{E}\left[ \bar{Y}_k^1 \mid X_k = \hat{X}_{k|k-1} \right] = H P_{k|k-1} H^T + \frac{s\hat{X}_{k|k-1} + R}{m_k},   (79)
\hat{Y}_k^2 \triangleq \mathrm{E}\left[ \bar{Y}_k^2 \mid X_k = \hat{X}_{k|k-1} \right] = \frac{m_k - 1}{m_k}\left( s\hat{X}_{k|k-1} + R \right).   (80)

The matrix $M_k^{\mathrm{FFK}}$ is then proposed to be

M_k^{\mathrm{FFK}} = \hat{X}_{k|k-1}^{1/2}\left( \hat{Y}_k^1 \right)^{-1/2}\bar{Y}_k^1\left( \hat{Y}_k^1 \right)^{-1/2}\hat{X}_{k|k-1}^{1/2} + (m_k - 1)\,\hat{X}_{k|k-1}^{1/2}\left( \hat{Y}_k^2 \right)^{-1/2}\bar{Y}_k^2\left( \hat{Y}_k^2 \right)^{-1/2}\hat{X}_{k|k-1}^{1/2}.   (81)

The conditional expected value of $M_k^{\mathrm{FFK}}$ is

\mathrm{E}\left[ M_k^{\mathrm{FFK}} \mid X_k = \hat{X}_{k|k-1} \right] = m_k \hat{X}_{k|k-1},   (82)

which results in an unbiased estimator.

C. ETT via log-likelihood linearization

A new measurement update is obtained by linearizing the logarithm of the likelihood function (63) with respect to the sufficient statistic of the prior distribution (64). The sufficient statistics of the prior distribution (64) are given as follows:

T(x_k, X_k) = \left( x_k,\ x_k x_k^T,\ X_k^{-1},\ \log|X_k| \right).   (83)

As seen above, the second and the fourth elements of $T$ are not invertible. In the following lemma, using log-likelihood linearization (LLL) via a first order Taylor series expansion, we approximate the likelihood function (63) to obtain a conjugate likelihood with respect to the prior (64) with the sufficient statistics given in (83).

Lemma 2. The likelihood $\prod_{j=1}^{m_k}\mathcal{N}(y_k^j; Hx_k, sX_k + R)$ can be approximated, up to a multiplicative constant with respect to the variables $x_k$ and $X_k$, by a first order Taylor series approximation around the nominal points $\bar{X}$ for the variable $X_k$ and $\hat{x}$ for the variable $x_k$ as

\prod_{j=1}^{m_k}\mathcal{N}\left( y_k^j;\ Hx_k,\ sX_k + R \right) \approx \prod_{j=1}^{m_k}\mathcal{N}\left( y_k^j;\ Hx_k,\ s\bar{X} + R \right)\ \mathcal{IW}\left( X_k;\ m_k,\ M_k \right),   (84)

where

M_k = m_k\bar{X} + m_k s\,\bar{X}\left( s\bar{X} + R \right)^{-1}\left[ \bar{Y}_k - \left( s\bar{X} + R \right) \right]\left( s\bar{X} + R \right)^{-1}\bar{X}.   (85)

Proof. The proof is given in Appendix A for the sake of clarity.

Lemma 2 states that the likelihood (63) can be approximately factorized into two independent likelihood terms corresponding to the kinematic and extent states. This type of factorization gives independent measurement updates for the kinematic state and the extent state. For this purpose, one can set $\bar{X} = \frac{V_{k|k-1}}{\nu_{k|k-1} - 2d - 2}$ and $\hat{x} = \hat{x}_{k|k-1}$ in order to find the factors of Lemma 2. Using the conjugate likelihood factors in (84), the posterior density $p(x_k, X_k \mid Y_{0:k})$ is given in the form of (66) with the update parameters given below:

\hat{x}_{k|k} = P_{k|k}\left( P_{k|k-1}^{-1}\hat{x}_{k|k-1} + m_k H^T\left( s\hat{X}_{k|k-1} + R \right)^{-1}\bar{y}_k \right),   (86)
P_{k|k} = \left( P_{k|k-1}^{-1} + m_k H^T\left( s\hat{X}_{k|k-1} + R \right)^{-1} H \right)^{-1},   (87)
\hat{X}_{k|k-1} = \frac{V_{k|k-1}}{\nu_{k|k-1} - 2d - 2},   (88)
\bar{y}_k = \frac{1}{m_k}\sum_{j=1}^{m_k} y_k^j.   (89)

Note that the kinematic updates are the same as the kinematic updates proposed in [16]. The extent state is updated as given in

\nu_{k|k} = \nu_{k|k-1} + m_k,   (90)
V_{k|k} = V_{k|k-1} + M_k^{\mathrm{LLL}},   (91)

where

M_k^{\mathrm{LLL}} = m_k\hat{X}_{k|k-1} + m_k s\,\hat{X}_{k|k-1}\left( s\hat{X}_{k|k-1} + R \right)^{-1}\left[ \bar{Y}_k - \left( s\hat{X}_{k|k-1} + R \right) \right]\left( s\hat{X}_{k|k-1} + R \right)^{-1}\hat{X}_{k|k-1}.   (92)

This form of the update suggests that the matrix $\bar{Y}_k$ serves as a pseudo-measurement for the quantity $sX_k + R$. If $\bar{Y}_k$ is larger than the current predicted estimate of $sX_k + R$, i.e., $s\hat{X}_{k|k-1} + R$, then a positive matrix quantity is added to the statistic $V_{k|k-1}$, and vice versa. We note here that

\mathrm{E}\left[ \bar{Y}_k \mid X_k = \hat{X}_{k|k-1} \right] = H P_{k|k-1} H^T + s\hat{X}_{k|k-1} + R,   (93)

where the expectation is taken over all the measurements at time $k$ and the estimate $\hat{x}_{k|k-1}$. Hence we conclude that the expected value of the second term on the right hand side of (92) is always positive, which makes the current update biased. Another drawback of the update is that the uncertainty of the predicted estimate $\hat{x}_{k|k-1}$ does not affect the update. In order to solve both problems we modify the quantity $M_k^{\mathrm{LLL}}$ as follows:

M_k^{\mathrm{ULL}} \triangleq m_k\hat{X}_{k|k-1} + m_k s\,\hat{X}_{k|k-1}\left( H P_{k|k-1} H^T + s\hat{X}_{k|k-1} + R \right)^{-1}\left[ \bar{Y}_k - \left( H P_{k|k-1} H^T + s\hat{X}_{k|k-1} + R \right) \right]\left( H P_{k|k-1} H^T + s\hat{X}_{k|k-1} + R \right)^{-1}\hat{X}_{k|k-1}.   (94)

Note that in this modified update the difference

\bar{Y}_k - \left( H P_{k|k-1} H^T + s\hat{X}_{k|k-1} + R \right)   (95)

has zero conditional mean, which makes the update unbiased, and the terms $\left( H P_{k|k-1} H^T + s\hat{X}_{k|k-1} + R \right)^{-1}$ decrease with increasing uncertainty in the predicted estimate $\hat{x}_{k|k-1}$, which makes the update term smaller.
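A compact sketch of the extent part of the proposed ULL update, combining (88), (90), (91) and (94); this is our own NumPy code with toy numbers, and it assumes the inverse Wishart parametrization (65), so that the predicted extent is $V/(\nu - 2d - 2)$.

```python
import numpy as np

def ull_extent_update(nu_pred, V_pred, P_pred, Y_bar, m, H, s, R, d=2):
    """Extent-state part of the ULL measurement update, cf. (88), (90)-(91), (94):
    the inverse Wishart prior IW(nu_pred, V_pred) is updated with the spread Y_bar
    of the m detections about the predicted measurement."""
    X_pred = V_pred / (nu_pred - 2 * d - 2)          # predicted extent, eq. (88)
    S = H @ P_pred @ H.T + s * X_pred + R            # expected spread, eq. (93)
    S_inv = np.linalg.inv(S)
    M_ull = m * X_pred + m * s * X_pred @ S_inv @ (Y_bar - S) @ S_inv @ X_pred
    return nu_pred + m, V_pred + M_ull               # eqs. (90)-(91)

# toy numbers for a 2-D extent
H = np.hstack([np.eye(2), np.zeros((2, 2))])
P_pred = np.diag([50.0, 50.0, 10.0, 10.0])
nu_pred, V_pred = 20.0, 14.0 * np.array([[300.0, 0.0], [0.0, 200.0]])
Y_bar = np.array([[180.0, 10.0], [10.0, 120.0]])
print(ull_extent_update(nu_pred, V_pred, P_pred, Y_bar, m=10, H=H, s=0.25, R=np.eye(2)))
```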

In the following we refer to the proposed unbiased measurement update, based on linearization of the likelihood in the log domain, as ULL for ease of presentation.

V. NUMERICAL SIMULATION

In this section the newly proposed measurement update, ULL, is compared with the update based on variational Bayes [18] and the update proposed by Feldmann et al. in [16], which are referred to as VB and FFK, respectively. In Section V-A, we use a simulation scenario which is partly based on the simulation scenario described in [18, Section IV], which in turn is based on [19, Section 6.1]. In Section V-B an ETT scenario based on the numerical simulation described in [16, Sec. VI-B] is presented.

A. Monte Carlo simulations

A planar extended target in two dimensional space is considered, whose true kinematic state $x_k^0$ and extent state $X_k^0$ are

x_k^0 = [0\ \text{m},\ 0\ \text{m},\ 100\ \text{m/s},\ 100\ \text{m/s}]^T,   (96a)
X_k^0 = E\, \operatorname{Diag}([300\ \text{m},\ 200\ \text{m}])\, E^T.   (96b)

Here, $E \triangleq [e_1, e_2]$ is a matrix whose columns $e_1$ and $e_2$ are the normalized eigenvectors of $X_k^0$, which are $e_1 = \tfrac{1}{\sqrt{2}}[1, 1]^T$ and $e_2 = \tfrac{1}{\sqrt{2}}[1, -1]^T$. For the extended target with these true parameters, we conduct a Monte Carlo (MC) simulation study to quantify the differences between the measurement updates FFK, ULL and VB. For each MC run we assume that the initial predicted target density for all three methods is the same and has the following structure:

p(x_k, X_k \mid Y_{0:k-1}) = \mathcal{N}\left( x_k;\ \hat{x}_{k|k-1}^j,\ P_{k|k-1} \right)\mathcal{IW}\left( X_k;\ \nu_{k|k-1}^j,\ V_{k|k-1}^j \right),   (97)

where the superscript $j$ indicates the $j$th MC run. The quantities $\hat{x}_{k|k-1}^j$ and $P_{k|k-1}$ for the kinematic state are selected as

\hat{x}_{k|k-1}^j \sim \mathcal{N}\left( \cdot\,;\ x_k^0,\ \frac{P_{k|k-1}}{\alpha} \right),   (98a)
P_{k|k-1} = \operatorname{Diag}([50,\ 50,\ 10,\ 10]),   (98b)

where the scalar $\alpha$ is a scaling parameter for the covariance of the density in (98a), used to adjust the distribution of $\hat{x}_{k|k-1}^j$ in the MC simulation study. The quantities $\nu_{k|k-1}^j$ and $V_{k|k-1}^j$ are selected as

\nu_{k|k-1}^j = \max\left( 7,\ \nu^j \right), \quad \nu^j \sim \text{Poisson}(100),   (99a)
\frac{V_{k|k-1}^j}{\nu_{k|k-1}^j - 2d - 2} \sim \mathcal{W}\left( \cdot\,;\ \delta,\ \frac{X_k^0}{\delta} \right),   (99b)

where Poisson($\lambda$) and $\mathcal{W}(\cdot; n, \Psi)$ denote the Poisson distribution with expected value $\lambda$ and the Wishart distribution with degrees of freedom $n$ and scale matrix $\Psi$. The scalar $\delta$ controls the variance of the Wishart density in this study. For the $j$th MC run, $m^j$ measurements are generated according to

m^j = \max\left( 2,\ \tilde{m}^j \right), \quad \tilde{m}^j \sim \text{Poisson}(10),   (100a)
y^{j,i} \sim \mathcal{N}\left( \cdot\,;\ Hx_k^0,\ sX_k^0 + R \right),   (100b)

where $s = 0.5$, $R = 100\, I_2$ and $H = [I_2,\ 0_2]$. The matrices $I_2$ and $0_2$ are the identity matrix and the zero matrix of size $2 \times 2$, respectively.

The performance of the measurement updates is compared for various levels of accuracy of the initial predicted target density, in view of the optimal Bayesian solution. To this end, the optimal solution is numerically computed via importance sampling using the predicted density as the proposal density. A total of $10^5$ samples are generated from the predicted density and the normalized importance weights are calculated from the likelihood function given in (63). Using the importance weights, the posterior expected values of the kinematic state $x_k$ and the extent state $X_k$ are calculated. The resulting expected values are referred to as $\hat{x}_k^{\mathrm{opt}}$ and $\hat{X}_k^{\mathrm{opt}}$, respectively.

[Figure 4: Extent estimation error for FFK, VB and ULL with respect to the optimal Bayesian solution. Solid lines show the mean error E_X and shaded areas of respective colors illustrate the interval between the fifth and the ninety-fifth percentiles.]
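The importance-sampling reference solution described above might be sketched as follows (our own code, assuming SciPy is available; note that SciPy's invwishart uses a different degrees-of-freedom convention than (65), so a faithful implementation would convert the parameters first).

```python
import numpy as np
from scipy.stats import invwishart, multivariate_normal

rng = np.random.default_rng(1)

def optimal_estimate_is(x_pred, P_pred, nu_pred, V_pred, Y, H, s, R, n_samples=10_000):
    """Importance-sampling reference: sample (x, X) from the predicted density (64),
    weight by the likelihood (63), and return the weighted posterior means."""
    xs = rng.multivariate_normal(x_pred, P_pred, size=n_samples)
    Xs = invwishart(df=nu_pred, scale=V_pred).rvs(size=n_samples, random_state=rng)
    log_w = np.empty(n_samples)
    for i in range(n_samples):
        cov = s * Xs[i] + R
        log_w[i] = multivariate_normal(H @ xs[i], cov).logpdf(Y).sum()
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    x_opt = w @ xs                              # posterior mean of the kinematic state
    X_opt = np.einsum('i,ijk->jk', w, Xs)       # posterior mean of the extent state
    return x_opt, X_opt
```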
To change the level of accuracy of the predicted target density, the parameter $\alpha$ is selected on a linear grid of 40 values between 1 and 50, while the values of $\delta$ are selected on a logarithmic grid of 40 values reaching up to $10^5$. For each pair of $\alpha$ and $\delta$ out of the $40^2$ pairs, $N_{\mathrm{MC}} = 1000$ runs are made by selecting the parameters $\hat{x}_{k|k-1}^j$, $P_{k|k-1}$, $\nu_{k|k-1}^j$, $V_{k|k-1}^j$ and $Y_k^j$ as above, and the estimates $\hat{x}_k^{U,j}$ and $\hat{X}_k^{U,j}$ for $U \in \{\text{FFK}, \text{VB}, \text{ULL}\}$ are calculated. (In each MC run 20 iterations were performed for the VB solution, although the VB solution would usually converge within the first 5 iterates.) Using the results of the MC runs, the average error metrics $E_x^U$ and $E_X^U$ for $U \in \{\text{FFK}, \text{VB}, \text{ULL}\}$, for the kinematic state and the extent state respectively, are calculated as follows:

E_x^U \triangleq \frac{1}{d\, N_{\mathrm{MC}}}\sum_{j=1}^{N_{\mathrm{MC}}}\left\| H\hat{x}_k^{U,j} - H\hat{x}_k^{\mathrm{opt},j} \right\|^2,   (101)
E_X^U \triangleq \frac{1}{d\, N_{\mathrm{MC}}}\sum_{j=1}^{N_{\mathrm{MC}}}\operatorname{tr}\left( \left( \hat{X}_k^{U,j} - \hat{X}_k^{\mathrm{opt},j} \right)^2 \right).   (102)

[Figure 5: Kinematic state estimation error for FFK, VB and ULL with respect to the optimal Bayesian solution. Solid lines show the mean error E_x and shaded areas of respective colors illustrate the interval between the fifth and the ninety-fifth percentiles. Since FFK and ULL use an identical kinematic state update for identical X_{k|k-1}, their respective graphs coincide.]

[Figure 6: Trajectory of an extended target (x- and y-coordinates in meters). For every third scan, the true extended target and its center are shown.]

Table III: Comparison of measurement updates for ETT with respect to cycle time and estimation errors. The data are obtained from the single ETT simulation scenario. Since the VB update diverged several times, errors exceeding 4 m are set to 4 m to make the quantitative comparison meaningful.
    Update | E_x [m], mean ± st.dev. | E_X [m], mean ± st.dev. | Cycle time [s], mean ± st.dev.
    FFK    |           ±             |           ±             |           ±
    VB     |           ±             |           ±             |           ±
    ULL    |           ±             |           ±             |           ±

The errors are illustrated in Fig. 4 and Fig. 5. Since FFK and ULL use an identical kinematic state update, their respective graphs coincide. VB has the lowest extent state estimation error. ULL has a lower estimation error compared to FFK in the simulation scenario presented here. When the simulation is repeated with $R = 50\, I_2$, FFK performs better than ULL, while VB maintains its superior estimation performance in terms of $E_X$.

B. Single extended target tracking scenario

The performance of the measurement updates is evaluated in a single extended target tracking scenario where an extended target follows a trajectory in two dimensional space, illustrated in Fig. 6. In this simulation scenario, an elliptical extended target with diameters 340 and 80 meters moves with a constant velocity of about 50 km/h. The extended target's true initial kinematic state $x_1^0$ and true initial extent state $X_1^0$ are as follows:

x_1^0 = [0\ \text{m},\ 0\ \text{m},\ 9.8\ \text{m/s},\ 9.8\ \text{m/s}]^T,   (103a)
X_1^0 = E\, \operatorname{Diag}([170\ \text{m},\ 400\ \text{m}])\, E^T,   (103b)

where $E \triangleq [e_1, e_2]$ is the matrix whose columns are the normalized eigenvectors of $X_1^0$, which are $e_1 = \tfrac{1}{\sqrt{2}}[-1, 1]^T$ and $e_2 = \tfrac{1}{\sqrt{2}}[1, 1]^T$. We conduct an MC study to quantify the difference between the measurement updates in a single extended target tracking scenario. The measurement set $Y_k^j = \{ y_k^{j,i} \}_{i=1}^{m_k^j}$ for the $j$th MC run is generated according to (100) with $s = 0.5$, $R = 20\, I_2$ and $H = [I_2,\ 0_2]$.

[Figure 7: Histogram of the root mean square error of the kinematic state E_x^{U,j} for U in {FFK, VB, ULL} in a single ETT scenario.]

In the filters, the kinematic state vector consists of the position and the velocity, $x_k = [p_x, p_y, \dot{p}_x, \dot{p}_y]^T$, which evolves according to the discrete-time constant velocity model in two dimensional Cartesian coordinates [20],

x_{k+1} \sim \mathcal{N}\left( x_{k+1};\ A x_k,\ Q \right),   (104)

where

A = \begin{bmatrix} I_2 & \tau I_2 \\ 0_2 & I_2 \end{bmatrix}, \qquad Q = \sigma_\nu^2 \begin{bmatrix} \tfrac{\tau^4}{4} I_2 & \tfrac{\tau^3}{2} I_2 \\ \tfrac{\tau^3}{2} I_2 & \tau^2 I_2 \end{bmatrix},

with $\tau = 10\ \text{s}$ and $\sigma_\nu = 0.1\ \text{m/s}^2$. In the filters, the initial prior density for the kinematic state and extent state is

p(x_1, X_1) = \mathcal{N}\left( x_1;\ \hat{x}_{1|0}^j,\ P_{1|0} \right)\mathcal{IW}\left( X_1;\ \nu_{1|0}^j,\ V_{1|0}^j \right),
\hat{x}_{1|0}^j \sim \mathcal{N}\left( \cdot\,;\ x_1^0,\ \frac{P_{1|0}}{\alpha_0} \right),
P_{1|0} = \operatorname{Diag}([50,\ 50,\ 10,\ 10]),
\nu_{1|0}^j = \max\left( 7,\ \nu^j \right), \quad \nu^j \sim \text{Poisson}(10),
\frac{V_{1|0}^j}{\nu_{1|0}^j - 2d - 2} \sim \mathcal{W}\left( \cdot\,;\ \delta_0,\ \frac{X_1^0}{\delta_0} \right),   (105)

where $\alpha_0 = 10$ and $\delta_0 = 5$.

[Figure 8: Histogram of the average extent state estimation error E_X^{U,j} for U in {FFK, VB, ULL} in a single ETT scenario.]

1) Time update: The time update of the target's kinematic state in the filters follows the standard Kalman filter time update equations,

\hat{x}_{k+1|k}^j = A\,\hat{x}_{k|k}^j,   (106)
P_{k+1|k}^j = A\, P_{k|k}^j\, A^T + Q.   (107)

The extent state is assumed to be slowly varying. Hence, an exponential forgetting strategy [15], [21], [22] can be used to account for possible changes in the parameters over time. In [23] it is shown that using an exponential forgetting factor produces the maximum entropy distribution in the time update for processes which are slowly varying with unknown dynamics but bounded by a Kullback-Leibler divergence (KLD) constraint. The forgetting factor is applied to the parameters of the inverse Wishart distribution as follows:

\nu_{k+1|k}^j = \exp\left( -\tau/\tau_0 \right)\nu_{k|k}^j,   (108)
V_{k+1|k}^j = \frac{\nu_{k+1|k}^j - 2d - 2}{\nu_{k|k}^j - 2d - 2}\, V_{k|k}^j.   (109)

The exponential decay time constant was selected as $\tau_0 = 15$.

2) Evaluation of filters: MC runs are performed where the parameters $\hat{x}_{1|0}^j$, $P_{1|0}$, $\nu_{1|0}^j$, $V_{1|0}^j$ and $Y_k^j$ are selected as above. Using the results of the MC runs, the average error metrics $E_x^{U,j}$ and $E_X^{U,j}$ for $U \in \{\text{FFK}, \text{VB}, \text{ULL}\}$ and the $j$th MC run, for the target's kinematic state and the extent state respectively, are calculated as follows:

E_x^{U,j} \triangleq \frac{1}{K d}\sum_{k=1}^{K}\left\| H\hat{x}_k^{U,j} - H x_k^0 \right\|^2,   (110a)
E_X^{U,j} \triangleq \frac{1}{K d}\sum_{k=1}^{K}\operatorname{tr}\left( \left( \hat{X}_k^{U,j} - X_k^0 \right)^2 \right).   (110b)

The errors (mean ± standard deviation) and the cycle times (mean ± standard deviation) for the evaluated measurement update rules are given in Table III. The histograms of the distribution of the errors defined in (110) are given in Fig. 7 and Fig. 8. The estimation errors of FFK and ULL are comparable, but the proposed update, ULL, is less computationally expensive.

VI. CONCLUSION

A Bayesian inference technique based on Taylor series approximation of the logarithm of the likelihood function was presented. The proposed approximation is devised for the case where the prior distribution belongs to the exponential family of distributions and is continuous. The logarithm of the likelihood function is linearized with respect to the sufficient statistic of the prior distribution in the exponential family such that the posterior obtains the same exponential family form as the prior. A Taylor series approximation is used for the linearization with respect to the sufficient statistic. However, the Taylor series approximation is a local approximation which is used here to approximate the posterior over its entire support. In spite of this inherent weakness, the proposed algorithm performs well in the numerical simulations presented here for extended target tracking. Furthermore, there are numerous successful applications of the extended Kalman filter, which is a special case of the proposed algorithm, in theory and practice. When a Bayesian posterior estimate is needed with a limited computational budget which prohibits the use of Monte Carlo methods or even variational Bayes approximations, as in real-time filtering problems, the proposed algorithm offers a good alternative. The comparison of possible choices for the linearization point and of linearization methods with respect to the sufficient statistic is among the future research problems.

VII. ACKNOWLEDGMENTS

The authors would like to thank Henri Nurminen for proofreading and providing comments on an earlier version of the manuscript.

REFERENCES

[1] M. I. Jordan (Ed.), Learning in Graphical Models. MIT Press.
[2] M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul, "An introduction to variational methods for graphical models," Machine Learning, vol. 37, no. 2, pp. 183-233, Nov. 1999.
[3] T. Minka, "Expectation propagation for approximate Bayesian inference," in Proceedings of the Seventeenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-01). San Francisco, CA: Morgan Kaufmann, 2001, pp. 362-369.
[4] H. Rue, S. Martino, and N. Chopin, "Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 71, no. 2, pp. 319-392, 2009.
[5] J. A. Nelder and R. W. M. Wedderburn, "Generalized linear models," Journal of the Royal Statistical Society, Series A (General), vol. 135, no. 3, pp. 370-384, 1972.
[6] W. K. Hastings, "Monte Carlo sampling methods using Markov chains and their applications," Biometrika, vol. 57, pp. 97-109, 1970.
[7] S. Geman and D. Geman, "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 6, no. 6, pp. 721-741, 1984.
[8] M. Wainwright and M. Jordan, Graphical Models, Exponential Families, and Variational Inference, ser. Foundations and Trends in Machine Learning. Now Publishers, 2008.
[9] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley and Sons, 2006.
[10] W. Hennevogl, L. Fahrmeir, and G. Tutz, Multivariate Statistical Modelling Based on Generalized Linear Models, ser. Springer Series in Statistics. Springer New York, 2001.
[11] G. Smith, S. Schmidt, and L. McGee, Application of Statistical Filter Theory to the Optimal Estimation of Position and Velocity on Board a Circumlunar Vehicle, ser. NASA Technical Report. National Aeronautics and Space Administration, 1962.
[12] M. I. Jordan, lecture notes on conjugacy and the exponential family, 2014.
[13] R. E. Kalman, "A new approach to linear filtering and prediction problems," Transactions of the ASME, Journal of Basic Engineering, vol. 82, Series D, pp. 35-45, 1960.
[14] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics, ser. Intelligent Robotics and Autonomous Agents. MIT Press, 2005.

[15] J. W. Koch, "Bayesian approach to extended object and cluster tracking using random matrices," IEEE Trans. Aerosp. Electron. Syst., vol. 44, no. 3, pp. 1042-1059, Jul. 2008.
[16] M. Feldmann, D. Fränken, and W. Koch, "Tracking of extended objects and group targets using random matrices," IEEE Trans. Signal Process., vol. 59, no. 4, pp. 1409-1420, Apr. 2011.
[17] A. K. Gupta and D. K. Nagar, Matrix Variate Distributions. Boca Raton, FL: Chapman & Hall/CRC, 2000.
[18] U. Orguner, "A variational measurement update for extended target tracking with random matrices," IEEE Trans. Signal Process., vol. 60, no. 7, pp. 3827-3834, 2012.
[19] M. Baum, M. Feldmann, D. Fränken, U. D. Hanebeck, and W. Koch, "Extended object and group tracking: A comparison of random matrices and random hypersurface models," in GI Jahrestagung, ser. LNI, K.-P. Fähnrich and B. Franczyk, Eds. GI, 2010.
[20] X. R. Li and V. P. Jilkov, "Survey of maneuvering target tracking. Part I: Dynamic models," IEEE Trans. Aerosp. Electron. Syst., vol. 39, no. 4, pp. 1333-1364, Oct. 2003.
[21] R. Kulhavý and M. B. Zarrop, "On a general concept of forgetting," International Journal of Control, vol. 58, no. 4, pp. 905-924, 1993.
[22] C. M. Carvalho and M. West, "Dynamic matrix-variate graphical models," Bayesian Analysis, vol. 2, pp. 69-98, 2007.
[23] E. Özkan, V. Šmídl, S. Saha, C. Lundquist, and F. Gustafsson, "Marginalized adaptive particle filtering for nonlinear models with unknown time-varying noise parameters," to appear in Automatica.

APPENDIX

A. Proof of Lemma 2

The logarithm of the likelihood $\prod_{j=1}^{m_k}\mathcal{N}(y_k^j; Hx_k, sX_k + R)$ is given as

\sum_{j=1}^{m_k}\log\mathcal{N}\left( y_k^j;\ Hx_k,\ sX_k + R \right)
\overset{+}{=} -\frac{m_k}{2}\log\left| sX_k + R \right| - \frac{1}{2}\sum_{j=1}^{m_k}\left( y_k^j - Hx_k \right)^T\left( sX_k + R \right)^{-1}\left( y_k^j - Hx_k \right)   (111)
= -\frac{m_k}{2}\log\left| X_k \right| - \frac{m_k}{2}\log\left| sI + X_k^{-1/2} R X_k^{-1/2} \right| - \frac{1}{2}\sum_{j=1}^{m_k}\left( y_k^j - Hx_k \right)^T\left( sX_k + R \right)^{-1}\left( y_k^j - Hx_k \right)   (112)
= -\frac{m_k}{2}\log\left| X_k \right| - \frac{m_k}{2}\log\left| sI + X_k^{-1/2} R X_k^{-1/2} \right| - \frac{1}{2}\sum_{j=1}^{m_k}\operatorname{tr}\left[ \left( y_k^j - Hx_k \right)\left( y_k^j - Hx_k \right)^T\left( sX_k + R \right)^{-1} \right],   (113)

where $\overset{+}{=}$ denotes equality up to an additive constant. We approximate the right hand side of (113) by a function that is linear in the sufficient statistic

T(x_k, X_k) = \left( x_k,\ x_k x_k^T,\ \log|X_k|,\ X_k^{-1} \right).   (114)

As mentioned before, the statistics $x_k x_k^T$ and $\log|X_k|$ are not invertible. The remedy we suggest here is to make a first order Taylor series expansion of (113) with respect to the new variables

Z \triangleq X_k^{-1},   (115a)
N^j \triangleq \left( y_k^j - Hx_k \right)\left( y_k^j - Hx_k \right)^T, \quad j = 1, \ldots, m_k,   (115b)

around the corresponding nominal values

\hat{Z} \triangleq \bar{X}^{-1},   (116a)
\hat{N}^j \triangleq \left( y_k^j - H\hat{x} \right)\left( y_k^j - H\hat{x} \right)^T, \quad j = 1, \ldots, m_k.   (116b)

Writing (113) in terms of $Z$ and $N \triangleq [N^1, \ldots, N^{m_k}]$, we obtain

\sum_{j=1}^{m_k}\log\mathcal{N}\left( y_k^j;\ Hx_k,\ sX_k + R \right) \overset{+}{=} -\frac{m_k}{2}\log\left| Z^{-1} \right| - \frac{m_k}{2}\log\left| sI + Z^{1/2} R Z^{1/2} \right| - \frac{1}{2}\sum_{j=1}^{m_k}\operatorname{tr}\left[ N^j\left( sZ^{-1} + R \right)^{-1} \right].   (117)

If we make a first order Taylor series expansion of the second and third terms on the right hand side of (117) with respect to the variables $Z$ and $N$ around $\hat{Z}$ and $\hat{N}$, we obtain

\sum_{j=1}^{m_k}\log\mathcal{N}\left( y_k^j;\ Hx_k,\ sX_k + R \right) \overset{+}{\approx} -\frac{m_k}{2}\log\left| Z^{-1} \right| - \frac{m_k}{2}\operatorname{tr}\left[ \left( sR^{-1} + \hat{Z} \right)^{-1} Z \right] - \frac{1}{2}\sum_{j=1}^{m_k}\operatorname{tr}\left[ N^j\left( s\hat{Z}^{-1} + R \right)^{-1} \right] - \frac{s}{2}\sum_{j=1}^{m_k}\operatorname{tr}\left[ \hat{Z}^{-1}\left( s\hat{Z}^{-1} + R \right)^{-1}\hat{N}^j\left( s\hat{Z}^{-1} + R \right)^{-1}\hat{Z}^{-1} Z \right],   (118)

where $\overset{+}{\approx}$ represents an approximation up to an additive constant. The first order Taylor series approximations for the scalar-valued functions of matrix variables of interest are given in Appendix B. We now substitute the relations (115) and (116) back into (118) to obtain the approximation

\sum_{j=1}^{m_k}\log\mathcal{N}\left( y_k^j;\ Hx_k,\ sX_k + R \right) \overset{+}{\approx} -\frac{m_k}{2}\log\left| X_k \right| - \frac{m_k}{2}\operatorname{tr}\left[ \left( sR^{-1} + \bar{X}^{-1} \right)^{-1} X_k^{-1} \right] - \frac{1}{2}\sum_{j=1}^{m_k}\operatorname{tr}\left[ \left( y_k^j - Hx_k \right)\left( y_k^j - Hx_k \right)^T\left( s\bar{X} + R \right)^{-1} \right] - \frac{s}{2}\sum_{j=1}^{m_k}\operatorname{tr}\left[ \bar{X}\left( s\bar{X} + R \right)^{-1}\left( y_k^j - H\hat{x} \right)\left( y_k^j - H\hat{x} \right)^T\left( s\bar{X} + R \right)^{-1}\bar{X} X_k^{-1} \right]   (119)
= -\frac{m_k}{2}\log\left| X_k \right| - \frac{m_k}{2}\operatorname{tr}\left[ \left( sR^{-1} + \bar{X}^{-1} \right)^{-1} X_k^{-1} \right] - \frac{1}{2}\sum_{j=1}^{m_k}\left( y_k^j - Hx_k \right)^T\left( s\bar{X} + R \right)^{-1}\left( y_k^j - Hx_k \right) - \frac{s}{2}\operatorname{tr}\left[ \bar{X}\left( s\bar{X} + R \right)^{-1}\Big( \sum_{j=1}^{m_k}\left( y_k^j - H\hat{x} \right)\left( y_k^j - H\hat{x} \right)^T \Big)\left( s\bar{X} + R \right)^{-1}\bar{X} X_k^{-1} \right].   (120)

Rearranging the terms in (120) and taking the exponential of both sides, we can see that

\prod_{j=1}^{m_k}\mathcal{N}\left( y_k^j;\ Hx_k,\ sX_k + R \right) \approx \prod_{j=1}^{m_k}\mathcal{N}\left( y_k^j;\ Hx_k,\ s\bar{X} + R \right)\ \mathcal{IW}\left( X_k;\ m_k,\ M_k \right),   (121)

up to a multiplicative constant, with $M_k$ as given in (85), which completes the proof.


More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Brown University CSCI 2950-P, Spring 2013 Prof. Erik Sudderth Lecture 13: Learning in Gaussian Graphical Models, Non-Gaussian Inference, Monte Carlo Methods Some figures

More information

Properties and approximations of some matrix variate probability density functions

Properties and approximations of some matrix variate probability density functions Technical report from Automatic Control at Linköpings universitet Properties and approximations of some matrix variate probability density functions Karl Granström, Umut Orguner Division of Automatic Control

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Probabilistic Fundamentals in Robotics. DAUIN Politecnico di Torino July 2010

Probabilistic Fundamentals in Robotics. DAUIN Politecnico di Torino July 2010 Probabilistic Fundamentals in Robotics Gaussian Filters Basilio Bona DAUIN Politecnico di Torino July 2010 Course Outline Basic mathematical framework Probabilistic models of mobile robots Mobile robot

More information

Lecture 2: From Linear Regression to Kalman Filter and Beyond

Lecture 2: From Linear Regression to Kalman Filter and Beyond Lecture 2: From Linear Regression to Kalman Filter and Beyond January 18, 2017 Contents 1 Batch and Recursive Estimation 2 Towards Bayesian Filtering 3 Kalman Filter and Bayesian Filtering and Smoothing

More information

A Bayesian Approach to Jointly Estimate Tire Radii and Vehicle Trajectory

A Bayesian Approach to Jointly Estimate Tire Radii and Vehicle Trajectory A Bayesian Approach to Jointly Estimate Tire Radii and Vehicle Trajectory Emre Özan, Christian Lundquist and Fredri Gustafsson Abstract High-precision estimation of vehicle tire radii is considered, based

More information

Probabilistic Graphical Models for Image Analysis - Lecture 4

Probabilistic Graphical Models for Image Analysis - Lecture 4 Probabilistic Graphical Models for Image Analysis - Lecture 4 Stefan Bauer 12 October 2018 Max Planck ETH Center for Learning Systems Overview 1. Repetition 2. α-divergence 3. Variational Inference 4.

More information

Previously on TT, Target Tracking: Lecture 2 Single Target Tracking Issues. Lecture-2 Outline. Basic ideas on track life

Previously on TT, Target Tracking: Lecture 2 Single Target Tracking Issues. Lecture-2 Outline. Basic ideas on track life REGLERTEKNIK Previously on TT, AUTOMATIC CONTROL Target Tracing: Lecture 2 Single Target Tracing Issues Emre Özan emre@isy.liu.se Division of Automatic Control Department of Electrical Engineering Linöping

More information

A Comparison of Particle Filters for Personal Positioning

A Comparison of Particle Filters for Personal Positioning VI Hotine-Marussi Symposium of Theoretical and Computational Geodesy May 9-June 6. A Comparison of Particle Filters for Personal Positioning D. Petrovich and R. Piché Institute of Mathematics Tampere University

More information

CLOSE-TO-CLEAN REGULARIZATION RELATES

CLOSE-TO-CLEAN REGULARIZATION RELATES Worshop trac - ICLR 016 CLOSE-TO-CLEAN REGULARIZATION RELATES VIRTUAL ADVERSARIAL TRAINING, LADDER NETWORKS AND OTHERS Mudassar Abbas, Jyri Kivinen, Tapani Raio Department of Computer Science, School of

More information

Nonparametric Bayesian Methods (Gaussian Processes)

Nonparametric Bayesian Methods (Gaussian Processes) [70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent

More information

Dynamic System Identification using HDMR-Bayesian Technique

Dynamic System Identification using HDMR-Bayesian Technique Dynamic System Identification using HDMR-Bayesian Technique *Shereena O A 1) and Dr. B N Rao 2) 1), 2) Department of Civil Engineering, IIT Madras, Chennai 600036, Tamil Nadu, India 1) ce14d020@smail.iitm.ac.in

More information

Lecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions

Lecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions DD2431 Autumn, 2014 1 2 3 Classification with Probability Distributions Estimation Theory Classification in the last lecture we assumed we new: P(y) Prior P(x y) Lielihood x2 x features y {ω 1,..., ω K

More information

On the Reduction of Gaussian inverse Wishart Mixtures

On the Reduction of Gaussian inverse Wishart Mixtures On the Reduction of Gaussian inverse Wishart Mixtures Karl Granström Division of Automatic Control Department of Electrical Engineering Linköping University, SE-58 8, Linköping, Sweden Email: karl@isy.liu.se

More information

Lecture 6: Bayesian Inference in SDE Models

Lecture 6: Bayesian Inference in SDE Models Lecture 6: Bayesian Inference in SDE Models Bayesian Filtering and Smoothing Point of View Simo Särkkä Aalto University Simo Särkkä (Aalto) Lecture 6: Bayesian Inference in SDEs 1 / 45 Contents 1 SDEs

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Brown University CSCI 2950-P, Spring 2013 Prof. Erik Sudderth Lecture 12: Gaussian Belief Propagation, State Space Models and Kalman Filters Guest Kalman Filter Lecture by

More information

ECE276A: Sensing & Estimation in Robotics Lecture 10: Gaussian Mixture and Particle Filtering

ECE276A: Sensing & Estimation in Robotics Lecture 10: Gaussian Mixture and Particle Filtering ECE276A: Sensing & Estimation in Robotics Lecture 10: Gaussian Mixture and Particle Filtering Lecturer: Nikolay Atanasov: natanasov@ucsd.edu Teaching Assistants: Siwei Guo: s9guo@eng.ucsd.edu Anwesan Pal:

More information

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008 Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:

More information

Mobile Robot Localization

Mobile Robot Localization Mobile Robot Localization 1 The Problem of Robot Localization Given a map of the environment, how can a robot determine its pose (planar coordinates + orientation)? Two sources of uncertainty: - observations

More information

16 : Approximate Inference: Markov Chain Monte Carlo

16 : Approximate Inference: Markov Chain Monte Carlo 10-708: Probabilistic Graphical Models 10-708, Spring 2017 16 : Approximate Inference: Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Yuan Yang, Chao-Ming Yen 1 Introduction As the target distribution

More information

Extended Target Tracking Using a Gaussian- Mixture PHD Filter

Extended Target Tracking Using a Gaussian- Mixture PHD Filter Extended Target Tracing Using a Gaussian- Mixture PHD Filter Karl Granström, Christian Lundquist and Umut Orguner Linöping University Post Print N.B.: When citing this wor, cite the original article. IEEE.

More information

Machine Learning. Probability Basics. Marc Toussaint University of Stuttgart Summer 2014

Machine Learning. Probability Basics. Marc Toussaint University of Stuttgart Summer 2014 Machine Learning Probability Basics Basic definitions: Random variables, joint, conditional, marginal distribution, Bayes theorem & examples; Probability distributions: Binomial, Beta, Multinomial, Dirichlet,

More information

Particle Filters. Pieter Abbeel UC Berkeley EECS. Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics

Particle Filters. Pieter Abbeel UC Berkeley EECS. Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics Particle Filters Pieter Abbeel UC Berkeley EECS Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics Motivation For continuous spaces: often no analytical formulas for Bayes filter updates

More information

F denotes cumulative density. denotes probability density function; (.)

F denotes cumulative density. denotes probability density function; (.) BAYESIAN ANALYSIS: FOREWORDS Notation. System means the real thing and a model is an assumed mathematical form for the system.. he probability model class M contains the set of the all admissible models

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Lecture 12 Dynamical Models CS/CNS/EE 155 Andreas Krause Homework 3 out tonight Start early!! Announcements Project milestones due today Please email to TAs 2 Parameter learning

More information

Lecture 7 and 8: Markov Chain Monte Carlo

Lecture 7 and 8: Markov Chain Monte Carlo Lecture 7 and 8: Markov Chain Monte Carlo 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering University of Cambridge http://mlg.eng.cam.ac.uk/teaching/4f13/ Ghahramani

More information

Cross entropy-based importance sampling using Gaussian densities revisited

Cross entropy-based importance sampling using Gaussian densities revisited Cross entropy-based importance sampling using Gaussian densities revisited Sebastian Geyer a,, Iason Papaioannou a, Daniel Straub a a Engineering Ris Analysis Group, Technische Universität München, Arcisstraße

More information

Sensor Tasking and Control

Sensor Tasking and Control Sensor Tasking and Control Sensing Networking Leonidas Guibas Stanford University Computation CS428 Sensor systems are about sensing, after all... System State Continuous and Discrete Variables The quantities

More information

A Gaussian Mixture PHD Filter for Nonlinear Jump Markov Models

A Gaussian Mixture PHD Filter for Nonlinear Jump Markov Models A Gaussian Mixture PHD Filter for Nonlinear Jump Marov Models Ba-Ngu Vo Ahmed Pasha Hoang Duong Tuan Department of Electrical and Electronic Engineering The University of Melbourne Parville VIC 35 Australia

More information

Introduction: exponential family, conjugacy, and sufficiency (9/2/13)

Introduction: exponential family, conjugacy, and sufficiency (9/2/13) STA56: Probabilistic machine learning Introduction: exponential family, conjugacy, and sufficiency 9/2/3 Lecturer: Barbara Engelhardt Scribes: Melissa Dalis, Abhinandan Nath, Abhishek Dubey, Xin Zhou Review

More information

Bayesian Approach 2. CSC412 Probabilistic Learning & Reasoning

Bayesian Approach 2. CSC412 Probabilistic Learning & Reasoning CSC412 Probabilistic Learning & Reasoning Lecture 12: Bayesian Parameter Estimation February 27, 2006 Sam Roweis Bayesian Approach 2 The Bayesian programme (after Rev. Thomas Bayes) treats all unnown quantities

More information

Variational Principal Components

Variational Principal Components Variational Principal Components Christopher M. Bishop Microsoft Research 7 J. J. Thomson Avenue, Cambridge, CB3 0FB, U.K. cmbishop@microsoft.com http://research.microsoft.com/ cmbishop In Proceedings

More information

CSC2535: Computation in Neural Networks Lecture 7: Variational Bayesian Learning & Model Selection

CSC2535: Computation in Neural Networks Lecture 7: Variational Bayesian Learning & Model Selection CSC2535: Computation in Neural Networks Lecture 7: Variational Bayesian Learning & Model Selection (non-examinable material) Matthew J. Beal February 27, 2004 www.variational-bayes.org Bayesian Model Selection

More information

Mini-Course 07 Kalman Particle Filters. Henrique Massard da Fonseca Cesar Cunha Pacheco Wellington Bettencurte Julio Dutra

Mini-Course 07 Kalman Particle Filters. Henrique Massard da Fonseca Cesar Cunha Pacheco Wellington Bettencurte Julio Dutra Mini-Course 07 Kalman Particle Filters Henrique Massard da Fonseca Cesar Cunha Pacheco Wellington Bettencurte Julio Dutra Agenda State Estimation Problems & Kalman Filter Henrique Massard Steady State

More information

Sensor Fusion: Particle Filter

Sensor Fusion: Particle Filter Sensor Fusion: Particle Filter By: Gordana Stojceska stojcesk@in.tum.de Outline Motivation Applications Fundamentals Tracking People Advantages and disadvantages Summary June 05 JASS '05, St.Petersburg,

More information

13: Variational inference II

13: Variational inference II 10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational

More information

LECTURE 11: EXPONENTIAL FAMILY AND GENERALIZED LINEAR MODELS

LECTURE 11: EXPONENTIAL FAMILY AND GENERALIZED LINEAR MODELS LECTURE : EXPONENTIAL FAMILY AND GENERALIZED LINEAR MODELS HANI GOODARZI AND SINA JAFARPOUR. EXPONENTIAL FAMILY. Exponential family comprises a set of flexible distribution ranging both continuous and

More information

Estimating the Shape of Targets with a PHD Filter

Estimating the Shape of Targets with a PHD Filter Estimating the Shape of Targets with a PHD Filter Christian Lundquist, Karl Granström, Umut Orguner Department of Electrical Engineering Linöping University 583 33 Linöping, Sweden Email: {lundquist, arl,

More information

Two Useful Bounds for Variational Inference

Two Useful Bounds for Variational Inference Two Useful Bounds for Variational Inference John Paisley Department of Computer Science Princeton University, Princeton, NJ jpaisley@princeton.edu Abstract We review and derive two lower bounds on the

More information

Connections between score matching, contrastive divergence, and pseudolikelihood for continuous-valued variables. Revised submission to IEEE TNN

Connections between score matching, contrastive divergence, and pseudolikelihood for continuous-valued variables. Revised submission to IEEE TNN Connections between score matching, contrastive divergence, and pseudolikelihood for continuous-valued variables Revised submission to IEEE TNN Aapo Hyvärinen Dept of Computer Science and HIIT University

More information

Learning in graphical models & MC methods Fall Cours 8 November 18

Learning in graphical models & MC methods Fall Cours 8 November 18 Learning in graphical models & MC methods Fall 2015 Cours 8 November 18 Enseignant: Guillaume Obozinsi Scribe: Khalife Sammy, Maryan Morel 8.1 HMM (end) As a reminder, the message propagation algorithm

More information

Undirected Graphical Models

Undirected Graphical Models Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Properties Properties 3 Generative vs. Conditional

More information

A Sequential Monte Carlo Approach for Extended Object Tracking in the Presence of Clutter

A Sequential Monte Carlo Approach for Extended Object Tracking in the Presence of Clutter A Sequential Monte Carlo Approach for Extended Object Tracing in the Presence of Clutter Niolay Petrov 1, Lyudmila Mihaylova 1, Amadou Gning 1 and Dona Angelova 2 1 Lancaster University, School of Computing

More information

Probabilistic Graphical Models

Probabilistic Graphical Models School of Computer Science Probabilistic Graphical Models Variational Inference II: Mean Field Method and Variational Principle Junming Yin Lecture 15, March 7, 2012 X 1 X 1 X 1 X 1 X 2 X 3 X 2 X 2 X 3

More information

Bayesian Inference. Chapter 9. Linear models and regression

Bayesian Inference. Chapter 9. Linear models and regression Bayesian Inference Chapter 9. Linear models and regression M. Concepcion Ausin Universidad Carlos III de Madrid Master in Business Administration and Quantitative Methods Master in Mathematical Engineering

More information

A Matrix Theoretic Derivation of the Kalman Filter

A Matrix Theoretic Derivation of the Kalman Filter A Matrix Theoretic Derivation of the Kalman Filter 4 September 2008 Abstract This paper presents a matrix-theoretic derivation of the Kalman filter that is accessible to students with a strong grounding

More information

Linear Dynamical Systems

Linear Dynamical Systems Linear Dynamical Systems Sargur N. srihari@cedar.buffalo.edu Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/cse574/index.html Two Models Described by Same Graph Latent variables Observations

More information

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite

More information

17 : Markov Chain Monte Carlo

17 : Markov Chain Monte Carlo 10-708: Probabilistic Graphical Models, Spring 2015 17 : Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Heran Lin, Bin Deng, Yun Huang 1 Review of Monte Carlo Methods 1.1 Overview Monte Carlo

More information

Bayesian Image Segmentation Using MRF s Combined with Hierarchical Prior Models

Bayesian Image Segmentation Using MRF s Combined with Hierarchical Prior Models Bayesian Image Segmentation Using MRF s Combined with Hierarchical Prior Models Kohta Aoki 1 and Hiroshi Nagahashi 2 1 Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology

More information

Gaussian Mixture Model

Gaussian Mixture Model Case Study : Document Retrieval MAP EM, Latent Dirichlet Allocation, Gibbs Sampling Machine Learning/Statistics for Big Data CSE599C/STAT59, University of Washington Emily Fox 0 Emily Fox February 5 th,

More information

CS281A/Stat241A Lecture 17

CS281A/Stat241A Lecture 17 CS281A/Stat241A Lecture 17 p. 1/4 CS281A/Stat241A Lecture 17 Factor Analysis and State Space Models Peter Bartlett CS281A/Stat241A Lecture 17 p. 2/4 Key ideas of this lecture Factor Analysis. Recall: Gaussian

More information

Patterns of Scalable Bayesian Inference Background (Session 1)

Patterns of Scalable Bayesian Inference Background (Session 1) Patterns of Scalable Bayesian Inference Background (Session 1) Jerónimo Arenas-García Universidad Carlos III de Madrid jeronimo.arenas@gmail.com June 14, 2017 1 / 15 Motivation. Bayesian Learning principles

More information

Lecture 13 : Variational Inference: Mean Field Approximation

Lecture 13 : Variational Inference: Mean Field Approximation 10-708: Probabilistic Graphical Models 10-708, Spring 2017 Lecture 13 : Variational Inference: Mean Field Approximation Lecturer: Willie Neiswanger Scribes: Xupeng Tong, Minxing Liu 1 Problem Setup 1.1

More information

Deep Variational Inference. FLARE Reading Group Presentation Wesley Tansey 9/28/2016

Deep Variational Inference. FLARE Reading Group Presentation Wesley Tansey 9/28/2016 Deep Variational Inference FLARE Reading Group Presentation Wesley Tansey 9/28/2016 What is Variational Inference? What is Variational Inference? Want to estimate some distribution, p*(x) p*(x) What is

More information

2D Image Processing (Extended) Kalman and particle filter

2D Image Processing (Extended) Kalman and particle filter 2D Image Processing (Extended) Kalman and particle filter Prof. Didier Stricker Dr. Gabriele Bleser Kaiserlautern University http://ags.cs.uni-kl.de/ DFKI Deutsches Forschungszentrum für Künstliche Intelligenz

More information

Labor-Supply Shifts and Economic Fluctuations. Technical Appendix

Labor-Supply Shifts and Economic Fluctuations. Technical Appendix Labor-Supply Shifts and Economic Fluctuations Technical Appendix Yongsung Chang Department of Economics University of Pennsylvania Frank Schorfheide Department of Economics University of Pennsylvania January

More information

Probabilistic & Unsupervised Learning

Probabilistic & Unsupervised Learning Probabilistic & Unsupervised Learning Week 2: Latent Variable Models Maneesh Sahani maneesh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc ML/CSML, Dept Computer Science University College

More information

Mobile Robot Localization

Mobile Robot Localization Mobile Robot Localization 1 The Problem of Robot Localization Given a map of the environment, how can a robot determine its pose (planar coordinates + orientation)? Two sources of uncertainty: - observations

More information

Sparse Bayesian Logistic Regression with Hierarchical Prior and Variational Inference

Sparse Bayesian Logistic Regression with Hierarchical Prior and Variational Inference Sparse Bayesian Logistic Regression with Hierarchical Prior and Variational Inference Shunsuke Horii Waseda University s.horii@aoni.waseda.jp Abstract In this paper, we present a hierarchical model which

More information

Incorporating Track Uncertainty into the OSPA Metric

Incorporating Track Uncertainty into the OSPA Metric 14th International Conference on Information Fusion Chicago, Illinois, USA, July 5-8, 211 Incorporating Trac Uncertainty into the OSPA Metric Sharad Nagappa School of EPS Heriot Watt University Edinburgh,

More information

Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2

Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2 Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, 2010 Jeffreys priors Lecturer: Michael I. Jordan Scribe: Timothy Hunter 1 Priors for the multivariate Gaussian Consider a multivariate

More information

A short introduction to INLA and R-INLA

A short introduction to INLA and R-INLA A short introduction to INLA and R-INLA Integrated Nested Laplace Approximation Thomas Opitz, BioSP, INRA Avignon Workshop: Theory and practice of INLA and SPDE November 7, 2018 2/21 Plan for this talk

More information

STA 4273H: Sta-s-cal Machine Learning

STA 4273H: Sta-s-cal Machine Learning STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our

More information

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner Fundamentals CS 281A: Statistical Learning Theory Yangqing Jia Based on tutorial slides by Lester Mackey and Ariel Kleiner August, 2011 Outline 1 Probability 2 Statistics 3 Linear Algebra 4 Optimization

More information

Outline Lecture 2 2(32)

Outline Lecture 2 2(32) Outline Lecture (3), Lecture Linear Regression and Classification it is our firm belief that an understanding of linear models is essential for understanding nonlinear ones Thomas Schön Division of Automatic

More information

10-701/15-781, Machine Learning: Homework 4

10-701/15-781, Machine Learning: Homework 4 10-701/15-781, Machine Learning: Homewor 4 Aarti Singh Carnegie Mellon University ˆ The assignment is due at 10:30 am beginning of class on Mon, Nov 15, 2010. ˆ Separate you answers into five parts, one

More information

Black-box α-divergence Minimization

Black-box α-divergence Minimization Black-box α-divergence Minimization José Miguel Hernández-Lobato, Yingzhen Li, Daniel Hernández-Lobato, Thang Bui, Richard Turner, Harvard University, University of Cambridge, Universidad Autónoma de Madrid.

More information

On Spawning and Combination of Extended/Group Targets Modeled with Random Matrices

On Spawning and Combination of Extended/Group Targets Modeled with Random Matrices IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 6, NO. 3, FEBRUARY, 03 On Spawning and Combination of Extended/Group Targets Modeled with Random Matrices Karl Granström, Student Member, IEEE, and Umut Orguner,

More information

Artificial Intelligence

Artificial Intelligence Artificial Intelligence Probabilities Marc Toussaint University of Stuttgart Winter 2018/19 Motivation: AI systems need to reason about what they know, or not know. Uncertainty may have so many sources:

More information