Robust Non-linear Smoother for State-space Models


Gabriel Agamennoni and Eduardo M. Nebot

Abstract—This paper presents a robust, non-linear smoothing algorithm for state-space models driven by noise and external inputs. The algorithm is extremely robust to outliers and handles missing data and state-dependent noise. Its implementation is straightforward, as it consists of two main components: (a) the Rauch-Tung-Striebel recursions (a.k.a. the Kalman smoother); and (b) a back-tracking line search strategy. Since the algorithm preserves the underlying structure of the problem, its computational load is linear in the number of data. Global convergence to a local optimum is guaranteed under mild assumptions.

I. INTRODUCTION

A. State-space models and optimal estimation

State Space Models (SSMs) are ubiquitous in many fields of engineering and applied science. Their popularity stems from their flexibility and expressive power. Formally, an SSM is a mathematical model of a dynamic system with inputs and outputs. It represents the system's dynamics and its input-output characteristics in terms of latent states. Unlike inputs and outputs, the states are hidden, i.e. they cannot be measured directly (e.g. with a sensor). Optimal estimation is the problem of determining the states given pairs of input-output data. Due to random fluctuations present in the input-output and state processes, this is not possible in general; the state estimates are inevitably afflicted with uncertainty.

The random processes are often specified by conditional probability distributions. The most common distribution is the Gaussian, justified by the central limit theorem. It is favored for its convenient analytical properties, although it is seldom motivated by the nature of the actual data. Because it appears relatively frequently, there is an unfortunate tendency to invoke the Gaussian in situations where it is not applicable. In these cases there is a significant risk of drawing incorrect conclusions about the system.

The Kalman Filter (KF) [1] is the precursor of many modern-day estimators [2]. It is optimal (in the least-squares sense) for SSMs with linear dynamics and input-output relationships and Gaussian noise [3]. Unfortunately, the KF breaks down in the presence of non-Gaussian noise, since the sum-of-squares criterion is extremely sensitive to spurious observations [4].¹ In addition, the KF does not apply to non-linear systems, which are the most common and most relevant in practical applications.

The authors are with the Australian Centre for Field Robotics at the University of Sydney, New South Wales, Australia.

¹ Since the conditional mean is an unbounded function of the residual, when a large discrepancy arises between the prior and the observation, the posterior distribution becomes an unrealistic compromise between the two.

B. Robustness to outliers

Outliers are common non-Gaussian phenomena [5], [6]. Intuitively, they are observations that do not agree with the rest of the data. Even though they may occur by chance in most distributions,² outliers often stem from processes that are either unknown or are deliberately left out of the model, e.g. environmental disturbances, sensor failures or factors that are tedious or impractical to model. Systems that rely on high-quality sensor data (tracking and control systems, autonomous vehicles, etc.) may be sensitive to outliers. In some cases, they may fail catastrophically [7]-[9] to the point that a full recovery is impossible.
Hence the importance of estimators robust to outliers.

C. Related work

In this paper we address a general class of robust estimation problems. Namely, we are concerned with non-linear systems with heavy-tailed and potentially heteroscedastic (i.e. state-dependent) noise processes. To the best of the authors' knowledge, no one has addressed all three aspects simultaneously.³ The following work is related to ours:
a. Aravkin et al. [11] formulated a robust smoother for non-linear systems as an optimization problem, although the system has homoscedastic noise;
b. Piché et al. [12] presented an outlier-robust filter/smoother for non-linear systems in the context of Assumed Density Filtering (ADF), again with state-independent noise;
c. Särkkä and Nummenmaa [13] introduced a filter that tracks states and noise, albeit for linear systems with time-varying rather than state-dependent noise;
d. Agamennoni et al. [14] developed robust filters and smoothers for simultaneously estimating states and noise, though again for linear, time-varying systems.

D. Major contributions

The major contributions in this paper are:
a. A non-trivial generalization of the model introduced by Aravkin et al. [11] to the heteroscedastic case;
b. A computationally efficient and provably convergent algorithm for approximately solving the smoothing problem; unlike [11], our approach does not require approximating the Hessian matrix; and
c. A parametrization of the problem carefully designed to make the equations readily interpretable and assert the strong connection to the well-known Rauch-Tung-Striebel (RTS) recursions.

² For instance, the Gaussian places over 99% of its probability mass within the interval μ ± 3σ. An outlier 5σ away from μ has less than one in a million chances of occurring. Although the possibility exists, it is unrealistically small.

³ The Extended Kalman Filter (EKF) [10] could be made robust with linear techniques. However, because it relies on a first-order Taylor expansion, it would only be suitable for mildly non-linear systems.

The potential of our approach is demonstrated via experiments on synthetic data from a highly non-linear system. All of the code used to generate the results in this paper is available from the authors' web page.

E. Outline of this paper

Section II defines the smoothing problem and its robust version. Section III proposes and develops an approximation that renders the robust non-linear smoothing problem tractable. In section IV this approximation takes the form of an iterative optimization algorithm. Experiments on synthetic data validate our approach in section V. Finally, section VI concludes and outlines directions for future research.

II. THE SMOOTHING PROBLEM

A. Definitions

Let X and Z be, in that order, the state and measurement sequences. Namely,

  X = (x_1, ..., x_n) ∈ R^d × ... × R^d    (1a)
  Z = (z_1, ..., z_n) ∈ R^{d_1} × ... × R^{d_n}    (1b)

are real, finite sequences with n terms each. Let g_k and h_k be, respectively, the one-step prediction function and the observation function for the k-th term, and let Q_k and R_k be their corresponding variance-covariance matrices. Specifically,

  g_k : R^d → R^d    (2a)
  h_k : R^d → R^{d_k}    (2b)
  Q_k : R^d → P^{d×d}    (2c)
  R_k : R^d → P^{d_k×d_k}    (2d)

for k = 1, ..., n, with P^{d×d} denoting the cone of d×d symmetric positive-definite matrices. Last of all, let {u_k} and {v_k} be white Gaussian noise processes that represent the random fluctuations driving the prediction and observation models. That is,

  u_k ~ N(0, I)    (3a)
  v_k ~ N(0, I)    (3b)

for all k, where N(μ, Σ) denotes a multi-variate Gaussian distribution with mean vector μ and variance-covariance matrix Σ.⁴

⁴ Throughout this manuscript we will use the same symbols (e.g. u_k and v_k) to denote both the random variable and its outcome, the random variate. Although this is a slight abuse of notation, it is for the sake of clarity and should cause no confusion.

B. Non-linear smoothing

Assume that the state and observation sequences are generated via the following processes:

  x_k = g_k(x_{k-1}) + Q_k(x_{k-1})^{1/2} u_k    (4a)
  z_k = h_k(x_k) + R_k(x_k)^{1/2} v_k    (4b)

for k = 1, ..., n and x_0 ~ N(μ_0, Σ_0).⁵ Or, equivalently,

  x_k | x_{k-1} ~ N(g_k(x_{k-1}), Q_k(x_{k-1}))    (5a)
  z_k | x_k ~ N(h_k(x_k), R_k(x_k))    (5b)

Then, the probability density function of the joint distribution over state and observation sequences is

  p(X, Z) = p(x_0) ∏_{k=1}^n p(x_k | x_{k-1}) p(z_k | x_k)

Notice that, although states and observations are conditionally Gaussian, they are not jointly Gaussian in the general case.⁶ Given Z, the non-linear smoothing problem consists of finding p(X | Z). These distributions are non-Gaussian and generally intractable, making the problem an extremely challenging one with no closed-form solution. There is no choice but to seek an approximation.

C. Robust non-linear smoothing

We define the robust non-linear smoothing problem as the one obtained by replacing (5b) with

  z_k | x_k ~ T(h_k(x_k), R_k(x_k), s_k)    (6)

where s_k > 0 is known and T(μ, Σ, ν) stands for a multi-variate t distribution [15] with location vector μ, scale matrix Σ and ν degrees of freedom.

D. The t distribution

The t is a sub-exponential distribution, meaning that its tails fall off to zero at a less-than-exponential rate. Compared to the Gaussian, which is super-exponential, the t has much heavier tails. Their rate of decay is determined by the number of degrees of freedom: in the limit ν → ∞ the tails flatten and the t reduces to a Gaussian. For smaller and smaller ν the probability mass spreads more and more evenly across observation space and further away from the mode, assigning outliers a non-negligible probability.
Placing a non-negligible probability on outliers is by no means a drawback; it simply reflects reality. The Gaussian concentrates most of its probability mass within a small region around the mode, essentially ruling out the possibility that any observation is ever wrong. The t makes no such mistake. In (6) we acknowledge the fact that, occasionally, observations may be off. By imparting this information directly into our model we enable it to deal with outliers natively, within the filtering/smoothing framework. Consequently, there is no need for us to explicitly pre-process outliers (e.g. with a rejection threshold) or treat them separately, because our model is now capable of explaining them.

⁵ The symbol A^{1/2} denotes the lower-triangular Cholesky factor of the symmetric, positive-definite matrix A.

⁶ The only case where (X, Z) are jointly Gaussian is if g_k and h_k are affine and Q_k and R_k are constant for all k = 1, ..., n. In any other case (X, Z) are non-Gaussian.
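The following is a minimal sketch (not part of the paper's code) illustrating the point above for a single scalar observation: compared with a Gaussian of the same location and scale, the Student-t observation model in (6) leaves orders of magnitude more probability on large deviations. The location, scale and degrees of freedom below are hypothetical values chosen only for illustration.

```python
# Compare the tails of a Gaussian and a Student-t observation model (sketch only).
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, s = 0.0, 1.0, 4.0            # location, scale, degrees of freedom (illustrative)
n = 1_000_000

z_gauss = rng.normal(mu, sigma, n)            # Gaussian observations
z_t = mu + sigma * rng.standard_t(s, n)       # Student-t observations

# Fraction of observations lying more than 5 scale units away from the location:
for name, z in [("Gaussian", z_gauss), ("Student-t", z_t)]:
    p = np.mean(np.abs(z - mu) > 5.0 * sigma)
    print(f"{name:10s} fraction beyond 5 sigma: {p:.2e}")

# The Gaussian essentially never produces such a deviation (its true tail
# probability is below one in a million), whereas the t with few degrees of
# freedom produces thousands of them per million draws.
```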

III. THE APPROXIMATE SMOOTHING PROBLEM

A. The Gaussian-Gamma decomposition

The t may be regarded as a continuous mixture of infinitely many Gaussian distributions with identical mean and proportional variance-covariance parameters [16]. Namely, (6) is equivalent to

  z_k | x_k, w_k ~ N(h_k(x_k), R_k(x_k)/w_k)    (7a)
  w_k ~ G(s_k/2, s_k/2)    (7b)

where G(α, β) denotes a Gamma distribution with shape and rate parameters α and β. The variable w_k is an ancillary variable that renders the observation z_k conditionally Gaussian. When marginalized out, it yields the t distribution in (6). We will call w_k the weight of the k-th observation.

Applying the Gaussian-Gamma decomposition in (7) leads to a joint probability density function of the form

  p(X, Z, W) = p(x_0) ∏_{k=1}^n p(x_k | x_{k-1}) p(z_k | x_k, w_k) p(w_k)

and to the following conditional probability density function:

  p(X | Z) = ∫ p(X, W | Z) dμ(W)

where W = (w_1, ..., w_n) is the sequence of weights and μ is the product measure. The weight sequence that originates from this decomposition simplifies the smoothing problem considerably. Rather than approximating p(X | Z) directly, we first find an approximation to p(X, W | Z) and then marginalize out W to obtain our final result.

B. The Kullback-Leibler divergence

Let q be an approximation to p(X, W | Z). The Kullback-Leibler (KL) divergence [17] from q to p is defined as

  KL[q ‖ p] = ∫∫ q(X, W) ln [ q(X, W) / p(X, W | Z) ] dμ(X) dμ(W)

The KL quantifies the distance between q and p, i.e. the error in the approximation of the true posterior. It is non-negative for all q and vanishes if and only if q = p. Our goal now is to find a mathematical form for the approximating distribution such that minimizing the KL divergence is analytically tractable. This involves a trade-off. On one hand, q should be flexible enough so that KL[q ‖ p] may be brought close to zero. On the other hand, q should be simple enough so that we can do this tractably. Resolving the inherent tension between flexibility and tractability is the key to finding a good approximation.

C. An approximate tractable posterior

For reasons that will become clear shortly, we select an approximation q with the following mathematical form:

  q(X, W) = ∏_{k=1}^n δ(x_k, x̂_k) ∏_{k=1}^n γ_k(w_k)    (8)

where δ is Kronecker's delta function and γ_k is defined as

  γ_k(w_k) = [β_k^{α_k} / Γ(α_k)] w_k^{α_k - 1} exp(-β_k w_k)    (9)

for w_k > 0, and γ_k(w_k) = 0 if w_k ≤ 0.⁷ To obtain the best approximation in the KL sense amongst those of the form in (8), we solve

  min_{X̂, θ} KL[q ‖ p]    (10)

where X̂ = (x̂_1, ..., x̂_n) and θ = (α_1, β_1, ..., α_n, β_n). The solution to this problem is a sequence X̂ of states which, owing to the structure we chose for q in (8), is the approximation to p(X | Z) that we seek.

D. An approximate estimation algorithm

In order to solve (10) we apply a coordinate-wise descent algorithm. Starting from an initial guess of X̂ and θ, we cycle between the following steps: a. Minimize KL[q ‖ p] w.r.t. X̂ keeping θ fixed; and b. Minimize KL[q ‖ p] w.r.t. θ keeping X̂ fixed; until they converge to a local minimum. Convergence is guaranteed because the divergence is non-negative and decreases or remains constant after each cycle.

IV. AN APPROXIMATE SMOOTHING ALGORITHM

A. The approximate posterior over weights

Let us begin with step b. of the algorithm (we will deal with step a. soon). Minimizing KL[q ‖ p] with respect to θ while keeping X̂ fixed yields⁸

  α_k = (s_k + d_k)/2    (11a)
  β_k = [s_k + (z_k - h_k(x̂_k))ᵀ R_k(x̂_k)⁻¹ (z_k - h_k(x̂_k))]/2    (11b)

for k = 1, ..., n.
Hence the mean of the approximate posterior distribution over w_k is

  ω_k = ∫ w_k γ_k(w_k) dw_k = α_k / β_k    (12)

Note that the weight decreases as the normalized error increases, i.e. observations that lie far away from their forecast are down-weighted.

⁷ This is identical to the probability density function of a Gamma-distributed variable with shape and rate parameters α_k and β_k, respectively.

⁸ Although straightforward, the derivation of this result is fairly lengthy. The reader is encouraged to verify (11a,b) by replacing (8) in the definition of the KL divergence, evaluating the expectations with respect to X, dissecting out the terms involving w_k, comparing them with (9) and matching terms.
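As a minimal, self-contained sketch (not the authors' implementation), the weight update in (11a,b) and (12) can be written as below. The observation covariance, degrees of freedom and residual values are hypothetical; the point is simply that discrepant observations receive small weights.

```python
# Compute the approximate posterior mean weight of one observation (sketch only).
import numpy as np

def observation_weight(z, h_x, R, s):
    """Return omega_k = alpha_k / beta_k for one observation (eqs. 11a,b and 12)."""
    r = z - h_x                               # residual z_k - h_k(x_k)
    d = z.size                                # observation dimension d_k
    maha = float(r @ np.linalg.solve(R, r))   # normalized squared error
    alpha = 0.5 * (s + d)                     # eq. (11a)
    beta = 0.5 * (s + maha)                   # eq. (11b)
    return alpha / beta                       # posterior mean weight, eq. (12)

R = np.eye(1)                                 # unit observation variance (illustrative)
s = 4.0                                       # degrees of freedom (illustrative)
for res in [0.1, 1.0, 3.0, 10.0]:
    w = observation_weight(np.array([res]), np.array([0.0]), R, s)
    print(f"residual {res:5.1f}  ->  weight {w:.3f}")
```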

B. The approximate posterior over states

Let us now move on to step a. of the algorithm. Minimizing KL[q ‖ p] with respect to X̂ while holding θ constant equates to solving the following minimization problem:

  min_X b(X)    (13)

The objective b is a quadratic-composite function⁹

  b(X) = Σ_{k=1}^n t_k(x_{k-1}, x_k) + Σ_{k=1}^n u_k(x_{k-1}, x_k)ᵀ u_k(x_{k-1}, x_k) + Σ_{k=1}^n ω_k v_k(x_k)ᵀ v_k(x_k) + ...    (14)

where ... denotes additive constants (i.e. terms independent of X) and the terms in the summations are given by

  t_k(x_{k-1}, x_k) = ln det Q_k(x_{k-1}) + ln det R_k(x_k)
  u_k(x_{k-1}, x_k) = Q_k(x_{k-1})^{-1/2} (x_k - g_k(x_{k-1}))
  v_k(x_k) = R_k(x_k)^{-1/2} (z_k - h_k(x_k))

for k = 1, ..., n.

C. The sequential quadratic program

Due to the quadratic-composite structure of (14), our minimization problem (13) lends itself to a special formulation known as a Sequential Quadratic Program (SQP) [18]. An SQP is an iterative method for breaking down the full problem and solving it as a sequence of sub-problems. Each SQP iteration computes a sequence Y = (y_1, ..., y_n) of search directions together with a step size h, and updates the state sequence X to X + hY, i.e. it adds an increment of hY. The sequence Y of search directions results from linearizing the objective function b around the terms t_k, u_k and v_k for k = 1, ..., n and evaluating them at the current state sequence X. Differentiation plus a bit of algebra reveals that Y is the solution to

  min_{y_1, ..., y_n} Σ_{k=1}^n f_k(y_{k-1}, y_k)    (15)

where f_k is defined in (26) and (27a) to (27d).¹⁰ If we take a close look at (26) we realize that the mathematical form of (15) is almost the same as that of a linear, time-varying Kalman smoother. Thus we can easily solve (15) by running a slightly modified form of the Rauch-Tung-Striebel (RTS) recursions [21]. The additional terms q_k and r_k, which are caused by state-dependent noise, are readily accounted for by performing an extra correction step with fictitious zero-valued observations.

⁹ In other words, b, as a function of t_k, u_k and v_k, is quadratic even though it is not necessarily quadratic in x_k.

¹⁰ Note that (27a) to (27d) require evaluating derivatives of the Cholesky factor. The factor and its derivatives can be evaluated simultaneously [19] with the same order of complexity as the original Cholesky algorithm [20].

The step size h is chosen by performing a line search along the direction Y. In order to guarantee global convergence,¹¹ we only consider values of h that satisfy Armijo's rule [18]. Given a constant rejection threshold τ ∈ (0, 1), we search for the largest h such that

  b(X + hY) ≤ b(X) + τ h ⟨∇b(X), Y⟩    (16)

where ∇ is the gradient operator and ⟨·,·⟩ is the inner product. (The partial derivatives of b necessary to form the gradient may be found in appendix A.) Let H be the set of all h, h > 0, that satisfy (16) for a given Y. To find a suitable step size within this set, we apply a back-tracking line search strategy, which is fast yet effective. Specifically, we compute h as

  h = max { λ^{j-1} : λ^{j-1} ∈ H, j = 1, 2, ... }    (17)

where λ ∈ (0, 1) is a constant step reduction factor. Starting from h = 1, we check to see whether h ∈ H. If so, we accept h as our step size; otherwise, we reduce h by a factor λ.¹²

The SQP terminates upon convergence. At each iteration, we check the condition ⟨∇b(X), Y⟩ ≥ -nε, where ε > 0 is a constant termination tolerance. If this is true, we have arrived at a local optimum.

D. Implementation and pseudo-code

Algorithm 1 provides an implementation in pseudo-code of our robust non-linear smoother. Note that the weights (line b.) are re-evaluated once per iteration, i.e.
we perform only one SQP iteration per cycle of the coordinate-wise descent algorithm (sub-section III-D). This means we are not exactly minimizing KL[q ‖ p] with respect to X̂, but reducing it. Still, since Y is a descent direction, (16) implies that b(X + hY) is strictly smaller than b(X), and consequently the divergence decreases after each cycle.

We have implemented algorithm 1 in the MatLab language. The source code is available on-line and includes a test script. Interested readers may download the files from [22] into their working directories, compile the .c source code into .mex binaries with MatLab's built-in compiler and type TestRNLS() in the command prompt. Documentation and details of our implementation may be found in the help messages, by typing help RNLS and help RNLS.Estimate, as well as in the comments provided.

V. EXPERIMENTAL VALIDATION

A. The LX systems

To test our algorithm we generated synthetic data from a high-dimensional and strongly non-linear system. The model introduced by Lorenz and Emanuel [23] simulates atmospheric phenomena at equally-spaced sites along a circle of latitude.

¹¹ In the context of non-linear optimization, global convergence does not mean that the method converges to the global optimum, but that convergence to a local optimum is guaranteed no matter the starting point.

¹² Provided that b is a smooth function, the set H is non-empty for all h > 0. Hence the back-tracking line search is always well defined and terminates in finitely many iterations.
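The following is a minimal sketch of the back-tracking line search in (16)-(17). It is not the authors' MatLab code: the quadratic objective, the steepest-descent direction and the values of τ and λ below are stand-ins used only to illustrate Armijo's rule.

```python
# Back-tracking line search satisfying Armijo's rule (sketch only).
import numpy as np

def backtracking(b, grad_b, X, Y, tau=1e-4, lam=0.5, max_iter=50):
    """Largest step h = lam**(j-1), j = 1, 2, ..., satisfying Armijo's rule (16)."""
    slope = float(np.dot(grad_b(X), Y))   # <grad b(X), Y>; negative for a descent direction
    bX = b(X)
    h = 1.0
    for _ in range(max_iter):
        if b(X + h * Y) <= bX + tau * h * slope:
            return h                      # Armijo's rule satisfied: accept h
        h *= lam                          # otherwise reduce h by the factor lambda
    return h

# Toy quadratic objective standing in for the smoothing objective b(X):
b = lambda X: 0.5 * float(np.sum(X ** 2))
grad_b = lambda X: X
X = np.array([4.0, -2.0])
Y = -grad_b(X)                            # steepest descent as a stand-in for the SQP direction
print("accepted step size h =", backtracking(b, grad_b, X, Y))
```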

  f_k(y_{k-1}, y_k) = (y_k - G_k(x_{k-1}) y_{k-1} - g_k(x_{k-1}) + x_k)ᵀ Q_k(x_{k-1})⁻¹ (y_k - G_k(x_{k-1}) y_{k-1} - g_k(x_{k-1}) + x_k) + q_k(x_{k-1})ᵀ y_{k-1} + ω_k (z_k - H_k(x_k) y_k - h_k(x_k))ᵀ R_k(x_k)⁻¹ (z_k - H_k(x_k) y_k - h_k(x_k)) + r_k(x_k)ᵀ y_k + ...    (26)

  G_k(x_{k-1}, x_k) = [ ∂_i g_k(x_{k-1}) + ∂_i Q_k(x_{k-1})^{1/2} u_k(x_{k-1}, x_k) ]    (27a)
  H_k(x_k) = [ ∂_i h_k(x_k) + ∂_i R_k(x_k)^{1/2} v_k(x_k) ]    (27b)
  q_k(x_{k-1}) = [ tr( Q_k(x_{k-1})^{-1/2} ∂_i Q_k(x_{k-1})^{1/2} ) ]    (27c)
  r_k(x_k) = [ tr( R_k(x_k)^{-1/2} ∂_i R_k(x_k)^{1/2} ) ]    (27d)

Fig. 1. The k-th term of the objective function of the linearized sub-problem (15). In (27a) and (27b) only the i-th columns of matrices G_k and H_k are shown; in (27c) and (27d) only the i-th elements of vectors q_k and r_k. The operator ∂_i computes the partial derivatives with respect to the i-th state and tr is the trace operator.

In: Observation sequence Z = (z_1, ..., z_n), line search rejection threshold τ, step size reduction factor λ and termination tolerance ε.
Out: State sequence X = (x_1, ..., x_n).
a. Initialize X.
repeat
  b. Update the sequence (ω_1, ..., ω_n) of weights according to (12) and (11).
  c. Compute the sequence Y = (y_1, ..., y_n) of search directions by solving (15), where f_k is defined in (26) and (27), via modified RTS recursions.
  d. Keeping the weights fixed, find a step size h that satisfies (16) by way of (17).
  e. Update X as X + hY.
until convergence

Algorithm 1: The robust non-linear smoother.

It comprises a system of differential equations with quadratic, linear and constant terms representing advection, dissipation and external forces. It has been studied extensively in the context of data assimilation [24], [25]. The model itself is parameterized by size. A size-d system has a total of d states, which obey

  ẋ_i = (x_{i+1 mod d} - x_{i-2 mod d}) x_{i-1 mod d} - x_i + u    (18)

for i = 1, ..., d, where mod is the modulus operator and u is the external driving force.

B. Prediction and observation models

Let ẋ(t) = f(x(t)) denote the system of differential equations defined above in (18). The one-step prediction function g_k is defined via the following Euler approximation:

  g_k(x_{k-1}) = x_{k-1} + Δt f(x_{k-1})    (19a)

where Δt > 0 is the sampling period. The prediction uncertainty Q_k is defined as

  Q_k(x_{k-1}) = Δt δ I    (19b)

with δ > 0. We assume that all of the states of the system are directly observable, and hence

  h_k(x_k) = H_k x_k    (20a)

where H_k is a matrix that selects the non-missing observations from x_k and concatenates them into a vector. The observation uncertainty R_k is diagonal and is defined as

  R_k(x_k) = diag([ r(x_k^{(i)}) ])    (20b)

for all k = 1, ..., n, where x_k^{(i)} is the i-th element of the k-th state and r is the following mapping:

  r : x ↦ ρ x² + ε

where ρ > 0 is a gain parameter and ε > 0 is a regularization constant. The observation uncertainty function in (20b) represents a constant relative noise model. In other words, the absolute noise increases with the magnitude of the signal.

C. Synthetic data

Each set of data comprises a pair (X, Z) of sequences. The sequence X of states is generated by simulating the model according to (5a), with g_k and Q_k given by (19a) and (19b), respectively, for n-1 steps. The initial state is sampled close to the attractor.¹³ The sequence Z of observations is generated one observation at a time, sampling from (7b) and (7a) with h_k and R_k given by (20a) and (20b). Table I summarizes the values of the model and sampling parameters we used to generate the data for our experiments. A typical pair of state and observation sequences may be seen in figure 2.

¹³ To do so we first simulate the model for a given burn-in time.
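Below is a minimal sketch of the data-generating process described in this section: the Lorenz-96 derivative (18), the Euler prediction model (19a)-(19b), and observations drawn via (7a)-(7b) with the state-dependent covariance (20b). All numerical values (d, n, u, Δt, δ, ρ, ε and the degrees of freedom s) are illustrative placeholders, not the values used in the paper's experiments.

```python
# Generate one synthetic pair (X, Z) from the Lorenz-96 model (sketch only).
import numpy as np

def lorenz96(x, u):
    """Time derivative f(x) of a size-d Lorenz-96 system, eq. (18)."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + u

def predict(x_prev, u, dt):
    """One-step Euler prediction g_k(x_{k-1}), eq. (19a)."""
    return x_prev + dt * lorenz96(x_prev, u)

def simulate(d=40, n=500, u=8.0, dt=0.005, delta=1e-3, rho=0.05, eps=1e-2, s=4.0, seed=0):
    """Return stacked state and observation sequences; all parameter values are illustrative."""
    rng = np.random.default_rng(seed)
    x = u * np.ones(d) + 0.01 * rng.standard_normal(d)
    for _ in range(2000):                       # burn-in so that x starts close to the attractor
        x = predict(x, u, dt)
    X, Z = [], []
    for _ in range(n):
        x = predict(x, u, dt) + np.sqrt(dt * delta) * rng.standard_normal(d)  # eq. (19b)
        r_var = rho * x ** 2 + eps              # diagonal entries of R_k(x_k), eq. (20b)
        w = rng.gamma(s / 2.0, 2.0 / s)         # weight w_k ~ G(s_k/2, s_k/2), eq. (7b)
        z = x + np.sqrt(r_var / w) * rng.standard_normal(d)  # observation z_k, eq. (7a)
        X.append(x.copy())
        Z.append(z)
    return np.array(X), np.array(Z)

X, Z = simulate()
print("states:", X.shape, "observations:", Z.shape)
```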

TABLE I. SUMMARY OF MODEL AND SAMPLING PARAMETERS (continuous-time model, discrete-time model and sampling parameters)

  Name                      Symbol
  Number of states          d
  Driving force             u
  Sampling period           Δt
  Predictive uncertainty    δ
  Observation noise gain    ρ
  Regularization constant   ε
  Number of data            n

Fig. 2. Typical sequences of states (above) and observations (below) in the synthetic data. In the upper panel the states are plotted as a solid black line. In the lower panel observations are depicted as black dots.

Fig. 3. The same sequences of states and observations as those in figure 2, superimposed on the state estimates returned by the RNLS (above) and the MH sampler (below). The estimates are drawn in gray.

D. Results

The metrics we selected for evaluating performance are the Root Mean Squared (RMS), Maximum (Max), Mean Absolute (MA) and Maximum Absolute (AMax) errors, given by

  RMS = [ (1/n) Σ_{k=1}^n ‖x_k - x̂_k‖₂² ]^{1/2}
  Max = max_{k=1,...,n} ‖x_k - x̂_k‖₂
  MA = (1/n) Σ_{k=1}^n ‖x_k - x̂_k‖₁
  AMax = max_{k=1,...,n} ‖x_k - x̂_k‖₁

respectively, where x̂_k is the estimate of the k-th state in X, ‖·‖₂ denotes the Euclidean norm and ‖·‖₁ the Manhattan norm.

We generated several sets of data. For each set, we first ran a windowed median filter to obtain an initial guess of the state sequence. Then we passed this initial guess to algorithm 1. Upon convergence, we took the estimates returned by the RNLS and fed them to a block component-wise Metropolis-Hastings (MH) sampler [26]. The sampler was run for a large number of steps (including burn-in and pre-thinning), simulating samples from the posterior distribution over state trajectories.

Figure 3 shows the same state and observation sequences as those in figure 2, plus the estimates returned by the RNLS and the MH sampler. For the RNLS, the estimates are depicted as 99% confidence intervals, which are derived by combining the mode (i.e. the sequence returned by the algorithm) with the variance-covariance parameters obtained during the RTS recursions. For the MH sampler, the estimates are simply the point clouds in the sample. Table II and figure 4 summarize the performance metrics we obtained from our experiment. The last column in the table shows statistics for the difference between the errors attained by the RNLS and the MH sampler. Note the scaling of the vertical axes in the figure.

TABLE II. SUMMARY OF PERFORMANCE METRICS (mean and standard deviation of the RMS, Max, MA and AMax errors for the RNLS, the MH sampler and their difference)

Fig. 4. Box plot of the performance metrics (root mean squared, maximum, mean absolute and maximum absolute errors) achieved by the RNLS and the MH sampler. Note the scaling of the vertical axes.

E. Discussion

Our RNLS is able to track the states accurately despite the poor quality of the data. The sequence of observations in fig. 2 bears little resemblance to the underlying state sequence. Regardless, fig. 3 shows that our algorithm successfully separates signal from noise. The confidence intervals shrink and widen according to the local noise level.

The metrics for the RNLS are displayed alongside those for the MH sampler, not for comparison but as a baseline. What we want to show is that our algorithm performs almost on par with the best possible Bayesian estimator, or equivalently, that its excess risk is small. The advantage of the RNLS is its running time: with an average of 1.89 ± 0.793 seconds per sequence, it is almost 9 times faster than the MH sampler, which took an average of roughly 17 seconds. To the best of the authors' knowledge, no other algorithm in the literature simultaneously deals with non-linear systems, heavy-tailed noise and heteroscedasticity within a fully deterministic framework.
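For concreteness, the following is a minimal sketch of the four error metrics defined in section V-D, assuming the true and estimated state sequences are stacked row-wise into (n, d) arrays; the placeholder data at the end is random and only demonstrates the call.

```python
# Compute RMS, Max, MA and AMax errors between two state sequences (sketch only).
import numpy as np

def error_metrics(X, X_hat):
    err = X - X_hat
    l2 = np.linalg.norm(err, ord=2, axis=1)        # Euclidean norm of each x_k - x_hat_k
    l1 = np.linalg.norm(err, ord=1, axis=1)        # Manhattan norm of each x_k - x_hat_k
    return {
        "RMS":  float(np.sqrt(np.mean(l2 ** 2))),  # root mean squared error
        "Max":  float(l2.max()),                   # maximum error
        "MA":   float(l1.mean()),                  # mean absolute error
        "AMax": float(l1.max()),                   # maximum absolute error
    }

# Example with random placeholder data:
rng = np.random.default_rng(0)
X_true, X_est = rng.standard_normal((500, 40)), rng.standard_normal((500, 40))
print(error_metrics(X_true, X_est))
```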

VI. SUMMARY AND CONCLUSIONS

The robust non-linear smoother has a great deal of potential for complex sequential estimation problems. It handles non-linear dynamic and observation processes with state-dependent noise while remaining robust to outliers and missing data. Thus far the authors are unaware of other algorithms with these capabilities.

The core of the estimation algorithm is a coordinate-wise sequential quadratic program. Each quadratic sub-problem is solved in a numerically efficient and analytically interpretable way by a series of forward-backward recursions analogous to the well-known Rauch-Tung-Striebel smoother. The iterations are guaranteed to converge, provided the functions defining the model are smooth.

At the moment we are looking into ways of extending our approach to allow for more general distributional assumptions. For instance, elliptical distributions [27] are attractive due to their generality and their compact parametrization.¹⁴ One could imagine a model parameterized by a pair of density generator functions that determine the rotational shape of the prediction and observation densities.

In our experiments we ran a median filter in order to obtain an initial guess of the state estimates. There are many other possibilities. Studying the effects of different initialization schemes, e.g. via an unscented KF [28], an extended KF [10] or its iterative variants [29], and assessing their relative merits would be an interesting direction to pursue in the future.

¹⁴ In fact, the t distribution in (6) is elliptical with density generator g(x) = (1 + x/ν)^{-(ν+d)/2} for x ≥ 0.

APPENDIX A
DERIVATIVES OF THE QUADRATIC-COMPOSITE FUNCTION

The partial derivatives of b defined in (14) with respect to x_k, evaluated at X, are given by

  ∂b/∂x_k (X) = (Q_k(x_{k-1})^{1/2})⁻ᵀ u_k(x_{k-1}, x_k) - G_{k+1}(x_k, x_{k+1})ᵀ (Q_{k+1}(x_k)^{1/2})⁻ᵀ u_{k+1}(x_k, x_{k+1}) + q_{k+1}(x_k) - ω_k H_k(x_k)ᵀ (R_k(x_k)^{1/2})⁻ᵀ v_k(x_k) + r_k(x_k)

where G_k, H_k, q_k and r_k are defined in (27a) to (27d). The gradient ∇b(X) of b at the sequence X is the sequence whose k-th element is ∂b/∂x_k at X.

REFERENCES

[1] R. Kalman, "A new approach to linear filtering and prediction problems," Transactions of the ASME, Journal of Basic Engineering, Series D, vol. 82, pp. 35-45, 1960.
[2] S. Roweis and Z. Ghahramani, "A unifying review of linear Gaussian models," Neural Computation, vol. 11, no. 2, pp. 305-345, 1999.
[3] J. Morris, "The Kalman filter: A robust estimator for some classes of linear quadratic problems," IEEE Transactions on Information Theory, vol. 22, no. 5, pp. 526-534, September 1976.
[4] P. Huber, "Robust estimation of a location parameter," Annals of Mathematical Statistics, vol. 35, no. 1, pp. 73-101, 1964.
[5] D. Moore and G. McCabe, Introduction to the Practice of Statistics. W. H. Freeman.
[6] V. Barnett and T. Lewis, Outliers in Statistical Data. John Wiley & Sons.
[7] J. Ting, A. D'Souza, and S. Schaal, "Automatic outlier detection: A Bayesian approach," in Proceedings of the IEEE International Conference on Robotics and Automation, 2007.
[8] T. Bailey and H. Durrant-Whyte, "Simultaneous localization and mapping (SLAM): Part II," IEEE Robotics and Automation Magazine, vol. 13, no. 3, pp. 108-117, September 2006.
[9] J. Loxam and T. Drummond, "Student-t mixture filter for robust, real-time visual tracking," in Proceedings of the 10th European Conference on Computer Vision: Part III, 2008.
[10] D. Simon, Optimal State Estimation. John Wiley & Sons, 2006.
[11] A. Aravkin, J. V. Burke, and G. Pillonetto, "Robust and trend-following Student's t Kalman smoothers," Optimization Online preprint.
[12] R. Piché, S. Särkkä, and J. Hartikainen, "Recursive outlier-robust filtering and smoothing for non-linear systems using the multivariate Student-t distribution," in Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing, 2012.
[13] S. Särkkä and A. Nummenmaa, "Recursive noise adaptive Kalman filtering by variational Bayesian approximations," IEEE Transactions on Automatic Control, vol. 54, no. 3, pp. 596-600, March 2009.
[14] G. Agamennoni, J. I. Nieto, and E. Nebot, "Approximate inference in state-space models with heavy-tailed noise," IEEE Transactions on Signal Processing, vol. 60, no. 10, pp. 5024-5037, October 2012.
[15] B. Kibria and A. Joarder, "A short review of the multivariate t-distribution," Journal of Statistical Research, vol. 40, no. 1, pp. 59-72, 2006.
[16] S. Kotz and S. Nadarajah, Multivariate t Distributions and Their Applications. Cambridge University Press, 2004.
[17] S. Kullback and R. Leibler, "On information and sufficiency," Annals of Mathematical Statistics, vol. 22, no. 1, pp. 79-86, 1951.
[18] J. Nocedal and S. Wright, Numerical Optimization, P. Glynn and S. Robinson, Eds. Springer.
[19] S. Smith, "Differentiation of the Cholesky algorithm," Journal of Computational and Graphical Statistics, vol. 4, no. 2, June 1995.
[20] G. Golub and C. van Loan, Matrix Computations. The Johns Hopkins University Press.
[21] H. E. Rauch, F. Tung, and C. T. Striebel, "Maximum likelihood estimates of linear dynamic systems," American Institute of Aeronautics and Astronautics Journal, vol. 3, no. 8, pp. 1445-1450, 1965.
[22] G. Agamennoni. Community profile at MathWorks. [Online].
[23] E. Lorenz and K. Emanuel, "Optimal sites for supplementary weather observations: Simulation with a small model," Journal of the Atmospheric Sciences, vol. 55, no. 3, pp. 399-414, February 1998.
[24] P. Sakov, D. Oliver, and L. Bertino, "An iterative EnKF for strongly nonlinear systems," Monthly Weather Review, vol. 140, no. 6, June 2012.
[25] G. Evensen, Data Assimilation: The Ensemble Kalman Filter. Springer-Verlag, 2007.
[26] R. Levine, Z. Yu, W. Hanley, and J. Nitao, "Implementing component-wise Hastings algorithms," Computational Statistics & Data Analysis, vol. 48, 2005.
[27] K.-T. Fang, S. Kotz, and K. Ng, Symmetric Multivariate and Related Distributions. Chapman & Hall, 1990.
[28] S. Julier and J. Uhlmann, "A new extension of the Kalman filter to nonlinear systems," in International Symposium on Aerospace and Defense Sensing, Simulation and Control, 1997.
[29] B. Bell and F. Cathey, "The iterated Kalman filter update as a Gauss-Newton method," IEEE Transactions on Automatic Control, vol. 38, no. 2, pp. 294-297, February 1993.


More information

Data assimilation in high dimensions

Data assimilation in high dimensions Data assimilation in high dimensions David Kelly Courant Institute New York University New York NY www.dtbkelly.com February 12, 2015 Graduate seminar, CIMS David Kelly (CIMS) Data assimilation February

More information

Probability and Statistics

Probability and Statistics Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be CHAPTER 4: IT IS ALL ABOUT DATA 4a - 1 CHAPTER 4: IT

More information

Ensemble Data Assimilation and Uncertainty Quantification

Ensemble Data Assimilation and Uncertainty Quantification Ensemble Data Assimilation and Uncertainty Quantification Jeff Anderson National Center for Atmospheric Research pg 1 What is Data Assimilation? Observations combined with a Model forecast + to produce

More information

Particle Filters; Simultaneous Localization and Mapping (Intelligent Autonomous Robotics) Subramanian Ramamoorthy School of Informatics

Particle Filters; Simultaneous Localization and Mapping (Intelligent Autonomous Robotics) Subramanian Ramamoorthy School of Informatics Particle Filters; Simultaneous Localization and Mapping (Intelligent Autonomous Robotics) Subramanian Ramamoorthy School of Informatics Recap: State Estimation using Kalman Filter Project state and error

More information

Recent Advances in Bayesian Inference Techniques

Recent Advances in Bayesian Inference Techniques Recent Advances in Bayesian Inference Techniques Christopher M. Bishop Microsoft Research, Cambridge, U.K. research.microsoft.com/~cmbishop SIAM Conference on Data Mining, April 2004 Abstract Bayesian

More information

Outline lecture 6 2(35)

Outline lecture 6 2(35) Outline lecture 35 Lecture Expectation aximization E and clustering Thomas Schön Division of Automatic Control Linöping University Linöping Sweden. Email: schon@isy.liu.se Phone: 13-1373 Office: House

More information

Gaussian Mixture Distance for Information Retrieval

Gaussian Mixture Distance for Information Retrieval Gaussian Mixture Distance for Information Retrieval X.Q. Li and I. King fxqli, ingg@cse.cuh.edu.h Department of omputer Science & Engineering The hinese University of Hong Kong Shatin, New Territories,

More information

An Introduction to Expectation-Maximization

An Introduction to Expectation-Maximization An Introduction to Expectation-Maximization Dahua Lin Abstract This notes reviews the basics about the Expectation-Maximization EM) algorithm, a popular approach to perform model estimation of the generative

More information

Learning Gaussian Process Models from Uncertain Data

Learning Gaussian Process Models from Uncertain Data Learning Gaussian Process Models from Uncertain Data Patrick Dallaire, Camille Besse, and Brahim Chaib-draa DAMAS Laboratory, Computer Science & Software Engineering Department, Laval University, Canada

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate

More information

These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop

These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop Music and Machine Learning (IFT68 Winter 8) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop

More information

Cross entropy-based importance sampling using Gaussian densities revisited

Cross entropy-based importance sampling using Gaussian densities revisited Cross entropy-based importance sampling using Gaussian densities revisited Sebastian Geyer a,, Iason Papaioannou a, Daniel Straub a a Engineering Ris Analysis Group, Technische Universität München, Arcisstraße

More information

Forecasting Wind Ramps

Forecasting Wind Ramps Forecasting Wind Ramps Erin Summers and Anand Subramanian Jan 5, 20 Introduction The recent increase in the number of wind power producers has necessitated changes in the methods power system operators

More information

Self-Organization by Optimizing Free-Energy

Self-Organization by Optimizing Free-Energy Self-Organization by Optimizing Free-Energy J.J. Verbeek, N. Vlassis, B.J.A. Kröse University of Amsterdam, Informatics Institute Kruislaan 403, 1098 SJ Amsterdam, The Netherlands Abstract. We present

More information

Density Propagation for Continuous Temporal Chains Generative and Discriminative Models

Density Propagation for Continuous Temporal Chains Generative and Discriminative Models $ Technical Report, University of Toronto, CSRG-501, October 2004 Density Propagation for Continuous Temporal Chains Generative and Discriminative Models Cristian Sminchisescu and Allan Jepson Department

More information

If we want to analyze experimental or simulated data we might encounter the following tasks:

If we want to analyze experimental or simulated data we might encounter the following tasks: Chapter 1 Introduction If we want to analyze experimental or simulated data we might encounter the following tasks: Characterization of the source of the signal and diagnosis Studying dependencies Prediction

More information

ECE276A: Sensing & Estimation in Robotics Lecture 10: Gaussian Mixture and Particle Filtering

ECE276A: Sensing & Estimation in Robotics Lecture 10: Gaussian Mixture and Particle Filtering ECE276A: Sensing & Estimation in Robotics Lecture 10: Gaussian Mixture and Particle Filtering Lecturer: Nikolay Atanasov: natanasov@ucsd.edu Teaching Assistants: Siwei Guo: s9guo@eng.ucsd.edu Anwesan Pal:

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Statistics for scientists and engineers

Statistics for scientists and engineers Statistics for scientists and engineers February 0, 006 Contents Introduction. Motivation - why study statistics?................................... Examples..................................................3

More information