arxiv: v1 [cs.sy] 12 Dec 2016


Approximate Recursive Identification of Autoregressive Systems with Skewed Innovations

Henri Nurminen
Dept. of Automation Science and Engineering, Tampere University of Technology, Tampere, Finland
henri.nurminen@tut.fi

Tohid Ardeshiri
Department of Engineering, University of Cambridge, Cambridge, UK
ta417@cam.ac.uk

Abstract

We propose a novel recursive system identification algorithm for linear autoregressive systems with skewed innovations. The algorithm is based on the variational Bayes approximation of the model with a multivariate normal prior for the model coefficients, multivariate skew-normally distributed innovations, and matrix-variate-normal inverse-Wishart prior for the parameters of the innovation distribution. The proposed algorithm simultaneously estimates the model coefficients as well as the parameters of the innovation distribution, which are both allowed to be slowly time-varying. Through computer simulations, we compare the proposed method with a variational algorithm based on the normally-distributed innovations model, and show that modelling the skewness can provide improvement in identification accuracy.

1 Introduction

Many systems produce datasets with a skewed noise distribution. Skewness means asymmetry. Positive skewness, for example, intuitively means producing large positive deviations from the median value more frequently than large negative deviations. For instance, some financial data sets show negative skewness because large drops tend to be more frequent than large upsurges [1, 2, 3, 4]. Wireless network based positioning often uses a time delay measurement as a distance, but non-line-of-sight conditions can produce large positive outliers, so the error distribution becomes positively skewed [5, 6]. One statistical model for skewed error distributions is the skew normal distribution and its multivariate generalisation [7, 8].
The posterior distribution that results from a normal prior and a skew-normal measurement noise model is not analytically tractable. However, the distribution admits a hierarchical formulation whose favorable conjugacy properties enable efficient parameter estimation using the expectation maximisation (EM) algorithm [9, 10, 11] and approximate Bayesian time-series filtering and smoothing based on the variational Bayes (VB) approximation [12, 13].

This paper studies autoregressive (AR) models, where the measurement is modelled as a linear function of the $n_{\mathrm{AR}}$ previous measurements ($n_{\mathrm{AR}}$ being the model order) plus an independent random noise term referred to as the innovation. When the AR coefficients and/or the statistics of the conditionally skew-normal innovation distribution are time-varying, or when they need to be identified online, recursive identification methods are used [14]. In this paper we propose a novel recursive system identification algorithm for AR models with skew-normally distributed measurement noise with unknown, possibly slowly time-varying scale and skewness. The proposed algorithm is based on a VB approximation.

30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.

2 Problem formulation

The skew normal distribution is an asymmetric generalization of the normal distribution originally proposed by Azzalini [7]. Its multivariate version was later introduced by Azzalini and Dalla Valle [8]. The version that is used in this report is the canonical fundamental skew normal distribution (CFUSN) introduced by Arellano-Valle and Genton [15]. However, we adopt a different parametrization following the guidelines of the canonical fundamental skew t-distribution's parametrization in [16] to obtain a suitable analytical tractability. The probability density function (PDF) of this skew normal distribution $z \sim \mathrm{SN}(\mu, R, \Delta)$ is

$$p(z) = 2^{n_z}\, \mathrm{N}\big(z;\ \mu - \sqrt{2/\pi}\,\Delta\mathbf{1},\ \Omega\big)\, F_{\mathrm{N}}\big(\Delta^T \Omega^{-1}(z - \mu + \sqrt{2/\pi}\,\Delta\mathbf{1});\ 0,\ I - \Delta^T \Omega^{-1} \Delta\big), \tag{1}$$

where $\mathbf{1}$ is a vector of ones, $\mu$ is a location parameter, $\Omega = R + \Delta\Delta^T$, and $F_{\mathrm{N}}$ is the cumulative distribution function of the multivariate normal distribution. $R \in \mathbb{R}^{n_z \times n_z}$ (symmetric positive-definite, spd) and $\Delta \in \mathbb{R}^{n_z \times n_z}$ are shape matrices that determine the scale and skewness, and the sign and structure of $\Delta$ determine the direction of skewness as explained in [16]. Examples of the PDF in negatively skewed, symmetric, and positively skewed cases are given in Fig. 1. The moments of this multivariate skew normal distribution given the shape matrices are

$$\mathrm{E}[z] = \mu, \qquad \mathrm{V}[z] = R + \big(1 - \tfrac{2}{\pi}\big)\Delta\Delta^T. \tag{2}$$

Compared to the formulation of [16], we shift the distribution with $-\sqrt{2/\pi}\,\Delta\mathbf{1}$ so that the mean of the distribution does not depend on $\Delta$. This ensures that the proposed algorithm identifies $\Delta$ as a measure of skewness, not as a measure of location.

Figure 1: The PDFs of negatively-skewed, symmetric, and positively-skewed normal distributions. Each distribution has mean zero and variance one. (Axes: $z$ versus PDF; curves: negative skew, normal, positive skew.)
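As a numerical sanity check of the density (1) and the moments (2), the following sketch evaluates the scalar case $n_z = 1$ and integrates it on a grid. The function names, the grid limits, and the parameter values are illustrative assumptions, not part of the paper.

```python
import math

SQRT_2_OVER_PI = math.sqrt(2.0 / math.pi)


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def normal_cdf(x, var):
    """CDF of the zero-mean normal N(0, var) at x."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0 * var)))


def skew_normal_pdf(z, mu, r, delta):
    """Scalar (n_z = 1) instance of Eq. (1) with Omega = r + delta^2."""
    omega = r + delta * delta
    shift = SQRT_2_OVER_PI * delta  # this shift keeps the mean at mu
    gauss = normal_pdf(z, mu - shift, omega)
    skew = normal_cdf(delta / omega * (z - mu + shift), 1.0 - delta * delta / omega)
    return 2.0 * gauss * skew


def numeric_moments(mu, r, delta, lo=-15.0, hi=15.0, n=30001):
    """Riemann-sum estimates of total mass, mean and variance (assumed grid)."""
    h = (hi - lo) / (n - 1)
    mass = first = second = 0.0
    for i in range(n):
        z = lo + i * h
        w = skew_normal_pdf(z, mu, r, delta) * h
        mass += w
        first += z * w
        second += z * z * w
    mean = first / mass
    return mass, mean, second / mass - mean * mean


# Hypothetical parameter values chosen only for illustration.
mass, mean, var = numeric_moments(mu=0.5, r=0.3, delta=1.2)
print(mass, mean, var)
```

Under these assumptions the numerical mean should stay at $\mu$ whatever the value of $\delta$, and the variance should match $r + (1 - 2/\pi)\delta^2$, which is the scalar form of (2).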
We formulate the AR coefficient estimation problem as the linear state-space model with the measurement noise being skew-normally distributed conditional on the unknown slowly-varying noise parameters $R_k$ and $\Delta_k$:

$$p(x_1) = \mathrm{N}(x_1;\ x_{1|0},\ P_{1|0}) \tag{3a}$$
$$x_k = x_{k-1} + w_{k-1}, \qquad w_{k-1} \overset{\mathrm{iid}}{\sim} \mathrm{N}(0, Q_{k-1}) \tag{3b}$$
$$z_k = C_k x_k + e_k, \qquad e_k \overset{\mathrm{iid}}{\sim} \mathrm{SN}(\mu, R_k, \Delta_k), \tag{3c}$$

where $x_k \in \mathbb{R}^{n_{\mathrm{AR}}}$ is the vector of AR coefficients, $Q_k \in \mathbb{R}^{n_{\mathrm{AR}} \times n_{\mathrm{AR}}}$ (spd) is the process noise covariance matrix that is assumed known and is thus an algorithm parameter, $z_k \in \mathbb{R}^{n_z}$ is the measurement, $C_k = [\, z_{k-1}\ z_{k-2}\ \cdots\ z_{k-n_{\mathrm{AR}}} \,] \in \mathbb{R}^{n_z \times n_{\mathrm{AR}}}$ is the measurement model matrix given by the $n_{\mathrm{AR}}$ previous measurements, and $\{w_k \in \mathbb{R}^{n_{\mathrm{AR}}}\}_{k=1}^{K}$ and $\{e_k \in \mathbb{R}^{n_z}\}_{k=1}^{K}$ are mutually independent process and measurement noise sequences.

3 Proposed algorithm

3.1 Measurement update

Conditional on the parameters $R_k$ and $\Delta_k$, the skew-normal random variable $e_k \mid R_k, \Delta_k \sim \mathrm{SN}(\mu, R_k, \Delta_k)$ has the hierarchical formulation [9]

$$e_k \mid u_k, R_k, \Delta_k \sim \mathrm{N}\big(\mu + \Delta_k (u_k - \sqrt{2/\pi}\,\mathbf{1}),\ R_k\big) \tag{4a}$$
$$u_k \sim \mathrm{N}_+(0, I), \tag{4b}$$

where $\mathrm{N}_+$ is the multivariate normal distribution truncated into the positive orthant. To obtain the necessary conjugacy properties, let us assign the matrix-variate-normal inverse-Wishart (MVNIW) prior distribution to the joint random variable $(R_k, \Delta_k)$:

$$p(R_k, \Delta_k) = \mathrm{N}(\Delta_k;\ \Delta_{k|k-1},\ R_k \otimes V_{k|k-1})\ \mathrm{IW}(R_k;\ \Psi_{k|k-1},\ \nu_{k|k-1}), \tag{5}$$

where $\Delta_{k|k-1} \in \mathbb{R}^{n_z \times n_z}$, $V_{k|k-1} \in \mathbb{R}^{n_z \times n_z}$ (spd), $\Psi_{k|k-1} \in \mathbb{R}^{n_z \times n_z}$ (spd), and $\nu_{k|k-1} \in (2n_z, \infty)$ are parameters of the prior distribution. $\mathrm{N}(X; M, U \otimes V)$ is the PDF of the matrix-variate normal distribution with mean $M$ and variance parameters $U$ (among-row) and $V$ (among-column) [17, Ch. 2], and $\mathrm{IW}(X; \Psi, \nu)$ is the PDF of the inverse-Wishart distribution with scale matrix $\Psi$ and $\nu$ degrees of freedom [17, Ch. 3].

The filtering posterior distribution $p(x_k, u_k, R_k, \Delta_k \mid z_{1:k})$ of the model defined by (3) and (5) is not analytically tractable. Our solution is to use a variational Bayesian approximation, where we find the functions $q_{x,u}(x_k, u_k)$ and $q_{R,\Delta}(R_k, \Delta_k)$ such that the reverse Kullback-Leibler divergence (KLD)

$$D_{\mathrm{KL}}\big( q_{x,u}(x_k, u_k)\, q_{R,\Delta}(R_k, \Delta_k)\ \big\|\ p(x_k, u_k, R_k, \Delta_k \mid z_{1:k}) \big) \tag{6}$$

is minimised, where $D_{\mathrm{KL}}(q \| p) = \int q(x) \log\big(\tfrac{q(x)}{p(x)}\big)\, \mathrm{d}x$. In general there is no exact analytical solution for $(q_{x,u}, q_{R,\Delta})$, but the iteration of

$$\log q_{x,u}(x_k, u_k) \leftarrow \mathrm{E}_{q_{R,\Delta}}\big[\log p(z_k, x_k, u_k, R_k, \Delta_k \mid z_{1:k-1})\big] + c_{x,u} \tag{7a}$$
$$\log q_{R,\Delta}(R_k, \Delta_k) \leftarrow \mathrm{E}_{q_{x,u}}\big[\log p(z_k, x_k, u_k, R_k, \Delta_k \mid z_{1:k-1})\big] + c_{R,\Delta} \tag{7b}$$

never increases the KLD (6) and for many models gives a sequence that converges towards the optimal functions $(q_{x,u}, q_{R,\Delta})$ [18, Chapter 10][19]. The expected values on the right hand sides of (7) are taken with respect to the current $q_{x,u}$ and $q_{R,\Delta}$, and $c_{x,u}$ and $c_{R,\Delta}$ are constants with respect to the variables $(x_k, u_k)$ and $(R_k, \Delta_k)$, respectively. Thanks to the chosen prior distribution structure (5), the update (7b) has a closed form solution that preserves the functional form of the prior, and the moments of the distribution required by other computations are also analytically tractable.
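The hierarchical formulation (4) also gives a direct way to simulate data from the model (3). The sketch below does this for a scalar measurement ($n_z = 1$) with fixed AR coefficients and $\mu = 0$; the helper names, the zero initial conditions, and the numerical values are assumptions made for illustration only.

```python
import math
import random

SQRT_2_OVER_PI = math.sqrt(2.0 / math.pi)


def sample_skew_normal(rng, mu, r, delta):
    """Draw e ~ SN(mu, r, delta) for n_z = 1 via the hierarchy (4)."""
    u = abs(rng.gauss(0.0, 1.0))                # u ~ N_+(0, 1), Eq. (4b)
    mean = mu + delta * (u - SQRT_2_OVER_PI)    # conditional mean in Eq. (4a)
    return rng.gauss(mean, math.sqrt(r))


def simulate_ar(rng, coeffs, r, delta, n_samples):
    """Simulate z_k = C_k x + e_k with constant coefficients x = coeffs."""
    n_ar = len(coeffs)
    history = [0.0] * n_ar                      # assumed zero initial conditions
    regressors = []
    for _ in range(n_samples):
        # C_k = [z_{k-1}, z_{k-2}, ..., z_{k-n_AR}]
        c_k = [history[-i] for i in range(1, n_ar + 1)]
        e_k = sample_skew_normal(rng, 0.0, r, delta)
        history.append(sum(c * x for c, x in zip(c_k, coeffs)) + e_k)
        regressors.append(c_k)
    return history[n_ar:], regressors


rng = random.Random(0)
# Hypothetical stable AR(2) coefficients and noise parameters.
z, C = simulate_ar(rng, coeffs=[0.5, -0.2], r=0.1, delta=1.0, n_samples=1000)
```

Each row of `C` plays the role of the measurement matrix $C_k$ in (3c), so the pair `(z, C)` is the kind of input a recursive identification routine would consume.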
The analytical solution of the update (7a) is a multivariate normal distribution truncated by multiple linear constraints. The mean and covariance matrix of this distribution can be approximated using the sequential truncation algorithm [20, 21, 13]. The distribution $q_{x,u}(x_k, u_k)$ is then approximated by the unconstrained multivariate normal distribution with the obtained moments

$$q_{x,u}(x_k, u_k) \approx \mathrm{N}\big( [\begin{smallmatrix} x_k \\ u_k \end{smallmatrix}];\ \xi_{k|k},\ \Xi_{k|k} \big), \tag{8}$$

where $\xi_{k|k}$ and $\Xi_{k|k}$ are the mean and covariance matrix given by the sequential truncation algorithm. The normal marginal posterior approximation for $x_k$ guarantees that we get a recursive algorithm. The approximative filtering posterior of $(R_k, \Delta_k)$ is the MVNIW distribution

$$q_{R,\Delta}(R_k, \Delta_k) = \mathrm{N}(\Delta_k;\ \Delta_{k|k},\ R_k \otimes V_{k|k})\ \mathrm{IW}(R_k;\ \Psi_{k|k},\ \nu_{k|k}), \tag{9}$$

whose required moments are analytically tractable when $\nu_{k|k} > 2n_z$ as shown in Appendix A.

3.2 Time update

The marginal distribution of the AR coefficient vector $x_k$ in the posterior approximation $\mathrm{N}\big([\begin{smallmatrix} x_k \\ u_k \end{smallmatrix}];\ \xi_{k|k},\ \Xi_{k|k}\big)\, q_{R,\Delta}(R_k, \Delta_k)$ is a normal distribution and the state transition (3b) is linear and Gaussian. Thus, the filter prediction becomes the standard Kalman filter prediction and the prediction distribution is normal.

The dynamical model of the model parameters $p(R_k, \Delta_k \mid R_{k-1}, \Delta_{k-1})$ is typically unknown and/or intractable. Therefore, we adopt the forgetting factor update, which provides the maximum-entropy solution given the KLD from the previous posterior [22, 23]. Thus, the used prediction density given the MVNIW approximation of the previous posterior and the forgetting factor $\gamma \in (0, 1]$ is

$$\hat{p}(R_k, \Delta_k \mid z_{1:k-1}) = \mathrm{N}\big(\Delta_k;\ \Delta_{k-1|k-1},\ R_k \otimes \tfrac{1}{\gamma} V_{k-1|k-1}\big)\ \mathrm{IW}\big(R_k;\ \gamma \Psi_{k-1|k-1},\ \gamma \nu_{k-1|k-1} + (1-\gamma)\, 2 n_z\big), \tag{10}$$

where the term $(1-\gamma)\, 2 n_z$ guarantees that the resulting inverse-Wishart distribution is well-defined and has an expectation value. The details of the proposed recursive identification algorithm including the prediction equations implied by the time update (10) are given in Appendix B.
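To illustrate the kind of moment computation that underlies the truncation step, the sketch below gives the exact mean and variance of a scalar normal variable restricted to $u \ge 0$. This is only the one-dimensional building block under our own naming; the sequential truncation algorithm of [20, 21, 13] handles several constraints and correlated components, and is not reproduced here.

```python
import math


def truncated_normal_moments(m, s2):
    """Mean and variance of N(m, s2) restricted to u >= 0 (scalar case)."""
    s = math.sqrt(s2)
    alpha = -m / s                                   # standardised lower bound
    pdf = math.exp(-0.5 * alpha * alpha) / math.sqrt(2.0 * math.pi)
    tail = 0.5 * math.erfc(alpha / math.sqrt(2.0))   # P(u >= 0)
    lam = pdf / tail                                 # inverse Mills ratio
    mean = m + s * lam
    var = s2 * (1.0 - lam * (lam - alpha))
    return mean, var


mean, var = truncated_normal_moments(0.0, 1.0)
print(mean, var)
```

For $m = 0$ and unit variance this reproduces the moments of the prior $u \sim \mathrm{N}_+(0, 1)$, namely mean $\sqrt{2/\pi}$ and variance $1 - 2/\pi$, which are the constants that appear in (1) and (2).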

4 Simulated example

We simulated 1000 Monte Carlo replications of the AR model with 5 AR coefficients with $n_z = 2$ dimensional skew-normally distributed innovations with parameters $R = 0.1\,I$ and $\Delta$ =[ 1 0 ]. Thus, the true distribution has high positive skewness. The true coefficients were simulated by generating the zeros of the characteristic polynomial from the uniform distribution $\mathrm{unif}(-1, 1)$. The number of AR coefficients was assumed known. The initial prior covariance matrix for the AR coefficient vector was given by the 1st order stable spline kernel $[P_{1|0}]_{i,j} = 2^{-\max(i-1,\, j-1)}$, and the process noise covariance was chosen as $[Q_{k-1}]_{i,j} = \big(\tfrac{1}{\gamma} - 1\big) \max(\mathrm{diag}(P_{k-1|k-1}))\, 0.5^{\max(i-1,\, j-1)}$ to preserve the stable kernel form of the prior [24, 25].

The proposed method is compared with the Gaussian variational Bayes filter for slowly-drifting noise proposed by Agamennoni et al. in [26]. The skew-normal based identification method was given the positive direction of the skewness by using the initial prior

π 1 p(r 1, 1 ) = N( 1 ; I, R 1 I) IW(R 1 ; ν I, ν 1 0 ), (11)

where $\nu_{1|0}$ = That is, the variance is divided equally between the symmetric and skewed component in the sense that E[R1 1 ] 1 = E[ 1 ] π = 1 I. The normal distribution based method was given the initial prior

$$p(R_1) = \mathrm{IW}\big(R_1;\ (\nu_{1|0} - 3)\, I,\ \nu_{1|0}\big). \tag{12}$$

The forgetting factor value used with both the methods was $\gamma = 0.975$, and the number of VB iterations was 10. Fig. 2 shows the relative difference of the identification error

ɛ = nar i=1 (x (x ) true ) (13)

as a function of the number of measurements fed to the algorithm. The figure shows that the skew-normal based identification method gives a lower median error than the normal distribution based method, and the relative difference increases as the number of measurements increases. Fig. 2 shows that after measurements, the skew-normal based method is more accurate in about 95 % of the cases and gives at least 5 % lower identification error in most of the simulations.
Figure 2: The proposed algorithm is more accurate than the normal distribution based algorithm in 95 % of the simulations, and in most of the simulations the error (13) is reduced by more than 5 %. (Axes: number of measurements versus $100\,(\mathrm{error}_{\mathrm{skew}} - \mathrm{error}_{\mathrm{normal}})/\mathrm{error}_{\mathrm{normal}}$; curves: median, 25 % / 75 % quantiles, 5 % / 95 % quantiles.)

5 Conclusions

We proposed a novel recursive estimation algorithm for identifying the model coefficients and innovation distribution parameters of autoregressive models with skew-normally distributed innovations. Both model coefficients and innovation distribution parameters can be slowly time-varying. Our computer simulation showed that modelling skewness can improve the accuracy of identification.

Acknowledgments

H. Nurminen receives funding from Tampere University of Technology Graduate School, Nokia Technologies Oy, the Foundation of Nokia Corporation, and Tekniikan edistämissäätiö. T. Ardeshiri receives funding from the Swedish Research Council's (VR) project Scalable Kalman Filters, and from Jaguar Land Rover (JLR), Whitley, UK.

References

[1] C. R. Harvey and A. Siddique, "Autoregressive conditional skewness," The Journal of Financial and Quantitative Analysis, vol. 34, December.
[2] E. Jondeau and M. Rockinger, "Conditional volatility, skewness, and kurtosis: existence, persistence, and comovements," Journal of Economic Dynamics and Control, vol. 7, 2003.
[3] P. Christoffersen, S. Heston, and K. Jacobs, "Option valuation with conditional skewness," Journal of Econometrics, vol. 131, 2006.
[4] G. Tsiotas, "On generalised asymmetric stochastic volatility models," Computational Statistics and Data Analysis, vol. 56, 2012.
[5] K. Kaemarungsi and P. Krishnamurthy, "Analysis of WLAN's received signal strength indication for indoor location fingerprinting," Pervasive and Mobile Computing, vol. 8, 2012. Special Issue: Wide-Scale Vehicular Sensor Networks and Mobile Sensing.
[6] H. Nurminen, T. Ardeshiri, R. Piché, and F. Gustafsson, "A NLOS-robust TOA positioning filter based on a skew-t measurement noise model," in International Conference on Indoor Positioning and Indoor Navigation (IPIN), pp. 1-7, October 2015.
[7] A. Azzalini, "A class of distributions which includes the normal ones," Scandinavian Journal of Statistics, vol. 1.
[8] A. Azzalini and A. Dalla Valle, "The multivariate skew-normal distribution," Biometrika, vol. 83, no. 4.
[9] T. I. Lin, "Maximum likelihood estimation for multivariate skew normal mixture models," Journal of Multivariate Analysis, vol. 100, 2009.
[10] S. Lee and G. J. McLachlan, "Finite mixtures of multivariate skew t-distributions: some recent and new results," Statistics and Computing, vol. 4, 2014.
[11] H. J. Ho, S. Pyne, and T. I. Lin, "Maximum likelihood inference for mixtures of skew Student-t-normal distributions through practical EM-type algorithms," Statistics and Computing, 2012.
[12] H. Nurminen, T. Ardeshiri, R. Piché, and F. Gustafsson, "Robust inference for state-space models with skewed measurement noise," IEEE Signal Processing Letters, November 2015.
[13] H. Nurminen, T. Ardeshiri, R. Piché, and F. Gustafsson, "Skew-t filter and smoother with improved covariance matrix approximation." Available online, August 2016.
[14] L. Ljung, "Recursive identification algorithms," Circuits, Systems, and Signal Processing, vol. 1, no. 1, 2002.
[15] R. B. Arellano-Valle and M. G. Genton, "On fundamental skew distributions," Journal of Multivariate Analysis, no. 96, 2005.
[16] S. X. Lee and G. J. McLachlan, "Finite mixtures of canonical fundamental skew t-distributions: the unification of the restricted and unrestricted skew t-mixture models," Statistics and Computing, no. 6, 2016.
[17] A. K. Gupta and D. K. Nagar, Matrix variate distributions. Boca Raton, FL: Chapman & Hall/CRC, 2000.
[18] C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2007.
[19] D. G. Tzikas, A. C. Likas, and N. P. Galatsanos, "The variational approximation for Bayesian inference," IEEE Signal Processing Magazine, vol. 5, Nov.
[20] T. Perälä and S. Ali-Löytty, "Kalman-type positioning filters with floor plan information," in 6th International Conference on Advances in Mobile Computing and Multimedia (MoMM), 2008.
[21] D. J. Simon and D. L. Simon, "Constrained Kalman filtering via density function truncation for turbofan engine health estimation," International Journal of Systems Science, vol. 41, 2010.
[22] M. Kárný and K. Dedecius, "Approximate Bayesian recursive estimation: On approximation errors," tech. rep., ÚTIA AV ČR, January 2012.
[23] E. Özkan, V. Šmídl, S. Saha, C. Lundquist, and F. Gustafsson, "Marginalized adaptive particle filtering for nonlinear models with unknown time-varying noise parameters," Automatica, vol. 49, 2013.
[24] G. Pillonetto and G. De Nicolao, "A new kernel-based approach for linear system identification," Automatica, vol. 46, 2010.
[25] T. Chen, H. Ohlsson, and L. Ljung, "On the estimation of transfer functions, regularizations and Gaussian processes revisited," Automatica, vol. 48, 2012.
[26] G. Agamennoni, J. Nieto, and E. Nebot, "Approximate inference in state-space models with heavy-tailed noise," IEEE Transactions on Signal Processing, vol. 60, October 2012.

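Appendices

A Variational solution of the measurement update

Our variational solution uses this hierarchical formulation of the measurement noise model:

$$z_k \mid x_k, u_k, R_k, \Delta_k \sim \mathrm{N}\big(C_k x_k + \Delta_k (u_k - \sqrt{2/\pi}\,\mathbf{1}),\ R_k\big), \tag{14a}$$
$$u_k \sim \mathrm{N}_+(0, I), \tag{14b}$$
$$\Delta_k \mid R_k \sim \mathrm{N}(\Delta_{k|k-1},\ R_k \otimes V_{k|k-1}), \tag{14c}$$
$$R_k \sim \mathrm{IW}(\Psi_{k|k-1},\ \nu_{k|k-1}), \tag{14d}$$

where $z_k \in \mathbb{R}^{n_z}$ is the measurement, $u_k \in \mathbb{R}^{n_z}$ is the skewness variable vector, and $\Delta_{k|k-1} \in \mathbb{R}^{n_z \times n_z}$, $V_{k|k-1} \in \mathbb{R}^{n_z \times n_z}$ (spd), $\Psi_{k|k-1} \in \mathbb{R}^{n_z \times n_z}$ (spd), and $\nu_{k|k-1} > 2n_z$ are the parameters of the joint prior distribution of $\Delta_k$ and $R_k$. The prior of $\Delta_k$ and $R_k$ is implied by the previous filtering posterior and the time update step (filter prediction) that is explained in Section 3.2. The derivations for the variational solution (7) are given in Sections A.1 and A.2. For brevity all constant values are denoted by $c$ in the derivation. The logarithm of the full filtering distribution which is needed for the derivations is

$$\begin{aligned}
\log p(z_k, x_k, u_k, R_k, \Delta_k \mid z_{1:k-1}) ={}& -\tfrac{1}{2} \big(z_k - C_k x_k - \Delta_k(u_k - \sqrt{2/\pi}\,\mathbf{1})\big)^T R_k^{-1} \big(z_k - C_k x_k - \Delta_k(u_k - \sqrt{2/\pi}\,\mathbf{1})\big) \\
& - \tfrac{1}{2} (x_k - x_{k|k-1})^T P_{k|k-1}^{-1} (x_k - x_{k|k-1}) - \tfrac{1}{2} u_k^T u_k \\
& - \tfrac{1}{2} \mathrm{Tr}\{ (\Delta_k - \Delta_{k|k-1}) V_{k|k-1}^{-1} (\Delta_k - \Delta_{k|k-1})^T R_k^{-1} \} \\
& - \tfrac{\nu_{k|k-1} + n_z + 1}{2} \log\det(R_k) - \tfrac{1}{2} \mathrm{Tr}\{ \Psi_{k|k-1} R_k^{-1} \} + c, \qquad u_k \ge 0,
\end{aligned} \tag{15}$$

where $x_{k|k-1}$ and $P_{k|k-1}$ are the mean and covariance matrix of the current predictive distribution, and $\mathrm{Tr}\{\cdot\}$ is the matrix trace.

A.1 Derivations for $q_{x,u}$

Using equation (7a) we obtain

$$\begin{aligned}
\log q_{x,u}(x_k, u_k) ={}& -\tfrac{1}{2} \mathrm{E}_{q_{R,\Delta}}\Big[ \big(z_k - C_k x_k - \Delta_k(u_k - \sqrt{2/\pi}\,\mathbf{1})\big)^T R_k^{-1} \big(z_k - C_k x_k - \Delta_k(u_k - \sqrt{2/\pi}\,\mathbf{1})\big) \Big] \\
& - \tfrac{1}{2} (x_k - x_{k|k-1})^T P_{k|k-1}^{-1} (x_k - x_{k|k-1}) - \tfrac{1}{2} u_k^T u_k + c
\end{aligned} \tag{16}$$

$$\begin{aligned}
={}& -\tfrac{1}{2} \big(z_k - C_k x_k - \Delta_{k|k}(u_k - \sqrt{2/\pi}\,\mathbf{1})\big)^T \bar{R}_k^{-1} \big(z_k - C_k x_k - \Delta_{k|k}(u_k - \sqrt{2/\pi}\,\mathbf{1})\big) \\
& - \tfrac{1}{2} (u_k - \sqrt{2/\pi}\,\mathbf{1})^T \big( \mathrm{E}_{q_{R,\Delta}}[\Delta_k^T R_k^{-1} \Delta_k] - \Delta_{k|k}^T \bar{R}_k^{-1} \Delta_{k|k} \big) (u_k - \sqrt{2/\pi}\,\mathbf{1}) \\
& - \tfrac{1}{2} (x_k - x_{k|k-1})^T P_{k|k-1}^{-1} (x_k - x_{k|k-1}) - \tfrac{1}{2} u_k^T u_k + c, \qquad u_k \ge 0,
\end{aligned} \tag{17}$$

where $(\bar{R}_k, \Delta_{k|k}) \triangleq (\mathrm{E}_{q_{R,\Delta}}[R_k^{-1}]^{-1},\ \mathrm{E}_{q_{R,\Delta}}[\Delta_k])$ as well as the identity $\mathrm{E}_{q_{R,\Delta}}[R_k^{-1} \Delta_k] = \bar{R}_k^{-1} \Delta_{k|k}$ are derived in Section A.2. The inequality $u_k \ge 0$ denotes that each element of the vector $u_k$ is required to be greater than or equal to zero. Further, in Section A.2 it is proved that $\mathrm{E}_{q_{R,\Delta}}[\Delta_k^T R_k^{-1} \Delta_k] = n_z V_{k|k} + \Delta_{k|k}^T \bar{R}_k^{-1} \Delta_{k|k}$, so Eq. (17) becomes

$$\begin{aligned}
\log q_{x,u}(x_k, u_k) ={}& -\tfrac{1}{2} \big(z_k - C_k x_k - \Delta_{k|k}(u_k - \sqrt{2/\pi}\,\mathbf{1})\big)^T \bar{R}_k^{-1} \big(z_k - C_k x_k - \Delta_{k|k}(u_k - \sqrt{2/\pi}\,\mathbf{1})\big) \\
& - \tfrac{n_z}{2} (u_k - \sqrt{2/\pi}\,\mathbf{1})^T V_{k|k} (u_k - \sqrt{2/\pi}\,\mathbf{1}) \\
& - \tfrac{1}{2} (x_k - x_{k|k-1})^T P_{k|k-1}^{-1} (x_k - x_{k|k-1}) - \tfrac{1}{2} u_k^T u_k + c
\end{aligned} \tag{18}$$

$$\begin{aligned}
={}& -\tfrac{1}{2} \big(z_k + \sqrt{2/\pi}\,\Delta_{k|k}\mathbf{1} - [\, C_k\ \ \Delta_{k|k} \,] [\begin{smallmatrix} x_k \\ u_k \end{smallmatrix}]\big)^T \bar{R}_k^{-1} \big(z_k + \sqrt{2/\pi}\,\Delta_{k|k}\mathbf{1} - [\, C_k\ \ \Delta_{k|k} \,] [\begin{smallmatrix} x_k \\ u_k \end{smallmatrix}]\big) \\
& - \tfrac{1}{2} \big([\begin{smallmatrix} x_k \\ u_k \end{smallmatrix}] - \xi_{k|k-1}\big)^T \Xi_{k|k-1}^{-1} \big([\begin{smallmatrix} x_k \\ u_k \end{smallmatrix}] - \xi_{k|k-1}\big) + c, \qquad u_k \ge 0,
\end{aligned} \tag{19}$$

where

$$\Xi_{k|k-1} = \begin{bmatrix} P_{k|k-1} & O \\ O & (I + n_z V_{k|k})^{-1} \end{bmatrix}, \tag{20}$$

$$\xi_{k|k-1} = \begin{bmatrix} x_{k|k-1} \\ n_z \sqrt{2/\pi}\, (I + n_z V_{k|k})^{-1} V_{k|k} \mathbf{1} \end{bmatrix}. \tag{21}$$

Hence,

$$q_{x,u}(x_k, u_k) \propto \mathrm{N}\big(z_k + \sqrt{2/\pi}\,\Delta_{k|k}\mathbf{1};\ [\, C_k\ \ \Delta_{k|k} \,] [\begin{smallmatrix} x_k \\ u_k \end{smallmatrix}],\ \bar{R}_k\big)\ \mathrm{N}\big([\begin{smallmatrix} x_k \\ u_k \end{smallmatrix}];\ \xi_{k|k-1},\ \Xi_{k|k-1}\big)\ [\![ u_k \ge 0 ]\!] \tag{22}$$
$$\propto \mathrm{N}\big([\begin{smallmatrix} x_k \\ u_k \end{smallmatrix}];\ \tilde{\xi}_{k|k},\ \tilde{\Xi}_{k|k}\big)\ [\![ u_k \ge 0 ]\!], \tag{23}$$

where $[\![ \cdot ]\!]$ is the Iverson bracket, and $\tilde{\xi}_{k|k}$ and $\tilde{\Xi}_{k|k}$ are the outputs of the Kalman filter update

$$\bar{C}_k = [\, C_k\ \ \Delta_{k|k} \,], \tag{24}$$
$$K_k = \Xi_{k|k-1} \bar{C}_k^T (\bar{C}_k \Xi_{k|k-1} \bar{C}_k^T + \bar{R}_k)^{-1}, \tag{25}$$
$$\tilde{\xi}_{k|k} = \xi_{k|k-1} + K_k \big(z_k + \sqrt{2/\pi}\,\Delta_{k|k}\mathbf{1} - \bar{C}_k \xi_{k|k-1}\big), \tag{26}$$
$$\tilde{\Xi}_{k|k} = (I - K_k \bar{C}_k)\, \Xi_{k|k-1}. \tag{27}$$

To make the algorithm recursive, we approximate $q_{x,u}$ with a multivariate normal distribution

$$q_{x,u}(x_k, u_k) = \mathrm{N}\big([\begin{smallmatrix} x_k \\ u_k \end{smallmatrix}];\ \tilde{\xi}_{k|k},\ \tilde{\Xi}_{k|k}\big)\ [\![ u_k \ge 0 ]\!] \tag{28}$$
$$\approx \mathrm{N}\big([\begin{smallmatrix} x_k \\ u_k \end{smallmatrix}];\ \xi_{k|k},\ \Xi_{k|k}\big), \tag{29}$$

whose approximate mean and covariance matrix $\xi_{k|k}$ and $\Xi_{k|k}$ are obtained through approximate moment-matching. Our approach for approximating the moments is the sequential truncation algorithm [20, 21][13, Table I]. Let us denote the approximate distribution with $\tilde{q}_{x,u}(x_k, u_k) \triangleq \mathrm{N}\big([\begin{smallmatrix} x_k \\ u_k \end{smallmatrix}];\ \xi_{k|k},\ \Xi_{k|k}\big)$. In Section A.2, certain moments of $q_{x,u}$ are required. They are approximated as

$$x_{k|k} \triangleq \mathrm{E}_{\tilde{q}_{x,u}}[x_k] = [\xi_{k|k}]_{1:n_x}, \tag{30}$$
$$P_{k|k} \triangleq \mathrm{V}_{\tilde{q}_{x,u}}[x_k] = [\Xi_{k|k}]_{1:n_x,\,1:n_x}, \tag{31}$$
$$u_{k|k} \triangleq \mathrm{E}_{\tilde{q}_{x,u}}[u_k] = [\xi_{k|k}]_{n_x+(1:n_z)}, \tag{32}$$
$$U_{k|k} \triangleq \mathrm{V}_{\tilde{q}_{x,u}}[u_k] = [\Xi_{k|k}]_{n_x+(1:n_z),\,n_x+(1:n_z)}, \tag{33}$$
$$\Upsilon_{k|k} \triangleq \mathrm{E}_{\tilde{q}_{x,u}}[x_k u_k^T] - x_{k|k} u_{k|k}^T = [\Xi_{k|k}]_{1:n_x,\,n_x+(1:n_z)}, \tag{34}$$

where $n_x + (1:n_z)$ denotes $(n_x+1):(n_x+n_z)$.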
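The Kalman-type update (24)-(27) on the augmented vector $[x_k; u_k]$ can be sketched as follows for a scalar measurement ($n_z = 1$), where the innovation covariance is a scalar and no matrix inversion is needed. The function names and the numerical values are hypothetical, and the subsequent truncation to $u_k \ge 0$ is not included.

```python
import math

SQRT_2_OVER_PI = math.sqrt(2.0 / math.pi)


def augmented_prior(x_pred, P_pred, V):
    """Eqs. (20)-(21) for n_z = 1: prior mean and covariance of [x; u]."""
    n = len(x_pred)
    u_var = 1.0 / (1.0 + V)
    xi = list(x_pred) + [SQRT_2_OVER_PI * V * u_var]
    Xi = [list(P_pred[i]) + [0.0] for i in range(n)] + [[0.0] * n + [u_var]]
    return xi, Xi


def kalman_update(xi_pred, Xi_pred, c_bar, r_bar, y):
    """Eqs. (25)-(27) with a scalar pseudo-measurement y."""
    n = len(xi_pred)
    # Xc = Xi_pred * c_bar^T (column vector)
    Xc = [sum(Xi_pred[i][j] * c_bar[j] for j in range(n)) for i in range(n)]
    s = sum(c_bar[i] * Xc[i] for i in range(n)) + r_bar   # innovation variance
    gain = [v / s for v in Xc]
    innovation = y - sum(c_bar[i] * xi_pred[i] for i in range(n))
    xi = [xi_pred[i] + gain[i] * innovation for i in range(n)]
    # (I - K c_bar) Xi_pred, using the symmetry of Xi_pred
    Xi = [[Xi_pred[i][j] - gain[i] * Xc[j] for j in range(n)] for i in range(n)]
    return xi, Xi


# Hypothetical numbers for one update with n_AR = 2.
x_pred, P_pred = [0.4, -0.1], [[1.0, 0.0], [0.0, 0.5]]
delta, V, r_bar = 1.0, 0.5, 0.2
c_k, z_k = [0.8, 0.3], 1.1
xi_pred, Xi_pred = augmented_prior(x_pred, P_pred, V)
xi_tilde, Xi_tilde = kalman_update(
    xi_pred, Xi_pred, c_k + [delta], r_bar, z_k + SQRT_2_OVER_PI * delta)
```

The pseudo-measurement passed to the update is $z_k + \sqrt{2/\pi}\,\Delta_{k|k}$, which mirrors the shifted measurement in (26); the outputs correspond to the untruncated moments in (23).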

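A.2 Derivations for $q_{R,\Delta}$

In this section we write $\tilde{u}_{k|k} \triangleq u_{k|k} - \sqrt{2/\pi}\,\mathbf{1}$, as in Appendix B. Using equation (7b) and the approximation (29) we obtain

$$\log q_{R,\Delta}(R_k, \Delta_k) = \mathrm{E}_{\tilde{q}_{x,u}}\big[ \log \mathrm{N}\big(z_k;\ C_k x_k + \Delta_k(u_k - \sqrt{2/\pi}\,\mathbf{1}),\ R_k\big) \big] + \log \mathrm{N}(\Delta_k;\ \Delta_{k|k-1},\ R_k \otimes V_{k|k-1}) + \log \mathrm{IW}(R_k;\ \Psi_{k|k-1},\ \nu_{k|k-1}) + c \tag{35}$$

$$\begin{aligned}
={}& -\tfrac{1}{2} \log\det(R_k) - \tfrac{1}{2} \mathrm{Tr}\Big\{ \mathrm{E}_{\tilde{q}_{x,u}}\Big[ \big(z_k - C_k x_k - \Delta_k(u_k - \sqrt{2/\pi}\,\mathbf{1})\big) \big(z_k - C_k x_k - \Delta_k(u_k - \sqrt{2/\pi}\,\mathbf{1})\big)^T \Big] R_k^{-1} \Big\} \\
& - \tfrac{n_z}{2} \log\det(R_k) - \tfrac{1}{2} \mathrm{Tr}\big\{ (\Delta_k - \Delta_{k|k-1}) V_{k|k-1}^{-1} (\Delta_k - \Delta_{k|k-1})^T R_k^{-1} \big\} \\
& - \tfrac{\nu_{k|k-1}}{2} \log\det(R_k) - \tfrac{1}{2} \mathrm{Tr}\big\{ \Psi_{k|k-1} R_k^{-1} \big\} + c
\end{aligned} \tag{36}$$

$$\begin{aligned}
={}& -\tfrac{\nu_{k|k-1} + n_z + 1}{2} \log\det(R_k) \\
& - \tfrac{1}{2} \mathrm{Tr}\Big\{ \Big( \Delta_k (U_{k|k} + \tilde{u}_{k|k} \tilde{u}_{k|k}^T) \Delta_k^T - \Delta_k \big( (z_k - C_k x_{k|k}) \tilde{u}_{k|k}^T - C_k \Upsilon_{k|k} \big)^T - \big( (z_k - C_k x_{k|k}) \tilde{u}_{k|k}^T - C_k \Upsilon_{k|k} \big) \Delta_k^T \\
& \qquad\quad + (z_k - C_k x_{k|k})(z_k - C_k x_{k|k})^T + C_k P_{k|k} C_k^T \Big) R_k^{-1} \Big\} \\
& - \tfrac{1}{2} \mathrm{Tr}\Big\{ \Big( (\Delta_k - \Delta_{k|k-1}) V_{k|k-1}^{-1} (\Delta_k - \Delta_{k|k-1})^T + \Psi_{k|k-1} \Big) R_k^{-1} \Big\} + c
\end{aligned} \tag{37}$$

$$\begin{aligned}
={}& -\tfrac{\nu_{k|k-1} + n_z + 1}{2} \log\det(R_k) \\
& - \tfrac{1}{2} \mathrm{Tr}\Big\{ \Big( \Delta_k (U_{k|k} + \tilde{u}_{k|k} \tilde{u}_{k|k}^T + V_{k|k-1}^{-1}) \Delta_k^T \\
& \qquad\quad - \Delta_k \big( (z_k - C_k x_{k|k}) \tilde{u}_{k|k}^T - C_k \Upsilon_{k|k} + \Delta_{k|k-1} V_{k|k-1}^{-1} \big)^T - \big( (z_k - C_k x_{k|k}) \tilde{u}_{k|k}^T - C_k \Upsilon_{k|k} + \Delta_{k|k-1} V_{k|k-1}^{-1} \big) \Delta_k^T \\
& \qquad\quad + (z_k - C_k x_{k|k})(z_k - C_k x_{k|k})^T + C_k P_{k|k} C_k^T + \Delta_{k|k-1} V_{k|k-1}^{-1} \Delta_{k|k-1}^T + \Psi_{k|k-1} \Big) R_k^{-1} \Big\} + c
\end{aligned} \tag{38}$$

$$= -\tfrac{n_z}{2} \log\det(R_k) - \tfrac{1}{2} \mathrm{Tr}\{ (\Delta_k - \Delta_{k|k}) V_{k|k}^{-1} (\Delta_k - \Delta_{k|k})^T R_k^{-1} \} - \tfrac{\nu_{k|k}}{2} \log\det(R_k) - \tfrac{1}{2} \mathrm{Tr}\{ \Psi_{k|k} R_k^{-1} \} + c, \tag{39}$$

where

$$V_{k|k} = \big( U_{k|k} + \tilde{u}_{k|k} \tilde{u}_{k|k}^T + V_{k|k-1}^{-1} \big)^{-1}, \tag{40}$$
$$\Delta_{k|k} = \big( (z_k - C_k x_{k|k}) \tilde{u}_{k|k}^T - C_k \Upsilon_{k|k} + \Delta_{k|k-1} V_{k|k-1}^{-1} \big) V_{k|k}, \tag{41}$$
$$\nu_{k|k} = \nu_{k|k-1} + 1, \tag{42}$$
$$\Psi_{k|k} = \Delta_{k|k-1} V_{k|k-1}^{-1} \Delta_{k|k-1}^T - \Delta_{k|k} V_{k|k}^{-1} \Delta_{k|k}^T + (z_k - C_k x_{k|k})(z_k - C_k x_{k|k})^T + C_k P_{k|k} C_k^T + \Psi_{k|k-1}. \tag{43}$$

Therefore,

$$q_{R,\Delta}(R_k, \Delta_k) = \mathrm{N}(\Delta_k;\ \Delta_{k|k},\ R_k \otimes V_{k|k})\ \mathrm{IW}(R_k;\ \Psi_{k|k},\ \nu_{k|k}). \tag{44}$$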
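For a scalar measurement ($n_z = 1$) the parameter updates (40)-(43) reduce to a few arithmetic operations, sketched below. The function and argument names are our own assumptions: `u_tilde` stands for $u_{k|k} - \sqrt{2/\pi}$ and `ups` for the cross-covariance column $\Upsilon_{k|k}$.

```python
def update_noise_params(z, c, x, P, ups, u_tilde, U,
                        delta_prev, V_prev, psi_prev, nu_prev):
    """Scalar-measurement (n_z = 1) version of Eqs. (40)-(43)."""
    n = len(c)
    resid = z - sum(c[i] * x[i] for i in range(n))          # z_k - C_k x_{k|k}
    c_ups = sum(c[i] * ups[i] for i in range(n))            # C_k Upsilon_{k|k}
    c_P_c = sum(c[i] * P[i][j] * c[j] for i in range(n) for j in range(n))
    V = 1.0 / (U + u_tilde * u_tilde + 1.0 / V_prev)        # Eq. (40)
    delta = (resid * u_tilde - c_ups + delta_prev / V_prev) * V   # Eq. (41)
    nu = nu_prev + 1.0                                      # Eq. (42)
    psi = (delta_prev ** 2 / V_prev - delta ** 2 / V
           + resid ** 2 + c_P_c + psi_prev)                 # Eq. (43)
    return delta, V, psi, nu


# Hypothetical inputs for a single update with n_AR = 1.
delta, V, psi, nu = update_noise_params(
    z=1.0, c=[1.0], x=[0.5], P=[[0.1]], ups=[0.0], u_tilde=0.2, U=0.3,
    delta_prev=1.0, V_prev=1.0, psi_prev=1.0, nu_prev=5.0)
```

In the variational iteration these four values would feed back into the update of $q_{x,u}$ through $\bar{R}_k = \Psi_{k|k} / (\nu_{k|k} - n_z - 1)$, $\Delta_{k|k}$ and $V_{k|k}$.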

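The following moments are required for the derivations of Section A.1:

$$\mathrm{E}_{q_{R,\Delta}}[\Delta_k] = \Delta_{k|k}, \tag{45}$$
$$\bar{R}_k \triangleq \mathrm{E}_{q_{R,\Delta}}[R_k^{-1}]^{-1} = \tfrac{1}{\nu_{k|k} - n_z - 1} \Psi_{k|k}. \tag{46}$$

Eq. (46) follows from the fact that $R_k \sim \mathrm{IW}(\Psi_{k|k}, \nu_{k|k})$ implies that $R_k^{-1}$ is Wishart-distributed with shape matrix $\Psi_{k|k}^{-1}$ and $\nu_{k|k} - n_z - 1$ degrees of freedom [17, Ch. 3.4]. Furthermore,

$$\mathrm{E}_{q_{R,\Delta}}[R_k^{-1} \Delta_k] = \iint R_k^{-1} \Delta_k\, \mathrm{N}(\Delta_k;\ \Delta_{k|k},\ R_k \otimes V_{k|k})\ \mathrm{IW}(R_k;\ \Psi_{k|k},\ \nu_{k|k})\ \mathrm{d}\Delta_k\, \mathrm{d}R_k \tag{47}$$
$$= \int R_k^{-1} \Delta_{k|k}\ \mathrm{IW}(R_k;\ \Psi_{k|k},\ \nu_{k|k})\ \mathrm{d}R_k \tag{48}$$
$$= (\nu_{k|k} - n_z - 1)\, \Psi_{k|k}^{-1} \Delta_{k|k} \tag{49}$$
$$= \bar{R}_k^{-1} \Delta_{k|k} \tag{50}$$

and

$$\mathrm{E}_{q_{R,\Delta}}[\Delta_k^T R_k^{-1} \Delta_k] = \iint \Delta_k^T R_k^{-1} \Delta_k\, \mathrm{N}(\Delta_k;\ \Delta_{k|k},\ R_k \otimes V_{k|k})\ \mathrm{IW}(R_k;\ \Psi_{k|k},\ \nu_{k|k})\ \mathrm{d}\Delta_k\, \mathrm{d}R_k \tag{51}$$
$$= \int \big( \mathrm{Tr}\{R_k R_k^{-1}\}\, V_{k|k} + \Delta_{k|k}^T R_k^{-1} \Delta_{k|k} \big)\ \mathrm{IW}(R_k;\ \Psi_{k|k},\ \nu_{k|k})\ \mathrm{d}R_k \tag{52}$$
$$= n_z V_{k|k} + (\nu_{k|k} - n_z - 1)\, \Delta_{k|k}^T \Psi_{k|k}^{-1} \Delta_{k|k} \tag{53}$$
$$= n_z V_{k|k} + \Delta_{k|k}^T \bar{R}_k^{-1} \Delta_{k|k}, \tag{54}$$

where (52) follows from the matrix-variate normal identity $\mathrm{E}[X^T A X] = \mathrm{Tr}\{U A^T\}\, V + M^T A M$ for $X \sim \mathrm{N}(M, U \otimes V)$ [17, Ch. 2.3].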
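The identities (46), (50) and (54) can be checked by simulation in the scalar case $n_z = 1$. The sketch below assumes the parametrization implied by (46), under which $R^{-1}$ is gamma-distributed with shape $(\nu - 2)/2$ and scale $2/\Psi$; the function name, the parameter values and the sample size are illustrative assumptions.

```python
import math
import random


def monte_carlo_moments(rng, delta_mean, V, psi, nu, n_samples):
    """Estimate E[1/R], E[Delta/R] and E[Delta^2/R] for n_z = 1."""
    shape, scale = 0.5 * (nu - 2.0), 2.0 / psi   # 1/R ~ Gamma(shape, scale)
    s1 = s2 = s3 = 0.0
    for _ in range(n_samples):
        r_inv = rng.gammavariate(shape, scale)
        # Delta | R ~ N(delta_mean, R * V), with R = 1 / r_inv
        d = rng.gauss(delta_mean, math.sqrt(V / r_inv))
        s1 += r_inv
        s2 += r_inv * d
        s3 += r_inv * d * d
    return s1 / n_samples, s2 / n_samples, s3 / n_samples


# Hypothetical posterior parameters.
delta_mean, V, psi, nu = 1.5, 0.5, 3.0, 8.0
r_bar_inv = (nu - 2.0) / psi                     # Eq. (46) with n_z = 1
estimates = monte_carlo_moments(random.Random(0), delta_mean, V, psi, nu, 200000)
closed_form = (r_bar_inv,                        # Eq. (46)
               r_bar_inv * delta_mean,           # Eq. (50)
               V + delta_mean ** 2 * r_bar_inv)  # Eq. (54)
print(estimates, closed_form)
```

With a sample of this size the Monte Carlo estimates should agree with the closed-form values to roughly two decimal places, which gives a quick check that the $n_z V_{k|k}$ term in (54) has not been dropped.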

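B Recursive Identification Algorithm for Linear Systems with Skewed Innovations

Inputs: $x_{1|0}$, $P_{1|0}$, $\Delta_{1|0}$, $V_{1|0}$, $\Psi_{1|0}$, $\nu_{1|0}$, $Q_{1:K}$, $C_{1:K}$, $z_{1:K}$, $\gamma$

for $k = 1$ to $K$ do
    Initialize:
    $x_{k|k} \leftarrow x_{k|k-1}$
    $u_{k|k} \leftarrow u_{k|k-1}$
    $\Delta_{k|k} \leftarrow \Delta_{k|k-1}$
    $V_{k|k} \leftarrow V_{k|k-1}$
    $\Psi_{k|k} \leftarrow \Psi_{k|k-1}$
    $\nu_{k|k} \leftarrow \nu_{k|k-1} + 1$
    repeat
        $\bar{R}_k \leftarrow \tfrac{1}{\nu_{k|k} - n_z - 1} \Psi_{k|k}$
        Update $q_{x,u}(x_k, u_k) \approx \mathrm{N}([\begin{smallmatrix} x_k \\ u_k \end{smallmatrix}];\ \xi_{k|k},\ \Xi_{k|k})$:
        $\xi_{k|k-1} \leftarrow \begin{bmatrix} x_{k|k-1} \\ n_z \sqrt{2/\pi}\, (I_{n_z} + n_z V_{k|k})^{-1} V_{k|k} \mathbf{1} \end{bmatrix}$
        $\Xi_{k|k-1} \leftarrow \mathrm{blockdiag}(P_{k|k-1},\ (I + n_z V_{k|k})^{-1})$
        $\bar{C}_k \leftarrow [\, C_k\ \ \Delta_{k|k} \,]$
        $K_k \leftarrow \Xi_{k|k-1} \bar{C}_k^T (\bar{C}_k \Xi_{k|k-1} \bar{C}_k^T + \bar{R}_k)^{-1}$
        $\tilde{\xi}_{k|k} \leftarrow \xi_{k|k-1} + K_k (z_k - \bar{C}_k \xi_{k|k-1} + \sqrt{2/\pi}\,\Delta_{k|k} \mathbf{1})$
        $\tilde{\Xi}_{k|k} \leftarrow (I - K_k \bar{C}_k)\, \Xi_{k|k-1}$
        $[\xi_{k|k}, \Xi_{k|k}] \leftarrow \mathrm{seq\_trunc}(\tilde{\xi}_{k|k},\ \tilde{\Xi}_{k|k},\ \{n_{\mathrm{AR}}+1, \ldots, n_{\mathrm{AR}}+n_z\})$    (see [13, Table I])
        $x_{k|k} \leftarrow [\xi_{k|k}]_{1:n_{\mathrm{AR}}}$
        $P_{k|k} \leftarrow [\Xi_{k|k}]_{1:n_{\mathrm{AR}},\,1:n_{\mathrm{AR}}}$
        $\tilde{u}_{k|k} \leftarrow [\xi_{k|k}]_{n_{\mathrm{AR}}+(1:n_z)} - \sqrt{2/\pi}\,\mathbf{1}$
        $U_{k|k} \leftarrow [\Xi_{k|k}]_{n_{\mathrm{AR}}+(1:n_z),\,n_{\mathrm{AR}}+(1:n_z)}$
        $\Upsilon_{k|k} \leftarrow [\Xi_{k|k}]_{1:n_{\mathrm{AR}},\,n_{\mathrm{AR}}+(1:n_z)}$
        Update $q_{R,\Delta}(R_k, \Delta_k) = \mathrm{N}(\Delta_k;\ \Delta_{k|k},\ R_k \otimes V_{k|k})\ \mathrm{IW}(R_k;\ \Psi_{k|k},\ \nu_{k|k})$:
        $V_{k|k} \leftarrow (U_{k|k} + \tilde{u}_{k|k} \tilde{u}_{k|k}^T + V_{k|k-1}^{-1})^{-1}$
        $\Delta_{k|k} \leftarrow \big( (z_k - C_k x_{k|k}) \tilde{u}_{k|k}^T - C_k \Upsilon_{k|k} + \Delta_{k|k-1} V_{k|k-1}^{-1} \big) V_{k|k}$
        $\Psi_{k|k} \leftarrow \Delta_{k|k-1} V_{k|k-1}^{-1} \Delta_{k|k-1}^T - \Delta_{k|k} V_{k|k}^{-1} \Delta_{k|k}^T + (z_k - C_k x_{k|k})(z_k - C_k x_{k|k})^T + C_k P_{k|k} C_k^T + \Psi_{k|k-1}$
    until converged
    Predict:
    $x_{k+1|k} \leftarrow x_{k|k}$
    $P_{k+1|k} \leftarrow P_{k|k} + Q_k$
    $\Delta_{k+1|k} \leftarrow \Delta_{k|k}$
    $V_{k+1|k} \leftarrow \tfrac{1}{\gamma} V_{k|k}$
    $\Psi_{k+1|k} \leftarrow \gamma \Psi_{k|k}$
    $\nu_{k+1|k} \leftarrow \gamma \nu_{k|k} + (1 - \gamma)\, 2 n_z$
end for

Outputs: $x_{k|k}$ and $P_{k|k}$ for $k = 1, \ldots, K$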
