On Sparsity by NUV-EM, Gaussian Message Passing, and Kalman Smoothing

Hans-Andrea Loeliger, Lukas Bruderer, Hampus Malmberg, Federico Wadehn, and Nour Zalmai
ETH Zurich, Dept. of Information Technology & Electrical Engineering

Abstract—Normal priors with unknown variance (NUV) have long been known to promote sparsity and to blend well with parameter learning by expectation maximization (EM). In this paper, we advocate this approach for linear state space models for applications such as the estimation of impulsive signals, the detection of localized events, smoothing with occasional jumps in the state space, and the detection and removal of outliers. The actual computations boil down to multivariate-Gaussian message passing algorithms that are closely related to Kalman smoothing. We give improved tables of Gaussian-message computations from which such algorithms are easily synthesized, and we point out two preferred such algorithms.

I. INTRODUCTION

This paper is about two topics: 1) a particular approach to modeling and estimating sparse parameters based on zero-mean normal priors with unknown variance (NUV); 2) multivariate-Gaussian message passing (variations of Kalman smoothing) in such models. The main point of the paper is that these two things go very well together and combine to a versatile toolbox. This is not entirely new, of course, and the body of related literature is large. Nonetheless, the specific perspective of this paper has not, as far as known to these authors, been advocated before.

Concerning the second topic, linear state space models continue to be an essential tool for a broad variety of applications, cf. [1]-[4]. The primary algorithms for such models are variations and generalizations of Kalman filtering and smoothing, or, equivalently, multivariate-Gaussian message passing in the corresponding factor graph [5], [6] or similar graphical model [1]. A variety of such algorithms can easily be synthesized from tables of message computations as in [6]. In this paper, we give a new version of these tables with many improvements over those in [6], and we point out two preferred such algorithms.

Concerning the first topic, NUV priors (zero-mean normal priors with unknown variance) originated in Bayesian inference [7]-[9]. The sparsity-promoting nature of such priors is the basis of automatic relevance determination (ARD) and sparse Bayesian learning developed by Neal [9], Tipping [10], [11], Wipf et al. [12], [13], and others.

The basic properties of NUV priors are illustrated by the following simple example. Let U be a variable (or parameter) of interest, which we model as a zero-mean real scalar Gaussian random variable with unknown variance $s^2$. Assume that we observe $Y = U + Z$, where the noise Z is zero-mean Gaussian with known variance $\sigma^2$ and independent of U. The maximum likelihood (ML) estimate of $s^2$ from a single sample $Y = \mu \in \mathbb{R}$ is easily determined:
  $\hat{s}^2 = \operatorname*{argmax}_{s^2} \frac{1}{\sqrt{2\pi(s^2+\sigma^2)}}\, e^{-\mu^2/(2(s^2+\sigma^2))}$   (1)
            $= \max\{0,\ \mu^2 - \sigma^2\}$.   (2)
In a second step, for $s^2$ fixed to $\hat{s}^2$ as in (2), the MAP/MMSE/LMMSE estimate of U is
  $\hat{u} = \frac{\hat{s}^2}{\hat{s}^2 + \sigma^2}\,\mu$   (3)
          $= \begin{cases} \frac{\mu^2 - \sigma^2}{\mu^2}\,\mu, & \text{if } \mu^2 > \sigma^2, \\ 0, & \text{otherwise.} \end{cases}$   (4)
Equations (1)-(4) continue to hold if the scalar observation Y is generalized to an observation $Y \in \mathbb{R}^n$ such that, for fixed $Y = y$, the likelihood function $p(y|u)$ is Gaussian (up to a scale factor) with mean $\mu$ and variance $\sigma^2$. In fact, this is all we need to know in this paper about NUV priors per se.
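As a small illustration of (1)-(4) (our own sketch, not code from the paper; the function name is hypothetical), the two-step estimate can be computed directly:

```python
# Sketch of the scalar NUV estimate (1)-(4): ML estimate of the unknown
# variance s^2 from a single observation mu, then the resulting estimate of U.

def scalar_nuv_estimate(mu: float, sigma2: float):
    """Return (s2_hat, u_hat) for observation mu and known noise variance sigma2."""
    s2_hat = max(0.0, mu**2 - sigma2)          # eq. (2)
    u_hat = s2_hat / (s2_hat + sigma2) * mu    # eq. (3); zero whenever mu^2 <= sigma2
    return s2_hat, u_hat

print(scalar_nuv_estimate(0.5, 1.0))   # (0.0, 0.0): small observations are zeroed out
print(scalar_nuv_estimate(5.0, 1.0))   # (24.0, 4.8): large observations are barely shrunk
```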
The estimate (4) has some pleasing properties: first, it promotes sparsity and can thus be used to select features or relevant parameters; second, it has no a priori preference as to the scale of U, and large values of U are not scaled down. Note that the latter property is lost if ML estimation of $s^2$ is replaced by MAP estimation based on a proper prior on $s^2$.

In this paper, we will stick to basic NUV regularization as above, with no prior on the unknown variances: variables or parameters of interest are modeled as independent Gaussian random variables, each with its own unknown variance that is estimated (exactly or approximately) by maximum likelihood. We will advocate the use of NUV regularization in linear state space models, for applications such as the estimation of impulsive signals, the detection of localized events, smoothing with occasional jumps in the state space, and the detection and removal of outliers.

Concerning the actual computations, estimating the unknown variances is not substantially different from learning other parameters of state space models and can be carried out by expectation maximization (EM) [14]-[17] and other methods in such a way that the actual computations essentially amount to Gaussian message passing.

The paper is structured as follows. In Section II, we begin with a quick look at NUV regularization in a standard linear model. Estimation of the unknown variances is addressed in Section III. Factor graphs and state space models are reviewed

in Sections IV and V, respectively, and NUV regularization in such models is addressed in Section VI. The new tables of Gaussian-message computations are given in Appendix A.

II. SUM OF GAUSSIANS AND LEAST SQUARES WITH NUV REGULARIZATION

We begin with an elementary linear model (a special case of a relevance vector machine [10]) as follows. For $b_1, \ldots, b_K \in \mathbb{R}^n \setminus \{0\}$, let
  $Y = \sum_{k=1}^{K} b_k U_k + Z$   (5)
where $U_1, \ldots, U_K$ are independent zero-mean real scalar Gaussian random variables with unknown variances $\sigma_1^2, \ldots, \sigma_K^2$, and where the noise Z is $\mathbb{R}^n$-valued zero-mean Gaussian with covariance matrix $\sigma^2 I$ and independent of $U_1, \ldots, U_K$. For a given observation $Y = y \in \mathbb{R}^n$, we wish to estimate, first, $\sigma_1^2, \ldots, \sigma_K^2$ by maximum likelihood, and second, $U_1, \ldots, U_K$ with $\sigma_1^2, \ldots, \sigma_K^2$ fixed. In the first step, we achieve sparsity: if the ML estimate of $\sigma_k^2$ is zero, then $U_k = 0$ is fixed in the second step.

The second step (the estimation of $U_1, \ldots, U_K$ for fixed $\sigma_1^2, \ldots, \sigma_K^2$) is a standard Gaussian estimation problem where MAP estimation, MMSE estimation, and LMMSE estimation coincide and amount to minimizing
  $\frac{1}{\sigma^2}\Big\| y - \sum_{k \in K_+} b_k u_k \Big\|^2 + \sum_{k \in K_+} \frac{u_k^2}{\sigma_k^2}$,   (6)
where $K_+$ denotes the set of those indices $k \in \{1, \ldots, K\}$ for which $\sigma_k^2 > 0$. A closed-form solution of this minimization is
  $\hat{u}_k = \sigma_k^2 b_k^\mathsf{T} W y$   (7)
with
  $W = \Big( \sum_{k=1}^{K} \sigma_k^2 b_k b_k^\mathsf{T} + \sigma^2 I \Big)^{-1}$,   (8)
as may be obtained from standard least-squares equations (see also [11]). An alternative proof will be given in Appendix B, where we also point out how W can be computed without a matrix inversion.

In (2) and (4), the estimate is zero if and only if $y^2 \leq \sigma^2$. Two different generalizations of this condition to the setting of this section are given in the following theorem. Let $p(y, \ldots)$ denote the probability density of Y (and any other variables) according to (5).

Theorem. Let $\sigma_1, \ldots, \sigma_K$ be fixed at a local maximum or at a saddle point of $p(y \mid \sigma_1^2, \ldots, \sigma_K^2)$. Then $\sigma_k^2 = 0$ if and only if
  $\big(b_k^\mathsf{T} \tilde{W}_k y\big)^2 \leq b_k^\mathsf{T} \tilde{W}_k b_k$   (9)
with
  $\tilde{W}_k = \Big( \sum_{\ell \neq k} \sigma_\ell^2 b_\ell b_\ell^\mathsf{T} + \sigma^2 I \Big)^{-1}$.   (10)
Moreover, with W as in (8), we have
  $\big(b_k^\mathsf{T} W y\big)^2 \leq b_k^\mathsf{T} W b_k$,   (11)
with equality if $\sigma_k^2 > 0$.

The proof will be given in Appendix B. The matrices $\tilde{W}_k$ and W are both positive definite. The former depends on k, but not on $\sigma_k^2$; the latter depends also on $\sigma_k^2$, but not on k.

III. VARIANCE ESTIMATION

Following a standard approach, the unknown variances $\sigma_1^2, \ldots, \sigma_K^2$ in Section II (and analogous quantities in later sections) can be estimated by an EM algorithm as follows:
  1) Begin with an initial guess of $\sigma_1^2, \ldots, \sigma_K^2$.
  2) Compute the mean $m_{U_k}$ and the variance $\sigma_{U_k}^2$ of the Gaussian posterior distribution $p(u_k \mid y, \sigma_1^2, \ldots, \sigma_K^2)$, with $\sigma_1^2, \ldots, \sigma_K^2$ fixed.
  3) Update $\sigma_1^2, \ldots, \sigma_K^2$ according to (13) below.
  4) Repeat steps 2 and 3 until convergence, or until some pragmatic stopping criterion is met.
  5) Optionally update $\sigma_1^2, \ldots, \sigma_K^2$ according to (16) below.
The standard EM update for the variances is
  $\sigma_k^2 \leftarrow E\big[ U_k^2 \mid y, \sigma_1^2, \ldots, \sigma_K^2 \big]$   (12)
             $= m_{U_k}^2 + \sigma_{U_k}^2$.   (13)
The required quantities $m_{U_k}$ and $\sigma_{U_k}^2$ are given by (85) and (88), respectively. With this update, basic EM theory guarantees that the likelihood $p(y \mid \sigma_1^2, \ldots, \sigma_K^2)$ cannot decrease (and will normally increase) in step 3 of the algorithm.

The stated EM algorithm is safe, but the convergence can be slow. The following alternative update rule, due to MacKay [7], often converges much faster:
  $\sigma_k^2 \leftarrow \frac{m_{U_k}^2}{1 - \sigma_{U_k}^2/\sigma_k^2}$.   (14)
However, this alternative update rule comes without guarantees; sometimes, it is too aggressive and the algorithm fails completely.

An individual variance $\sigma_k^2$ can also be estimated by a maximum-likelihood step as in (2):
  $\sigma_k^2 \leftarrow \operatorname*{argmax}_{\sigma_k^2} p(y \mid \sigma_1^2, \ldots, \sigma_K^2)$   (15)
             $= \max\big\{0,\ \overleftarrow{m}_{U_k}^2 - \overleftarrow{\sigma}_{U_k}^2\big\}$,   (16)
where the mean $\overleftarrow{m}_{U_k}$ is given by (104) and the variance $\overleftarrow{\sigma}_{U_k}^2$ is given by (95).
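The EM iteration of this section can be made concrete for the model (5). The following is a minimal sketch of ours (not code from the paper; it assumes NumPy, and all function and variable names are our own). It computes the posterior mean and variance of each $U_k$ in closed form, cf. (7)-(8) and (85), (88) in Appendix B, and then applies the update (13):

```python
import numpy as np

def nuv_em(y, B, sigma2, iters=200):
    """NUV-EM for y = sum_k b_k U_k + Z; B is n x K with columns b_k."""
    n, K = B.shape
    s2 = np.ones(K)                                            # initial guess of sigma_k^2
    for _ in range(iters):
        W = np.linalg.inv(B @ np.diag(s2) @ B.T + sigma2 * np.eye(n))     # eq. (8)
        m_U = s2 * (B.T @ W @ y)                               # posterior means, cf. (7)/(85)
        v_U = s2 - s2**2 * np.einsum('ik,ij,jk->k', B, W, B)   # posterior variances, cf. (88)
        s2 = m_U**2 + v_U                                      # EM update, eq. (13)
    return s2, m_U

# Toy example: only the first two coefficients are actually nonzero.
rng = np.random.default_rng(0)
B = rng.standard_normal((20, 10))
u_true = np.zeros(10)
u_true[:2] = [3.0, -2.0]
y = B @ u_true + 0.1 * rng.standard_normal(20)
s2_hat, u_hat = nuv_em(y, B, sigma2=0.01)
print(np.round(u_hat, 2))   # the spurious coefficients should be shrunk toward zero
```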
However, for parallel updates (simultaneously for all $k \in \{1, \ldots, K\}$), as in step 3 of the algorithm above, the rule (16) is normally too aggressive and fails.

Later on, the same algorithm will be used for estimating parameters or variables in linear state space models. In this case, we have no useful analytical expressions for the analogs of $m_{U_k}$ and $\sigma_{U_k}^2$, but these quantities are easily computed by Gaussian message passing.

IV. ON FACTOR GRAPHS AND GAUSSIAN MESSAGE PASSING

From now on, we will heavily use factor graphs, both for reasoning and for describing algorithms. We will use factor graphs as in [5], [6], where nodes/boxes represent factors and

edges represent variables. By contrast, factor graphs as in [18] have both variable nodes and factor nodes.

[Fig. 1. Cycle-free factor graph of (5) with NUV regularization.]

Figure 1, for example, represents the probability density $p(y, z, u_1, \ldots, u_K \mid \sigma_1, \ldots, \sigma_K)$ of the model (5), with auxiliary variables $\tilde{U}_k = b_k U_k$ and $X_k = X_{k-1} + \tilde{U}_k$ with $X_0 = 0$. The nodes labeled N represent zero-mean normal densities with variance 1; the node labeled $\mathcal{N}(0, \sigma^2 I)$ represents a zero-mean multivariate normal density with covariance matrix $\sigma^2 I$. All other nodes in Figure 1 represent deterministic constraints.

For fixed $\sigma_1, \ldots, \sigma_K$, Figure 1 is a cycle-free linear Gaussian factor graph, and MAP/MMSE/LMMSE estimation of any variables can be carried out by Gaussian message passing, as described in detail in [6]. Interestingly, in this particular example, most of the message passing can be carried out symbolically, i.e., as a technique to derive closed-form expressions for the estimates.

Every message in this paper is a (scalar or multivariate) Gaussian distribution, up to a scale factor. Sometimes, we also allow a degenerate limit of a Gaussian, such as a Gaussian with variance zero or infinity, but we will not discuss this in detail. Scale factors can be ignored in this paper. Messages can thus be parameterized by a mean vector and a covariance matrix. For example, $\overrightarrow{m}_X$ and $\overrightarrow{V}_X$ denote the mean vector and the covariance matrix, respectively, of the message traveling forward on the edge X in Figure 1, while $\overleftarrow{m}_X$ and $\overleftarrow{V}_X$ denote the mean vector and the covariance matrix, respectively, of the message traveling backward on the edge X. Alternatively, messages can be parameterized by the precision matrix $\overrightarrow{W}_X$ (the inverse of the covariance matrix $\overrightarrow{V}_X$) and the precision-weighted mean vector
  $\overrightarrow{\xi}_X = \overrightarrow{W}_X \overrightarrow{m}_X$.   (17)
Again, the backward message along the same edge will be denoted by reversed arrows.

In a directed graphical model without cycles as in Figure 1, forward messages represent priors while backward messages represent likelihood functions (up to a scale factor). In addition, we also work with marginals of the posterior distribution, i.e., the product of forward message and backward message along the same edge [5], [6]. For example, $m_X$ and $V_X$ denote the posterior mean vector and the posterior covariance matrix, respectively, of X. An important role in this paper is played by the alternative parameterization with the dual precision matrix $\tilde{W}_X$ and the dual mean vector $\tilde{\xi}_X$:
  $\tilde{W}_X = \big(\overrightarrow{V}_X + \overleftarrow{V}_X\big)^{-1}$   (18)
  $\tilde{\xi}_X = \tilde{W}_X \big(\overrightarrow{m}_X - \overleftarrow{m}_X\big)$.   (19)
Message computations with all these parameterizations are given in Tables I-VI in Appendix A, which contain numerous improvements over the corresponding tables in [6].

V. LINEAR STATE SPACE MODELS

Consider a standard linear state space model with state $X_k \in \mathbb{R}^n$ and observation $Y_k \in \mathbb{R}^L$ evolving according to
  $X_k = A X_{k-1} + B U_k$   (20)
  $Y_k = C X_k + Z_k$   (21)
with $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times m}$, $C \in \mathbb{R}^{L \times n}$, and where $U_k$ (with values in $\mathbb{R}^m$) and $Z_k$ (with values in $\mathbb{R}^L$) are independent zero-mean white Gaussian noise processes. We will usually assume, first, that $L = 1$, and second, that the covariance matrix of $U_k$ is an identity matrix, but these assumptions are not essential. A cycle-free factor graph of such a model is shown in Figure 2. In Section VI, we will vary and augment such models with NUV priors on various quantities.

Inference in such a state space model amounts to Kalman filtering and smoothing [1], [2] or, equivalently, to Gaussian message passing in the factor graph of Figure 2 [5], [6].
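The parameterizations above are easy to exercise numerically. The following sketch (ours, assuming NumPy; not from the paper) builds a forward and a backward message on a single edge and checks, for the dual quantities (18)-(19), two of the marginal relations that reappear in Table IV of Appendix A:

```python
import numpy as np
rng = np.random.default_rng(1)

def random_spd(n):
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)       # symmetric positive definite

n = 3
Vf, Vb = random_spd(n), random_spd(n)    # forward / backward covariance matrices
mf, mb = rng.standard_normal(n), rng.standard_normal(n)

Wf, Wb = np.linalg.inv(Vf), np.linalg.inv(Vb)
xif, xib = Wf @ mf, Wb @ mb              # precision-weighted means, eq. (17)

V = np.linalg.inv(Wf + Wb)               # posterior covariance (product of the two messages)
m = V @ (xif + xib)                      # posterior mean

W_dual = np.linalg.inv(Vf + Vb)          # dual precision matrix, eq. (18)
xi_dual = W_dual @ (mf - mb)             # dual mean, eq. (19)

print(np.allclose(m, mf - Vf @ xi_dual))         # (IV.9):  m = m_fwd - V_fwd xi_dual
print(np.allclose(V, Vf - Vf @ W_dual @ Vf))     # (IV.13): V = V_fwd - V_fwd W_dual V_fwd
```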
Estimating the input $U_k$ is not usually considered in the Kalman filter literature, but it is essential for signal processing, cf. [19], [20]. With the tables in the appendix, it is easy to put together a large variety of such algorithms. The relative merits of different such algorithms depend on the particulars

of the problem.

[Fig. 2. One section of the factor graph of the linear state space model (20) and (21). The whole factor graph consists of many such sections and optional initial and/or terminal conditions. The dashed block will be varied in Section VI.]

However, we find the following two algorithms usually to be the most advantageous, both in terms of computational complexity and in terms of numerical stability. If both the input $U_k$ and the output $Y_k$ are scalar (or can be decomposed into multiple scalar inputs and outputs), neither of these two algorithms requires a matrix inversion.

The first of these algorithms is essentially the Modified Bryson-Frazier (MBF) smoother [21] augmented with input-signal estimation.

MBF Message Passing:
  1) Perform forward message passing with $\overrightarrow{m}_X$ and $\overrightarrow{V}_X$ using (II.1), (II.2), (III.1), (III.2), (V.1), (V.2). This is the standard Kalman filter.
  2) Perform backward message passing with $\tilde{\xi}_X$ and $\tilde{W}_X$, beginning with $\tilde{\xi}_X = 0$ and $\tilde{W}_X = 0$ at the end of the horizon, using (II.6), (II.7), (III.7), (III.8), and either (V.4), (V.6), (V.8) or (V.5), (V.7), (V.9).
  3) Inputs $U_k$ may then be estimated using (II.6), (II.7), (III.7), (III.8), (IV.9), (IV.13).
  4) The posterior mean $m_X$ and covariance matrix $V_X$ of any state X (or of an individual component thereof) may be obtained from (IV.9) and (IV.13).
  5) Outputs $\tilde{Y}_k = C X_k$ may then (very obviously) be estimated using (I.5), (I.6), (III.5), (III.6).
In step 2, the initialization with $\tilde{W}_X = 0$ corresponds to the typical situation with no a priori information about the state X at the end of the horizon. MBF message passing is especially attractive for input signal estimation as in step 3 above, without steps 4 and 5 (a numerical sketch of this recursion is given below).

The second algorithm is an exact dual to MBF message passing and especially attractive for state estimation and output signal estimation, i.e., for standard Kalman smoothing, without steps 4 and 5 below. This algorithm (backward recursion with a time-reversed information filter, forward recursion with marginals; BIFM) does not seem to be widely known.

BIFM Message Passing:
  1) Perform backward message passing with $\overleftarrow{\xi}_X$ and $\overleftarrow{W}_X$ using (I.3), (I.4), (III.3), (III.4), and (VI.1), (VI.2) with the changes for the reverse direction stated in Table VI. This is a time-reversed version of the standard information filter.
  2) Perform forward message passing with $m_X$ and $V_X$ using (I.5), (I.6), (III.5), (III.6), and either (VI.4), (VI.6), (VI.8) or (VI.5), (VI.7), (VI.9).
  3) Outputs $\tilde{Y}_k$ may then (very obviously) be estimated using (I.5), (I.6), (III.5), (III.6).
  4) The dual means $\tilde{\xi}_X$ and the dual precision matrices $\tilde{W}_X$ may be obtained from (IV.3) and (IV.7).
  5) Inputs $U_k$ may then be estimated using (II.6), (II.7), (III.7), (III.8), (IV.9), (IV.13).

VI. SPARSITY BY NUV IN STATE SPACE MODELS

Sparse input signals are easily introduced: simply replace the normal prior on $U_k$ in (20) and in Figure 2 by a NUV prior, as shown in Figure 3. This approach was used in [22] to estimate the input signal $U_1, U_2, \ldots$ itself. However, we may also be interested in the clean output signal $\tilde{Y}_k = C X_k$. For example, consider the problem of approximating some given signal $y_1, y_2, \ldots \in \mathbb{R}$ by constant segments, as illustrated in Figure 8. The constant segments can be represented by the simplest possible state space model with $n = 1$, $A = C = 1$, and no input. For the occasional jumps between the constant segments, we use a sparse input signal $U_1, U_2, \ldots$ with a NUV prior and with $B = b = 1$, as in Figure 3. The sparsity level (i.e., the number of constant segments) can be controlled by the assumed observation noise $\sigma^2$.

The sparse scalar input signal of Figure 3 can be generalized in several different directions.
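The following is a minimal numerical sketch of MBF message passing as described in Section V (our own illustration, not the authors' code; it assumes NumPy, a scalar output, a white-noise input prior N(0, I), and variable names of our choosing): a forward Kalman filter, the backward recursion with the dual quantities started from zero at the end of the horizon, and state/input estimation via (IV.9) and (IV.13):

```python
import numpy as np

def mbf_smoother(y, A, B, C, sigma2, m0, V0):
    """MBF sketch for x_k = A x_{k-1} + B u_k, y_k = C x_k + z_k with scalar y_k,
    u_k ~ N(0, I), z_k ~ N(0, sigma2); (m0, V0) is the prior on the initial state."""
    N, n = len(y), A.shape[0]
    m_pred, V_pred = np.zeros((N, n)), np.zeros((N, n, n))
    m_filt, V_filt = np.zeros((N, n)), np.zeros((N, n, n))
    m, V = np.asarray(m0, float), np.asarray(V0, float)
    for k in range(N):                                   # step 1: forward Kalman filter
        m, V = A @ m, A @ V @ A.T + B @ B.T
        m_pred[k], V_pred[k] = m, V
        S = (C @ V @ C.T).item() + sigma2                # innovation variance
        K = (V @ C.T).ravel() / S                        # Kalman gain
        m = m + K * (y[k] - (C @ m).item())
        V = V - np.outer(K, C @ V)
        m_filt[k], V_filt[k] = m, V
    xi, W = np.zeros(n), np.zeros((n, n))                # step 2: duals, zero at end of horizon
    x_smooth, u_hat = np.zeros((N, n)), np.zeros((N, B.shape[1]))
    c = C.ravel()
    for k in reversed(range(N)):
        x_smooth[k] = m_filt[k] - V_filt[k] @ xi         # posterior state mean, (IV.9)
        S = (C @ V_pred[k] @ C.T).item() + sigma2
        G = 1.0 / S
        F = np.eye(n) - np.outer(V_pred[k] @ c, c) * G   # F = I - V_pred C^T G C, cf. (V.9)
        xi = F.T @ xi + c * G * ((C @ m_pred[k]).item() - y[k])   # cf. (V.5)
        W = F.T @ W @ F + np.outer(c, c) * G                      # cf. (V.7)
        u_hat[k] = -B.T @ xi                             # step 3: input estimate, N(0, I) prior
        xi, W = A.T @ xi, A.T @ W @ A                    # back through the A node, (III.7)/(III.8)
    return x_smooth, u_hat

# Usage sketch: constant-velocity model driven by two isolated input pulses.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
rng = np.random.default_rng(3)
u = np.zeros(80); u[20], u[55] = 2.0, -3.0
x, y = np.zeros(2), []
for uk in u:
    x = A @ x + B.ravel() * uk
    y.append((C @ x).item() + 0.3 * rng.standard_normal())
x_smooth, u_hat = mbf_smoother(np.array(y), A, B, C, sigma2=0.09, m0=np.zeros(2), V0=np.eye(2))
```

Under the plain white-noise prior, the input estimate u_hat is smeared around the true pulse positions; the NUV priors introduced above sharpen such estimates.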
A first obvious generalization is to combine a primary white-noise input with a secondary sparse input, as shown in Figure 4. For example, the constant segments in Figure 8 are thus generalized to random-walk segments as in Figure 9.

Another generalization of Figure 8 is shown in Figure 10, where the constant-level segments are replaced by straight-line segments, which can be represented by a state space model of order $n = 2$. The corresponding input block, with two separate sparse scalar input signals, is shown in Figure 5; the first input, $U_{k,1}$, affects the magnitude and the second input, $U_{k,2}$, affects the slope of the line model. The further generalization to polynomial segments is obvious. Continuity can be enforced by omitting the input $U_{k,1}$, and continuity of derivatives can be enforced likewise. More generally, Figure 5 (generalized to an arbitrary number of sparse scalar input signals) can be used to allow occasional jumps in individual components of the state of arbitrary state space models.
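As a concrete instance of the piecewise-constant example above (Figure 8), the following sketch of ours (assuming NumPy; not code from the paper) combines the scalar MBF recursion with the NUV-EM update (13): each input $u_k$ gets its own variance, which is re-estimated from its posterior mean and variance in every EM iteration:

```python
import numpy as np

def nuv_piecewise_constant(y, sigma2, iters=100):
    """Fit a piecewise-constant signal to y (assumed noise variance sigma2) by NUV-EM,
    using the model x_k = x_{k-1} + u_k, y_k = x_k + z_k (n = 1, A = B = C = 1)."""
    N = len(y)
    s2 = np.ones(N)                              # unknown input variances sigma_k^2
    x_hat = np.zeros(N)
    for _ in range(iters):
        # forward Kalman filter
        mp, Vp = np.zeros(N), np.zeros(N)        # predicted mean / variance
        mf, Vf = np.zeros(N), np.zeros(N)        # filtered mean / variance
        m, V = 0.0, 1e6                          # vague prior on the initial state
        for k in range(N):
            mp[k], Vp[k] = m, V + s2[k]
            K = Vp[k] / (Vp[k] + sigma2)
            m, V = mp[k] + K * (y[k] - mp[k]), (1.0 - K) * Vp[k]
            mf[k], Vf[k] = m, V
        # backward MBF recursion with the dual quantities, plus the EM update (13)
        xi, W = 0.0, 0.0
        for k in reversed(range(N)):
            x_hat[k] = mf[k] - Vf[k] * xi        # smoothed clean output, (IV.9)
            G = 1.0 / (Vp[k] + sigma2)
            F = 1.0 - Vp[k] * G
            xi = F * xi + G * (mp[k] - y[k])     # dual quantities on the input edge
            W = F * W * F + G
            mu = -s2[k] * xi                     # posterior mean of u_k
            vu = s2[k] - s2[k]**2 * W            # posterior variance of u_k
            s2[k] = mu**2 + vu                   # EM update, eq. (13)
            # A = 1, so the duals pass unchanged to the previous section
    return x_hat, s2

# Toy usage: a noisy two-level signal; after convergence, most s2[k] should be
# (close to) zero, leaving sizable variances only at the jump positions.
rng = np.random.default_rng(0)
y = np.concatenate([np.zeros(50), 4.0 * np.ones(50)]) + 0.5 * rng.standard_normal(100)
x_hat, s2 = nuv_piecewise_constant(y, sigma2=0.25)
```

The sparsity level (the number of segments) can be tuned through the assumed observation noise sigma2, as noted above.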

[Fig. 3. Alternative input block to replace the dashed box in Figure 2 for a sparse scalar input signal U_1, U_2, ....]
[Fig. 4. Input block with both white noise and an additional sparse scalar input.]
[Fig. 5. Input block with two separate sparse scalar inputs for two degrees of freedom, such as in Figure 10.]
[Fig. 6. Input block allowing general sparse pulses, each with its own signature, in addition to full-rank white noise.]
[Fig. 7. Alternative output block for a scalar signal with outliers.]
[Fig. 8. Estimating or fitting a piecewise constant signal.]
[Fig. 9. Estimating a random walk with occasional jumps.]
[Fig. 10. Approximation with straight-line segments.]
[Fig. 11. Outlier removal according to Figure 7.]

In all these examples, the parameters $\sigma_k^2$ (or $\sigma_{k,l}^2$) can be learned as described in Section III, and the required quantities $m_{U_k}$ and $\sigma_{U_k}^2$ (or $m_{U_{k,l}}$ and $\sigma_{U_{k,l}}^2$, respectively) can be computed by message passing in the pertinent factor graph, as described in Section V.

A more substantial generalization of Figure 3 is shown in Figure 6, with $b\sigma_k$ of Figure 3 generalized to $b_k \in \mathbb{R}^n$. We mention without proof that this generalized NUV prior on $\tilde{U}_{k,1} = b_k U_{k,1}$ still promotes sparsity and can be learned by EM, provided that $BB^\mathsf{T}$ has full rank [23]. This input model allows quite general events to happen, each with its own signature. The estimated nonzero vectors $\hat{b}_1, \hat{b}_2, \ldots$ may be viewed as features of the given signal $y_1, y_2, \ldots$ that can be used for further analysis.

Finally, we turn to the output block in Figure 2. A simple and effective method to detect and to remove outliers from the scalar output signal of a state space model is to replace (21) with
  $Y_k = C X_k + Z_k + \tilde{Z}_k$   (22)
with sparse $\tilde{Z}_k$, as shown in Figure 7 [24]. Again, the parameters $\tilde{\sigma}_k$ can be estimated by EM essentially as described in Section III, and the required quantities $m_{\tilde{Z}_k}$ and $\sigma_{\tilde{Z}_k}^2$ can be computed by message passing as described in Section V. An example of this method is shown in Figure 11 for some state space model of order $n = 4$ with details that are irrelevant for this paper.

VII. CONCLUSION

We have given improved tables of Gaussian-message computations for estimation in linear state space models, and we have pointed out two preferred message passing algorithms: the first algorithm is essentially the Modified Bryson-Frazier smoother, the second algorithm is a dual of it. In addition, we have advocated NUV priors (together with EM algorithms from sparse Bayesian learning) for introducing sparsity into linear state space models and outlined several applications. In this paper, all factor graphs were cycle-free, so that Gaussian message passing yields exact marginals. The use of NUV regularization in factor graphs with cycles, and its relative merits in comparison with, e.g., AMP [25], remains to be investigated.

APPENDIX A
TABULATED GAUSSIAN-MESSAGE COMPUTATIONS

Tables I-VI are improved versions of the corresponding tables in [6]. The notation for the different parameterizations of the messages was defined in Section IV. The main novelties of this new version are the following:
  1) New notation $\overrightarrow{\xi} = \overrightarrow{W}\overrightarrow{m}$ and $\overleftarrow{\xi} = \overleftarrow{W}\overleftarrow{m}$.
  2) Introduction of the dual marginal $\tilde{\xi}$ (IV.1), with pertinent new expressions in Tables I-V, and new expressions with the dual precision matrix $\tilde{W}$, especially (V.4)-(V.9). These results from [20] are used both in Appendix B and in the two preferred algorithms in Section V.
  3) New expressions (VI.4)-(VI.9) for the marginals, which are essential for the BIFM Kalman smoother in Section V.
The proofs below are given only for the new expressions; for the other proofs, we refer to [6].

TABLE I: GAUSSIAN MESSAGE PASSING THROUGH AN EQUALITY-CONSTRAINT NODE
Constraint $X = Y = Z$, expressed by the factor $\delta(z - x)\,\delta(y - x)$:
  $\overrightarrow{\xi}_Z = \overrightarrow{\xi}_X + \overleftarrow{\xi}_Y$   (I.1)
  $\overrightarrow{W}_Z = \overrightarrow{W}_X + \overleftarrow{W}_Y$   (I.2)
  $\overleftarrow{\xi}_X = \overleftarrow{\xi}_Y + \overleftarrow{\xi}_Z$   (I.3)
  $\overleftarrow{W}_X = \overleftarrow{W}_Y + \overleftarrow{W}_Z$   (I.4)
  $m_X = m_Y = m_Z$   (I.5)
  $V_X = V_Y = V_Z$   (I.6)
  $\tilde{\xi}_X = \tilde{\xi}_Y + \tilde{\xi}_Z$   (I.7)

TABLE II: GAUSSIAN MESSAGE PASSING THROUGH AN ADDER NODE
Constraint $Z = X + Y$, expressed by the factor $\delta(z - (x + y))$:
  $\overrightarrow{m}_Z = \overrightarrow{m}_X + \overrightarrow{m}_Y$   (II.1)
  $\overrightarrow{V}_Z = \overrightarrow{V}_X + \overrightarrow{V}_Y$   (II.2)
  $\overleftarrow{m}_X = \overleftarrow{m}_Z - \overrightarrow{m}_Y$   (II.3)
  $\overleftarrow{V}_X = \overleftarrow{V}_Z + \overrightarrow{V}_Y$   (II.4)
  $m_Z = m_X + m_Y$   (II.5)
  $\tilde{\xi}_X = \tilde{\xi}_Y = \tilde{\xi}_Z$   (II.6)
  $\tilde{W}_X = \tilde{W}_Y = \tilde{W}_Z$   (II.7)

TABLE III: GAUSSIAN MESSAGE PASSING THROUGH A MATRIX MULTIPLIER NODE WITH ARBITRARY REAL MATRIX A
Constraint $Y = AX$, expressed by the factor $\delta(y - Ax)$:
  $\overrightarrow{m}_Y = A\,\overrightarrow{m}_X$   (III.1)
  $\overrightarrow{V}_Y = A\,\overrightarrow{V}_X A^\mathsf{T}$   (III.2)
  $\overleftarrow{\xi}_X = A^\mathsf{T}\,\overleftarrow{\xi}_Y$   (III.3)
  $\overleftarrow{W}_X = A^\mathsf{T}\,\overleftarrow{W}_Y A$   (III.4)
  $m_Y = A\,m_X$   (III.5)
  $V_Y = A\,V_X A^\mathsf{T}$   (III.6)
  $\tilde{\xi}_X = A^\mathsf{T}\,\tilde{\xi}_Y$   (III.7)
  $\tilde{W}_X = A^\mathsf{T}\,\tilde{W}_Y A$   (III.8)

TABLE IV: GAUSSIAN SINGLE-EDGE MARGINALS $m$, $V$ AND THEIR DUALS $\tilde{\xi}$, $\tilde{W}$
  $\tilde{\xi}_X = \tilde{W}_X (\overrightarrow{m}_X - \overleftarrow{m}_X)$   (IV.1)
          $= \overrightarrow{\xi}_X - \overrightarrow{W}_X m_X$   (IV.2)
          $= \overleftarrow{W}_X m_X - \overleftarrow{\xi}_X$   (IV.3)
  $\tilde{W}_X = (\overrightarrow{V}_X + \overleftarrow{V}_X)^{-1}$   (IV.4)
          $= \overrightarrow{W}_X V_X \overleftarrow{W}_X$   (IV.5)
          $= \overrightarrow{W}_X - \overrightarrow{W}_X V_X \overrightarrow{W}_X$   (IV.6)
          $= \overleftarrow{W}_X - \overleftarrow{W}_X V_X \overleftarrow{W}_X$   (IV.7)
  $m_X = V_X (\overrightarrow{\xi}_X + \overleftarrow{\xi}_X)$   (IV.8)
       $= \overrightarrow{m}_X - \overrightarrow{V}_X \tilde{\xi}_X$   (IV.9)
       $= \overleftarrow{m}_X + \overleftarrow{V}_X \tilde{\xi}_X$   (IV.10)
  $V_X = (\overrightarrow{W}_X + \overleftarrow{W}_X)^{-1}$   (IV.11)
       $= \overrightarrow{V}_X \tilde{W}_X \overleftarrow{V}_X$   (IV.12)
       $= \overrightarrow{V}_X - \overrightarrow{V}_X \tilde{W}_X \overrightarrow{V}_X$   (IV.13)
       $= \overleftarrow{V}_X - \overleftarrow{V}_X \tilde{W}_X \overleftarrow{V}_X$   (IV.14)

TABLE V: GAUSSIAN MESSAGE PASSING THROUGH AN OBSERVATION BLOCK
Constraints $X = Z$ and $Y = AX$ (the backward message on Y comes from an observation):
  $\overrightarrow{m}_Z = \overrightarrow{m}_X + \overrightarrow{V}_X A^\mathsf{T} G\,(\overleftarrow{m}_Y - A\overrightarrow{m}_X)$   (V.1)
  $\overrightarrow{V}_Z = \overrightarrow{V}_X - \overrightarrow{V}_X A^\mathsf{T} G A\,\overrightarrow{V}_X$   (V.2)
  with $G = \big(\overleftarrow{V}_Y + A\,\overrightarrow{V}_X A^\mathsf{T}\big)^{-1}$   (V.3)
  $\tilde{\xi}_X = F^\mathsf{T} \tilde{\xi}_Z + A^\mathsf{T} \overleftarrow{W}_Y\,(A\overrightarrow{m}_Z - \overleftarrow{m}_Y)$   (V.4)
          $= F^\mathsf{T} \tilde{\xi}_Z + A^\mathsf{T} G\,(A\overrightarrow{m}_X - \overleftarrow{m}_Y)$   (V.5)
  $\tilde{W}_X = F^\mathsf{T} \tilde{W}_Z F + A^\mathsf{T} \overleftarrow{W}_Y A F$   (V.6)
          $= F^\mathsf{T} \tilde{W}_Z F + A^\mathsf{T} G A$   (V.7)
  with $F = I - \overrightarrow{V}_Z A^\mathsf{T} \overleftarrow{W}_Y A$   (V.8)
       $= I - \overrightarrow{V}_X A^\mathsf{T} G A$   (V.9)
For the reverse direction, replace $\overrightarrow{m}_Z$ by $\overleftarrow{m}_X$, $\overrightarrow{V}_Z$ by $\overleftarrow{V}_X$, $\overrightarrow{m}_X$ by $\overleftarrow{m}_Z$, $\overrightarrow{V}_X$ by $\overleftarrow{V}_Z$, exchange $\tilde{\xi}_X$ and $\tilde{\xi}_Z$, exchange $\tilde{W}_X$ and $\tilde{W}_Z$, and change $+$ to $-$ in (V.4) and (V.5).

TABLE VI: GAUSSIAN MESSAGE PASSING THROUGH AN INPUT BLOCK
Constraint $Z = X + AY$:
  $\overrightarrow{\xi}_Z = \overrightarrow{\xi}_X + \overrightarrow{W}_X A H\,(\overrightarrow{\xi}_Y - A^\mathsf{T}\overrightarrow{\xi}_X)$   (VI.1)
  $\overrightarrow{W}_Z = \overrightarrow{W}_X - \overrightarrow{W}_X A H A^\mathsf{T} \overrightarrow{W}_X$   (VI.2)
  with $H = \big(\overrightarrow{W}_Y + A^\mathsf{T}\overrightarrow{W}_X A\big)^{-1}$   (VI.3)
  $m_X = F^\mathsf{T} m_Z + A\,\overrightarrow{V}_Y\,(A^\mathsf{T}\overrightarrow{\xi}_Z - \overrightarrow{\xi}_Y)$   (VI.4)
       $= F^\mathsf{T} m_Z + A H\,(A^\mathsf{T}\overrightarrow{\xi}_X - \overrightarrow{\xi}_Y)$   (VI.5)
  $V_X = F^\mathsf{T} V_Z F + A\,\overrightarrow{V}_Y A^\mathsf{T} F$   (VI.6)
       $= F^\mathsf{T} V_Z F + A H A^\mathsf{T}$   (VI.7)
  with $F = I - \overrightarrow{W}_Z A\,\overrightarrow{V}_Y A^\mathsf{T}$   (VI.8)
       $= I - \overrightarrow{W}_X A H A^\mathsf{T}$   (VI.9)
For the reverse direction, replace $\overrightarrow{\xi}_Z$ by $\overleftarrow{\xi}_X$, $\overrightarrow{W}_Z$ by $\overleftarrow{W}_X$, $\overrightarrow{\xi}_X$ by $\overleftarrow{\xi}_Z$, $\overrightarrow{W}_X$ by $\overleftarrow{W}_Z$, exchange $m_X$ and $m_Z$, exchange $V_X$ and $V_Z$, and replace $\overrightarrow{\xi}_Y$ by $-\overrightarrow{\xi}_Y$.

Proof of (I.7): Using (IV.3), (I.3), (I.4), and (I.5), we have
  $\tilde{\xi}_X = \overleftarrow{W}_X m_X - \overleftarrow{\xi}_X$   (23)
          $= (\overleftarrow{W}_Y + \overleftarrow{W}_Z)\,m_X - (\overleftarrow{\xi}_Y + \overleftarrow{\xi}_Z)$   (24)
          $= (\overleftarrow{W}_Y m_Y - \overleftarrow{\xi}_Y) + (\overleftarrow{W}_Z m_Z - \overleftarrow{\xi}_Z)$   (25)
          $= \tilde{\xi}_Y + \tilde{\xi}_Z$.   (26)

Proof of (II.6): We first note
  $\overrightarrow{m}_X - \overleftarrow{m}_X = \overrightarrow{m}_X + \overrightarrow{m}_Y - \overleftarrow{m}_Z$   (27)
                    $= \overrightarrow{m}_Z - \overleftarrow{m}_Z$,   (28)
and (II.6) follows from (II.7).

Proof of (III.7): Using [6, eq. III.9], we have
  $\tilde{\xi}_X = \overleftarrow{W}_X (m_X - \overleftarrow{m}_X)$   (29)
          $= \overleftarrow{W}_X m_X - \overleftarrow{W}_X \overleftarrow{m}_X$   (30)
          $= A^\mathsf{T} \overleftarrow{W}_Y A\,m_X - A^\mathsf{T} \overleftarrow{W}_Y \overleftarrow{m}_Y$   (31)
          $= A^\mathsf{T} \overleftarrow{W}_Y (m_Y - \overleftarrow{m}_Y)$,   (32)
which is $A^\mathsf{T} \tilde{\xi}_Y$.

Proof of (IV.9) and (IV.2): Using (IV.13) and (IV.12), we have
  $m_X = V_X \overrightarrow{\xi}_X + V_X \overleftarrow{\xi}_X$   (33)
       $= \overrightarrow{V}_X \overrightarrow{\xi}_X - \overrightarrow{V}_X \tilde{W}_X \overrightarrow{V}_X \overrightarrow{\xi}_X + \overrightarrow{V}_X \tilde{W}_X \overleftarrow{V}_X \overleftarrow{\xi}_X$   (34)
       $= \overrightarrow{m}_X - \overrightarrow{V}_X \tilde{W}_X (\overrightarrow{m}_X - \overleftarrow{m}_X)$   (35)
       $= \overrightarrow{m}_X - \overrightarrow{V}_X \tilde{\xi}_X$,   (36)
and (IV.2) follows by multiplication with $\overrightarrow{W}_X$.

Proof of (V.9): From (I.2) and (III.4), we have
  $\overrightarrow{W}_Z = \overrightarrow{W}_X + A^\mathsf{T} \overleftarrow{W}_Y A$,   (37)
from which we obtain
  $\overrightarrow{W}_X = \overrightarrow{W}_Z - A^\mathsf{T} \overleftarrow{W}_Y A$   (38)
              $= \overrightarrow{W}_Z F$.   (39)
Thus $\overrightarrow{V}_Z \overrightarrow{W}_X = F$ and
  $\overrightarrow{V}_Z = F\,\overrightarrow{V}_X$.   (40)
On the other hand, we have
  $\overrightarrow{V}_Z = (I - \overrightarrow{V}_X A^\mathsf{T} G A)\,\overrightarrow{V}_X$   (41)
from (V.2), and $F = I - \overrightarrow{V}_X A^\mathsf{T} G A$ follows.

Proof of (V.4): Using (I.7), (III.7), and (IV.3), we have
  $\tilde{\xi}_X = \tilde{\xi}_Z + A^\mathsf{T} \tilde{\xi}_Y$   (42)
          $= \tilde{\xi}_Z + A^\mathsf{T}\big(\overleftarrow{W}_Y m_Y - \overleftarrow{W}_Y \overleftarrow{m}_Y\big)$.   (43)
Using (III.5) and (IV.9), we further have
  $m_Y = A\,m_Z$   (44)
       $= A\,\big(\overrightarrow{m}_Z - \overrightarrow{V}_Z \tilde{\xi}_Z\big)$,   (45)
and inserting (45) into (43) yields (V.4).

Proof of (V.5): We begin with $m_X = m_Z$. Using (IV.9), we have
  $\overrightarrow{m}_X - \overrightarrow{V}_X \tilde{\xi}_X = \overrightarrow{m}_Z - \overrightarrow{V}_Z \tilde{\xi}_Z$   (46)
  $= \overrightarrow{m}_X + \overrightarrow{V}_X A^\mathsf{T} G\,(\overleftarrow{m}_Y - A\overrightarrow{m}_X) - \overrightarrow{V}_X F^\mathsf{T} \tilde{\xi}_Z$,   (47)
where the second step uses (V.1) and $\overrightarrow{V}_Z = F\,\overrightarrow{V}_X = \overrightarrow{V}_X F^\mathsf{T}$ from (40). Subtracting $\overrightarrow{m}_X$ and multiplying by $\overrightarrow{V}_X^{-1}$ yields (V.5).

Proof of (V.7): We begin with $V_X = V_Z$. Using (IV.13), we have
  $\overrightarrow{V}_X - \overrightarrow{V}_X \tilde{W}_X \overrightarrow{V}_X = \overrightarrow{V}_Z - \overrightarrow{V}_Z \tilde{W}_Z \overrightarrow{V}_Z$   (48)
  $= \overrightarrow{V}_X - \overrightarrow{V}_X A^\mathsf{T} G A\,\overrightarrow{V}_X - \overrightarrow{V}_X F^\mathsf{T} \tilde{W}_Z F\,\overrightarrow{V}_X$,   (49)
where the second step uses (V.2) and (40). Subtracting $\overrightarrow{V}_X$ and multiplying by $\overrightarrow{V}_X^{-1}$ yields (V.7).

Proof of (V.6): As we have already established (V.7), we only need to prove
  $A^\mathsf{T} G A = A^\mathsf{T} \overleftarrow{W}_Y A F$.   (50)
Using (V.9), we have
  $A^\mathsf{T} G A = \overrightarrow{W}_X (I - F)$   (51)
             $= \overrightarrow{W}_X \overrightarrow{V}_Z A^\mathsf{T} \overleftarrow{W}_Y A$   (52)
             $= A^\mathsf{T} \overleftarrow{W}_Y A\,\overrightarrow{V}_Z \overrightarrow{W}_X$,   (53)
where the last step follows from $(A^\mathsf{T} G A)^\mathsf{T} = A^\mathsf{T} G A$. Inserting (39) then yields (50).

Proof of (VI.9): From (II.2) and (III.2), we have
  $\overrightarrow{V}_Z = \overrightarrow{V}_X + A\,\overrightarrow{V}_Y A^\mathsf{T}$,   (54)
from which we obtain
  $\overrightarrow{V}_X = \overrightarrow{V}_Z - A\,\overrightarrow{V}_Y A^\mathsf{T}$   (55)
              $= \overrightarrow{V}_Z F$.   (56)
Thus $\overrightarrow{W}_Z \overrightarrow{V}_X = F$ and
  $\overrightarrow{W}_Z = F\,\overrightarrow{W}_X$.   (57)
On the other hand, we have
  $\overrightarrow{W}_Z = (I - \overrightarrow{W}_X A H A^\mathsf{T})\,\overrightarrow{W}_X$   (58)
from (VI.2), and $F = I - \overrightarrow{W}_X A H A^\mathsf{T}$ follows.

Proof of (VI.4): Using (II.3), (III.5), and (IV.9), we have
  $m_X = m_Z - A\,m_Y$   (59)
       $= m_Z - A\,\big(\overrightarrow{m}_Y - \overrightarrow{V}_Y \tilde{\xi}_Y\big)$.   (60)
Using (II.6), (III.7), and (IV.2), we further have
  $\tilde{\xi}_Y = A^\mathsf{T} \tilde{\xi}_Z$   (61)
          $= A^\mathsf{T}\big(\overrightarrow{\xi}_Z - \overrightarrow{W}_Z m_Z\big)$,   (62)
and inserting (62) into (60) yields (VI.4).

Proof of (VI.5): We begin with $\tilde{\xi}_X = \tilde{\xi}_Z$ from (II.6). Using (IV.2), we have
  $\overrightarrow{\xi}_X - \overrightarrow{W}_X m_X = \overrightarrow{\xi}_Z - \overrightarrow{W}_Z m_Z$   (63)
  $= \overrightarrow{\xi}_X + \overrightarrow{W}_X A H\,(\overrightarrow{\xi}_Y - A^\mathsf{T}\overrightarrow{\xi}_X) - \overrightarrow{W}_X F^\mathsf{T} m_Z$,   (64)
where the second step uses (VI.1) and $\overrightarrow{W}_Z = F\,\overrightarrow{W}_X$ from (57). Subtracting $\overrightarrow{\xi}_X$ and multiplying by $\overrightarrow{V}_X$ yields (VI.5).

Proof of (VI.7): We begin with $\tilde{W}_X = \tilde{W}_Z$ from (II.7). Using (IV.6), we have
  $\overrightarrow{W}_X - \overrightarrow{W}_X V_X \overrightarrow{W}_X = \overrightarrow{W}_Z - \overrightarrow{W}_Z V_Z \overrightarrow{W}_Z$   (65)
  $= \overrightarrow{W}_X - \overrightarrow{W}_X A H A^\mathsf{T} \overrightarrow{W}_X - \overrightarrow{W}_X F^\mathsf{T} V_Z F\,\overrightarrow{W}_X$,   (66)
where the second step uses (VI.2) and (57). Subtracting $\overrightarrow{W}_X$ and multiplying by $\overrightarrow{V}_X$ yields (VI.7).

Proof of (VI.6): Since we have already established (VI.7), we only need to prove
  $A H A^\mathsf{T} = A\,\overrightarrow{V}_Y A^\mathsf{T} F$.   (67)
Using (VI.9), we have
  $A H A^\mathsf{T} = \overrightarrow{V}_X (I - F)$   (68)
             $= \overrightarrow{V}_X \overrightarrow{W}_Z A\,\overrightarrow{V}_Y A^\mathsf{T}$   (69)
             $= A\,\overrightarrow{V}_Y A^\mathsf{T} \overrightarrow{W}_Z \overrightarrow{V}_X$,   (70)
where the last step follows from $(A H A^\mathsf{T})^\mathsf{T} = A H A^\mathsf{T}$. Inserting (57) then yields (67).

APPENDIX B
MESSAGE PASSING IN FIGURE 1 AND PROOFS

In this appendix, we demonstrate how all the quantities pertaining to computations mentioned in Sections II and III, as well as the proof of the theorem in Section II, are obtained by symbolic message passing using the tables in Appendix A. The key ideas of this section are from [6, Section V.C]. Throughout this section, $\sigma_1, \ldots, \sigma_K$ are fixed.

A. Key Quantities $\tilde{\xi}_X$ and $\tilde{W}_X$

The pivotal quantities of this section are the dual mean vector $\tilde{\xi}_{\tilde{U}_k}$ and the dual precision matrix $\tilde{W}_{\tilde{U}_k}$. Concerning the former, we have
  $\tilde{\xi}_{\tilde{U}_k} = \tilde{\xi}_{X_{k-1}} = \tilde{\xi}_{X_k} = \tilde{\xi}_Y$   (71)
              $= -\tilde{W}_Y\,y$,   (72)
for $k = 1, \ldots, K$, where (71) follows from (II.6), and (72) follows from
  $\tilde{\xi}_Y = \tilde{W}_Y (\overrightarrow{m}_Y - \overleftarrow{m}_Y)$   (73)
          $= -\tilde{W}_Y\,y$   (74)
since $\overrightarrow{m}_Y = 0$. Concerning $\tilde{W}_{\tilde{U}_k}$, we have
  $\tilde{W}_{\tilde{U}_k} = \tilde{W}_{X_{k-1}} = \tilde{W}_{X_k} = \tilde{W}_Y$   (75)
              $= W$ as defined in (8),   (76)
for $k = 1, \ldots, K$, where (75) follows from (II.7), and (76) follows from
  $\tilde{W}_Y = \big(\overrightarrow{V}_Y + \overleftarrow{V}_Y\big)^{-1}$   (77)
with $\overleftarrow{V}_Y = 0$ and
  $\overrightarrow{V}_Y = \sum_{k=1}^{K} \sigma_k^2 b_k b_k^\mathsf{T} + \sigma^2 I$.   (78)
The matrix W can be computed without a matrix inversion as follows. First, we note that
  $\tilde{W}_{X_0} = \big(\overrightarrow{V}_{X_0} + \overleftarrow{V}_{X_0}\big)^{-1}$   (79)
           $= \big(0 + \overleftarrow{V}_{X_0}\big)^{-1}$   (80)
           $= \overleftarrow{W}_{X_0}$.   (81)
Second, using (VI.2), the matrix $\overleftarrow{W}_{X_{k-1}}$ can be computed by the backward recursion
  $\overleftarrow{W}_{X_{k-1}} = \overleftarrow{W}_{X_k} - \overleftarrow{W}_{X_k} b_k \big(\sigma_k^{-2} + b_k^\mathsf{T} \overleftarrow{W}_{X_k} b_k\big)^{-1} b_k^\mathsf{T} \overleftarrow{W}_{X_k}$   (82)
starting from $\overleftarrow{W}_{X_K} = \sigma^{-2} I$. The complexity of this alternative computation of W is $O(n^2 K)$; by contrast, the direct computation of (8) using Gauss-Jordan elimination for the matrix inversion has complexity $O(n^2 K + n^3)$.

B. Posterior Distribution and MAP Estimate of $U_k$

For fixed $\sigma_1, \ldots, \sigma_K$, the MAP estimate of $U_k$ is the mean $m_{U_k}$ of the Gaussian posterior of $U_k$. From (IV.9) and (III.7), we have
  $m_{U_k} = \overrightarrow{m}_{U_k} - \overrightarrow{V}_{U_k} \tilde{\xi}_{U_k}$   (83)
          $= -\sigma_k^2 b_k^\mathsf{T} \tilde{\xi}_{\tilde{U}_k}$,   (84)
and (72) yields
  $m_{U_k} = \sigma_k^2 b_k^\mathsf{T} W y$,   (85)
which proves (7). For re-estimating the variance $\sigma_k^2$ as in Section III, we also need the variance $\sigma_{U_k}^2$ of the posterior distribution of $U_k$. From (IV.13) and (III.8), we have
  $\sigma_{U_k}^2 = \overrightarrow{V}_{U_k} - \overrightarrow{V}_{U_k} \tilde{W}_{U_k} \overrightarrow{V}_{U_k}$   (86)
          $= \sigma_k^2 - \sigma_k^2 b_k^\mathsf{T} \tilde{W}_{\tilde{U}_k} b_k\,\sigma_k^2$,   (87)
and (76) yields
  $\sigma_{U_k}^2 = \sigma_k^2 - \sigma_k^2 b_k^\mathsf{T} W b_k\,\sigma_k^2$.   (88)

C. Likelihood Function and Backward Message of $U_k$

We now consider the backward message along the edge $U_k$, which is the likelihood function $p(y \mid u_k, \sigma_1, \ldots, \sigma_K)$ (for fixed y and fixed $\sigma_1, \ldots, \sigma_K$), up to a scale factor. For use in Section B-D below, we give two different expressions both for the mean $\overleftarrow{m}_{U_k}$ and for the variance $\overleftarrow{\sigma}_{U_k}^2$ of this message. As to the latter, we have
  $\overleftarrow{W}_{U_k} = b_k^\mathsf{T} \overleftarrow{W}_{\tilde{U}_k} b_k$   (89)
from (III.4), and thus
  $\overleftarrow{\sigma}_{U_k}^2 = \big(b_k^\mathsf{T} \overleftarrow{W}_{\tilde{U}_k} b_k\big)^{-1}$.   (90)

We also note from (II.4) that
  $\overleftarrow{W}_{\tilde{U}_k} = \big(\overleftarrow{V}_{X_k} + \overrightarrow{V}_{X_{k-1}}\big)^{-1}$   (91)
              $= \tilde{W}_k$ as defined in (10).   (92)
Alternatively, we have
  $\overleftarrow{\sigma}_{U_k}^2 = \tilde{W}_{U_k}^{-1} - \overrightarrow{V}_{U_k}$   (93)
              $= \big(b_k^\mathsf{T} \tilde{W}_{\tilde{U}_k} b_k\big)^{-1} - \sigma_k^2$   (94)
              $= \big(b_k^\mathsf{T} W b_k\big)^{-1} - \sigma_k^2$,   (95)
where we used (IV.4), (III.8), and (76).

As to the mean $\overleftarrow{m}_{U_k}$, we have
  $\overleftarrow{\xi}_{U_k} = b_k^\mathsf{T} \overleftarrow{\xi}_{\tilde{U}_k}$   (96)
              $= b_k^\mathsf{T} \overleftarrow{W}_{\tilde{U}_k} \overleftarrow{m}_{\tilde{U}_k}$   (97)
              $= b_k^\mathsf{T} \overleftarrow{W}_{\tilde{U}_k} \big(\overleftarrow{m}_{X_k} - \overrightarrow{m}_{X_{k-1}}\big)$   (98)
              $= b_k^\mathsf{T} \overleftarrow{W}_{\tilde{U}_k}\,y$   (99)
from (III.3) and (II.3), and thus
  $\overleftarrow{m}_{U_k} = \overleftarrow{\sigma}_{U_k}^2 \overleftarrow{\xi}_{U_k}$   (100)
              $= \big(b_k^\mathsf{T} \overleftarrow{W}_{\tilde{U}_k} b_k\big)^{-1} b_k^\mathsf{T} \overleftarrow{W}_{\tilde{U}_k}\,y$   (101)
from (90). Alternatively, we have
  $\overleftarrow{m}_{U_k} = \overrightarrow{m}_{U_k} - \tilde{W}_{U_k}^{-1} \tilde{\xi}_{U_k}$   (102)
              $= -\big(b_k^\mathsf{T} \tilde{W}_{\tilde{U}_k} b_k\big)^{-1} b_k^\mathsf{T} \tilde{\xi}_{\tilde{U}_k}$   (103)
              $= \big(b_k^\mathsf{T} W b_k\big)^{-1} b_k^\mathsf{T} W y$,   (104)
where we used (IV.1), (III.8), (III.7), (72), and (76).

D. Proof of the Theorem in Section II

Let $\sigma_1, \ldots, \sigma_K$ be fixed at a local maximum or at a saddle point of the likelihood $p(y \mid \sigma_1, \ldots, \sigma_K)$. Then
  $\sigma_k = \operatorname*{argmax}_{\sigma_k} p(y \mid \sigma_1, \ldots, \sigma_K)$   (105)
and
  $\sigma_k^2 = \max\big\{0,\ \overleftarrow{m}_{U_k}^2 - \overleftarrow{\sigma}_{U_k}^2\big\}$   (106)
from (2). From (101) and (90), we have
  $\overleftarrow{m}_{U_k}^2 - \overleftarrow{\sigma}_{U_k}^2 = \frac{\big(b_k^\mathsf{T} \overleftarrow{W}_{\tilde{U}_k} y\big)^2}{\big(b_k^\mathsf{T} \overleftarrow{W}_{\tilde{U}_k} b_k\big)^2} - \frac{1}{b_k^\mathsf{T} \overleftarrow{W}_{\tilde{U}_k} b_k}$.   (107)
With (92), it is obvious that $\overleftarrow{m}_{U_k}^2 - \overleftarrow{\sigma}_{U_k}^2 \leq 0$ if and only if (9) holds. As to (11), we have
  $\overleftarrow{m}_{U_k}^2 - \overleftarrow{\sigma}_{U_k}^2 = \frac{\big(b_k^\mathsf{T} W y\big)^2}{\big(b_k^\mathsf{T} W b_k\big)^2} - \frac{1}{b_k^\mathsf{T} W b_k} + \sigma_k^2$   (108)
from (104) and (95). We now distinguish two cases. If $\sigma_k^2 > 0$, (106) and (108) together imply
  $\frac{\big(b_k^\mathsf{T} W y\big)^2}{\big(b_k^\mathsf{T} W b_k\big)^2} - \frac{1}{b_k^\mathsf{T} W b_k} = 0$.   (109)
On the other hand, if $\sigma_k^2 = 0$, (106) and (108) imply
  $\frac{\big(b_k^\mathsf{T} W y\big)^2}{\big(b_k^\mathsf{T} W b_k\big)^2} \leq \frac{1}{b_k^\mathsf{T} W b_k}$.   (110)
Combining these two cases yields (11).

REFERENCES

[1] S. Roweis and Z. Ghahramani, "A unifying review of linear Gaussian models," Neural Computation, vol. 11, pp. 305-345, Feb. 1999.
[2] T. Kailath, A. H. Sayed, and B. Hassibi, Linear Estimation. Prentice Hall, NJ, 2000.
[3] J. Durbin and S. J. Koopman, Time Series Analysis by State Space Methods. Oxford Univ. Press, 2012.
[4] C. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
[5] H.-A. Loeliger, "An introduction to factor graphs," IEEE Signal Proc. Mag., Jan. 2004, pp. 28-41.
[6] H.-A. Loeliger, J. Dauwels, Junli Hu, S. Korl, Li Ping, and F. R. Kschischang, "The factor graph approach to model-based signal processing," Proceedings of the IEEE, vol. 95, no. 6, pp. 1295-1322, June 2007.
[7] D. J. C. MacKay, "Bayesian interpolation," Neural Comp., vol. 4, no. 3, pp. 415-447, 1992.
[8] S. Gull, "Bayesian inductive inference and maximum entropy," in Maximum-Entropy and Bayesian Methods in Science and Engineering, G. J. Erickson and C. R. Smith, eds., Kluwer, 1988.
[9] R. M. Neal, Bayesian Learning for Neural Networks. New York: Springer Verlag, 1996.
[10] M. E. Tipping, "Sparse Bayesian learning and the relevance vector machine," J. Machine Learning Research, vol. 1, pp. 211-244, 2001.
[11] M. E. Tipping and A. C. Faul, "Fast marginal likelihood maximisation for sparse Bayesian models," Proc. 9th Int. Workshop on Artificial Intelligence and Statistics, 2003.
[12] D. Wipf and S. Nagarajan, "A new view of automatic relevance determination," Advances in Neural Information Processing Systems, 2008.
[13] D. P. Wipf and B. D. Rao, "Sparse Bayesian learning for basis selection," IEEE Trans. Signal Proc., vol. 52, no. 8, pp. 2153-2164, Aug. 2004.
[14] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, vol. 39, Series B, pp. 1-38, 1977.
[15] P. Stoica and Y. Selén, "Cyclic minimizers, majorization techniques, and the expectation-maximization algorithm: a refresher," IEEE Signal Proc. Mag., Jan. 2004, pp. 112-114.
[16] Z. Ghahramani and G. E. Hinton, Parameter Estimation for Linear Dynamical Systems. Techn. Report CRG-TR-96-2, Univ. of Toronto, 1996.
[17] J. Dauwels, A. Eckford, S. Korl, and H.-A. Loeliger, "Expectation maximization as message passing — Part I: principles and Gaussian messages," arXiv preprint.
[18] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, "Factor graphs and the sum-product algorithm," IEEE Trans. Information Theory, vol. 47, pp. 498-519, Feb. 2001.
[19] L. Bruderer and H.-A. Loeliger, "Estimation of sensor input signals that are neither bandlimited nor sparse," 2014 Information Theory & Applications Workshop (ITA), San Diego, CA, Feb. 9-14, 2014.
[20] L. Bruderer, Input Estimation and Dynamical System Identification: New Algorithms and Results. PhD thesis, ETH Zurich No. 22575, 2015.
[21] G. J. Bierman, Factorization Methods for Discrete Sequential Estimation. New York: Academic Press, 1977.
[22] L. Bruderer, H. Malmberg, and H.-A. Loeliger, "Deconvolution of weakly-sparse signals and dynamical-system identification by Gaussian message passing," 2015 IEEE Int. Symp. on Information Theory (ISIT), Hong Kong, June 14-19, 2015.
[23] N. Zalmai, H. Malmberg, and H.-A. Loeliger, "Blind deconvolution of sparse but filtered pulses with linear state space models," 41st IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, March 20-25, 2016.
[24] F. Wadehn, L. Bruderer, V. Sahdeva, and H.-A. Loeliger, "Outlier-insensitive Kalman smoothing and marginal message passing," in preparation.
[25] D. L. Donoho, A. Maleki, and A. Montanari, "Message-passing algorithms for compressed sensing," Proc. National Academy of Sciences, vol. 106, no. 45, pp. 18914-18919, 2009.


More information

1. Define the following terms (1 point each): alternative hypothesis

1. Define the following terms (1 point each): alternative hypothesis 1 1. Define the following terms (1 point each): alternative hypothesis One of three hypotheses indicating that the parameter is not zero; one states the parameter is not equal to zero, one states the parameter

More information

Sub-Gaussian Model Based LDPC Decoder for SαS Noise Channels

Sub-Gaussian Model Based LDPC Decoder for SαS Noise Channels Sub-Gaussian Model Based LDPC Decoder for SαS Noise Channels Iulian Topor Acoustic Research Laboratory, Tropical Marine Science Institute, National University of Singapore, Singapore 119227. iulian@arl.nus.edu.sg

More information

SVETLANA KATOK AND ILIE UGARCOVICI (Communicated by Jens Marklof)

SVETLANA KATOK AND ILIE UGARCOVICI (Communicated by Jens Marklof) JOURNAL OF MODERN DYNAMICS VOLUME 4, NO. 4, 010, 637 691 doi: 10.3934/jmd.010.4.637 STRUCTURE OF ATTRACTORS FOR (a, )-CONTINUED FRACTION TRANSFORMATIONS SVETLANA KATOK AND ILIE UGARCOVICI (Communicated

More information

Expectation Propagation in Dynamical Systems

Expectation Propagation in Dynamical Systems Expectation Propagation in Dynamical Systems Marc Peter Deisenroth Joint Work with Shakir Mohamed (UBC) August 10, 2012 Marc Deisenroth (TU Darmstadt) EP in Dynamical Systems 1 Motivation Figure : Complex

More information

FIR Filters for Stationary State Space Signal Models

FIR Filters for Stationary State Space Signal Models Proceedings of the 17th World Congress The International Federation of Automatic Control FIR Filters for Stationary State Space Signal Models Jung Hun Park Wook Hyun Kwon School of Electrical Engineering

More information

Lecture 8: Bayesian Estimation of Parameters in State Space Models

Lecture 8: Bayesian Estimation of Parameters in State Space Models in State Space Models March 30, 2016 Contents 1 Bayesian estimation of parameters in state space models 2 Computational methods for parameter estimation 3 Practical parameter estimation in state space

More information

Human-Oriented Robotics. Temporal Reasoning. Kai Arras Social Robotics Lab, University of Freiburg

Human-Oriented Robotics. Temporal Reasoning. Kai Arras Social Robotics Lab, University of Freiburg Temporal Reasoning Kai Arras, University of Freiburg 1 Temporal Reasoning Contents Introduction Temporal Reasoning Hidden Markov Models Linear Dynamical Systems (LDS) Kalman Filter 2 Temporal Reasoning

More information

Parameter Estimation in a Moving Horizon Perspective

Parameter Estimation in a Moving Horizon Perspective Parameter Estimation in a Moving Horizon Perspective State and Parameter Estimation in Dynamical Systems Reglerteknik, ISY, Linköpings Universitet State and Parameter Estimation in Dynamical Systems OUTLINE

More information

Lecture 6 January 15, 2014

Lecture 6 January 15, 2014 Advanced Graph Algorithms Jan-Apr 2014 Lecture 6 January 15, 2014 Lecturer: Saket Sourah Scrie: Prafullkumar P Tale 1 Overview In the last lecture we defined simple tree decomposition and stated that for

More information

Probabilistic and Bayesian Machine Learning

Probabilistic and Bayesian Machine Learning Probabilistic and Bayesian Machine Learning Day 4: Expectation and Belief Propagation Yee Whye Teh ywteh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit University College London http://www.gatsby.ucl.ac.uk/

More information

Sequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them

Sequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them HMM, MEMM and CRF 40-957 Special opics in Artificial Intelligence: Probabilistic Graphical Models Sharif University of echnology Soleymani Spring 2014 Sequence labeling aking collective a set of interrelated

More information

Minimizing a convex separable exponential function subject to linear equality constraint and bounded variables

Minimizing a convex separable exponential function subject to linear equality constraint and bounded variables Minimizing a convex separale exponential function suect to linear equality constraint and ounded variales Stefan M. Stefanov Department of Mathematics Neofit Rilski South-Western University 2700 Blagoevgrad

More information

DEPARTMENT OF ECONOMICS

DEPARTMENT OF ECONOMICS ISSN 089-64 ISBN 978 0 7340 405 THE UNIVERSITY OF MELBOURNE DEPARTMENT OF ECONOMICS RESEARCH PAPER NUMBER 06 January 009 Notes on the Construction of Geometric Representations of Confidence Intervals of

More information

For final project discussion every afternoon Mark and I will be available

For final project discussion every afternoon Mark and I will be available Worshop report 1. Daniels report is on website 2. Don t expect to write it based on listening to one project (we had 6 only 2 was sufficient quality) 3. I suggest writing it on one presentation. 4. Include

More information

Fractional Belief Propagation

Fractional Belief Propagation Fractional Belief Propagation im iegerinck and Tom Heskes S, niversity of ijmegen Geert Grooteplein 21, 6525 EZ, ijmegen, the etherlands wimw,tom @snn.kun.nl Abstract e consider loopy belief propagation

More information

DIFFUSION NETWORKS, PRODUCT OF EXPERTS, AND FACTOR ANALYSIS. Tim K. Marks & Javier R. Movellan

DIFFUSION NETWORKS, PRODUCT OF EXPERTS, AND FACTOR ANALYSIS. Tim K. Marks & Javier R. Movellan DIFFUSION NETWORKS PRODUCT OF EXPERTS AND FACTOR ANALYSIS Tim K Marks & Javier R Movellan University of California San Diego 9500 Gilman Drive La Jolla CA 92093-0515 ABSTRACT Hinton (in press) recently

More information

Optimal Routing in Chord

Optimal Routing in Chord Optimal Routing in Chord Prasanna Ganesan Gurmeet Singh Manku Astract We propose optimal routing algorithms for Chord [1], a popular topology for routing in peer-to-peer networks. Chord is an undirected

More information

RECURSIVE OUTLIER-ROBUST FILTERING AND SMOOTHING FOR NONLINEAR SYSTEMS USING THE MULTIVARIATE STUDENT-T DISTRIBUTION

RECURSIVE OUTLIER-ROBUST FILTERING AND SMOOTHING FOR NONLINEAR SYSTEMS USING THE MULTIVARIATE STUDENT-T DISTRIBUTION 1 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, SEPT. 3 6, 1, SANTANDER, SPAIN RECURSIVE OUTLIER-ROBUST FILTERING AND SMOOTHING FOR NONLINEAR SYSTEMS USING THE MULTIVARIATE STUDENT-T

More information

Compressive Sensing under Matrix Uncertainties: An Approximate Message Passing Approach

Compressive Sensing under Matrix Uncertainties: An Approximate Message Passing Approach Compressive Sensing under Matrix Uncertainties: An Approximate Message Passing Approach Asilomar 2011 Jason T. Parker (AFRL/RYAP) Philip Schniter (OSU) Volkan Cevher (EPFL) Problem Statement Traditional

More information