Learning the Linear Dynamical System with ASOS ( Approximated Second-Order Statistics )
Learning the Linear Dynamical System with ASOS (Approximated Second-Order Statistics)
James Martens, University of Toronto
June 24, 2010
The Linear Dynamical System model
- a model of vector-valued time-series {y_t ∈ R^{N_y}}_{t=1}^T
- widely applied due to its predictable behavior, easy inference, etc.
- vector-valued hidden states {x_t ∈ R^{N_x}}_{t=1}^T evolve via linear dynamics:
    x_{t+1} = A x_t + ε_t,   A ∈ R^{N_x × N_x},   ε_t ~ N(0, Q)
- observations are generated linearly from the hidden states:
    y_t = C x_t + δ_t,   C ∈ R^{N_y × N_x},   δ_t ~ N(0, R)
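The generative process above is easy to sample from. The following numpy sketch uses small made-up parameter matrices (A, C, Q, R and the dimensions below are illustrative, not values from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions and parameters (not from the talk).
Nx, Ny, T = 2, 3, 100
A = np.array([[0.9, 0.1], [0.0, 0.8]])   # dynamics matrix (stable)
C = rng.standard_normal((Ny, Nx))        # observation matrix
Q = 0.1 * np.eye(Nx)                     # state-noise covariance
R = 0.1 * np.eye(Ny)                     # observation-noise covariance

x = np.zeros((T, Nx))
y = np.zeros((T, Ny))
x[0] = rng.multivariate_normal(np.zeros(Nx), Q)
y[0] = C @ x[0] + rng.multivariate_normal(np.zeros(Ny), R)
for t in range(1, T):
    x[t] = A @ x[t - 1] + rng.multivariate_normal(np.zeros(Nx), Q)  # x_{t+1} = A x_t + eps_t
    y[t] = C @ x[t] + rng.multivariate_normal(np.zeros(Ny), R)      # y_t = C x_t + delta_t
```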
Learning the LDS
- Expectation Maximization (EM)
  - finds a local optimum of the log-likelihood
  - pretty slow: convergence requires many iterations, and the E-step is expensive
- Subspace identification
  - hidden states are estimated directly from the data, and the parameters from these
  - asymptotically unbiased / consistent
  - a non-iterative algorithm, but the solution is not optimal under any objective
  - a good way to initialize EM or other iterative optimizers
Our contribution
- accelerate the EM algorithm by reducing its per-iteration cost to constant time w.r.t. T (the length of the time-series)
- key idea: approximate the inference done in the E-step
- the E-step approximation is unbiased and asymptotically consistent
- it also converges exponentially in L, a meta-parameter that trades off approximation quality against speed (notation change: L is k_lim from the paper)
Learning via EM: the algorithm
EM objective function: at each iteration we maximize the following objective, where θ_n is the current parameter estimate:
    Q_n(θ) = E_{θ_n}[ log p(x, y | θ) | y ] = ∫ p(x | y, θ_n) log p(x, y | θ) dx
E-step
- computes the expectation of log p(x, y | θ) under p(x | y, θ_n)
- uses the classical Kalman filtering/smoothing algorithm
Learning via EM: the algorithm (cont.)
M-step
- maximize the objective Q_n(θ) w.r.t. θ, producing a new estimate θ_{n+1}:
    θ_{n+1} = argmax_θ Q_n(θ)
- very easy: similar to linear regression
Problem
- EM can get very slow when we have lots of data
- mainly due to the call to the expensive Kalman filter/smoother in the E-step: O(N_x^3 T), where T = length of the training time-series and N_x = hidden state dimension
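To make "similar to linear regression" concrete: in the standard LDS M-step the dynamics matrix is updated as A_new = (Σ_t E[x_{t+1} x_t'])(Σ_t E[x_t x_t'])^{-1}, a least-squares regression of x_{t+1} on x_t. The sketch below solves this update with random stand-in statistics (S1 and S0 are hypothetical E-step outputs, not values from the talk):

```python
import numpy as np

rng = np.random.default_rng(2)
Nx = 3

# Hypothetical expected sufficient statistics from the E-step:
#   S1 stands in for sum_t E[x_{t+1} x_t' | y]
#   S0 stands in for sum_t E[x_t x_t' | y]  (positive definite)
M = rng.standard_normal((Nx, Nx))
S0 = M @ M.T + Nx * np.eye(Nx)
S1 = rng.standard_normal((Nx, Nx))

# M-step update for the dynamics matrix: A_new = S1 S0^{-1},
# i.e. solve A_new S0 = S1 (a linear regression normal-equation solve).
A_new = np.linalg.solve(S0.T, S1.T).T
```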
The Kalman filter/smoother
- estimates the hidden-state means and covariances
    x_t^k ≡ E_{θ_n}[ x_t | y^k ],   V_{t,s}^k ≡ Cov_{θ_n}[ x_t, x_s | y^k ]
  for each t ∈ {1, ..., T} and s = t, t + 1
- these are summed over time to obtain the statistics required for the M-step, e.g.:
    Σ_{t=1}^{T−1} E_{θ_n}[ x_{t+1} x_t' | y^T ] = (x^T, x^T)_1 + Σ_{t=1}^{T−1} V_{t+1,t}^T,   where (a, b)_k ≡ Σ_{t=1}^{T−k} a_{t+k} b_t'
- but we only care about these M-statistics, not the individual inferences for each time-step, so let's estimate them directly!
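The lagged sum-of-outer-products statistic (a, b)_k can be computed with a single matrix product. A minimal sketch (the function name lagged_stat is ours):

```python
import numpy as np

def lagged_stat(a, b, k):
    """Second-order statistic (a, b)_k = sum_{t=1}^{T-k} a_{t+k} b_t'.

    a, b: arrays of shape (T, d_a) and (T, d_b); returns a (d_a, d_b) matrix.
    Uses 0-based indexing: pairs a[t+k] with b[t] for t = 0, ..., T-k-1.
    """
    T = a.shape[0]
    return a[k:T].T @ b[:T - k]
```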
Steady-state
- first we need a basic tool from linear systems/control theory: steady-state
- the covariance terms and the filtering and smoothing matrices (denoted K_t and J_t) do not depend on the data y, only on the current parameters
- and they rapidly converge to steady-state values:
    V_{t,t}^T, V_{t,t−1}^T, J_t, K_t → Λ_0, Λ_1, J, K   as min(t, T − t) → ∞
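This rapid convergence is easy to see numerically. The sketch below iterates the standard Riccati recursion for the predicted-state covariance with made-up parameters (not from the talk): the filter gain K_t changes substantially at first and is essentially constant after a few dozen steps.

```python
import numpy as np

# Made-up stable LDS parameters, purely for illustration.
A = np.array([[0.9, 0.05], [0.0, 0.8]])
C = np.eye(2)
Q, R = 0.1 * np.eye(2), 0.1 * np.eye(2)

# Standard Kalman filter covariance recursion; track the gain K_t over time.
P = Q.copy()                                # predicted covariance V_{t|t-1}
gains = []
for _ in range(60):
    K = P @ C.T @ np.linalg.inv(C @ P @ C.T + R)   # gain K_t
    gains.append(K)
    P = A @ (P - K @ C @ P) @ A.T + Q              # next predicted covariance

early = np.linalg.norm(gains[1] - gains[0])   # large change at the start
late = np.linalg.norm(gains[-1] - gains[-2])  # negligible change at the end
```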
- we can approximate the Kalman filter/smoother equations using the steady-state matrices
- this gives the highly simplified recurrences
    x_t = H x_{t−1} + K y_t
    x_t^T = J x_{t+1}^T + P x_t
  where x_t ≡ x_t^t ≡ E_{θ_n}[ x_t | y^t ], H ≡ A − KCA and P ≡ I − JA
- these require only matrix-vector products: no matrix-matrix multiplications or inversions
- we apply the approximate filter/smoother everywhere except the first and last i time-steps, which yields a run-time of O(N_x^2 T + N_x^3 i)
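A sketch of how one might obtain the steady-state gain K (here by iterating the covariance recursion to convergence, one standard choice) and then run the simplified filter recurrence x_t = H x_{t−1} + K y_t with H = A − KCA. The initialization x_0 = K y_0 and the iteration count are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def steady_state_gain(A, C, Q, R, iters=500):
    """Approximate the steady-state Kalman gain K by iterating the
    Riccati recursion for the predicted-state covariance to convergence."""
    P = Q.copy()                        # predicted covariance
    K = P @ C.T
    for _ in range(iters):
        S = C @ P @ C.T + R             # innovation covariance
        K = P @ C.T @ np.linalg.inv(S)  # Kalman gain
        P = A @ (P - K @ C @ P) @ A.T + Q
    return K

def steady_state_filter(y, A, C, K):
    """Run the simplified recurrence x_t = H x_{t-1} + K y_t, H = A - K C A.
    Only matrix-vector products appear inside the loop."""
    H = A - K @ C @ A
    T = y.shape[0]
    x = np.zeros((T, A.shape[0]))
    x[0] = K @ y[0]                     # illustrative initialization
    for t in range(1, T):
        x[t] = H @ x[t - 1] + K @ y[t]
    return x
```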
ASOS: Approximated Second-Order Statistics
- steady-state makes the covariance terms easy to estimate in time independent of T
- we want something similar for the sum-of-products-of-means terms like (x^T, x^T)_0 ≡ Σ_t x_t^T (x_t^T)'
- such sums we will call 2nd-order statistics; the ones needed for the M-step are the M-statistics
- idea #1: derive recursions and equations that relate the 2nd-order statistics of different time-lags
  - time-lag refers to the value of k in (a, b)_k ≡ Σ_{t=1}^{T−k} a_{t+k} b_t'
- idea #2: evaluate these efficiently using approximations
Deriving the 2nd-order recursions/equations: an example
- suppose we wish to find the recursion for (x, y)_k
- the steady-state Kalman recursion for x_{t+k} is: x_{t+k} = H x_{t+k−1} + K y_{t+k}
- right-multiply both sides by y_t' and sum over t:
    (x, y)_k ≡ Σ_{t=1}^{T−k} x_{t+k} y_t' = Σ_{t=1}^{T−k} ( H x_{t+k−1} y_t' + K y_{t+k} y_t' )
- factor out the matrices H and K:
    = H Σ_{t=1}^{T−k} x_{t+k−1} y_t' + K Σ_{t=1}^{T−k} y_{t+k} y_t'
- finally, re-write everything using our special notation for 2nd-order statistics, (a, b)_k ≡ Σ_{t=1}^{T−k} a_{t+k} b_t':
    = H( (x, y)_{k−1} − x_T y_{T−k+1}' ) + K (y, y)_k
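The resulting identity can be checked numerically: whenever the means x_t exactly follow the steady-state recurrence, the recursion holds to machine precision. In this sketch H, K, y, the dimensions, and the initialization are arbitrary random stand-ins:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-ins for the steady-state quantities (not fitted values).
Nx, Ny, T, k = 2, 3, 50, 4
H = 0.5 * rng.standard_normal((Nx, Nx))
K = rng.standard_normal((Nx, Ny))
y = rng.standard_normal((T, Ny))

# Filtered means from the steady-state recurrence x_t = H x_{t-1} + K y_t.
x = np.zeros((T, Nx))
x[0] = K @ y[0]
for t in range(1, T):
    x[t] = H @ x[t - 1] + K @ y[t]

def lagged(a, b, k):
    """(a, b)_k = sum_t a_{t+k} b_t' (0-based indexing)."""
    return a[k:].T @ b[:len(b) - k]

# The derived recursion: (x, y)_k = H((x, y)_{k-1} - x_T y_{T-k+1}') + K (y, y)_k.
# In 0-based indexing, x_T is x[-1] and y_{T-k+1} is y[T - k].
lhs = lagged(x, y, k)
rhs = H @ (lagged(x, y, k - 1) - np.outer(x[-1], y[T - k])) + K @ lagged(y, y, k)
```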
The complete list (don't bother to memorize this)
(here ' denotes transpose, x_t is the steady-state filtered mean, and x_t^T the smoothed mean)
The recursions:
    (y, x)_k = (y, x)_{k+1} H' + ((y, y)_k − y_{1+k} y_1') K' + y_{1+k} x_1'
    (x, y)_k = H( (x, y)_{k−1} − x_T y_{T−k+1}' ) + K (y, y)_k
    (x, x)_k = (x, x)_{k+1} H' + ((x, y)_k − x_{1+k} y_1') K' + x_{1+k} x_1'
    (x, x)_k = H( (x, x)_{k−1} − x_T x_{T−k+1}' ) + K (y, x)_k
    (x^T, y)_k = J (x^T, y)_{k+1} + P( (x, y)_k − x_T y_{T−k}' ) + x_T^T y_{T−k}'
    (x^T, x)_k = J (x^T, x)_{k+1} + P( (x, x)_k − x_T x_{T−k}' ) + x_T^T x_{T−k}'
    (x^T, x^T)_k = ( (x^T, x^T)_{k−1} − x_k^T (x_1^T)' ) J' + (x^T, x)_k P'
    (x^T, x^T)_k = J (x^T, x^T)_{k+1} + P( (x, x^T)_k − x_T (x_{T−k}^T)' ) + x_T^T (x_{T−k}^T)'
The equations:
    (x, x)_k = H (x, x)_k H' + ((x, y)_k − x_{1+k} y_1') K' − H x_T x_{T−k}' H' + K (y, x)_{k+1} H' + x_{1+k} x_1'
    (x^T, x^T)_k = J (x^T, x^T)_k J' + P( (x, x^T)_k − x_T (x_{T−k}^T)' ) − J x_{k+1}^T (x_1^T)' J' + J (x^T, x)_{k+1} P' + x_T^T (x_{T−k}^T)'
Starting the recursions
- noting that statistics of time-lag T + 1 are 0 by definition, we can start the 2nd-order recursions at time-lag T
- but this doesn't get us anywhere: it would be even more expensive than the usual Kalman recursions on the 1st-order terms
- instead, start the recursions at time-lag L with unbiased approximations (the "ASOS approximations"):
    (y, x)_{L+1} ≈ CA( (x, x)_L − x_T x_{T−L}' ),   (x^T, x)_L ≈ (x, x)_L,   (x^T, y)_L ≈ (x, y)_L
- we also need x_t^T for t ∈ {1, 2, ..., L} ∪ {T − L, T − L + 1, ..., T}, but these can be approximated by a separate procedure (see paper)
Why might this be reasonable?
- 2nd-order statistics with large time-lag quantify relationships between variables that are far apart in time
  - these are weaker and less important than relationships between variables that are close in time
- in the steady-state Kalman recursions, information is propagated via multiplication by H and J:
    x_t = H x_{t−1} + K y_t,   x_t^T = J x_{t+1}^T + P x_t
- both of these matrices have spectral radius (denoted σ(·)) less than 1, and so they decay the signal exponentially
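A small numerical illustration of the decay claim, using a made-up H whose spectral radius is 0.6: the norms of successive powers of H shrink rapidly, so a signal multiplied by H at every step is attenuated exponentially.

```python
import numpy as np

# Made-up matrix with spectral radius sigma(H) = 0.6 < 1 (triangular,
# so its eigenvalues are the diagonal entries 0.5 and 0.6).
H = np.array([[0.5, 0.4],
              [0.0, 0.6]])

rho = max(abs(np.linalg.eigvals(H)))   # spectral radius sigma(H)

# Norms of H^p for increasing p: exponential decay of propagated signal.
norms = [np.linalg.norm(np.linalg.matrix_power(H, p)) for p in (1, 5, 20)]
```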
Procedure for estimating the M-statistics
- how do we compute an estimate of the M-statistics consistent with the 2nd-order recursions/equations and the approximations?
- essentially it is just a large linear system of dimension O(N_x^2 L)
- but using a general solver would be far too expensive: O(N_x^6 L^3)
- fortunately, using the special structure of this system, we have developed a (non-trivial) algorithm which is much more efficient
  - the equations can be solved using an efficient iterative algorithm we developed for a generalization of the Sylvester equation
  - evaluating the recursions is then straightforward
- the cost is then just O(N_x^3 L) after (y, y)_k ≡ Σ_t y_{t+k} y_t' has been pre-computed for k = 0, ..., L
First convergence result: our intuition confirmed
- First result: for a fixed θ, the l_2-error in the M-statistics is bounded by a quantity proportional to L^2 λ^{L−1}, where λ = σ(H) = σ(J) < 1 (σ(·) denotes the spectral radius)
- so as L grows, the estimation error for the M-statistics will decay exponentially
- but λ might be close enough to 1 that we would need to make L too big
- fortunately we have a 2nd result which provides a very different type of guarantee
Second convergence result
- Second result: the updates produced by the ASOS procedure are asymptotically consistent with the usual EM updates in the limit as T → ∞
  - assumes the data is generated from the model
- the first result didn't make strong use of any property of the approximations (we could use 0 for each and the result would still hold)
- this second one is justified in the opposite way
  - it makes strong use of the approximations: it follows from convergence of the (1/T)-scaled expected l_2 error of the approximations towards zero
  - the result holds for any value of L
Experiments
- we considered 3 real datasets of varying sizes and dimensionality: evaporator, motion capture, and warship sounds (the table of time-series lengths T and dimensions N_y, N_x did not survive transcription)
- each algorithm was initialized from the same random parameters
- the latent dimension N_x was determined by trial-and-error

Experimental results
[Figure: negative log-likelihood vs. iterations on evaporator, comparing ASOS-EM with L ∈ {5, 10, 20, 35, 75}, SS-EM, and EM; per-iteration timings not recoverable]
[Figure: negative log-likelihood vs. iterations on motion capture, comparing ASOS-EM with L ∈ {20, 30, 50}, SS-EM, and EM]
[Figure: negative log-likelihood vs. iterations on warship sounds, comparing ASOS-EM with L ∈ {150, 300, 850, 1700} and SS-EM]
Conclusion
- we applied steady-state approximations to derive a set of 2nd-order recursions and equations
- we approximated the statistics of time-lag L
- we produced an efficient algorithm for solving the resulting system
- per-iteration run-times:
    EM: O(N_x^3 T)
    SS-EM: O(N_x^2 T + N_x^3 i)
    ASOS-EM: O(N_x^3 k_lim)
- we gave 2 formal convergence results
- and demonstrated significant performance benefits for learning with long time-series
Thank you for your attention
More informationThe Kalman Filter ImPr Talk
The Kalman Filter ImPr Talk Ged Ridgway Centre for Medical Image Computing November, 2006 Outline What is the Kalman Filter? State Space Models Kalman Filter Overview Bayesian Updating of Estimates Kalman
More informationImproving the Convergence of Back-Propogation Learning with Second Order Methods
the of Back-Propogation Learning with Second Order Methods Sue Becker and Yann le Cun, Sept 1988 Kasey Bray, October 2017 Table of Contents 1 with Back-Propagation 2 the of BP 3 A Computationally Feasible
More informationCS281 Section 4: Factor Analysis and PCA
CS81 Section 4: Factor Analysis and PCA Scott Linderman At this point we have seen a variety of machine learning models, with a particular emphasis on models for supervised learning. In particular, we
More informationExpectation maximization tutorial
Expectation maximization tutorial Octavian Ganea November 18, 2016 1/1 Today Expectation - maximization algorithm Topic modelling 2/1 ML & MAP Observed data: X = {x 1, x 2... x N } 3/1 ML & MAP Observed
More informationState-Space Methods for Inferring Spike Trains from Calcium Imaging
State-Space Methods for Inferring Spike Trains from Calcium Imaging Joshua Vogelstein Johns Hopkins April 23, 2009 Joshua Vogelstein (Johns Hopkins) State-Space Calcium Imaging April 23, 2009 1 / 78 Outline
More informationParameter Estimation in the Spatio-Temporal Mixed Effects Model Analysis of Massive Spatio-Temporal Data Sets
Parameter Estimation in the Spatio-Temporal Mixed Effects Model Analysis of Massive Spatio-Temporal Data Sets Matthias Katzfuß Advisor: Dr. Noel Cressie Department of Statistics The Ohio State University
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate
More informationCOMS 4721: Machine Learning for Data Science Lecture 16, 3/28/2017
COMS 4721: Machine Learning for Data Science Lecture 16, 3/28/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University SOFT CLUSTERING VS HARD CLUSTERING
More informationAn analogy from Calculus: limits
COMP 250 Fall 2018 35 - big O Nov. 30, 2018 We have seen several algorithms in the course, and we have loosely characterized their runtimes in terms of the size n of the input. We say that the algorithm
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Lecture 11 CRFs, Exponential Family CS/CNS/EE 155 Andreas Krause Announcements Homework 2 due today Project milestones due next Monday (Nov 9) About half the work should
More informationLECTURE NOTE #3 PROF. ALAN YUILLE
LECTURE NOTE #3 PROF. ALAN YUILLE 1. Three Topics (1) Precision and Recall Curves. Receiver Operating Characteristic Curves (ROC). What to do if we do not fix the loss function? (2) The Curse of Dimensionality.
More informationStatistics 910, #15 1. Kalman Filter
Statistics 910, #15 1 Overview 1. Summary of Kalman filter 2. Derivations 3. ARMA likelihoods 4. Recursions for the variance Kalman Filter Summary of Kalman filter Simplifications To make the derivations
More informationUnsupervised Learning
2018 EE448, Big Data Mining, Lecture 7 Unsupervised Learning Weinan Zhang Shanghai Jiao Tong University http://wnzhang.net http://wnzhang.net/teaching/ee448/index.html ML Problem Setting First build and
More informationPILCO: A Model-Based and Data-Efficient Approach to Policy Search
PILCO: A Model-Based and Data-Efficient Approach to Policy Search (M.P. Deisenroth and C.E. Rasmussen) CSC2541 November 4, 2016 PILCO Graphical Model PILCO Probabilistic Inference for Learning COntrol
More informationComputer Intensive Methods in Mathematical Statistics
Computer Intensive Methods in Mathematical Statistics Department of mathematics johawes@kth.se Lecture 16 Advanced topics in computational statistics 18 May 2017 Computer Intensive Methods (1) Plan of
More informationBias-Variance Tradeoff
What s learning, revisited Overfitting Generative versus Discriminative Logistic Regression Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University September 19 th, 2007 Bias-Variance Tradeoff
More informationAsymptotic and finite-sample properties of estimators based on stochastic gradients
Asymptotic and finite-sample properties of estimators based on stochastic gradients Panagiotis (Panos) Toulis panos.toulis@chicagobooth.edu Econometrics and Statistics University of Chicago, Booth School
More informationLatent Variable Models and Expectation Maximization
Latent Variable Models and Expectation Maximization Oliver Schulte - CMPT 726 Bishop PRML Ch. 9 2 4 6 8 1 12 14 16 18 2 4 6 8 1 12 14 16 18 5 1 15 2 25 5 1 15 2 25 2 4 6 8 1 12 14 2 4 6 8 1 12 14 5 1 15
More informationRegression with correlation for the Sales Data
Regression with correlation for the Sales Data Scatter with Loess Curve Time Series Plot Sales 30 35 40 45 Sales 30 35 40 45 0 10 20 30 40 50 Week 0 10 20 30 40 50 Week Sales Data What is our goal with
More informationOutline lecture 6 2(35)
Outline lecture 35 Lecture Expectation aximization E and clustering Thomas Schön Division of Automatic Control Linöping University Linöping Sweden. Email: schon@isy.liu.se Phone: 13-1373 Office: House
More informationParametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012
Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood
More informationStatistical Techniques in Robotics (16-831, F12) Lecture#17 (Wednesday October 31) Kalman Filters. Lecturer: Drew Bagnell Scribe:Greydon Foil 1
Statistical Techniques in Robotics (16-831, F12) Lecture#17 (Wednesday October 31) Kalman Filters Lecturer: Drew Bagnell Scribe:Greydon Foil 1 1 Gauss Markov Model Consider X 1, X 2,...X t, X t+1 to be
More informationExpectation Maximization
Expectation Maximization Bishop PRML Ch. 9 Alireza Ghane c Ghane/Mori 4 6 8 4 6 8 4 6 8 4 6 8 5 5 5 5 5 5 4 6 8 4 4 6 8 4 5 5 5 5 5 5 µ, Σ) α f Learningscale is slightly Parameters is slightly larger larger
More information1 Kalman Filter Introduction
1 Kalman Filter Introduction You should first read Chapter 1 of Stochastic models, estimation, and control: Volume 1 by Peter S. Maybec (available here). 1.1 Explanation of Equations (1-3) and (1-4) Equation
More informationMixtures of Gaussians. Sargur Srihari
Mixtures of Gaussians Sargur srihari@cedar.buffalo.edu 1 9. Mixture Models and EM 0. Mixture Models Overview 1. K-Means Clustering 2. Mixtures of Gaussians 3. An Alternative View of EM 4. The EM Algorithm
More informationTechnical Details about the Expectation Maximization (EM) Algorithm
Technical Details about the Expectation Maximization (EM Algorithm Dawen Liang Columbia University dliang@ee.columbia.edu February 25, 2015 1 Introduction Maximum Lielihood Estimation (MLE is widely used
More informationLecture 5: Logistic Regression. Neural Networks
Lecture 5: Logistic Regression. Neural Networks Logistic regression Comparison with generative models Feed-forward neural networks Backpropagation Tricks for training neural networks COMP-652, Lecture
More informationLecture 10. Announcement. Mixture Models II. Topics of This Lecture. This Lecture: Advanced Machine Learning. Recap: GMMs as Latent Variable Models
Advanced Machine Learning Lecture 10 Mixture Models II 30.11.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ Announcement Exercise sheet 2 online Sampling Rejection Sampling Importance
More informationLecture 4: Types of errors. Bayesian regression models. Logistic regression
Lecture 4: Types of errors. Bayesian regression models. Logistic regression A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting more generally COMP-652 and ECSE-68, Lecture
More informationParticle Filtering Approaches for Dynamic Stochastic Optimization
Particle Filtering Approaches for Dynamic Stochastic Optimization John R. Birge The University of Chicago Booth School of Business Joint work with Nicholas Polson, Chicago Booth. JRBirge I-Sim Workshop,
More informationThe Multivariate Gaussian Distribution [DRAFT]
The Multivariate Gaussian Distribution DRAFT David S. Rosenberg Abstract This is a collection of a few key and standard results about multivariate Gaussian distributions. I have not included many proofs,
More informationLatent Variable Models and Expectation Maximization
Latent Variable Models and Expectation Maximization Oliver Schulte - CMPT 726 Bishop PRML Ch. 9 2 4 6 8 1 12 14 16 18 2 4 6 8 1 12 14 16 18 5 1 15 2 25 5 1 15 2 25 2 4 6 8 1 12 14 2 4 6 8 1 12 14 5 1 15
More information14 : Theory of Variational Inference: Inner and Outer Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2014 14 : Theory of Variational Inference: Inner and Outer Approximation Lecturer: Eric P. Xing Scribes: Yu-Hsin Kuo, Amos Ng 1 Introduction Last lecture
More informationGaussian Mixture Models
Gaussian Mixture Models David Rosenberg, Brett Bernstein New York University April 26, 2017 David Rosenberg, Brett Bernstein (New York University) DS-GA 1003 April 26, 2017 1 / 42 Intro Question Intro
More informationLecture 4: Probabilistic Learning
DD2431 Autumn, 2015 1 Maximum Likelihood Methods Maximum A Posteriori Methods Bayesian methods 2 Classification vs Clustering Heuristic Example: K-means Expectation Maximization 3 Maximum Likelihood Methods
More informationLecture 3: More on regularization. Bayesian vs maximum likelihood learning
Lecture 3: More on regularization. Bayesian vs maximum likelihood learning L2 and L1 regularization for linear estimators A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting
More informationInternal Covariate Shift Batch Normalization Implementation Experiments. Batch Normalization. Devin Willmott. University of Kentucky.
Batch Normalization Devin Willmott University of Kentucky October 23, 2017 Overview 1 Internal Covariate Shift 2 Batch Normalization 3 Implementation 4 Experiments Covariate Shift Suppose we have two distributions,
More informationPMR Learning as Inference
Outline PMR Learning as Inference Probabilistic Modelling and Reasoning Amos Storkey Modelling 2 The Exponential Family 3 Bayesian Sets School of Informatics, University of Edinburgh Amos Storkey PMR Learning
More informationLecture 4: Hidden Markov Models: An Introduction to Dynamic Decision Making. November 11, 2010
Hidden Lecture 4: Hidden : An Introduction to Dynamic Decision Making November 11, 2010 Special Meeting 1/26 Markov Model Hidden When a dynamical system is probabilistic it may be determined by the transition
More informationCase Study 1: Estimating Click Probabilities. Kakade Announcements: Project Proposals: due this Friday!
Case Study 1: Estimating Click Probabilities Intro Logistic Regression Gradient Descent + SGD Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade April 4, 017 1 Announcements:
More information4 Derivations of the Discrete-Time Kalman Filter
Technion Israel Institute of Technology, Department of Electrical Engineering Estimation and Identification in Dynamical Systems (048825) Lecture Notes, Fall 2009, Prof N Shimkin 4 Derivations of the Discrete-Time
More informationHidden Markov Models. Aarti Singh Slides courtesy: Eric Xing. Machine Learning / Nov 8, 2010
Hidden Markov Models Aarti Singh Slides courtesy: Eric Xing Machine Learning 10-701/15-781 Nov 8, 2010 i.i.d to sequential data So far we assumed independent, identically distributed data Sequential data
More informationGaussian Mixture Models
Gaussian Mixture Models Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 Some slides courtesy of Eric Xing, Carlos Guestrin (One) bad case for K- means Clusters may overlap Some
More informationPattern Recognition and Machine Learning. Bishop Chapter 9: Mixture Models and EM
Pattern Recognition and Machine Learning Chapter 9: Mixture Models and EM Thomas Mensink Jakob Verbeek October 11, 27 Le Menu 9.1 K-means clustering Getting the idea with a simple example 9.2 Mixtures
More informationLecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods.
Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods. Linear models for classification Logistic regression Gradient descent and second-order methods
More informationPerformance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project
Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Devin Cornell & Sushruth Sastry May 2015 1 Abstract In this article, we explore
More informationParticle Filtering for Data-Driven Simulation and Optimization
Particle Filtering for Data-Driven Simulation and Optimization John R. Birge The University of Chicago Booth School of Business Includes joint work with Nicholas Polson. JRBirge INFORMS Phoenix, October
More informationDiscrete Mathematics and Probability Theory Fall 2015 Lecture 21
CS 70 Discrete Mathematics and Probability Theory Fall 205 Lecture 2 Inference In this note we revisit the problem of inference: Given some data or observations from the world, what can we infer about
More informationChapter 3. Tohru Katayama
Subspace Methods for System Identification Chapter 3 Tohru Katayama Subspace Methods Reading Group UofA, Edmonton Barnabás Póczos May 14, 2009 Preliminaries before Linear Dynamical Systems Hidden Markov
More informationSequential Monte Carlo and Particle Filtering. Frank Wood Gatsby, November 2007
Sequential Monte Carlo and Particle Filtering Frank Wood Gatsby, November 2007 Importance Sampling Recall: Let s say that we want to compute some expectation (integral) E p [f] = p(x)f(x)dx and we remember
More informationWhy should you care about the solution strategies?
Optimization Why should you care about the solution strategies? Understanding the optimization approaches behind the algorithms makes you more effectively choose which algorithm to run Understanding the
More informationExpectation Propagation in Dynamical Systems
Expectation Propagation in Dynamical Systems Marc Peter Deisenroth Joint Work with Shakir Mohamed (UBC) August 10, 2012 Marc Deisenroth (TU Darmstadt) EP in Dynamical Systems 1 Motivation Figure : Complex
More informationExpectation Maximization (EM)
Expectation Maximization (EM) The Expectation Maximization (EM) algorithm is one approach to unsupervised, semi-supervised, or lightly supervised learning. In this kind of learning either no labels are
More informationOPTIMIZATION METHODS IN DEEP LEARNING
Tutorial outline OPTIMIZATION METHODS IN DEEP LEARNING Based on Deep Learning, chapter 8 by Ian Goodfellow, Yoshua Bengio and Aaron Courville Presented By Nadav Bhonker Optimization vs Learning Surrogate
More informationRobust Monte Carlo Methods for Sequential Planning and Decision Making
Robust Monte Carlo Methods for Sequential Planning and Decision Making Sue Zheng, Jason Pacheco, & John Fisher Sensing, Learning, & Inference Group Computer Science & Artificial Intelligence Laboratory
More informationCS181 Midterm 2 Practice Solutions
CS181 Midterm 2 Practice Solutions 1. Convergence of -Means Consider Lloyd s algorithm for finding a -Means clustering of N data, i.e., minimizing the distortion measure objective function J({r n } N n=1,
More informationTowards a Bayesian model for Cyber Security
Towards a Bayesian model for Cyber Security Mark Briers (mbriers@turing.ac.uk) Joint work with Henry Clausen and Prof. Niall Adams (Imperial College London) 27 September 2017 The Alan Turing Institute
More informationEngineering Part IIB: Module 4F10 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers
Engineering Part IIB: Module 4F0 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers Phil Woodland: pcw@eng.cam.ac.uk Michaelmas 202 Engineering Part IIB:
More information