
1 Dynamic Bayesian Networks A Whirlwind Tour Johannes Traa Computational Audio Lab, UIUC

2 Sequential data is everywhere: speech waveforms, Bush's approval rating, EEG brain signals, financial trends.

3 What's a DBN?
Dynamic Bayesian Network: a probabilistic graphical model for analyzing time series data.
Also called:
o Time series model
o Dynamic belief network
o State space model (SSM)
Useful for:
o Tracking (e.g. sound source tracking, control systems)
o Prediction (e.g. stock market forecasting, collision prevention)
o Interpolation (e.g. sample recovery in audio/video)
o Sequence classification (e.g. speech recognition)
o Sequence clustering
o And more

4 Roadmap
Some common DBN architectures:
o One layer: Markov model / autoregressive (AR) model
o Two layers: hidden Markov model (HMM) / linear dynamical system (LDS)
o Three layers: switching and factorial HMM/LDS
Problems we can solve for DBNs:
o Evaluation (data likelihood)
o Inference (hidden states): Viterbi algorithm, sequential inference (Kalman and particle filters), variational inference, Gibbs sampling
o Learning (DBN parameters)
o Structure learning (DBN architecture)

5 Common Network Architectures One Layer Two Layer Three Layer

6 Markov Model (Common DBNs)
Graphical Model: z_{t-1} → z_t → z_{t+1}
State transition diagram with transition matrix
A = [ a_11 a_12 a_13 ; a_21 a_22 a_23 ; a_31 a_32 a_33 ]
System equation (discrete state space):
z_t = A z_{t-1}        (DSP people like this)
or  z_t ~ P(A z_{t-1})   (ML people like this)
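A minimal Python/NumPy sketch of sampling a state sequence from such a discrete Markov model; the 3-state transition matrix, initial distribution, and chain length below are made-up illustration values, not taken from the slides.

```python
import numpy as np

def sample_markov_chain(A, pi, T, rng=None):
    """Sample a length-T state sequence from a discrete Markov model.

    A  : (K, K) transition matrix, A[i, j] = P(z_t = j | z_{t-1} = i)
    pi : (K,) initial state distribution
    """
    rng = np.random.default_rng() if rng is None else rng
    K = len(pi)
    z = np.empty(T, dtype=int)
    z[0] = rng.choice(K, p=pi)
    for t in range(1, T):
        z[t] = rng.choice(K, p=A[z[t - 1]])
    return z

# Illustrative sticky 3-state chain
A = np.array([[0.90, 0.05, 0.05],
              [0.05, 0.90, 0.05],
              [0.05, 0.05, 0.90]])
pi = np.array([1.0, 0.0, 0.0])
print(sample_markov_chain(A, pi, T=20))
```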

7 Markov Model (Common DBNs)
State sequence = path through a trellis: the same transition matrix A = [ a_11 a_12 a_13 ; a_21 a_22 a_23 ; a_31 a_32 a_33 ] connects the states between every pair of consecutive time steps t-1, t, t+1, t+2.

8 Markov Model (Common DBNs)
Monophonic piano Markov model:
o State = piano note (1 = high, 2 = low, 3 = middle)
o Time = spectrogram frame
o Transition matrix A and initial distribution (numeric values shown on the slide; meaningless without units)
o Sampled state (note) sequence, and what it sounds like when the notes are played (audio example on the slide)

9 Vector Auto-Regressive (VAR) Model (Common DBNs)
Graphical Model: z_{t-1} → z_t → z_{t+1} (continuous state space)
System equation:
z_t = Σ_{i=1}^{p} A_i z_{t-i} + u_t,   u_t ~ N(0, Σ)
Stacking the p lags gives a first-order (companion) form:
[ z_t ; z_{t-1} ; … ; z_{t-p+1} ] = [ A_1 A_2 … A_p ; I 0 … 0 ; … ; 0 … I 0 ] [ z_{t-1} ; z_{t-2} ; … ; z_{t-p} ] + [ u_t ; 0 ; … ; 0 ]
i.e.  z̃_t = Ã z̃_{t-1} + ũ_t,   or equivalently   z̃_t ~ N(Ã z̃_{t-1}, Σ̃)
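A small NumPy sketch of simulating a VAR(p) process through its companion form; the coefficient matrices and noise covariance below are made-up illustration values.

```python
import numpy as np

def var_companion(A_list):
    """Stack VAR(p) coefficient matrices A_1..A_p into the companion matrix A_tilde."""
    p = len(A_list)
    d = A_list[0].shape[0]
    A_tilde = np.zeros((d * p, d * p))
    A_tilde[:d, :] = np.hstack(A_list)        # top block row: [A_1 A_2 ... A_p]
    A_tilde[d:, :-d] = np.eye(d * (p - 1))    # shifted identity carries the lags
    return A_tilde

def simulate_var(A_list, Sigma, T, rng=None):
    """Simulate z_1..z_T from z_t = sum_i A_i z_{t-i} + u_t, u_t ~ N(0, Sigma)."""
    rng = np.random.default_rng() if rng is None else rng
    p, d = len(A_list), A_list[0].shape[0]
    A_tilde = var_companion(A_list)
    z_stack = np.zeros(d * p)                 # stacked state [z_{t-1}; ...; z_{t-p}]
    out = np.zeros((T, d))
    for t in range(T):
        u = rng.multivariate_normal(np.zeros(d), Sigma)
        z_stack = A_tilde @ z_stack
        z_stack[:d] += u                      # noise only enters the newest block
        out[t] = z_stack[:d]
    return out

# Illustrative 2-D VAR(2)
A1 = np.array([[0.5, 0.1], [0.0, 0.4]])
A2 = np.array([[0.2, 0.0], [0.1, 0.2]])
print(simulate_var([A1, A2], Sigma=0.1 * np.eye(2), T=5).round(3))
```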

10 Hidden Markov Model (HMM) / Linear Dynamical System (LDS) (Common DBNs)
Graphical Model: hidden state sequence z_{t-1} → z_t → z_{t+1}; each z_t emits an observation, giving the observation sequence x_{t-1}, x_t, x_{t+1}.
System equations:
z_t = A z_{t-1} + u_t      or   z_t ~ P(A z_{t-1}, Σ_u)
x_t = B z_t + v_t          or   x_t ~ P(B z_t, Σ_v)
Discrete states with (e.g. Gaussian) emissions give the HMM; Gaussian states with Gaussian emissions give the LDS.
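A minimal sketch of sampling from an HMM of this form with 1-D Gaussian emissions; the transition matrix, emission means, and noise scale are made-up illustration values.

```python
import numpy as np

def sample_hmm(A, pi, means, sigma, T, rng=None):
    """Sample (z_1..z_T, x_1..x_T) from an HMM with 1-D Gaussian emissions.

    A     : (K, K) transition matrix, rows sum to 1
    pi    : (K,)  initial state distribution
    means : (K,)  emission mean for each state
    sigma : emission standard deviation
    """
    rng = np.random.default_rng() if rng is None else rng
    K = len(pi)
    z = np.empty(T, dtype=int)
    x = np.empty(T)
    z[0] = rng.choice(K, p=pi)
    x[0] = rng.normal(means[z[0]], sigma)
    for t in range(1, T):
        z[t] = rng.choice(K, p=A[z[t - 1]])    # hidden Markov chain
        x[t] = rng.normal(means[z[t]], sigma)  # emission given the hidden state
    return z, x

A = np.array([[0.95, 0.05], [0.10, 0.90]])
z, x = sample_hmm(A, pi=[0.5, 0.5], means=[-2.0, 2.0], sigma=0.5, T=10)
print(z, x.round(2))
```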

11 Hidden Markov Model (HMM) (Common DBNs)
Piano HMM:
o State = piano note (3 notes)
o Observation = spectrum
o Time = spectrogram frame
o Transition matrix A and initial distribution (numeric values shown on the slide)
o Sampled state (note) sequence and observation sequence: (next slide)

12 Hidden Markov Model (HMM) (Common DBNs)
o Observation sequence. [Figure: spectrogram, frequency vs. time]

13 Switching HMM/LDS (Common DBNs)
Graphical Model: hidden regime sequence s_{t-1} → s_t → s_{t+1}, hidden state sequence z_{t-1} → z_t → z_{t+1} (each z_t also depends on s_t), and observed sequence x_{t-1}, x_t, x_{t+1}.
System equations:
s_t = C s_{t-1}                     or   s_t ~ P(C s_{t-1})
z_t = A_{s_t} z_{t-1} + u_{t,s_t}   or   z_t ~ P(A_{s_t} z_{t-1}, Σ_{u,s_t})
x_t = B_{s_t} z_t + v_{t,s_t}       or   x_t ~ P(B_{s_t} z_t, Σ_{v,s_t})

14 Switching HMM/LDS (Common DBNs)
Switching HMM = HMM with special structure on the joint (regime, state) space.
Global transition matrix (block (j, k) couples regime j to regime k):
Ã = [ C_11 A_1  C_12 A_2  …  C_1K A_K ;
      C_21 A_1  C_22 A_2  …  C_2K A_K ;
      … ;
      C_K1 A_1  C_K2 A_2  …  C_KK A_K ]
Global emission matrix: B̃ = [ B_1 B_2 … B_K ]
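Assuming the block convention above (block (j, k) of Ã equals C[j, k] · A_k), here is a short NumPy sketch that assembles the global transition matrix of a switching HMM; the regime and per-regime matrices are made-up illustration values.

```python
import numpy as np

def global_transition(C, A_list):
    """Build the joint (regime, state) transition matrix of a switching HMM.

    C      : (K, K) regime transition matrix
    A_list : list of K per-regime state transition matrices, each (M, M)
    Block (j, k) of the result is C[j, k] * A_list[k].
    """
    K = C.shape[0]
    M = A_list[0].shape[0]
    A_tilde = np.zeros((K * M, K * M))
    for j in range(K):
        for k in range(K):
            A_tilde[j * M:(j + 1) * M, k * M:(k + 1) * M] = C[j, k] * A_list[k]
    return A_tilde

C = np.array([[0.99, 0.01], [0.02, 0.98]])
A1 = np.array([[0.9, 0.1], [0.1, 0.9]])
A2 = np.array([[0.5, 0.5], [0.5, 0.5]])
A_tilde = global_transition(C, [A1, A2])
print(A_tilde.sum(axis=1))   # rows of the joint matrix still sum to 1
```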

15 Switching HMM (Common DBNs)
Piano-violin SHMM:
o Regime = instrument
o State = piano/violin note (8 each)
o Observation = spectrum
o Time = spectrogram frame
o Regime transition matrix C and per-regime state transition matrices A_k (numeric values shown on the slide)

16 Switching HMM (Common DBNs)
o Observation sequence. [Figure: spectrogram, frequency vs. time]

17 Factorial HMM (FHMM) (Common DBNs)
Graphical Model: two (or more) independent hidden state chains, z^1_{t-1} → z^1_t → z^1_{t+1} and z^2_{t-1} → z^2_t → z^2_{t+1}, jointly generating the observed sequence x_{t-1}, x_t, x_{t+1}.
System equations:
z^1_t = A^1 z^1_{t-1}                    or   z^1_t ~ P(A^1 z^1_{t-1})
z^2_t = A^2 z^2_{t-1}                    or   z^2_t ~ P(A^2 z^2_{t-1})
x_t = f(B^1 z^1_t, B^2 z^2_t) + u_t      or   x_t ~ P(f(B^1 z^1_t, B^2 z^2_t), Σ)
The emission density reflects the interaction between the state chains.
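A minimal sketch of sampling from a two-chain FHMM, assuming the additive Gaussian emission f(·) = B^1 z^1_t + B^2 z^2_t that appears later in the talk; all numeric values are made-up illustration choices.

```python
import numpy as np

def sample_fhmm(A_list, pi_list, B_list, Sigma, T, rng=None):
    """Sample from an FHMM with additive Gaussian emissions.

    A_list  : per-chain transition matrices, each (M_k, M_k)
    pi_list : per-chain initial distributions
    B_list  : per-chain emission matrices, each (D, M_k); column m is the
              mean contribution of chain k being in state m
    Sigma   : (D, D) shared emission covariance
    """
    rng = np.random.default_rng() if rng is None else rng
    K, D = len(A_list), B_list[0].shape[0]
    states = [np.empty(T, dtype=int) for _ in range(K)]
    X = np.empty((T, D))
    for t in range(T):
        mean = np.zeros(D)
        for k in range(K):
            if t == 0:
                states[k][t] = rng.choice(len(pi_list[k]), p=pi_list[k])
            else:
                states[k][t] = rng.choice(A_list[k].shape[0], p=A_list[k][states[k][t - 1]])
            mean += B_list[k][:, states[k][t]]   # chains combine additively
        X[t] = rng.multivariate_normal(mean, Sigma)
    return states, X

A = np.array([[0.9, 0.1], [0.1, 0.9]])
B1 = np.array([[0.0, 1.0], [0.0, 0.0]])
B2 = np.array([[0.0, 0.0], [0.0, 1.0]])
states, X = sample_fhmm([A, A], [[0.5, 0.5], [0.5, 0.5]], [B1, B2], 0.01 * np.eye(2), T=5)
print(states, X.round(2))
```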

18 Factorial HMM (FHMM) (Common DBNs)
Factorial HMM = HMM with special structure:
o The global transition matrix over all state pairs is M^2 x M^2
o A joint state sequence = a path through the big trellis of state pairs (1,1), (1,2), …, (3,3) across time steps t-1, t, t+1

19 Factorial HMM (FHMM) (Common DBNs)
o Observation sequence. [Figure: spectrogram, frequency vs. time]

20 Many more DBNs to choose from (Common DBNs)
o N-gram model: discrete version of the AR model
o Gaussian mixture HMM: GMM in the emission
o Gaussian sum LDS: GMM in the state
o Auto-regressive HMM, Input-Output HMM: emissions are inter-dependent
o Hierarchical HMM: each state contains another HMM
o Mixture of HMMs: samples are sequences
o Infinite HMM: number of states/emissions can increase over time
o Non-negative dynamical system: multiplicative noise in the system equations
Figures from Dynamic Bayesian Networks (Murphy)

21 Hierarchical HMM (Common DBNs). Figure from Murphy's thesis.

22 Common DBNs General network structure Components of a DBN: o Nodes (variables): Observed Hidden Parameters o Directed edges (interactions) o Probability distributions (of the variables) Model fully specified by system equations: o (1) State transition dynamics o (2) Emission dynamics o (3) Initial conditions

23 Problems we can solve for DBNs Evaluation Inference Learning

24 We got problems
Evaluation
o What is the likelihood that the data was generated by my DBN?
o Solution: sum-product (forward pass)
Inference
o What is (are) the most likely state sequence(s) given the data?
o Solutions: max-product (e.g. Viterbi), filtering (e.g. Kalman/particle filters), forward-backward (e.g. Kalman smoother), variational methods, sampling methods (e.g. Gibbs, Metropolis-Hastings)
Learning (the same machinery applies for a Bayesian treatment of the parameters)
o What are the most likely parameters given the data?
o Solutions: EM (e.g. Baum-Welch; uses inference as a sub-routine), method of moments (e.g. spectral learning)
Structure learning
o What is the most likely graph given the data?
o Solutions:

25 #1: Evaluation
Compute the data likelihood (recursively):
o Use the conditional independence properties of the directed graph (z_{t-1} → z_t → z_{t+1} with emissions x_t)
o Sum-product (forward pass; average over the states)
P(x_{1:T}) = Σ_{z_{1:T}} P(x_{1:T}, z_{1:T})
           = Σ_{z_{1:T}} P(x_T | z_T) P(z_T | z_{T-1}) P(x_{T-1} | z_{T-1}) ⋯ P(z_2 | z_1) P(x_1 | z_1) P(z_1)
           = Σ_{z_T} P(x_T | z_T) Σ_{z_{T-1}} [ P(z_T | z_{T-1}) P(x_{T-1} | z_{T-1}) Σ_{z_{T-2}} ⋯ Σ_{z_1} [ P(z_2 | z_1) P(x_1 | z_1) P(z_1) ] ]
In matrix form (discrete states): P(x_{1:T}) = 1^T diag(o_T) A diag(o_{T-1}) A ⋯ diag(o_1) π, where o_t is the vector of emission likelihoods P(x_t | z_t = k) and π is the initial state distribution.
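A minimal NumPy sketch of this forward pass for a discrete-state HMM with 1-D Gaussian emissions, with per-step rescaling so that it returns the log-likelihood; the model and data below are made-up illustration values.

```python
import numpy as np
from scipy.stats import norm

def hmm_log_likelihood(x, A, pi, means, sigma):
    """Forward pass (sum-product) for a discrete-state HMM with 1-D Gaussian emissions.

    Returns log P(x_1..x_T). alpha is rescaled at every step to avoid underflow.
    """
    T = len(x)
    loglik = 0.0
    alpha = np.asarray(pi, dtype=float)
    for t in range(T):
        o_t = norm.pdf(x[t], loc=means, scale=sigma)   # emission likelihoods P(x_t | z_t = k)
        if t > 0:
            alpha = A.T @ alpha                        # predict: sum over previous states
        alpha = alpha * o_t                            # correct: weight by the emission
        c = alpha.sum()
        loglik += np.log(c)
        alpha /= c                                     # rescale (keeps numbers well-behaved)
    return loglik

A = np.array([[0.95, 0.05], [0.10, 0.90]])
x = np.array([-2.1, -1.9, 1.8, 2.2])
print(hmm_log_likelihood(x, A, pi=np.array([0.5, 0.5]), means=np.array([-2.0, 2.0]), sigma=0.5))
```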

26 #2: Inference
Find the most likely state sequence, or find the posterior distribution of the state sequence (hard vs. soft decision). Figure from Murphy's thesis.

27 Off-line inference for the HMM
Off-line Viterbi algorithm:
o Find the most likely state sequence
o Two passes on the chain: max-product (forward pass; remember the most likely ancestors), then back-track (re-trace the steps of the ancestors)
ẑ_{1:T} = argmax_{z_{1:T}} P(x_{1:T}, z_{1:T})
P(x_{1:T}, ẑ_{1:T}) = max_{z_T} P(x_T | z_T) max_{z_{T-1}} [ P(z_T | z_{T-1}) P(x_{T-1} | z_{T-1}) max_{z_{T-2}} ⋯ [ max_{z_1} P(z_2 | z_1) P(x_1 | z_1) P(z_1) ] ]
The sum of the evaluation recursion is replaced with a max.
o Observations only enter as likelihoods: the emission model can be a GMM, a neural network, etc.
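A compact Viterbi sketch in log space for the same kind of Gaussian-emission HMM; the model and data are made-up illustration values.

```python
import numpy as np
from scipy.stats import norm

def viterbi(x, A, pi, means, sigma):
    """Most likely state sequence of a discrete-state HMM with 1-D Gaussian emissions."""
    T, K = len(x), len(pi)
    logA, logpi = np.log(A), np.log(pi)
    delta = np.zeros((T, K))                # best log joint prob ending in state k at time t
    psi = np.zeros((T, K), dtype=int)       # best ancestor of state k at time t
    logobs = norm.logpdf(x[:, None], loc=means, scale=sigma)   # (T, K) emission log-likelihoods
    delta[0] = logpi + logobs[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA      # (previous state, next state)
        psi[t] = scores.argmax(axis=0)             # remember the most likely ancestors
        delta[t] = scores.max(axis=0) + logobs[t]  # max-product forward pass
    z = np.empty(T, dtype=int)
    z[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):                 # back-track through the ancestors
        z[t] = psi[t + 1, z[t + 1]]
    return z

A = np.array([[0.95, 0.05], [0.10, 0.90]])
x = np.array([-2.1, -1.9, 1.8, 2.2])
print(viterbi(x, A, pi=np.array([0.5, 0.5]), means=np.array([-2.0, 2.0]), sigma=0.5))
```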

28 Sequential inference: filtering
Basic idea: use the system model and the observations to maintain a probabilistic estimate of the state.
Simple LDS:
o The object tends to move in straight lines at constant speed
o The measurement is the state plus noise
The Kalman filter cycles between a predict step (prior to prediction) and a correct step (prediction plus observation to posterior).

29 Sequential inference: filtering
Filtering equations:
Predict (the posterior at time t-1 becomes the prior at time t):
P(z_t | x_{1:t-1}) = ∫ P(z_t, z_{t-1} | x_{1:t-1}) dz_{t-1} = ∫ P(z_t | z_{t-1}) P(z_{t-1} | x_{1:t-1}) dz_{t-1}   (state transition × previous posterior)
Correct:
P(z_t | x_{1:t}) = P(z_t | x_t, x_{1:t-1}) = P(x_t | z_t, x_{1:t-1}) P(z_t | x_{1:t-1}) / P(x_t | x_{1:t-1}) ∝ P(x_t | z_t) P(z_t | x_{1:t-1})   (emission × prior at time t)

30 Filtering for the LDS: Kalman Filter
How can we track a noisy sinusoid?
o Brownian motion model: z_t = z_{t-1} + u_t, u_t ~ N(0, σ_u²);  x_t = z_t + v_t, v_t ~ N(0, σ_v²)
  Issue: the tracking will lag behind (wrong dynamics model)
o Sinusoidal emission model: z_t = a z_{t-1} + u_t, u_t ~ N(0, σ_u²);  x_t = sin(z_t) + v_t, v_t ~ N(0, σ_v²)
  Issue: non-linear dynamics (intractable filtering equations)

31 Filtering for the LDS: Kalman Filter
System equations: the state describes a rotating vector.
[z_{t,1}; z_{t,2}] = [ cos(θ) -sin(θ) ; sin(θ) cos(θ) ] [z_{t-1,1}; z_{t-1,2}] + [u_{t,1}; u_{t,2}],   u_t ~ N(0, Σ_u)
x_t = [1 0] [z_{t,1}; z_{t,2}] + v_t,   v_t ~ N(0, σ_v²)
The observation is the sinusoid value. We can apply the Kalman filter to a system with nonlinear-looking behavior by choosing the model wisely.
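A minimal Kalman filter sketch for this rotating-vector model; the angular frequency, noise levels, and initial state are made-up illustration values.

```python
import numpy as np

def kalman_filter(x, A, C, Q, R, mu0, P0):
    """Standard Kalman filter: returns the filtered means E[z_t | x_1..x_t]."""
    mu, P = mu0.copy(), P0.copy()
    means = []
    for xt in x:
        # Predict
        mu, P = A @ mu, A @ P @ A.T + Q
        # Correct
        S = C @ P @ C.T + R                    # innovation covariance
        K = P @ C.T @ np.linalg.inv(S)         # Kalman gain
        mu = mu + K @ (np.atleast_1d(xt) - C @ mu)
        P = (np.eye(len(mu)) - K @ C) @ P
        means.append(mu.copy())
    return np.array(means)

theta = 2 * np.pi * 0.02                       # rotation per sample
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
C = np.array([[1.0, 0.0]])                     # observe the first coordinate (the sinusoid)
Q, R = 1e-4 * np.eye(2), np.array([[0.1]])

t = np.arange(200)
clean = np.sin(theta * t)
noisy = clean + np.sqrt(R[0, 0]) * np.random.default_rng(0).normal(size=len(t))
filtered = kalman_filter(noisy, A, C, Q, R, mu0=np.array([0.0, 1.0]), P0=np.eye(2))
print(filtered[:5, 0].round(3))                # filtered estimate of the sinusoid
```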

32 Filtering for the LDS: Kalman Filter
Optimal filtering for the sinusoidal model: the filter model matches the LDS. [Figure: clean sinusoid, measurement, filtered state]

33 Filtering for the LDS: Kalman Filter
The filter assumes the Brownian-motion LDS: no knowledge of the sinusoidal behavior causes lag. [Figure: clean sinusoid, measurement, filtered state]

34 Filtering for the LDS: Kalman Filter
Correct model assumption, but the state noise covariance is over-estimated: the filter pays most attention to the noisy measurements. [Figure: clean sinusoid, measurement, filtered state]

35 Filtering for the LDS: Kalman Filter
Correct model assumption, but the observation noise variance is over-estimated: the filter pays most attention to the state transition prior. [Figure: clean sinusoid, measurement, filtered state]

36 Filtering for the LDS: Kalman Filter
Harmonic model for a piano note:
o State = multiple decaying rotating vectors (stacked)
o Emission = sum of harmonically-related sinusoids
z̃_t = blkdiag(a_1 R_1, a_2 R_2, …, a_M R_M) z̃_{t-1} + ũ_t,   ũ_t ~ N(0, σ_u² I)
y_t = [1 0 1 0 ⋯ 1 0] z̃_t + v_t,   v_t ~ N(0, σ_v²)
where R_m rotates at the m-th harmonic frequency and a_m < 1 is a per-harmonic decay factor.
[Figure: clean note, noisy note, filtered (denoised) note]

37 Intractable filter equations (Filtering)
What if the system equations are crazy?
o Discretization: partition the state space into cells
o Linearization: approximate nonlinearities via a 1st-order Taylor series expansion; Extended Kalman filter (EKF)
o Moment-matching: approximate the filtered distribution with a Gaussian; Unscented Kalman filter (UKF), Switching Kalman Filter (SKF)
o Variational approximations (deterministic): approximate the DBN by breaking edges in the graph
o Gibbs sampling (stochastic): approximate inference with local sampling; very general, very awesome
o Particle filter (PF) (stochastic): approximate the #&@% filtered distribution with point masses (sequential importance sampling)

38 Switching Kalman Filter. Example: cockroach tracking
Switching LDS:
s_t ~ Mult(C s_{t-1})
z_t ~ N(A_{s_t} z_{t-1}, Σ_{s_t})
x_t ~ N(B z_t, Σ_{s_t})
State: z = [x, dx/dt, y, dy/dt]^T.
Switch values correspond to: (1) stay still, (2) Brownian motion, (3) ~constant velocity, (4) sudden dash.
Σ_k = σ_k² I, with σ_k² increasing across the four regimes (numeric values of C, A_k, B, and σ_k² shown on the slide).

39 Switching Kalman Filter. Example: cockroach tracking
Switching Kalman filter (aka mixture of Kalman filters): for each switch value, predict and correct the state given the observation, then collapse the resulting mixture before the next predict/correct cycle. [Figure: switch, state, and observation diagram]

40 Switching Kalman Filter. Example: cockroach tracking. [Figure: tracking results]

41 Sequential Inference for DBNs: Particle Filtering
Useful when:
o The filtering equations are analytically intractable
o Linear/Gaussian approximations fail
o Computation power is plentiful
Basic idea:
o Replace the filtered state distribution with weighted point masses
o Particle = guess for the state; weight = confidence in the guess
o As L → ∞, the approximation becomes perfect
o State statistics (μ, Σ, etc.) are easily computed from the particles
p̂(z_t | x_{1:t}) = Σ_{l=1}^{L} ω_t^(l) δ(z_t - z_t^(l))  →  P(z_t | x_{1:t})  as  L → ∞

42 Importance Sampling (Particle Filter)
o Tricky integral (an expectation with respect to a complicated distribution P(x))
o Approximate P(x) with a proposal Q(x); the weight compensates for the mismatch
o Sample from Q(x) to estimate the integral:
∫ f(x) P(x) dx = ∫ f(x) [P(x)/Q(x)] Q(x) dx = ∫ f(x) w(x) Q(x) dx ≈ Σ_{l=1}^{L} f(x^l) w(x^l),   x^l ~ Q(x)   (with normalized weights)
In the filtering setting, f plays the role of the filtered state distribution and the weights come from the state transition/emission densities.
o Apply this sequentially and we have the particle filter.
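A small NumPy sketch of self-normalized importance sampling; the target, proposal, and test function are made-up illustration choices (a Gaussian target rather than the von Mises example on the next slide).

```python
import numpy as np
from scipy.stats import norm

def importance_sampling_expectation(f, log_p, proposal_rvs, proposal_logpdf, L, rng):
    """Estimate E_P[f(x)] with samples from a proposal Q, using self-normalized weights."""
    x = proposal_rvs(L, rng)                  # x^l ~ Q(x)
    log_w = log_p(x) - proposal_logpdf(x)     # unnormalized log weights P/Q
    w = np.exp(log_w - log_w.max())
    w /= w.sum()                              # normalized weights
    return np.sum(w * f(x))

rng = np.random.default_rng(0)
# Target P: N(1, 0.5^2); proposal Q: a wider N(0, 2^2); f(x) = x^2
est = importance_sampling_expectation(
    f=lambda x: x ** 2,
    log_p=lambda x: norm.logpdf(x, loc=1.0, scale=0.5),
    proposal_rvs=lambda L, rng: rng.normal(0.0, 2.0, size=L),
    proposal_logpdf=lambda x: norm.logpdf(x, loc=0.0, scale=2.0),
    L=5000, rng=rng)
print(est)   # should be close to E[x^2] = 1^2 + 0.5^2 = 1.25
```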

43 Importance Sampling (Particle Filter)
Example: the n-th moment of a von Mises random variable.
o We want: E[x^n] = ∫ x^n vM(x; μ, κ) dx
o Monte Carlo estimate: E[x^n] ≈ (1/L) Σ_{l=1}^{L} (x^l)^n with x^l ~ vM(x; μ, κ), but we can't sample from the von Mises directly.
o Instead, find a similar wrapped Gaussian WN(x; μ, σ²) (the closer, the better) and approximate the integral with importance sampling:
E[x^n] ≈ Σ_{l=1}^{L} w^l (x^l)^n,   x^l ~ WN(x; μ, σ²),   w^l ∝ vM(x^l; μ, κ) / WN(x^l; μ, σ²)

44 Importance Sampling (Particle Filter)
Average absolute error in the mean estimate (L = 10), comparing the wrapped-Gaussian and uniform proposals for several concentrations κ of the von Mises target (up to κ = 3). [Table values shown on the slide]

45 Sequential Importance Sampling (Particle Filter)
o Merged filtering equations:
P(z_t | x_{1:t}) ∝ P(x_t | z_t) ∫ P(z_t | z_{t-1}) P(z_{t-1} | x_{1:t-1}) dz_{t-1}
o Approximate with importance sampling:
P(z_t | x_{1:t}) ≈ Σ_{l=1}^{L} w_t^l δ(z_t - z_t^l),   z_t^l ~ Q(z_t | z_{t-1}^l, x_t),   w_t^l ∝ [ P(x_t | z_t^l) P(z_t^l | z_{t-1}^l) / Q(z_t^l | z_{t-1}^l, x_t) ] w_{t-1}^l
o The optimal Q is typically hard to sample from.
o A common choice is the transition density: Q(z_t^l | z_{t-1}^l, x_t) = P(z_t^l | z_{t-1}^l), which gives w_t^l ∝ P(x_t | z_t^l) w_{t-1}^l
o We shouldn't ignore the emission density: use an EKF/UKF to approximate the optimal Q.

46 Basic Particle Filter
o Prior state representation: a set of weighted particles
o Predict: propagate the particles through the transition distribution
o Correct: update the weights via the emission density
o Compute statistics before resampling
o Resample: draw fresh particles from the updated set
o Posterior state representation
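A minimal bootstrap particle filter sketch for the noisy-sinusoid model from slide 30, using the transition density as the proposal so the weights reduce to the emission likelihood; all numeric values are made-up illustration choices.

```python
import numpy as np
from scipy.stats import norm

def bootstrap_particle_filter(x, a, sigma_u, sigma_v, L, rng):
    """Bootstrap PF for z_t = a*z_{t-1} + u_t, x_t = sin(z_t) + v_t (illustrative model)."""
    particles = rng.normal(0.0, 1.0, size=L)     # prior state representation
    means = []
    for xt in x:
        # Predict: propagate particles through the transition distribution
        particles = a * particles + rng.normal(0.0, sigma_u, size=L)
        # Correct: weight by the emission density
        w = norm.pdf(xt, loc=np.sin(particles), scale=sigma_v)
        w /= w.sum()
        means.append(np.sum(w * particles))      # compute statistics before resampling
        # Resample: draw fresh particles from the updated (weighted) set
        particles = rng.choice(particles, size=L, p=w)
    return np.array(means)

rng = np.random.default_rng(0)
T, a, sigma_u, sigma_v = 100, 1.0, 0.05, 0.2
z = np.cumsum(rng.normal(0.0, sigma_u, size=T))  # simulate a phase trajectory
x = np.sin(z) + rng.normal(0.0, sigma_v, size=T)
print(bootstrap_particle_filter(x, a, sigma_u, sigma_v, L=500, rng=rng)[:5].round(3))
```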

47 Particle filtering (Particle Filter). Example: multiple DBN tracking with scrambled observations
∀ i ∈ [1, …, K]:   z^i_t = f(z^i_{t-1}, u^i_t),   x^i_t = g(z^i_t, v^i_t),   X_t = { x^1_t, …, x^K_t }
o Multiple DBNs are active, each DBN emits an observation, and we observe a permuted bag of emissions
o A nightmare to invert the generative model, but easy with a PF
o Particle filter with a mixture model in the state; Probabilistic Data Association (PDA): observation-to-cluster association and particle-to-cluster association

48 Particle filtering (Particle Filter). Example: multiple DBN tracking with scrambled observations
Multiple LDS tracking:
o The GMM captures the multimodal state distribution
o The Gaussians hold the particles in tight clusters
o Collisions are handled gracefully by the probabilistic assignments

49 Particle filtering (Particle Filter). Example: grasshopper tracking
LDS with indicator functions!
z_t = Φ(A z_{t-1} + u_t),   u_t ~ N(0, σ_u² I)
x_t = C z_t + v_t,   v_t ~ N(0, σ_v² I)
State: z = [x, dx/dt, y, dy/dt]^T; Φ(z) is a bounce function; C picks out the observed coordinates (matrix values shown on the slide).
Thanks to Taylan Cemgil (for the grasshopper DBN).

50 Particle filtering (Particle Filter). Example: grasshopper tracking. [Figure: tracking results]

51 Variational Inference
Basic idea:
o Inference is too hard (intractable, expensive)
o Approximate DBN_true with DBN_simple
o Set the (variational) parameters of DBN_simple to match DBN_true
o Amounts to breaking edges in the graphical model
The math:
o True joint distribution of observed and hidden variables: P(X, Z)
o Variational approximation of the joint of the hidden nodes (implies broken edges in the graph): Q(Z) = Π_{i=1}^{M} Q_i(Z_i)   (called the product density transform in statistics)

52 Variational Inference
The math (continued):
o For inference, we want to maximize the data likelihood:
ln P(X) = L(Q) + KL(Q ‖ P)
L(Q) = ∫ Q(Z) ln [ P(X, Z) / Q(Z) ] dZ         (lower bound)
KL(Q ‖ P) = - ∫ Q(Z) ln [ P(Z | X) / Q(Z) ] dZ   (extra stuff)
o The optimal variational distribution is the posterior (it gives the best lower bound): Q(Z) = P(Z | X)
o Doable when fitting a model with EM (ML estimate): ln P(X) ≥ ∫ Q(Z) ln P(X, Z) dZ + H(Q); only the first term depends on the parameters, and with Q(Z) = P(Z | X) it becomes ∫ P(Z | X) ln P(X, Z) dZ, the EM Q function.

53 Variational Inference
1-D, 3-component GMM: log likelihood and bounds given the current estimate of the 2nd mean. [Figure: log probability vs. 2nd mean value; curves: log likelihood, EM lower bound, variational lower bound]

54 Variational Inference
Variational lower bound:
L(Q) ∝ ∫ Q_j(Z_j) ln [ e^{E_{i≠j}[ln P(X, Z)]} / Q_j(Z_j) ] dZ_j    (expectation with respect to the product of the other factors)
The best bound is obtained when:  Q_j(Z_j) ∝ e^{E_{i≠j}[ln P(X, Z)]}
The j-th factor depends on all the others, so cycle through them:
for j = 1 : M
    Q_j(Z_j) ∝ e^{E_{i≠j}[ln P(X, Z)]}
end
If we use a conjugate prior for Z_j, Q_j has the form of the corresponding posterior.
Variational inference = setting the parameters of the Q_j's. Do this in the E step for variational EM.

55 Variational EM for the GMM
GMM with a Dirichlet prior on the weights:
o Joint distribution of all variables:
P(X, Z, π; μ, Σ, α) = P(X | Z; μ, Σ) P(Z | π) P(π; α)
o Variational factorization: Q(Z, π) = Q_1(Z) Q_2(π)
o E step (coupled variational parameters; iterate):
ln Q_1(Z) ∝ E_π[ ln P(X | Z; μ, Σ) + ln P(Z | π) ]
ln Q_2(π) ∝ E_Z[ ln P(Z | π) + ln P(π; α) ]
Q_1(z_i = j) = π̃_j N(x_i | μ_j, Σ_j) / Σ_k π̃_k N(x_i | μ_k, Σ_k),   π̃ = e^{E[ln π]}
Q_2(π) = Dir(π; α̃),   α̃ = α + Σ_i E[z_i]
o M step: regular EM update for μ, Σ.

56 Variational EM for the GMM
GMM with a Dirichlet prior on the weights:
o E step: update the variational assignment parameters, then update the variational mixing-weight parameters
o M step: update the model parameters μ, Σ
[Figure: graphical models for each update]
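A compact sketch of this variational EM loop for a 1-D GMM with a Dirichlet prior on the mixing weights and point estimates of the means and variances (as in the slides); the scalar-variance parameterization, hyperparameters, and data are my own illustrative simplifications.

```python
import numpy as np
from scipy.special import digamma
from scipy.stats import norm

def variational_em_gmm(x, K, alpha0=1.0, n_iter=50, rng=None):
    """Variational EM for a 1-D GMM: Dirichlet prior on weights, point estimates of means/variances."""
    rng = np.random.default_rng() if rng is None else rng
    mu = rng.choice(x, size=K, replace=False)      # initialize means at random data points
    var = np.full(K, x.var())
    alpha = np.full(K, alpha0)
    for _ in range(n_iter):
        # E step (coupled variational updates; one pass each per iteration)
        log_pi_tilde = digamma(alpha) - digamma(alpha.sum())    # E[ln pi_j]
        log_r = log_pi_tilde + norm.logpdf(x[:, None], loc=mu, scale=np.sqrt(var))
        r = np.exp(log_r - log_r.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)          # Q1(z_i = j), the responsibilities
        alpha = alpha0 + r.sum(axis=0)             # Q2(pi) = Dir(alpha_tilde)
        # M step: regular EM update for the means and variances
        Nk = r.sum(axis=0) + 1e-12
        mu = (r * x[:, None]).sum(axis=0) / Nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk + 1e-6
    return mu, var, alpha

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-3, 0.5, 300), rng.normal(2, 0.8, 300)])
mu, var, alpha = variational_em_gmm(x, K=5, alpha0=0.1, rng=rng)   # sparse prior shrinks unused components
print(mu.round(2), (alpha / alpha.sum()).round(2))
```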

57 Variational EM for the GMM. [Figure: samples; fit with 20 Gaussians]

58 Variational EM for the GMM, with a sparse Dirichlet prior on the weights. [Figure: samples; fit with 20 Gaussians]

59 Variational Inference for the FHMM
Full DBN: K hidden chains z^1, …, z^K and observations x_t.
Joint distribution of all variables:
P(X, Z^{1:K}; A^{1:K}, π^{1:K}, θ^{1:K}) = [ Π_{k=1}^{K} P(z^k_1; π^k) Π_{t=2}^{T} P(z^k_t | z^k_{t-1}; A^k) ] Π_{t=1}^{T} P(x_t | z^{1:K}_t; θ^{1:K})

60 Variational Inference for the FHMM
Gaussian emission model with additive means (Ghahramani & Jordan '97):
P(x_t | z^{1:K}_t) = N( x_t | Σ_{k=1}^{K} B^k z^k_t, Σ )
where B^k is the matrix of means for the k-th chain and Σ is a shared covariance.

61 Variational Inference for the FHMM
Fully factored variational approximation: Q(Z) = Π_{k=1}^{K} Π_{t=1}^{T} Q(z^k_t). The factors turn out to be multinomial.

62 Variational Inference for the FHMM
Induced variational parameters and dependencies:
o The hidden variables are de-coupled
o Variational parameters induced by the factorization (they act as means)
o The parameters are locally coupled
o Iterate: update the variational parameters using their neighbors

63 Variational Inference for the FHMM
Update the variational parameters. [Figure: variational parameter update diagram for the two chains]

64 Variational Inference for the FHMM
Structured variational approximation: Q(Z) = Π_{k=1}^{K} Q(z^k_1) Π_{t=2}^{T} Q(z^k_t | z^k_{t-1})
Choose these factors to be HMM-ish: an initial probability at t = 1 and a transition probability for t > 1.

65 Variational Inference for the FHMM
Induced variational parameters and dependencies:
o The hidden variables are de-coupled between chains
o Variational parameters induced by the factorization (they act as likelihoods)
o The parameters are locally coupled
o Iterate: run forward-backward on each chain using the fake likelihoods, then update the variational parameters using the other chains' posteriors

66 Variational Inference for the FHMM
Forward-backward on each chain, then update the variational parameters. [Figure: per-chain forward-backward and variational parameter update diagram]

67 Gibbs sampling
Basic idea:
o Exact inference is hard (takes too long, the math is #&@%)
o Any distribution can be described by its samples, so approximate inference by sampling
o Sampling from the full posterior is hard:  Z^s ~ P(Z | X),  where Z = { latent variables, parameters }
o Iteratively draw from the local conditionals instead:
for s = 1 : S            (samples)
    for j = 1 : J        (variables)
        Z^s_j ~ P(Z_j | Z^s_{-j}, X)     (draw from the conditional, keeping the other unknowns fixed)
    end
end
o The samples eventually resemble draws from the full posterior.

68 Gibbs EM for the GMM
GMM with a Dirichlet prior on the weights:
o Joint distribution of all variables: P(X, Z, π; μ, Σ, α) = P(X | Z; μ, Σ) P(Z | π) P(π; α)
o Gibbs conditionals (coupled; iterate):
∀ i ∈ [1, N]:  Z^s_i ~ P(Z_i | π^s, X_i) ∝ P(X_i | Z_i) P(Z_i | π^s)
π^s ~ P(π | Z^s) ∝ P(Z^s | π) P(π)
o E step:
Z^s_i ~ Mult(γ_i),   γ_ij = π^s_j N(X_i; μ_j, Σ_j) / Σ_{k=1}^{K} π^s_k N(X_i; μ_k, Σ_k)
π^s ~ Dir(α̃),   α̃ = α + Σ_{i=1}^{N} Z^s_i
o M step: regular EM update for μ, Σ.

69 Gibbs EM for the GMM
GMM with a Dirichlet prior on the weights:
o Conditional sampling = the updates only involve variables in the Markov blanket
o E step: sample the assignments, then sample the mixing weights
o M step: update the parameters μ, Σ
[Figure: graphical models for each update]
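A small sketch of this Gibbs-within-EM scheme for a 1-D GMM, with one Gibbs sweep over assignments and mixing weights per E step; the hyperparameters and data are made-up illustration values.

```python
import numpy as np
from scipy.stats import norm

def gibbs_em_gmm(x, K, alpha0=1.0, n_iter=100, rng=None):
    """Gibbs EM for a 1-D GMM: sample assignments and weights, point-update means/variances."""
    rng = np.random.default_rng() if rng is None else rng
    N = len(x)
    mu = rng.choice(x, size=K, replace=False)
    var = np.full(K, x.var())
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E step, Gibbs part 1: sample assignments from their conditionals
        logp = np.log(pi) + norm.logpdf(x[:, None], loc=mu, scale=np.sqrt(var))
        p = np.exp(logp - logp.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        z = np.array([rng.choice(K, p=p[i]) for i in range(N)])
        # E step, Gibbs part 2: sample mixing weights given the assignments
        counts = np.bincount(z, minlength=K)
        pi = rng.dirichlet(alpha0 + counts)
        # M step: EM-style update of means and variances from the sampled assignments
        for k in range(K):
            if counts[k] > 0:
                mu[k] = x[z == k].mean()
                var[k] = x[z == k].var() + 1e-6
    return mu, var, pi

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-3, 0.5, 200), rng.normal(2, 0.8, 200)])
print(*(a.round(2) for a in gibbs_em_gmm(x, K=2, rng=rng)))
```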

70 Gibbs inference for the FHMM
Possibly slow convergence of the Gibbs samples to the posterior:
o Strong correlations between state variables
o Gibbs draws samples along individual coordinates of the Z space
o In practice, it's quite fast
Sample the 1st chain's states, then the 2nd chain's states, for t = 1 : T.

71 Audio Source Separation with the FHMM
Set-up:
o 10 variational/Gibbs iterations per frame, M^K/5 particles
o Binary masks = max of the spectra for the most likely state combination: m^k_t = I( B^k ẑ^k_t > B^{k'} ẑ^{k'}_t ),  k' ≠ k
Piano-violin FHMM:
o 12 notes each
o Optimal / variational / Gibbs inference
o The particle filter is bad: the basic proposal gives bad tracking, and the optimal proposal is too slow
o [Audio examples: mix, optimal, basic PF]
Speech-speech FHMM:
o 30 speech bases each (pre-trained)
o Optimal / variational / Gibbs / PF: not bad!
o [Audio examples: mix, optimal, solo, Viterbi]

72 #3: Learning
Expectation-Maximization (e.g. Baum-Welch):
o Find the parameters that maximize the data likelihood: θ̂ = argmax_θ P(x_{1:T}; θ)
o If inference (the E step) is too hard, use variational methods or sampling methods
Method of moments (e.g. spectral learning):
o Express the moments in terms of the parameters and solve the non-linear system of equations f(θ) ≈ m

73 Baum-Welch for the HMM (#3: Learning)
Find the most likely parameters given the data:
θ̂ = argmax_θ P(x_{1:T}; θ)
  = argmax_θ Σ_{z_{1:T}} P(x_{1:T}, z_{1:T}; θ)
  = argmax_{A,O,π} Σ_{z_{1:T}} P(z_1; π) Π_{t=2}^{T} P(z_t | z_{t-1}; A) Π_{t=1}^{T} P(x_t | z_t; O)
This is hard, so use EM. Both the states and the parameters are unknown:
o Given the parameters, estimate the states (inference)
o Given the states, estimate the parameters
o Iterate

74 Baum-Welch for the HMM (#3: Learning)
E step: inference with the forward-backward algorithm. Estimate the hidden state probabilities (posteriors):
α(z_t) = P(x_{1:t}, z_t)
β(z_t) = P(x_{t+1:T} | z_t)
P(z_t | x_{1:T}) ∝ α(z_t) β(z_t)
P(z_{t-1}, z_t | x_{1:T}) ∝ α(z_{t-1}) P(x_t | z_t) P(z_t | z_{t-1}) β(z_t)
M step: update the parameters with weighted averages over the data, where the weights are the posteriors.
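A compact Baum-Welch sketch for a Gaussian-emission HMM, with a scaled forward-backward pass in the E step and posterior-weighted averages in the M step; the data and initialization are made-up illustration choices.

```python
import numpy as np
from scipy.stats import norm

def baum_welch(x, K, n_iter=30, rng=None):
    """EM (Baum-Welch) for an HMM with 1-D Gaussian emissions. Returns (A, pi, means, sigma)."""
    rng = np.random.default_rng() if rng is None else rng
    T = len(x)
    A = np.full((K, K), 1.0 / K)
    pi = np.full(K, 1.0 / K)
    means = rng.choice(x, size=K, replace=False)
    sigma = x.std()
    for _ in range(n_iter):
        obs = norm.pdf(x[:, None], loc=means, scale=sigma)   # (T, K) emission likelihoods
        # E step: scaled forward-backward
        alpha = np.zeros((T, K)); beta = np.zeros((T, K)); c = np.zeros(T)
        alpha[0] = pi * obs[0]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
        for t in range(1, T):
            alpha[t] = (A.T @ alpha[t - 1]) * obs[t]
            c[t] = alpha[t].sum(); alpha[t] /= c[t]
        beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (obs[t + 1] * beta[t + 1]) / c[t + 1]
        gamma = alpha * beta                                  # P(z_t | x_{1:T})
        gamma /= gamma.sum(axis=1, keepdims=True)
        xi = (alpha[:-1, :, None] * A[None] *                 # P(z_{t-1}, z_t | x_{1:T})
              (obs[1:] * beta[1:])[:, None, :] / c[1:, None, None])
        # M step: posterior-weighted updates
        pi = gamma[0]
        A = xi.sum(axis=0) / xi.sum(axis=(0, 2))[:, None]
        means = (gamma * x[:, None]).sum(axis=0) / gamma.sum(axis=0)
        sigma = np.sqrt((gamma * (x[:, None] - means) ** 2).sum() / T)
    return A, pi, means, sigma

rng = np.random.default_rng(0)
# Simulate a 2-state chain with well-separated Gaussian emissions
z = np.zeros(300, dtype=int)
for t in range(1, 300):
    z[t] = z[t - 1] if rng.random() < 0.95 else 1 - z[t - 1]
x = np.where(z == 0, -2.0, 2.0) + 0.5 * rng.normal(size=300)
A, pi, means, sigma = baum_welch(x, K=2, rng=rng)
print(A.round(2), means.round(2), round(float(sigma), 2))
```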

75 Method of moments #3: Learning Say what?

76 #4: Structure Learning #4: Structure Learning Say what?

77 References
Books:
o Pattern Recognition and Machine Learning, Christopher Bishop, 2006
o Probabilistic Reasoning over Time, Stuart Russell and Peter Norvig, Chapter 15 in Artificial Intelligence: A Modern Approach, 2009
o Machine Learning: A Probabilistic Perspective, Kevin Murphy, 2013
o Bayesian Reasoning and Machine Learning, David Barber, 2013
Thesis:
o Dynamic Bayesian Networks: Representation, Inference and Learning, Kevin Murphy, 2002
Papers:
o Factorial Hidden Markov Models, Zoubin Ghahramani and Michael Jordan, Machine Learning, 1997
o An Introduction to Hidden Markov Models and Bayesian Networks, Zoubin Ghahramani, Journal of Pattern Recognition and AI, 2001
o A Tutorial on Particle Filters for Online Nonlinear/Non-Gaussian Bayesian Tracking, Sanjeev Arulampalam, Simon Maskell, Neil Gordon, and Tim Clapp, IEEE Transactions on Signal Processing, 2002
o An Introduction to the Kalman Filter, Greg Welch and Gary Bishop, 2006
o Graphical Models for Time Series, David Barber and Taylan Cemgil, IEEE Signal Processing Magazine, 2010
