Dynamic Bayesian Networks: A Whirlwind Tour. Johannes Traa, Computational Audio Lab, UIUC
1 Dynamic Bayesian Networks: A Whirlwind Tour. Johannes Traa, Computational Audio Lab, UIUC
2 Sequential data is everywhere
o Speech waveform
o Bush's approval rating
o EEG brain signals
o Financial trends
3 What's a DBN?
Dynamic Bayesian Network: probabilistic graphical model for analyzing time series data
Also called:
o Time series model
o Dynamic belief network
o State space model (SSM)
Useful for:
o Tracking (e.g. sound source tracking, control systems)
o Prediction (e.g. stock market forecasting, collision prevention)
o Interpolation (e.g. sample recovery in audio/video)
o Sequence classification (e.g. speech recognition)
o Sequence clustering
o And more
4 Roadmap
Some common DBN architectures:
o One layer: Markov Model / Autoregressive (AR) Model
o Two layer: Hidden Markov Model (HMM) / Linear Dynamical System (LDS)
o Three layer: Switching and Factorial HMM/LDS
Problems we can solve for DBNs:
o Evaluation (data likelihood)
o Inference (hidden states)
  - Viterbi algorithm
  - Sequential inference (Kalman and particle filters)
  - Variational inference
  - Gibbs sampling
o Learning (DBN parameters)
o Structure learning (DBN architecture)
5 Common Network Architectures: One Layer, Two Layer, Three Layer
6 Markov Model (Common DBNs)
Graphical model: $z_{t-1} \to z_t \to z_{t+1}$
State transition diagram: three states (1, 2, 3) with transition probabilities $a_{ij}$ on the edges
Transition matrix:
$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$$
System equation (discrete state space): $z_t = A z_{t-1}$ or $z_t \sim P(A z_{t-1})$
Slide annotations: "DSP people like this" and "ML people like this"
7 Markov Model (Common DBNs)
State sequence = path through trellis
[Figure: trellis over time steps $t-1$, $t$, $t+1$, $t+2$, with the transition matrix $A$ applied between consecutive steps]
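8 Markov Model (Common DBNs)
Monophonic piano Markov model:
o State = piano note (1 = high, 2 = low, 3 = middle)
o Time = spectrogram frame
o Model parameters: the numeric values of $A$ and of the second parameter are lost in the transcription (only the residue "Meaningless without units" survives)
o Sampled state (note) sequence: [figure not recovered]
o What it sounds like when the notes are played: [audio not recovered]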
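Drawing a note sequence from such a Markov model can be sketched as follows. The slide's numeric values are lost, so the 3-state matrix `A`, the uniform initial distribution `pi`, and the helper name `sample_markov_chain` are illustrative assumptions. The sketch takes `A[i, j] = P(z_t = i | z_{t-1} = j)`, so each column sums to one, which is one reading of the slide's equation $z_t = A z_{t-1}$.

```python
import numpy as np

def sample_markov_chain(A, pi, T, rng):
    """Sample a length-T state sequence from a discrete Markov model.

    Assumed convention: A[i, j] = P(z_t = i | z_{t-1} = j), so every
    column of A sums to one.
    """
    states = np.empty(T, dtype=int)
    states[0] = rng.choice(len(pi), p=pi)
    for t in range(1, T):
        # The column of A selected by the previous state is the
        # distribution over the next state.
        states[t] = rng.choice(A.shape[0], p=A[:, states[t - 1]])
    return states

# Hypothetical 3-note model with strong self-transitions (not the slide's values).
A = np.array([[0.90, 0.05, 0.05],
              [0.05, 0.90, 0.05],
              [0.05, 0.05, 0.90]])
pi = np.array([1 / 3, 1 / 3, 1 / 3])

rng = np.random.default_rng(0)
notes = sample_markov_chain(A, pi, T=50, rng=rng)
```

With heavy self-transitions, the sampled sequence tends to hold each note for several frames before switching.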
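9 Vector Auto-Regressive (VAR) Model (Common DBNs)
Graphical model: $z_{t-1} \to z_t \to z_{t+1}$
System equation (continuous state space):
$$z_t = \sum_{i=1}^{p} A_i z_{t-i} + u_t, \qquad u_t \sim \mathcal{N}(0, \Sigma)$$
Equivalent stacked (first-order) form:
$$\begin{bmatrix} z_t \\ z_{t-1} \\ \vdots \\ z_{t-p+1} \end{bmatrix} = \begin{bmatrix} A_1 & A_2 & \cdots & A_p \\ I & & & 0 \\ & \ddots & & \vdots \\ & & I & 0 \end{bmatrix} \begin{bmatrix} z_{t-1} \\ z_{t-2} \\ \vdots \\ z_{t-p} \end{bmatrix} + \begin{bmatrix} u_t \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$
that is, $\tilde{z}_t = \tilde{A} \tilde{z}_{t-1} + \tilde{u}_t$ or $\tilde{z}_t \sim \mathcal{N}(\tilde{A} \tilde{z}_{t-1}, \tilde{\Sigma})$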
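The rewriting of a VAR(p) model as a first-order model on a stacked state can be checked numerically. In this minimal sketch, the helper name `companion_matrix`, the dimensions, and the random coefficients are illustrative assumptions, and the first `p` states are fixed to zero for simplicity.

```python
import numpy as np

def companion_matrix(A_list):
    """Stack VAR(p) coefficients A_1..A_p (each d x d) into the
    (p*d) x (p*d) matrix of the equivalent first-order model."""
    p = len(A_list)
    d = A_list[0].shape[0]
    A_tilde = np.zeros((p * d, p * d))
    A_tilde[:d, :] = np.hstack(A_list)          # top block row: [A_1 ... A_p]
    A_tilde[d:, :-d] = np.eye((p - 1) * d)      # shifted identity below it
    return A_tilde

rng = np.random.default_rng(0)
d, p, T = 2, 3, 20
A_list = [0.2 * rng.standard_normal((d, d)) for _ in range(p)]
u = rng.standard_normal((T, d))

# Direct VAR(p) recursion, with the first p states set to zero.
z = np.zeros((T, d))
for t in range(p, T):
    z[t] = sum(A_list[i] @ z[t - 1 - i] for i in range(p)) + u[t]

# Same recursion on the stacked state [z_{t-1}, ..., z_{t-p}].
A_tilde = companion_matrix(A_list)
stacked = np.zeros(p * d)
z_stacked = np.zeros((T, d))
for t in range(p, T):
    u_tilde = np.concatenate([u[t], np.zeros((p - 1) * d)])
    stacked = A_tilde @ stacked + u_tilde
    z_stacked[t] = stacked[:d]
```

Both recursions produce the same trajectory, which is why single-step tools such as the Kalman filter can be applied to higher-order AR models.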
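10 Hidden Markov Model (HMM) / Linear Dynamical System (LDS) (Common DBNs)
Graphical model: hidden state sequence $z_{t-1} \to z_t \to z_{t+1}$; each $z_t$ emits an observation $x_t$ (observation sequence $x_{t-1}, x_t, x_{t+1}$)
System equations:
$$z_t = A z_{t-1} + u_t, \qquad x_t = B z_t + v_t$$
or
$$z_t \sim P(A z_{t-1}, \Sigma_u), \qquad x_t \sim P(B z_t, \Sigma_v)$$
State and emission distributions: discrete state with (Gaussian) emission for the HMM; Gaussian state with Gaussian emission for the LDS
11 Hidden Markov Model (HMM) (Common DBNs)
Piano HMM
o State = piano note (3)
o Observation = spectrum
o Time = spectrogram frame
o Model parameters: numeric values lost in the transcription
o Sampled state (note) sequence: [figure not recovered]
o Observation sequence: (next slide)
12 Hidden Markov Model (HMM) (Common DBNs)
o Observation sequence: [Figure: spectrogram; axes: time, frequency]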
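13 Switching HMM/LDS (Common DBNs)
Graphical model: hidden regime sequence $s_{t-1} \to s_t \to s_{t+1}$; hidden state sequence $z_{t-1} \to z_t \to z_{t+1}$; observed sequence $x_{t-1}, x_t, x_{t+1}$
System equations:
$$s_t = C s_{t-1}, \qquad z_t = A_{s_t} z_{t-1} + u_{t,s_t}, \qquad x_t = B_{s_t} z_t + v_{t,s_t}$$
or
$$s_t \sim P(C s_{t-1}), \qquad z_t \sim P(A_{s_t} z_{t-1}, \Sigma_{u,s_t}), \qquad x_t \sim P(B_{s_t} z_t, \Sigma_{v,s_t})$$
14 Switching HMM/LDS (Common DBNs)
Switching HMM = HMM with special structure
Global transition matrix:
$$\tilde{A} = \begin{bmatrix} C_{11} A_1 & C_{12} A_{12} & \cdots & C_{1K} A_{1K} \\ C_{21} A_{21} & C_{22} A_2 & \cdots & C_{2K} A_{2K} \\ \vdots & \vdots & \ddots & \vdots \\ C_{K1} A_{K1} & C_{K2} A_{K2} & \cdots & C_{KK} A_K \end{bmatrix}$$
Global emission matrix:
$$\tilde{B} = \begin{bmatrix} B_1 & B_2 & \cdots & B_K \end{bmatrix}$$
15 Switching HMM (Common DBNs)
Piano-Violin SHMM
o Regime = instrument
o State = piano/violin note (8 each)
o Observation = spectrum
o Time = spectrogram frame
o Regime transition matrix $C$: numeric values lost in the transcription
o State transition matrices $A$: numeric values garbled in the transcription (residue: "=5 10 3")
16 Switching HMM (Common DBNs)
o Observation sequence: [Figure: spectrogram; axes: time, frequency]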
17 Factorial HMM (FHMM) (Common DBNs)
Graphical model: hidden state sequence #1 $z^1_{t-1} \to z^1_t \to z^1_{t+1}$; hidden state sequence #2 $z^2_{t-1} \to z^2_t \to z^2_{t+1}$; both chains feed the observed sequence $x_{t-1}, x_t, x_{t+1}$
System equations:
$$z^1_t = A^1 z^1_{t-1}, \qquad z^2_t = A^2 z^2_{t-1}, \qquad x_t = f\left(B^1 z^1_t, B^2 z^2_t\right) + u_t$$
or
$$z^1_t \sim P\left(A^1 z^1_{t-1}\right), \qquad z^2_t \sim P\left(A^2 z^2_{t-1}\right), \qquad x_t \sim P\left(f\left(B^1 z^1_t, B^2 z^2_t\right), \Sigma\right)$$
Emission density reflects interaction between state chains
18 Factorial HMM (FHMM) (Common DBNs)
Factorial HMM = HMM with special structure
o Global transition matrix is $M^2 \times M^2$ (all state pairs)
State sequence = path through big trellis
[Figure: trellis over time steps $t-1$, $t$, $t+1$ whose nodes are the state pairs $(1,1), (1,2), \ldots, (3,3)$]
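When the two chains evolve independently, the $M^2 \times M^2$ global transition matrix over state pairs is the Kronecker product of the per-chain matrices. The sketch below assumes column-stochastic matrices (`A[i, j] = P(next = i | previous = j)`) and made-up values for `A1` and `A2`. The pair `(i, j)` is mapped to the flat index `i * M + j`.

```python
import numpy as np

# Hypothetical per-chain transition matrices; each column sums to one.
A1 = np.array([[0.90, 0.20, 0.10],
               [0.05, 0.70, 0.10],
               [0.05, 0.10, 0.80]])
A2 = np.array([[0.60, 0.30, 0.30],
               [0.20, 0.40, 0.30],
               [0.20, 0.30, 0.40]])
M = A1.shape[0]

# Joint transition over pairs (z1, z2), flattened as z1 * M + z2:
# A_joint[i*M + j, k*M + l] = A1[i, k] * A2[j, l].
A_joint = np.kron(A1, A2)
```

The joint matrix grows as $M^K \times M^K$ for $K$ chains, which is one reason exact inference in the FHMM becomes expensive.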
19 Factorial HMM (FHMM) (Common DBNs)
o Observation sequence: [Figure: spectrogram; axes: time, frequency]
20 Many more DBNs to choose from (Common DBNs)
o N-gram model: discrete version of AR model
o Gaussian mixture HMM: GMM in emission
o Gaussian sum LDS: GMM in state
o Auto-regressive HMM, Input-Output HMM: emissions are inter-dependent
o Hierarchical HMM: each state contains another HMM
o Mixture of HMMs: samples are sequences
o Infinite HMM: number of states/emissions can increase over time
o Non-negative dynamical system: multiplicative noise in system equations
Figures from Dynamic Bayesian Networks (Murphy)
21 Hierarchical HMM (Common DBNs)
[Figure from Murphy's thesis]
22 General network structure (Common DBNs)
Components of a DBN:
o Nodes (variables): observed, hidden, parameters
o Directed edges (interactions)
o Probability distributions (of the variables)
Model fully specified by system equations:
o (1) State transition dynamics
o (2) Emission dynamics
o (3) Initial conditions
23 Problems we can solve for DBNs: Evaluation, Inference, Learning
24 We got problems
Evaluation
o What is the likelihood that the data was generated by my DBN?
o Solution: sum-product (forward pass)
Inference
o What is (are) the most likely state sequence(s) given the data?
o Solutions:
  - max-product (e.g. Viterbi)
  - filtering (e.g. Kalman/particle filters)
  - forward-backward (e.g. Kalman smoother)
  - variational methods
  - sampling methods (e.g. Gibbs, Metropolis-Hastings)
Slide annotation between Inference and Learning: "Same thing for a Bayesian"
Learning
o What are the most likely parameters given the data?
o Solutions:
  - EM (e.g. Baum-Welch), which uses inference as a sub-routine
  - method of moments (e.g. spectral learning)
Structure learning
o What is the most likely graph given the data?
o Solutions: (none listed)
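25 #1: Evaluation
Compute data likelihood (recursively)
o Use conditional independence properties in directed graph
o Sum-product (forward pass, average over states)
$$P(x_{1:T}) = \sum_{z_{1:T}} P(x_{1:T}, z_{1:T})$$
$$= \sum_{z_{1:T}} P(x_T \mid z_T) P(z_T \mid z_{T-1}) P(x_{T-1} \mid z_{T-1}) \cdots P(z_2 \mid z_1) P(x_1 \mid z_1) P(z_1)$$
$$= \sum_{z_T} P(x_T \mid z_T) \sum_{z_{T-1}} \left[ P(z_T \mid z_{T-1}) P(x_{T-1} \mid z_{T-1}) \sum_{z_{T-2}} \left[ \cdots \sum_{z_1} P(z_2 \mid z_1) P(x_1 \mid z_1) P(z_1) \right] \right]$$
$$= \mathbf{1}^T \operatorname{diag}(o_T)\, A \operatorname{diag}(o_{T-1})\, A \cdots A \operatorname{diag}(o_1)\, \pi$$
where $o_t$ collects the emission likelihoods $P(x_t \mid z_t)$ over the states and $\pi = P(z_1)$ (the trailing $\pi$ is restored here; it is missing from the transcription).
[Figure: HMM graphical model, $z_{t-1} \to z_t \to z_{t+1}$ with emissions $x_{t-1}, x_t, x_{t+1}$]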
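The forward pass can be sketched for a small discrete HMM and checked against the brute-force sum over all state sequences. The parameter values and function names are illustrative assumptions. The conventions assumed here are `A[i, j] = P(z_t = i | z_{t-1} = j)` and `O[k, i] = P(x_t = k | z_t = i)`.

```python
import numpy as np
from itertools import product

def forward_likelihood(A, O, pi, x):
    """P(x_{1:T}) by the sum-product forward pass."""
    alpha = O[x[0]] * pi                  # alpha_1(z_1) = P(x_1 | z_1) P(z_1)
    for obs in x[1:]:
        alpha = O[obs] * (A @ alpha)      # alpha_t = diag(o_t) A alpha_{t-1}
    return alpha.sum()

def brute_force_likelihood(A, O, pi, x):
    """Same quantity by explicit summation over all M^T state sequences."""
    total = 0.0
    for z in product(range(len(pi)), repeat=len(x)):
        p = pi[z[0]] * O[x[0], z[0]]
        for t in range(1, len(x)):
            p *= A[z[t], z[t - 1]] * O[x[t], z[t]]
        total += p
    return total

# Hypothetical 2-state, 3-symbol HMM.
A = np.array([[0.7, 0.4],
              [0.3, 0.6]])
O = np.array([[0.5, 0.1],
              [0.4, 0.3],
              [0.1, 0.6]])
pi = np.array([0.6, 0.4])
x = [0, 1, 2, 2, 0]

lik_forward = forward_likelihood(A, O, pi, x)
lik_brute = brute_force_likelihood(A, O, pi, x)
```

The forward pass costs $O(T M^2)$ instead of $O(M^T)$. For long sequences the unscaled `alpha` underflows, so practical implementations normalize at each step or work in the log domain.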
26 Find most likely state sequence (hard decision) or find posterior distribution of state sequence (soft decision)
[Figure from Murphy's thesis]
27 Offline inference for the HMM (Offline)
Viterbi algorithm
o Find most likely state sequence
o Two passes on chain:
  - Max-product (forward pass, remember most likely ancestors)
  - Back-track (re-trace steps of ancestors)
$$\hat{z}_{1:T} = \arg\max_{z_{1:T}} P(x_{1:T}, z_{1:T})$$
$$P(x_{1:T}, \hat{z}_{1:T}) = \max_{z_T} P(x_T \mid z_T) \max_{z_{T-1}} \left[ P(z_T \mid z_{T-1}) P(x_{T-1} \mid z_{T-1}) \max_{z_{T-2}} \left[ \cdots \max_{z_1} P(z_2 \mid z_1) P(x_1 \mid z_1) P(z_1) \right] \right]$$
(the sums of the evaluation recursion are replaced with max)
o Observations only enter in as likelihoods, so the emission model can be a GMM, a neural network, etc.
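A log-domain sketch of the two Viterbi passes for a small discrete HMM follows. The parameter values and the function name `viterbi` are illustrative assumptions, with conventions `A[i, j] = P(z_t = i | z_{t-1} = j)` and `O[k, i] = P(x_t = k | z_t = i)`. All probabilities are strictly positive so the logarithms are finite.

```python
import numpy as np

def viterbi(A, O, pi, x):
    """Most likely state sequence and its joint log-probability."""
    T, M = len(x), len(pi)
    logA, logO = np.log(A), np.log(O)
    delta = np.log(pi) + logO[x[0]]        # best log-score ending in each state
    back = np.zeros((T, M), dtype=int)     # most likely ancestors
    for t in range(1, T):
        # scores[i, j]: best path ending in state j at t-1, then moving to i.
        scores = logA + delta[None, :]
        back[t] = scores.argmax(axis=1)
        delta = scores.max(axis=1) + logO[x[t]]
    # Back-track from the best final state.
    path = np.empty(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path, delta.max()

# Hypothetical 2-state, 3-symbol HMM.
A = np.array([[0.7, 0.4],
              [0.3, 0.6]])
O = np.array([[0.5, 0.1],
              [0.4, 0.3],
              [0.1, 0.6]])
pi = np.array([0.6, 0.4])
x = [0, 1, 2, 2, 0]

path, log_joint = viterbi(A, O, pi, x)
```

Because the emission model enters only through `logO[x[t]]`, the table lookup could be swapped for any per-frame log-likelihood (for example from a GMM) without changing the recursion.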
28 Sequential inference: filtering (Filtering)
Basic idea: use system model and observations to maintain probabilistic estimate of state
Simple LDS:
o Object tends to move in straight lines at constant speed
o Measurement is state + noise
[Diagram: Kalman filter cycle. Prior, then Predict, giving the Prediction; Correct with the Observation, giving the Posterior]
29 Sequential inference: filtering equations (Filtering)
Predict:
$$P(z_t \mid x_{1:t-1}) = \int P(z_t, z_{t-1} \mid x_{1:t-1})\, dz_{t-1} = \int P(z_t \mid z_{t-1})\, P(z_{t-1} \mid x_{1:t-1})\, dz_{t-1}$$
Here $P(z_t \mid x_{1:t-1})$ is the prediction, $P(z_t \mid z_{t-1})$ the state transition, and $P(z_{t-1} \mid x_{1:t-1})$ the prior at time $t$.
Correct:
$$P(z_t \mid x_{1:t}) = P(z_t \mid x_t, x_{1:t-1}) = \frac{P(x_t \mid z_t, x_{1:t-1})\, P(z_t \mid x_{1:t-1})}{P(x_t \mid x_{1:t-1})} \propto P(x_t \mid z_t)\, P(z_t \mid x_{1:t-1})$$
Here $P(z_t \mid x_{1:t})$ is the posterior (the prior at time $t+1$) and $P(x_t \mid z_t)$ the emission.
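30 Filtering for the LDS: Kalman Filter (Kalman Filter)
How can we track a noisy sinusoid?
o Brownian motion:
$$z_t = z_{t-1} + u_t, \quad u_t \sim \mathcal{N}(0, \sigma_u^2); \qquad x_t = z_t + v_t, \quad v_t \sim \mathcal{N}(0, \sigma_v^2)$$
o Issue: tracking will lag behind (wrong dynamics model)
o Sinusoidal emission:
$$z_t = a z_{t-1} + u_t, \quad u_t \sim \mathcal{N}(0, \sigma_u^2); \qquad x_t = \sin(z_t) + v_t, \quad v_t \sim \mathcal{N}(0, \sigma_v^2)$$
o Issue: non-linear dynamics (intractable filtering equations)
31 Filtering for the LDS: Kalman Filter (Kalman Filter)
System equations (state describes a rotating vector; observation = sinusoid value):
$$\begin{bmatrix} z_{t,1} \\ z_{t,2} \end{bmatrix} = \begin{bmatrix} \cos(\omega) & -\sin(\omega) \\ \sin(\omega) & \cos(\omega) \end{bmatrix} \begin{bmatrix} z_{t-1,1} \\ z_{t-1,2} \end{bmatrix} + \begin{bmatrix} u_{t,1} \\ u_{t,2} \end{bmatrix}, \qquad u_t \sim \mathcal{N}(0, \Sigma_u)$$
$$x_t = \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} z_{t,1} \\ z_{t,2} \end{bmatrix} + v_t, \qquad v_t \sim \mathcal{N}(0, \sigma_v^2)$$
(The rotation-angle symbol and the minus sign are lost in the transcription; $\omega$ and the standard rotation matrix are used here.)
We can apply the Kalman filter to a system with nonlinear-looking behavior by choosing the model wisely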
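The predict and correct steps of the Kalman filter for this rotating-vector model can be sketched as below. The angle `omega`, the noise levels, the initial state, and the diffuse initial covariance are illustrative assumptions, not values from the slides.

```python
import numpy as np

def kalman_filter(x, A, B, Q, R, m0, P0):
    """Kalman filter for z_t = A z_{t-1} + u_t, x_t = B z_t + v_t,
    with u_t ~ N(0, Q) and v_t ~ N(0, R). Returns the filtered means."""
    m, P = m0.copy(), P0.copy()
    I = np.eye(len(m0))
    means = np.empty((len(x), len(m0)))
    for t, obs in enumerate(x):
        # Predict: push the previous posterior through the dynamics.
        m = A @ m
        P = A @ P @ A.T + Q
        # Correct: weigh the prediction against the new observation.
        S = B @ P @ B.T + R
        K = P @ B.T @ np.linalg.inv(S)
        m = m + K @ (np.atleast_1d(obs) - B @ m)
        P = (I - K @ B) @ P
        means[t] = m
    return means

rng = np.random.default_rng(0)
omega, sigma_u, sigma_v, T = 2 * np.pi / 50, 0.01, 0.5, 300
A = np.array([[np.cos(omega), -np.sin(omega)],
              [np.sin(omega), np.cos(omega)]])
B = np.array([[1.0, 0.0]])
Q = sigma_u ** 2 * np.eye(2)
R = np.array([[sigma_v ** 2]])

# Simulate the rotating vector and its noisy first coordinate.
z = np.array([2.0, 0.0])
truth, x = np.empty(T), np.empty(T)
for t in range(T):
    z = A @ z + sigma_u * rng.standard_normal(2)
    truth[t] = z[0]
    x[t] = z[0] + sigma_v * rng.standard_normal()

means = kalman_filter(x, A, B, Q, R, m0=np.zeros(2), P0=10.0 * np.eye(2))
rmse_meas = np.sqrt(np.mean((x - truth) ** 2))
rmse_filt = np.sqrt(np.mean((means[:, 0] - truth) ** 2))
```

Replacing `A` with the identity gives the Brownian-motion filter from the previous slide, which is a simple way to reproduce the lag the slides describe.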
32 Filtering for the LDS: Kalman Filter (Kalman Filter)
Optimal filtering for sinusoidal model
o Filter matches LDS
[Figure: clean sinusoid, measurement, filtered state]
33 Filtering for the LDS: Kalman Filter (Kalman Filter)
Filter assumes Brownian motion LDS
o No knowledge of sinusoidal behavior causes lag
[Figure: clean sinusoid, measurement, filtered state]
34 Filtering for the LDS: Kalman Filter (Kalman Filter)
Correct model assumption, but overestimated state noise covariance
o Filter pays most attention to noisy measurements
[Figure: clean sinusoid, measurement, filtered state]
35 Filtering for the LDS: Kalman Filter (Kalman Filter)
Correct model assumption, but overestimated observation noise variance
o Filter pays most attention to state transition prior
[Figure: clean sinusoid, measurement, filtered state]
36 Filtering for the LDS: Kalman Filter (Kalman Filter)
Harmonic model for a piano note
o State = multiple, decaying rotating vectors (stacked)
o Emission = sum of harmonically-related sinusoids
$$\tilde{z}_t = \begin{bmatrix} a_1 R & & & 0 \\ & a_2 R^2 & & \\ & & \ddots & \\ 0 & & & a_M R^M \end{bmatrix} \tilde{z}_{t-1} + \tilde{u}_t, \qquad \tilde{u}_t \sim \mathcal{N}(0, \sigma_u^2 I)$$
$$y_t = [\text{emission row vector lost in transcription}]\; \tilde{z}_t + v_t, \qquad v_t \sim \mathcal{N}(0, \sigma_v^2)$$
[Figure: clean note, noisy note, filtered (denoised) note]
37 Intractable filter equations (Filtering)
What if the system equations are crazy?
o Discretization: partition state space into cells
o Linearization: approximate nonlinearities via 1st-order Taylor series expansion
  - Extended Kalman filter (EKF)
o Moment-matching: approximate filtered distribution with a Gaussian
  - Unscented Kalman filter (UKF)
  - Switching Kalman Filter (SKF)
o Variational approximations (deterministic): approximate DBN by breaking edges in graph
o Gibbs sampling (stochastic): approximate inference with local sampling. Very general, very awesome
o Particle filter (PF) (stochastic): approximate #&@% filtered distribution with point masses (sequential importance sampling)
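38 Switching Kalman Filter. Example: cockroach tracking (Particle Filter)
Switching LDS:
$$s_t \sim \operatorname{Mult}(C s_{t-1}), \qquad z_t \sim \mathcal{N}(A_{s_t} z_{t-1}, \Sigma_{s_t}), \qquad x_t \sim \mathcal{N}(B z_t, \Sigma_{s_t})$$
State: $z = \begin{bmatrix} x & \frac{dx}{dt} & y & \frac{dy}{dt} \end{bmatrix}^\top$
Switch values correspond to:
o (1) Stay still
o (2) Brownian motion
o (3) ~ Constant velocity
o (4) Sudden dash
Covariances: $\Sigma_k = \sigma_k^2 I$, with $\sigma_3^2 = 3$ and $\sigma_4^2 = 10$ legible; the values for $k = 1, 2$ are garbled in the transcription (residue: "2 1 = =1")
The numeric entries of $C$, $A_1, \ldots, A_4$ and $B$ are lost in the transcription.
39 Switching Kalman Filter. Example: cockroach tracking (Filtering)
Switching Kalman filter (aka mixture of Kalman filters)
[Diagram: Switch, then Predict and Correct per regime using State and Observation, then Collapse, then Predict and Correct again]
40 Switching Kalman Filter. Example: cockroach tracking (Particle Filter)
[Figure not recovered]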
41 Sequential Inference for DBNs: Particle Filtering (Particle Filter)
Useful when:
o Filtering equations are analytically intractable
o Linear/Gaussian approximations fail
o Computation power is plentiful
Basic idea:
o Replace filtered state distribution with weighted point masses
o Particle = guess for state
o Weight = confidence in guess
o As $L \to \infty$, approximation is perfect
o State statistics ($\mu$, $\Sigma$, etc.) easily computed from particles
$$\hat{P}(z_t \mid x_{1:t}) = \sum_{l=1}^{L} \omega_t^{(l)}\, \delta\!\left(z_t - z_t^{(l)}\right) \xrightarrow{L \to \infty} P(z_t \mid x_{1:t})$$
with weights $\omega_t^{(l)}$ and particles $z_t^{(l)}$.
42 Importance Sampling (Particle Filter)
o Tricky integral (expectation with respect to complicated distribution $P(x)$)
o Approximate $P(x)$ with proposal $Q(x)$
o Weight compensates for mismatch
o Sample from $Q(x)$ to estimate integral
$$\int f(x) P(x)\, dx = \int f(x) \frac{P(x)}{Q(x)} Q(x)\, dx = \int f(x) w(x) Q(x)\, dx \approx \sum_{l=1}^{L} f(x^l)\, w(x^l), \qquad x^l \sim Q(x)$$
In the filtering setting, $P$ is the filtered state distribution, the weights are built from the state transition and emission densities, and $w(x^l)$ denotes normalized weights.
o Apply this sequentially and we have the particle filter
43 Importance Sampling (Particle Filter)
Example: $n$th moment of von Mises random variable
o We want:
$$E[x^n] = \int x^n\, \mathrm{vM}(x; \mu, \kappa)\, dx$$
o Use Monte Carlo estimate:
$$E[x^n] \approx \frac{1}{L} \sum_{l=1}^{L} x_l^n, \qquad x_l \sim \mathrm{vM}(x; \mu, \kappa)$$
o But can't sample from vM directly
o Instead, find similar wrapped Gaussian (the closer, the better)
o Approximate integral with importance sampling:
$$E[x^n] \approx \sum_{l=1}^{L} w_l\, x_l^n, \qquad x_l \sim \mathrm{WN}(x; \mu, \sigma^2), \qquad w_l \propto \frac{\mathrm{vM}(x_l; \mu, \kappa)}{\mathrm{WN}(x_l; \mu, \sigma^2)}$$
[Figure: von Mises, wrapped Gaussian, and uniform densities]
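The wrapped-Gaussian importance sampler for von Mises moments can be sketched as follows. The parameter values, the truncation of the wrapped density to a few wraps, the function names, and the grid-based reference value are assumptions made for illustration. The weights are self-normalized, so the von Mises normalizing constant is never needed.

```python
import numpy as np

def wrap(x):
    """Map angles to the interval [-pi, pi)."""
    return (x + np.pi) % (2 * np.pi) - np.pi

def wrapped_normal_pdf(x, mu, sigma, n_wraps=5):
    """Wrapped Gaussian density, truncated to 2*n_wraps + 1 wraps."""
    k = np.arange(-n_wraps, n_wraps + 1)
    d = x[:, None] - mu + 2 * np.pi * k[None, :]
    return np.exp(-0.5 * (d / sigma) ** 2).sum(axis=1) / (sigma * np.sqrt(2 * np.pi))

def von_mises_unnormalized(x, mu, kappa):
    return np.exp(kappa * np.cos(x - mu))

def importance_moment(n, mu, kappa, sigma, L, rng):
    """Self-normalized importance-sampling estimate of E[x^n]."""
    x = wrap(mu + sigma * rng.standard_normal(L))     # draws from the proposal
    w = von_mises_unnormalized(x, mu, kappa) / wrapped_normal_pdf(x, mu, sigma)
    w /= w.sum()                                      # normalized weights
    return np.sum(w * x ** n)

def grid_moment(n, mu, kappa, num=200_000):
    """Reference value by a Riemann sum on a fine grid over [-pi, pi)."""
    xs = np.linspace(-np.pi, np.pi, num, endpoint=False)
    p = von_mises_unnormalized(xs, mu, kappa)
    return np.sum(xs ** n * p) / np.sum(p)

rng = np.random.default_rng(0)
mu, kappa, sigma = 0.5, 3.0, 1.0   # proposal chosen somewhat wider than the target
estimate = importance_moment(2, mu, kappa, sigma, L=100_000, rng=rng)
reference = grid_moment(2, mu, kappa)
```

A proposal that is narrower than the target produces a few very large weights in the tails, which inflates the variance of the estimate. Choosing a somewhat wider proposal keeps the weights bounded.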
44 Importance Sampling (Particle Filter)
Average absolute error in mean estimate ($L = 10$), wrapped Gaussian (WG) versus uniform (Unif) proposal, for three values of $\kappa$ (one of them $\kappa = 3$): numeric values lost in the transcription
[Figure: von Mises, wrapped Gaussian, and uniform densities for each $\kappa$]
45 Sequential Importance Sampling (Particle Filter)
o Merged filtering equations:
$$P(z_t \mid x_{1:t}) \propto P(x_t \mid z_t) \int P(z_t \mid z_{t-1})\, P(z_{t-1} \mid x_{1:t-1})\, dz_{t-1}$$
o Approximate with importance sampling:
$$P(z_t \mid x_{1:t}) \approx \sum_{l=1}^{L} w_t^l\, \delta\!\left(z_t - z_t^l\right), \qquad z_t^l \sim Q\!\left(z_t^l \mid z_{t-1}^l, x_t\right)$$
$$w_t^l \propto \frac{P\!\left(x_t \mid z_t^l\right) P\!\left(z_t^l \mid z_{t-1}^l\right)}{Q\!\left(z_t^l \mid z_{t-1}^l, x_t\right)}\, w_{t-1}^l$$
o Optimal $Q$ is typically hard to sample from
o Common choice is transition density:
$$Q\!\left(z_t^l \mid z_{t-1}^l, x_t\right) = P\!\left(z_t^l \mid z_{t-1}^l\right) \quad\Rightarrow\quad w_t^l \propto P\!\left(x_t \mid z_t^l\right) w_{t-1}^l$$
o Shouldn't ignore emission density: use EKF/UKF to approximate optimal $Q$
46 Basic Particle Filter (Particle Filter)
o Prior state representation
o Predict: propagate particles through transition distribution
o Correct: update weights via emission density
o Compute statistics before resampling
o Resample: draw fresh particles from updated set
o Posterior state representation
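The predict, correct, and resample steps can be sketched with a bootstrap particle filter, which uses the transition density as proposal. The scalar Brownian-motion model, the noise levels, the particle count, the standard normal prior, and multinomial resampling at every step are illustrative assumptions.

```python
import numpy as np

def bootstrap_particle_filter(x, L, sigma_u, sigma_v, rng):
    """Particle filter for z_t = z_{t-1} + u_t, x_t = z_t + v_t.
    Returns the filtered mean at each time step."""
    particles = rng.standard_normal(L)          # prior state representation
    means = np.empty(len(x))
    for t, obs in enumerate(x):
        # Predict: propagate particles through the transition distribution.
        particles = particles + sigma_u * rng.standard_normal(L)
        # Correct: weights proportional to the emission density. The previous
        # weights are uniform because we resample at every step.
        log_w = -0.5 * ((obs - particles) / sigma_v) ** 2
        log_w -= log_w.max()                    # guard against underflow
        weights = np.exp(log_w)
        weights /= weights.sum()
        # Compute statistics before resampling.
        means[t] = np.sum(weights * particles)
        # Resample: draw fresh particles from the updated set.
        particles = particles[rng.choice(L, size=L, p=weights)]
    return means

rng = np.random.default_rng(0)
T, sigma_u, sigma_v = 200, 0.1, 1.0
z = np.cumsum(sigma_u * rng.standard_normal(T))     # hidden Brownian path
x = z + sigma_v * rng.standard_normal(T)            # noisy measurements

means = bootstrap_particle_filter(x, L=500, sigma_u=sigma_u, sigma_v=sigma_v, rng=rng)
rmse_meas = np.sqrt(np.mean((x - z) ** 2))
rmse_pf = np.sqrt(np.mean((means - z) ** 2))
```

This model is linear-Gaussian, so a Kalman filter would be exact here. The point of the sketch is only that the same loop keeps working when the transition or emission is replaced by something non-linear or non-Gaussian.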
47 Particle filtering. Example: multiple DBN tracking with scrambled observations (Particle Filter)
$$\forall i \in [1, \ldots, K]: \qquad z_t^i = f\!\left(z_{t-1}^i, u_t^i\right), \qquad x_t^i = g\!\left(z_t^i, v_t^i\right), \qquad X_t = \left\{ x_t^1, \ldots, x_t^K \right\}$$
o Multiple DBNs active
o Each DBN emits an observation
o We observe permuted bag of emissions
o Nightmare to invert the generative model!!! Easy with PF
Particle filter w/ mixture model in the state
o Probabilistic Data Association (PDA)
  - Observation-to-cluster association
  - Particle-to-cluster association
48 Particle filtering. Example: multiple DBN tracking with scrambled observations (Particle Filter)
Multiple LDS tracking:
o GMM captures multimodal state distribution
o Gaussians hold the particles in tight clusters
o Collisions handled gracefully by probabilistic assignments
49 Particle filtering. Example: grasshopper tracking (Particle Filter)
LDS with indicator functions!!!
o State transition: linear dynamics combined with a bounce (indicator) function of $z$, plus noise $u_t \sim \mathcal{N}(0, \sigma_u^2 I)$; the matrices and the bounce function definition are lost in the transcription
o Emission: $x_t = [\text{matrix lost in transcription}]\; z_t + v_t$, $v_t \sim \mathcal{N}(0, \sigma_v^2 I)$
o State: $z = \begin{bmatrix} x & \frac{dx}{dt} & y & \frac{dy}{dt} \end{bmatrix}^\top$
Thanks to Taylan Cemgil (for the grasshopper DBN)
50 Particle filtering. Example: grasshopper tracking (Particle Filter)
[Figure not recovered]
51 Variational Inference (Variational)
Basic idea:
o Inference is too hard (intractable, expensive)
o Approximate $\mathrm{DBN}_{\text{true}}$ with $\mathrm{DBN}_{\text{simple}}$
o Set (variational) parameters of $\mathrm{DBN}_{\text{simple}}$ to match $\mathrm{DBN}_{\text{true}}$
o Amounts to breaking edges in the graphical model
The math:
o True joint distribution of observed and hidden variables: $P(X, Z)$
o Variational approximation of joint of hidden nodes (implies broken edges in graph):
$$Q(Z) = \prod_{i=1}^{M} Q_i(Z_i)$$
Called "product density transform" in statistics
52 Variational Inference (Variational)
The math (continued):
o For inference, we want to maximize the data likelihood:
$$\ln P(X) = \mathcal{L}(Q) + \mathrm{KL}(Q \,\|\, P)$$
$$\mathcal{L}(Q) = \int Q(Z) \ln \frac{P(X, Z)}{Q(Z)}\, dZ \quad \text{(lower bound)}, \qquad \mathrm{KL}(Q \,\|\, P) = -\int Q(Z) \ln \frac{P(Z \mid X)}{Q(Z)}\, dZ \quad \text{(extra stuff)}$$
o Optimal variational distribution is posterior (gives best lower bound): $Q(Z) = P(Z \mid X)$
o Doable when fitting a model with EM (ML estimate). Up to the entropy of $Q$, which does not depend on the model parameters, the bound is the Q function:
$$\int Q(Z) \ln P(X, Z)\, dZ = \int P(Z \mid X) \ln P(X, Z)\, dZ$$
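The decomposition $\ln P(X) = \mathcal{L}(Q) + \mathrm{KL}(Q \,\|\, P)$ can be verified numerically on a toy model with one discrete hidden variable. The joint probabilities and the choice of `Q` below are made-up numbers used only for illustration.

```python
import numpy as np

# Toy model: Z takes 3 values and X is fixed at its observed value.
# joint[k] = P(X = x_obs, Z = k); hypothetical numbers.
joint = np.array([0.10, 0.25, 0.05])
log_evidence = np.log(joint.sum())          # ln P(X)
posterior = joint / joint.sum()             # P(Z | X)

# An arbitrary variational distribution over Z.
Q = np.array([0.5, 0.3, 0.2])

elbo = np.sum(Q * np.log(joint / Q))        # lower bound L(Q)
kl = -np.sum(Q * np.log(posterior / Q))     # KL(Q || P(Z | X)), always >= 0

# With Q equal to the posterior, the bound is tight.
elbo_at_posterior = np.sum(posterior * np.log(joint / posterior))
```

Since the KL term is non-negative, any choice of `Q` gives a lower bound on the log likelihood, and moving `Q` toward the posterior tightens it.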
53 Variational Inference (Variational)
1D, 3-component GMM: log likelihood and bounds given current estimate of 2nd mean
[Figure: log probability versus 2nd mean value; curves: log likelihood, EM lower bound, variational lower bound]
54 Variational Inference (Variational)
Variational lower bound (up to an additive constant), with the expectation taken with respect to the product of the other factors:
$$\mathcal{L}(Q) \propto \int Q_j(Z_j) \ln \frac{e^{E_{i \neq j}[\ln P(X, Z)]}}{Q_j(Z_j)}\, dZ_j$$
Best bound when:
$$Q_j(Z_j) \propto e^{E_{i \neq j}[\ln P(X, Z)]}$$
The $j$th factor depends on all others, so cycle through them: for $j = 1 : M$, set $Q_j(Z_j) \propto e^{E_{i \neq j}[\ln P(X, Z)]}$, end
o If we use a conjugate prior for $Z_j$, $Q_j$ has the form of the corresponding posterior
o Variational inference = setting the parameters of the $Q_j$'s
o Do this in E step for variational EM
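55 Variational EM for the GMM (Variational)
GMM w/ Dirichlet prior on weights
o Joint distribution of all variables:
$$P(X, Z, \pi; \mu, \Sigma, \alpha) = P(X \mid Z; \mu, \Sigma)\, P(Z \mid \pi)\, P(\pi; \alpha)$$
o Variational factorization:
$$Q(Z, \pi) = Q_1(Z)\, Q_2(\pi)$$
o E step:
$$\ln Q_1(Z) = E_{\pi}\left[\ln P(X \mid Z; \mu, \Sigma) + \ln P(Z \mid \pi)\right] + \text{const}$$
$$\ln Q_2(\pi) = E_{Z}\left[\ln P(Z \mid \pi) + \ln P(\pi; \alpha)\right] + \text{const}$$
$$Q_1(z_i = j) = \frac{\tilde{\pi}_j\, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}{\sum_k \tilde{\pi}_k\, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}, \qquad \tilde{\pi}_k = e^{E[\ln \pi_k]}$$
$$Q_2(\pi) = \operatorname{Dir}(\tilde{\alpha}), \qquad \tilde{\alpha} = \alpha + \sum_{i=1}^{N} E[z_i]$$
These are coupled variational parameters (iterate).
o M step: regular EM update for $\mu$, $\Sigma$
(The Greek symbols are lost in the transcription; $\pi$ for the mixing weights and $\alpha$ for the Dirichlet parameter are restored here.)
[Figure: plate diagram with nodes $\pi$, $z$, $x$, parameters $\mu$, $\Sigma$, and plate $N$]
56 Variational EM for the GMM (Variational)
GMM w/ Dirichlet prior on weights
o E step: update variational assignment parameters; update variational mixing weight parameters
o M step: update model parameters
[Figure: three plate diagrams, one per update]
57 Variational EM for the GMM (Variational)
EM for GMM: [number lost] samples, fit 20 Gaussians
[Figure not recovered]
58 Variational EM for the GMM (Variational)
Variational EM for GMM w/ sparse Dirichlet prior on weights: [number lost] samples, fit 20 Gaussians
[Figure not recovered]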
59 Variational Inference for the FHMM (Variational)
Full DBN: chains $z^1_{t-1} \to z^1_t \to z^1_{t+1}$ and $z^2_{t-1} \to z^2_t \to z^2_{t+1}$, both feeding $x_{t-1}, x_t, x_{t+1}$
Joint distribution of all variables:
$$P\!\left(X, Z^{1:K}; A^{1:K}, \pi^{1:K}, \theta^{1:K}\right) = \left[ \prod_{k=1}^{K} P\!\left(z_1^k; \pi^k\right) \prod_{t=2}^{T} P\!\left(z_t^k \mid z_{t-1}^k; A^k\right) \right] \prod_{t=1}^{T} P\!\left(x_t \mid z_t^{1:K}; \theta^{1:K}\right)$$
(The symbols for the initial-state and emission parameters are lost in the transcription; $\pi^k$ and $\theta^k$ are placeholders.)
60 Variational Inference for the FHMM (Variational)
Gaussian emission model with additive means (Ghahramani, Jordan 97):
$$P\!\left(x_t \mid z_t^{1:K}\right) = \mathcal{N}\!\left(x_t;\; \sum_{k=1}^{K} B^k z_t^k,\; \Sigma\right)$$
with shared covariance $\Sigma$ and $B^k$ the matrix of means for the $k$th chain
61 Variational Inference for the FHMM (Variational)
Fully factored variational approximation: all edges between $z^1_{t-1}, z^1_t, z^1_{t+1}$ and $z^2_{t-1}, z^2_t, z^2_{t+1}$ removed
Variational factorization (factors turn out to be multinomial):
$$Q(Z) = \prod_{k=1}^{K} \prod_{t=1}^{T} Q\!\left(z_t^k\right)$$
62 Variational Inference for the FHMM (Variational)
Induced variational parameters and dependencies
o Hidden variables are de-coupled
o Variational parameters induced by factorization (act as means)
o Parameters are locally coupled
o Iterate: update variational parameters using neighbors
[Figure: each hidden variable $z_t^k$ paired with its variational parameter]
63 Variational Inference for the FHMM (Variational)
Update variational parameters
[Figure: update of the parameters of $z_t^1$ and $z_t^2$ from their neighbors at $t-1$ and $t+1$]
64 Variational Inference for the FHMM (Variational)
Structured variational approximation: edges within each chain kept, edges between chains removed
Variational factorization:
$$Q(Z) = \prod_{k=1}^{K} Q\!\left(z_1^k\right) \prod_{t=2}^{T} Q\!\left(z_t^k \mid z_{t-1}^k\right)$$
Choose these to be HMM-ish: initial probability for $t = 1$, transition probability for $t > 1$
65 Variational Inference for the FHMM (Variational)
Induced variational parameters and dependencies
o Hidden variables are de-coupled between chains
o Variational parameters induced by factorization (act as likelihoods)
o Parameters are locally coupled
o Iterate:
  - Forward-backward on each chain using fake likelihoods
  - Update variational parameters using other chains' posteriors
[Figure: each hidden variable $z_t^k$ paired with its variational parameter]
66 Variational Inference for the FHMM (Variational)
Forward-backward; update variational parameters
[Figure: forward-backward pass along each chain, followed by the parameter update for $z_t^1$ and $z_t^2$]
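67 Gibbs sampling (Gibbs)
Basic idea:
o Exact inference is hard (takes too long, math is #&@%)
o Any distribution can be described by its samples
o So approximate inference by sampling
o Sampling from full posterior is hard:
$$Z^s \sim P(Z \mid X), \qquad Z = \begin{bmatrix} \text{latent variables} \\ \text{parameters} \end{bmatrix}$$
o Iteratively draw from local conditionals instead: for $s = 1 : S$ (samples), for $j = 1 : J$ (variables), draw from the conditional while keeping the other unknowns fixed,
$$Z_j^s \sim P\!\left(Z_j \mid Z_{\setminus j}^s, X\right)$$
end, end
o Samples eventually resemble draws from full posterior
68 Gibbs EM for the GMM (Gibbs)
GMM w/ Dirichlet prior on weights
o Joint distribution of all variables:
$$P(X, Z, \pi; \mu, \Sigma, \alpha) = P(X \mid Z; \mu, \Sigma)\, P(Z \mid \pi)\, P(\pi; \alpha)$$
o Gibbs conditionals:
$$\forall i \in [1, N]: \quad Z_i^s \sim P\!\left(Z_i \mid \pi^s, X_i\right) \propto P(X_i \mid Z_i)\, P\!\left(Z_i \mid \pi^s\right)$$
$$\pi^s \sim P\!\left(\pi \mid Z^s\right) \propto P\!\left(Z^s \mid \pi\right) P(\pi)$$
o E step:
$$\forall i \in [1, N]: \quad Z_i^s \sim \operatorname{Mult}(\gamma_i), \qquad \gamma_{ij} = \frac{\pi_j^s\, \mathcal{N}(X_i; \mu_j, \Sigma_j)}{\sum_{k=1}^{K} \pi_k^s\, \mathcal{N}(X_i; \mu_k, \Sigma_k)}$$
$$\pi^s \sim \operatorname{Dir}(\tilde{\alpha}), \qquad \tilde{\alpha} = \alpha + \sum_{i=1}^{N} Z_i^s$$
These are coupled Gibbs parameters (iterate).
o M step: regular EM update for $\mu$, $\Sigma$
(The Greek symbols are lost in the transcription; $\pi$, $\alpha$ and $\gamma$ are restored or chosen here.)
[Figure: plate diagram with nodes $\pi$, $z$, $x$, parameters $\mu$, $\Sigma$, and plate $N$]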
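The two Gibbs conditionals (assignments, then mixing weights) can be sketched for a 1D GMM. To keep the example short, the component means and standard deviations are held fixed at known values, so the M step is omitted. The data, the symmetric Dirichlet parameter `alpha`, the burn-in length, and the function name are illustrative assumptions.

```python
import numpy as np

def gibbs_gmm_weights(X, mu, sigma, alpha, S, rng):
    """Gibbs sampling of assignments Z and mixing weights pi for a 1D GMM
    with fixed component means mu and standard deviations sigma."""
    N, K = len(X), len(mu)
    # Gaussian log-likelihood of every point under every component
    # (constants that cancel in the normalization are dropped).
    log_lik = -0.5 * ((X[:, None] - mu[None, :]) / sigma[None, :]) ** 2 \
              - np.log(sigma[None, :])
    pi = np.full(K, 1.0 / K)
    pi_samples = np.empty((S, K))
    for s in range(S):
        # Sample assignments: Z_i ~ Mult(gamma_i).
        log_gamma = np.log(pi)[None, :] + log_lik
        log_gamma -= log_gamma.max(axis=1, keepdims=True)
        gamma = np.exp(log_gamma)
        gamma /= gamma.sum(axis=1, keepdims=True)
        u = rng.random(N)
        Z = (gamma.cumsum(axis=1) < u[:, None]).sum(axis=1)   # inverse-CDF draw
        Z = np.minimum(Z, K - 1)                              # guard against rounding
        # Sample mixing weights: pi ~ Dir(alpha + counts).
        counts = np.bincount(Z, minlength=K)
        pi = rng.dirichlet(alpha + counts)
        pi_samples[s] = pi
    return pi_samples

rng = np.random.default_rng(0)
true_pi = np.array([0.5, 0.3, 0.2])
mu = np.array([-4.0, 0.0, 4.0])
sigma = np.array([1.0, 1.0, 1.0])
labels = rng.choice(3, size=2000, p=true_pi)
X = mu[labels] + sigma[labels] * rng.standard_normal(2000)

pi_samples = gibbs_gmm_weights(X, mu, sigma, alpha=np.ones(3), S=300, rng=rng)
pi_mean = pi_samples[100:].mean(axis=0)    # discard the first 100 draws as burn-in
```

Each update touches only the variables in the Markov blanket of the one being sampled: an assignment needs its own data point and the current weights, and the weights need only the assignment counts.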
69 Gibbs EM for the GMM (Gibbs)
GMM w/ Dirichlet prior on weights
o Conditional sampling = updates only involve variables in Markov blanket
o E step: sample assignments; sample mixing weights
o M step: update parameters
[Figure: three plate diagrams, one per update]
70 Gibbs inference for the FHMM (Gibbs)
Possibly slow convergence of Gibbs samples to posterior
o Strong correlations between state variables
o Gibbs draws samples along individual coordinates of $Z$ space
o In practice, it's quite fast
For $t = 1 : T$: sample 1st chain's state $z_t^1$, then sample 2nd chain's state $z_t^2$
[Figure: FHMM graph highlighting the variables involved in sampling $z_t^1$ and $z_t^2$]
71 Audio Source Separation with FHMM (Filtering FHMM)
Set-up
o 10 variational/Gibbs iterations per frame, $M^K/5$ particles
o Binary masks = max. of spectra for most likely state combination:
$$m_t^k = \mathbb{I}\!\left( B^k \hat{z}_t^k > B^{k'} \hat{z}_t^{k'} \right)$$
where $k'$ denotes the other source (the index is lost in the transcription)
Piano-violin FHMM
o 12 notes each
o Optimal, variational, Gibbs (the comparison symbols between these are lost in the transcription)
o Particle filter is bad
  - Basic proposal = bad tracking
  - Optimal proposal = too slow
o Slide labels: mix, Optimal, Basic PF
Speech-speech FHMM
o 30 speech bases each (pre-trained)
o Optimal, variational, Gibbs, PF (the comparison symbols between these are lost in the transcription)
o Not bad!
o Slide labels: mix, Optimal, Solo Viterbi
72 #3: Learning
Expectation-Maximization (e.g. Baum-Welch)
o Find parameters that maximize data likelihood:
$$\hat{\theta} = \arg\max_{\theta} P(x_{1:T}; \theta)$$
o If inference (E step) is too hard, use:
  - Variational methods
  - Sampling methods
Method of moments (e.g. spectral learning)
o Express moments in terms of parameters
o Solve non-linear system of equations: $f(\theta) \approx m$
73 Baum-Welch for the HMM (#3: Learning, Baum-Welch)
Find most likely parameters given data:
$$\hat{\theta} = \arg\max_{\theta} P(x_{1:T}; \theta) = \arg\max_{\theta} \sum_{z_{1:T}} P(x_{1:T}, z_{1:T}; \theta) = \arg\max_{A, O, \pi} \sum_{z_{1:T}} P(z_1; \pi) \prod_{t=2}^{T} P(z_t \mid z_{t-1}; A) \prod_{t=1}^{T} P(x_t \mid z_t; O)$$
This is hard, so use EM. Both states and parameters are unknown:
o Given parameters, estimate states (inference)
o Given states, estimate parameters
o Iterate
74 Baum-Welch for the HMM (#3: Learning, Baum-Welch)
E step: inference with forward-backward algorithm
o Estimate hidden state probabilities (posteriors):
$$\alpha(z_t) = P(x_{1:t}, z_t), \qquad \beta(z_t) = P(x_{t+1:T} \mid z_t)$$
$$P(z_t \mid x_{1:T}) \propto \alpha(z_t)\, \beta(z_t)$$
$$P(z_{t-1}, z_t \mid x_{1:T}) \propto \alpha(z_{t-1})\, P(x_t \mid z_t)\, P(z_t \mid z_{t-1})\, \beta(z_t)$$
M step: update parameters with weighted averages over data
o weights = posteriors
[Figure: HMM graphical model, $z_{t-1} \to z_t \to z_{t+1}$ with emissions $x_{t-1}, x_t, x_{t+1}$]
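The E-step quantities can be sketched for a small discrete HMM as follows. The parameter values and the function name are illustrative assumptions, with conventions `A[i, j] = P(z_t = i | z_{t-1} = j)` and `O[k, i] = P(x_t = k | z_t = i)`. The recursions are left unscaled for readability, which is only safe for short sequences.

```python
import numpy as np

def forward_backward(A, O, pi, x):
    """Posterior marginals gamma[t, i] = P(z_t = i | x_{1:T}) and pairwise
    posteriors xi[t-1, i, j] = P(z_t = i, z_{t-1} = j | x_{1:T})."""
    T, M = len(x), len(pi)
    alpha = np.zeros((T, M))
    beta = np.ones((T, M))
    alpha[0] = O[x[0]] * pi
    for t in range(1, T):
        alpha[t] = O[x[t]] * (A @ alpha[t - 1])          # alpha(z_t) = P(x_{1:t}, z_t)
    for t in range(T - 2, -1, -1):
        beta[t] = A.T @ (O[x[t + 1]] * beta[t + 1])      # beta(z_t) = P(x_{t+1:T} | z_t)
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    xi = np.zeros((T - 1, M, M))
    for t in range(1, T):
        # alpha(z_{t-1}) P(x_t | z_t) P(z_t | z_{t-1}) beta(z_t), then normalize.
        xi[t - 1] = (O[x[t]] * beta[t])[:, None] * A * alpha[t - 1][None, :]
        xi[t - 1] /= xi[t - 1].sum()
    return gamma, xi

# Hypothetical 2-state, 3-symbol HMM.
A = np.array([[0.7, 0.4],
              [0.3, 0.6]])
O = np.array([[0.5, 0.1],
              [0.4, 0.3],
              [0.1, 0.6]])
pi = np.array([0.6, 0.4])
x = [0, 1, 2, 2, 0]

gamma, xi = forward_backward(A, O, pi, x)

# One M-step update of the transition matrix: expected transition counts,
# renormalized so that each column sums to one.
A_new = xi.sum(axis=0)
A_new /= A_new.sum(axis=0, keepdims=True)
```

The pairwise posteriors marginalize to the single-state posteriors, which gives a cheap consistency check on any implementation.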
75 Method of moments #3: Learning Say what?
76 #4: Structure Learning #4: Structure Learning Say what?
77 References
Books:
o Pattern Recognition and Machine Learning, Christopher Bishop, 2006
o Probabilistic Reasoning over Time, Stuart Russell and Peter Norvig, Chapter 15 in Artificial Intelligence: A Modern Approach, 2009
o Machine Learning: A Probabilistic Perspective, Kevin Murphy, 2013
o Bayesian Reasoning and Machine Learning, David Barber, 2013
Thesis:
o Dynamic Bayesian Networks: Representation, Inference and Learning, Kevin Murphy, 2002
Papers:
o Factorial Hidden Markov Models, Zoubin Ghahramani and Michael Jordan, Machine Learning, 1997
o An Introduction to Hidden Markov Models and Bayesian Networks, Zoubin Ghahramani, International Journal of Pattern Recognition and Artificial Intelligence, 2001
o A Tutorial on Particle Filters for Online Nonlinear/Non-Gaussian Bayesian Tracking, Sanjeev Arulampalam, Simon Maskell, Neil Gordon, and Tim Clapp, IEEE Transactions on Signal Processing, 2002
o An Introduction to the Kalman Filter, Greg Welch and Gary Bishop, 2006
o Graphical Models for Time Series, David Barber and Taylan Cemgil, IEEE Signal Processing Magazine, 2010
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate
More informationUnsupervised Learning
Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College
More informationHidden Markov Models. By Parisa Abedi. Slides courtesy: Eric Xing
Hidden Markov Models By Parisa Abedi Slides courtesy: Eric Xing i.i.d to sequential data So far we assumed independent, identically distributed data Sequential (non i.i.d.) data Time-series data E.g. Speech
More informationMachine Learning Summer School
Machine Learning Summer School Lecture 3: Learning parameters and structure Zoubin Ghahramani zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin/ Department of Engineering University of Cambridge,
More informationExpectation Propagation in Dynamical Systems
Expectation Propagation in Dynamical Systems Marc Peter Deisenroth Joint Work with Shakir Mohamed (UBC) August 10, 2012 Marc Deisenroth (TU Darmstadt) EP in Dynamical Systems 1 Motivation Figure : Complex
More informationChapter 05: Hidden Markov Models
LEARNING AND INFERENCE IN GRAPHICAL MODELS Chapter 05: Hidden Markov Models Dr. Martin Lauer University of Freiburg Machine Learning Lab Karlsruhe Institute of Technology Institute of Measurement and Control
More informationComputer Vision Group Prof. Daniel Cremers. 11. Sampling Methods: Markov Chain Monte Carlo
Group Prof. Daniel Cremers 11. Sampling Methods: Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative
More informationLecture 6: Graphical Models: Learning
Lecture 6: Graphical Models: Learning 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering, University of Cambridge February 3rd, 2010 Ghahramani & Rasmussen (CUED)
More informationIntroduction to Machine Learning CMU-10701
Introduction to Machine Learning CMU-10701 Hidden Markov Models Barnabás Póczos & Aarti Singh Slides courtesy: Eric Xing i.i.d to sequential data So far we assumed independent, identically distributed
More informationRecall: Modeling Time Series. CSE 586, Spring 2015 Computer Vision II. Hidden Markov Model and Kalman Filter. Modeling Time Series
Recall: Modeling Time Series CSE 586, Spring 2015 Computer Vision II Hidden Markov Model and Kalman Filter State-Space Model: You have a Markov chain of latent (unobserved) states Each state generates
More informationCS 343: Artificial Intelligence
CS 343: Artificial Intelligence Particle Filters and Applications of HMMs Prof. Scott Niekum The University of Texas at Austin [These slides based on those of Dan Klein and Pieter Abbeel for CS188 Intro
More informationA graph contains a set of nodes (vertices) connected by links (edges or arcs)
BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,
More informationA Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models
A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models Jeff A. Bilmes (bilmes@cs.berkeley.edu) International Computer Science Institute
More informationMultiple Speaker Tracking with the Factorial von Mises- Fisher Filter
Multiple Speaker Tracking with the Factorial von Mises- Fisher Filter IEEE International Workshop on Machine Learning for Signal Processing Sept 21-24, 2014 Reims, France Johannes Traa, Paris Smaragdis
More informationLecture 6: Bayesian Inference in SDE Models
Lecture 6: Bayesian Inference in SDE Models Bayesian Filtering and Smoothing Point of View Simo Särkkä Aalto University Simo Särkkä (Aalto) Lecture 6: Bayesian Inference in SDEs 1 / 45 Contents 1 SDEs
More informationMachine Learning Techniques for Computer Vision
Machine Learning Techniques for Computer Vision Part 2: Unsupervised Learning Microsoft Research Cambridge x 3 1 0.5 0.2 0 0.5 0.3 0 0.5 1 ECCV 2004, Prague x 2 x 1 Overview of Part 2 Mixture models EM
More informationRobert Collins CSE586 CSE 586, Spring 2015 Computer Vision II
CSE 586, Spring 2015 Computer Vision II Hidden Markov Model and Kalman Filter Recall: Modeling Time Series State-Space Model: You have a Markov chain of latent (unobserved) states Each state generates
More informationCOMP90051 Statistical Machine Learning
COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Trevor Cohn 24. Hidden Markov Models & message passing Looking back Representation of joint distributions Conditional/marginal independence
More informationCS 343: Artificial Intelligence
CS 343: Artificial Intelligence Particle Filters and Applications of HMMs Prof. Scott Niekum The University of Texas at Austin [These slides based on those of Dan Klein and Pieter Abbeel for CS188 Intro
More informationCS 5522: Artificial Intelligence II
CS 5522: Artificial Intelligence II Particle Filters and Applications of HMMs Instructor: Wei Xu Ohio State University [These slides were adapted from CS188 Intro to AI at UC Berkeley.] Recap: Reasoning
More informationCS 5522: Artificial Intelligence II
CS 5522: Artificial Intelligence II Particle Filters and Applications of HMMs Instructor: Alan Ritter Ohio State University [These slides were adapted from CS188 Intro to AI at UC Berkeley. All materials
More informationRAO-BLACKWELLISED PARTICLE FILTERS: EXAMPLES OF APPLICATIONS
RAO-BLACKWELLISED PARTICLE FILTERS: EXAMPLES OF APPLICATIONS Frédéric Mustière e-mail: mustiere@site.uottawa.ca Miodrag Bolić e-mail: mbolic@site.uottawa.ca Martin Bouchard e-mail: bouchard@site.uottawa.ca
More informationKalman filtering and friends: Inference in time series models. Herke van Hoof slides mostly by Michael Rubinstein
Kalman filtering and friends: Inference in time series models Herke van Hoof slides mostly by Michael Rubinstein Problem overview Goal Estimate most probable state at time k using measurement up to time
More informationMCMC and Gibbs Sampling. Kayhan Batmanghelich
MCMC and Gibbs Sampling Kayhan Batmanghelich 1 Approaches to inference l Exact inference algorithms l l l The elimination algorithm Message-passing algorithm (sum-product, belief propagation) The junction
More informationAdvanced Data Science
Advanced Data Science Dr. Kira Radinsky Slides Adapted from Tom M. Mitchell Agenda Topics Covered: Time series data Markov Models Hidden Markov Models Dynamic Bayes Nets Additional Reading: Bishop: Chapter
More informationBayesian Machine Learning - Lecture 7
Bayesian Machine Learning - Lecture 7 Guido Sanguinetti Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh gsanguin@inf.ed.ac.uk March 4, 2015 Today s lecture 1
More informationMath 350: An exploration of HMMs through doodles.
Math 350: An exploration of HMMs through doodles. Joshua Little (407673) 19 December 2012 1 Background 1.1 Hidden Markov models. Markov chains (MCs) work well for modelling discrete-time processes, or
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is
More information27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling
10-708: Probabilistic Graphical Models 10-708, Spring 2014 27 : Distributed Monte Carlo Markov Chain Lecturer: Eric P. Xing Scribes: Pengtao Xie, Khoa Luu In this scribe, we are going to review the Parallel
More informationHidden Markov Models and Gaussian Mixture Models
Hidden Markov Models and Gaussian Mixture Models Hiroshi Shimodaira and Steve Renals Automatic Speech Recognition ASR Lectures 4&5 23&27 January 2014 ASR Lectures 4&5 Hidden Markov Models and Gaussian
More informationState-Space Methods for Inferring Spike Trains from Calcium Imaging
State-Space Methods for Inferring Spike Trains from Calcium Imaging Joshua Vogelstein Johns Hopkins April 23, 2009 Joshua Vogelstein (Johns Hopkins) State-Space Calcium Imaging April 23, 2009 1 / 78 Outline
More informationWe Live in Exciting Times. CSCI-567: Machine Learning (Spring 2019) Outline. Outline. ACM (an international computing research society) has named
We Live in Exciting Times ACM (an international computing research society) has named CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Apr. 2, 2019 Yoshua Bengio,
More informationBayesian Hidden Markov Models and Extensions
Bayesian Hidden Markov Models and Extensions Zoubin Ghahramani Department of Engineering University of Cambridge joint work with Matt Beal, Jurgen van Gael, Yunus Saatci, Tom Stepleton, Yee Whye Teh Modeling
More informationVariational Principal Components
Variational Principal Components Christopher M. Bishop Microsoft Research 7 J. J. Thomson Avenue, Cambridge, CB3 0FB, U.K. cmbishop@microsoft.com http://research.microsoft.com/ cmbishop In Proceedings
More informationStatistical Machine Learning Lecture 8: Markov Chain Monte Carlo Sampling
1 / 27 Statistical Machine Learning Lecture 8: Markov Chain Monte Carlo Sampling Melih Kandemir Özyeğin University, İstanbul, Turkey 2 / 27 Monte Carlo Integration The big question : Evaluate E p(z) [f(z)]
More informationBayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016
Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2016 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several
More informationPart 1: Expectation Propagation
Chalmers Machine Learning Summer School Approximate message passing and biomedicine Part 1: Expectation Propagation Tom Heskes Machine Learning Group, Institute for Computing and Information Sciences Radboud
More informationNote Set 5: Hidden Markov Models
Note Set 5: Hidden Markov Models Probabilistic Learning: Theory and Algorithms, CS 274A, Winter 2016 1 Hidden Markov Models (HMMs) 1.1 Introduction Consider observed data vectors x t that are d-dimensional
More informationMachine Learning Overview
Machine Learning Overview Sargur N. Srihari University at Buffalo, State University of New York USA 1 Outline 1. What is Machine Learning (ML)? 2. Types of Information Processing Problems Solved 1. Regression
More informationUniversity of Cambridge. MPhil in Computer Speech Text & Internet Technology. Module: Speech Processing II. Lecture 2: Hidden Markov Models I
University of Cambridge MPhil in Computer Speech Text & Internet Technology Module: Speech Processing II Lecture 2: Hidden Markov Models I o o o o o 1 2 3 4 T 1 b 2 () a 12 2 a 3 a 4 5 34 a 23 b () b ()
More informationSTATS 306B: Unsupervised Learning Spring Lecture 5 April 14
STATS 306B: Unsupervised Learning Spring 2014 Lecture 5 April 14 Lecturer: Lester Mackey Scribe: Brian Do and Robin Jia 5.1 Discrete Hidden Markov Models 5.1.1 Recap In the last lecture, we introduced
More informationCourse 495: Advanced Statistical Machine Learning/Pattern Recognition
Course 495: Advanced Statistical Machine Learning/Pattern Recognition Lecturer: Stefanos Zafeiriou Goal (Lectures): To present discrete and continuous valued probabilistic linear dynamical systems (HMMs
More informationApproximate Inference Part 1 of 2
Approximate Inference Part 1 of 2 Tom Minka Microsoft Research, Cambridge, UK Machine Learning Summer School 2009 http://mlg.eng.cam.ac.uk/mlss09/ 1 Bayesian paradigm Consistent use of probability theory
More informationBayesian Networks Inference with Probabilistic Graphical Models
4190.408 2016-Spring Bayesian Networks Inference with Probabilistic Graphical Models Byoung-Tak Zhang intelligence Lab Seoul National University 4190.408 Artificial (2016-Spring) 1 Machine Learning? Learning
More informationStatistical learning. Chapter 20, Sections 1 4 1
Statistical learning Chapter 20, Sections 1 4 Chapter 20, Sections 1 4 1 Outline Bayesian learning Maximum a posteriori and maximum likelihood learning Bayes net learning ML parameter learning with complete
More informationThe Origin of Deep Learning. Lili Mou Jan, 2015
The Origin of Deep Learning Lili Mou Jan, 2015 Acknowledgment Most of the materials come from G. E. Hinton s online course. Outline Introduction Preliminary Boltzmann Machines and RBMs Deep Belief Nets
More informationCOMS 4771 Probabilistic Reasoning via Graphical Models. Nakul Verma
COMS 4771 Probabilistic Reasoning via Graphical Models Nakul Verma Last time Dimensionality Reduction Linear vs non-linear Dimensionality Reduction Principal Component Analysis (PCA) Non-linear methods
More informationDynamic Approaches: The Hidden Markov Model
Dynamic Approaches: The Hidden Markov Model Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Machine Learning: Neural Networks and Advanced Models (AA2) Inference as Message
More informationLecture 9. Time series prediction
Lecture 9 Time series prediction Prediction is about function fitting To predict we need to model There are a bewildering number of models for data we look at some of the major approaches in this lecture
More informationBayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014
Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2014 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several
More informationMACHINE LEARNING 2 UGM,HMMS Lecture 7
LOREM I P S U M Royal Institute of Technology MACHINE LEARNING 2 UGM,HMMS Lecture 7 THIS LECTURE DGM semantics UGM De-noising HMMs Applications (interesting probabilities) DP for generation probability
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationFactor Analysis and Kalman Filtering (11/2/04)
CS281A/Stat241A: Statistical Learning Theory Factor Analysis and Kalman Filtering (11/2/04) Lecturer: Michael I. Jordan Scribes: Byung-Gon Chun and Sunghoon Kim 1 Factor Analysis Factor analysis is used
More informationCSEP 573: Artificial Intelligence
CSEP 573: Artificial Intelligence Hidden Markov Models Luke Zettlemoyer Many slides over the course adapted from either Dan Klein, Stuart Russell, Andrew Moore, Ali Farhadi, or Dan Weld 1 Outline Probabilistic
More informationAn introduction to Sequential Monte Carlo
An introduction to Sequential Monte Carlo Thang Bui Jes Frellsen Department of Engineering University of Cambridge Research and Communication Club 6 February 2014 1 Sequential Monte Carlo (SMC) methods
More informationAdvanced Machine Learning
Advanced Machine Learning Nonparametric Bayesian Models --Learning/Reasoning in Open Possible Worlds Eric Xing Lecture 7, August 4, 2009 Reading: Eric Xing Eric Xing @ CMU, 2006-2009 Clustering Eric Xing
More informationClustering K-means. Clustering images. Machine Learning CSE546 Carlos Guestrin University of Washington. November 4, 2014.
Clustering K-means Machine Learning CSE546 Carlos Guestrin University of Washington November 4, 2014 1 Clustering images Set of Images [Goldberger et al.] 2 1 K-means Randomly initialize k centers µ (0)
More informationApproximate Inference Part 1 of 2
Approximate Inference Part 1 of 2 Tom Minka Microsoft Research, Cambridge, UK Machine Learning Summer School 2009 http://mlg.eng.cam.ac.uk/mlss09/ Bayesian paradigm Consistent use of probability theory
More informationVariational Scoring of Graphical Model Structures
Variational Scoring of Graphical Model Structures Matthew J. Beal Work with Zoubin Ghahramani & Carl Rasmussen, Toronto. 15th September 2003 Overview Bayesian model selection Approximations using Variational
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Brown University CSCI 295-P, Spring 213 Prof. Erik Sudderth Lecture 11: Inference & Learning Overview, Gaussian Graphical Models Some figures courtesy Michael Jordan s draft
More informationClustering using Mixture Models
Clustering using Mixture Models The full posterior of the Gaussian Mixture Model is p(x, Z, µ,, ) =p(x Z, µ, )p(z )p( )p(µ, ) data likelihood (Gaussian) correspondence prob. (Multinomial) mixture prior
More informationHidden Markov Models. Hal Daumé III. Computer Science University of Maryland CS 421: Introduction to Artificial Intelligence 19 Apr 2012
Hidden Markov Models Hal Daumé III Computer Science University of Maryland me@hal3.name CS 421: Introduction to Artificial Intelligence 19 Apr 2012 Many slides courtesy of Dan Klein, Stuart Russell, or
More informationLecture 16 Deep Neural Generative Models
Lecture 16 Deep Neural Generative Models CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago May 22, 2017 Approach so far: We have considered simple models and then constructed
More informationorder is number of previous outputs
Markov Models Lecture : Markov and Hidden Markov Models PSfrag Use past replacements as state. Next output depends on previous output(s): y t = f[y t, y t,...] order is number of previous outputs y t y
More information