MAXIMUM A POSTERIORI ESTIMATION OF COUPLED HIDDEN MARKOV MODELS


Iead Rezek, Michael Gibbs and Stephen J. Roberts
Robotics Research Group, Department of Engineering Science, University of Oxford, UK.
irezek@robots.ox.ac.uk

Abstract. Coupled Hidden Markov Models (CHMMs) are a tool which models interactions between variables in state space rather than in observation space. They may therefore reveal coupling in cases where classical tools, such as correlation, fail. In this paper we derive the maximum a posteriori equations for the Expectation Maximization algorithm. The use of the models is demonstrated on simulated data, as well as in biomedical signal analysis.

INTRODUCTION

Analysis of complex systems (such as physiological systems) frequently involves studying the interactions between them. Indeed, the interactions themselves can give us clues, for example, about the state of health of a patient. For instance, interactions between cardio-respiratory variables are known to be abnormal in autonomic neuropathy [1]. Conversely, if the type of interaction is known, a departure from this state is, by definition, an anomaly. For instance, in multichannel EEG analysis we would expect coupling between the channels; if the states of the channels are divergent one may suspect some recording error (e.g. electrode failure).

Traditionally, such interactions would be measured using correlation or, more generally, mutual information measures [2]. There are, however, major limitations associated with correlation, namely its linearity and stationarity assumptions. Mutual information, on the other hand, can handle non-linearity; however, it is unable to infer anti-correlation and it also assumes stationarity (for the density estimation). Recent work has focused on Hidden Markov Models (HMMs) in order to avoid some of the stationarity assumptions and to model the dynamics explicitly [3]. HMMs capture the dynamics of a time series through a discrete state space, the dynamics being encoded in the state transition probabilities from one time step to the next. HMMs, however, typically have a single observation model and a single observation sequence and thus, by default, cannot handle the composite modelling of multiple time series. This can be overcome through the use of multivariate observation models, such as multivariate autoregressive models [4]. The latter implies that interactions are modelled in observation space and not in state space. This is unsatisfactory in some biomedical applications where the signals have completely different forms and thus spectra/bandwidths, e.g. heart rate and respiration. Another approach models the interactions in state space by coupling the state spaces of each variate [5]. This allows interaction analysis to proceed on arbitrary multi-channel sequences.

Figure 1: Hidden Markov Model Topologies. (a) Standard (single-chain) HMM; (b) coupled HMM with lag 1; (c) coupled HMM with lag 2.

HMMS: MARKOV CONDITION AND COUPLING

Classical HMMs

A standard HMM is graphically represented as an unfolded mixture model whose states at time $t$ are conditioned on those at time $t-1$ (see Figure 1[a]), also known as the first-order Markov property. Estimation is often performed in the maximum likelihood framework [6, 7], where the hidden states are iteratively re-estimated by their expected values and, using these, the model parameters are re-estimated. HMMs may be extended by changing the dimensions of the observation space and the state space, or by changing their distributions. In this paper, however, we are interested in systems with different topologies and, in particular, in extensions to more than one state variable.

Coupled HMMs

One possible extension to multi-dimensional systems is the coupled hidden Markov model (CHMM), as shown in Figure 1[b]. Here the current state of a chain depends on the state of its own chain and on that of the neighbouring chain at the previous time step. These models have been used in computer vision [5] and digital communication [8].
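To make the coupling concrete, the following sketch samples from a lag-1 CHMM with two chains and Gaussian observations, in the spirit of Figure 1[b]. The code is illustrative only (the paper gives no implementation); the state-space sizes, parameter values and variable names are assumptions, and NumPy is used merely for convenience.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes; the paper does not specify these values.
KX, KY, T = 2, 2, 200

# Transition tensors: P_x[i, j, k] = p(X_t = k | X_{t-1} = i, Y_{t-1} = j)
#                     P_y[j, i, l] = p(Y_t = l | Y_{t-1} = j, X_{t-1} = i)
P_x = rng.dirichlet(np.ones(KX), size=(KX, KY))
P_y = rng.dirichlet(np.ones(KY), size=(KY, KX))
pi_x = rng.dirichlet(np.ones(KX))   # initial state probabilities, chain X
pi_y = rng.dirichlet(np.ones(KY))   # initial state probabilities, chain Y

# One-dimensional Gaussian observation models (one mean per state, per chain).
mu_x, sd_x = np.array([-1.0, 1.0]), 0.5
mu_y, sd_y = np.array([-2.0, 2.0]), 0.5

x = np.empty(T, dtype=int)
y = np.empty(T, dtype=int)
x[0] = rng.choice(KX, p=pi_x)
y[0] = rng.choice(KY, p=pi_y)
for t in range(1, T):
    # Coupling: each chain conditions on both chains' states at the previous step.
    x[t] = rng.choice(KX, p=P_x[x[t - 1], y[t - 1]])
    y[t] = rng.choice(KY, p=P_y[y[t - 1], x[t - 1]])

a = rng.normal(mu_x[x], sd_x)   # observations of chain X
b = rng.normal(mu_y[y], sd_y)   # observations of chain Y
```

Each chain's next state is drawn conditionally on both chains' previous states; this is precisely the coupling that a single-chain HMM cannot express without moving the interaction into the observation model.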

Several points are of note. The model, though still a DAG (directed acyclic graph), contains loops, and these loops constitute the essential difficulty in estimating the parameters of a CHMM. One approach to this problem is clustering [9]: all nodes of a loop are merged to form a single higher-dimensional node. Though this may be a feasible approach for the simple model shown in Figure 1[b] (the two state variables at each time step would be merged), it becomes infeasible once the lag-link (the link from the neighbouring chain) is moved further forward in time. For instance, the DAG shown in Figure 1[c] requires merging state variables across several time steps and thus causes an immediate explosion of node dimensionality. Another approach, that of cut-sets [9], is also impractical for dynamic models, since the number of cut-set variables increases with the chain length. To illustrate this briefly, the same CHMM is redrawn in Figure 2 as a nested sequence of loops: to cut open each loop, the state values along it must be clamped, thus increasing the number of cut-set variables linearly with the length of the chain. Finally, there are two chains with possibly different observation models. Unless the observation models are identical, the chains cannot simply be transformed into a single HMM by forming a Cartesian product of the observation spaces.

Figure 2: Illustration of the nested loops in a CHMM.

In this paper we consider CHMMs with lag 1 and, to keep close to the original model, employ node merging. In the following sections we derive the full forward-backward equations and the update equations for the maximum a posteriori estimators of the model parameters.

MAXIMUM A POSTERIORI PARAMETER ESTIMATION

Notation

Variables and probability densities: $A = \{A_1, \dots, A_T\}$ denotes the observation variables of the first chain and $B = \{B_1, \dots, B_T\}$ those of the second chain; $a_t$ and $b_t$ are the actual observed (given) values at time step $t$. $X = \{X_1, \dots, X_T\}$ and $Y = \{Y_1, \dots, Y_T\}$ are the hidden state variables of the first and second chain, respectively. At each time step the state variable $X_t$ of the first chain can take one of $K_X$ discrete values and the state variable $Y_t$ of the second chain one of $K_Y$ discrete values; $K_X$ and $K_Y$ are the state-space dimensions. $p(X_t \mid X_{t-1}, Y_{t-1})$ denotes the state transition probability of the first chain, i.e. the probability of $X_t$ conditioned on $X_{t-1}$ and $Y_{t-1}$ at the previous time step; likewise, $p(Y_t \mid Y_{t-1}, X_{t-1})$ is the second chain's transition probability. $p(A_t \mid X_t)$ and $p(B_t \mid Y_t)$ are the probabilities of the observations conditioned on the respective states, and $p(X_1)$ and $p(Y_1)$ are the initial state probabilities of the first and second chain.

Model parameters: The parameter vector $\theta$ contains the parameters of the transition probabilities, the initial state probabilities, and the parameters of the observation models. The transition probability of the first chain follows a Multinomial density, with one parameter vector for each possible joint state of the conditioning variables $X_{t-1}$ and $Y_{t-1}$; likewise, the transition probability of the second chain follows a Multinomial density with one parameter vector per joint parent state. The initial states of the two chains follow Multinomial densities with parameters $\pi^X$ and $\pi^Y$. The observation densities $p(A_t \mid X_t)$ and $p(B_t \mid Y_t)$ are assumed to be multivariate Gaussian, with one mean vector and one covariance matrix per state for each channel.

Parameter priors: We choose conjugate priors for each model parameter: Gaussian priors for the observation means, Wishart priors for the observation precisions, Dirichlet priors for the transition probabilities of each chain, and Dirichlet priors for the initial state probabilities. All hyper-priors were chosen such that the priors were vague or flat [10].

The Model

In the maximum a posteriori (MAP) form of the Expectation Maximization algorithm we aim to maximize the function [11] $\log p(A, B \mid \theta) + \log p(\theta)$, where $p(A, B \mid \theta)$ and $p(\theta)$ are probability densities. The likelihood function encodes the model and, in the case of a CHMM with lag 1, is given as

$p(A, B, X, Y \mid \theta) = p(X_1)\, p(Y_1)\, p(A_1 \mid X_1)\, p(B_1 \mid Y_1) \prod_{t=2}^{T} p(X_t \mid X_{t-1}, Y_{t-1})\, p(Y_t \mid Y_{t-1}, X_{t-1})\, p(A_t \mid X_t)\, p(B_t \mid Y_t).$   (1)

The EM algorithm then iterates two steps:

E-Step: The expectation step, or E-step, estimates the hidden variables such that (1) is maximized. This is done by filling in the missing values with their expectations conditioned on the data and the model parameters of the preceding iteration. For HMMs, this amounts to computing the forward and backward recursions as shown in [7].

M-Step: The maximization step, or M-step, maximizes (1) with respect to $\theta$, conditioned on the expected values of the hidden states. The maximum likelihood solution for CHMMs is identical to that for the standard HMM case [4, 12]. Here, however, we derive the maximum a posteriori solution for CHMMs.
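To fix ideas, the container below collects the CHMM parameters and the vague conjugate hyperparameters described above, and provides a random normalized starting point for the EM iterations. The layout, names and numeric hyperparameter values are editorial assumptions for illustration, not the paper's.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class CHMMParams:
    """Parameters of a lag-1 CHMM with Gaussian observations (illustrative layout)."""
    P_x: np.ndarray    # (KX, KY, KX): p(X_t = k | X_{t-1} = i, Y_{t-1} = j)
    P_y: np.ndarray    # (KY, KX, KY): p(Y_t = l | Y_{t-1} = j, X_{t-1} = i)
    pi_x: np.ndarray   # (KX,): initial state probabilities, first chain
    pi_y: np.ndarray   # (KY,): initial state probabilities, second chain
    mu_x: np.ndarray   # (KX, D): observation means, first chain
    mu_y: np.ndarray   # (KY, D): observation means, second chain
    cov_x: np.ndarray  # (KX, D, D): observation covariances, first chain
    cov_y: np.ndarray  # (KY, D, D): observation covariances, second chain

@dataclass
class CHMMPriors:
    """Vague conjugate hyperparameters (numeric values are assumptions, not the paper's)."""
    dir_trans: float = 1.0       # Dirichlet pseudo-counts on every transition row
    dir_init: float = 1.0        # Dirichlet pseudo-counts on the initial state probabilities
    mean_prec: float = 1e-3      # precision of the Gaussian prior on each observation mean
    wishart_scale: float = 1e-3  # scale of the Wishart prior acting on each precision matrix

def init_params(KX: int, KY: int, D: int, seed: int = 0) -> CHMMParams:
    """Random, properly normalized starting point for the EM iterations."""
    rng = np.random.default_rng(seed)
    return CHMMParams(
        P_x=rng.dirichlet(np.ones(KX), size=(KX, KY)),
        P_y=rng.dirichlet(np.ones(KY), size=(KY, KX)),
        pi_x=rng.dirichlet(np.ones(KX)),
        pi_y=rng.dirichlet(np.ones(KY)),
        mu_x=rng.normal(size=(KX, D)),
        mu_y=rng.normal(size=(KY, D)),
        cov_x=np.tile(np.eye(D), (KX, 1, 1)),
        cov_y=np.tile(np.eye(D), (KY, 1, 1)),
    )
```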

The Expectation Step

Clearly, for CHMMs, the non-standard part is the belief propagation in the E-step, i.e. computing the expected values of the hidden variables given the data and the parameters. For clarity, we first demonstrate the derivation of the forward-backward recursions for a general CHMM, i.e. not necessarily one with a discrete state space; the discrete case is recovered simply by replacing the integrals with sums. For standard HMMs we have the forward variable $\alpha_t(x_t) = p(o_{1:t}, x_t)$ and, by conditional independence,

$\alpha_{t+1}(x_{t+1}) = p(o_{t+1} \mid x_{t+1}) \int p(x_{t+1} \mid x_t)\, \alpha_t(x_t)\, \mathrm{d}x_t.$

For CHMMs, we define one such variable for each chain, one over $X_t$ and one over $Y_t$. At $t = 1$ these follow directly from the initial state densities and the first observations. At later time steps, $X_{t+1}$ is independent of the earlier states and observations given $X_t$ and $Y_t$, so its prediction involves only the variables of the previous time slice. Repeating the steps for the second chain and for arbitrary $t$ gives the forward recursions (2) and (3). Thus, there are two un-coupled prediction equations and no node clustering is, as yet, required.

The backward recursions for standard HMMs are defined as $\beta_t(x_t) = p(o_{t+1:T} \mid x_t)$, which can be solved inductively: $\beta_T(x_T) = 1$ and, for any $t < T$,

$\beta_t(x_t) = \int p(x_{t+1} \mid x_t)\, p(o_{t+1} \mid x_{t+1})\, \beta_{t+1}(x_{t+1})\, \mathrm{d}x_{t+1}.$

The loop in the CHMM prevents a split of the backward recursions into two uncoupled recursions. We are thus forced to perform a clustering operation on all the nodes in a loop, namely the pair $(X_t, Y_t)$ at each time step. Because the backward recursions cannot be decoupled, we are left with no choice but to merge the chains' state variables, which simply leads to the joint forward and backward recursions over the merged state $(X_t, Y_t)$:

$\alpha_t(x, y) = p(a_{1:t}, b_{1:t}, X_t = x, Y_t = y),$   (4)

$\alpha_{t+1}(x', y') = p(a_{t+1} \mid x')\, p(b_{t+1} \mid y') \sum_{x, y} p(x' \mid x, y)\, p(y' \mid y, x)\, \alpha_t(x, y),$   (5)

$\beta_t(x, y) = \sum_{x', y'} p(x' \mid x, y)\, p(y' \mid y, x)\, p(a_{t+1} \mid x')\, p(b_{t+1} \mid y')\, \beta_{t+1}(x', y'), \quad \beta_T(x, y) = 1.$   (6)

This also demonstrates that it quickly becomes infeasible to explore different coupling topologies, as they will increase the dimensionality of the clustered loop exponentially.
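A minimal sketch of the joint (merged-node) forward-backward recursions (4)-(6) is given below, including the smoothed one- and two-slice posteriors needed later for the M-step. The variable names, the precomputation of the observation likelihoods, and the scaling scheme used for numerical stability are implementation choices assumed here, not taken from the paper.

```python
import numpy as np

def forward_backward(P_x, P_y, pi_x, pi_y, like_a, like_b):
    """Joint forward-backward over the merged state (X_t, Y_t) of a lag-1 CHMM.

    P_x[i, j, k] = p(X_t=k | X_{t-1}=i, Y_{t-1}=j); P_y[j, i, l] = p(Y_t=l | Y_{t-1}=j, X_{t-1}=i).
    like_a[t, i] = p(a_t | X_t=i) and like_b[t, j] = p(b_t | Y_t=j) are precomputed likelihoods.
    Returns gamma[t, i, j] = p(X_t=i, Y_t=j | data),
            xi[t, i, j, k, l] = p(X_t=i, Y_t=j, X_{t+1}=k, Y_{t+1}=l | data),
            and the data log-likelihood as a by-product.
    """
    T, KX = like_a.shape
    KY = like_b.shape[1]
    # Joint transition over merged states: M[i, j, k, l] = p(X_{t+1}=k | i, j) p(Y_{t+1}=l | j, i).
    M = P_x[:, :, :, None] * P_y.transpose(1, 0, 2)[:, :, None, :]

    alpha = np.zeros((T, KX, KY))
    scale = np.zeros(T)
    alpha[0] = np.outer(pi_x * like_a[0], pi_y * like_b[0])
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        pred = np.einsum('ij,ijkl->kl', alpha[t - 1], M)   # prediction step
        alpha[t] = pred * np.outer(like_a[t], like_b[t])   # correction step
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]

    beta = np.zeros((T, KX, KY))
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        obs = np.outer(like_a[t + 1], like_b[t + 1])
        beta[t] = np.einsum('ijkl,kl->ij', M, obs * beta[t + 1]) / scale[t + 1]

    gamma = alpha * beta
    gamma /= gamma.sum(axis=(1, 2), keepdims=True)         # renormalize for numerical safety

    xi = np.zeros((T - 1, KX, KY, KX, KY))
    for t in range(T - 1):
        obs = np.outer(like_a[t + 1], like_b[t + 1])
        xi[t] = alpha[t][:, :, None, None] * M * (obs * beta[t + 1])[None, None, :, :]
        xi[t] /= xi[t].sum()
    return gamma, xi, np.log(scale).sum()
```

The per-step scaling plays the role of the usual HMM normalization and yields the log-likelihood of the data for free; the merged state space has size $K_X K_Y$, which is why more elaborate coupling topologies quickly become intractable.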

Finally, for the M-step we require, for each chain, the smoothed single-slice posterior of its state and the smoothed joint posterior of its state together with both parent states at the previous time step (7)-(10); all of these follow directly from the joint forward and backward variables.

The Maximization Step

Having computed the expectations of the hidden states, we can proceed with the estimation of the model parameters. Expanding the first term in equation (1) under these posteriors, we obtain a Q-function (11) consisting of four sets of terms: in the first line, the terms for the initial state probabilities of each channel; in the second line, the terms for the state transition probabilities of the X-channel; in the third, the terms for the state transition probabilities of the Y-channel; and finally, in the fourth line, the terms for the observation models of each channel with their Gaussian parameters (12).

The update formulas are computed by maximizing the Q-functions above with respect to the parameters.

The means. To derive the update rules for the means of the first chain, the required Q-function is the expected sum of the observation terms, which is just the expectation of the logarithm of a multivariate Normal density, plus the log-prior on the means. Maximizing with respect to the mean vectors gives the update equation for the means of the first chain (13); likewise, maximizing the corresponding Q-function for the second chain gives the update rule for its means (14). In both cases the update is a posterior-weighted average of the observations, shrunk towards the prior mean.

The covariance matrices. The Q-functions to maximize for the two chains are again those of the observation terms, now maximized with respect to the covariance (equivalently, precision) matrices, where the prior is a multivariate Wishart density over each precision matrix. Maximization gives the updates for the covariance matrices (15) and (16): posterior-weighted scatter matrices around the updated means, regularized by the Wishart hyperparameters.

The initial state probabilities. The relevant Q-function for the initial state probabilities consists of the expected log initial-state terms plus a Dirichlet log-prior, under the constraint that the probabilities sum to one. Maximizing gives the update equations for the initial state probabilities (17)-(19): the smoothed posterior of the first time slice plus the Dirichlet pseudo-counts, renormalized.

The state transition probabilities. Finally, we compute the update equations for the state transition probabilities. The relevant Q-function consists of the expected log transition terms plus a Dirichlet log-prior, again with a sum-to-one constraint on every row of the transition table. Maximizing leads to the update equations (20) and (21): for each possible parent state, the expected transition counts plus the Dirichlet pseudo-counts, renormalized.
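The sketch below implements MAP M-step updates of the kind just described, using the posteriors gamma and xi from the forward-backward sketch. The hyperparameter treatment is a deliberate simplification (plain Dirichlet pseudo-counts and a ridge-style covariance regularizer standing in for the full Normal-Wishart algebra), so it should not be read as the paper's equations (13)-(21).

```python
import numpy as np

def m_step_map(gamma, xi, obs_a, obs_b, dir_trans=1.0, dir_init=1.0, cov_reg=1e-3):
    """Approximate MAP M-step for a lag-1 CHMM with Gaussian observations.

    gamma[t, i, j] and xi[t, i, j, k, l] are the smoothed posteriors from the E-step;
    obs_a[t, :] and obs_b[t, :] are the observations of the two chains.
    """
    T, KX, KY = gamma.shape
    g_x = gamma.sum(axis=2)                      # (T, KX) marginal posterior, chain X
    g_y = gamma.sum(axis=1)                      # (T, KY) marginal posterior, chain Y

    # Initial state probabilities: first-slice posterior plus pseudo-counts, renormalized.
    pi_x = g_x[0] + dir_init
    pi_y = g_y[0] + dir_init
    pi_x /= pi_x.sum()
    pi_y /= pi_y.sum()

    # Transition probabilities: expected counts plus pseudo-counts, per parent state (i, j).
    counts = xi.sum(axis=0)                      # (KX, KY, KX, KY)
    P_x = counts.sum(axis=3) + dir_trans         # counts of (i, j) -> k, any l
    P_y = counts.sum(axis=2) + dir_trans         # counts of (i, j) -> l, any k
    P_x /= P_x.sum(axis=2, keepdims=True)
    P_y = (P_y / P_y.sum(axis=2, keepdims=True)).transpose(1, 0, 2)  # reorder to [j, i, l]

    # Gaussian observation models: posterior-weighted means and covariances.
    def gaussian_updates(g, obs):
        w = g / (g.sum(axis=0) + 1e-12)          # (T, K) normalized responsibilities
        mu = w.T @ obs                           # (K, D) weighted means
        D = obs.shape[1]
        cov = np.empty((g.shape[1], D, D))
        for k in range(g.shape[1]):
            diff = obs - mu[k]
            cov[k] = (w[:, k, None] * diff).T @ diff + cov_reg * np.eye(D)
        return mu, cov

    mu_x, cov_x = gaussian_updates(g_x, obs_a)
    mu_y, cov_y = gaussian_updates(g_y, obs_b)
    return P_x, P_y, pi_x, pi_y, mu_x, cov_x, mu_y, cov_y
```

Alternating this step with the forward-backward E-step gives the MAP-EM procedure whose behaviour is examined in the experiments below.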

EXPERIMENTS

Synthetic Data

Synthetic data was generated by drawing samples from a CHMM. Figure 3 shows the input state sequence together with the estimated joint and marginal state sequences. The lower two traces in the figure depict the individual state sequences, estimated by marginalising the joint transition probabilities and feeding these into a standard single-chain HMM.

Figure 3: CHMM state sequences (from top to bottom trace): input, estimated joint, and marginal Viterbi paths.
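One way the joint Viterbi path above might be computed is sketched below, running the standard log-domain Viterbi recursion on the merged state space; a collapse of the joint model onto a single chain, as used for the marginal paths, is sketched after the movement-planning experiment. All names are illustrative and the construction is an assumption, since the paper does not give implementation detail.

```python
import numpy as np

def joint_viterbi(P_x, P_y, pi_x, pi_y, like_a, like_b):
    """Most probable joint state sequence (x_t, y_t) of a lag-1 CHMM (log-domain Viterbi)."""
    T, KX = like_a.shape
    KY = like_b.shape[1]
    # Joint log-transition over merged states and joint log-likelihoods per time step.
    logM = np.log(P_x[:, :, :, None] * P_y.transpose(1, 0, 2)[:, :, None, :]).reshape(KX * KY, KX * KY)
    logL = (np.log(like_a)[:, :, None] + np.log(like_b)[:, None, :]).reshape(T, KX * KY)

    delta = np.log(np.outer(pi_x, pi_y)).ravel() + logL[0]
    back = np.zeros((T, KX * KY), dtype=int)
    for t in range(1, T):
        cand = delta[:, None] + logM             # rows: previous merged state, cols: current
        back[t] = cand.argmax(axis=0)
        delta = cand.max(axis=0) + logL[t]

    path = np.empty(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return np.unravel_index(path, (KX, KY))      # per-chain state sequences
```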

Cheyne Stokes Data

We applied the CHMM to features extracted from a section of Cheyne-Stokes data (a breathing disorder in which bursts of fast respiration are interspersed with periods of breathing absence), which consisted of one EEG recording and a simultaneous respiration recording. The feature, the fractional spectral radius (FSR) measure [13], was computed from consecutive non-overlapping windows of two seconds length. The respiration FSR then formed one channel and the EEG FSR the second channel. The data has clearly visible states which can be used to verify the results of the estimated model. Figure 4 shows the FSR traces superimposed with the joint Viterbi path sequence. The number of states of each CHMM chain was set to 2. Essentially, of the four joint states, the CHMM splits the sequence such that two states correspond to respiration and EEG being in phase (both on, 1/1, or both off, 0/0) and two correspond to the transition states, i.e. respiration on with EEG off and vice versa (1/0 and 0/1).

Figure 4: CHMM modelling of Cheyne-Stokes data (joint Viterbi path with the two observed sequences): there are two states corresponding to EEG and respiration being both on or both off (0/0 and 1/1) and two states corresponding to EEG being on/off while respiration is off/on (i.e. transition states).

Movement Planning Experiment Data

The aim of this experiment was to produce a model that predicts the onset of movement in humans by examining changes in electroencephalography (EEG) signals measured above the motor cortex. HMMs have been successfully used to predict finger movement using EEG signals [14, 15]. Using a separate source of information, here the electromyogram (EMG), we attempted to improve the prediction quality of the model. EEG recordings were taken from 5 subjects, 3 cm behind the C3 and C4 electrode positions (according to the 10-20 system), which will be referred to here as C3 and C4. The subjects were asked to move their left or right hand when given cues. EMG data was also recorded from the arms to mark the onset of movement. Both the EEG and EMG signals were sampled at 125 Hz. Reflection coefficients [16] of an autoregressive model were then calculated from the difference between the EEG signals at C3 and C4, in addition to the reflection coefficients computed from the EMG signals. The reflection coefficients were computed using short sliding windows shifted by a fixed step. The final signal for training was composed of 6-second-long sections centred around the actual movement.

We trained a CHMM and a standard HMM, both with Gaussian observation models, for comparison. The CHMM was trained using EEG and EMG data. For testing, however, the model's EMG channel was marginalised out, thus transforming the CHMM into a standard HMM which is then tested on EEG alone, i.e. training was performed on all available data while prediction was done on EEG alone. Figure 5 (top trace) shows the Viterbi path obtained by testing the marginalised CHMM on EEG data alone. The bottom trace shows the Viterbi path of the marginalised CHMM-EMG channel, which indicates the exact timing of the movements.

Figure 5: Viterbi paths of the hand movement data. The top trace is computed from the marginalised CHMM using only EEG data; the lower trace indicates the timing of the hand movements and is from the marginalised CHMM using only EMG data.

If no information transfer between the channels existed, the CHMM would degenerate into a product of two single-chain HMMs. Inspection of the transition probabilities and the observation models reveals that information from the EMG channel is used to change both the EEG observation model and the inter-chain transition probabilities. The use of the EMG during training focuses the model on the actual movement-planning processes in the brain. A single HMM, without further information from other sources, will inevitably latch onto many other brain processes, which leads to reduced prediction quality.
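One way to realise the "marginalising out" of a chain described above is sketched here. The paper does not spell out the exact construction, so the collapse below (averaging the coupled transition of the retained chain over an estimate of the other chain's stationary distribution, and keeping the retained chain's observation model) is an assumption, not the authors' procedure.

```python
import numpy as np

def collapse_to_hmm(P_x, P_y, pi_x, pi_y, n_iter=200):
    """Collapse a trained lag-1 CHMM onto its first chain (e.g. the EEG chain).

    Returns a single-chain transition matrix p(X_t | X_{t-1}) and the initial
    distribution; the first chain's observation model is reused unchanged.
    """
    KX, KY = P_x.shape[0], P_y.shape[0]
    # Estimate the stationary joint distribution of (X, Y) by iterating the joint transition.
    M = (P_x[:, :, :, None] * P_y.transpose(1, 0, 2)[:, :, None, :]).reshape(KX * KY, KX * KY)
    p = np.full(KX * KY, 1.0 / (KX * KY))
    for _ in range(n_iter):
        p = p @ M
    p_y_given_x = p.reshape(KX, KY)
    p_y_given_x /= p_y_given_x.sum(axis=1, keepdims=True)   # p(Y | X) under the stationary law

    # Marginal transition for chain X: average the coupling over the co-occurring Y states.
    A_x = np.einsum('ij,ijk->ik', p_y_given_x, P_x)
    A_x /= A_x.sum(axis=1, keepdims=True)
    return A_x, pi_x
```

The resulting single-chain model can then be decoded with a standard HMM Viterbi pass on the EEG features alone, which is how the top trace of Figure 5 is described as being produced.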

Sleep EEG Data

In a final example, we fitted a Bayesian autoregressive (AR) model [16] to a full-night EEG recording. The reflection coefficients were extracted from two channels, namely C3 and C4, using non-overlapping sliding windows of fixed length. The coefficients then formed the observation sequences for a CHMM. The estimation was performed in a completely unsupervised manner. For each of the two chains the number of hidden states was set to 3, corresponding to three sleep stages [17]: NREM, REM and Wake, a decision based on prior knowledge of the structure in the data. The input data were labelled by a human expert as deep sleep (sleep stage 4, S4), REM and Wake. Figure 6[a] depicts the hypnogram (human labels) together with the estimated state sequence, and Figure 6[b] shows the co-occurrences of the Viterbi labels with those of the hypnogram. Roughly speaking, of the states with the longest duration, one joint state corresponds to Wake, one to REM sleep, and two further states correspond to most non-REM sleep. Generally, a CHMM estimated using the maximum a posteriori procedure does make good sense of the data, but the correspondence between the human labels and the CHMM labels is not unique.

Figure 6: Comparison between estimated labels and hypnogram labels for sleep EEG. (a) CHMM state sequence (3-point median filtered joint Viterbi path) and hypnogram (human expert scoring); (b) co-occurrences between the Viterbi labels and the hypnogram labels.

CONCLUSIONS & FUTURE WORK

The maximum a posteriori EM procedure is very fast and stable, indeed much more so than a simple maximum likelihood estimator (for stability) [18] or sampling-based methods (for speed) [19]. Moreover, the smoothness and stability of the state sequence is much higher than that obtained using single-chain HMMs [20]. The results presented here are only qualitative in that we have not assessed, for example, the classification performance quantitatively. This is because the paper presents a step towards a fully Bayesian estimation of CHMM parameters using variational methods. These will then also permit the estimation of the true model, in particular the hidden state-space size and the lag. The CHMMs themselves may be expanded to more than just two coupled chains and may have continuous hidden state spaces or other observation models. One other possible direction is that of trimming unnecessary transition probabilities with the use of entropic priors [21]. More important for us, however, is the question of lag estimation. Future work will focus on model-lag selection, for which we envisage ensemble learning methods [22].

REFERENCES

[1] R. Bannister and C.J. Mathias. Autonomic Failure: A Textbook of Clinical Disorders of the Autonomic Nervous System. Oxford University Press, Oxford, 3rd edition.
[2] R.K. Kushwaha, W.J. Williams, and H. Shevrin. An Information Flow Technique for Category Related Evoked Potentials. IEEE Transactions on Biomedical Engineering, 39(2).
[3] M. Azzouzi and I.T. Nabney. Time Delay Estimation with Hidden Markov Models. In Ninth International Conference on Artificial Neural Networks. IEE, London.
[4] W.D. Penny and S.J. Roberts. Gaussian Observation Hidden Markov Models for EEG Analysis. Technical Report TR-98-12, Imperial College London.
[5] M. Brand. Coupled Hidden Markov Models for Modeling Interacting Processes. Technical Report 405, MIT Media Lab Perceptual Computing, June.
[6] S. Roweis and Z. Ghahramani. A Unifying Review of Linear Gaussian Models. Computation and Neural Systems, California Institute of Technology.
[7] L.R. Rabiner. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, 77(2):257-286, 1989.


[8] B.J. Frey. Graphical Models for Machine Learning and Digital Communication. MIT Press.
[9] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo, 2nd edition.
[10] P.M. Lee. Bayesian Statistics: An Introduction. Edward Arnold.
[11] G.J. McLachlan and T. Krishnan. The EM Algorithm and Extensions. John Wiley & Sons, New York, NY.
[12] S. Roweis and Z. Ghahramani. A Unifying Review of Linear Gaussian Models. Neural Computation, 11(2).
[13] I. Rezek and S.J. Roberts. Stochastic Complexity Measures for Physiological Signal Analysis. IEEE Transactions on Biomedical Engineering, 44(9).
[14] J.R. Wolpaw, N. Birbaumer, W.J. Heetderks, D.J. McFarland, P. Hunter Peckham, G. Schalk, E. Donchin, L.A. Quatrano, C.J. Robinson, and T.M. Vaughan. Brain-Computer Interface Technology: A Review of the First International Meeting. IEEE Transactions on Rehabilitation Engineering, 8(2).
[15] W.D. Penny, S.J. Roberts, E. Curran, and M. Stokes. EEG-Based Communication: A Pattern Recognition Approach. IEEE Transactions on Rehabilitation Engineering, 8(2).
[16] P. Sykacek, S. Roberts, A. Flexer, and G. Dorffner. A Probabilistic Approach to High Resolution Sleep Analysis. In G. Dorffner, editor, Proceedings of the International Conference on Artificial Neural Networks (ICANN), Springer Verlag, Wien/Berlin/New York, to appear.
[17] S.J. Roberts. Analysis of the Human Sleep Electroencephalogram using a Self-Organising Neural Network. PhD thesis, University of Oxford.
[18] I. Rezek and S.J. Roberts. Estimation of Coupled Hidden Markov Models with Application to Biosignal Interaction Modelling. In NNSP International Workshop on Neural Networks for Signal Processing.
[19] I. Rezek, P. Sykacek, and S.J. Roberts. Learning Interaction Dynamics with Coupled Hidden Markov Models. IEE Proceedings - Science, Measurement and Technology, 147(6), November.
[20] A. Flexer, P. Sykacek, I. Rezek, and G. Dorffner. Using Hidden Markov Models to Build an Automatic, Continuous and Probabilistic Sleep Stager for the SIESTA Project. In European Medical and Biological Engineering Conference.
[21] M. Brand. An Entropic Estimator for Structure Discovery. In Neural Information Processing Systems, NIPS 11.
[22] T.S. Jaakkola. Tutorial on Variational Approximation Methods.

Acknowledgements

The authors would like to thank Will Penny and Peter Sykacek for very helpful discussions and software. I.R. and M.G. are supported by the UK EPSRC, and M.G. is also supported by BBC Research and Development, to whom we are most grateful.
