Multiplex network inference
1 Multiplex network inference (using hidden Markov models) University of Cambridge Bioinformatics Group Meeting, 11 February 2016
2 Words of warning Disclaimer: these slides have been produced by combining and translating two of my previous slide decks: a five-minute presentation given during my internship at Jane Street, as part of the short expositions by interns series; and a 1.5-hour lecture given during the Nedelja informatike seminar at my high school (for students gifted in informatics). An implication of this is that it might feel like more of a high-level pitch than a low-level specification of the models involved; I am more than happy to discuss the internals during or after the presentation! (You may also consult the paper distributed by Thomas before the talk.) Footnote: if anyone would like to visit Belgrade and hold a lecture at this seminar, that'd be really incredible :)
3 Motivation Why multiplex networks? This talk will be an introduction to one of the most popular types of complex networks, as well as its applications to machine learning. Multiplex networks were the central topic of my Bachelor's thesis; this resulted in a journal publication (Journal of Complex Networks, Oxford). These networks hold significant potential for modelling many real-world systems. My work represents (to the best of my knowledge) the first attempt at developing a machine learning algorithm over these networks, with highly satisfactory results!
4 Motivation Roadmap 1. We will start off with a few slides that (in)formally define multiplex networks, along with some motivating examples. 2. Then our attention turns to hidden Markov models (HMMs); in particular, to taking advantage of the standard algorithms to tackle a machine learning problem. 3. Finally, we will show how the two concepts can be integrated (i.e. how I have integrated them...).
5 Theoretical introduction Let's start with graphs! Imagine that, within a system containing four nodes, you have concluded that certain pairs of nodes are connected in a certain manner... You've got your usual, boring graph; in this context often called a monoplex network.
6 Theoretical introduction Some more graphs You now notice that, in other frames of reference (examples to come soon), these nodes may be connected in different ways
7 Theoretical introduction Influence Finally, you conclude that these layers of interaction are not independent, but may interact with each other in nontrivial ways (thus forming a "network of networks"). Multiplex networks provide us with a relatively simple way of representing these interactions, by adding new interlayer edges between a node's images in different layers. Revisiting the previous example...
8 Theoretical introduction Previous example
9 Theoretical introduction Previous example [figure: the node-layer pairs $(v, G_l)$ for nodes $v \in \{0, 1, 2, 3\}$ across layers $G_1, G_2, G_3$]
10 Theoretical introduction We have a multiplex network!
11–14 Review of applications Examples Despite the simplicity of this model, a wide variety of real-world systems exhibit natural multiplexity. Examples include: transportation networks (De Domenico et al.); genetic networks (De Domenico et al.); social networks (Granell et al.).
15 Theoretical introduction Markov chains Let S be a discrete set of states, and $\{X_n\}_{n \geq 0}$ a sequence of random variables taking values from S. This sequence satisfies the Markov property if it is memoryless: if the next value in the sequence depends only on the current value; i.e. for all $n \geq 0$:

$$P(X_{n+1} = x_{n+1} \mid X_n = x_n, \ldots, X_0 = x_0) = P(X_{n+1} = x_{n+1} \mid X_n = x_n)$$

It is then called a Markov chain. $X_t = x_t$ signifies that the chain is in state $x_t$ ($\in S$) at time t.
16 Theoretical introduction Time homogeneity A common assumption is that a Markov chain is time-homogeneous: that the transition probabilities do not change with time; i.e. for all $n \geq 0$:

$$P(X_{n+1} = b \mid X_n = a) = P(X_1 = b \mid X_0 = a)$$

Time homogeneity allows us to represent Markov chains with a finite state set S using only a single matrix, T: $T_{ij} = P(X_1 = j \mid X_0 = i)$ for all $i, j \in S$. It holds that $\sum_{x \in S} T_{ix} = 1$ for all $i \in S$. It is also useful to define a start-state probability vector $\pi$, s.t. $\pi_x = P(X_0 = x)$.
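Since the transcript preserves the example chain only as a figure, a minimal simulation sketch may help make $\pi$ and T concrete. This is illustrative C++ (not the talk's implementation); the 3-state chain and all probabilities below are invented stand-ins, as the transcript does not preserve the original matrix.

```cpp
#include <cstdio>
#include <random>
#include <vector>

// Sample an index from a discrete distribution given by a probability row.
int sample(const std::vector<double>& probs, std::mt19937& rng) {
    std::discrete_distribution<int> d(probs.begin(), probs.end());
    return d(rng);
}

int main() {
    // Hypothetical 3-state chain over S = {x, y, z}. Each row of T sums to 1.
    std::vector<double> pi = {1.0, 0.0, 0.0};
    std::vector<std::vector<double>> T = {
        {0.5, 0.5, 0.0},
        {0.2, 0.3, 0.5},
        {0.7, 0.0, 0.3},
    };
    std::mt19937 rng(42);
    int state = sample(pi, rng);            // X_0 ~ pi
    for (int n = 0; n < 10; ++n) {
        std::printf("X_%d = %c\n", n, "xyz"[state]);
        state = sample(T[state], rng);      // X_{n+1} ~ row X_n of T
    }
    return 0;
}
```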
17 Theoretical introduction Markov chain example [figure: a three-state Markov chain over $S = \{x, y, z\}$, drawn with its transition matrix T; only the edge labels 0.5 and 0.3 survive in the transcript]
18 Theoretical introduction Hidden Markov Models A hidden Markov model (HMM) is a Markov chain in which the state sequence may be unobservable (hidden). This means that, while the Markov chain parameters (e.g. transition matrix and start-state probabilities) are still known, there is no way to directly determine the state sequence $\{X_n\}_{n \geq 0}$ the system will follow. What can be observed is an output sequence produced at each time step, $\{Y_n\}_{n \geq 0}$. The output sequence can assume any value from a given set of outputs, O. Here we will assume O to be discrete, but it is easily extendable to the continuous case (GMHMMs, as used in the paper), and all the standard algorithms retain their usual form.
19 Theoretical introduction Further parameters It is assumed that the output at any given moment depends only on the current state; i.e. for all $n \geq 0$:

$$P(Y_n = y_n \mid X_n = x_n, \ldots, X_0 = x_0, Y_{n-1} = y_{n-1}, \ldots, Y_0 = y_0) = P(Y_n = y_n \mid X_n = x_n)$$

Assuming time homogeneity on $P(Y_n = y_n \mid X_n = x_n)$ as before, the only additional parameter needed to fully specify an HMM is the output probability matrix, O, defined as $O_{xy} = P(Y_0 = y \mid X_0 = x)$.
20 Theoretical introduction HMM example [figure: a three-state HMM over $S = \{x, y, z\}$ with output set $O = \{a, b, c\}$, drawn with its transition matrix T and output matrix O; the matrix entries are not preserved in the transcript]
21–24 Theoretical introduction Learning and inference There are three main problems that one may wish to solve on a (GM)HMM, and each can be addressed by a standard algorithm (see the sketch after this list):

- Probability of an observed output sequence. Given an output sequence $\{y_t\}_{t=0}^{T}$, determine the probability that it was produced by the given HMM $\Theta$, i.e. $P(Y_0 = y_0, \ldots, Y_T = y_T \mid \Theta)$ (1). This problem is efficiently solved with the forward algorithm.
- Most likely sequence of states for an observed output sequence. Given an output sequence $\{y_t\}_{t=0}^{T}$, determine the most likely sequence of states $\{\hat{x}_t\}_{t=0}^{T}$ that produced it within a given HMM $\Theta$, i.e. $\{\hat{x}_t\}_{t=0}^{T} = \operatorname{argmax}_{\{x_t\}_{t=0}^{T}} P(\{x_t\}_{t=0}^{T} \mid \{y_t\}_{t=0}^{T}, \Theta)$ (2). This problem is efficiently solved with the Viterbi algorithm.
- Adjusting the model parameters. Given an output sequence $\{y_t\}_{t=0}^{T}$ and an HMM $\Theta$, produce a new HMM $\Theta'$ that is more likely to produce that sequence, i.e. $P(\{y_t\}_{t=0}^{T} \mid \Theta') \geq P(\{y_t\}_{t=0}^{T} \mid \Theta)$ (3). This problem is efficiently solved with the Baum-Welch algorithm.
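As a concrete reference for the first of these problems, here is a minimal C++ sketch of the forward algorithm over the discrete parameterisation $(\pi, T, O)$ defined earlier; the tiny 2-state model in main is invented for illustration, and production code would work with log-probabilities to avoid underflow on long sequences.

```cpp
#include <cstdio>
#include <vector>

// Computes P(y_0, ..., y_T | Theta) via alpha_t(x) = P(y_0..y_t, X_t = x).
double forward(const std::vector<double>& pi,
               const std::vector<std::vector<double>>& T,
               const std::vector<std::vector<double>>& O,
               const std::vector<int>& ys) {
    const size_t n = pi.size();
    std::vector<double> alpha(n);
    for (size_t x = 0; x < n; ++x)
        alpha[x] = pi[x] * O[x][ys[0]];         // base case: t = 0
    for (size_t t = 1; t < ys.size(); ++t) {
        std::vector<double> next(n, 0.0);
        for (size_t b = 0; b < n; ++b) {
            for (size_t a = 0; a < n; ++a)
                next[b] += alpha[a] * T[a][b];  // sum over previous state
            next[b] *= O[b][ys[t]];             // emit y_t from state b
        }
        alpha = next;
    }
    double p = 0.0;                             // marginalise the final state
    for (double a : alpha) p += a;
    return p;
}

int main() {
    // Tiny illustrative 2-state, 2-output HMM (not the talk's example).
    std::vector<double> pi = {0.6, 0.4};
    std::vector<std::vector<double>> T = {{0.7, 0.3}, {0.4, 0.6}};
    std::vector<std::vector<double>> O = {{0.9, 0.1}, {0.2, 0.8}};
    std::vector<int> ys = {0, 1, 1, 0};
    std::printf("P(y | Theta) = %g\n", forward(pi, T, O, ys));
    return 0;
}
```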
25 Problem setup Supervised learning One of the most common kinds of machine learning is supervised learning; the task is to construct a learning algorithm that will, upon observing a set of data with known labels (training data), construct a function capable of determining labels for thus-far unseen data (test data). [diagram: training data $\vec{s}$ → learning algorithm L → labelling function $h = L(\vec{s})$; unseen data x → h → label $y = h(x)$]
26 Problem setup Binary classification The simplest example of a problem solvable via supervised learning is binary classification: given two classes ($C_1$ and $C_2$), determining which class an input x belongs to. The applications of this are widespread:
- Diagnostics (does a patient have a given disease, based on glucose levels, blood pressure, and similar measurements?);
- Credit decisions (is a person expected to repay their loan, based on their financial history?);
- Trading (should a stock be bought or sold, depending on previous price movements?);
- Autonomous driving (are the driving conditions too dangerous for self-driving, based on meteorological data?);
- ...
27 Classification via HMMs Classification via HMMs The training data consists of sequences for which class membership (in $C_1$ or $C_2$) is known. We may construct two separate HMMs: one producing all the training sequences belonging to $C_1$, the other producing all the training sequences belonging to $C_2$. The models can be trained by running a sufficient number of iterations of the Baum-Welch algorithm over the sequences from the training data belonging to their respective classes. After constructing the models, classifying new sequences is simple: employ the forward algorithm to determine whether it is more likely that a new sequence was produced by the model for $C_1$ or the model for $C_2$ (a minimal sketch follows).
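A minimal sketch of this decision rule, assuming the forward() routine from the previous sketch is linked in; the HMM struct and classify() are names introduced here for illustration, not the talk's API.

```cpp
#include <vector>

// Parameters of one trained class model.
struct HMM {
    std::vector<double> pi;               // start-state probabilities
    std::vector<std::vector<double>> T;   // transition matrix
    std::vector<std::vector<double>> O;   // output probability matrix
};

// Defined in the previous sketch.
double forward(const std::vector<double>& pi,
               const std::vector<std::vector<double>>& T,
               const std::vector<std::vector<double>>& O,
               const std::vector<int>& ys);

// Assign a new sequence to the class whose trained model is more likely to
// have produced it. (On long sequences the raw forward probabilities
// underflow, so real implementations compare log-likelihoods instead.)
int classify(const HMM& c1, const HMM& c2, const std::vector<int>& ys) {
    return forward(c1.pi, c1.T, c1.O, ys) >= forward(c2.pi, c2.T, c2.O, ys)
               ? 1 : 2;
}
```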
28 Motivation A slightly different problem Now assume that we have access to more than one output type at all times (e.g. if we measure patient parameters through time, we may simultaneously measure the blood pressure and blood glucose levels). We may attempt to reformulate our output matrix O such that it has $k + 1$ dimensions for k output types: $O_{x,a,b,c,\ldots} = P(a, b, c, \ldots \mid x)$... however, this becomes intractable fairly quickly, both memory- and numerics-wise (with m possible values per output type, the table already holds $n \cdot m^k$ entries); also, many combinations of the outputs may never be seen in the training data.
29 Motivation Modelling There exists a variety of ways of handling multiple outputs simultaneously; however, most of them do not take into account the potentially nontrivial nature of interactions between these outputs. Worst offender: Naïve/Idiot Bayes. This is where multiplex networks come into play, as a model which has proven effective in modelling real-world systems. Fundamental idea: we will model each of the outputs separately within separate HMMs, after which we will combine the HMMs into a larger-scale, multiplex HMM. The entire structure will still behave as an HMM, so we will be able to classify using the forward algorithm, just as before.
30 Model description Interlayer edges Therefore, we assume that we have k HMMs (one for each output type) with n nodes each. In each time step, the system is within one of the nodes of one of the HMMs, and may either: change the current node (remaining within the same HMM), or change the current HMM (remaining in the same node). Assumption: the multiplex is layer-coupled; the probabilities of changing the HMM at each time step can be represented with a single matrix, $\omega$ (of size $k \times k$). Therefore, $\omega_{ij}$ gives the probability of, at any time step, transitioning from the HMM producing the ith output type to the HMM producing the jth output type.
31 Model description Multiplex HMM parameters Important: while in the ith HMM, we are only interested in the probability of producing the ith output type; we do not consider the other $k - 1$ types! (They are assumed to be produced with probability 1.) With that in mind, the full system may be observed as an HMM over node-layer pairs (x, i), such that:

$$\pi_{(x,i)} = \omega_{ii}\,\pi^i_x$$

$$T_{(a,i),(b,j)} = \begin{cases} 0 & a \neq b, i \neq j \\ \omega_{ij} & a = b, i \neq j \\ \omega_{ii}\,T^i_{ab} & i = j \end{cases}$$

$$O_{(x,i),y} = O^i_{xy}$$
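These formulas translate directly into a flattening routine. The C++ sketch below builds the combined HMM over node-layer pairs from k trained layers and $\omega$; the equations above are followed as written, while the names and the $i \cdot n + x$ indexing scheme are assumptions of this sketch.

```cpp
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<double>>;

struct HMM {            // same shape as in the earlier sketches
    std::vector<double> pi;
    Mat T, O;
};

// Flatten k layer HMMs and the k x k interlayer matrix omega into a single
// HMM over node-layer pairs (x, i), indexed here as i * n + x.
HMM build_multiplex(const std::vector<HMM>& layers, const Mat& omega) {
    const std::size_t k = layers.size();
    const std::size_t n = layers[0].pi.size();
    const std::size_t m = layers[0].O[0].size();  // outputs per layer
    HMM mux;
    mux.pi.assign(n * k, 0.0);
    mux.T.assign(n * k, std::vector<double>(n * k, 0.0));
    mux.O.assign(n * k, std::vector<double>(m, 0.0));
    for (std::size_t i = 0; i < k; ++i)
        for (std::size_t x = 0; x < n; ++x) {
            const std::size_t xi = i * n + x;
            mux.pi[xi] = omega[i][i] * layers[i].pi[x];  // pi_(x,i)
            mux.O[xi] = layers[i].O[x];                  // O_(x,i),y = O^i_xy
            for (std::size_t j = 0; j < k; ++j)
                for (std::size_t b = 0; b < n; ++b) {
                    const std::size_t bj = j * n + b;
                    if (i == j)            // stay in layer i, move node
                        mux.T[xi][bj] = omega[i][i] * layers[i].T[x][b];
                    else if (x == b)       // switch layer, keep node
                        mux.T[xi][bj] = omega[i][j];
                    // otherwise 0: cannot change node and layer at once
                }
        }
    return mux;
}
```

Note that if every row of $\omega$ and of each $T^i$ sums to 1, each row of the combined T sums to $\omega_{ii} + \sum_{j \neq i} \omega_{ij} = 1$, which is precisely what lets the flattened structure behave as an ordinary HMM.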
32 Model description Example: Chain HMM [figure: a chain HMM with states $x_0, x_1, \ldots, x_n$ entered from a start node, each state $x_t$ emitting output $y_t$ with probability $O_{t,y_t}$]
33 Model description Example: Two Chain HMMs [figure: two such chain HMMs stacked, one per output type, the second with states $x'_t$ and outputs $y'_t$]
34 Model description Example: Multiplex Chain HMM [figure: the two chains coupled into a multiplex chain HMM; start probabilities $\pi_1, \pi_2$ select a layer, intralayer transitions are scaled by $\omega_{11}$ and $\omega_{22}$, and interlayer edges carry probabilities $\omega_{12}$ and $\omega_{21}$]
35 Training and classification Training Training the individual HMM layers (i.e. determining the parameters $\pi^i$, $T^i$, $O^i$) may be done as before (by using the Baum-Welch algorithm). Determining the entries of $\omega$ is much harder; this matrix specifies the relative dependencies between the processes generating the different output types, which are unknown for many practical problems! Therefore we adopt an optimisation approach that makes no further assumptions about the problem; within my project, a multiobjective genetic algorithm, NSGA-II, was used to determine optimal values of the entries of $\omega$ (a simplified sketch follows).
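NSGA-II itself is too involved to sketch here, so below is a deliberately crude, single-objective stand-in: plain random search over row-stochastic $\omega$ matrices, scored by total training log-likelihood, reusing forward() and build_multiplex() from the earlier sketches. The actual project used the multiobjective NSGA-II; this is illustrative only.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

using Mat = std::vector<std::vector<double>>;

struct HMM { std::vector<double> pi; Mat T, O; };  // as in earlier sketches

// Defined in the earlier sketches.
double forward(const std::vector<double>& pi, const Mat& T, const Mat& O,
               const std::vector<int>& ys);
HMM build_multiplex(const std::vector<HMM>& layers, const Mat& omega);

// Sample a random k x k row-stochastic matrix (each row sums to 1).
Mat random_omega(std::size_t k, std::mt19937& rng) {
    std::uniform_real_distribution<double> u(0.0, 1.0);
    Mat w(k, std::vector<double>(k));
    for (auto& row : w) {
        double s = 0.0;
        for (double& v : row) { v = u(rng); s += v; }
        for (double& v : row) v /= s;
    }
    return w;
}

// Crude stand-in for NSGA-II: keep whichever omega maximises the total
// log-likelihood of this class's training sequences.
Mat fit_omega(const std::vector<HMM>& layers,
              const std::vector<std::vector<int>>& train, int trials) {
    std::mt19937 rng(0);
    Mat best = random_omega(layers.size(), rng);
    double best_ll = -1e300;
    for (int t = 0; t < trials; ++t) {
        const Mat w = random_omega(layers.size(), rng);
        const HMM mux = build_multiplex(layers, w);
        double ll = 0.0;
        for (const auto& ys : train)
            ll += std::log(forward(mux.pi, mux.T, mux.O, ys));
        if (ll > best_ll) { best_ll = ll; best = w; }
    }
    return best;
}
```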
36 Training and classification Classification As mentioned before, for binary classification: we construct two separate models; each model is separately trained over the training sequences belonging to its class; and new sequences are classified into the class to whose model the forward algorithm assigns the larger likelihood.
37 Training and classification Putting it all together [diagram: the training set $\vec{s}$ is split into the sequences belonging to $C_1$ and to $C_2$; for each class, the layers are trained with Baum-Welch and the interlayer edges with NSGA-II; an unseen sequence $\vec{y}$ is scored by the forward algorithm under both models, yielding $P(\vec{y} \mid C_1)$ and $P(\vec{y} \mid C_2)$, and classified as $C = \operatorname{argmax}_{C_i} P(\vec{y} \mid C_i)$]
38 Results and implementation Application & results My project has applied this method to classifying patients for breast invasive carcinoma, based on gene expression and methylation data for genes assumed to be responsible: we have achieved a mean accuracy of 94.2% and mean sensitivity of 95.8% after 10-fold cross-validation! This was accomplished without any optimisation efforts: we fixed the number of nodes to n = 4; used the standard NSGA-II parameters without tweaking; and ordered the sequences based on the Euclidean norm of the expression and methylation levels... so we expect further advances to be possible.
39 Results and implementation Implementation and conclusions You may find the full C++ implementation of this model online. We are currently in the process of publishing a new paper describing basic workflows with the software. Hopefully, more multiplex-related work to come! I also developed viral during the Hack Cambridge hackathon; check it out online.
40 Results and implementation Thank you! Questions?