CS395T: Structured Models for NLP Lecture 10: Loopy Graphical Models

Size: px
Start display at page:

Download "CS395T: Structured Models for NLP Lecture 10: Loopy Graphical Models"

Transcription

1 CS395T: Structured Models for NLP Lecture 10: Loopy Graphical Models Recall: Global vs. Greedy State space Start state Gold end state Greg Durrett Greedy: 2n local training examples, only see gold states Global: one global example, might see new states Recall: Global Training with Early UpdaMng For each epoch For each sentence For i=1 2*len(sentence) # 2n transimons in arc-standard beam[i] = compute_successors(beam[i-1]) If beam[i] does not contain gold: # Feats are cumulamve up unml this point apply_gradient_update(feats(gold[0:i]) - feats(beam[i,0])) break # If we got to the end, gold may smll not be one-best If i == 2*len(sentence): apply_gradient_update(feats(gold) - feats(beam[2*len(sentence),0])) Administrivia Survey results: pace a bit too fast (assumes too much prior knowledge) Fast pace for a couple of lectures on graph-structured models, classical machine translamon More moderate pace on fundamentals of NNs / RNNs / neural MT Details for projects: I ll try to do this more FronMers / current research: a]er RNNs More materials: precision/recall of readings? Don t have expectamons for the final project It starts at 9:30am: sorry :(

2 Road Map Text Text Analysis Annota/ons POS tagging SyntacMc parsing NER Coreference resolumon First half of the class: more text analysis Second half of the class: more applicamons Applica/ons Summarize Extract informamon Answer quesmons IdenMfy senmment Translate Sequences: POS, NER Road Map Trees: consmtuency parsing, dependency parsing, semanmc role labeling Today: graph-structured models with two inference techniques: belief propagamon, Gibbs sampling Next Mme: classical (non-neural) machine translamon Then, part 2 of the class: neural networks, RNNs, CNNs, etc. P (y x) = 1 Z i=2 e Recall: CRFs t y n exp( t (y i 1,y i )) exp( e (y i,i,x)) i=1 Skip-chain CRFs The news agency Tanjug reported on the outcome of the meemng.? PER? The delegamon met the president at the airport, Tanjug said. Z is normalizing constant: how did we compute it? And marginals? Forward-backward: efficient dynamic program for summing things out Coreferent enmmes should be the same type One sense per discourse assumpmon: bank (river) and bank (financial) rarely occur in the same context

3 P (y x) / i=2 PER Skip-chain CRFs (j, k) are pairs of variables that we manually linked up PER exp( t (y i 1,y i )) exp( e (y i,i,x)) exp( `(y j,y k )) i=1 (j,k)? Inference P (y x) / exp( t (y i 1,y i )) exp( e (y i,i,x)) exp( `(y j,y k )) i=2 i=1 (j,k) How do we do forward-backward in this case? Assume just one sentence What if there are no links (j, k)? What if there s one link (j, k)? Iterate upward through i: keep track of state i-1, keep track of state j Dynamic program now tracks two states, so an extra factor of s For k links, state blows up by factor of s k Inference Inference Forward matrix (tensor): The news agency Tanjug reported on the outcome of the meemng. state j The delegamon met the president at the airport, Tanjug said. doc len state at curr Mmestep state j O(s 2 ) predecessors The news agency Tanjug reported on the outcome of the meemng. The delegamon met the president at the airport, Tanjug said. esterday, Tanjug also reported on Now would need to track two prior states generally becomes intractable

4 SoluMon 1: Belief propagamon Inference SoluMon 2: Gibbs sampling Belief PropagaMon Belief PropagaMon Forward-backward: instance of sum-product algorithm for inference in general tree-structured CRFs Sum-product doesn t work when there are loops, but it s usually a good approximamon, so we can just use it anyway f 1 v 1 f 2 v 2 Sum-Product Algorithm P (, x) = NotaMon: N(v) factors that are neighbors of variable v (use y for values) N(f) variables that are neighbors of factor f y f y values associated with a factor f Messages µ : vectors of values on edges between variable and factor (one message in each direcmon along edge). DistribuMons over y Posteriors are products of messages from factors: exp( ( )) exp( (, )) exp( (y1 0 )) exp( (y,y0 2 )) P y,y0 2 P (y i = y x) / f2n(v i ) µ f!vi (y)

5 V->f messages: F->v messages: Sum-Product Algorithm µ v!f (y) = f 0 2N(v),f 0 6=f Sum over all values of y for this factor with the ith coordinate set to y i µ f 0!v(y) Value of y is a product of what all other incoming messages say about y. I.e., propagate informamon from the rest of the graph, but don t feed the factor its own outputs µ f!vi (y i )= X y f, i exp( (y f, i,y i )) k:v k 2N(f),k6=i µ vk!f (y f,k ) Product over all other factors messages f 1 v 1 f 2 v 2 Sum-Product Algorithm exp( ( )) = [0.9, 0.1] exp( (, )) = 1 0 Factor requires = (these are zeroes in real space!) P (, )= P (, x) = exp( ( )) exp( (, )) exp( (y1 0 )) exp( (y,y0 2 )) P y,y0 2 Probability reflects both factors f 1 µ v!f (y) = v 1 f 2 v 2 Sum-Product Algorithm P (, x) = exp( ( )) = [0.9, 0.1] exp( (, )) = 1 0 IniMalize messages arbitrarily, then iterate over nodes (in some order): µ f!vi (y i )= X f 0 2N(v),f 0 6=f µ f 0!v(y) exp( ( )) exp( (, )) exp( (y1 0 )) exp( (y,y0 2 )) P y,y0 2 y f, i exp( (y f, i,y i )) k:v k 2N(f),k6=i µ vk!f (y f,k ) f 1 v 1 f 2 v 2 Final marginals: Sum-Product Algorithm P (, x) = P (y i = y x) / exp( ( )) exp( (, )) exp( (y1 0 )) exp( (y,y0 2 )) P y,y0 2 If graph is tree structured: once every node/factor has talked to every other node/factor, we have convergence For linear chains: need to run a forward pass and a backward pass f2n(v i ) µ f!vi (y)

6 ConnecMons to Forward-Backward µ v!f (y) = µ f 0!v(y) f 0 2N(v),f 0 6=f µ f!vi (y i )= X exp( (y f, i,y i )) y f, i Message from variable to next factor: product over current emission and previous factor message Message from factor to variable: incorporates transimon scores k:v k 2N(f),k6=i µ vk!f (y f,k ) Loopy Sum-Product y 3 exp( ( )) = [0.9, 0.1] exp( (, )) = 1 0 exp( (,y 3 )) = exp( (y 3, )) = What happens in this case? Posteriors blow up! Sum-product algorithm is not correct with loops We ve just broken the forward update into two pieces! Belief PropagaMon Algorithm y 3 exp( (, )) = exp( (,y 3 )) = exp( ( )) = [0.9, 0.1] 1 0 exp( (y 3, )) = Belief propaga/on: ignore this problem. Run sum-product for a while and use what it computes as an approxima3on to the true posterior Most of the Mme cyclic dependencies are not strong and it works out Some momvamon from stamsmcal physics, no guarantees on results /PER? EnMty Analysis Model for joint coreference, NER, and enmty linking to Wikipedia; here we ll just look at coref+ner Coreference ResoluMon is headquartered just outside AusMn. The company was founded LOC SemanMc Typing PER

7 EnMty Analysis EnMty Analysis Each menmon chooses an antecedent AusMn NEW PERSON,,... c 3 t 3 The company Coreference SemanMc typing f(x,c 3 ) c 1 c 3 Coreference t 1 c 2 t 2 AusMn t 3 The company f(x,t 3 ) SemanMc typing EnMty Analysis EnMty Analysis Coreference If coreferent, these menmons should have the same semanmc type c 1 c 2 SemanMc type PERSON,,... t 1 t 2 The company The company 5 menmons and a typical document has 200

8 What does BP inference look like? Belief PropagaMon c 2 Coreference Be a tree factor imposes the tree constraint on the variables t 1 t 2 New SemanMc type Sibling factors make it loopy PERSON The company PERSON Achieve the same thing as Koo s higher-order features Bansal et al. (2014) Skip-chain CRFs PER PER? P (y x) / exp( t (y i 1,y i )) exp( e (y i,i,x)) exp( `(y j,y k )) i=2 i=1 (j,k) Can we approximate P(y x) in other ways?

9 i=2 Inference P (y x) / exp( t (y i 1,y i )) exp( e (y i,i,x)) exp( `(y j,y k )) i=1 Can we sample from P(y x) and use those samples to approximate it? (Monte Carlo methods) For distribumons that are very peaked, samples look like the max anyway (j,k) Key idea: resample a single variable at a Mme condimoned on all others P (y i = y y i, x) / exp [ t(y i 1,y)+ t (y, y i+1 )+ e (y, i, x)+ 3 X X `(y, y k )+ `(y k,y) 5 k:(i,k) linked k:(k,i) linked P (y i = y y i, x) / exp [ t(y i 1,y)+ t (y, y i+1 )+ e (y, i, x)+ 3 X X `(y, y k )+ `(y k,y) 5 k:(i,k) linked Orange things are all constants now! k:(k,i) linked Fix all predicmons except one, easy to compute condimonal probabilimes (normalize scores for this parmcular variable y) Iterate over all variables repeatedly, like belief propagamon IniMalize y values to something reasonable for k=1 t iteramons: for i=1 m words in the document: y i = Sample from P (y i,...,i 1,i+1,...,m, x) Note: we need to iterate over the document several Mmes! The Gibbs sampling procedure forms a Markov chain whose equilibrium distribumon is the posterior However, you might need to run it for a very long Mme to get samples which don t depend on the inimalizamon.

10 Problems with P (x 1,x 2 ) x 2 x Start with x = (0, 0) P (x 2 x 1 = 0) = [0.98, 0.02] P (x 1 x 2 = 0) = [0.98, 0.02] stay at (0, 0) 98% of the Mme stay at (0, 0) 98% of the Mme Takes ~50 steps before we switch to (1, 1) need to run Gibbs sampling for a long Mme to get a good approximamon of the posterior Unsupervised POS inducmon with alignments across languages Takeaways Can define loopy factor graphs and smll do inference Belief propagamon and Gibbs sampling both work best if there are only weak cyclic dependencies. This is usually the case if the loopy factors incorporate features and the loops are large Can incorporate nice features this way, not as commonplace and a bit harder to get working, but everyone thinks this stuff is cool Other ways of doing this: output reranking, beam search (give up on doing principled inference), Naseem et al. (2009)

CS395T: Structured Models for NLP Lecture 19: Advanced NNs I

CS395T: Structured Models for NLP Lecture 19: Advanced NNs I CS395T: Structured Models for NLP Lecture 19: Advanced NNs I Greg Durrett Administrivia Kyunghyun Cho (NYU) talk Friday 11am GDC 6.302 Project 3 due today! Final project out today! Proposal due in 1 week

More information

CS395T: Structured Models for NLP Lecture 19: Advanced NNs I. Greg Durrett

CS395T: Structured Models for NLP Lecture 19: Advanced NNs I. Greg Durrett CS395T: Structured Models for NLP Lecture 19: Advanced NNs I Greg Durrett Administrivia Kyunghyun Cho (NYU) talk Friday 11am GDC 6.302 Project 3 due today! Final project out today! Proposal due in 1 week

More information

CS395T: Structured Models for NLP Lecture 5: Sequence Models II

CS395T: Structured Models for NLP Lecture 5: Sequence Models II CS395T: Structured Models for NLP Lecture 5: Sequence Models II Greg Durrett Some slides adapted from Dan Klein, UC Berkeley Recall: HMMs Input x =(x 1,...,x n ) Output y =(y 1,...,y n ) y 1 y 2 y n P

More information

Natural Language Processing

Natural Language Processing Natural Language Processing Global linear models Based on slides from Michael Collins Globally-normalized models Why do we decompose to a sequence of decisions? Can we directly estimate the probability

More information

Bayesian Machine Learning - Lecture 7

Bayesian Machine Learning - Lecture 7 Bayesian Machine Learning - Lecture 7 Guido Sanguinetti Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh gsanguin@inf.ed.ac.uk March 4, 2015 Today s lecture 1

More information

9 Forward-backward algorithm, sum-product on factor graphs

9 Forward-backward algorithm, sum-product on factor graphs Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 9 Forward-backward algorithm, sum-product on factor graphs The previous

More information

Lab 12: Structured Prediction

Lab 12: Structured Prediction December 4, 2014 Lecture plan structured perceptron application: confused messages application: dependency parsing structured SVM Class review: from modelization to classification What does learning mean?

More information

13 : Variational Inference: Loopy Belief Propagation and Mean Field

13 : Variational Inference: Loopy Belief Propagation and Mean Field 10-708: Probabilistic Graphical Models 10-708, Spring 2012 13 : Variational Inference: Loopy Belief Propagation and Mean Field Lecturer: Eric P. Xing Scribes: Peter Schulam and William Wang 1 Introduction

More information

The Ising model and Markov chain Monte Carlo

The Ising model and Markov chain Monte Carlo The Ising model and Markov chain Monte Carlo Ramesh Sridharan These notes give a short description of the Ising model for images and an introduction to Metropolis-Hastings and Gibbs Markov Chain Monte

More information

Variational Inference (11/04/13)

Variational Inference (11/04/13) STA561: Probabilistic machine learning Variational Inference (11/04/13) Lecturer: Barbara Engelhardt Scribes: Matt Dickenson, Alireza Samany, Tracy Schifeling 1 Introduction In this lecture we will further

More information

with Local Dependencies

with Local Dependencies CS11-747 Neural Networks for NLP Structured Prediction with Local Dependencies Xuezhe Ma (Max) Site https://phontron.com/class/nn4nlp2017/ An Example Structured Prediction Problem: Sequence Labeling Sequence

More information

Conditional Random Fields

Conditional Random Fields Conditional Random Fields Micha Elsner February 14, 2013 2 Sums of logs Issue: computing α forward probabilities can undeflow Normally we d fix this using logs But α requires a sum of probabilities Not

More information

Machine Learning for Data Science (CS4786) Lecture 24

Machine Learning for Data Science (CS4786) Lecture 24 Machine Learning for Data Science (CS4786) Lecture 24 Graphical Models: Approximate Inference Course Webpage : http://www.cs.cornell.edu/courses/cs4786/2016sp/ BELIEF PROPAGATION OR MESSAGE PASSING Each

More information

Unsupervised learning (part 1) Lecture 19

Unsupervised learning (part 1) Lecture 19 Unsupervised learning (part 1) Lecture 19 David Sontag New York University Slides adapted from Carlos Guestrin, Dan Klein, Luke Ze@lemoyer, Dan Weld, Vibhav Gogate, and Andrew Moore Bayesian networks enable

More information

Statistical NLP for the Web Log Linear Models, MEMM, Conditional Random Fields

Statistical NLP for the Web Log Linear Models, MEMM, Conditional Random Fields Statistical NLP for the Web Log Linear Models, MEMM, Conditional Random Fields Sameer Maskey Week 13, Nov 28, 2012 1 Announcements Next lecture is the last lecture Wrap up of the semester 2 Final Project

More information

CS388: Natural Language Processing Lecture 9: CNNs, Neural CRFs. CNNs. Administrivia. This Lecture. Greg Durrett. Project 1 due today at 5pm

CS388: Natural Language Processing Lecture 9: CNNs, Neural CRFs. CNNs. Administrivia. This Lecture. Greg Durrett. Project 1 due today at 5pm CS388: Natural Language Processing Lecture 9: CNNs, Neural CRFs Project 1 due today at 5pm Administrivia Mini 2 out tonight, due in two weeks Greg Durrett This Lecture CNNs CNNs for SenLment Neural CRFs

More information

Probabilistic Graphical Models: MRFs and CRFs. CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov

Probabilistic Graphical Models: MRFs and CRFs. CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov Probabilistic Graphical Models: MRFs and CRFs CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov Why PGMs? PGMs can model joint probabilities of many events. many techniques commonly

More information

Lecture 13: Structured Prediction

Lecture 13: Structured Prediction Lecture 13: Structured Prediction Kai-Wei Chang CS @ University of Virginia kw@kwchang.net Couse webpage: http://kwchang.net/teaching/nlp16 CS6501: NLP 1 Quiz 2 v Lectures 9-13 v Lecture 12: before page

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Brown University CSCI 295-P, Spring 213 Prof. Erik Sudderth Lecture 11: Inference & Learning Overview, Gaussian Graphical Models Some figures courtesy Michael Jordan s draft

More information

Inference in Bayesian Networks

Inference in Bayesian Networks Andrea Passerini passerini@disi.unitn.it Machine Learning Inference in graphical models Description Assume we have evidence e on the state of a subset of variables E in the model (i.e. Bayesian Network)

More information

Probabilistic Graphical Models

Probabilistic Graphical Models 2016 Robert Nowak Probabilistic Graphical Models 1 Introduction We have focused mainly on linear models for signals, in particular the subspace model x = Uθ, where U is a n k matrix and θ R k is a vector

More information

Natural Language Processing Prof. Pushpak Bhattacharyya Department of Computer Science & Engineering, Indian Institute of Technology, Bombay

Natural Language Processing Prof. Pushpak Bhattacharyya Department of Computer Science & Engineering, Indian Institute of Technology, Bombay Natural Language Processing Prof. Pushpak Bhattacharyya Department of Computer Science & Engineering, Indian Institute of Technology, Bombay Lecture - 21 HMM, Forward and Backward Algorithms, Baum Welch

More information

Markov Networks.

Markov Networks. Markov Networks www.biostat.wisc.edu/~dpage/cs760/ Goals for the lecture you should understand the following concepts Markov network syntax Markov network semantics Potential functions Partition function

More information

Log-Linear Models with Structured Outputs

Log-Linear Models with Structured Outputs Log-Linear Models with Structured Outputs Natural Language Processing CS 4120/6120 Spring 2016 Northeastern University David Smith (some slides from Andrew McCallum) Overview Sequence labeling task (cf.

More information

What s an HMM? Extraction with Finite State Machines e.g. Hidden Markov Models (HMMs) Hidden Markov Models (HMMs) for Information Extraction

What s an HMM? Extraction with Finite State Machines e.g. Hidden Markov Models (HMMs) Hidden Markov Models (HMMs) for Information Extraction Hidden Markov Models (HMMs) for Information Extraction Daniel S. Weld CSE 454 Extraction with Finite State Machines e.g. Hidden Markov Models (HMMs) standard sequence model in genomics, speech, NLP, What

More information

Hidden Markov Models

Hidden Markov Models CS769 Spring 2010 Advanced Natural Language Processing Hidden Markov Models Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu 1 Part-of-Speech Tagging The goal of Part-of-Speech (POS) tagging is to label each

More information

Intelligent Systems (AI-2)

Intelligent Systems (AI-2) Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 19 Oct, 23, 2015 Slide Sources Raymond J. Mooney University of Texas at Austin D. Koller, Stanford CS - Probabilistic Graphical Models D. Page,

More information

Neural Networks 2. 2 Receptive fields and dealing with image inputs

Neural Networks 2. 2 Receptive fields and dealing with image inputs CS 446 Machine Learning Fall 2016 Oct 04, 2016 Neural Networks 2 Professor: Dan Roth Scribe: C. Cheng, C. Cervantes Overview Convolutional Neural Networks Recurrent Neural Networks 1 Introduction There

More information

Hidden Markov Models

Hidden Markov Models Hidden Markov Models Slides mostly from Mitch Marcus and Eric Fosler (with lots of modifications). Have you seen HMMs? Have you seen Kalman filters? Have you seen dynamic programming? HMMs are dynamic

More information

6.867 Machine learning, lecture 23 (Jaakkola)

6.867 Machine learning, lecture 23 (Jaakkola) Lecture topics: Markov Random Fields Probabilistic inference Markov Random Fields We will briefly go over undirected graphical models or Markov Random Fields (MRFs) as they will be needed in the context

More information

Conditional Random Field

Conditional Random Field Introduction Linear-Chain General Specific Implementations Conclusions Corso di Elaborazione del Linguaggio Naturale Pisa, May, 2011 Introduction Linear-Chain General Specific Implementations Conclusions

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project

More information

COMP90051 Statistical Machine Learning

COMP90051 Statistical Machine Learning COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Trevor Cohn 24. Hidden Markov Models & message passing Looking back Representation of joint distributions Conditional/marginal independence

More information

Structured Prediction Models via the Matrix-Tree Theorem

Structured Prediction Models via the Matrix-Tree Theorem Structured Prediction Models via the Matrix-Tree Theorem Terry Koo Amir Globerson Xavier Carreras Michael Collins maestro@csail.mit.edu gamir@csail.mit.edu carreras@csail.mit.edu mcollins@csail.mit.edu

More information

Part 1: Expectation Propagation

Part 1: Expectation Propagation Chalmers Machine Learning Summer School Approximate message passing and biomedicine Part 1: Expectation Propagation Tom Heskes Machine Learning Group, Institute for Computing and Information Sciences Radboud

More information

Lecture 13: Discriminative Sequence Models (MEMM and Struct. Perceptron)

Lecture 13: Discriminative Sequence Models (MEMM and Struct. Perceptron) Lecture 13: Discriminative Sequence Models (MEMM and Struct. Perceptron) Intro to NLP, CS585, Fall 2014 http://people.cs.umass.edu/~brenocon/inlp2014/ Brendan O Connor (http://brenocon.com) 1 Models for

More information

Lecture 15: Exploding and Vanishing Gradients

Lecture 15: Exploding and Vanishing Gradients Lecture 15: Exploding and Vanishing Gradients Roger Grosse 1 Introduction Last lecture, we introduced RNNs and saw how to derive the gradients using backprop through time. In principle, this lets us train

More information

Latent Dirichlet Allocation

Latent Dirichlet Allocation Latent Dirichlet Allocation 1 Directed Graphical Models William W. Cohen Machine Learning 10-601 2 DGMs: The Burglar Alarm example Node ~ random variable Burglar Earthquake Arcs define form of probability

More information

The Origin of Deep Learning. Lili Mou Jan, 2015

The Origin of Deep Learning. Lili Mou Jan, 2015 The Origin of Deep Learning Lili Mou Jan, 2015 Acknowledgment Most of the materials come from G. E. Hinton s online course. Outline Introduction Preliminary Boltzmann Machines and RBMs Deep Belief Nets

More information

Lecture 15. Probabilistic Models on Graph

Lecture 15. Probabilistic Models on Graph Lecture 15. Probabilistic Models on Graph Prof. Alan Yuille Spring 2014 1 Introduction We discuss how to define probabilistic models that use richly structured probability distributions and describe how

More information

Announcements. CS 188: Artificial Intelligence Fall Causality? Example: Traffic. Topology Limits Distributions. Example: Reverse Traffic

Announcements. CS 188: Artificial Intelligence Fall Causality? Example: Traffic. Topology Limits Distributions. Example: Reverse Traffic CS 188: Artificial Intelligence Fall 2008 Lecture 16: Bayes Nets III 10/23/2008 Announcements Midterms graded, up on glookup, back Tuesday W4 also graded, back in sections / box Past homeworks in return

More information

Probability and Structure in Natural Language Processing

Probability and Structure in Natural Language Processing Probability and Structure in Natural Language Processing Noah Smith, Carnegie Mellon University 2012 Interna@onal Summer School in Language and Speech Technologies Quick Recap Yesterday: Bayesian networks

More information

Intelligent Systems (AI-2)

Intelligent Systems (AI-2) Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 19 Oct, 24, 2016 Slide Sources Raymond J. Mooney University of Texas at Austin D. Koller, Stanford CS - Probabilistic Graphical Models D. Page,

More information

CS1800: Strong Induction. Professor Kevin Gold

CS1800: Strong Induction. Professor Kevin Gold CS1800: Strong Induction Professor Kevin Gold Mini-Primer/Refresher on Unrelated Topic: Limits This is meant to be a problem about reasoning about quantifiers, with a little practice of other skills, too

More information

STA 414/2104: Machine Learning

STA 414/2104: Machine Learning STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far

More information

A.I. in health informatics lecture 8 structured learning. kevin small & byron wallace

A.I. in health informatics lecture 8 structured learning. kevin small & byron wallace A.I. in health informatics lecture 8 structured learning kevin small & byron wallace today models for structured learning: HMMs and CRFs structured learning is particularly useful in biomedical applications:

More information

4 : Exact Inference: Variable Elimination

4 : Exact Inference: Variable Elimination 10-708: Probabilistic Graphical Models 10-708, Spring 2014 4 : Exact Inference: Variable Elimination Lecturer: Eric P. ing Scribes: Soumya Batra, Pradeep Dasigi, Manzil Zaheer 1 Probabilistic Inference

More information

17 Variational Inference

17 Variational Inference Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms for Inference Fall 2014 17 Variational Inference Prompted by loopy graphs for which exact

More information

CSC321 Lecture 9 Recurrent neural nets

CSC321 Lecture 9 Recurrent neural nets CSC321 Lecture 9 Recurrent neural nets Roger Grosse and Nitish Srivastava February 3, 2015 Roger Grosse and Nitish Srivastava CSC321 Lecture 9 Recurrent neural nets February 3, 2015 1 / 20 Overview You

More information

CSCE 478/878 Lecture 6: Bayesian Learning

CSCE 478/878 Lecture 6: Bayesian Learning Bayesian Methods Not all hypotheses are created equal (even if they are all consistent with the training data) Outline CSCE 478/878 Lecture 6: Bayesian Learning Stephen D. Scott (Adapted from Tom Mitchell

More information

Part of Speech Tagging: Viterbi, Forward, Backward, Forward- Backward, Baum-Welch. COMP-599 Oct 1, 2015

Part of Speech Tagging: Viterbi, Forward, Backward, Forward- Backward, Baum-Welch. COMP-599 Oct 1, 2015 Part of Speech Tagging: Viterbi, Forward, Backward, Forward- Backward, Baum-Welch COMP-599 Oct 1, 2015 Announcements Research skills workshop today 3pm-4:30pm Schulich Library room 313 Start thinking about

More information

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling 10-708: Probabilistic Graphical Models 10-708, Spring 2014 27 : Distributed Monte Carlo Markov Chain Lecturer: Eric P. Xing Scribes: Pengtao Xie, Khoa Luu In this scribe, we are going to review the Parallel

More information

Log-Linear Models, MEMMs, and CRFs

Log-Linear Models, MEMMs, and CRFs Log-Linear Models, MEMMs, and CRFs Michael Collins 1 Notation Throughout this note I ll use underline to denote vectors. For example, w R d will be a vector with components w 1, w 2,... w d. We use expx

More information

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016 Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2016 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several

More information

10 : HMM and CRF. 1 Case Study: Supervised Part-of-Speech Tagging

10 : HMM and CRF. 1 Case Study: Supervised Part-of-Speech Tagging 10-708: Probabilistic Graphical Models 10-708, Spring 2018 10 : HMM and CRF Lecturer: Kayhan Batmanghelich Scribes: Ben Lengerich, Michael Kleyman 1 Case Study: Supervised Part-of-Speech Tagging We will

More information

Graphical Models and Kernel Methods

Graphical Models and Kernel Methods Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.

More information

Machine Learning for Structured Prediction

Machine Learning for Structured Prediction Machine Learning for Structured Prediction Grzegorz Chrupa la National Centre for Language Technology School of Computing Dublin City University NCLT Seminar Grzegorz Chrupa la (DCU) Machine Learning for

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Lecture 11 CRFs, Exponential Family CS/CNS/EE 155 Andreas Krause Announcements Homework 2 due today Project milestones due next Monday (Nov 9) About half the work should

More information

Introduction to Artificial Intelligence (AI)

Introduction to Artificial Intelligence (AI) Introduction to Artificial Intelligence (AI) Computer Science cpsc502, Lecture 9 Oct, 11, 2011 Slide credit Approx. Inference : S. Thrun, P, Norvig, D. Klein CPSC 502, Lecture 9 Slide 1 Today Oct 11 Bayesian

More information

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014 Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2014 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several

More information

CS838-1 Advanced NLP: Hidden Markov Models

CS838-1 Advanced NLP: Hidden Markov Models CS838-1 Advanced NLP: Hidden Markov Models Xiaojin Zhu 2007 Send comments to jerryzhu@cs.wisc.edu 1 Part of Speech Tagging Tag each word in a sentence with its part-of-speech, e.g., The/AT representative/nn

More information

Lecture 6: Graphical Models

Lecture 6: Graphical Models Lecture 6: Graphical Models Kai-Wei Chang CS @ Uniersity of Virginia kw@kwchang.net Some slides are adapted from Viek Skirmar s course on Structured Prediction 1 So far We discussed sequence labeling tasks:

More information

CSC321 Lecture 15: Recurrent Neural Networks

CSC321 Lecture 15: Recurrent Neural Networks CSC321 Lecture 15: Recurrent Neural Networks Roger Grosse Roger Grosse CSC321 Lecture 15: Recurrent Neural Networks 1 / 26 Overview Sometimes we re interested in predicting sequences Speech-to-text and

More information

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain

More information

Online Videos FERPA. Sign waiver or sit on the sides or in the back. Off camera question time before and after lecture. Questions?

Online Videos FERPA. Sign waiver or sit on the sides or in the back. Off camera question time before and after lecture. Questions? Online Videos FERPA Sign waiver or sit on the sides or in the back Off camera question time before and after lecture Questions? Lecture 1, Slide 1 CS224d Deep NLP Lecture 4: Word Window Classification

More information

Bayesian networks: approximate inference

Bayesian networks: approximate inference Bayesian networks: approximate inference Machine Intelligence Thomas D. Nielsen September 2008 Approximative inference September 2008 1 / 25 Motivation Because of the (worst-case) intractability of exact

More information

Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016

Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 EPSY 905: Intro to Bayesian and MCMC Today s Class An

More information

CSC321 Lecture 18: Learning Probabilistic Models

CSC321 Lecture 18: Learning Probabilistic Models CSC321 Lecture 18: Learning Probabilistic Models Roger Grosse Roger Grosse CSC321 Lecture 18: Learning Probabilistic Models 1 / 25 Overview So far in this course: mainly supervised learning Language modeling

More information

Sequential Supervised Learning

Sequential Supervised Learning Sequential Supervised Learning Many Application Problems Require Sequential Learning Part-of of-speech Tagging Information Extraction from the Web Text-to to-speech Mapping Part-of of-speech Tagging Given

More information

Intelligent Systems:

Intelligent Systems: Intelligent Systems: Undirected Graphical models (Factor Graphs) (2 lectures) Carsten Rother 15/01/2015 Intelligent Systems: Probabilistic Inference in DGM and UGM Roadmap for next two lectures Definition

More information

CS 188: Artificial Intelligence Spring Today

CS 188: Artificial Intelligence Spring Today CS 188: Artificial Intelligence Spring 2006 Lecture 9: Naïve Bayes 2/14/2006 Dan Klein UC Berkeley Many slides from either Stuart Russell or Andrew Moore Bayes rule Today Expectations and utilities Naïve

More information

Lecture 16 Deep Neural Generative Models

Lecture 16 Deep Neural Generative Models Lecture 16 Deep Neural Generative Models CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago May 22, 2017 Approach so far: We have considered simple models and then constructed

More information

Soft Inference and Posterior Marginals. September 19, 2013

Soft Inference and Posterior Marginals. September 19, 2013 Soft Inference and Posterior Marginals September 19, 2013 Soft vs. Hard Inference Hard inference Give me a single solution Viterbi algorithm Maximum spanning tree (Chu-Liu-Edmonds alg.) Soft inference

More information

Deep Learning Sequence to Sequence models: Attention Models. 17 March 2018

Deep Learning Sequence to Sequence models: Attention Models. 17 March 2018 Deep Learning Sequence to Sequence models: Attention Models 17 March 2018 1 Sequence-to-sequence modelling Problem: E.g. A sequence X 1 X N goes in A different sequence Y 1 Y M comes out Speech recognition:

More information

CS395T: Structured Models for NLP Lecture 17: CNNs. Greg Durrett

CS395T: Structured Models for NLP Lecture 17: CNNs. Greg Durrett CS395T: Structured Models for NLP Lecture 17: CNNs Greg Durrett Project 2 Results Top 3 scores: Su Wang: 90.13 UAS Greedy logisrc regression with extended feature set, trained for 30 epochs with Adagrad

More information

Sequence Modelling with Features: Linear-Chain Conditional Random Fields. COMP-599 Oct 6, 2015

Sequence Modelling with Features: Linear-Chain Conditional Random Fields. COMP-599 Oct 6, 2015 Sequence Modelling with Features: Linear-Chain Conditional Random Fields COMP-599 Oct 6, 2015 Announcement A2 is out. Due Oct 20 at 1pm. 2 Outline Hidden Markov models: shortcomings Generative vs. discriminative

More information

Chapter 16. Structured Probabilistic Models for Deep Learning

Chapter 16. Structured Probabilistic Models for Deep Learning Peng et al.: Deep Learning and Practice 1 Chapter 16 Structured Probabilistic Models for Deep Learning Peng et al.: Deep Learning and Practice 2 Structured Probabilistic Models way of using graphs to describe

More information

CSC 2541: Bayesian Methods for Machine Learning

CSC 2541: Bayesian Methods for Machine Learning CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 10 Alternatives to Monte Carlo Computation Since about 1990, Markov chain Monte Carlo has been the dominant

More information

Hidden Markov Models: All the Glorious Gory Details

Hidden Markov Models: All the Glorious Gory Details Hidden Markov Models: All the Glorious Gory Details Noah A. Smith Department of Computer Science Johns Hopkins University nasmith@cs.jhu.edu 18 October 2004 1 Introduction Hidden Markov models (HMMs, hereafter)

More information

Pair Hidden Markov Models

Pair Hidden Markov Models Pair Hidden Markov Models Scribe: Rishi Bedi Lecturer: Serafim Batzoglou January 29, 2015 1 Recap of HMMs alphabet: Σ = {b 1,...b M } set of states: Q = {1,..., K} transition probabilities: A = [a ij ]

More information

The Noisy Channel Model and Markov Models

The Noisy Channel Model and Markov Models 1/24 The Noisy Channel Model and Markov Models Mark Johnson September 3, 2014 2/24 The big ideas The story so far: machine learning classifiers learn a function that maps a data item X to a label Y handle

More information

Generative Clustering, Topic Modeling, & Bayesian Inference

Generative Clustering, Topic Modeling, & Bayesian Inference Generative Clustering, Topic Modeling, & Bayesian Inference INFO-4604, Applied Machine Learning University of Colorado Boulder December 12-14, 2017 Prof. Michael Paul Unsupervised Naïve Bayes Last week

More information

Lecture 5 Neural models for NLP

Lecture 5 Neural models for NLP CS546: Machine Learning in NLP (Spring 2018) http://courses.engr.illinois.edu/cs546/ Lecture 5 Neural models for NLP Julia Hockenmaier juliahmr@illinois.edu 3324 Siebel Center Office hours: Tue/Thu 2pm-3pm

More information

Multiclass Classification

Multiclass Classification Multiclass Classification David Rosenberg New York University March 7, 2017 David Rosenberg (New York University) DS-GA 1003 March 7, 2017 1 / 52 Introduction Introduction David Rosenberg (New York University)

More information

Statistical Methods for NLP

Statistical Methods for NLP Statistical Methods for NLP Information Extraction, Hidden Markov Models Sameer Maskey Week 5, Oct 3, 2012 *many slides provided by Bhuvana Ramabhadran, Stanley Chen, Michael Picheny Speech Recognition

More information

AN INTRODUCTION TO TOPIC MODELS

AN INTRODUCTION TO TOPIC MODELS AN INTRODUCTION TO TOPIC MODELS Michael Paul December 4, 2013 600.465 Natural Language Processing Johns Hopkins University Prof. Jason Eisner Making sense of text Suppose you want to learn something about

More information

CS395T: Structured Models for NLP Lecture 2: Machine Learning Review

CS395T: Structured Models for NLP Lecture 2: Machine Learning Review CS395T: Structured Models for NLP Lecture 2: Machine Learning Review Greg Durrett Some slides adapted from Vivek Srikumar, University of Utah Administrivia Course enrollment Lecture slides posted on website

More information

CS 188: Artificial Intelligence. Bayes Nets

CS 188: Artificial Intelligence. Bayes Nets CS 188: Artificial Intelligence Probabilistic Inference: Enumeration, Variable Elimination, Sampling Pieter Abbeel UC Berkeley Many slides over this course adapted from Dan Klein, Stuart Russell, Andrew

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning More Approximate Inference Mark Schmidt University of British Columbia Winter 2018 Last Time: Approximate Inference We ve been discussing graphical models for density estimation,

More information

13: Variational inference II

13: Variational inference II 10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational

More information

Sampling Algorithms for Probabilistic Graphical models

Sampling Algorithms for Probabilistic Graphical models Sampling Algorithms for Probabilistic Graphical models Vibhav Gogate University of Washington References: Chapter 12 of Probabilistic Graphical models: Principles and Techniques by Daphne Koller and Nir

More information

Probabilistic Graphical Models

Probabilistic Graphical Models School of Computer Science Probabilistic Graphical Models Variational Inference II: Mean Field Method and Variational Principle Junming Yin Lecture 15, March 7, 2012 X 1 X 1 X 1 X 1 X 2 X 3 X 2 X 2 X 3

More information

Evolutionary Models. Evolutionary Models

Evolutionary Models. Evolutionary Models Edit Operators In standard pairwise alignment, what are the allowed edit operators that transform one sequence into the other? Describe how each of these edit operations are represented on a sequence alignment

More information

Machine Learning & Data Mining Caltech CS/CNS/EE 155 Hidden Markov Models Last Updated: Feb 7th, 2017

Machine Learning & Data Mining Caltech CS/CNS/EE 155 Hidden Markov Models Last Updated: Feb 7th, 2017 1 Introduction Let x = (x 1,..., x M ) denote a sequence (e.g. a sequence of words), and let y = (y 1,..., y M ) denote a corresponding hidden sequence that we believe explains or influences x somehow

More information

Sequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them

Sequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them HMM, MEMM and CRF 40-957 Special opics in Artificial Intelligence: Probabilistic Graphical Models Sharif University of echnology Soleymani Spring 2014 Sequence labeling aking collective a set of interrelated

More information

Conditional Random Fields and beyond DANIEL KHASHABI CS 546 UIUC, 2013

Conditional Random Fields and beyond DANIEL KHASHABI CS 546 UIUC, 2013 Conditional Random Fields and beyond DANIEL KHASHABI CS 546 UIUC, 2013 Outline Modeling Inference Training Applications Outline Modeling Problem definition Discriminative vs. Generative Chain CRF General

More information

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering Types of learning Modeling data Supervised: we know input and targets Goal is to learn a model that, given input data, accurately predicts target data Unsupervised: we know the input only and want to make

More information

Lecture 9: PGM Learning

Lecture 9: PGM Learning 13 Oct 2014 Intro. to Stats. Machine Learning COMP SCI 4401/7401 Table of Contents I Learning parameters in MRFs 1 Learning parameters in MRFs Inference and Learning Given parameters (of potentials) and

More information

Chapter 4 Dynamic Bayesian Networks Fall Jin Gu, Michael Zhang

Chapter 4 Dynamic Bayesian Networks Fall Jin Gu, Michael Zhang Chapter 4 Dynamic Bayesian Networks 2016 Fall Jin Gu, Michael Zhang Reviews: BN Representation Basic steps for BN representations Define variables Define the preliminary relations between variables Check

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models David Sontag New York University Lecture 4, February 16, 2012 David Sontag (NYU) Graphical Models Lecture 4, February 16, 2012 1 / 27 Undirected graphical models Reminder

More information