CS 6140: Machine Learning Spring 2017

CS 6140: Machine Learning Spring 2017 Instructor: Lu Wang College of Computer and Information Science Northeastern University Webpage: www.ccs.neu.edu/home/luwang Email: luwang@ccs.neu.edu

Logistics Assignment 3 is due on 3/30. 4/13: course project presentation. 4/20: final exam.

What we learned last time Sequential labeling models: Hidden Markov Models, Maximum-entropy Markov models, Conditional Random Fields

Sample Markov Model for POS [State-transition diagram over the tags Det, Noun, PropNoun, and Verb, with start and stop states and transition probabilities on the edges]

The Markov Assumption

Hidden Markov Models (HMMs) [Chain of hidden part-of-speech tags emitting observed words]

Formally

Viterbi Backtrace [Trellis over states s_0, s_1, ..., s_N, s_F and time steps t_1, ..., t_T] Most likely sequence: s_0 s_N s_1 s_2 s_2 s_F

Log-Linear Models

Using Log-Linear Models

Conditional Random Fields (CRFs)

Today's Outline Bayesian Networks, Mixture Models, Expectation Maximization, Latent Dirichlet Allocation [Some slides are borrowed from Christopher Bishop and David Sontag]

Today's Outline Bayesian Networks, Mixture Models, Expectation Maximization, Latent Dirichlet Allocation

K-means Algorithm Goal: represent a data set in terms of K clusters, each of which is summarized by a prototype (mean). Initialize the prototypes, then iterate between two phases: Step 1: assign each data point to its nearest prototype. Step 2: update the prototypes to be the cluster means. The simplest version is based on Euclidean distance.
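A minimal NumPy sketch of this two-phase iteration (illustrative only; the initialization by random data points, the iteration cap, and the convergence check are assumptions, not from the slides):

import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    # Simple K-means with Euclidean distance. X is an (N, D) data matrix.
    rng = np.random.default_rng(seed)
    # Initialize prototypes by picking K distinct data points at random
    centers = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # Step 1: assign each data point to its nearest prototype
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 2: update prototypes to be the cluster means
        new_centers = centers.copy()
        for k in range(K):
            members = X[labels == k]
            if len(members) > 0:
                new_centers[k] = members.mean(axis=0)
        if np.allclose(new_centers, centers):  # stop once the prototypes have stabilized
            break
        centers = new_centers
    return centers, labels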

The Gaussian Distribution The multivariate Gaussian, parameterized by a mean and a covariance.
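The density formula on the original slide did not survive the transcription; the standard form, with mean \mu and covariance \Sigma, is

\mathcal{N}(\mathbf{x}\mid\boldsymbol{\mu},\boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{D/2}\,|\boldsymbol{\Sigma}|^{1/2}} \exp\left\{ -\frac{1}{2} (\mathbf{x}-\boldsymbol{\mu})^{\top} \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu}) \right\}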

Gaussian Mixtures Linear superposition of Gaussians. Normalization and positivity require constraints on the mixing coefficients; we can interpret the mixing coefficients as prior probabilities.
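The equations on this slide are missing from the transcript; in the standard notation (assumed here) the superposition and the constraints on the mixing coefficients \pi_k are

p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x}\mid\boldsymbol{\mu}_k,\boldsymbol{\Sigma}_k), \qquad 0 \le \pi_k \le 1, \qquad \sum_{k=1}^{K} \pi_k = 1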

Example: Mixture of 3 Gaussians

Contours of Probability Distribution

Sampling from the Gaussian To generate a data point: first pick one of the components with probability given by its mixing coefficient, then draw a sample from that component. Repeat these two steps for each new data point.
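A short NumPy sketch of this ancestral sampling procedure (illustrative; the function and parameter names are assumptions):

import numpy as np

def sample_gmm(pis, mus, Sigmas, n_samples, seed=0):
    # pis: (K,) mixing coefficients; mus: (K, D) means; Sigmas: (K, D, D) covariances.
    rng = np.random.default_rng(seed)
    points, labels = [], []
    for _ in range(n_samples):
        # First pick one of the components with probability pi_k ...
        k = rng.choice(len(pis), p=pis)
        # ... then draw a sample from that component
        points.append(rng.multivariate_normal(mus[k], Sigmas[k]))
        labels.append(k)
    return np.array(points), np.array(labels)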

Synthetic Data Set

Synthetic Data Set Without Labels

Fitting the Gaussian Mixture We wish to invert this process: given the data set, find the corresponding parameters: mixing coefficients, means, covariances.

Fitting the Gaussian Mixture We wish to invert this process: given the data set, find the corresponding parameters: mixing coefficients, means, covariances. If we knew which component generated each data point, the maximum likelihood solution would involve fitting each component to the corresponding cluster. Problem: the data set is unlabelled. We shall refer to the labels as latent (= hidden) variables.

Synthetic Data Set Without Labels

Posterior Probabilities We can think of the mixing coefficients as prior probabilities for the components. For a given value of x we can evaluate the corresponding posterior probabilities, called responsibilities. These are given by Bayes' theorem:
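The Bayes'-theorem expression itself is not in the transcript; in the usual notation, the responsibility of component k for a point x is

\gamma_k(\mathbf{x}) \equiv p(k \mid \mathbf{x}) = \frac{\pi_k \, \mathcal{N}(\mathbf{x}\mid\boldsymbol{\mu}_k,\boldsymbol{\Sigma}_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(\mathbf{x}\mid\boldsymbol{\mu}_j,\boldsymbol{\Sigma}_j)}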

Posterior Probabilities (colour coded)

Today's Outline Bayesian Networks, Mixture Models, Expectation Maximization, Latent Dirichlet Allocation

EM in General Consider an arbitrary distribution q over the latent variables (p is the true distribution). The following decomposition always holds, where:
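The decomposition on the slide is missing from the transcript; the standard form (following Bishop) that the slide refers to is

\ln p(\mathbf{X}\mid\boldsymbol{\theta}) = \mathcal{L}(q,\boldsymbol{\theta}) + \mathrm{KL}(q \,\|\, p)

where

\mathcal{L}(q,\boldsymbol{\theta}) = \sum_{\mathbf{Z}} q(\mathbf{Z}) \ln \frac{p(\mathbf{X},\mathbf{Z}\mid\boldsymbol{\theta})}{q(\mathbf{Z})}, \qquad \mathrm{KL}(q\,\|\,p) = -\sum_{\mathbf{Z}} q(\mathbf{Z}) \ln \frac{p(\mathbf{Z}\mid\mathbf{X},\boldsymbol{\theta})}{q(\mathbf{Z})}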

Decomposition

Optimizing the Bound E-step: maximize the bound with respect to q; this is equivalent to minimizing the KL divergence and sets q equal to the posterior distribution over the latent variables. M-step: maximize the bound with respect to the parameters; this is equivalent to maximizing the expected complete-data log likelihood. Each EM cycle must increase the incomplete-data likelihood unless it is already at a (local) maximum.
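In equations (again following the standard treatment, since the slide formulas are not in the transcript):

\text{E-step:}\quad q(\mathbf{Z}) = p(\mathbf{Z}\mid\mathbf{X},\boldsymbol{\theta}^{\text{old}})

\text{M-step:}\quad \boldsymbol{\theta}^{\text{new}} = \arg\max_{\boldsymbol{\theta}} \sum_{\mathbf{Z}} p(\mathbf{Z}\mid\mathbf{X},\boldsymbol{\theta}^{\text{old}}) \ln p(\mathbf{X},\mathbf{Z}\mid\boldsymbol{\theta})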

E-step

M-step

Today's Outline Bayesian Networks, Mixture Models, Expectation Maximization, Latent Dirichlet Allocation [Slides are based on David Blei's ICML 2012 tutorial]

Generative model for a document in LDA

Generative model for a document in LDA
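A hedged sketch of LDA's generative process for a single document (the Dirichlet prior alpha, the per-topic word distributions, and the fixed document length follow the usual Blei et al. setup; the code itself is an illustration, not taken from the slides):

import numpy as np

def generate_document(alpha, topics, doc_length, seed=0):
    # alpha: (K,) Dirichlet prior over topic proportions.
    # topics: (K, V) per-topic word distributions (each row sums to 1).
    rng = np.random.default_rng(seed)
    theta = rng.dirichlet(alpha)  # draw topic proportions for this document
    words = []
    for _ in range(doc_length):
        z = rng.choice(len(alpha), p=theta)           # draw a topic assignment
        w = rng.choice(topics.shape[1], p=topics[z])  # draw a word from that topic
        words.append(w)
    return words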

Comparison of mixture and admixture models

Usage of LDA

EM for mixture models

EM for mixture models
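A compact NumPy/SciPy sketch of EM for a Gaussian mixture, putting together the E-step (responsibilities) and M-step (re-estimated mixing coefficients, means, and covariances) discussed above; the initialization, the fixed iteration count, and the small diagonal jitter are assumptions made for the illustration:

import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iters=50, seed=0):
    # Fit a K-component Gaussian mixture to an (N, D) data matrix X by EM.
    rng = np.random.default_rng(seed)
    N, D = X.shape
    pis = np.full(K, 1.0 / K)
    mus = X[rng.choice(N, size=K, replace=False)]
    Sigmas = np.array([np.cov(X, rowvar=False) + 1e-6 * np.eye(D) for _ in range(K)])
    for _ in range(n_iters):
        # E-step: responsibilities gamma[n, k] = p(component k | x_n)
        gamma = np.array([pis[k] * multivariate_normal.pdf(X, mus[k], Sigmas[k])
                          for k in range(K)]).T
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the responsibilities
        Nk = gamma.sum(axis=0)
        pis = Nk / N
        mus = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mus[k]
            Sigmas[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)
    return pis, mus, Sigmas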

What We Learned Today Bayesian Networks, Mixture Models, Expectation Maximization, Latent Dirichlet Allocation

Homework Reading: Murphy 11.1-11.2, 11.4.1-11.4.4, 27.1-27.3. More about EM: http://cs229.stanford.edu/notes/cs229-notes7b.pdf http://cs229.stanford.edu/notes/cs229-notes8.pdf More about LDA: http://menome.com/wp/wp-content/uploads/2014/12/Blei2011.pdf http://obphio.us/pdfs/lda_tutorial.pdf