CS 6140: Machine Learning Spring 2017 Instructor: Lu Wang College of Computer and Information Science Northeastern University Webpage: www.ccs.neu.edu/home/luwang Email: luwang@ccs.neu.edu
Logistics Assignment 3 is due on 3/30. 4/13: course project presentation. 4/20: final exam.
What we learned last time: sequential labeling models, Hidden Markov Models, Maximum-entropy Markov models, Conditional Random Fields
Sample Markov Model for POS [figure: state-transition diagram over the states start, Det, Noun, PropNoun, Verb, and stop, with transition probabilities labeling the arcs]
The Markov Assumption: the next state depends only on the current state, $P(s_t \mid s_1, \ldots, s_{t-1}) = P(s_t \mid s_{t-1})$ (first-order).
Hidden Markov Models (HMMs) [figure: observed words emitted by a hidden sequence of part-of-speech tags]
Formally: an HMM consists of a set of hidden states with transition probabilities $P(t_i \mid t_{i-1})$ and emission probabilities $P(w_i \mid t_i)$, so the joint probability of a tag and word sequence factorizes as $P(w_{1:T}, t_{1:T}) = \prod_i P(t_i \mid t_{i-1})\, P(w_i \mid t_i)$.
Viterbi Backtrace [figure: trellis over states $s_0, s_1, s_2, \ldots, s_N, s_F$ and time steps $t_1, \ldots, t_T$; backpointers from $s_F$ recover the most likely sequence: $s_0\, s_N\, s_1\, s_2\, s_2\, s_F$]
Log-Linear Models
Using Log-Linear Models
Conditional Random Fields (CRFs)
Today's Outline: Bayesian Networks, Mixture Models, Expectation Maximization, Latent Dirichlet Allocation [Some slides are borrowed from Christopher Bishop and David Sontag]
K-means Algorithm Goal: represent a data set in terms of K clusters, each of which is summarized by a prototype (mean). Initialize prototypes, then iterate between two phases: Step 1: assign each data point to the nearest prototype. Step 2: update prototypes to be the cluster means. Simplest version is based on Euclidean distance.
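A minimal NumPy sketch of this two-phase loop; the random-data-point initialization and the empty-cluster fallback are my own assumptions, not prescribed by the slide:

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Basic K-means with Euclidean distance. X is an (N, D) array."""
    rng = np.random.default_rng(seed)
    # Initialize prototypes by picking K distinct data points at random.
    mu = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(n_iters):
        # Step 1: assign each point to its nearest prototype.
        dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
        z = dists.argmin(axis=1)
        # Step 2: move each prototype to the mean of its assigned points
        # (keep the old prototype if a cluster ends up empty).
        mu = np.array([X[z == k].mean(axis=0) if np.any(z == k) else mu[k]
                       for k in range(K)])
    return mu, z
```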
[Figures illustrating successive K-means iterations, alternating assignment and mean-update steps; slides from Christopher M. Bishop, BCS Summer School, Exeter, 2003]
The Gaussian Distribution Multivariate Gaussian: $\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu) \right\}$, with mean $\mu$ and covariance $\Sigma$.
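As a quick sanity check, the density above can be evaluated directly and compared against scipy.stats.multivariate_normal; the mean, covariance, and query point below are made-up values:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])                    # made-up mean
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])   # made-up covariance
x = np.array([0.5, 0.5])

# Density evaluated directly from the formula above.
D = len(mu)
diff = x - mu
direct = np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / \
         np.sqrt((2 * np.pi) ** D * np.linalg.det(Sigma))
print(direct, multivariate_normal(mu, Sigma).pdf(x))  # the two agree
```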
Gaussian Mixtures Linear superposition of Gaussians: $p(x) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k)$. Normalization and positivity require $0 \le \pi_k \le 1$ and $\sum_{k=1}^{K} \pi_k = 1$. Can interpret the mixing coefficients $\pi_k$ as prior probabilities.
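A minimal sketch of evaluating such a superposition in Python; the mixing coefficients, means, and covariances below are made-up toy values:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Made-up 2-D mixture with K = 3 components.
pis = np.array([0.5, 0.3, 0.2])          # mixing coefficients: nonnegative, sum to 1
mus = [np.zeros(2), np.array([3.0, 0.0]), np.array([0.0, 3.0])]
Sigmas = [np.eye(2), 0.5 * np.eye(2), np.diag([2.0, 0.5])]

def mixture_pdf(x):
    # p(x) = sum_k pi_k * N(x | mu_k, Sigma_k)
    return sum(pi * multivariate_normal(mu, S).pdf(x)
               for pi, mu, S in zip(pis, mus, Sigmas))
```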
Example: Mixture of 3 Gaussians
Contours of Probability Distribution
Sampling from the Gaussian To generate a data point: first pick one of the components with probability $\pi_k$, then draw a sample from that component. Repeat these two steps for each new data point.
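A short NumPy sketch of this two-step (ancestral) sampling procedure, using made-up toy parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
pis = np.array([0.5, 0.3, 0.2])                       # made-up mixing coefficients
mus = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])  # made-up component means
Sigmas = np.array([np.eye(2), 0.5 * np.eye(2), np.diag([2.0, 0.5])])

# Step 1: pick a component k with probability pi_k (for 500 points at once).
ks = rng.choice(3, size=500, p=pis)
# Step 2: draw each sample from its chosen component.
X = np.array([rng.multivariate_normal(mus[k], Sigmas[k]) for k in ks])
```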
Synthetic Data Set
Synthetic Data Set Without Labels
Fitting the Gaussian Mixture
We wish to invert this process: given the data set, find the corresponding parameters: mixing coefficients, means, and covariances.
If we knew which component generated each data point, the maximum likelihood solution would involve fitting each component to the corresponding cluster.
Problem: the data set is unlabelled.
We shall refer to the labels as latent (= hidden) variables.
Synthetic Data Set Without Labels
Posterior Probabilities We can think of the mixing coefficients as prior probabilities for the components. For a given value of $x$ we can evaluate the corresponding posterior probabilities, called responsibilities. These are given by Bayes' theorem: $\gamma_k(x) \equiv p(k \mid x) = \frac{\pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j\, \mathcal{N}(x \mid \mu_j, \Sigma_j)}$
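A small sketch of computing these responsibilities for a whole data set at once; the parameter names are placeholders:

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(X, pis, mus, Sigmas):
    """gamma[n, k] = pi_k N(x_n | mu_k, Sigma_k) / sum_j pi_j N(x_n | mu_j, Sigma_j)."""
    # Numerator for every point n and component k, shape (N, K).
    weighted = np.column_stack([
        pi * multivariate_normal(mu, S).pdf(X)
        for pi, mu, S in zip(pis, mus, Sigmas)])
    # Normalize each row so responsibilities sum to 1 over components.
    return weighted / weighted.sum(axis=1, keepdims=True)
```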
Posterior Probabilities (colour coded)
Today's Outline: Bayesian Networks, Mixture Models, Expectation Maximization, Latent Dirichlet Allocation
[Figures illustrating EM for a Gaussian mixture: successive E and M steps refining the fit; slides from Christopher M. Bishop, BCS Summer School, Exeter, 2003]
EM in General
Consider an arbitrary distribution $q(Z)$ over the latent variables ($p$ denotes the true posterior $p(Z \mid X, \theta)$).
The following decomposition always holds: $\ln p(X \mid \theta) = \mathcal{L}(q, \theta) + \mathrm{KL}(q \,\|\, p)$, where
$\mathcal{L}(q, \theta) = \sum_Z q(Z) \ln \frac{p(X, Z \mid \theta)}{q(Z)}$ and $\mathrm{KL}(q \,\|\, p) = -\sum_Z q(Z) \ln \frac{p(Z \mid X, \theta)}{q(Z)}$.
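To see why the decomposition holds, one can take an expectation of $\ln p(X \mid \theta)$ under $q$ (which changes nothing, since it does not depend on $Z$) and split the logarithm; a standard derivation, shown here in LaTeX:

```latex
\ln p(X \mid \theta)
  = \sum_Z q(Z) \ln p(X \mid \theta)
  = \sum_Z q(Z) \ln \frac{p(X, Z \mid \theta)}{p(Z \mid X, \theta)}
  = \underbrace{\sum_Z q(Z) \ln \frac{p(X, Z \mid \theta)}{q(Z)}}_{\mathcal{L}(q,\,\theta)}
  + \underbrace{\sum_Z q(Z) \ln \frac{q(Z)}{p(Z \mid X, \theta)}}_{\mathrm{KL}(q \,\|\, p)\;\ge\;0}
```

Since the KL term is nonnegative, $\mathcal{L}(q, \theta)$ is a lower bound on $\ln p(X \mid \theta)$.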
Decomposition [figure: $\ln p(X \mid \theta)$ split into the bound $\mathcal{L}(q, \theta)$ and the gap $\mathrm{KL}(q \,\|\, p) \ge 0$]
Optimizing the Bound
E-step: maximize $\mathcal{L}(q, \theta)$ with respect to $q(Z)$; equivalent to minimizing the KL divergence; sets $q(Z)$ equal to the posterior distribution $p(Z \mid X, \theta^{\text{old}})$.
M-step: maximize the bound with respect to $\theta$; equivalent to maximizing the expected complete-data log likelihood.
Each EM cycle must increase the incomplete-data likelihood unless it is already at a (local) maximum.
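The E/M cycle above, specialized to a Gaussian mixture, can be sketched in a few lines of NumPy/SciPy. This is a minimal illustration rather than the course's exact code; the initialization scheme and the small diagonal jitter are my own assumptions for numerical stability:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iters=100, seed=0):
    """EM for a Gaussian mixture: alternate the E-step and M-step above."""
    N, D = X.shape
    rng = np.random.default_rng(seed)
    pis = np.full(K, 1.0 / K)                         # uniform mixing weights
    mus = X[rng.choice(N, size=K, replace=False)]     # random data points as means
    Sigmas = np.array([np.cov(X.T) + 1e-6 * np.eye(D) for _ in range(K)])
    for _ in range(n_iters):
        # E-step: responsibilities gamma[n, k] via Bayes' theorem.
        gamma = np.column_stack([
            pis[k] * multivariate_normal(mus[k], Sigmas[k]).pdf(X)
            for k in range(K)])
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the expected (soft) assignments.
        Nk = gamma.sum(axis=0)
        pis = Nk / N
        mus = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mus[k]
            Sigmas[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] \
                        + 1e-6 * np.eye(D)  # jitter keeps covariances invertible
    return pis, mus, Sigmas
```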
E-step [figure: setting $q(Z) = p(Z \mid X, \theta^{\text{old}})$ makes the KL term vanish, so the bound touches $\ln p(X \mid \theta^{\text{old}})$]
M-step [figure: maximizing $\mathcal{L}(q, \theta)$ over $\theta$ raises the bound, and with it the incomplete-data log likelihood]
Today's Outline: Bayesian Networks, Mixture Models, Expectation Maximization, Latent Dirichlet Allocation [Slides are based on David Blei's ICML 2012 tutorial]
Generative model for a document in LDA Each topic $\beta_k$ is a distribution over the vocabulary. For each document $d$: draw topic proportions $\theta_d \sim \mathrm{Dir}(\alpha)$; then for each word position $n$: draw a topic assignment $z_{d,n} \sim \mathrm{Mult}(\theta_d)$ and draw the word $w_{d,n} \sim \mathrm{Mult}(\beta_{z_{d,n}})$.
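A toy NumPy sketch of this generative process; the topic count, vocabulary size, document length, and Dirichlet parameters below are made-up values for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, doc_len = 3, 1000, 50      # made-up: topics, vocabulary size, words per doc
alpha = np.full(K, 0.1)          # Dirichlet prior over per-document topic proportions
beta = rng.dirichlet(np.full(V, 0.01), size=K)  # K topics: distributions over words

def generate_document():
    theta = rng.dirichlet(alpha)                 # theta_d ~ Dir(alpha)
    zs = rng.choice(K, size=doc_len, p=theta)    # z_{d,n} ~ Mult(theta_d)
    ws = [rng.choice(V, p=beta[z]) for z in zs]  # w_{d,n} ~ Mult(beta_{z_{d,n}})
    return ws, zs
```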
Comparison of mixture and admixture models: in a plain mixture model each document is generated by a single topic, whereas in an admixture model such as LDA each document mixes all topics in its own proportions.
Usage of LDA: e.g., discovering the themes that run through a corpus, organizing and browsing large document collections, and deriving low-dimensional document representations for downstream tasks.
EM for mixture models: the E-step computes the posterior over the latent component for each observation; the M-step re-estimates the mixture parameters from these soft assignments.
What We Learned Today: Bayesian Networks, Mixture Models, Expectation Maximization, Latent Dirichlet Allocation
Homework
Reading: Murphy 11.1-11.2, 11.4.1-11.4.4, 27.1-27.3
More about EM: http://cs229.stanford.edu/notes/cs229-notes7b.pdf and http://cs229.stanford.edu/notes/cs229-notes8.pdf
More about LDA: http://menome.com/wp/wp-content/uploads/2014/12/Blei2011.pdf and http://obphio.us/pdfs/lda_tutorial.pdf