Lecture 3: Machine learning, classification, and generative models


1 EE E6820: Speech & Audio Processing & Recognition
Lecture 3: Machine learning, classification, and generative models

1 Classification
2 Generative models
3 Gaussian models
4 Hidden Markov models

Michael Mandel <mim@ee.columbia.edu>
Columbia University Dept. of Electrical Engineering
dpwe/e6820
February 7, 2008

2 Outline

1 Classification
2 Generative models
3 Gaussian models
4 Hidden Markov models

3 Classification and generative models

[Figure: spectrogram (f/Hz vs. time/s) and an F1/F2 formant scatter plot with "ay" and "ao" tokens]

Classification
- discriminative models
- discrete, categorical random variable of interest
- fixed set of categories

Generative models
- descriptive models
- continuous or discrete random variable(s) of interest
- can estimate parameters
- Bayes rule makes them useful for classification

4 Building a classifier

Define classes/attributes
- could state explicit rules
- better to define through training examples

Define feature space

Define decision algorithm
- set parameters from examples

Measure performance
- calculate (weighted) error rate

[Figure: Pols vowel formants, "u" (x) vs. "o" (o), F2/Hz vs. F1/Hz]

5 Classification system parts

Sensor -> signal -> Pre-processing/segmentation -> segment -> Feature extraction -> feature vector -> Classification -> class -> Post-processing

Examples at each stage: STFT; locate vowels; formant extraction; context constraints, costs/risk

6 Feature extraction

Right features are critical
- waveform vs. formants vs. cepstra
- invariance under irrelevant modifications

Theoretically equivalent features may act very differently in a particular classifier
- representations make important aspects explicit
- remove irrelevant information

Feature design incorporates domain knowledge
- although more data -> less need for cleverness

Smaller feature space (fewer dimensions)
- simpler models (fewer parameters)
- less training data needed
- faster training

[inverting MFCCs]

7 Optimal classification

Minimize the probability of error with the Bayes optimal decision
$$ \hat{\theta} = \arg\max_{\theta_i} p(\theta_i \mid x) $$
$$ p(\text{error}) = \int p(\text{error} \mid x)\, p(x)\, dx = \sum_i \int_{\Lambda_i} \left(1 - p(\theta_i \mid x)\right) p(x)\, dx $$
where $\Lambda_i$ is the region of $x$ where $\theta_i$ is chosen
... but $p(\theta_i \mid x)$ is largest in that region, so $p(\text{error})$ is minimized

8 Sources of error

[Figure: overlapping scaled class-conditional densities $p(x \mid \omega_1)\Pr(\omega_1)$ and $p(x \mid \omega_2)\Pr(\omega_2)$ with decision regions $\hat{X}_{\omega_1}$, $\hat{X}_{\omega_2}$; $\Pr(\omega_2 \mid \hat{X}_{\omega_1}) = \Pr(\text{err} \mid \hat{X}_{\omega_1})$ and $\Pr(\omega_1 \mid \hat{X}_{\omega_2}) = \Pr(\text{err} \mid \hat{X}_{\omega_2})$]

Suboptimal threshold / regions (bias error)
- use a Bayes classifier

Incorrect distributions (model error)
- better distribution models / more training data

Misleading features ("Bayes error")
- irreducible for a given feature set, regardless of classification scheme

9 Two roads to classification

The optimal classifier is
$$ \hat{\theta} = \arg\max_{\theta_i} p(\theta_i \mid x) $$
... but we don't know $p(\theta_i \mid x)$

Can model the distribution directly
- e.g. nearest neighbor, SVM, AdaBoost, neural net
- maps from inputs $x$ to outputs $\theta_i$
- a discriminative model

Often easier to model the data likelihood $p(x \mid \theta_i)$
- use Bayes rule to convert to $p(\theta_i \mid x)$
- a generative (descriptive) model

10 Nearest neighbor classification

[Figure: Pols vowel formants, "u" (x), "o" (o), "a" (+), F2/Hz vs. F1/Hz, with a new observation x to be classified]

Find the closest match (nearest neighbor)
- naive implementation takes O(N) time for N training points
- as N grows, the error rate approaches twice the Bayes error rate
- with K summarized classes, takes O(K) time
- locality sensitive hashing gives approximate nearest neighbors in $O(dn^{1/c^2})$ time (Andoni and Indyk, 2006)
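
As a concrete illustration (not part of the original slides), here is a minimal 1-nearest-neighbor sketch in Python/numpy; the "formant" training points and labels are made up:

```python
import numpy as np

def nearest_neighbor_classify(train_X, train_y, x):
    """Label x with the class of its closest training point (Euclidean distance).

    Naive O(N) scan over the N training points, as on the slide.
    """
    dists = np.linalg.norm(train_X - x, axis=1)
    return train_y[np.argmin(dists)]

# Toy "formant" data: two vowel classes in (F1, F2) space (made-up numbers)
train_X = np.array([[350.0, 800.0], [380.0, 750.0],    # class "u"
                    [450.0, 950.0], [480.0, 1000.0]])  # class "o"
train_y = np.array(["u", "u", "o", "o"])

print(nearest_neighbor_classify(train_X, train_y, np.array([400.0, 780.0])))  # -> "u"
```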

11 Support vector machines

[Figure: linearly separable data with decision boundary $(w \cdot x) + b = 0$ and margins $(w \cdot x) + b = \pm 1$; points labeled $y_i = \pm 1$]

Large margin linear classifier for separable data
- regularization of the margin avoids over-fitting
- can be adapted to non-separable data (C parameter)
- made nonlinear using kernels $k(x_1, x_2) = \Phi(x_1) \cdot \Phi(x_2)$

Depends only on training points near the decision boundary, the support vectors

Unique, optimal solution for given $\Phi$ and C
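
If scikit-learn is available, the C parameter and kernel trick described above can be exercised with a short sketch like the following (an illustration, not the lecture's own code; the data are synthetic):

```python
import numpy as np
from sklearn.svm import SVC

# Toy two-class data (made up): two clusters in a 2-D feature space
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 1.0, size=(50, 2)),
               rng.normal([3, 3], 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

# C trades margin width against training errors; the RBF kernel plays the role of Phi
clf = SVC(kernel="rbf", C=1.0)
clf.fit(X, y)

print(clf.predict([[0.5, 0.5], [2.5, 3.0]]))   # expected: [0 1]
print(len(clf.support_))                        # number of support vectors
```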

12 Outline

1 Classification
2 Generative models
3 Gaussian models
4 Hidden Markov models

13 Generative models

Describe the data using structured probabilistic models
- observations are random variables whose distribution depends on model parameters

Source distributions $p(x \mid \theta_i)$
- reflect variability in features
- reflect noise in observation
- generally have to be estimated from data (rather than known in advance)

[Figure: class-conditional densities $p(x \mid \omega_i)$ for classes $\omega_1, \ldots, \omega_4$ along the x axis]

14 Generative models (2)

Three things to do with generative models:

Evaluate the probability of an observation, possibly under multiple parameter settings
$$ p(x),\; p(x \mid \theta_1),\; p(x \mid \theta_2),\; \ldots $$

Estimate model parameters from observed data
$$ \hat{\theta} = \arg\min_{\theta} C(\theta, \theta^* \mid x) $$
for some cost $C$ between the estimate and the true parameters $\theta^*$

Run the model forward to generate new data
$$ x \sim p(x \mid \hat{\theta}) $$

15 Random variables review

Random variables have joint distributions, $p(x, y)$

[Figure: joint density $p(x, y)$ with marginals $p(x)$, $p(y)$, means $\mu_x, \mu_y$, standard deviations $\sigma_x, \sigma_y$, and covariance $\sigma_{xy}$]

Marginal distribution of y
$$ p(y) = \int p(x, y)\, dx $$

Knowing one value in a joint distribution constrains the remainder

Conditional distribution of x given y
$$ p(x \mid y) \triangleq \frac{p(x, y)}{p(y)} = \frac{p(x, y)}{\int p(x, y)\, dx} $$

[Figure: slice of $p(x, y)$ at $y = Y$ giving $p(x \mid y = Y)$]

16 Bayes rule

$$ p(x \mid y)\, p(y) = p(x, y) = p(y \mid x)\, p(x) \quad\Rightarrow\quad p(y \mid x) = \frac{p(x \mid y)\, p(y)}{p(x)} $$

- can reverse conditioning given priors/marginals
- terms can be discrete or continuous
- generalizes to more variables:
$$ p(x, y, z) = p(x \mid y, z)\, p(y, z) = p(x \mid y, z)\, p(y \mid z)\, p(z) $$
- allows conversion between joint, marginals, and conditionals
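
A quick numeric check of Bayes rule, with made-up priors and likelihoods (not from the slides):

```python
# p(y|x) = p(x|y) p(y) / p(x), with p(x) = sum_y p(x|y) p(y)
p_y = {"speech": 0.3, "music": 0.7}              # priors (made up)
p_x_given_y = {"speech": 0.8, "music": 0.1}      # likelihood of some observation x

p_x = sum(p_x_given_y[y] * p_y[y] for y in p_y)  # evidence
p_y_given_x = {y: p_x_given_y[y] * p_y[y] / p_x for y in p_y}

print(p_y_given_x)  # posteriors sum to 1; here "speech" wins despite its smaller prior
```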

17 Bayes rule for generative models

Run generative models backwards to compare them:
$$ \underbrace{p(\theta \mid x)}_{\text{posterior}} = \frac{\overbrace{p(x \mid \theta)}^{\text{likelihood}}\; \overbrace{p(\theta)}^{\text{prior}}}{\underbrace{p(x)}_{\text{evidence}}} = \frac{p(x \mid \theta)\, p(\theta)}{\int p(x \mid \theta)\, p(\theta)\, d\theta} $$

The posterior is the classification we're looking for
- combination of the prior belief in each class with the likelihood under our model
- normalized by the evidence (so the posteriors sum to 1)

Objection: priors are often unknown
- ... but omitting them amounts to assuming they are all equal

18 Computing probabilities and estimating parameters

Want the probability of the observation under a model, $p(x)$, regardless of parameter settings

Full Bayesian integral
$$ p(x) = \int p(x \mid \theta)\, p(\theta)\, d\theta $$
- difficult to compute in general; approximate as $p(x \mid \hat{\theta})$

Maximum likelihood (ML)
$$ \hat{\theta} = \arg\max_{\theta} p(x \mid \theta) $$

Maximum a posteriori (MAP): ML + prior
$$ \hat{\theta} = \arg\max_{\theta} p(\theta \mid x) = \arg\max_{\theta} p(x \mid \theta)\, p(\theta) $$
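
To make ML vs. MAP concrete, here is a hedged sketch for the mean of a 1-D Gaussian with known variance and a Gaussian prior on the mean; the conjugate-prior formula used is standard, but it is not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(2.0, 1.0, size=20)      # observed data, true mean 2.0, known sigma = 1.0
sigma2 = 1.0                            # known observation variance
mu0, tau2 = 0.0, 0.5                    # Gaussian prior on the mean: N(mu0, tau2)

mu_ml = x.mean()                        # ML: argmax_theta p(x|theta)

# MAP for a Gaussian likelihood with a Gaussian prior (conjugate case):
n = len(x)
mu_map = (tau2 * x.sum() + sigma2 * mu0) / (n * tau2 + sigma2)

print(mu_ml, mu_map)                    # MAP is pulled toward the prior mean mu0
```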

19 Model checking

After estimating parameters, run the model forward

Check that
- the model is rich enough to capture variation in the data
- the parameters are estimated correctly
- there aren't any bugs in your code

Generate data from the model and compare it to the observations, $x \sim p(x \mid \theta)$
- are they similar under some statistics $T(x): \mathbb{R}^d \to \mathbb{R}$?
- can you find the real data set in a group of synthetic data sets?

Then go back and update your model accordingly

Gelman et al. (2003, ch. 6)

20 Outline

1 Classification
2 Generative models
3 Gaussian models
4 Hidden Markov models

21 Gaussian models

The easiest way to model distributions is via a parametric model
- assume a known form, estimate a few parameters

The Gaussian model is simple and useful. In 1D:
$$ p(x \mid \theta_i) = \frac{1}{\sigma_i \sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( \frac{x - \mu_i}{\sigma_i} \right)^2 \right] $$

Parameters: mean $\mu_i$ and variance $\sigma_i^2$

[Figure: 1-D Gaussian $p(x \mid \omega_i)$ with mean $\mu_i$ and width $\sigma_i$; the peak height $P_M$ falls to $P_M e^{-1/2}$ at $\mu_i \pm \sigma_i$]

22 Gaussians in d dimensions

$$ p(\mathbf{x} \mid \theta_i) = \frac{1}{(2\pi)^{d/2}\, |\Sigma_i|^{1/2}} \exp\left[ -\frac{1}{2} (\mathbf{x} - \mu_i)^T \Sigma_i^{-1} (\mathbf{x} - \mu_i) \right] $$

Described by a d-dimensional mean $\mu_i$ and a $d \times d$ covariance matrix $\Sigma_i$
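
A small numpy sketch (not from the slides) that evaluates the log of the density above and fits $\mu$ and $\Sigma$ by maximum likelihood from synthetic data:

```python
import numpy as np

def gaussian_logpdf(x, mu, Sigma):
    """log N(x; mu, Sigma) for a single d-dimensional point x."""
    d = len(mu)
    diff = x - mu
    sign, logdet = np.linalg.slogdet(Sigma)
    maha = diff @ np.linalg.solve(Sigma, diff)          # (x-mu)^T Sigma^-1 (x-mu)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

# ML fit from data: sample mean and (biased) sample covariance
X = np.random.default_rng(0).normal(size=(500, 2)) @ np.array([[1.0, 0.5], [0.0, 1.0]])
mu_hat = X.mean(axis=0)
Sigma_hat = (X - mu_hat).T @ (X - mu_hat) / len(X)

print(gaussian_logpdf(X[0], mu_hat, Sigma_hat))
```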

23 Gaussian mixture models

Single Gaussians cannot model
- distributions with multiple modes
- distributions with nonlinear correlations

What about a weighted sum?
$$ p(x) = \sum_k c_k\, p(x \mid \theta_k) $$
where $\{c_k\}$ is a set of weights and $\{p(x \mid \theta_k)\}$ is a set of Gaussian components
- can fit anything given enough components

Interpretation: each observation is generated by one of the Gaussians, chosen with probability $c_k = p(\theta_k)$ (see the sampling sketch below)
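
That sampling interpretation can be run forward directly; a minimal 1-D sketch with made-up mixture parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
c = np.array([0.5, 0.3, 0.2])              # mixture weights, sum to 1
mus = np.array([-2.0, 0.0, 3.0])           # component means (made up)
sigmas = np.array([0.5, 1.0, 0.7])         # component standard deviations

def sample_gmm(n):
    z = rng.choice(len(c), size=n, p=c)    # choose a component for each draw
    return rng.normal(mus[z], sigmas[z])   # then draw from that Gaussian

x = sample_gmm(1000)
print(x.mean(), x.std())
```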

24 Gaussian mixtures (2)

e.g. nonlinear correlation:

[Figure: original data with a nonlinear correlation, the individual Gaussian components, and the resulting mixture surface]

Problem: finding the $c_k$ and $\theta_k$ parameters
- easy if we knew which $\theta_k$ generated each $x$

25 Expectation-maximization (EM)

General procedure for estimating model parameters when some are unknown
- e.g. which GMM component generated a point

Iteratively update the model parameters $\theta$ to maximize Q, the expected log-probability of the observed data x and the hidden data z:
$$ Q(\theta, \theta^t) = \sum_z p(z \mid x, \theta^t) \log p(z, x \mid \theta) $$

- E step: calculate $p(z \mid x, \theta^t)$ using $\theta^t$
- M step: find the $\theta$ that maximizes Q using $p(z \mid x, \theta^t)$
- can prove $p(x \mid \theta)$ is non-decreasing, hence a maximum likelihood model
- local optimum: depends on initialization

26 Fitting GMMs with EM

Want to find
- the parameters of the Gaussians, $\theta_k = \{\mu_k, \Sigma_k\}$
- the weights/priors on the Gaussians, $c_k = p(\theta_k)$
... that maximize the likelihood of the training data x

If we could assign each x to a particular $\theta_k$, estimation would be direct

Hence treat the mixture indices, z, as hidden
- form $Q = E[\log p(x, z \mid \theta)]$
- differentiate with respect to the model parameters
- gives equations for $\mu_k$, $\Sigma_k$, $c_k$ to maximize Q

27 GMM EM update equations

The parameters that maximize Q, in terms of the fuzzy membership
$$ \nu_{nk} \triangleq p(z_k \mid x_n, \theta^t) $$
are
$$ \mu_k = \frac{\sum_n \nu_{nk}\, x_n}{\sum_n \nu_{nk}} \qquad
   \Sigma_k = \frac{\sum_n \nu_{nk}\, (x_n - \mu_k)(x_n - \mu_k)^T}{\sum_n \nu_{nk}} \qquad
   c_k = \frac{1}{N} \sum_n \nu_{nk} $$

Each involves $\nu_{nk}$, the fuzzy membership of $x_n$ in Gaussian k
Each updated parameter is just a sample average, weighted by the fuzzy membership
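
A compact numpy/scipy sketch of these updates (assuming scipy is available for the Gaussian density); initialization and convergence checking are kept deliberately minimal, and the data are synthetic:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    N, d = X.shape
    mus = X[rng.choice(N, K, replace=False)]          # init means at random data points
    Sigmas = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(K)])
    c = np.full(K, 1.0 / K)

    for _ in range(iters):
        # E step: nu[n, k] = p(z_k | x_n, theta_t)
        like = np.column_stack([c[k] * multivariate_normal.pdf(X, mus[k], Sigmas[k])
                                for k in range(K)])
        nu = like / like.sum(axis=1, keepdims=True)

        # M step: weighted sample averages
        Nk = nu.sum(axis=0)
        mus = (nu.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mus[k]
            Sigmas[k] = (nu[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)
        c = Nk / N
    return c, mus, Sigmas

# Toy data: two well-separated 2-D clusters
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0, 0], 0.5, (200, 2)), rng.normal([4, 4], 0.5, (200, 2))])
print(em_gmm(X, K=2)[1])   # estimated means should land near (0, 0) and (4, 4)
```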

28 GMM examples

Vowel data fit with different mixture counts

[Figure: vowel data fit with increasing numbers of Gaussian components (1 Gauss, ...), each panel labeled with its training log-likelihood log p(x)]

[Example...]

29 Outline

1 Classification
2 Generative models
3 Gaussian models
4 Hidden Markov models

30 Markov models

A (first-order) Markov model is a finite-state system whose behavior depends only on the current state
- "The future is independent of the past, conditioned on the present"

e.g. a generative Markov model:

[Figure: state diagram over states S, A, B, C, E with transition probabilities, the transition matrix $p(q_{n+1} \mid q_n)$, and an example generated state sequence S A A A A A A A A B B B B B B B B B C C C C B B B B B B C E]
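
Running such a generative Markov model forward is just repeated sampling from the transition matrix; a minimal sketch with a made-up transition matrix over states S, A, B, C, E:

```python
import numpy as np

states = ["S", "A", "B", "C", "E"]
# Made-up transition matrix p(q_{n+1} | q_n); rows sum to 1, E is absorbing
P = np.array([[0.0, 0.5, 0.4, 0.1, 0.0],   # from S
              [0.0, 0.7, 0.2, 0.0, 0.1],   # from A
              [0.0, 0.0, 0.8, 0.1, 0.1],   # from B
              [0.0, 0.0, 0.3, 0.6, 0.1],   # from C
              [0.0, 0.0, 0.0, 0.0, 1.0]])  # from E

def generate(rng):
    q = 0                                   # start in S
    seq = []
    while states[q] != "E":
        q = rng.choice(len(states), p=P[q])
        seq.append(states[q])
    return "".join(seq)

print(generate(np.random.default_rng(0)))   # e.g. a run like "AABBBB...E"
```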

31 Hidden Markov models

A Markov model where the state sequence $Q = \{q_n\}$ is not directly observable ("hidden")

But the observations X do depend on Q
- $x_n$ is a random variable that depends only on the current state: $p(x_n \mid q_n)$

[Figure: emission distributions $p(x \mid q)$ for states q = A, B, C; state sequence AAAAAAAABBBBBBBBBBBCCCCBBBBBBBC; the corresponding observation sequence $x_n$ over time steps n]

... can still tell something about the state sequence

32 (Generative) Markov models

An HMM is specified by the parameters $\Theta$:
- states $q^i$
- transition probabilities $a_{ij} \triangleq p(q^j_n \mid q^i_{n-1})$
- emission distributions $b_i(x) \triangleq p(x \mid q^i)$
- (+ initial state probabilities $\pi_i \triangleq p(q^i_1)$)

[Figure: HMM state diagram with transition arcs and example emission distributions $b_i(x)$]

33 Markov models for sequence recognition

Independence of observations
- observation $x_n$ depends only on the current state $q_n$
$$ p(X \mid Q) = p(x_1, x_2, \ldots, x_N \mid q_1, q_2, \ldots, q_N) = p(x_1 \mid q_1)\, p(x_2 \mid q_2) \cdots p(x_N \mid q_N) = \prod_{n=1}^{N} p(x_n \mid q_n) = \prod_{n=1}^{N} b_{q_n}(x_n) $$

Markov transitions
- the transition to the next state $q_{n+1}$ depends only on $q_n$
$$ p(Q \mid M) = p(q_1, q_2, \ldots \mid M) = p(q_N \mid q_{N-1} \cdots q_1)\, p(q_{N-1} \mid q_{N-2} \cdots q_1) \cdots p(q_2 \mid q_1)\, p(q_1) $$
$$ = p(q_N \mid q_{N-1})\, p(q_{N-1} \mid q_{N-2}) \cdots p(q_2 \mid q_1)\, p(q_1) = p(q_1) \prod_{n=2}^{N} p(q_n \mid q_{n-1}) = \pi_{q_1} \prod_{n=2}^{N} a_{q_{n-1} q_n} $$

34 Model-fit calculations

From state-based modeling:
$$ p(X \mid \Theta_j) = \sum_{\text{all } Q} p(X \mid Q, \Theta_j)\, p(Q \mid \Theta_j) $$

For HMMs
$$ p(X \mid Q) = \prod_{n=1}^{N} b_{q_n}(x_n) \qquad p(Q \mid M) = \pi_{q_1} \prod_{n=2}^{N} a_{q_{n-1} q_n} $$

Hence, solve for
$$ \hat{\Theta} = \arg\max_{\Theta_j} p(\Theta_j \mid X) $$
using Bayes rule to convert from $p(X \mid \Theta_j)$

Sum over all Q???

35 Summing over all paths

[Figure: a small model M with states S, A, B, E and its transition probabilities; observation likelihoods $p(x \mid A)$ and $p(x \mid B)$ for the three observations $x_1, x_2, x_3$; all possible 3-emission paths $Q_k$ from S to E (S A A A E, S A A B E, S A B B E, S B B B E); for each path, $p(Q \mid M) = \prod_n p(q_n \mid q_{n-1})$ and $p(X \mid Q, M) = \prod_n p(x_n \mid q_n)$ are multiplied to give $p(X, Q \mid M)$, and these are summed over paths to give $p(X \mid M)$]

36 The forward recursion

A dynamic-programming-like technique to sum over all Q

Define $\alpha_n(i)$ as the probability of getting to state $q^i$ at time step n (by any path):
$$ \alpha_n(i) = p(x_1, x_2, \ldots, x_n, q_n = q^i) = p(x_1^n, q_n^i) $$

$\alpha_{n+1}(j)$ can be calculated recursively:
$$ \alpha_{n+1}(j) = \left[ \sum_{i=1}^{S} \alpha_n(i)\, a_{ij} \right] b_j(x_{n+1}) $$

[Figure: trellis of model states $q^i$ against time steps n, showing the $\alpha_n(i)$ combined through $a_{ij}$ and $b_j(x_{n+1})$ to form $\alpha_{n+1}(j)$]

37 Forward recursion (2)

Initialize
$$ \alpha_1(i) = \pi_i\, b_i(x_1) $$

Then the total probability is
$$ p(x_1^N \mid \Theta) = \sum_{i=1}^{S} \alpha_N(i) $$

A practical way to solve for $p(X \mid \Theta_j)$ and hence select the most probable model (recognition)

[Figure: observations X evaluated under each model as $p(X \mid M_1)\,p(M_1)$, $p(X \mid M_2)\,p(M_2)$, ..., then choose the best]
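
A direct numpy transcription of the recursion (not from the slides); pi, A, and B are assumed to be given as arrays, with B[i, n] = b_i(x_n). In practice the alphas are scaled or kept in the log domain to avoid underflow:

```python
import numpy as np

def forward(pi, A, B):
    """Total observation probability p(x_1..x_N | Theta) by the forward recursion.

    pi: (S,) initial state probabilities
    A:  (S, S) transition matrix, A[i, j] = a_ij
    B:  (S, N) emission likelihoods, B[i, n] = b_i(x_n)
    """
    S, N = B.shape
    alpha = np.zeros((N, S))
    alpha[0] = pi * B[:, 0]                      # alpha_1(i) = pi_i b_i(x_1)
    for n in range(1, N):
        alpha[n] = (alpha[n - 1] @ A) * B[:, n]  # [sum_i alpha_n(i) a_ij] b_j(x_{n+1})
    return alpha[-1].sum()                       # p(X | Theta) = sum_i alpha_N(i)
```

Calling forward() once per candidate model and weighting by the model priors gives the "choose best" recognition scheme sketched above.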

38 Optimal path

May be interested in the actual $q_n$ assignments
- which state was active at each time frame
- e.g. phone labeling (for training?)

The total probability is over all paths
- ... but can also solve for the single best path, the Viterbi state sequence

Probability along the best path to state $q^j$ at step n+1:
$$ \hat{\alpha}_{n+1}(j) = \left[ \max_i \{ \hat{\alpha}_n(i)\, a_{ij} \} \right] b_j(x_{n+1}) $$
- backtrack from the final state to get the best path
- the final probability is a product only (no sum)
- log-domain calculation is just summation

The best path often dominates the total probability:
$$ p(X \mid \Theta) \approx p(X, \hat{Q} \mid \Theta) $$
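
A matching Viterbi sketch under the same (pi, A, B) conventions as forward(), working in the log domain as the slide suggests (an illustration, not the lecture's code):

```python
import numpy as np

def viterbi(pi, A, B):
    """Most likely state sequence and its log-probability."""
    S, N = B.shape
    logp = np.log(pi) + np.log(B[:, 0])      # best log-prob of ending in each state
    back = np.zeros((N, S), dtype=int)       # backpointers
    for n in range(1, N):
        scores = logp[:, None] + np.log(A)   # scores[i, j]: come from i, go to j
        back[n] = scores.argmax(axis=0)
        logp = scores.max(axis=0) + np.log(B[:, n])
    # Backtrack from the best final state
    path = [int(logp.argmax())]
    for n in range(N - 1, 0, -1):
        path.append(int(back[n, path[-1]]))
    return path[::-1], logp.max()
```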

39 Interpreting the Viterbi path

The Viterbi path assigns each $x_n$ to a state $q^i$
- performing classification based on $b_i(x)$
- ... at the same time applying transition constraints $a_{ij}$

[Figure: observations $x_n$ with the inferred classification; Viterbi labels: AAAAAAAABBBBBBBBBBBCCCCBBBBBBBC]

Can be used for segmentation
- train an HMM with "garbage" and "target" states
- decode on new data to find targets, boundaries

Can be used for (heuristic) training
- e.g. forced alignment to bootstrap a speech recognizer
- e.g. train classifiers based on labels...

40 Aside: Training and test data

A rich model can learn every training example (overtraining)

But the goal is to classify new, unseen data
- sometimes use a cross-validation set to decide when to stop training

For evaluation results to be meaningful:
- don't test with training data!
- don't train on test data (even indirectly...)

41 Aside (2): Model complexity

More training data allows the use of larger models

More model parameters create a better fit to the training data
- more Gaussian mixture components
- more HMM states

For a fixed training set size, there will be some optimal model size that avoids overtraining

42 Summary

Classification is making discrete (hard) decisions

The basis is comparison with known examples
- explicitly or via a model

Classification models
- discriminative models, like SVMs, neural nets, and boosters, directly learn the posteriors $p(\theta_i \mid x)$
- generative models, like Gaussians, GMMs, and HMMs, model the likelihoods $p(x \mid \theta)$
- Bayes rule lets us use generative models for classification

EM allows parameter estimation even with some data missing

Parting thought: Is it wise to use generative models for discrimination, or vice versa?

43 References

Alexandr Andoni and Piotr Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In Proc. IEEE Symposium on Foundations of Computer Science, Washington, DC, USA, 2006. IEEE Computer Society.

Andrew Gelman, John B. Carlin, Hal S. Stern, and Donald B. Rubin. Bayesian Data Analysis. Chapman & Hall/CRC, second edition, 2003.

Lawrence R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257-286, 1989.

Jeff A. Bilmes. A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Technical report, International Computer Science Institute, 1998.

Christopher J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121-167, 1998.
