Lecture 3: Pattern Classification
EE E6820: Speech & Audio Processing & Recognition
Lecture 3: Pattern Classification
- The problem of classification
- Linear and nonlinear classifiers
- Probabilistic classification
- Gaussians, mixtures and EM
- Methodological remarks
Dan Ellis <dpwe@ee.columbia.edu>
Columbia University Dept. of Electrical Engineering, Spring 2003
1. Pattern classification
Classification means finding categorical (discrete) labels for real-world (continuous) observations.
(Figure: spectrogram, freq/Hz vs. time/s, and the F1/F2 formant plane with vowel classes "ay" and "ao")
Why?
- information extraction
- conditional processing
- detect exceptions
Building a classifier
Define classes/attributes
- could state explicit rules
- better to define through training examples
Define feature space
Define decision algorithm
- set parameters from examples
Measure performance
- calculate (weighted) error rate
(Figure: Pols vowel formants, "u" (x) vs. "o" (o), F2/Hz vs. F1/Hz)
Classification system parts
Sensor → signal → Pre-processing/segmentation → segment → Feature extraction → feature vector → Classification → class → Post-processing
- Pre-processing/segmentation: e.g. STFT, locate vowels
- Feature extraction: e.g. formant extraction
- Post-processing: context constraints, costs/risk
Feature extraction
Right features are critical
- waveform vs. formants vs. cepstra
- invariance under irrelevant modifications
Theoretically equivalent features may act very differently in practice
- representations make important aspects explicit
- remove irrelevant information
Feature design incorporates domain knowledge
- more data → less need for cleverness?
Smaller feature space (fewer dimensions) → simpler models (fewer parameters) → less training data needed → faster training
Minimum distance classification
(Figure: Pols vowel formants "u" (x), "o" (o), "a" (*) in the F1/F2 plane, with a new observation x)
Find closest match (nearest neighbor):
  min over i, ω of D(x, y_ω,i)
- choice of distance metric D(x, y) is important
Find representative match (class prototypes):
  min over ω of D(x, z_ω)
- class data {y_ω,i} → class model z_ω
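To make the two rules concrete, here is a minimal numpy sketch; the 2-D "formant" values and variable names are made up for illustration, not taken from the Pols data:

```python
import numpy as np

# Toy labeled training data in a 2-D feature space (e.g. F1/F2); values are made up.
train_x = np.array([[300., 800.], [320., 850.], [500., 1000.], [520., 1050.]])
train_y = np.array(["u", "u", "o", "o"])

def nearest_neighbor(x, train_x, train_y):
    """Closest match: label x with the class of the nearest training example."""
    dists = np.linalg.norm(train_x - x, axis=1)
    return train_y[np.argmin(dists)]

def nearest_prototype(x, train_x, train_y):
    """Representative match: label x with the class whose mean (prototype z_w) is closest."""
    classes = np.unique(train_y)
    protos = np.array([train_x[train_y == c].mean(axis=0) for c in classes])
    return classes[np.argmin(np.linalg.norm(protos - x, axis=1))]

x_new = np.array([480., 980.])
print(nearest_neighbor(x_new, train_x, train_y))   # 'o'
print(nearest_prototype(x_new, train_x, train_y))  # 'o'
```

Both use the plain Euclidean metric; swapping in a different D(x, y) changes the behaviour of either rule.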
2. Linear and nonlinear classifiers
Minimum Euclidean distance is equivalent to a linear discriminant function:
- examples {y_ω,i} → template z_ω = mean{y_ω,i}
- observed feature vector x = [x_1 x_2 ...]^T
- Euclidean distance metric: D²[x, y] = (x − y)^T (x − y)
- minimum distance rule: class i = argmin_i D[x, z_i] = argmin_i D²[x, z_i]
- i.e. choose class O over U if D²[x, z_O] < D²[x, z_U] ...
Linear classifier (cont'd)
Decision rule: choose class O over U if
  D²[x, z_O] < D²[x, z_U]
  (x − z_O)^T (x − z_O) < (x − z_U)^T (x − z_U)
  x^T x + z_O^T z_O − 2 x^T z_O < x^T x + z_U^T z_U − 2 x^T z_U
  2 x^T (z_O − z_U) > z_O^T z_O − z_U^T z_U                    (projection)
  (x − z_U)^T (z_O − z_U) > ½ (z_O − z_U)^T (z_O − z_U)        (threshold)
i.e. the component of x − z_U along z_O − z_U is more than half the distance from z_U to z_O ...
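As a quick numerical check of this algebra (a sketch with arbitrary made-up vectors), the squared-distance rule and the projection/threshold form give the same decision:

```python
import numpy as np

rng = np.random.default_rng(0)
z_o, z_u = rng.normal(size=2), rng.normal(size=2)   # arbitrary class templates
x = rng.normal(size=2)                              # arbitrary observation

# Rule 1: choose O if the squared distance to z_O is smaller.
rule1 = (x - z_o) @ (x - z_o) < (x - z_u) @ (x - z_u)

# Rule 2: choose O if the projection of (x - z_U) onto (z_O - z_U)
# exceeds half the squared length of (z_O - z_U).
d = z_o - z_u
rule2 = (x - z_u) @ d > 0.5 * (d @ d)

assert rule1 == rule2
print("decisions agree:", bool(rule1))
```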
Linear classification boundaries
Min-distance divides the space normal to the line between class centers:
(Figure: decision boundary perpendicular to z_O − z_U, crossing where (x − z_U)^T(z_O − z_U) = ½ (z_O − z_U)^T(z_O − z_U))
Scaling the axes changes the boundary:
(Figure: the same data squeezed horizontally or vertically gives different decision boundaries)
Decision boundaries
Linear functions give linear boundaries
- planes and hyperplanes as dimensions increase
Linear boundaries can become curves under feature transformations
- e.g. a linear discriminant in transformed features such as x' = [x_1² x_2]^T
What about more complex boundaries?
- if a linear discriminant is Σ_i w_i x_i, how about a nonlinear function F[Σ_i w_i x_i]?
  This changes only the effective threshold, but a sum of nonlinearities Σ_j F[Σ_i w_ij x_i] offers more flexibility, and F[Σ_j w_jk F[Σ_i w_ij x_i]] has even more ...
Neural networks
Sums over nonlinear functions of sums → a large range of decision surfaces
e.g. multi-layer perceptron (MLP) with 1 hidden layer:
  y_k = F[ Σ_j w_jk F[ Σ_i w_ij x_i ] ]
(Figure: network with input layer x_1, x_2, weights w_ij, hidden units h_j, weights w_jk, output units o_1, o_2, nonlinearity F[·])
Problem is finding the weights w_ij ... (training)
Back-propagation training
Find the weights to minimize, e.g.,
  E = Σ_n |o(x_n) − t_n|²
for a training set of patterns {x_n} and desired (target) outputs {t_n}.
Differentiate the error with respect to the weights; for o_k = F[Σ_j w_jk h_j]:
  ∂E/∂w_jk = (∂E/∂o_k)(∂o_k/∂w_jk) = 2(o_k − t_k) · F'[Σ_j w_jk h_j] · h_j
  w_jk ← w_jk − α ∂E/∂w_jk
i.e. gradient descent with learning rate α.
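A minimal numpy sketch of this procedure for a one-hidden-layer sigmoid MLP, trained by plain gradient descent on a made-up 2-D problem (layer sizes, data and learning rate are illustrative assumptions, not the course's code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                       # made-up 2-D inputs
T = np.stack([X[:, 0] + X[:, 1] > 0,
              X[:, 0] + X[:, 1] <= 0], axis=1).astype(float)  # 1-of-2 targets

alpha = 0.5                                         # learning rate
W1 = rng.normal(scale=0.5, size=(2, 5))             # input -> hidden weights w_ij
W2 = rng.normal(scale=0.5, size=(5, 2))             # hidden -> output weights w_jk

for epoch in range(500):
    # Forward pass: h_j = F[sum_i w_ij x_i], o_k = F[sum_j w_jk h_j]
    H = sigmoid(X @ W1)
    O = sigmoid(H @ W2)
    E = np.sum((O - T) ** 2)

    # Backward pass: chain rule with F' = F(1 - F), then gradient descent
    dO = 2 * (O - T) * O * (1 - O)
    dH = (dO @ W2.T) * H * (1 - H)
    W2 -= alpha * (H.T @ dO) / len(X)
    W1 -= alpha * (X.T @ dH) / len(X)

print("final squared error:", E)
```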
Neural net example
- 2 input units (normalized F1, F2)
- 5 hidden units, 3 output units ("U", "O", "A")
(Figure: 2:5:3 network mapping F1, F2 to outputs A, O, U)
Sigmoid nonlinearity:
  F[x] = 1 / (1 + e^{−x}),   dF/dx = F(1 − F)
(Figure: plots of sigm(x) and d sigm(x)/dx)
Neural net training
(Figure: 2:5:3 net, mean squared error by training epoch, with the resulting decision regions after increasing numbers of iterations)
example ...
3. Statistical interpretation
Observations are random variables whose distribution depends on the class:
  Class ω_i (hidden, discrete) → Observation x (continuous)
Source distributions p(x|ω_i):
- reflect variability in the feature
- reflect noise in the observation
- generally have to be estimated from data (rather than known in advance)
(Figure: class-conditional densities p(x|ω_1) ... p(x|ω_4) and posteriors Pr(ω_i|x) as functions of x)
Random variables review
Random variables have joint distributions (pdfs) p(x, y):
(Figure: joint density p(x, y) with marginals p(x), p(y), means μ_x, μ_y, spreads σ_x, σ_y and covariance σ_xy)
- marginal: p(x) = ∫ p(x, y) dy
- covariance: Σ = E[(o − μ)(o − μ)^T] = [ σ_x²  σ_xy ; σ_xy  σ_y² ]
Conditional probability
Knowing one value in a joint distribution constrains the remainder:
(Figure: slicing the joint p(x, y) at y = Y gives the conditional p(x | y = Y))
Bayes' rule:
  p(x, y) = p(x|y) p(y)  ⇒  p(y|x) = p(x|y) p(y) / p(x)
- can reverse the conditioning given the prior/marginal
- either term can be discrete or continuous
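A small worked example of reversing the conditioning (the joint probability table is made up):

```python
import numpy as np

# Made-up joint distribution p(x, y): rows are 2 values of y, columns 3 values of x.
p_xy = np.array([[0.10, 0.20, 0.10],
                 [0.05, 0.15, 0.40]])

p_y = p_xy.sum(axis=1)              # marginal p(y)
p_x = p_xy.sum(axis=0)              # marginal p(x) = "evidence"
p_x_given_y = p_xy / p_y[:, None]   # conditional p(x|y)

# Bayes' rule: p(y|x) = p(x|y) p(y) / p(x)
p_y_given_x = p_x_given_y * p_y[:, None] / p_x[None, :]
print(p_y_given_x)
print(p_y_given_x.sum(axis=0))      # each column of posteriors sums to 1
```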
Priors and posteriors
Bayesian inference can be interpreted as updating prior beliefs with new information, x:
  Pr(ω_i|x) = Pr(ω_i) · p(x|ω_i) / Σ_j p(x|ω_j) Pr(ω_j)
  posterior = prior · likelihood / evidence, where the evidence = p(x)
The posterior is the prior scaled by the likelihood and normalized by the evidence (so Σ posteriors = 1).
Objection: priors are often unknown
- but omitting them amounts to assuming they are all equal
Bayesian (MAP) classifier
Minimize the probability of error by choosing the maximum a posteriori (MAP) class:
  ω̂ = argmax_ω_i Pr(ω_i|x)
Intuitively right: choose the most probable class in light of the observation.
Formally, the expected probability of error is
  E[Pr(err)] = ∫_X p(x) Pr(err|x) dx = Σ_i ∫_{X_ω̂i} p(x) (1 − Pr(ω_i|x)) dx
where X_ω̂i is the region of X where ω_i is chosen ... but Pr(ω_i|x) is largest in that region, so Pr(err) is minimized.
Practical implementation
The optimal classifier is ω̂ = argmax_ω_i Pr(ω_i|x), but we don't know Pr(ω_i|x).
Can model it directly, e.g. train a neural net to map from inputs x to a set of outputs Pr(ω_i)
- a discriminative model
Often easier to model the conditional distributions p(x|ω_i), then use Bayes' rule to find the MAP class:
(Figure: labeled training examples {x_n, ω_xn} → sort according to class → estimate a conditional pdf for each class, e.g. p(x|ω_1))
Likelihood models
Given models p(x|ω_i), the search for ω̂ = argmax_ω_i Pr(ω_i|x) becomes
  argmax_ω_i p(x|ω_i) Pr(ω_i) / Σ_j p(x|ω_j) Pr(ω_j)
but the denominator = p(x) is the same over all ω_i, hence
  ω̂ = argmax_ω_i p(x|ω_i) Pr(ω_i)
or even
  ω̂ = argmax_ω_i [ log p(x|ω_i) + log Pr(ω_i) ]
Choose the model parameters to maximize the likelihood of the training data, p(x_train|ω_i).
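A quick sketch (made-up 1-D Gaussian likelihood models and priors) confirming that the posterior, the likelihood-times-prior, and the log-domain form all pick the same class:

```python
import numpy as np

x = 1.2                                         # a single observation (made up)
means = np.array([0.0, 2.0, 4.0])               # per-class Gaussian models
sigmas = np.array([1.0, 0.5, 1.5])
priors = np.array([0.5, 0.3, 0.2])              # Pr(omega_i)

# Class-conditional likelihoods p(x|omega_i)
lik = np.exp(-0.5 * ((x - means) / sigmas) ** 2) / (np.sqrt(2 * np.pi) * sigmas)

post = lik * priors / np.sum(lik * priors)      # Pr(omega_i|x)
print(np.argmax(post),                          # same winner each way:
      np.argmax(lik * priors),
      np.argmax(np.log(lik) + np.log(priors)))
```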
Sources of error
(Figure: scaled likelihoods p(x|ω_1)Pr(ω_1) and p(x|ω_2)Pr(ω_2) with decision regions X̂_ω1 and X̂_ω2; the mass of ω_2 inside X̂_ω1 and of ω_1 inside X̂_ω2 is the error contributed by each region)
Suboptimal threshold/regions (bias error)
- use a Bayes classifier!
Incorrect distributions (model error)
- better distribution models / more training data
Misleading features ("Bayes error")
- irreducible for a given feature set, regardless of classification scheme
4. Gaussian models
Easiest way to model distributions is via a parametric model
- assume a known form, estimate a few parameters
The Gaussian model is simple and useful. In 1 dimension:
  p(x|ω_i) = 1/(√(2π) σ_i) · exp( −(x − μ_i)² / (2σ_i²) )
where 1/(√(2π) σ_i) is the normalization that makes the density integrate to 1.
Parameters: the mean μ_i and the variance σ_i².
(Figure: Gaussian p(x|ω_i) centered at μ_i; the curve falls to e^{−1/2} of its peak at μ_i ± σ_i)
Gaussians in d dimensions:
  p(x|ω_i) = 1/((2π)^{d/2} |Σ_i|^{1/2}) · exp( −½ (x − μ_i)^T Σ_i^{−1} (x − μ_i) )
Described by the d-dimensional mean vector μ_i and the d × d covariance matrix Σ_i.
Classify by maximizing the log likelihood, i.e.
  argmax_ω_i  −½ (x − μ_i)^T Σ_i^{−1} (x − μ_i) − ½ log|Σ_i| + log Pr(ω_i)
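A minimal sketch (toy, randomly generated 2-D data) that fits a full-covariance Gaussian per class and applies this decision rule:

```python
import numpy as np

rng = np.random.default_rng(1)
# Made-up 2-D training data for two classes
X1 = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.0]], size=200)
X2 = rng.multivariate_normal([3, 1], [[1.0, -0.3], [-0.3, 0.5]], size=100)

def fit_gaussian(X):
    return X.mean(axis=0), np.cov(X, rowvar=False)

def log_score(x, mu, cov, prior):
    """-1/2 (x-mu)^T Sigma^-1 (x-mu) - 1/2 log|Sigma| + log Pr(omega)"""
    d = x - mu
    return (-0.5 * d @ np.linalg.solve(cov, d)
            - 0.5 * np.log(np.linalg.det(cov))
            + np.log(prior))

params = [fit_gaussian(X1), fit_gaussian(X2)]
counts = np.array([len(X1), len(X2)])
priors = counts / counts.sum()                  # class priors from counts

x_new = np.array([2.0, 0.5])
scores = [log_score(x_new, mu, cov, p) for (mu, cov), p in zip(params, priors)]
print("MAP class:", int(np.argmax(scores)))
```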
Multivariate Gaussian covariance
Eigen analysis of the inverse covariance gives
  Σ_i^{−1} = (SV)^T (SV)   with diagonal S = Λ^{−1/2}
so the exponent is
  −½ (x − μ_i)^T (SV)^T (SV) (x − μ_i)
i.e. SV transforms x into an uncorrelated, normalized-variance space.
(Figure: a vector v in x space maps to v' = SVv in SVx space)
(x − μ_i)^T Σ_i^{−1} (x − μ_i) is the (squared) Mahalanobis distance
- the Euclidean distance in the normalized space
example ...
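A short check (made-up covariance) that the eigendecomposition-based whitening transform turns the Mahalanobis distance into a plain Euclidean one; here the eigenvectors come back as columns of V, so the transform is written S V^T:

```python
import numpy as np

cov = np.array([[2.0, 0.8],
                [0.8, 1.0]])                 # made-up covariance Sigma
mu = np.array([1.0, -1.0])
x = np.array([2.5, 0.5])

# Sigma = V diag(lam) V^T  =>  Sigma^-1 = (S V^T)^T (S V^T), with S = diag(lam^-1/2)
lam, V = np.linalg.eigh(cov)
SV = np.diag(lam ** -0.5) @ V.T              # the whitening transform

d = x - mu
maha_sq = d @ np.linalg.inv(cov) @ d         # (x-mu)^T Sigma^-1 (x-mu)
eucl_sq = np.sum((SV @ d) ** 2)              # squared Euclidean distance after whitening
print(maha_sq, eucl_sq)                      # essentially equal
```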
Gaussian mixture models (GMMs)
Single Gaussians cannot model
- distributions with multiple modes
- distributions with nonlinear correlation
What about a weighted sum? i.e.
  p(x) ≈ Σ_k c_k p(x|m_k)
where {c_k} is a set of weights and {p(x|m_k)} is a set of Gaussian components
- can fit anything given enough components
Interpretation: each observation is generated by one of the Gaussians, chosen at random with priors c_k = Pr(m_k).
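A small sketch of the generative view (weights and component parameters are made up): evaluate the mixture density as a weighted sum, and sample by first choosing a component with probability c_k, then drawing from that Gaussian:

```python
import numpy as np

c = np.array([0.3, 0.5, 0.2])               # mixture weights c_k = Pr(m_k), made up
means = np.array([-2.0, 0.0, 3.0])
sigmas = np.array([0.5, 1.0, 0.8])

def gmm_pdf(x):
    """p(x) = sum_k c_k N(x; mu_k, sigma_k^2), evaluated pointwise."""
    comps = (np.exp(-0.5 * ((x[..., None] - means) / sigmas) ** 2)
             / (np.sqrt(2 * np.pi) * sigmas))
    return comps @ c

rng = np.random.default_rng(0)
ks = rng.choice(len(c), size=1000, p=c)      # pick a component for each sample
samples = rng.normal(means[ks], sigmas[ks])  # then draw from that component
print(gmm_pdf(np.array([-2.0, 0.0, 3.0])), samples.mean())
```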
Gaussian mixtures (2)
e.g. nonlinear correlation:
(Figure: original data, the fitted Gaussian components, and the resulting mixture surface)
Problem: finding the c_k and m_k parameters
- easy if we knew which m_k generated each x
Expectation-maximization (EM)
General procedure for estimating a model when some parameters are unknown
- e.g. which component generated a value in a Gaussian mixture model
Procedure: iteratively update the model parameters Θ to maximize Q, the expected log-probability of the known training data x_trn and the unknown parameters u:
  Q(Θ, Θ_old) = Σ_u Pr(u|x_trn, Θ_old) · log p(u, x_trn|Θ)
- iterative because Pr(u|x_trn, Θ_old) changes each time we change Θ
- can prove this increases the data likelihood, hence a maximum-likelihood (ML) model
- local optimum: depends on initialization
Fitting GMMs with EM
Want to find the component Gaussians m_k = {μ_k, Σ_k} plus the weights c_k = Pr(m_k) that maximize the likelihood of the training data x_trn.
If we could assign each x to a particular m_k, estimation would be direct.
Hence, treat the k's (mixture indices) as unknown, form Q = E[log p(x, k|Θ)], and differentiate with respect to the model parameters → equations for μ_k, Σ_k and c_k that maximize Q ...
GMM EM update equations
Solving for the maximum of Q:
  μ_k = Σ_n p(k|x_n, Θ) x_n / Σ_n p(k|x_n, Θ)
  Σ_k = Σ_n p(k|x_n, Θ) (x_n − μ_k)(x_n − μ_k)^T / Σ_n p(k|x_n, Θ)
  c_k = (1/N) Σ_n p(k|x_n, Θ)
Each involves p(k|x_n, Θ), the "fuzzy membership" of x_n to Gaussian k.
Each parameter update is just a sample average, weighted by the fuzzy membership.
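A minimal numpy sketch of these updates for a 1-D GMM (scalar variances, made-up data and component count); practical implementations add safeguards such as variance floors and log-domain responsibilities that are omitted here:

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up 1-D training data drawn from two clusters
x = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(1.5, 1.0, 200)])

K = 2
mu = rng.choice(x, K)                        # initialize means from the data
var = np.full(K, x.var())
c = np.full(K, 1.0 / K)

def gauss(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

for it in range(50):
    # E-step: fuzzy memberships p(k|x_n, Theta)
    num = c * gauss(x[:, None], mu, var)     # shape (N, K)
    resp = num / num.sum(axis=1, keepdims=True)

    # M-step: membership-weighted sample averages
    Nk = resp.sum(axis=0)
    mu = (resp * x[:, None]).sum(axis=0) / Nk
    var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
    c = Nk / len(x)

print("weights", c, "means", mu, "vars", var)
```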
GMM examples
Vowel data fit with different mixture counts:
(Figure: fits with increasing numbers of Gaussian components, each panel labeled with its training-data log p(x))
example ...
5. Methodological remarks 1: Training and test data
A rich model can learn every training example (overtraining):
(Figure: error rate vs. training iterations or parameters; training-data error keeps falling while test-data error turns back up, i.e. overfitting)
But the goal is to classify new, unseen data, i.e. generalization
- sometimes use a cross-validation set to decide when to stop training
For evaluation results to be meaningful:
- don't test with training data!
- don't train on test data (even indirectly) ...
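A hedged illustration of the held-out principle with made-up 1-D data: training error keeps falling as model size grows, while error on data the model never saw eventually turns back up, which is what a validation set is used to detect:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 60)                   # made-up inputs
y = np.sin(3 * x) + rng.normal(0, 0.2, x.size)
x_tr, y_tr, x_va, y_va = x[:40], y[:40], x[40:], y[40:]

for degree in (1, 3, 5, 9, 13):              # "model size" = polynomial degree
    coeffs = np.polyfit(x_tr, y_tr, degree)
    err_tr = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
    err_va = np.mean((np.polyval(coeffs, x_va) - y_va) ** 2)
    print(f"degree {degree:2d}: train {err_tr:.3f}  held-out {err_va:.3f}")
```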
Model complexity
More model parameters → better fit
- more Gaussian mixture components
- more MLP hidden units
More training data → can use larger models
For best generalization (no overfitting), there will be some optimal model size:
(Figure: word error rate (WER%) for PLP12N-8k nets vs. hidden-layer size and training-set hours; more parameters and more training data trade off, with an optimal parameter/data ratio and contours of constant training time)
Model combination
No single model is always best, but different models have different strengths, so there are benefits from combining several models, e.g.
  Pr(ω) = Σ_k w_k Pr(ω|m_k)
where each Pr(ω|m_k) comes from a different model / feature set and w_k = Pr(m_k) estimates the probability that model k is correct.
Should be able to incorporate everything into one model, but it may be easier to combine in practice.
The trick is to have good estimates for w_k:
- static, from training data
- dynamic, from test data ...
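A tiny sketch (made-up posteriors and weights) of combining two models' class posteriors with static weights w_k:

```python
import numpy as np

# Posteriors Pr(omega|m_k) over 3 classes from two different models (made up)
post_m1 = np.array([0.60, 0.30, 0.10])       # e.g. from an MLP on one feature set
post_m2 = np.array([0.20, 0.45, 0.35])       # e.g. from a GMM on another feature set
w = np.array([0.7, 0.3])                     # w_k = Pr(m_k), e.g. estimated on held-out data

combined = w[0] * post_m1 + w[1] * post_m2   # Pr(omega) = sum_k w_k Pr(omega|m_k)
print(combined, "-> class", int(np.argmax(combined)))
```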
Summary
- Classification is making hard decisions
- Basis is comparison with known examples
  - explicitly, or via a model
- Probabilistic interpretation for principled decisions
- Ways to build models:
  - neural nets can learn posteriors Pr(ω|x)
  - Gaussians can model pdfs p(x|ω)
- EM allows estimation of complex models
Parting thought: how do you know what's going on?