Lecture 3: Pattern Classification
EE E6820: Speech & Audio Processing & Recognition
Lecture 3: Pattern Classification
- The problem of classification
- Linear and nonlinear classifiers
- Probabilistic classification
- Gaussians, mixtures and EM
- Methodological remarks
Dan Ellis <dpwe@ee.columbia.edu>
Columbia University Dept. of Electrical Engineering, Spring 2006

Pattern classification
Classification means finding categorical (discrete) labels for real-world (continuous) observations.
(Figure: spectrogram vs. time, and the F1/F2 formant space, for the vowels "ay" and "ao")
Why?
- information extraction
- conditional processing
- detect exceptions
Building a classifier
- Define classes/attributes: could state explicit rules, but better to define them through training examples
- Define the feature space
- Define the decision algorithm: set parameters from examples
- Measure performance: calculate the (weighted) error rate
(Figure: Pols vowel formants, "u" (x) vs. "o" (o), plotted in F1/F2 space)

Classification system parts
Sensor → signal → Pre-processing/segmentation → segment → Feature extraction → feature vector → Classification → class → Post-processing
- Pre-processing/segmentation: e.g. STFT, locate vowels
- Feature extraction: e.g. formant extraction
- Post-processing: context constraints, costs/risk
Feature extraction
The right features are critical:
- waveform vs. formants vs. cepstra
- invariance under irrelevant modifications
Theoretically equivalent features may act very differently in practice:
- representations make important aspects explicit
- remove irrelevant information
Feature design incorporates domain knowledge
- more data → less need for cleverness?
Smaller feature space (fewer dimensions) → simpler models (fewer parameters) → less training data needed → faster training

Minimum distance classification
(Figure: Pols vowel formants for "u", "o" and "a" in F1/F2 space, with a new observation marked *)
Find the closest match (nearest neighbor): $\min_i D(x, y_{\omega,i})$
- the choice of distance metric $D(x, y)$ is important
Find the representative match (class prototypes): $\min_\omega D(x, z_\omega)$
- class data $\{y_{\omega,i}\}$ → class model $z_\omega$
A minimal sketch of both rules follows below.
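The two matching rules translate directly into a few lines of code. The sketch below is illustrative only: the formant values and labels are made-up stand-ins for the Pols vowel data, and NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical training data: rows are [F1, F2] formant pairs (Hz), one class label each.
train_X = np.array([[300., 800.], [320., 850.],    # "u"
                    [450., 900.], [470., 950.],    # "o"
                    [750., 1300.], [780., 1250.]]) # "a"
train_y = np.array(["u", "u", "o", "o", "a", "a"])

def nearest_neighbor(x):
    """Closest match: min over all stored examples of D(x, y)."""
    d2 = np.sum((train_X - x) ** 2, axis=1)        # squared Euclidean distances
    return train_y[np.argmin(d2)]

def min_distance_to_prototype(x):
    """Representative match: class prototype z_w = mean of each class's examples."""
    classes = np.unique(train_y)
    protos = np.array([train_X[train_y == c].mean(axis=0) for c in classes])
    d2 = np.sum((protos - x) ** 2, axis=1)
    return classes[np.argmin(d2)]

x_new = np.array([430., 920.])                     # a new observation
print(nearest_neighbor(x_new), min_distance_to_prototype(x_new))
```

Both rules agree here ("o"); they differ when a class is spread out or multimodal, which is where the distance metric and the choice of prototype start to matter.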
Linear and nonlinear classifiers
Minimum Euclidean distance is equivalent to a linear discriminant function:
- examples $\{y_{\omega,i}\}$ → template $z_\omega = \mathrm{mean}\{y_{\omega,i}\}$
- observed feature vector $x = [x_1\ x_2\ \dots]^T$
- Euclidean distance metric $D[x, y] = (x - y)^T (x - y)$
- minimum distance rule: class $\hat\imath = \arg\min_i D[x, z_i] = \arg\min_i D^2[x, z_i]$
- i.e. choose class O over U if $D^2[x, z_O] < D^2[x, z_U]$...

Linear classifier (cont'd)
Decision rule: choose class O over U if
$D^2[x, z_O] < D^2[x, z_U]$
$(x - z_O)^T (x - z_O) < (x - z_U)^T (x - z_U)$
$x^T x - 2 z_O^T x + z_O^T z_O < x^T x - 2 z_U^T x + z_U^T z_U$
$2 (z_O - z_U)^T x > z_O^T z_O - z_U^T z_U$
$(z_O - z_U)^T (x - z_U) > \tfrac{1}{2} (z_O - z_U)^T (z_O - z_U)$
i.e. the projection of $x - z_U$ onto the direction $z_O - z_U$ is compared against a threshold of half the distance from $z_U$ to $z_O$; choose O when the projection exceeds it.
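The equivalence between the two-distance comparison and the projection/threshold rule is easy to check numerically. The sketch below uses invented class centers (not values from the lecture) and assumes NumPy.

```python
import numpy as np

z_U = np.array([310., 825.])   # hypothetical class centers, e.g. "u" and "o" prototypes
z_O = np.array([460., 925.])
x   = np.array([430., 920.])   # observation to classify

# Direct rule: choose O if D^2(x, z_O) < D^2(x, z_U)
direct_O = np.sum((x - z_O) ** 2) < np.sum((x - z_U) ** 2)

# Equivalent linear rule: project (x - z_U) onto (z_O - z_U) and compare with
# half the squared center separation, as in the derivation above.
w = z_O - z_U
linear_O = w @ (x - z_U) > 0.5 * (w @ w)

print(direct_O, linear_O)      # both True or both False for any x
```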
Linear classification boundaries
Min-distance divides the space normal to the line joining the class centers: the decision boundary is where $\frac{(z_O - z_U)^T (x - z_U)}{\|z_O - z_U\|} = \frac{1}{2} \|z_O - z_U\|$.
(Figure: boundary perpendicular to $z_O - z_U$, midway between the centers; scaling the axes, e.g. squeezing horizontally or vertically, moves the boundary)

Decision boundaries
Linear functions give linear boundaries
- planes and hyperplanes as the number of dimensions increases
Linear boundaries can become curves under feature transformations
- e.g. a linear discriminant in a transformed feature vector $x' = [\dots]^T$
- support vector machines...
What about more complex boundaries?
- i.e. if a linear discriminant is $\sum_i w_i x_i$, how about a nonlinear function?
- $F[\sum_i w_i x_i]$ changes only the threshold, but $\sum_j F[\sum_i w_{ij} x_i]$ offers more flexibility, and $F[\sum_j w_{jk} F[\sum_i w_{ij} x_i]]$ has even more...
Neural networks
Sums over nonlinear functions of sums → a large range of decision surfaces
e.g. multi-layer perceptron (MLP) with one hidden layer:
$y_k = F[\sum_j w_{jk} \cdot F[\sum_i w_{ij} x_i]]$
(Figure: input layer $x_i$, hidden layer $h_j$, output layer $y_k$, with weights $w_{ij}$, $w_{jk}$ and nonlinearity $F[\cdot]$)
The problem is finding the weights $w_{ij}$... (training)

Back-propagation training
Find the $w_{ij}$ to minimize e.g. $E = \sum_n |y(x_n) - t_n|^2$ for a training set of patterns $\{x_n\}$ and desired (target) outputs $\{t_n\}$.
Differentiate the error with respect to the weights; the contribution from each pattern $x_n, t_n$, with $y_k = F[\sum_j w_{jk} h_j]$, is
$\frac{\partial E}{\partial w_{jk}} = \frac{\partial E}{\partial y} \cdot \frac{\partial y}{\partial w_{jk}} = 2 (y - t)\, F'[\sum_j w_{jk} h_j]\, h_j$
and the update is $w_{jk} \leftarrow w_{jk} - \alpha \frac{\partial E}{\partial w_{jk}}$,
i.e. gradient descent with learning rate $\alpha$ (a minimal NumPy sketch follows below).
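To make the back-propagation recipe concrete, here is a minimal NumPy sketch of a one-hidden-layer MLP trained by gradient descent on the squared error. The toy data, layer sizes and learning rate are invented for illustration, and biases are omitted for brevity.

```python
import numpy as np

def sigm(a):
    return 1.0 / (1.0 + np.exp(-a))      # F[x]; note F' = F (1 - F)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))             # toy inputs (e.g. normalized F1, F2)
T = (X[:, 0] + X[:, 1] > 0).astype(float)[:, None]   # toy targets

n_in, n_hid, n_out = 2, 5, 1
W1 = rng.normal(scale=0.5, size=(n_in, n_hid))
W2 = rng.normal(scale=0.5, size=(n_hid, n_out))
alpha = 0.1                                # learning rate

for epoch in range(200):
    # forward pass: h_j = F[sum_i w_ij x_i],  y_k = F[sum_j w_jk h_j]
    H = sigm(X @ W1)
    Y = sigm(H @ W2)
    # backward pass: dE/dW via the chain rule, with E = sum_n |y - t|^2
    dY = 2 * (Y - T) * Y * (1 - Y)
    dH = (dY @ W2.T) * H * (1 - H)
    W2 -= alpha * (H.T @ dY) / len(X)      # gradient descent step
    W1 -= alpha * (X.T @ dH) / len(X)

print("final MSE:", float(np.mean((Y - T) ** 2)))
```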
Neural net example
2 input units (normalized F1, F2), 5 hidden units, 3 output units (U, O, A)
(Figure: network diagram and the resulting decision regions for A, O, U in the F1/F2 plane)
Sigmoid nonlinearity: $F[x] = \frac{1}{1 + e^{-x}}$, with derivative $\frac{dF}{dx} = F(x)\,(1 - F(x))$
(Figure: sigm(x) and its derivative)

Neural net training
(Figure: 2:5:3 net, mean-squared error by training epoch, and the decision regions after increasing numbers of training iterations)
Support vector machines (SVMs)
Principled nonlinear decision boundaries...
Key idea: the boundary can be linear in a higher-dimensional space reached by a projection $\Phi(x)$
- depends only on the training points at the edges (support vectors)
- unique optimal solution for a given projection $\Phi$
- the solution involves only inner products in the projected space, e.g. via a kernel $k(x, y) = \Phi(x) \cdot \Phi(y)$
(Figure: separating hyperplane $w \cdot x + b = 0$ with margin planes $w \cdot x + b = \pm 1$, support vectors on the margins, and labels $y_i = \pm 1$)

Statistical interpretation
Observations are random variables whose distribution depends on the class:
class $\omega_i$ (hidden, discrete) → observation $x$ (continuous), via the source distribution $p(x \mid \omega_i)$
Source distributions $p(x \mid \omega_i)$:
- reflect variability in the feature
- reflect noise in the observation
- generally have to be estimated from data (rather than known in advance)
(Figure: class priors $\Pr(\omega_i)$ for $\omega_1 \dots \omega_4$ and the class-conditional densities $p(x \mid \omega_i)$)
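A quick way to experiment with the kernel idea is an off-the-shelf SVM. The sketch below assumes scikit-learn is installed and uses toy two-class data, not the lecture's vowel set; the decision function depends only on the support vectors, as noted above.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Toy two-class problem that is not linearly separable in the original space:
# class 1 inside a ring, class 0 outside.
X = rng.normal(size=(200, 2))
y = (np.sum(X ** 2, axis=1) < 1.0).astype(int)

# RBF kernel k(x, y) = exp(-gamma ||x - y||^2): the boundary is linear in the
# implied high-dimensional space, nonlinear in the original 2-D space.
clf = SVC(kernel="rbf", C=1.0, gamma=1.0).fit(X, y)

print("number of support vectors:", len(clf.support_vectors_))
print("predictions:", clf.predict([[0.0, 0.0], [2.0, 2.0]]))
```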
Random variables review
Random variables have joint distributions (pdfs) $p(x, y)$:
- marginal: $p(x) = \int p(x, y)\, dy$
- covariance: $\Sigma = E[(x - \mu)(x - \mu)^T] = \begin{bmatrix} \sigma_x^2 & \sigma_{xy} \\ \sigma_{xy} & \sigma_y^2 \end{bmatrix}$
(Figure: joint density $p(x, y)$ with marginals $p(x)$, $p(y)$, means $\mu_x$, $\mu_y$ and spreads $\sigma_x$, $\sigma_y$)

Conditional probability
Knowing one value in a joint distribution constrains the remainder: $p(x \mid y = Y)$ is the slice of $p(x, y)$ at $y = Y$, renormalized.
Bayes rule:
$p(x, y) = p(x \mid y)\, p(y) = p(y \mid x)\, p(x)$
$\Rightarrow\ p(y \mid x) = \frac{p(x \mid y)\, p(y)}{p(x)}$
- can reverse the conditioning given the priors/marginal
- either term can be discrete or continuous
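Bayes rule is a couple of lines of arithmetic. The sketch below uses made-up priors and class-conditional likelihood values to show the posterior computation, and to check that the posteriors sum to one.

```python
import numpy as np

# Hypothetical priors Pr(w_i) for three classes and likelihoods p(x | w_i)
# evaluated at one observed x.
priors      = np.array([0.5, 0.3, 0.2])
likelihoods = np.array([0.8, 1.5, 0.1])   # density values, need not sum to 1

evidence   = np.sum(likelihoods * priors)        # p(x) = sum_i p(x|w_i) Pr(w_i)
posteriors = likelihoods * priors / evidence     # Pr(w_i | x)

print(posteriors, posteriors.sum())              # posteriors sum to 1
```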
Priors and posteriors
Bayesian inference can be interpreted as updating prior beliefs with new information $x$.
Bayes rule:
$\Pr(\omega_i \mid x) = \frac{p(x \mid \omega_i)\, \Pr(\omega_i)}{\sum_j p(x \mid \omega_j)\, \Pr(\omega_j)}$
with $\Pr(\omega_i)$ the prior probability, $p(x \mid \omega_i)$ the likelihood, the denominator $p(x)$ the evidence, and $\Pr(\omega_i \mid x)$ the posterior probability.
The posterior is the prior scaled by the likelihood and normalized by the evidence (so the posteriors sum to 1).
Objection: priors are often unknown
- ...but omitting them amounts to assuming they are all equal

Bayesian (MAP) classifier
Minimize the probability of error by choosing the maximum a posteriori (MAP) class:
$\hat\omega = \arg\max_{\omega_i} \Pr(\omega_i \mid x)$
Intuitively right: choose the most probable class in light of the observation.
Formally, the expected probability of error is
$E[\Pr(err)] = \int_X p(x)\, \Pr(err \mid x)\, dx = \sum_i \int_{X_{\hat\omega_i}} p(x)\, (1 - \Pr(\omega_i \mid x))\, dx$
where $X_{\hat\omega_i}$ is the region of $X$ where $\omega_i$ is chosen... but $\Pr(\omega_i \mid x)$ is largest in that region, so $\Pr(err)$ is minimized.
Practical implementation
The optimal classifier is $\hat\omega = \arg\max_{\omega_i} \Pr(\omega_i \mid x)$, but we don't know $\Pr(\omega_i \mid x)$.
We can model it directly, e.g. train a neural net / SVM to map from inputs $x$ to a set of outputs $\Pr(\omega_i \mid x)$
- a discriminative model
Often it is easier to model the conditional distributions $p(x \mid \omega_i)$ and priors $\Pr(\omega_i)$, then use Bayes rule to find the MAP class:
labeled training examples $\{x_n, \omega_n\}$ → sort according to class → estimate the conditional pdf $p(x \mid \omega)$ for each class $\omega$.

Likelihood models
Given models for each distribution $p(x \mid \omega_i)$, the search
$\hat\omega = \arg\max_{\omega_i} \Pr(\omega_i \mid x)$
becomes
$\hat\omega = \arg\max_{\omega_i} \frac{p(x \mid \omega_i)\, \Pr(\omega_i)}{\sum_j p(x \mid \omega_j)\, \Pr(\omega_j)}$
but the denominator $p(x)$ is the same for all $\omega_i$, hence
$\hat\omega = \arg\max_{\omega_i} p(x \mid \omega_i)\, \Pr(\omega_i) = \arg\max_{\omega_i} [\log p(x \mid \omega_i) + \log \Pr(\omega_i)]$
Choose the parameters of each $p(x \mid \omega_i)$ to maximize the likelihood of the training data.
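Putting the last two slides together: fit a class-conditional model to the labeled examples of each class, then classify by maximizing log-likelihood plus log-prior. The sketch below uses 1-D Gaussians as the class-conditional models and invented data; it is a teaching sketch, not the lecture's own code.

```python
import numpy as np

# Labeled training examples {x_n, w_n}: a 1-D feature and a class label.
x = np.array([1.0, 1.2, 0.8, 3.1, 2.9, 3.3, 3.0])
w = np.array(["u", "u", "u", "o", "o", "o", "o"])

classes = np.unique(w)
models, priors = {}, {}
for c in classes:                        # sort by class, estimate p(x|w) per class
    xc = x[w == c]
    models[c] = (xc.mean(), xc.std())    # ML estimates of mean and std deviation
    priors[c] = len(xc) / len(x)

def log_gauss(v, mu, sigma):
    return -0.5 * np.log(2 * np.pi * sigma ** 2) - 0.5 * ((v - mu) / sigma) ** 2

def map_classify(v):
    # argmax_w [ log p(v|w) + log Pr(w) ]; the evidence p(v) is the same for all w.
    scores = {c: log_gauss(v, *models[c]) + np.log(priors[c]) for c in classes}
    return max(scores, key=scores.get)

print(map_classify(1.1), map_classify(2.7))
```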
Sources of error
(Figure: two scaled class-conditional densities $p(x \mid \omega_1)\Pr(\omega_1)$ and $p(x \mid \omega_2)\Pr(\omega_2)$, the decision regions $X_{\hat\omega_1}$ and $X_{\hat\omega_2}$, and the resulting error areas)
- Suboptimal threshold / regions (bias error): use a Bayes classifier!
- Incorrect distributions (model error): better distribution models / more training data
- Misleading features ("Bayes error"): irreducible for a given feature set, regardless of the classification scheme

Gaussian models
The easiest way to model distributions is via a parametric model
- assume a known form, estimate a few parameters
The Gaussian model is simple and useful. In 1-D:
$p(x \mid \omega_i) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\!\left[-\frac{1}{2}\left(\frac{x - \mu_i}{\sigma_i}\right)^2\right]$
where the leading factor is the normalization that makes the density integrate to 1.
Parameters: the mean $\mu_i$ and variance $\sigma_i^2$, fit to the data.
(Figure: Gaussian curve showing $\mu_i$, $\sigma_i$, the peak value $P_M$ and the value $P_M e^{-1/2}$ at $\mu_i \pm \sigma_i$)
Gaussians in d dimensions:
$p(x \mid \omega_i) = \frac{1}{(2\pi)^{d/2} |\Sigma_i|^{1/2}} \exp\!\left[-\frac{1}{2}(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i)\right]$
Described by a d-dimensional mean vector $\mu_i$ and a $d \times d$ covariance matrix $\Sigma_i$.
Classify by maximizing the log likelihood, i.e.
$\arg\max_{\omega_i}\ -\frac{1}{2}(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) - \frac{1}{2}\log|\Sigma_i| + \log\Pr(\omega_i)$

Multivariate Gaussian covariance
Eigen-analysis of the inverse covariance gives $\Sigma_i^{-1} = (SV)^T (SV)$, with $V$ the eigenvector matrix and $S = \Lambda^{-1/2}$ diagonal, so the Gaussian exponent is
$\exp\!\left[-\frac{1}{2}(x - \mu_i)^T (SV)^T (SV) (x - \mu_i)\right]$
i.e. $(SV)$ transforms $x$ into an uncorrelated, normalized-variance space.
Mahalanobis distance: $(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i)$
- the Euclidean distance in the normalized space
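The whitening view of the covariance can be checked directly: the Mahalanobis distance computed with $\Sigma^{-1}$ equals the squared Euclidean distance after transforming by $SV$. The covariance matrix below is arbitrary, chosen only for illustration; NumPy is assumed.

```python
import numpy as np

mu    = np.array([500., 1000.])
Sigma = np.array([[2500., 1500.],
                  [1500., 4000.]])       # an arbitrary positive-definite covariance
x     = np.array([560., 1100.])

# Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu)
d = x - mu
maha = d @ np.linalg.inv(Sigma) @ d

# Eigen-analysis: Sigma = V diag(lam) V^T, so SV = diag(lam^{-1/2}) V^T
# maps x into an uncorrelated, unit-variance space.
lam, V = np.linalg.eigh(Sigma)
SV = np.diag(lam ** -0.5) @ V.T
euclid_in_whitened = np.sum((SV @ d) ** 2)

print(maha, euclid_in_whitened)          # identical up to rounding
```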
Gaussian mixture models (GMMs)
Single Gaussians cannot model
- distributions with multiple modes
- distributions with nonlinear correlation
What about a weighted sum? i.e.
$p(x) = \sum_k c_k\, p(x \mid m_k)$
where $\{c_k\}$ is a set of weights and $\{p(x \mid m_k)\}$ is a set of Gaussian components
- can fit anything given enough components
Interpretation: each observation is generated by one of the Gaussians, chosen at random with priors $c_k = \Pr(m_k)$ (a small sketch of this generative view follows below).

Gaussian mixtures (2)
e.g. nonlinear correlation:
(Figure: original data, the fitted Gaussian components, and the resulting mixture density surface)
Problem: finding the $c_k$ and $m_k$ parameters
- easy if we knew which $m_k$ generated each $x$
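The generative interpretation (pick a component with probability $c_k$, then draw from that Gaussian) and the density $p(x) = \sum_k c_k p(x \mid m_k)$ can both be written directly. The component parameters below are invented, and the mixture is 1-D for simplicity.

```python
import numpy as np

# A hypothetical 3-component 1-D mixture: weights c_k, means mu_k, std devs sigma_k.
c     = np.array([0.5, 0.3, 0.2])
mu    = np.array([-2.0, 0.0, 3.0])
sigma = np.array([0.5, 1.0, 0.8])

rng = np.random.default_rng(0)

def sample(n):
    """Generate n points: choose component k with prior c_k, then draw from N(mu_k, sigma_k^2)."""
    k = rng.choice(len(c), size=n, p=c)
    return rng.normal(mu[k], sigma[k])

def gmm_pdf(x):
    """p(x) = sum_k c_k p(x | m_k), evaluated pointwise."""
    x = np.atleast_1d(x)[:, None]
    comp = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
    return (comp * c).sum(axis=1)

data = sample(1000)
print(data.mean(), gmm_pdf([-2.0, 0.0, 3.0]))
```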
Expectation-maximization (EM)
A general procedure for estimating a model when some parameters are unknown
- e.g. which component generated a value in a Gaussian mixture model
Procedure: iteratively update the model parameters $\Theta$ to maximize $Q$, the expected value of the log-probability of the known training data $x_{\mathrm{trn}}$ and the unknown parameters $u$:
$Q(\Theta, \Theta^{\mathrm{old}}) = \sum_u \Pr(u \mid x_{\mathrm{trn}}, \Theta^{\mathrm{old}})\, \log p(u, x_{\mathrm{trn}} \mid \Theta)$
- iterative because $\Pr(u \mid x_{\mathrm{trn}}, \Theta^{\mathrm{old}})$ changes each time we change $\Theta$
- one can prove this increases the data likelihood, hence gives a maximum-likelihood (ML) model
- local optimum: depends on the initialization

Fitting GMMs with EM
We want to find the component Gaussians $m_k = \{\mu_k, \Sigma_k\}$ plus the weights $c_k = \Pr(m_k)$ that maximize the likelihood of the training data $x_{\mathrm{trn}}$.
If we could assign each $x$ to a particular $m_k$, estimation would be direct.
Hence, treat the $k$s (mixture indices) as unknown, form $Q = E[\log p(x, k \mid \Theta)]$, and differentiate with respect to the model parameters → update equations for $\mu_k$, $\Sigma_k$ and $c_k$ that maximize $Q$...
GMM EM update equations
Solving for the parameters that maximize $Q$:
$\mu_k = \frac{\sum_n \Pr(k \mid x_n, \Theta)\, x_n}{\sum_n \Pr(k \mid x_n, \Theta)}$
$\Sigma_k = \frac{\sum_n \Pr(k \mid x_n, \Theta)\, (x_n - \mu_k)(x_n - \mu_k)^T}{\sum_n \Pr(k \mid x_n, \Theta)}$
$c_k = \frac{1}{N} \sum_n \Pr(k \mid x_n, \Theta)$
Each involves $\Pr(k \mid x_n, \Theta)$, the "fuzzy membership" of $x_n$ in Gaussian $k$; each parameter is just a sample average, weighted by the fuzzy membership.

GMM examples
Vowel data fit with different mixture counts:
(Figure: fits with increasing numbers of Gaussian components and the corresponding data log-likelihoods)
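The update equations translate almost line for line into code. The sketch below implements EM for a 1-D mixture (scalar variances, crude random initialization, toy data); it is a teaching sketch under those simplifying assumptions, not the lecture's implementation.

```python
import numpy as np

def fit_gmm_em(x, K=2, n_iter=50):
    """EM for a 1-D Gaussian mixture: returns weights c, means mu, variances var."""
    N = len(x)
    rng = np.random.default_rng(0)
    mu = rng.choice(x, size=K, replace=False)     # crude initialization from the data
    var = np.full(K, x.var())
    c = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E step: fuzzy membership Pr(k | x_n, Theta) for every point and component
        lik = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = lik * c
        resp /= resp.sum(axis=1, keepdims=True)
        # M step: membership-weighted sample averages
        Nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / Nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
        c = Nk / N
    return c, mu, var

# Toy data drawn from two clusters:
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(3, 1.0, 200)])
print(fit_gmm_em(x, K=2))
```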
Methodological remarks: training and test data
A rich model can learn every training example (overtraining).
(Figure: error rate vs. training time or number of parameters; training-set error keeps falling while test-set error rises once overfitting sets in)
But the goal is to classify new, unseen data, i.e. generalization
- sometimes a cross-validation set is used to decide when to stop training (a toy illustration follows below)
For evaluation results to be meaningful:
- don't test with training data!
- don't train on test data (even indirectly)...

Model complexity
More model parameters → better fit
- more Gaussian mixture components
- more MLP hidden units
More training data → can use larger models
For a fixed training set size, there will be some optimal model size (to avoid overfitting).
(Figure: word error rate (WER%) vs. hidden-layer size and training-set hours for PLPN-8k nets, showing the optimal parameter/data ratio and constant-training-time contours)
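The overfitting picture can be reproduced with a toy experiment: as model size grows, training error keeps shrinking while error on held-out data eventually rises. The sketch below uses invented data and polynomial degree as a stand-in for "number of parameters"; NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression problem: noisy samples of a smooth function, split into a
# training set and a held-out validation set (never evaluate on training data).
x = rng.uniform(-1, 1, 60)
y = np.sin(3 * x) + rng.normal(scale=0.2, size=60)
x_tr, y_tr = x[:40], y[:40]
x_va, y_va = x[40:], y[40:]

for degree in [1, 3, 5, 9, 12]:                # "model size" = polynomial degree
    coeffs = np.polyfit(x_tr, y_tr, degree)    # fit on training data only
    err_tr = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
    err_va = np.mean((np.polyval(coeffs, x_va) - y_va) ** 2)
    print(f"degree {degree:2d}: train MSE {err_tr:.3f}  validation MSE {err_va:.3f}")
```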
Model combination
No single model is always best, but different models have different strengths, so there are benefits from combining several models, e.g.
$\Pr(\omega \mid x) = \sum_k w_k \Pr(\omega \mid x, m_k)$
where each $\Pr(\omega \mid x, m_k)$ comes from a different model / feature set and $w_k = \Pr(m_k \mid x)$ estimates the probability that model $k$ is correct.
It should be possible to incorporate everything into one model, but combining may be easier in practice.
The trick is to have good estimates for the $w_k$:
- static, from training data
- dynamic, from test data...

Summary
Classification is making discrete (hard) decisions.
The basis is comparison with known examples
- explicitly, or via a model
Probabilistic interpretation gives principled decisions.
Ways to build models:
- neural nets can learn posteriors $\Pr(\omega \mid x)$
- Gaussians can model pdfs $p(x \mid \omega)$
- other approaches...
EM allows estimation of complex models.
Parting thought: how do you know what's going on?
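The combination rule is a weighted average of per-model posteriors. The sketch below combines two hypothetical models' posterior estimates with fixed (static) weights; all numbers are invented for illustration.

```python
import numpy as np

# Posteriors Pr(w | x, m_k) over three classes from two different models/feature sets:
post_model1 = np.array([0.7, 0.2, 0.1])
post_model2 = np.array([0.4, 0.5, 0.1])

# Weights w_k ~ Pr(m_k | x): here static estimates, e.g. from training-set accuracy.
w = np.array([0.6, 0.4])

combined = w[0] * post_model1 + w[1] * post_model2   # Pr(w|x) = sum_k w_k Pr(w|x, m_k)
print(combined, "-> class", int(np.argmax(combined)))
```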