Lecture 3: Pattern Classification. Pattern classification

Size: px
Start display at page:

Download "Lecture 3: Pattern Classification. Pattern classification"

Transcription

1 EE E68: Speech & Audio Processing & Recognition Lecture 3: Pattern Classification The problem of classification Linear and nonlinear classifiers Probabilistic classification Gaussians, mitures and EM Methodological remarks Dan Ellis <dpwe@ee.columbia.edu> Columbia University Dept. of Electrical Engineering Spring 6 E68 SAPR - Dan Ellis L3 - Pattern Classification Pattern classification Classification means: finding categorical (discrete labels for real-world (continuous observations F/Hz 4 f/hz time/s F/Hz ay ao Why? - information etraction - conditional processing - detect eceptions E68 SAPR - Dan Ellis L3 - Pattern Classification 6-- -

2 Building a classifier Define classes/attributes - could state eplicit rules - better to define through training eamples Define feature space Define decision algorithm - set parameters from eamples Measure performance - calculate (weighted error rate Pols vowel formants: "u" ( vs. "o" (o F / Hz F / Hz E68 SAPR - Dan Ellis L3 - Pattern Classification Classification system parts Sensor signal segment feature vector Pre-processing/ segmentation Feature etraction STFT Locate vowels Formant etraction Classification class Post-processing Contet constraints Costs/risk E68 SAPR - Dan Ellis L3 - Pattern Classification

3 Feature etraction Right features are critical - waveform vs. formants vs. cepstra - invariance under irrelevant modifications Theoretically equivalent features may act very differently in practice - representations make important aspects eplicit - remove irrelevant information Feature design incorporates domain knowledge - more data less need for cleverness? Smaller feature space (fewer dimensions simpler models (fewer parameters less training data needed faster training E68 SAPR - Dan Ellis L3 - Pattern Classification Minimum Distance Classification 8 Pols vowel formants: "u" (, "o" (o, "a" ( F / Hz 8 6 * o new observation F / Hz Find closest match (Nearest Neighbor min[ D(, y ω i ], - choice of distance metric D(,y is important Find representative match (class prototypes min[ D(, z ω ] - class data { y ω,i } class model z ω E68 SAPR - Dan Ellis L3 - Pattern Classification

4 Linear and nonlinear classifiers Minimum Euclidean distance is equivalent to a linear discriminant function: - eamples { y ω,i } template z ω mean{ y ω,i } - observed feature vector [...] T - Euclidean distance metric Dy [, ] ( y T ( y - minimum distance rule: class i argmin i D[, z i ] argmin i D [, z i ] - i.e. choose class O over U if D [, z O ] < D [, z U ]... E68 SAPR - Dan Ellis L3 - Pattern Classification Linear classifier (cont d Decision rule: Choose class O over U if: D [, z O ] < D [, z U ] ( z O T ( z O < ( z U T ( z U Projection T T + z OzO T z O T T < + z UzU T z U T T T ( z O z U > z OzO z UzU ( z U T z O ( z O z U >< -- ( z O z U T ( z O z U z U Threshold i.e. distance from z U to is less than half the distance to z O... E68 SAPR - Dan Ellis L3 - Pattern Classification

5 Linear classification boundaries Min-distance divides normal to class centers: decision boundary (z O -z U T (z O -z U z O -z U z U o z O (z O -z U (-z U T (z O -z U z O -z U * Scaling aes changes boundary: new boundary o Squeezed horizontally Squeezed vertically new boundary o E68 SAPR - Dan Ellis L3 - Pattern Classification Decision boundaries Linear functions give linear boundaries - planes and hyperplanes as dimensions increase Linear boundaries can become curves under feature transformations - e.g. a linear discriminant in ' [...] T - support vector machines... What about more comple boundaries? - i.e. if a linear discriminant is w i i how about a nonlinear function? - changes only threshold space, but F[ w i i ] F[ w ij i ] j i j i F[ w jk F[ w ij i ]] offers more fleibility, and has even more... E68 SAPR - Dan Ellis L3 - Pattern Classification 6-- -

6 Neural networks Sums over nonlinear functions of sums large range of decision surfaces e.g. Multi-layer perceptron (MLP with hidden layer: y k F[ w jk F[ w ij j i ]] j w ij + + h + w h jk + y F[ ] 3 Input layer Hidden layer Problem is finding the weights w ij... (training + y Output layer E68 SAPR - Dan Ellis L3 - Pattern Classification Back-propagation training Find w ij to minimize e.g. E n for training set of patterns { n } and desired (target outputs {t n } Differentiate error with respect to weights - contribution from each pattern n, t n : y k F[ w jk h j j ] E w jk E y y Σ w jk ( y t F' Σ E w jk w jk α w jk ( w jk h j j [ ] h j i.e. gradient descent with learning rate α y( n t n E68 SAPR - Dan Ellis L3 - Pattern Classification 6-- -

7 Neural net eample input units (normalized F, F 5 hidden units, 3 output units ( U, O, A F F A O U Sigmoid nonlinearity: F[ ] e df F( F d sigm( d sigm( d E68 SAPR - Dan Ellis L3 - Pattern Classification Neural net training :5:3 net: MS error by training epoch Mean squared error Training epoch 6 iterations 6 iterations eample E68 SAPR - Dan Ellis L3 - Pattern Classification

8 Support Vector Machines (SVMs Principled nonlinear decision boundaries... Key idea: Boundary can be linear in a higher-dimensional space e.g. Φ - depends only on training points at edges (support vectors - unique optimal solution for a given projection Φ - solution involves only inner products in projected space e.g. via kernel (w. + b yi k( y, Φ( Φ( y w (w. + b + y i + (w. + b ( y E68 SAPR - Dan Ellis L3 - Pattern Classification Statistical Interpretation Observations are random variables whose distribution depends on the class: Observation Class ω i (hidden discrete p( ω i continuous Source distributions p( ω i - reflect variability in feature - reflect noise in observation - generally have to be estimated from data (rather than known in advance p( ω i Pr(ω i ω ω ω 3 ω 4 E68 SAPR - Dan Ellis L3 - Pattern Classification

9 Random Variables review Random vars have joint distributions (pdf s: p(,y y µy σ y σ y p(y p( µ σ - marginal p ( py (, dy - covariance Σ E[ ( o µ ( o µ T ] σ σy σ y σ y E68 SAPR - Dan Ellis L3 - Pattern Classification Conditional probability Knowing one value in a joint distribution constrains the remainder: y Y p(,y p( y Y Bayes rule: py (, py ( py ( py ( py ( py ( p ( can reverse conditioning given priors/marginal - either term can be discrete or continuous E68 SAPR - Dan Ellis L3 - Pattern Classification

10 Priors and posteriors Bayesian inference can be interpreted as updating prior beliefs with new information, : Bayes Rule: Pr( ω i Prior probability pω ( i pω ( j Pr( ω j j Likelihood Evidence p( Pr( ω i Posterior probability Posterior is prior scaled by likelihood & normalized by evidence (so Σ(posteriors Objection: priors are often unknown -...but omitting them amounts to assuming they are all equal E68 SAPR - Dan Ellis L3 - Pattern Classification Bayesian (MAP classifier Minimize the probability of error by choosing maimum a posteriori (MAP class: ωˆ argma ω i Pr( ω i Intuitively right - choose most probable class in light of the observation Formally, epected probability of error, EPrerr [ ( ] p ( Pr( err d i X ωˆ i p ( ( Pr( ω i d where X ωˆ is the region of X where ω i is chosen i.. but Pr(ω i is largest in that region, so Pr(err is minimized. E68 SAPR - Dan Ellis L3 - Pattern Classification 6-- -

11 Practical implementation Optimal classifier is but we don t know ωˆ argma ω i Pr( ω i Can model directly e.g. train a neural net / SVM to map from inputs to a set of outputs Pr(ω i - a discriminative model Often easier to model conditional distributions then use Bayes rule to find MAP class pω ( i Pr( ω i Labeled training eamples { n,ω n } Sort according to class Estimate conditional pdf for class ω p( ω E68 SAPR - Dan Ellis L3 - Pattern Classification Likelihood models Given models for each distribution, the search for hence or even Choose parameters to maimize data likelihood pω ( i ωˆ argma Pr( ω i ω i pω ( i Pr( ω i becomes argma ω i pω ( j Pr( ω j j but denominator p( is the same over all ω i ωˆ argma ω i log argma ω i pω ( i Pr( ω i [ pω ( i + logpr( ω i ] p ( train ω i E68 SAPR - Dan Ellis L3 - Pattern Classification 6-- -

12 Sources of error p( ω Pr(ω X ω ^ X ω ^ p( ω Pr(ω Pr(ω X ω ^ Pr(err X ω ^ Pr(ω X ω ^ Pr(err X ω ^ Suboptimal threshold / regions (bias error - use a Bayes classifier! Incorrect distributions (model error - better distribution models/more training data Misleading features ( Bayes error - irreducible for a given feature set regardless of classification scheme E68 SAPR - Dan Ellis L3 - Pattern Classification Gaussian models Easiest way to model distributions is via parametric model - assume known form, estimate a few parameters Gaussian model is simple & useful: in -D: pω ( i ep πσ i normalization to make it sum to -- µ i σ i Parameters mean µ i and variance σ i fit P M P M e / p( ω i µ i σ i E68 SAPR - Dan Ellis L3 - Pattern Classification

13 Gaussians in d dimensions: p( ω i ( π d Ú Σ i ep -- ( µ i T Σ i ( µ i Described by d dimensional mean vector µ i and d d covariance matri Σ i argma ω i Classify by maimizing log likelihood i.e. -- ( µ i T Σ i ( µ i -- log Σ i + logpr( ω i E68 SAPR - Dan Ellis L3 - Pattern Classification Multivariate Gaussian covariance Eigen analysis of inverse of covariance gives Σ i ( SV T ( SV with diagonal S Λ Ú Gaus n ~ ep -- ( µ i T ( SV T ( SV ( µ i i.e. (SV transforms into uncorrelated, normalized-variance space space SV space v SV v'svv ( µ Mahalanobis distance i T Σ i ( µ i - Euclidean distance in normalized space eample... E68 SAPR - Dan Ellis L3 - Pattern Classification

14 Gaussian Miture models (GMMs Single Gaussians cannot model - distributions with multiple modes - distributions with nonlinear correlation What about a weighted sum? i.e. p ( c k pm ( k k where {c k } is a set of weights and { } is a set of Gaussian components pm ( k - can fit anything given enough components Interpretation: Each observation is generated by one of the Gaussians, chosen at random, with priors: c k Pr( m k E68 SAPR - Dan Ellis L3 - Pattern Classification Gaussian mitures ( e.g. nonlinear correlation: resulting surface original data Gaussian components Problem: finding c k and m k parameters - easy if we knew which m k generated each E68 SAPR - Dan Ellis L3 - Pattern Classification

15 Epectation-maimization (EM General procedure for estimating a model when some parameters are unknown - e.g. which component generated a value in a Gaussian miture model Procedure: Iteratively update fied model parameters Θ to maimize Qepected value of log-probability of known training data trn and unknown parameters u: Q( Θ, Θ old Pr u trn, Θ old ( log p( u, trn Θ u - iterative because Pr( u trn, Θ old changes each time we change Θ - can prove this increases data likelihood, hence maimum-likelihood (ML model - local optimum - depends on initialization E68 SAPR - Dan Ellis L3 - Pattern Classification Fitting GMMs with EM Want to find component Gaussians m k plus weights c k Pr( m k { µ k, Σ k } to maimize likelihood of training data trn If we could assign each to a particular m k, estimation would be direct Hence, treat ks (miture indices as unknown, form Q E[ log p(, k Θ ], differentiate with respect to model parameters equations for µ k, Σ k and c k to maimize Q... E68 SAPR - Dan Ellis L3 - Pattern Classification

16 GMM EM update equations: Solve for maimizing Q: pk n, Θ n µ k pk ( n, Θ n pk n, Θ n Σ k n ( n ( ( n µ k ( n µ k T pk ( n, Θ c k --- pk N ( n, Θ n pk ( n, Θ Each involves, fuzzy membership of n to Gaussian k Parameter is just sample average, weighted by fuzzy membership E68 SAPR - Dan Ellis L3 - Pattern Classification GMM eamples Vowel data fit with different miture counts: Gauss logp(-9 6 Gauss logp( Gauss logp( Gauss logp( eample... E68 SAPR - Dan Ellis L3 - Pattern Classification

17 5 Methodological Remarks: : Training and test data A rich model can learn every training eample (overtraining error rate Test data Training data Overfitting training or parameters But, goal is to classify new, unseen data i.e. generalization - sometimes use cross validation set to decide when to stop training For evaluation results to be meaningful: - don t test with training data! - don t train on test data (even indirectly... E68 SAPR - Dan Ellis L3 - Pattern Classification Model compleity More model parameters better fit - more Gaussian miture components - more MLP hidden units More training data can use larger models For fied training set size, there will be some optimal model size (to avoid overfitting: 44 4 More parameters Word Error Rate More training data WER% Optimal parameter/data ratio Constant training time Training set / hours Hidden layer / units 5 WER for PLPN-8k nets vs. net size & training data E68 SAPR - Dan Ellis L3 - Pattern Classification

18 Model combination No single model is always best But different models have different strengths Benefits from combining several models e.g. Pr( ω w k Pr( ω m k k where each Pr( ω m k comes from a different model / feature set and w k Pr( m k estimates probability that model k is correct Should be able to incorporate into one model, but may be easier to combine in practice Trick is to have good estimates for w k : - static, from training data - dynamic, from test data... E68 SAPR - Dan Ellis L3 - Pattern Classification Summary Classification is making discrete (hard decisions Basis is comparison with known eamples - eplicitly, or via a model Probabilistic interpretation for principled decisions Ways to build models: - neural nets can learn posteriors - Gaussians can model pdfs - other approaches... EM allows estimation of comple models Parting thought: pω ( How do you know what s going on? Pr( ω E68 SAPR - Dan Ellis L3 - Pattern Classification

Lecture 3: Pattern Classification

Lecture 3: Pattern Classification EE E6820: Speech & Audio Processing & Recognition Lecture 3: Pattern Classification 1 2 3 4 5 The problem of classification Linear and nonlinear classifiers Probabilistic classification Gaussians, mixtures

More information

Session 1: Pattern Recognition

Session 1: Pattern Recognition Proc. Digital del Continguts Musicals Session 1: Pattern Recognition 1 2 3 4 5 Music Content Analysis Pattern Classification The Statistical Approach Distribution Models Singing Detection Dan Ellis

More information

Lecture 3: Machine learning, classification, and generative models

Lecture 3: Machine learning, classification, and generative models EE E6820: Speech & Audio Processing & Recognition Lecture 3: Machine learning, classification, and generative models 1 Classification 2 Generative models 3 Gaussian models Michael Mandel

More information

Pattern Recognition Applied to Music Signals

Pattern Recognition Applied to Music Signals JHU CLSP Summer School Pattern Recognition Applied to Music Signals 2 3 4 5 Music Content Analysis Classification and Features Statistical Pattern Recognition Gaussian Mixtures and Neural Nets Singing

More information

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012 Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood

More information

MACHINE LEARNING ADVANCED MACHINE LEARNING

MACHINE LEARNING ADVANCED MACHINE LEARNING MACHINE LEARNING ADVANCED MACHINE LEARNING Recap of Important Notions on Estimation of Probability Density Functions 2 2 MACHINE LEARNING Overview Definition pdf Definition joint, condition, marginal,

More information

Lecture 3: Latent Variables Models and Learning with the EM Algorithm. Sam Roweis. Tuesday July25, 2006 Machine Learning Summer School, Taiwan

Lecture 3: Latent Variables Models and Learning with the EM Algorithm. Sam Roweis. Tuesday July25, 2006 Machine Learning Summer School, Taiwan Lecture 3: Latent Variables Models and Learning with the EM Algorithm Sam Roweis Tuesday July25, 2006 Machine Learning Summer School, Taiwan Latent Variable Models What to do when a variable z is always

More information

L20: MLPs, RBFs and SPR Bayes discriminants and MLPs The role of MLP hidden units Bayes discriminants and RBFs Comparison between MLPs and RBFs

L20: MLPs, RBFs and SPR Bayes discriminants and MLPs The role of MLP hidden units Bayes discriminants and RBFs Comparison between MLPs and RBFs L0: MLPs, RBFs and SPR Bayes discriminants and MLPs The role of MLP hidden units Bayes discriminants and RBFs Comparison between MLPs and RBFs CSCE 666 Pattern Analysis Ricardo Gutierrez-Osuna CSE@TAMU

More information

Midterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas

Midterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric

More information

MACHINE LEARNING ADVANCED MACHINE LEARNING

MACHINE LEARNING ADVANCED MACHINE LEARNING MACHINE LEARNING ADVANCED MACHINE LEARNING Recap of Important Notions on Estimation of Probability Density Functions 22 MACHINE LEARNING Discrete Probabilities Consider two variables and y taking discrete

More information

Advanced statistical methods for data analysis Lecture 2

Advanced statistical methods for data analysis Lecture 2 Advanced statistical methods for data analysis Lecture 2 RHUL Physics www.pp.rhul.ac.uk/~cowan Universität Mainz Klausurtagung des GK Eichtheorien exp. Tests... Bullay/Mosel 15 17 September, 2008 1 Outline

More information

Midterm Review CS 7301: Advanced Machine Learning. Vibhav Gogate The University of Texas at Dallas

Midterm Review CS 7301: Advanced Machine Learning. Vibhav Gogate The University of Texas at Dallas Midterm Review CS 7301: Advanced Machine Learning Vibhav Gogate The University of Texas at Dallas Supervised Learning Issues in supervised learning What makes learning hard Point Estimation: MLE vs Bayesian

More information

Mobile Robot Localization

Mobile Robot Localization Mobile Robot Localization 1 The Problem of Robot Localization Given a map of the environment, how can a robot determine its pose (planar coordinates + orientation)? Two sources of uncertainty: - observations

More information

Machine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io

Machine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

Machine Learning Lecture 3

Machine Learning Lecture 3 Announcements Machine Learning Lecture 3 Eam dates We re in the process of fiing the first eam date Probability Density Estimation II 9.0.207 Eercises The first eercise sheet is available on L2P now First

More information

CMU-Q Lecture 24:

CMU-Q Lecture 24: CMU-Q 15-381 Lecture 24: Supervised Learning 2 Teacher: Gianni A. Di Caro SUPERVISED LEARNING Hypotheses space Hypothesis function Labeled Given Errors Performance criteria Given a collection of input

More information

Machine Learning for Signal Processing Bayes Classification and Regression

Machine Learning for Signal Processing Bayes Classification and Regression Machine Learning for Signal Processing Bayes Classification and Regression Instructor: Bhiksha Raj 11755/18797 1 Recap: KNN A very effective and simple way of performing classification Simple model: For

More information

Notes on Discriminant Functions and Optimal Classification

Notes on Discriminant Functions and Optimal Classification Notes on Discriminant Functions and Optimal Classification Padhraic Smyth, Department of Computer Science University of California, Irvine c 2017 1 Discriminant Functions Consider a classification problem

More information

Artificial Neural Networks 2

Artificial Neural Networks 2 CSC2515 Machine Learning Sam Roweis Artificial Neural s 2 We saw neural nets for classification. Same idea for regression. ANNs are just adaptive basis regression machines of the form: y k = j w kj σ(b

More information

Machine Learning Lecture 2

Machine Learning Lecture 2 Machine Perceptual Learning and Sensory Summer Augmented 6 Computing Announcements Machine Learning Lecture 2 Course webpage http://www.vision.rwth-aachen.de/teaching/ Slides will be made available on

More information

Mining Classification Knowledge

Mining Classification Knowledge Mining Classification Knowledge Remarks on NonSymbolic Methods JERZY STEFANOWSKI Institute of Computing Sciences, Poznań University of Technology COST Doctoral School, Troina 2008 Outline 1. Bayesian classification

More information

Engineering Part IIB: Module 4F10 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers

Engineering Part IIB: Module 4F10 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers Engineering Part IIB: Module 4F0 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers Phil Woodland: pcw@eng.cam.ac.uk Michaelmas 202 Engineering Part IIB:

More information

Machine Learning Lecture 2

Machine Learning Lecture 2 Announcements Machine Learning Lecture 2 Eceptional number of lecture participants this year Current count: 449 participants This is very nice, but it stretches our resources to their limits Probability

More information

Lecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions

Lecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions DD2431 Autumn, 2014 1 2 3 Classification with Probability Distributions Estimation Theory Classification in the last lecture we assumed we new: P(y) Prior P(x y) Lielihood x2 x features y {ω 1,..., ω K

More information

> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2016 BASEL. Logistic Regression. Pattern Recognition 2016 Sandro Schönborn University of Basel

> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2016 BASEL. Logistic Regression. Pattern Recognition 2016 Sandro Schönborn University of Basel Logistic Regression Pattern Recognition 2016 Sandro Schönborn University of Basel Two Worlds: Probabilistic & Algorithmic We have seen two conceptual approaches to classification: data class density estimation

More information

6.867 Machine learning

6.867 Machine learning 6.867 Machine learning Mid-term eam October 8, 6 ( points) Your name and MIT ID: .5.5 y.5 y.5 a).5.5 b).5.5.5.5 y.5 y.5 c).5.5 d).5.5 Figure : Plots of linear regression results with different types of

More information

University of Cambridge Engineering Part IIB Module 3F3: Signal and Pattern Processing Handout 2:. The Multivariate Gaussian & Decision Boundaries

University of Cambridge Engineering Part IIB Module 3F3: Signal and Pattern Processing Handout 2:. The Multivariate Gaussian & Decision Boundaries University of Cambridge Engineering Part IIB Module 3F3: Signal and Pattern Processing Handout :. The Multivariate Gaussian & Decision Boundaries..15.1.5 1 8 6 6 8 1 Mark Gales mjfg@eng.cam.ac.uk Lent

More information

L11: Pattern recognition principles

L11: Pattern recognition principles L11: Pattern recognition principles Bayesian decision theory Statistical classifiers Dimensionality reduction Clustering This lecture is partly based on [Huang, Acero and Hon, 2001, ch. 4] Introduction

More information

Machine Learning Basics III

Machine Learning Basics III Machine Learning Basics III Benjamin Roth CIS LMU München Benjamin Roth (CIS LMU München) Machine Learning Basics III 1 / 62 Outline 1 Classification Logistic Regression 2 Gradient Based Optimization Gradient

More information

EEL 851: Biometrics. An Overview of Statistical Pattern Recognition EEL 851 1

EEL 851: Biometrics. An Overview of Statistical Pattern Recognition EEL 851 1 EEL 851: Biometrics An Overview of Statistical Pattern Recognition EEL 851 1 Outline Introduction Pattern Feature Noise Example Problem Analysis Segmentation Feature Extraction Classification Design Cycle

More information

Linear, threshold units. Linear Discriminant Functions and Support Vector Machines. Biometrics CSE 190 Lecture 11. X i : inputs W i : weights

Linear, threshold units. Linear Discriminant Functions and Support Vector Machines. Biometrics CSE 190 Lecture 11. X i : inputs W i : weights Linear Discriminant Functions and Support Vector Machines Linear, threshold units CSE19, Winter 11 Biometrics CSE 19 Lecture 11 1 X i : inputs W i : weights θ : threshold 3 4 5 1 6 7 Courtesy of University

More information

Clustering. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 8, / 26

Clustering. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 8, / 26 Clustering Professor Ameet Talwalkar Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, 217 1 / 26 Outline 1 Administration 2 Review of last lecture 3 Clustering Professor Ameet Talwalkar

More information

Where now? Machine Learning and Bayesian Inference

Where now? Machine Learning and Bayesian Inference Machine Learning and Bayesian Inference Dr Sean Holden Computer Laboratory, Room FC6 Telephone etension 67 Email: sbh@clcamacuk wwwclcamacuk/ sbh/ Where now? There are some simple take-home messages from

More information

NONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition

NONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function

More information

Algorithmisches Lernen/Machine Learning

Algorithmisches Lernen/Machine Learning Algorithmisches Lernen/Machine Learning Part 1: Stefan Wermter Introduction Connectionist Learning (e.g. Neural Networks) Decision-Trees, Genetic Algorithms Part 2: Norman Hendrich Support-Vector Machines

More information

Chapter 3: Maximum-Likelihood & Bayesian Parameter Estimation (part 1)

Chapter 3: Maximum-Likelihood & Bayesian Parameter Estimation (part 1) HW 1 due today Parameter Estimation Biometrics CSE 190 Lecture 7 Today s lecture was on the blackboard. These slides are an alternative presentation of the material. CSE190, Winter10 CSE190, Winter10 Chapter

More information

p(x ω i 0.4 ω 2 ω

p(x ω i 0.4 ω 2 ω p( ω i ). ω.3.. 9 3 FIGURE.. Hypothetical class-conditional probability density functions show the probability density of measuring a particular feature value given the pattern is in category ω i.if represents

More information

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 The Generative Model POV We think of the data as being generated from some process. We assume

More information

Classification: The rest of the story

Classification: The rest of the story U NIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN CS598 Machine Learning for Signal Processing Classification: The rest of the story 3 October 2017 Today s lecture Important things we haven t covered yet Fisher

More information

Engineering Part IIB: Module 4F10 Statistical Pattern Processing Lecture 6: Multi-Layer Perceptrons I

Engineering Part IIB: Module 4F10 Statistical Pattern Processing Lecture 6: Multi-Layer Perceptrons I Engineering Part IIB: Module 4F10 Statistical Pattern Processing Lecture 6: Multi-Layer Perceptrons I Phil Woodland: pcw@eng.cam.ac.uk Michaelmas 2012 Engineering Part IIB: Module 4F10 Introduction In

More information

Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a

Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a Some slides are due to Christopher Bishop Limitations of K-means Hard assignments of data points to clusters small shift of a

More information

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016 Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier

More information

Machine Learning Practice Page 2 of 2 10/28/13

Machine Learning Practice Page 2 of 2 10/28/13 Machine Learning 10-701 Practice Page 2 of 2 10/28/13 1. True or False Please give an explanation for your answer, this is worth 1 pt/question. (a) (2 points) No classifier can do better than a naive Bayes

More information

Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project

Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Devin Cornell & Sushruth Sastry May 2015 1 Abstract In this article, we explore

More information

Machine Learning 4771

Machine Learning 4771 Machine Learning 4771 Instructor: Tony Jebara Topic 7 Unsupervised Learning Statistical Perspective Probability Models Discrete & Continuous: Gaussian, Bernoulli, Multinomial Maimum Likelihood Logistic

More information

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and

More information

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized

More information

Lecture 4: Probabilistic Learning

Lecture 4: Probabilistic Learning DD2431 Autumn, 2015 1 Maximum Likelihood Methods Maximum A Posteriori Methods Bayesian methods 2 Classification vs Clustering Heuristic Example: K-means Expectation Maximization 3 Maximum Likelihood Methods

More information

Linear Discriminant Functions

Linear Discriminant Functions Linear Discriminant Functions Linear discriminant functions and decision surfaces Definition It is a function that is a linear combination of the components of g() = t + 0 () here is the eight vector and

More information

Machine Learning (CS 567) Lecture 5

Machine Learning (CS 567) Lecture 5 Machine Learning (CS 567) Lecture 5 Time: T-Th 5:00pm - 6:20pm Location: GFS 118 Instructor: Sofus A. Macskassy (macskass@usc.edu) Office: SAL 216 Office hours: by appointment Teaching assistant: Cheol

More information

COM336: Neural Computing

COM336: Neural Computing COM336: Neural Computing http://www.dcs.shef.ac.uk/ sjr/com336/ Lecture 2: Density Estimation Steve Renals Department of Computer Science University of Sheffield Sheffield S1 4DP UK email: s.renals@dcs.shef.ac.uk

More information

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18 CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$

More information

Introduction to Machine Learning. Introduction to ML - TAU 2016/7 1

Introduction to Machine Learning. Introduction to ML - TAU 2016/7 1 Introduction to Machine Learning Introduction to ML - TAU 2016/7 1 Course Administration Lecturers: Amir Globerson (gamir@post.tau.ac.il) Yishay Mansour (Mansour@tau.ac.il) Teaching Assistance: Regev Schweiger

More information

Bayesian Inference of Noise Levels in Regression

Bayesian Inference of Noise Levels in Regression Bayesian Inference of Noise Levels in Regression Christopher M. Bishop Microsoft Research, 7 J. J. Thomson Avenue, Cambridge, CB FB, U.K. cmbishop@microsoft.com http://research.microsoft.com/ cmbishop

More information

Linear & nonlinear classifiers

Linear & nonlinear classifiers Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table

More information

CSCI-567: Machine Learning (Spring 2019)

CSCI-567: Machine Learning (Spring 2019) CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 March 19, 2019 1 / 43 Administration March 19, 2019 2 / 43 Administration TA3 is due this week March

More information

Machine Learning Lecture 5

Machine Learning Lecture 5 Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory

More information

Expectation Maximization Deconvolution Algorithm

Expectation Maximization Deconvolution Algorithm Epectation Maimization Deconvolution Algorithm Miaomiao ZHANG March 30, 2011 Abstract In this paper, we use a general mathematical and eperimental methodology to analyze image deconvolution. The main procedure

More information

Computer Vision Group Prof. Daniel Cremers. 2. Regression (cont.)

Computer Vision Group Prof. Daniel Cremers. 2. Regression (cont.) Prof. Daniel Cremers 2. Regression (cont.) Regression with MLE (Rep.) Assume that y is affected by Gaussian noise : t = f(x, w)+ where Thus, we have p(t x, w, )=N (t; f(x, w), 2 ) 2 Maximum A-Posteriori

More information

What Do Neural Networks Do? MLP Lecture 3 Multi-layer networks 1

What Do Neural Networks Do? MLP Lecture 3 Multi-layer networks 1 What Do Neural Networks Do? MLP Lecture 3 Multi-layer networks 1 Multi-layer networks Steve Renals Machine Learning Practical MLP Lecture 3 7 October 2015 MLP Lecture 3 Multi-layer networks 2 What Do Single

More information

Machine Learning 2017

Machine Learning 2017 Machine Learning 2017 Volker Roth Department of Mathematics & Computer Science University of Basel 21st March 2017 Volker Roth (University of Basel) Machine Learning 2017 21st March 2017 1 / 41 Section

More information

Serious limitations of (single-layer) perceptrons: Cannot learn non-linearly separable tasks. Cannot approximate (learn) non-linear functions

Serious limitations of (single-layer) perceptrons: Cannot learn non-linearly separable tasks. Cannot approximate (learn) non-linear functions BACK-PROPAGATION NETWORKS Serious limitations of (single-layer) perceptrons: Cannot learn non-linearly separable tasks Cannot approximate (learn) non-linear functions Difficult (if not impossible) to design

More information

STA 414/2104, Spring 2014, Practice Problem Set #1

STA 414/2104, Spring 2014, Practice Problem Set #1 STA 44/4, Spring 4, Practice Problem Set # Note: these problems are not for credit, and not to be handed in Question : Consider a classification problem in which there are two real-valued inputs, and,

More information

Logistic Regression. Machine Learning Fall 2018

Logistic Regression. Machine Learning Fall 2018 Logistic Regression Machine Learning Fall 2018 1 Where are e? We have seen the folloing ideas Linear models Learning as loss minimization Bayesian learning criteria (MAP and MLE estimation) The Naïve Bayes

More information

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted

More information

University of Cambridge Engineering Part IIB Module 4F10: Statistical Pattern Processing Handout 2: Multivariate Gaussians

University of Cambridge Engineering Part IIB Module 4F10: Statistical Pattern Processing Handout 2: Multivariate Gaussians Engineering Part IIB: Module F Statistical Pattern Processing University of Cambridge Engineering Part IIB Module F: Statistical Pattern Processing Handout : Multivariate Gaussians. Generative Model Decision

More information

MIRA, SVM, k-nn. Lirong Xia

MIRA, SVM, k-nn. Lirong Xia MIRA, SVM, k-nn Lirong Xia Linear Classifiers (perceptrons) Inputs are feature values Each feature has a weight Sum is the activation activation w If the activation is: Positive: output +1 Negative, output

More information

Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines

Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2018 CS 551, Fall

More information

DEPARTMENT OF COMPUTER SCIENCE Autumn Semester MACHINE LEARNING AND ADAPTIVE INTELLIGENCE

DEPARTMENT OF COMPUTER SCIENCE Autumn Semester MACHINE LEARNING AND ADAPTIVE INTELLIGENCE Data Provided: None DEPARTMENT OF COMPUTER SCIENCE Autumn Semester 203 204 MACHINE LEARNING AND ADAPTIVE INTELLIGENCE 2 hours Answer THREE of the four questions. All questions carry equal weight. Figures

More information

Statistical learning. Chapter 20, Sections 1 4 1

Statistical learning. Chapter 20, Sections 1 4 1 Statistical learning Chapter 20, Sections 1 4 Chapter 20, Sections 1 4 1 Outline Bayesian learning Maximum a posteriori and maximum likelihood learning Bayes net learning ML parameter learning with complete

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Bayesian Learning. Tobias Scheffer, Niels Landwehr

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Bayesian Learning. Tobias Scheffer, Niels Landwehr Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning Tobias Scheffer, Niels Landwehr Remember: Normal Distribution Distribution over x. Density function with parameters

More information

Bayesian Approach 2. CSC412 Probabilistic Learning & Reasoning

Bayesian Approach 2. CSC412 Probabilistic Learning & Reasoning CSC412 Probabilistic Learning & Reasoning Lecture 12: Bayesian Parameter Estimation February 27, 2006 Sam Roweis Bayesian Approach 2 The Bayesian programme (after Rev. Thomas Bayes) treats all unnown quantities

More information

CSC 411: Lecture 09: Naive Bayes

CSC 411: Lecture 09: Naive Bayes CSC 411: Lecture 09: Naive Bayes Class based on Raquel Urtasun & Rich Zemel s lectures Sanja Fidler University of Toronto Feb 8, 2015 Urtasun, Zemel, Fidler (UofT) CSC 411: 09-Naive Bayes Feb 8, 2015 1

More information

Bayes Decision Theory

Bayes Decision Theory Bayes Decision Theory Minimum-Error-Rate Classification Classifiers, Discriminant Functions and Decision Surfaces The Normal Density 0 Minimum-Error-Rate Classification Actions are decisions on classes

More information

2 Statistical Estimation: Basic Concepts

2 Statistical Estimation: Basic Concepts Technion Israel Institute of Technology, Department of Electrical Engineering Estimation and Identification in Dynamical Systems (048825) Lecture Notes, Fall 2009, Prof. N. Shimkin 2 Statistical Estimation:

More information

Naïve Bayes classification

Naïve Bayes classification Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss

More information

Introduction to Support Vector Machines

Introduction to Support Vector Machines Introduction to Support Vector Machines Hsuan-Tien Lin Learning Systems Group, California Institute of Technology Talk in NTU EE/CS Speech Lab, November 16, 2005 H.-T. Lin (Learning Systems Group) Introduction

More information

Classification for High Dimensional Problems Using Bayesian Neural Networks and Dirichlet Diffusion Trees

Classification for High Dimensional Problems Using Bayesian Neural Networks and Dirichlet Diffusion Trees Classification for High Dimensional Problems Using Bayesian Neural Networks and Dirichlet Diffusion Trees Rafdord M. Neal and Jianguo Zhang Presented by Jiwen Li Feb 2, 2006 Outline Bayesian view of feature

More information

Brief Introduction of Machine Learning Techniques for Content Analysis

Brief Introduction of Machine Learning Techniques for Content Analysis 1 Brief Introduction of Machine Learning Techniques for Content Analysis Wei-Ta Chu 2008/11/20 Outline 2 Overview Gaussian Mixture Model (GMM) Hidden Markov Model (HMM) Support Vector Machine (SVM) Overview

More information

Metric-based classifiers. Nuno Vasconcelos UCSD

Metric-based classifiers. Nuno Vasconcelos UCSD Metric-based classifiers Nuno Vasconcelos UCSD Statistical learning goal: given a function f. y f and a collection of eample data-points, learn what the function f. is. this is called training. two major

More information

Bayes Decision Theory - I

Bayes Decision Theory - I Bayes Decision Theory - I Nuno Vasconcelos (Ken Kreutz-Delgado) UCSD Statistical Learning from Data Goal: Given a relationship between a feature vector and a vector y, and iid data samples ( i,y i ), find

More information

Mining Classification Knowledge

Mining Classification Knowledge Mining Classification Knowledge Remarks on NonSymbolic Methods JERZY STEFANOWSKI Institute of Computing Sciences, Poznań University of Technology SE lecture revision 2013 Outline 1. Bayesian classification

More information

6. Vector Random Variables

6. Vector Random Variables 6. Vector Random Variables In the previous chapter we presented methods for dealing with two random variables. In this chapter we etend these methods to the case of n random variables in the following

More information

Statistical Methods in Particle Physics

Statistical Methods in Particle Physics Statistical Methods in Particle Physics 8. Multivariate Analysis Prof. Dr. Klaus Reygers (lectures) Dr. Sebastian Neubert (tutorials) Heidelberg University WS 2017/18 Multi-Variate Classification Consider

More information

The classifier. Theorem. where the min is over all possible classifiers. To calculate the Bayes classifier/bayes risk, we need to know

The classifier. Theorem. where the min is over all possible classifiers. To calculate the Bayes classifier/bayes risk, we need to know The Bayes classifier Theorem The classifier satisfies where the min is over all possible classifiers. To calculate the Bayes classifier/bayes risk, we need to know Alternatively, since the maximum it is

More information

The classifier. Linear discriminant analysis (LDA) Example. Challenges for LDA

The classifier. Linear discriminant analysis (LDA) Example. Challenges for LDA The Bayes classifier Linear discriminant analysis (LDA) Theorem The classifier satisfies In linear discriminant analysis (LDA), we make the (strong) assumption that where the min is over all possible classifiers.

More information

The Bayes classifier

The Bayes classifier The Bayes classifier Consider where is a random vector in is a random variable (depending on ) Let be a classifier with probability of error/risk given by The Bayes classifier (denoted ) is the optimal

More information

Intro. ANN & Fuzzy Systems. Lecture 15. Pattern Classification (I): Statistical Formulation

Intro. ANN & Fuzzy Systems. Lecture 15. Pattern Classification (I): Statistical Formulation Lecture 15. Pattern Classification (I): Statistical Formulation Outline Statistical Pattern Recognition Maximum Posterior Probability (MAP) Classifier Maximum Likelihood (ML) Classifier K-Nearest Neighbor

More information

Multi-layer Neural Networks

Multi-layer Neural Networks Multi-layer Neural Networks Steve Renals Informatics 2B Learning and Data Lecture 13 8 March 2011 Informatics 2B: Learning and Data Lecture 13 Multi-layer Neural Networks 1 Overview Multi-layer neural

More information

Lecture : Probabilistic Machine Learning

Lecture : Probabilistic Machine Learning Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning

More information

Artificial Neural Networks" and Nonparametric Methods" CMPSCI 383 Nov 17, 2011!

Artificial Neural Networks and Nonparametric Methods CMPSCI 383 Nov 17, 2011! Artificial Neural Networks" and Nonparametric Methods" CMPSCI 383 Nov 17, 2011! 1 Todayʼs lecture" How the brain works (!)! Artificial neural networks! Perceptrons! Multilayer feed-forward networks! Error

More information

CSE446: Clustering and EM Spring 2017

CSE446: Clustering and EM Spring 2017 CSE446: Clustering and EM Spring 2017 Ali Farhadi Slides adapted from Carlos Guestrin, Dan Klein, and Luke Zettlemoyer Clustering systems: Unsupervised learning Clustering Detect patterns in unlabeled

More information

PATTERN CLASSIFICATION

PATTERN CLASSIFICATION PATTERN CLASSIFICATION Second Edition Richard O. Duda Peter E. Hart David G. Stork A Wiley-lnterscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim Brisbane Singapore Toronto CONTENTS

More information

Regression with Input-Dependent Noise: A Bayesian Treatment

Regression with Input-Dependent Noise: A Bayesian Treatment Regression with Input-Dependent oise: A Bayesian Treatment Christopher M. Bishop C.M.BishopGaston.ac.uk Cazhaow S. Qazaz qazazcsgaston.ac.uk eural Computing Research Group Aston University, Birmingham,

More information

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation. CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.

More information

MRC: The Maximum Rejection Classifier for Pattern Detection. With Michael Elad, Renato Keshet

MRC: The Maximum Rejection Classifier for Pattern Detection. With Michael Elad, Renato Keshet MRC: The Maimum Rejection Classifier for Pattern Detection With Michael Elad, Renato Keshet 1 The Problem Pattern Detection: Given a pattern that is subjected to a particular type of variation, detect

More information