MACHINE LEARNING ADVANCED MACHINE LEARNING


1 MACHINE LEARNING ADVANCED MACHINE LEARNING Recap of Important Notions on Estimation of Probability Density Functions

2 MACHINE LEARNING Overview: definition of a pdf; definitions of joint, conditional, marginal and likelihood; uni-dimensional and multi-dimensional Gauss function; decomposition in eigenvalues; estimation of density with expectation-maximization; use of Bayes' rule for classification.

3 MACHINE LEARNING Discrete Probabilities. Consider two variables x and y taking discrete values over the intervals $[1 \dots N_x]$ and $[1 \dots N_y]$ respectively. $P(x = i)$: the probability that the variable x takes value i. $0 \le P(x = i)$, $i = 1, \dots, N_x$, and $\sum_{i=1}^{N_x} P(x = i) = 1$. Idem for $P(y = j)$, $j = 1, \dots, N_y$.

4 MACHINE LEARNING Probability Distributions, Density Functions. $p(x)$, a continuous function, is the probability density function or probability distribution function (PDF) (sometimes also called probability distribution or simply density) of the variable x. $p(x) \ge 0$, $\int p(x)\,dx = 1$.

5 MACHINE LEARNING PDF: Example. The uni-dimensional Gaussian or Normal distribution is a pdf given by: $p(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$, $\mu$: mean, $\sigma^2$: variance. The Gaussian function is entirely determined by its mean and variance. For this reason, it is referred to as a parametric distribution. Illustrations from Wikipedia.
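A minimal sketch of this formula in Python (assuming numpy and scipy are available; the helper name and the test values are made up for illustration, and scipy.stats.norm is used only as a cross-check):

```python
import numpy as np
from scipy.stats import norm

def gaussian_pdf(x, mu, sigma):
    """Uni-dimensional Gaussian pdf, written out from the formula above."""
    return np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) / (sigma * np.sqrt(2.0 * np.pi))

x = np.linspace(-4.0, 4.0, 9)
mu, sigma = 0.0, 1.0
# The hand-written formula agrees with scipy's reference implementation.
assert np.allclose(gaussian_pdf(x, mu, sigma), norm.pdf(x, loc=mu, scale=sigma))
```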

6 MACHINE LEARNING Expectation. The expectation of the random variable x, with probability P(x) (in the discrete case) or pdf p(x) (in the continuous case), also called the expected value or mean, is the mean of the observed values of x weighted by P(x) or p(x). If X is the set of values taken by x, then: when x takes discrete values, $\mu = E\{x\} = \sum_{x \in X} x\, P(x)$; for continuous distributions, $\mu = E\{x\} = \int_X x\, p(x)\, dx$.

7 MACHINE LEARNING Variance. $\sigma^2$, the variance of a distribution, measures the amount of spread of the distribution around its mean: $\sigma^2 = \mathrm{Var}(x) = E\{(x - \mu)^2\} = E\{x^2\} - (E\{x\})^2$. $\sigma$ is the standard deviation of x.
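A short numerical check of these two definitions (a sketch assuming numpy; the distribution parameters are made up): the sample mean and the mean squared deviation estimate $E\{x\}$ and $\mathrm{Var}(x)$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 1.5
samples = rng.normal(mu, sigma, size=100_000)

# Empirical counterparts of the expectation and variance definitions above:
mu_hat = samples.mean()                      # E{x} estimated by the sample mean
var_hat = np.mean((samples - mu_hat) ** 2)   # E{(x - mu)^2}

print(mu_hat, var_hat)   # close to 2.0 and 1.5**2 = 2.25
```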

8 MACHINE LEARNING Mean and Variance in the Gauss PDF. ~68% of the data are comprised between +/- 1 sigma, ~96% of the data between +/- 2 sigmas, ~99% of the data between +/- 3 sigmas. This is no longer true for arbitrary pdfs! Illustrations from Wikipedia.

9 MACHINE LEARNING Mean and Variance in Mixture of Gauss Functions. (Figure: three Gaussian distributions with sigma = 1, and the resulting distribution when superposing the 3 Gaussian distributions, f = 1/3 (f1 + f2 + f3), with its expectation marked.) For pdfs other than the Gaussian distribution, the variance represents a notion of dispersion around the expected value.

10 MACHINE LEARNING 2012 Multi-dimensional Gaussian Function. The uni-dimensional Gaussian or Normal distribution is a distribution with pdf given by: $p(x; \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$, $\mu$: mean, $\sigma^2$: variance. The multi-dimensional Gaussian or Normal distribution has a pdf given by: $p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{N/2} |\Sigma|^{1/2}} e^{-\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu)}$. If x is N-dimensional, then $\mu$ is an N-dimensional mean vector and $\Sigma$ is an $N \times N$ covariance matrix.

11 MACHINE LEARNING 2-dimensional Gaussian Pdf $p(x_1, x_2)$. $p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{N/2} |\Sigma|^{1/2}} e^{-\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu)}$. Isolines: $p(x) = \text{cst}$. If x is N-dimensional, then $\mu$ is an N-dimensional mean vector and $\Sigma$ is an $N \times N$ covariance matrix.

12 MACHINE LEARNING 2012 Modeling Joint Pdf with a Gaussian Function. Construct the covariance matrix from the (centered) set of datapoints $X = \{x^i\}_{i=1}^{M}$: $\Sigma = \frac{1}{M} X X^T$. $p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{N/2} |\Sigma|^{1/2}} e^{-\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu)}$. If x is N-dimensional, then $\mu$ is an N-dimensional mean vector and $\Sigma$ is an $N \times N$ covariance matrix.
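A sketch of this construction in Python (assuming numpy and scipy; the dataset and its parameters are made up for illustration). Note the $1/M$ normalization used on the slide, whereas numpy's np.cov defaults to $1/(M-1)$:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
M = 5_000
true_mu = np.array([1.0, -2.0])
true_cov = np.array([[2.0, 0.8],
                     [0.8, 1.0]])
X = rng.multivariate_normal(true_mu, true_cov, size=M)   # M x N data matrix

mu_hat = X.mean(axis=0)
Xc = X - mu_hat                       # centered datapoints
Sigma_hat = (Xc.T @ Xc) / M           # Sigma = (1/M) X X^T, as on the slide

# Evaluate the fitted joint pdf p(x; mu, Sigma) at a query point.
p = multivariate_normal(mean=mu_hat, cov=Sigma_hat).pdf([1.0, -2.0])
print(Sigma_hat, p)
```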

13 MACHINE LEARNING 2012 Modeling Joint Pdf with a Gaussian Function. $\Sigma$ is square and symmetric. It can be decomposed using the eigenvalue decomposition: $\Sigma = V \Lambda V^T$, with $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_N)$. Construct the covariance matrix from the (centered) set of M datapoints $X = \{x^i\}_{i=1}^{M}$: $\Sigma = \frac{1}{M} X X^T$. V: matrix of eigenvectors, $\Lambda$: diagonal matrix composed of the eigenvalues. For the 1-std ellipse (in two dimensions), the axes' lengths along the 1st and 2nd eigenvectors are equal to $\sqrt{\lambda_1}$ and $\sqrt{\lambda_2}$, with $\Sigma = V \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} V^T$. Each isoline corresponds to a scaling of the 1-std ellipse.
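A small numerical illustration of this decomposition (a sketch assuming numpy; the covariance matrix is made up):

```python
import numpy as np

Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

# Sigma is symmetric, so eigh returns real eigenvalues and orthonormal eigenvectors.
eigvals, V = np.linalg.eigh(Sigma)          # Sigma = V diag(eigvals) V^T
assert np.allclose(V @ np.diag(eigvals) @ V.T, Sigma)

# Semi-axis lengths of the 1-std isoline ellipse along each eigenvector.
axis_lengths = np.sqrt(eigvals)
print(eigvals, axis_lengths)
```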

14 MACHINE LEARNING 2012 Fitting a single Gauss function and PCA. Principal Component Analysis (PCA) identifies a suitable representation of a multivariate data set by de-correlating the dataset. When projected onto the eigenvectors $e^1$ and $e^2$, the set of datapoints appears to follow two uncorrelated Normal distributions: $p((e^1)^T X) \sim N((e^1)^T \mu, \lambda_1)$ and $p((e^2)^T X) \sim N((e^2)^T \mu, \lambda_2)$.

15 MACHINE LEARNING 2012 Marginal Pdf. Consider two random variables x and y with joint distribution $p(x, y)$; then the marginal probability of x is given by: $p(x) = \int p(x, y)\, dy$. (Lower indices are dropped for clarity of notation.) $p(x)$ is a Gaussian when $p(x, y)$ is also a Gaussian. To compute the marginal, one needs the joint distribution $p(x, y)$. Often, one does not know it and one can only estimate it. If x is a multidimensional variable, the marginal is itself a joint distribution!

16 MACHINE LEARNING Joint vs Marginal and the Curse of Dimensionality. The joint distribution $p(x, y)$ is richer than the marginals $p(x)$, $p(y)$. In the discrete case: estimating the marginals of N variables, each taking K values, requires estimating $N \cdot K$ probabilities; estimating the joint distribution corresponds to computing $\sim K^N$ probabilities.

17 MACHINE LEARNING 2012 Marginal & Conditional Pdf. Consider two random variables x and y with joint distribution $p(x, y)$; then the marginal probability of x is given by: $p(x) = \int p(x, y)\, dy$. The conditional probability of y given x is given by: $p(y \mid x) = \frac{p(x, y)}{p(x)}$. Computing the conditional from the joint distribution requires estimating the marginal of one of the two variables. If you need only the conditional, estimate it directly!

18 MACHINE LEARNING Joint vs Conditional: Example. Here we care about predicting y = hitting angle given x = position, but not the converse. One can estimate $p(y \mid x)$ directly and use this to predict $\hat{y} \sim E\{p(y \mid x)\}$.

19 MACHINE LEARNING Joint Distribution Estimation and Curse of Dimensionality. Pros of computing the joint distribution: it provides the statistical dependencies across all variables and the marginal distributions. Cons: the computational costs grow exponentially with the number of dimensions (statistical power: ~10 samples to estimate each parameter of a model). Compute solely the conditional if you care only about dependencies across variables.

20 MACHINE LEARNING 2012 The conditional and marginal pdfs of a multi-dimensional Gauss function are all Gauss functions! (Figure: a 2-dimensional Gaussian with mean $\mu$, together with the marginal density of x and the marginal density of y.) Illustrations from Wikipedia.
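This property can be made concrete with the standard Gaussian conditioning formulas, which the slide does not spell out: for a bivariate Gaussian, $p(y \mid x)$ is Gaussian with mean $\mu_y + \Sigma_{yx}\Sigma_{xx}^{-1}(x - \mu_x)$ and variance $\Sigma_{yy} - \Sigma_{yx}\Sigma_{xx}^{-1}\Sigma_{xy}$, and the marginal $p(y)$ is $N(\mu_y, \Sigma_{yy})$. A sketch (assuming numpy; the numbers are made up):

```python
import numpy as np

# Joint Gaussian over (x, y): mean vector and 2x2 covariance.
mu = np.array([0.0, 1.0])                 # [mu_x, mu_y]
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])            # [[S_xx, S_xy], [S_yx, S_yy]]

def conditional_y_given_x(x):
    """Mean and variance of p(y | x) from the Gaussian conditioning formulas."""
    mean = mu[1] + Sigma[1, 0] / Sigma[0, 0] * (x - mu[0])
    var = Sigma[1, 1] - Sigma[1, 0] * Sigma[0, 1] / Sigma[0, 0]
    return mean, var

print(conditional_y_given_x(0.5))
# The marginal p(y) is simply N(mu_y, S_yy) = N(1.0, 2.0).
```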

21 MACHINE LEARNING 2012 PDF Equivalency with Discrete Probability. The cumulative distribution function (or simply distribution function) of x is: $D(x^*) = P(x \le x^*)$, $D(x^*) = \int_{-\infty}^{x^*} p(x)\, dx$. $p(x)\, dx$ ~ probability of x falling within an infinitesimal interval $[x, x + dx]$.

22 MACHINE LEARNING 2012 PDF Equivalency with Discrete Probability. (Figure: uniform distribution on x, with pdf $p(x)$ and cumulative distribution $D(x^*)$.) The probability that x takes a value in the subinterval $[a, b]$ is given by: $P(x \le b) := D(b) = \int_{-\infty}^{b} p(x)\, dx$, $P(a \le x \le b) = D(b) - D(a)$, $P(a \le x \le b) = \int_{a}^{b} p(x)\, dx$.
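A sketch of this interval probability in Python (assuming scipy; the interval and the standard-normal choice are made up for illustration), using the cumulative distribution $D$:

```python
from scipy.stats import norm

a, b = -1.0, 2.0
D = norm(loc=0.0, scale=1.0).cdf    # cumulative distribution D(x) of a standard Gaussian

# P(a <= x <= b) = D(b) - D(a)
print(D(b) - D(a))                  # ~0.8186 for the standard normal
```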

23 MACHINE LEARNING 2012 PDF Equivalency with Discrete Probability. (Figure: a Gaussian pdf and its cumulative distribution.)

24 MACHINE LEARNING Conditional and Bayes' Rule. Consider the joint probability of two variables x and y: $p(x, y)$. The conditional probability is computed as follows: $p(y \mid x) = \frac{p(x, y)}{p(x)}$. Bayes' theorem: $p(y \mid x) = \frac{p(x \mid y)\, p(y)}{p(x)}$.

25 MACHINE LEARNING Example of use of Bayes' rule in ML. Naïve Bayes classification can be used to determine the likelihood of the class label. Bayes' rule for binary classification: x has class label +1 if $p(y = +1 \mid x) > \delta$, $\delta \ge 0$; otherwise x takes class label -1. (Figure: Gaussian fct $P(y = +1 \mid x)$ with threshold $\delta$ over the interval $[x_{\min}, x_{\max}]$.)

26 MACHINE LEARNING The probability that x takes a value in the subinterval $[x_{\min}, x_{\max}]$ is given by: $P(x_{\min} \le x \le x_{\max}) = D(x_{\max}) - D(x_{\min})$. (Figure: cumulative distribution $D(x)$ and Gaussian fct $P(y = +1 \mid x)$ with threshold $\delta$ over the interval $[x_{\min}, x_{\max}]$.)

27 MACHINE LEARNING 2012 Example of use of Bayes' rule in ML. (Figure: Gaussian fct and cumulative distribution; probability of x within the interval defined by the cutoff.)

28 MACHINE LEARNING 2012 Example of use of Bayes' rule in ML. (Figure: Gaussian fct and cumulative distribution; probability of x within the interval defined by a cutoff of 0.35.)

29 MACHINE LEARNING 2012 Example of use of Bayes' rule in ML. (Figure: Gaussian fct and cumulative distribution; probability of x within the interval defined by a smaller cutoff.) The choice of cut-off in Bayes' rule corresponds to some probability of seeing an event, e.g. of x belonging to class y = 1. This is true only when the class follows a Gauss distribution!

30 MACHINE LEARNING Example of use of Bayes' rule in ML. Naïve Bayes classification can be used to determine the likelihood of the class label. Here, an example for binary classification. The decision rule is: x has class label y = +1 if $p(y = +1 \mid x) \ge p(y = -1 \mid x)$. The boundary is found at the intersection between the isolines. Each class is modeled with one Gauss distribution: $p(x \mid y) \sim N(x; \mu, \sigma)$.
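A minimal sketch of this decision rule in Python (assuming numpy and scipy; the two class distributions, the helper name classify and the test points are made up for illustration, and equal priors are used to match the rule above):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
# Two classes, each modeled with one (uni-dimensional) Gauss distribution.
x_pos = rng.normal(+2.0, 1.0, size=500)
x_neg = rng.normal(-1.0, 1.5, size=500)

# Fit class-conditional Gaussians p(x | y = +1) and p(x | y = -1) by maximum likelihood.
mu_p, sd_p = x_pos.mean(), x_pos.std()
mu_n, sd_n = x_neg.mean(), x_neg.std()

def classify(x, prior_pos=0.5):
    """Label +1 if p(x|y=+1) p(y=+1) >= p(x|y=-1) p(y=-1), else -1."""
    score_pos = norm.pdf(x, mu_p, sd_p) * prior_pos
    score_neg = norm.pdf(x, mu_n, sd_n) * (1.0 - prior_pos)
    return np.where(score_pos >= score_neg, +1, -1)

print(classify(np.array([-2.0, 0.3, 3.0])))
```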

31 MACHINE LEARNING Example of use of Bayes' rule in ML. One can also add a prior on either of the marginals $p(y = +1)$ and $p(y = -1)$ to change the boundary: $p(y = +1 \mid x) = \frac{p(x \mid y = +1)\, p(y = +1)}{p(x)}$ and $p(y = -1 \mid x) = \frac{p(x \mid y = -1)\, p(y = -1)}{p(x)}$. $p(y = +1)$ and $p(y = -1)$ are usually estimated from data. If not estimated, one can set a prior on either marginal, e.g. give more importance to one class by increasing its $p(y)$; changing this ratio changes the boundary.

32 MACHINE LEARNING Example of use of Bayes' rule in ML. One can also add a prior on either of the marginals $p(y = +1)$ and $p(y = -1)$ to change the boundary: $p(y = +1 \mid x) = \frac{p(x \mid y = +1)\, p(y = +1)}{p(x)}$ and $p(y = -1 \mid x) = \frac{p(x \mid y = -1)\, p(y = -1)}{p(x)}$. $p(y = +1)$ and $p(y = -1)$ are usually estimated from data. If not estimated, one can set a prior on either marginal, e.g. give more importance to one class by increasing its $p(y)$; changing this ratio changes the boundary. The ratio $\delta = p(y = +1 \mid x) / p(y = -1 \mid x)$ determines the boundary.

33 MACHINE LEARNING The ROC (Receiver Operating Characteristic) curve determines the influence of the threshold. The ratio $\delta = p(y = +1 \mid x) / p(y = -1 \mid x)$ determines the boundary. Compute the false and true positives for each value of $\delta$ in this interval. True Positives (TP): number of datapoints of class 1 that are correctly classified. False Positives (FP): number of datapoints of class 2 that are incorrectly classified.
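A sketch of such a threshold sweep in Python (assuming numpy and scipy; the two class distributions, the helper name ratio and the list of thresholds are made up for illustration):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
# Class 1 (positives) and class 2 (negatives), each modeled by one Gaussian.
x1 = rng.normal(+2.0, 1.0, size=1000)
x2 = rng.normal(-1.0, 1.5, size=1000)

def ratio(x, mu1=2.0, sd1=1.0, mu2=-1.0, sd2=1.5):
    """Score compared to delta: p(x | class 1) / p(x | class 2), equal priors assumed."""
    return norm.pdf(x, mu1, sd1) / norm.pdf(x, mu2, sd2)

# Sweep the threshold delta and record true/false positive rates (one ROC point each).
for delta in [0.1, 0.5, 1.0, 2.0, 10.0]:
    tp_rate = np.mean(ratio(x1) > delta)   # class-1 points correctly classified
    fp_rate = np.mean(ratio(x2) > delta)   # class-2 points incorrectly classified
    print(f"delta={delta:5.1f}  TPR={tp_rate:.2f}  FPR={fp_rate:.2f}")
```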

34 MACHINE LEARNING Likelihood Function. The likelihood function (or likelihood for short) determines the probability density of observing the set $X = \{x^i\}_{i=1}^{M}$ of M datapoints, if each datapoint has been generated by the pdf $p(x)$: $L(X) = p(x^1, x^2, \dots, x^M)$. If the data are independently and identically distributed (i.i.d.), then: $L(X) = \prod_{i=1}^{M} p(x^i)$. The likelihood can be used to determine how well a particular choice of density $p(x)$ models the data at hand.

35 MACHINE LEARNING Likelihood of Gaussian Pdf Parametrization. Consider that the pdf of the dataset X is parametrized with parameters $\mu, \Sigma$. One writes: $p(X; \mu, \Sigma)$ or $p(X \mid \mu, \Sigma)$. The likelihood function of the model parameters is given by: $L(\mu, \Sigma \mid X) := p(X; \mu, \Sigma)$. It measures the probability of observing X if the distribution of X is parametrized with $\mu, \Sigma$. If all datapoints are identically and independently distributed (i.i.d.): $L(\mu, \Sigma \mid X) = \prod_{i=1}^{M} p(x^i; \mu, \Sigma)$. To determine the best fit, search for the parameters that maximize the likelihood.

36 MACHINE LEARNING Likelihood Function: Exercise. (Figure: a dataset, shown as the fraction of observations of x, together with four candidate models and their likelihood values.) Estimate the likelihood of each of the 4 models. Determine which one is optimal.

37 MACHINE LEARNING Maximum Likelihood Optimization. The principle of maximum likelihood consists of finding the optimal parameters of a given distribution by maximizing the likelihood function of these parameters, or equivalently by maximizing the probability of the data given the model and its parameters. For a multi-variate Gaussian pdf, one can determine the mean and covariance matrix by solving: $\max_{\mu, \Sigma} L(\mu, \Sigma \mid X) = \max_{\mu, \Sigma} p(X \mid \mu, \Sigma)$, i.e. $\frac{\partial}{\partial \mu} p(X \mid \mu, \Sigma) = 0$ and $\frac{\partial}{\partial \Sigma} p(X \mid \mu, \Sigma) = 0$. Exercise: if p is the Gaussian pdf, then the above has an analytical solution.
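A sketch of that analytical solution in Python (assuming numpy and scipy; the dataset is made up): the maximum-likelihood estimates are the sample mean and the $1/M$-normalized covariance, and the i.i.d. log-likelihood can be evaluated under the fitted parameters.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(4)
X = rng.multivariate_normal([0.5, -1.0], [[1.0, 0.3], [0.3, 0.5]], size=2000)
M = X.shape[0]

# Analytical maximum-likelihood solution for a multivariate Gaussian:
mu_ml = X.mean(axis=0)                      # sample mean
Sigma_ml = (X - mu_ml).T @ (X - mu_ml) / M  # 1/M normalization (the ML estimate)

# Log-likelihood of the data under the fitted parameters (i.i.d. assumption).
loglik = multivariate_normal(mu_ml, Sigma_ml).logpdf(X).sum()
print(mu_ml, Sigma_ml, loglik)
```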

38 MACHINE LEARNING Expectation-Maximization (E-M). EM is used when no closed-form solution exists for the maximum likelihood. Example: fit a Mixture of Gaussian Functions: $p(x; \Theta) = \sum_{k=1}^{K} \alpha_k\, p(x; \mu^k, \Sigma^k)$, with $\Theta = \{\mu^1, \Sigma^1, \dots, \mu^K, \Sigma^K\}$ and $\alpha_k \in [0, 1]$: a linear weighted combination of K Gaussian functions. Example of a uni-dimensional mixture of 3 Gaussian functions plotted on the right: f = 1/3 (f1 + f2 + f3).

39 MACHINE LEARNING Expectation-Maximization (E-M). EM is used when no closed-form solution exists for the maximum likelihood. Example: fit a Mixture of Gaussian Functions: $p(x; \Theta) = \sum_{k=1}^{K} \alpha_k\, p(x; \mu^k, \Sigma^k)$, with $\Theta = \{\mu^1, \Sigma^1, \dots, \mu^K, \Sigma^K\}$ and $\alpha_k \in [0, 1]$: a linear weighted combination of K Gaussian functions. There is no closed-form solution to: $\max_{\Theta} L(\Theta \mid X) = \max_{\Theta} p(X \mid \Theta)$, i.e. $\max_{\Theta} L(\Theta \mid X) = \max_{\Theta} \prod_{i=1}^{M} \sum_{k=1}^{K} \alpha_k\, p(x^i; \mu^k, \Sigma^k)$. E-M is an iterative procedure to estimate the best set of parameters. It converges to a local optimum and is sensitive to initialization!

40 MACHINE LEARNING Expectation-Maximization (E-M). EM is an iterative procedure: 0) Make a guess: pick a set of $\Theta = \{\mu^1, \Sigma^1, \dots, \mu^K, \Sigma^K\}$ (initialization). 1) Compute the likelihood of the initial model, $L(\hat{\Theta} \mid X, Z)$ (E-Step). 2) Update $\Theta$ by gradient ascent on $L(\Theta \mid X, Z)$. 3) Iterate between steps 1 and 2 until reaching a plateau (no improvement of the likelihood). E-M converges to a local optimum only!
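A minimal uni-dimensional sketch of this procedure (assuming numpy and scipy; the function name, the choice K = 3 and the synthetic data are made up for illustration). Note that here the M-step re-estimates the parameters in closed form from the responsibilities, a common variant of the update described on the slide:

```python
import numpy as np
from scipy.stats import norm

def em_gmm_1d(x, K=3, n_iter=100, seed=0):
    """Minimal E-M for a uni-dimensional mixture of K Gaussians (local optimum only)."""
    rng = np.random.default_rng(seed)
    # 0) Initialization: random means, equal variances, uniform weights.
    mu = rng.choice(x, size=K, replace=False)
    var = np.full(K, x.var())
    alpha = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] = alpha_k p(x_i; mu_k, var_k) / p(x_i)
        dens = np.stack([alpha[k] * norm.pdf(x, mu[k], np.sqrt(var[k]))
                         for k in range(K)], axis=1)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update weights, means and variances from the responsibilities.
        Nk = r.sum(axis=0)
        alpha = Nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / Nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
    loglik = np.log(dens.sum(axis=1)).sum()   # log-likelihood at the last E-step
    return alpha, mu, var, loglik

rng = np.random.default_rng(5)
x = np.concatenate([rng.normal(-3, 1, 300), rng.normal(0, 1, 300), rng.normal(4, 1, 300)])
print(em_gmm_1d(x, K=3))
```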

41 MACHINE LEARNING Expectation-Maximization (E-M). Estimate the joint density $p(x, y)$ across pairs of datapoints using a mixture of two Gauss functions: $p(x, y) = \sum_{i=1}^{K} \alpha_i\, p_i(x, y; \mu^i, \Sigma^i)$, with $p_i(x, y; \mu^i, \Sigma^i) = N(\mu^i, \Sigma^i)$, where $\mu^i, \Sigma^i$ are the mean and covariance matrix of Gaussian i. The parameters are learned through the iterative procedure E-M, starting with a random initialization.

42 MACHINE LEARNING Expectation-Maximization (E-M) Exercise. Demonstrate the non-convexity of the likelihood with an example. E-M converges to a local optimum only!

43 MACHINE LEARNING Expectation-Maximization (E-M) Solution. Assigning Gaussian 1 or Gaussian 2 to either of the two clusters yields the same likelihood: the likelihood function has two equivalent optima. E-M converges to one of the two solutions depending on the initialization.

44 MACHINE LEARNING Determining the number of components to estimate a pdf with a mixture of Gaussians through Maximum Likelihood. Matlab and mldemos examples.

45 APPLIED MACHINE LEARNING Determining the Number of Components: Bayesian Information Criterion: $BIC = -2 \ln L + B \ln(M)$, where L is the maximum likelihood of the model with B parameters and M is the number of datapoints. A lower BIC implies either fewer explanatory variables, a better fit, or both. BIC penalizes too small an increase in likelihood obtained at the cost of too large an increase in the number of parameters (a case of overfitting).
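A sketch of this selection procedure in Python (assuming numpy and scikit-learn; the synthetic data with 3 true components and the range of K are made up for illustration). scikit-learn's GaussianMixture.bic follows the same $-2\ln L + B\ln(M)$ form:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(6)
X = np.concatenate([rng.normal(-3, 1, 300),
                    rng.normal(0, 1, 300),
                    rng.normal(4, 1, 300)]).reshape(-1, 1)

# Fit mixtures with an increasing number of components and keep the lowest BIC.
for k in range(1, 7):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
    print(k, gmm.bic(X))   # the penalty grows with the number of parameters B
```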

46 MACHINE LEARNING Example of use of Bayes' rule in ML. Naïve Bayes classification can be used to determine the likelihood of the class label. Bayes' rule for binary classification: x has class label +1 if $p(y = +1 \mid x) > \delta$, $\delta \ge 0$; otherwise x takes class label -1. (Figure: Gaussian fct $P(y = +1 \mid x)$ with threshold $\delta$.)
