Lecture 4: Probabilistic Learning


1 DD2431 Autumn, 2015

2 Outline: 1. Maximum Likelihood Methods, Maximum A Posteriori Methods, Bayesian methods; 2. Classification vs Clustering, Heuristic Example: K-means, Expectation Maximization; 3. Model Selection and Overfitting.

3 Classification with Probability Distributions. Given features $x$ and a class $y \in \{y_1, \ldots, y_K\}$, classify by $\hat{k} = \arg\max_k P(y_k \mid x) = \arg\max_k P(y_k)\,P(x \mid y_k)$.
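As a concrete illustration of the rule on slide 3, here is a minimal NumPy sketch of Bayes classification with class-conditional Gaussians; all priors and parameters below are made-up numbers, not values from the lecture.

```python
import numpy as np

# Pick the class k that maximizes P(y_k) * P(x | y_k); the evidence P(x) is
# the same for every class, so it can be dropped from the arg max.

def gaussian_pdf(x, mu, var):
    """Univariate Gaussian density N(x | mu, var)."""
    return np.exp(-(x - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

priors = np.array([0.5, 0.3, 0.2])   # P(y_k), k = 1..3 (hypothetical)
means  = np.array([-2.0, 0.0, 3.0])  # class-conditional means (hypothetical)
varis  = np.array([1.0, 0.5, 2.0])   # class-conditional variances (hypothetical)

def classify(x):
    scores = priors * gaussian_pdf(x, means, varis)   # prior * likelihood per class
    return int(np.argmax(scores))                     # index of the predicted class

print(classify(-1.5))   # -> 0 (closest to the first class)
print(classify(2.5))    # -> 2
```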

4 Estimation Theory. In the last lecture we assumed we knew the prior $P(y)$, the likelihood $P(x \mid y)$, and the evidence $P(x)$, and we used them to compute the posterior $P(y \mid x)$. How can we obtain this information from observations (data)? Estimation theory / learning.

5 Parametric vs Non-Parametric Estimation. Parametric and non-parametric approaches; Bayesian non-parametric methods integrate out the parameters.

6 Assumption #1: Class Independence. Samples from class $i$ do not influence the estimate for class $j$, $i \neq j$. Generative vs discriminative models.

7 Parameter estimation (cont.). Under the class independence assumption, each distribution is a likelihood of the form $P(x \mid \theta_i)$ for class $i$; in the following we drop the class index and write $P(x \mid \theta)$.

8 Assumption #2: i.i.d. Samples from each class are independent and identically distributed: $D = \{x_1, \ldots, x_N\}$. The likelihood of the whole data set factorizes as $P(D \mid \theta) = P(x_1, \ldots, x_N \mid \theta) = \prod_{i=1}^N P(x_i \mid \theta)$, and the log-likelihood becomes $\log P(D \mid \theta) = \sum_{i=1}^N \log P(x_i \mid \theta)$.
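A small sketch of the i.i.d. factorization on slide 8, assuming a univariate Gaussian model and simulated data: the data log-likelihood is just the sum of per-sample log densities.

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.normal(loc=1.0, scale=2.0, size=100)      # D = {x_1, ..., x_N}, simulated

def log_gaussian(x, mu, var):
    return -0.5 * np.log(2.0 * np.pi * var) - (x - mu) ** 2 / (2.0 * var)

def log_likelihood(D, mu, var):
    # log P(D | theta) = sum_i log P(x_i | theta) under the i.i.d. assumption
    return np.sum(log_gaussian(D, mu, var))

print(log_likelihood(D, mu=1.0, var=4.0))   # parameters close to the truth
print(log_likelihood(D, mu=5.0, var=4.0))   # worse parameters -> lower value
```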

9 Three Approaches: Maximum Likelihood (ML), Maximum A Posteriori (MAP), Bayesian.

10 Maximum likelihood estimation: illustration. Find the parameter vector $\hat{\theta}$ that maximizes $P(D \mid \theta)$ with $D = \{x_1, \ldots, x_n\}$ (figure: likelihood and log-likelihood as functions of $\theta$).

11 Maximum likelihood estimation: illustration (cont.). 1. Estimate the optimal parameters of the model.

12 Maximum likelihood estimation: illustration (cont.). 1. Estimate the optimal parameters of the model; 2. evaluate the predictive distribution $P(x \mid \hat{\theta})$ on new data points.

13 ML estimation of the Gaussian mean. $\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$, with $\theta = \{\mu, \sigma^2\}$. Log-likelihood of the data (i.i.d. samples): $\log P(D \mid \theta) = \sum_{i=1}^N \log \mathcal{N}(x_i \mid \mu, \sigma^2) = -N \log(\sqrt{2\pi}\,\sigma) - \sum_{i=1}^N \frac{(x_i - \mu)^2}{2\sigma^2}$. Setting $0 = \frac{d \log P(D \mid \theta)}{d\mu} = \sum_{i=1}^N \frac{x_i - \mu}{\sigma^2} = \frac{\sum_{i=1}^N x_i - N\mu}{\sigma^2}$ gives $\hat{\mu} = \frac{1}{N} \sum_{i=1}^N x_i$.

14 ML estimation of Gaussian parameters: $\hat{\mu} = \frac{1}{N} \sum_{i=1}^N x_i$, $\hat{\sigma}^2 = \frac{1}{N} \sum_{i=1}^N (x_i - \hat{\mu})^2$. Same result as minimizing the sum of squared errors, but here we make the assumptions explicit.
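A minimal NumPy sketch of the ML estimates on slide 14, on simulated data: the estimates are the sample mean and the 1/N sample variance.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=3.0, scale=1.5, size=500)   # simulated data, for illustration

mu_ml  = x.mean()                       # hat{mu} = (1/N) sum_i x_i
var_ml = np.mean((x - mu_ml) ** 2)      # hat{sigma}^2 = (1/N) sum_i (x_i - hat{mu})^2

print(mu_ml, var_ml)    # should be close to 3.0 and 1.5**2 = 2.25
```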

15 Problem: few data points. 10 repetitions with 5 points each (figure).

16 Problem: few data points (cont.). 10 repetitions with 5 points each (figure).

17 Maximum a Posteriori Estimation. $\hat{\mu}, \hat{\sigma}^2 = \arg\max_{\mu, \sigma^2} \left[\prod_{i=1}^N P(x_i \mid \mu, \sigma^2)\right] P(\mu, \sigma^2)$, where the prior $P(\mu, \sigma^2)$ needs a nice mathematical form for a closed-form solution: $\hat{\mu}_{MAP} = \frac{N}{N+\gamma}\,\hat{\mu}_{ML} + \frac{\gamma}{N+\gamma}\,\delta$, $\hat{\sigma}^2_{MAP} = \frac{N}{N-\alpha}\,\hat{\sigma}^2_{ML} + \frac{2\beta + \gamma(\delta - \hat{\mu}_{MAP})^2}{N-\alpha}$, where $\alpha, \beta, \gamma, \delta$ are parameters of the prior distribution.
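A sketch of the MAP mean estimate as reconstructed above, with arbitrary (hypothetical) prior parameters gamma and delta; it shows how the ML estimate is shrunk towards the prior mean when there are few data points.

```python
import numpy as np

# Assumed closed form from slide 17:
#   mu_MAP = N/(N+gamma) * mu_ML + gamma/(N+gamma) * delta,
# where delta is the prior mean and gamma controls the prior strength.

rng = np.random.default_rng(2)
x = rng.normal(loc=2.0, scale=1.0, size=5)   # few data points, as on slide 15

gamma, delta = 10.0, 0.0                     # hypothetical prior parameters
N = len(x)
mu_ml  = x.mean()
mu_map = N / (N + gamma) * mu_ml + gamma / (N + gamma) * delta

print(mu_ml, mu_map)   # the MAP estimate is pulled from mu_ML towards delta
```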

18 ML, MAP and Point Estimates. Both ML and MAP produce point estimates of $\theta$. Assumption: there is a true value for $\theta$. Advantage: once $\hat{\theta}$ is found, everything is known.

19 Bayesian estimation. Consider $\theta$ as a random variable and characterize it with the posterior distribution $P(\theta \mid D)$ given the data. ML: $D \rightarrow \hat{\theta}_{ML}$; MAP: $D, P(\theta) \rightarrow \hat{\theta}_{MAP}$; Bayes: $D, P(\theta) \rightarrow P(\theta \mid D)$. For new data points, instead of $P(x_{new} \mid \hat{\theta}_{ML})$ or $P(x_{new} \mid \hat{\theta}_{MAP})$, compute $P(x_{new} \mid D) = \int_{\theta \in \Theta} P(x_{new} \mid \theta)\,P(\theta \mid D)\,d\theta$.

20 Bayesian estimation (cont.). We can compute $P(x \mid D)$ instead of $P(x \mid \hat{\theta})$ by integrating the joint density $P(x, \theta \mid D) = P(x \mid \theta)\,P(\theta \mid D)$.

21 Bayesian estimation (cont.). We can compute $P(x \mid D)$ instead of $P(x \mid \hat{\theta})$ by integrating the joint density $P(x, \theta \mid D) = P(x \mid \theta)\,P(\theta \mid D)$: $P(x \mid D) = \int P(x \mid \theta)\,P(\theta \mid D)\,d\theta$ (figure: the joint distribution and its integral over $\theta$, built up over slides 21-23).

24 Bayesian estimation (cont.). Pros: better use of the data; makes a priori assumptions explicit; can be implemented recursively (if there is a conjugate prior, use the posterior $P(\theta \mid D)$ as the new prior); reduces overfitting. Cons: the definition of noninformative priors can be tricky; often requires numerical integration; not widely accepted by traditional (frequentist) statistics.
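A minimal sketch of the recursive update mentioned among the pros, assuming the simplest conjugate case: a Gaussian likelihood with known variance and a Gaussian prior on the mean. After each observation the posterior is used as the new prior; all numbers are illustrative.

```python
import numpy as np

sigma2 = 1.0                 # known observation noise variance
mu, tau2 = 0.0, 100.0        # broad Gaussian prior N(mu, tau2) on the unknown mean

rng = np.random.default_rng(3)
data = rng.normal(loc=4.0, scale=np.sqrt(sigma2), size=20)   # simulated data

for x in data:
    # Standard conjugate update for one observation:
    # precisions add up, the mean is a precision-weighted average.
    precision = 1.0 / tau2 + 1.0 / sigma2
    mu = (mu / tau2 + x / sigma2) / precision
    tau2 = 1.0 / precision   # posterior becomes the prior for the next point

print(mu, tau2)   # posterior mean near 4.0, posterior variance shrinks roughly as sigma2/N
```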

25 Clustering vs Classification (figures: a labelled data set for classification and an unlabelled one for clustering).

26 Fitting complex distributions. We can try to fit a mixture of $K$ distributions, $P(x \mid \theta) = \sum_{k=1}^K \pi_k P(x \mid \theta_k)$, with $\theta = \{\pi_1, \ldots, \pi_K, \theta_1, \ldots, \theta_K\}$. Problem: we do not know which point has been generated by which component of the mixture, so we cannot optimize $P(x \mid \theta)$ directly.

27 Expectation Maximization. Fitting model parameters with missing (latent) variables: $P(x \mid \theta) = \sum_{k=1}^K \pi_k P(x \mid \theta_k)$, with $\theta = \{\pi_1, \ldots, \pi_K, \theta_1, \ldots, \theta_K\}$. A very general idea (applies to many different probabilistic models): augment the data with the missing variables $h_{ik}$, the probability that each data point $x_i$ was generated by each component $k$ of the mixture, and optimize the likelihood of the complete data, $P(x, h \mid \theta)$.

28 Heuristic Example: K-means. Describes each class with a centroid; a point belongs to a class if the corresponding centroid is the closest (Euclidean distance); iterative procedure guaranteed to converge, but not guaranteed to find the optimal solution; used in vector quantization (since the 1950s).

29 K-means: algorithm (a NumPy sketch follows below)
    Data: k (number of desired clusters), n data points x_i
    Result: k clusters
    initialization: assign initial values to the k centroids c_j;
    repeat
        assign each point x_i to the closest centroid c_j;
        compute new centroids as the mean of each group of points;
    until the centroids do not change;
    return k clusters;
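A straightforward NumPy translation of the pseudocode above; the initialization and the toy data are arbitrary, and empty-cluster handling is omitted for brevity.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # initialization: pick k random data points as centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # assign each point to the closest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # recompute centroids as the mean of each group of points
        # (empty-cluster handling omitted in this sketch)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):   # centroids do not change
            break
        centroids = new_centroids
    return labels, centroids

# toy data: two well-separated blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])
labels, centroids = kmeans(X, k=2)
print(centroids)     # roughly [0, 0] and [6, 6], in some order
```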

30 K-means: example (figure: iteration 20, update clusters).

31 K-means: sensitivity to initial conditions (figure: iteration 20, update clusters).

32 K-means: limits of the Euclidean distance. The Euclidean distance is isotropic (the same in all directions in $\mathbb{R}^p$); this favours spherical clusters; the size of the clusters is controlled by their distance.

33 K-means: non-spherical classes (figure: two non-spherical classes).

34 Expectation Maximization (recap). Fitting model parameters with missing (latent) variables: $P(x \mid \theta) = \sum_{k=1}^K \pi_k P(x \mid \theta_k)$, with $\theta = \{\pi_1, \ldots, \pi_K, \theta_1, \ldots, \theta_K\}$. A very general idea (applies to many different probabilistic models): augment the data with the missing variables $h_{ik}$, the probability of assignment of each data point $x_i$ to each component $k$ of the mixture, and optimize the likelihood of the complete data, $P(x, h \mid \theta)$.

35 Mixture of Gaussians. This distribution is a weighted sum of $K$ Gaussian distributions, $P(x) = \sum_{k=1}^K \pi_k \mathcal{N}(x; \mu_k, \sigma_k^2)$, where $\pi_1 + \cdots + \pi_K = 1$ and $\pi_k > 0$ ($k = 1, \ldots, K$). This model can describe complex multi-modal probability distributions by combining simpler distributions. (Figure 7.6 from Prince, Computer Vision: models, learning and inference: a complex multimodal density in 1D created as a weighted sum, or mixture, of several constituent normal distributions with different means and variances; to ensure that the final distribution is a valid density, the weights must be positive and sum to one.)
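A short sketch that evaluates the 1D mixture density on slide 35 with hypothetical weights, means, and variances, and checks numerically that it is a valid density.

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

pis   = np.array([0.3, 0.5, 0.2])      # mixing weights (positive, sum to 1)
mus   = np.array([-3.0, 0.0, 4.0])     # component means (hypothetical)
varis = np.array([1.0, 0.5, 2.0])      # component variances (hypothetical)

def mixture_pdf(x):
    # P(x) = sum_k pi_k N(x; mu_k, sigma_k^2); x can be a scalar or a 1D array
    return np.sum(pis * gaussian_pdf(np.atleast_1d(x)[:, None], mus, varis), axis=1)

xs = np.linspace(-8, 10, 1000)
p = mixture_pdf(xs)
print((p * (xs[1] - xs[0])).sum())   # ~1.0: a valid (multi-modal) density
```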

36 Mixture of Gaussians. $P(x) = \sum_{k=1}^K \pi_k \mathcal{N}(x; \mu_k, \sigma_k^2)$. Learning the parameters of this model from training data $x_1, \ldots, x_n$ is not trivial with the usual, straightforward maximum likelihood approach. Instead, the parameters are learned with the Expectation-Maximization (EM) algorithm.

37 Mixture of Gaussians as a marginalization. We can interpret the Mixture of Gaussians model by introducing a discrete hidden/latent variable $h$ and a joint distribution $P(x, h)$: $P(x) = \sum_{k=1}^K P(x, h = k) = \sum_{k=1}^K P(x \mid h = k)\,P(h = k) = \sum_{k=1}^K \pi_k \mathcal{N}(x; \mu_k, \sigma_k^2)$. (Figure 7.7 from Prince, Computer Vision: models, learning and inference: the mixture of Gaussians can also be thought of in terms of a joint distribution Pr(x, h) between the observed variable x and a discrete hidden variable h; to create the mixture density we marginalize over h; the hidden variable is simply the index of the constituent normal distribution.)

38 EM for two Gaussians. Assume: we know that the pdf of $x$ has the form $P(x) = \pi_1 \mathcal{N}(x; \mu_1, \sigma_1^2) + \pi_2 \mathcal{N}(x; \mu_2, \sigma_2^2)$, where $\pi_1 + \pi_2 = 1$ and $\pi_k > 0$ for components $k = 1, 2$. Unknown: the values of the parameters (many!) $\Theta = (\pi_1, \mu_1, \sigma_1, \mu_2, \sigma_2)$. Have: $n$ observed samples $x_1, \ldots, x_n$ drawn from $P(x)$. Want to: estimate $\Theta$ from $x_1, \ldots, x_n$. How would it be possible to get them all?

39 EM for two Gaussians. For each sample $x_i$ introduce a hidden variable $h_i$, with $h_i = 1$ if sample $x_i$ was drawn from $\mathcal{N}(x; \mu_1, \sigma_1^2)$ and $h_i = 2$ if it was drawn from $\mathcal{N}(x; \mu_2, \sigma_2^2)$, and come up with initial values for each of the parameters, $\Theta^{(0)} = (\pi_1^{(0)}, \mu_1^{(0)}, \sigma_1^{(0)}, \mu_2^{(0)}, \sigma_2^{(0)})$. EM is an iterative algorithm which updates $\Theta^{(t)}$ using the following two steps...

40 EM for two Gaussians: E-step. The responsibility of the $k$-th Gaussian for each sample $x$ (indicated by the size of the projected data point); look at each sample $x$ along the hidden variable $h$ in the E-step. (Figure 7.8 from Prince, Computer Vision: models, learning and inference: for each data point $x_i$ we calculate the posterior distribution $Pr(h_i \mid x_i)$ over the hidden variable $h_i$; the posterior probability $Pr(h_i = k \mid x_i)$ can be understood as the responsibility of normal distribution $k$ for data point $x_i$.)

41 EM for two Gaussians: E-step (cont.). E-step: compute the posterior probability (responsibility) that $x_i$ was generated by component $k$, given the current estimate of the parameters $\Theta^{(t)}$, for $i = 1, \ldots, n$ and $k = 1, 2$:
$\gamma_{ik}^{(t)} = P(h_i = k \mid x_i, \Theta^{(t)}) = \frac{\pi_k^{(t)}\,\mathcal{N}(x_i; \mu_k^{(t)}, \sigma_k^{(t)})}{\pi_1^{(t)}\,\mathcal{N}(x_i; \mu_1^{(t)}, \sigma_1^{(t)}) + \pi_2^{(t)}\,\mathcal{N}(x_i; \mu_2^{(t)}, \sigma_2^{(t)})}$
Note: $\gamma_{i1}^{(t)} + \gamma_{i2}^{(t)} = 1$ and $\pi_1 + \pi_2 = 1$.
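A minimal NumPy version of the E-step formula above for two components; the current parameter values playing the role of Theta^(t) are hypothetical.

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

pis   = np.array([0.4, 0.6])    # current mixing weights (hypothetical)
mus   = np.array([-1.0, 2.0])   # current means (hypothetical)
varis = np.array([1.0, 1.0])    # current variances (hypothetical)

def e_step(x):
    # gamma_ik = pi_k N(x_i; ...) / sum_j pi_j N(x_i; ...); returns an (n, 2) array
    weighted = pis * gaussian_pdf(x[:, None], mus, varis)
    return weighted / weighted.sum(axis=1, keepdims=True)   # rows sum to 1

x = np.array([-1.5, 0.5, 2.5])
print(e_step(x))   # each row: [gamma_i1, gamma_i2]
```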

42 EM for two Gaussians: M-step. Fit the Gaussian model for each $k$-th constituent; sample $x_i$ contributes according to the responsibility $\gamma_{ik}$ (dashed and solid lines show the fit before and after the update); look at the samples $x$ for each value of $h$ in the M-step. (Figure 7.9 from Prince, Computer Vision: models, learning and inference: for the $k$-th constituent Gaussian we update the parameters $\{\lambda_k, \mu_k, \Sigma_k\}$; the $i$-th data point $x_i$ contributes to these updates according to the responsibility $r_{ik}$, indicated by the size of the point, assigned in the E-step.)

43 EM for two Gaussians: M-step (cont.). M-step: compute the maximum-likelihood estimates of the parameters of the mixture model given our data's membership distribution, i.e. the responsibilities $\gamma_{ik}^{(t)}$, for $k = 1, 2$:
$\mu_k^{(t+1)} = \frac{\sum_{i=1}^n \gamma_{ik}^{(t)} x_i}{\sum_{i=1}^n \gamma_{ik}^{(t)}}$, $\left(\sigma_k^{(t+1)}\right)^2 = \frac{\sum_{i=1}^n \gamma_{ik}^{(t)} (x_i - \mu_k^{(t+1)})^2}{\sum_{i=1}^n \gamma_{ik}^{(t)}}$, $\pi_k^{(t+1)} = \frac{\sum_{i=1}^n \gamma_{ik}^{(t)}}{n}$.
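Putting the E- and M-steps together, a self-contained sketch of the full EM loop for two Gaussians on simulated data; the initial values are arbitrary and no numerical safeguards are included.

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

# simulated data: 30% from N(-2, 1), 70% from N(3, 1.5^2)
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.5, 700)])

# initial guesses Theta^(0), chosen arbitrarily
pis   = np.array([0.5, 0.5])
mus   = np.array([-1.0, 1.0])
varis = np.array([1.0, 1.0])

for _ in range(200):
    # E-step: responsibilities gamma_ik (n x 2), each row sums to one
    weighted = pis * gaussian_pdf(x[:, None], mus, varis)
    gamma = weighted / weighted.sum(axis=1, keepdims=True)

    # M-step: responsibility-weighted ML updates (slide 43)
    Nk        = gamma.sum(axis=0)                              # effective counts
    mus_new   = (gamma * x[:, None]).sum(axis=0) / Nk          # weighted means
    varis_new = (gamma * (x[:, None] - mus_new) ** 2).sum(axis=0) / Nk
    pis_new   = Nk / len(x)

    if np.allclose(mus_new, mus) and np.allclose(varis_new, varis):
        break
    mus, varis, pis = mus_new, varis_new, pis_new

print(pis, mus, varis)   # roughly [0.3, 0.7], [-2, 3], [1, 2.25]
```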

44 EM in practice. (Figure 7.10 from Prince, Computer Vision: models, learning and inference: a) the initial model; b) E-step: for each data point, the posterior probability that it was generated from each Gaussian is calculated (indicated by the colour of the point); c) M-step: the mean, variance, and weight of each Gaussian are updated based on these posterior probabilities.)

45 EM properties. Similar to K-means: guaranteed to find a local maximum of the complete-data likelihood; somewhat sensitive to initial conditions. Better than K-means: Gaussian distributions can model clusters with different shapes; all data points are smoothly used to update all parameters.

46 Model Selection and Overfitting (figure).

47 Overfitting (figures: two fits $f(x)$ of the same data).

48 Overfitting: Phoneme Discrimination

49 Occam's Razor. Choose the simplest explanation for the observed data. Important factors: the number of model parameters, the number of data points, and the model fit to the data.

50 Overfitting and Maximum Likelihood. We can make the likelihood arbitrarily large by increasing the number of parameters.
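A small illustration of this point, using least-squares polynomial fits (the ML solution under Gaussian noise, where higher likelihood corresponds to lower squared error) rather than the lecture's Gaussian examples: on the training data alone, a model with more parameters always fits at least as well.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=len(x))   # small noisy sample

for degree in [1, 3, 5, 9]:
    coeffs = np.polyfit(x, y, degree)                 # least-squares / ML fit
    sse = np.sum((np.polyval(coeffs, x) - y) ** 2)
    print(degree, sse)    # training error keeps shrinking as parameters are added
```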

51 Occam's Razor and Bayesian Learning. Remember that $P(x_{new} \mid D) = \int_{\theta \in \Theta} P(x_{new} \mid \theta)\,P(\theta \mid D)\,d\theta$. Intuition: more complex models fit the data very well (large $P(D \mid \theta)$), but only for small regions of the parameter space $\Theta$.

52 Summary: 1. Maximum Likelihood Methods, Maximum A Posteriori Methods, Bayesian methods; 2. Classification vs Clustering, Heuristic Example: K-means, Expectation Maximization; 3. Model Selection and Overfitting. If you are interested in learning more, take a look at: C. M. Bishop, Pattern Recognition and Machine Learning, Springer Verlag, 2006.
