Machine Learning Lecture 2
Announcements
- Course webpage: slides will be made available on the webpage.
- L2P electronic repository: exercises and supplementary materials will be posted on the L2P.
- Please subscribe to the lecture on the Campus system! Important to get announcements and L2P access!

Machine Learning Lecture 2: Probability Density Estimation
Bastian Leibe, RWTH Aachen (leibe@vision.rwth-aachen.de)
Many slides adapted from B. Schiele

Course Outline
- Fundamentals (2 weeks): Bayes Decision Theory; Probability Density Estimation
- Discriminative Approaches (5 weeks): Linear Discriminant Functions; Support Vector Machines; Ensemble Methods & Boosting; Randomized Trees, Forests & Ferns
- Generative Models (4 weeks): Bayesian Networks; Markov Random Fields

Topics of This Lecture
- Recap: Bayes Decision Theory: basic concepts, minimizing the misclassification rate, minimizing the expected loss, discriminant functions
- Probability Density Estimation: general concepts, the Gaussian distribution
- Parametric Methods: Maximum Likelihood approach, Bayesian vs. Frequentist views on probability, Bayesian Learning

Recap: Bayes Decision Theory - Concepts
Concept 1: Priors (a priori probabilities) p(C_k)
What we can tell about the probability before seeing the data. In general, the priors sum to one:

  Σ_k p(C_k) = 1

Concept 2: Conditional probabilities p(x | C_k)
Let x be a feature vector; x measures/describes certain properties of the input, e.g. the number of black pixels or the aspect ratio. p(x | C_k) describes the likelihood of observing x for class C_k. (Figure: example class-conditional densities for two classes.)
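To make concepts 1 and 2 concrete, here is a minimal sketch (not part of the original slides) for a two-class, one-dimensional toy problem. The priors and the Gaussian class models in `priors` and `params` are made-up illustrative values, not numbers from the lecture.

```python
# Minimal sketch: priors p(C_k) and class-conditional densities p(x|C_k)
# for a 1-D, two-class toy problem. All parameter values are assumptions.
import numpy as np

priors = {"C1": 0.7, "C2": 0.3}                  # Concept 1: p(C_k), sums to 1
params = {"C1": (0.0, 1.0), "C2": (2.0, 0.5)}    # (mean, std) per class, made up

def likelihood(x, c):
    """Concept 2: class-conditional density p(x | C_k), here a 1-D Gaussian."""
    mu, sigma = params[c]
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

x = 1.2
for c in priors:
    print(c, "prior =", priors[c], " p(x|C) =", likelihood(x, c))
```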
Recap: Bayes Decision Theory - Concepts
Concept 3: Posterior probabilities
We are typically interested in the a posteriori probability, i.e. the probability of class C_k given the measurement vector x. Bayes' theorem:

  p(C_k | x) = p(x | C_k) p(C_k) / p(x) = p(x | C_k) p(C_k) / Σ_i p(x | C_i) p(C_i)

Interpretation: Posterior = (Likelihood × Prior) / Normalization Factor

(Figure: likelihoods p(x|a) and p(x|b), likelihood × prior, and the resulting posteriors p(a|x) and p(b|x) with the decision boundary.)

Optimal decision rule
Decide for C_1 if p(C_1 | x) > p(C_2 | x). This is equivalent to

  p(x | C_1) p(C_1) > p(x | C_2) p(C_2),

which is again equivalent to the likelihood-ratio test

  p(x | C_1) / p(x | C_2) > p(C_2) / p(C_1),

where the right-hand side acts as the decision threshold. The rule partitions the input space into decision regions R_1, R_2, R_3, ...

Recap: Minimizing the Expected Loss
Example: 2 classes C_1, C_2; 2 decisions α_1, α_2; loss function L(α_j | C_k) = L_kj. The expected loss (= risk R) for the two decisions is

  R(α_j | x) = Σ_k L_kj p(C_k | x).

Goal: decide such that the expected loss is minimized. Decide α_1 if R(α_2 | x) > R(α_1 | x):

  L_12 p(C_1 | x) + L_22 p(C_2 | x) > L_11 p(C_1 | x) + L_21 p(C_2 | x)
  (L_12 − L_11) p(C_1 | x) > (L_21 − L_22) p(C_2 | x)

Using Bayes' theorem, this means: decide α_1 if

  p(x | C_1) / p(x | C_2) > (L_21 − L_22) p(C_2) / ((L_12 − L_11) p(C_1)).

This is the decision rule adapted to take the loss into account.
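The following sketch (not from the slides) runs the likelihood-ratio test with and without a loss matrix. The Gaussian class models and the loss values are illustrative assumptions; the example is chosen so that the loss matrix flips the decision.

```python
# Sketch: likelihood-ratio test for minimum misclassification rate vs.
# minimum expected loss. All model parameters are illustrative assumptions.
import numpy as np

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

p_C1, p_C2 = 0.7, 0.3                  # priors (assumed)
lik1 = lambda x: gauss(x, 0.0, 1.0)    # p(x|C1) (assumed model)
lik2 = lambda x: gauss(x, 2.0, 0.5)    # p(x|C2) (assumed model)

# L[k, j] = L_kj: loss of decision alpha_j when the true class is C_k.
# Here, misclassifying C2 as C1 is made 5x worse than the opposite error.
L = np.array([[0.0, 1.0],
              [5.0, 0.0]])

x = 1.2
ratio = lik1(x) / lik2(x)                                      # p(x|C1)/p(x|C2)
t_error = p_C2 / p_C1                                          # 0/1-loss threshold
t_risk = (L[1, 0] - L[1, 1]) / (L[0, 1] - L[0, 0]) * t_error   # loss-adjusted
print("decide C1 (min. error):", ratio > t_error)   # True
print("decide C1 (min. risk): ", ratio > t_risk)    # False: loss shifts the boundary
```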
The Reject Option
Classification errors arise from regions where the largest posterior probability p(C_k | x) is significantly less than 1. These are the regions where we are relatively uncertain about class membership. For some applications, it may be better to reject the automatic decision entirely in such a case and e.g. consult a human expert.

Discriminant Functions
Formulate classification in terms of comparisons between discriminant functions y_1(x), ..., y_K(x): classify x as class C_k if

  y_k(x) > y_j(x) for all j ≠ k.

Examples (from Bayes decision theory):

  y_k(x) = p(C_k | x)
  y_k(x) = p(x | C_k) p(C_k)
  y_k(x) = log p(x | C_k) + log p(C_k)

(A sketch combining these discriminant functions with the reject option follows at the end of this section.)

Different Views on the Decision Problem
- Generative methods: y_k(x) ∝ p(x | C_k) p(C_k). First determine the class-conditional densities for each class individually and separately infer the prior class probabilities; then use Bayes' theorem to determine class membership.
- Discriminative methods: y_k(x) = p(C_k | x). First solve the inference problem of determining the posterior class probabilities; then use decision theory to assign each new x to its class.
- Alternative: directly find a discriminant function y_k(x) which maps each input directly onto a class label.

Probability Density Estimation
Up to now: Bayes optimal classification, based on the probabilities p(x | C_k) p(C_k). How can we estimate (= learn) those probability densities? In the supervised training case, data and class labels are known, so we estimate the probability density for each class C_k separately: p(x | C_k). (For simplicity of notation, we will drop the class label C_k in the following.)

Given data x_1, x_2, x_3, x_4, ..., estimate p(x). Methods:
- Parametric representations (today)
- Non-parametric representations (lecture 3)
- Mixture models (lecture 4)
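Referring back to the reject option and the discriminant-function examples above, here is a sketch (not from the slides) that picks the class with the largest posterior y_k(x) = p(C_k | x) but rejects when that posterior falls below a threshold. The class models and the threshold value are illustrative assumptions.

```python
# Sketch: maximum-posterior classification with a reject option.
# Class models and the 0.8 threshold are illustrative assumptions.
import numpy as np

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

classes = {"C1": (0.7, 0.0, 1.0), "C2": (0.3, 2.0, 0.5)}   # prior, mean, std

def classify(x, reject_threshold=0.8):
    joint = {c: p * gauss(x, mu, s) for c, (p, mu, s) in classes.items()}
    evidence = sum(joint.values())                          # p(x)
    post = {c: j / evidence for c, j in joint.items()}      # p(C_k | x)
    best = max(post, key=post.get)
    return best if post[best] >= reject_threshold else "reject"

print(classify(-1.0))  # clearly C1
print(classify(1.4))   # near the decision boundary -> "reject"
```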
The Gaussian (or Normal) Distribution
One-dimensional case, with mean μ and variance σ²:

  N(x | μ, σ²) = (1 / √(2πσ²)) exp{−(x − μ)² / (2σ²)}

Multi-dimensional case, with mean μ and covariance Σ:

  N(x | μ, Σ) = (1 / ((2π)^{D/2} |Σ|^{1/2})) exp{−(1/2) (x − μ)ᵀ Σ⁻¹ (x − μ)}

Gaussian Distribution - Properties
Central Limit Theorem: the distribution of the sum of N i.i.d. random variables becomes increasingly Gaussian as N grows. In practice, the convergence to a Gaussian can be very rapid. This makes the Gaussian interesting for many applications. (Example: the sum of N uniform [0,1] random variables.)

Quadratic form: N(x | μ, Σ) depends on x only through the exponent

  Δ² = (x − μ)ᵀ Σ⁻¹ (x − μ),

where Δ is often called the Mahalanobis distance from x to μ.

Shape of the Gaussian: Σ is a real, symmetric matrix. We can therefore decompose it into its eigenvectors,

  Σ = Σ_{i=1}^{D} λ_i u_i u_iᵀ,

and thus obtain constant density on ellipsoids with main directions along the eigenvectors u_i and scaling factors √λ_i.

Special cases:
- Full covariance matrix Σ = [σ_ij]: general ellipsoid shape
- Diagonal covariance matrix Σ = diag{σ_i²}: axis-aligned ellipsoid
- Uniform variance Σ = σ²I: hypersphere

The marginals of a Gaussian are again Gaussians.
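The multivariate density above can be evaluated directly via the Mahalanobis distance. The sketch below (not from the slides) does this for an assumed 2-D example and also shows the eigendecomposition of Σ that determines the ellipsoid shape.

```python
# Sketch: evaluating the multivariate Gaussian via the Mahalanobis distance.
# The example mean and covariance are made up for illustration.
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    D = len(mu)
    diff = x - mu
    maha2 = diff @ np.linalg.solve(Sigma, diff)            # (x-mu)^T Sigma^{-1} (x-mu)
    norm = (2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * maha2) / norm

mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])                             # full covariance: ellipsoid
print(gaussian_pdf(np.array([1.0, -0.5]), mu, Sigma))

lam, U = np.linalg.eigh(Sigma)   # Sigma = sum_i lam_i u_i u_i^T; columns of U are u_i
print("axis directions:\n", U, "\naxis scales:", np.sqrt(lam))
```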
Parametric Methods
Given: data X = {x_1, x_2, ..., x_N} and a parametric form of the distribution with parameters θ; e.g. for a Gaussian distribution, θ = (μ, σ). Learning then means estimating the parameters θ.

Likelihood of θ: the probability that the data X have indeed been generated from a probability density with parameters θ:

  L(θ) = p(X | θ)

Maximum Likelihood Approach
Computation of the likelihood: a single data point contributes p(x_n | θ). Under the assumption that all data points are independent,

  L(θ) = p(X | θ) = Π_{n=1}^{N} p(x_n | θ)

Negative log-likelihood:

  E(θ) = −ln L(θ) = −Σ_{n=1}^{N} ln p(x_n | θ)

Estimation of the parameters θ (learning): maximize the likelihood, or equivalently minimize the negative log-likelihood. We want to obtain θ̂ such that L(θ̂) is maximized.

How do we minimize a function? Take the derivative and set it to zero:

  ∂E/∂θ = −Σ_n (∂p(x_n | θ)/∂θ) / p(x_n | θ) = 0

Log-likelihood for the normal distribution (1-D case), with

  p(x_n | μ, σ) = (1/√(2πσ²)) exp{−‖x_n − μ‖² / (2σ²)}:

  E(μ, σ) = −Σ_n ln p(x_n | μ, σ)

Minimizing with respect to μ, we use

  (∂p(x_n | μ, σ)/∂μ) / p(x_n | μ, σ) = (x_n − μ) / σ²

so that

  ∂E/∂μ = −Σ_n (x_n − μ)/σ² = 0  ⟺  μ̂ = (1/N) Σ_n x_n  (sample mean)

In a similar fashion, we get

  σ̂² = (1/N) Σ_n (x_n − μ̂)²  (sample variance)

θ̂ = (μ̂, σ̂) is the Maximum Likelihood estimate for the parameters of a Gaussian distribution. This is a very important result. Unfortunately, it is wrong...
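The closed-form ML estimates derived above are one line each in code. A small sketch (not from the slides), on synthetic data with assumed true parameters:

```python
# Sketch: closed-form ML estimates for a 1-D Gaussian on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=1.5, scale=2.0, size=1000)   # assumed true mu=1.5, sigma=2.0

mu_ml = X.mean()                                # mu_hat = (1/N) sum_n x_n
var_ml = ((X - mu_ml) ** 2).mean()              # sigma_hat^2 = (1/N) sum_n (x_n - mu_hat)^2
print("mu_ml =", mu_ml, " sigma_ml =", np.sqrt(var_ml))
```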
Maximum Likelihood Approach - Or not wrong, but rather biased
Assume the samples x_1, x_2, ..., x_N come from a true Gaussian distribution with mean μ and variance σ². We can now compute the expectations of the ML estimates with respect to the data set values. It can be shown that

  E[μ_ML] = μ
  E[σ²_ML] = ((N − 1)/N) σ²

The ML estimate will underestimate the true variance. Corrected estimate:

  σ̃² = (N/(N − 1)) σ²_ML = (1/(N − 1)) Σ_n (x_n − μ̂)²

(A small simulation of this bias appears at the end of this section.)

Maximum Likelihood - Limitations
Maximum Likelihood has several significant limitations. It systematically underestimates the variance of the distribution. E.g. consider the case N = 1, X = {x_1}: the maximum-likelihood estimate is μ̂ = x_1 and σ̂ = 0! We say ML overfits to the observed data. We will still often use ML, but it is important to know about this effect.

Deeper Reason: Bayesian vs. Frequentist View
Maximum Likelihood is a Frequentist concept. In the Frequentist view, probabilities are the frequencies of random, repeatable events. These frequencies are fixed, but can be estimated more precisely when more data is available. This is in contrast to the Bayesian interpretation, in which probabilities quantify the uncertainty about certain states or events. This uncertainty can be revised in the light of new evidence. (Bayesians and Frequentists do not like each other too well...)

To see the difference, suppose we want to estimate the uncertainty whether the Arctic ice cap will have disappeared by the end of the century. This question makes no sense in a Frequentist view, since the event cannot be repeated numerous times. In the Bayesian view, we generally have a prior, e.g. from calculations of how fast the polar ice is melting. If we now get fresh evidence, e.g. from a new satellite, we may revise our opinion and update the uncertainty from the prior:

  Posterior ∝ Likelihood × Prior

This generally allows us to get better uncertainty estimates in many situations. Main Frequentist criticism: the prior has to come from somewhere, and if it is wrong, the result will be worse.

Bayesian Approach to Parameter Learning
Conceptual shift: Maximum Likelihood views the true parameter vector θ as unknown but fixed. In Bayesian learning, we consider θ to be a random variable. This allows us to use knowledge about the parameters θ, i.e. to use a prior for θ. Training data then converts this prior distribution on θ into a posterior probability density. The prior thus encodes knowledge we have about the type of distribution we expect to see for θ.

Bayesian Learning Approach
Bayesian view: consider the parameter vector θ as a random variable. When estimating the parameters, what we compute is

  p(x | X) = ∫ p(x, θ | X) dθ = ∫ p(x | θ, X) p(θ | X) dθ = ∫ p(x | θ) p(θ | X) dθ

Assumption: given θ, x does not depend on X anymore; p(x | θ) is entirely determined by the parameter θ (i.e. by the parametric form of the pdf).
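As promised above, a small simulation sketch (not from the slides) of the variance bias: with N = 5 samples per trial, the ML variance estimate should average roughly (N−1)/N · σ² = 0.8 · σ², while the corrected estimate recovers σ².

```python
# Sketch: simulating the bias of the ML variance estimate, E[var_ml] = (N-1)/N * sigma^2.
import numpy as np

rng = np.random.default_rng(1)
N, trials, true_var = 5, 100_000, 1.0
X = rng.normal(0.0, np.sqrt(true_var), size=(trials, N))
var_ml = X.var(axis=1, ddof=0)      # divides by N: the biased ML estimate
var_corr = X.var(axis=1, ddof=1)    # divides by N-1: the corrected estimate
print(var_ml.mean())                # approx (N-1)/N * sigma^2 = 0.8
print(var_corr.mean())              # approx sigma^2 = 1.0
```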
Bayesian Learning Approach (cont'd)

  p(x | X) = ∫ p(x | θ) p(θ | X) dθ

with

  p(θ | X) = p(X | θ) p(θ) / p(X) = L(θ) p(θ) / p(X)
  p(X) = ∫ p(X | θ) p(θ) dθ = ∫ L(θ) p(θ) dθ

Inserting this above, we obtain

  p(x | X) = ∫ p(x | θ) L(θ) p(θ) / p(X) dθ = ∫ p(x | θ) L(θ) p(θ) dθ / ∫ L(θ) p(θ) dθ

Discussion: L(θ) is the likelihood of the parametric form θ given the data set X; p(θ) is the prior for the parameters θ; p(x | θ) is the estimate for x based on the parametric form θ; and the normalization integrates over all possible values of θ. If we now plug in a (suitable) prior p(θ), we can estimate p(x | X) from the data set X.

Bayesian Density Estimation - Discussion
The probability p(θ | X) makes the dependency of the estimate on the data explicit. If p(θ | X) is very small everywhere but large for one θ̂, then p(x | X) ≈ p(x | θ̂). The more uncertain we are about θ, the more we average over all parameter values.
Problem: in the general case, the integration over θ is not possible (or only possible stochastically).

Example where an analytical solution is possible: a normal distribution for the data, with σ² assumed known and fixed. Estimate the distribution of the mean:

  p(μ | X) = p(X | μ) p(μ) / p(X)

Prior: we assume a Gaussian prior over μ with mean μ_0 and variance σ_0². With the sample mean x̄ = (1/N) Σ_n x_n, the Bayes estimate is

  μ_N = (σ² μ_0 + N σ_0² x̄) / (σ² + N σ_0²)
  1/σ_N² = 1/σ_0² + N/σ²

(Figure: the posterior p(μ | X) concentrating around the sample mean as N grows; in the plot, μ_0 = 0. A sketch of this example follows after the summary below.)

Summary: ML vs. Bayesian Learning
- Maximum Likelihood: simple approach, often analytically possible. Problem: the estimation is biased and tends to overfit to the data, so it often needs some correction or regularization. But: the approximation becomes accurate for N → ∞.
- Bayesian learning: general approach that avoids the estimation bias through a prior. Problems: one needs to choose a suitable prior (not always obvious), and the integral over θ is often not analytically feasible anymore. But: efficient stochastic sampling techniques are available (see Adv. ML).
(In this lecture, we'll use both concepts wherever appropriate.)
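Referring back to the analytical example above, this sketch (not from the slides) computes the posterior over the mean μ for Gaussian data with known σ² and a Gaussian prior N(μ | μ_0, σ_0²). The data and prior settings are illustrative assumptions.

```python
# Sketch: analytical posterior over the Gaussian mean (sigma^2 known),
# using the mu_N and sigma_N^2 formulas from the slide above.
import numpy as np

def posterior_over_mean(X, sigma2, mu0, sigma0_2):
    N, xbar = len(X), np.mean(X)
    mu_N = (sigma2 * mu0 + N * sigma0_2 * xbar) / (sigma2 + N * sigma0_2)
    sigma_N2 = 1.0 / (1.0 / sigma0_2 + N / sigma2)   # 1/sigma_N^2 = 1/sigma_0^2 + N/sigma^2
    return mu_N, sigma_N2

rng = np.random.default_rng(2)
X = rng.normal(1.0, 1.0, size=20)                    # sigma^2 = 1 assumed known
print(posterior_over_mean(X, sigma2=1.0, mu0=0.0, sigma0_2=10.0))
# With more data, mu_N approaches the sample mean and sigma_N^2 shrinks toward 0.
```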
References and Further Reading
More information can be found in Bishop's book:
- Gaussian distribution and ML: Ch. 1.2.4 and 2.3
- Bayesian learning: Ch. 1.2.3 and 2.3
- Nonparametric methods: Ch. 2.5

Additional information can be found in Duda & Hart:
- ML estimation: Ch. 3.2
- Bayesian learning: Ch. 3
- Nonparametric methods: Ch. 4

Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, 2nd Ed., Wiley-Interscience, 2000.