Bayesian Learning (II)
- Vincent Gaines
- 6 years ago
1 Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning (II) Niels Landwehr
2 Overview
- Probabilities, expected values, variance
- Basic concepts of Bayesian learning
- MAP hypothesis and regularized loss
- Bayesian Model Averaging
- (Bayesian) parameter estimation for probability distributions
- Bayesian linear regression, naive Bayes
3 Conceptual Model for Learning
Many machine learning methods are based on probabilistic considerations. We want to learn models of the form $y = f(x)$ from training data $L = \{(x_1, y_1), \dots, (x_n, y_n)\}$.
Conceptual model of the data-generating process:
- Someone draws the true model $f$ (with parameters $\theta$) from the prior distribution $p(f)$. $f$ is not known, but $p(f)$ reflects prior knowledge (which models are most probable?).
- Training inputs $x_i$ are drawn independently of $\theta$.
- Class labels $y_i$ are drawn from $p(y_i \mid x_i, \theta)$.
Learning question: given $L$ and $p(\theta)$, what is the most likely true model? We try to (approximately) reconstruct $\theta$.
4 Bayes Rule
Bayes rule: $p(X \mid Y) = \frac{p(Y \mid X)\, p(X)}{p(Y)}$
The proof is simple:
$p(X \mid Y) = \frac{p(X, Y)}{p(Y)}$ (definition of the conditional distribution)
$= \frac{p(Y \mid X)\, p(X)}{p(Y)}$ (product rule)
Important basic knowledge for machine learning: Bayes rule allows the inference of model probabilities given the probabilities of observations.
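The two steps of the proof can be checked numerically. The following is a minimal sketch; the joint table `joint` and all function names are hypothetical values and helpers chosen for illustration, not part of the lecture.

```python
# Hypothetical joint distribution p(X, Y) over two binary variables (assumed values).
joint = {
    (0, 0): 0.1, (0, 1): 0.3,
    (1, 0): 0.2, (1, 1): 0.4,
}

def marginal_x(x):
    # p(X = x), obtained by summing the joint over Y
    return sum(p for (xi, _), p in joint.items() if xi == x)

def marginal_y(y):
    # p(Y = y), obtained by summing the joint over X
    return sum(p for (_, yi), p in joint.items() if yi == y)

def cond_x_given_y(x, y):
    # Definition of the conditional distribution: p(X | Y) = p(X, Y) / p(Y)
    return joint[(x, y)] / marginal_y(y)

def cond_y_given_x(y, x):
    # p(Y | X) = p(X, Y) / p(X)
    return joint[(x, y)] / marginal_x(x)

def bayes_x_given_y(x, y):
    # Bayes rule: p(X | Y) = p(Y | X) p(X) / p(Y)
    return cond_y_given_x(y, x) * marginal_x(x) / marginal_y(y)
```

Both routes, the direct definition and Bayes rule, should give the same value for every pair of outcomes.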
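5 Bayes Rule
Model probability given data and prior knowledge:
$p(\text{model} \mid \text{data}) = \frac{p(\text{data} \mid \text{model})\, p(\text{model})}{p(\text{data})} \propto p(\text{data} \mid \text{model})\, p(\text{model})$
- Likelihood $p(\text{data} \mid \text{model})$: how probable is the data, under the assumption that the model is the true model?
- Prior $p(\text{model})$: how probable is a model, a priori?
- $p(\text{data})$ is constant; it is independent of the model.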
6 Maximum a Posteriori Hypothesis
Most likely model given the data ($w$ parameterizes the model $f_w(x)$):
$f_{MAP} = \arg\max_{f_w} p(f_w \mid L)$
$= \arg\max_{f_w} \frac{p(L \mid f_w)\, p(f_w)}{p(L)}$ (application of Bayes rule)
$= \arg\max_{f_w} p(L \mid f_w)\, p(f_w)$
$= \arg\max_{f_w} \log \big( p(L \mid f_w)\, p(f_w) \big)$
$= \arg\min_{f_w} \; -\log p(L \mid f_w) - \log p(f_w)$
The optimization criterion consists of the (negative) log-likelihood $-\log p(L \mid f_w)$ and the (negative) log-prior $-\log p(f_w)$.
7 Log-Likelihood
How likely are the data given the model?
Assumptions: data points are independent, and label $y_i$ does not depend on $x_j$ for $j \neq i$.
$\log p(L \mid f_w) = \log \big[ p(y_1, \dots, y_n \mid x_1, \dots, x_n, f_w)\, p(x_1, \dots, x_n \mid f_w) \big]$ (product rule, given $f_w$)
$= \log \big[ p(y_1, \dots, y_n \mid x_1, \dots, x_n, f_w)\, p(x_1, \dots, x_n) \big]$ (the inputs $x_1, \dots, x_n$ are independent of the model $f_w$)
$= \log p(y_1, \dots, y_n \mid f_w, x_1, \dots, x_n) + \text{const}$
$= \log \prod_{i=1}^{n} p(y_i \mid f_w, x_1, \dots, x_n) + \text{const}$
$= \log \prod_{i=1}^{n} p(y_i \mid f_w, x_i) + \text{const}$
$= \sum_{i} \log p(y_i \mid f_w, x_i) + \text{const}$
The constant is independent of $f_w$. How do we model $p(y_i \mid f_w, x_i)$?
8 Log-Likelihood
Assumption for modeling $p(y_i \mid f_w, x_i)$: a special exponential distribution based on a loss function. The probability that $f_w$ generates label $y_i$ from $x_i$ decreases exponentially in $\ell(f_w(x_i), y_i)$:
$p(y_i \mid f_w, x_i) = \frac{1}{Z} \exp\big( -\ell(f_w(x_i), y_i) \big)$, where $Z$ is the normalizer.
Using this model assumption in the negative log-likelihood:
$-\sum_i \log p(y_i \mid f_w, x_i) = \sum_i \big( \ell(f_w(x_i), y_i) + \log Z \big) = \sum_i \ell(f_w(x_i), y_i) + \text{const}$
The constant is independent of $f_w$.
The loss function $\ell(f_w(x_i), y_i)$ measures the distance between $f_w(x_i)$ and $y_i$, for example:
$\ell(f_w(x_i), y_i) = 0$ if $f_w(x_i) = y_i$, and $\ell(f_w(x_i), y_i) = c$ if $f_w(x_i) \neq y_i$.
The negative log-likelihood corresponds to a loss term!
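The correspondence between negative log-likelihood and loss can be illustrated with a small sketch. It assumes a squared loss over real-valued labels (an assumption made here for illustration; the slide does not fix a particular loss), for which the normalizer of $\exp(-(y - f)^2)$ is $Z = \sqrt{\pi}$. All function names are hypothetical.

```python
import math

def squared_loss(prediction, label):
    # Assumed loss for this sketch: l(f(x), y) = (f(x) - y)^2
    return (prediction - label) ** 2

# Normalizer of exp(-(y - f)^2) over real-valued y: the Gaussian integral sqrt(pi).
Z = math.sqrt(math.pi)

def label_density(label, prediction):
    # p(y | f_w, x) = (1 / Z) * exp(-l(f_w(x), y))
    return math.exp(-squared_loss(prediction, label)) / Z

def negative_log_likelihood(predictions, labels):
    # -sum_i log p(y_i | f_w, x_i)
    return -sum(math.log(label_density(y, f)) for f, y in zip(predictions, labels))

def total_loss(predictions, labels):
    # sum_i l(f_w(x_i), y_i)
    return sum(squared_loss(f, y) for f, y in zip(predictions, labels))
```

For any predictions, `negative_log_likelihood` and `total_loss` differ only by `len(labels) * math.log(Z)`, which does not depend on the model.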
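9 A Priori Probability (Prior)
A distribution over models is a distribution over model parameters.
Assumption: the model parameters are normally distributed with mean $\mu = 0$. We prefer models with small attribute weights.
$p(f_w) = p(w)$, $w \in \mathbb{R}^m$
$p(w) = \mathcal{N}(w \mid 0, \sigma^2 I) = \frac{1}{(2\pi\sigma^2)^{m/2}} \exp\left( -\frac{1}{2\sigma^2} \|w\|^2 \right)$
Using this model assumption in the negative log-prior:
$-\log p(f_w) = \frac{1}{2\sigma^2} \|w\|^2 + \text{const}$
The constant is independent of $f_w$. Negative log-prior = regularizer!
[Figure: Gaussian prior density over the weights $w_1$, $w_2$, centered at 0]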
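A minimal sketch of the identity between the negative Gaussian log-prior and the squared-norm regularizer. The function names and the example weight vectors are hypothetical, chosen only for illustration.

```python
import math

def log_gaussian_prior(w, sigma2):
    # log N(w | 0, sigma2 * I) for a weight vector w of dimension m
    m = len(w)
    squared_norm = sum(wi * wi for wi in w)
    return -0.5 * m * math.log(2.0 * math.pi * sigma2) - squared_norm / (2.0 * sigma2)

def l2_regularizer(w, sigma2):
    # ||w||^2 / (2 * sigma2)
    return sum(wi * wi for wi in w) / (2.0 * sigma2)

def prior_constant(w, sigma2):
    # -log p(w) minus the regularizer; this should not depend on w
    return -log_gaussian_prior(w, sigma2) - l2_regularizer(w, sigma2)
```

Evaluating `prior_constant` at different weight vectors of the same dimension gives the same number, which is why the constant can be dropped from the optimization criterion.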
10 A Posteriori Probability (Posterior)
Most likely model given prior knowledge and data:
$f_{MAP} = \arg\max_{f_w} p(f_w \mid L)$
$= \arg\min_{w} \; -\log p(L \mid f_w) - \log p(f_w)$
$= \arg\min_{w} \; \sum_i \ell(f_w(x_i), y_i) + \lambda \|w\|^2$, with $\lambda = \frac{1}{2\sigma^2}$
This is an argmin over a regularized loss function!
Justification for this optimization criterion? It yields the most likely hypothesis (MAP hypothesis).
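As a small illustration of the regularized-loss view, consider a one-dimensional linear model $f_w(x) = w x$ with squared loss (both are assumptions made for this sketch, not choices fixed by the slide). Setting the derivative of $\sum_i (w x_i - y_i)^2 + \lambda w^2$ to zero gives $w = \sum_i x_i y_i / (\sum_i x_i^2 + \lambda)$. Function names are hypothetical.

```python
def regularized_loss(w, xs, ys, sigma2):
    # sum_i (w * x_i - y_i)^2 + lambda * w^2, with lambda = 1 / (2 * sigma2)
    lam = 1.0 / (2.0 * sigma2)
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) + lam * w * w

def map_weight(xs, ys, sigma2):
    # Closed-form minimizer of the regularized loss for the 1-D model f_w(x) = w * x
    lam = 1.0 / (2.0 * sigma2)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)
```

With data generated exactly by $y = 2x$, the MAP weight is pulled slightly below 2 toward the prior mean 0; a larger `sigma2` (a weaker prior) reduces that shrinkage.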
11 Overview
- Probabilities, expected values, variance
- Basic concepts of Bayesian learning
- MAP hypothesis and regularized loss
- Bayesian Model Averaging
- (Bayesian) parameter estimation for probability distributions
- Bayesian linear regression, naive Bayes
12 Learning and Prediction
Previously, the learning problem was separated from prediction:
- Learning: $f_{MAP} = \arg\max_{f_w} p(f_w \mid L)$, the most likely model given the data.
- Prediction: $x \mapsto f_{MAP}(x)$, the prediction of the MAP model, where $x$ is a new test instance.
If we must commit to a single model, then the MAP model is a sensible choice. However, the actual goal is the prediction of a class! It is better not to commit to a single model and instead to search directly for the optimal prediction.
13 Learning and Prediction: Example
Model space with 4 models: $H = \{f_1, f_2, f_3, f_4\}$
Binary classification problem, $Y = \{0, 1\}$
Training data $L$
We compute the a posteriori probabilities of the models:
$p(f_1 \mid L) = 0.3$, $p(f_2 \mid L) = 0.25$, $p(f_3 \mid L) = 0.25$, $p(f_4 \mid L) = 0.2$
The MAP model is $f_1 = \arg\max_{f_i} p(f_i \mid L)$.
14 Learning and Prediction: Example
Model $f_i$ is a probabilistic classifier. For binary classification: $p(y = 1 \mid x, f_i) \in [0, 1]$
E.g., logistic regression (a linear model):
- Parameter vector: $w$
- Decision function: $f_w(x) = w^T x$
- Logistic function: $\sigma(z) = \frac{1}{1 + \exp(-z)}$
- Class probability: $p(y = 1 \mid x, w) = \sigma(w^T x)$
[Figure: logistic regression; class probability $p(y = 1 \mid x, w)$ plotted against the decision function value $w^T x$]
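The class probability of the logistic regression model can be sketched as follows. The function names are hypothetical, and the branch on the sign of `z` is an implementation choice added here to avoid floating-point overflow; it is not part of the slide.

```python
import math

def logistic(z):
    # sigma(z) = 1 / (1 + exp(-z)), evaluated without overflow for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def class_probability(w, x):
    # p(y = 1 | x, w) = sigma(w^T x)
    z = sum(wi * xi for wi, xi in zip(w, x))
    return logistic(z)
```

A decision function value of 0 maps to probability 0.5, and large positive or negative values approach 1 or 0.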
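15 Learning and Prediction: Example
We want to classify a new test sample $x$:
$p(y = 1 \mid x, f_1) = 0.6$, $p(y = 1 \mid x, f_2) = 0.1$, $p(y = 1 \mid x, f_3) = 0.2$, $p(y = 1 \mid x, f_4) = 0.3$
Classification given by the MAP model $f_1$: $y = 1$
However (by the computation rules of probability!):
$p(y = 1 \mid x, L) = \sum_{i=1}^{4} p(y = 1, f_i \mid x, L)$ (sum rule)
$= \sum_{i=1}^{4} p(y = 1 \mid f_i, x, L)\, p(f_i \mid x, L)$ (product rule)
$= \sum_{i=1}^{4} p(y = 1 \mid f_i, x)\, p(f_i \mid L)$ (independence)
$= 0.6 \cdot 0.3 + 0.1 \cdot 0.25 + 0.2 \cdot 0.25 + 0.3 \cdot 0.2 = 0.315$
Because this probability is below 0.5, the more probable class is $y = 0$, which contradicts the prediction of the MAP model.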
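The averaged prediction for this example can be reproduced with a few lines. This sketch uses the posterior and class probabilities stated in the example; the dictionary and variable names are hypothetical, and thresholding at 0.5 is assumed as the decision rule for binary classification.

```python
# Posterior probabilities p(f_i | L) from the example
posterior = {"f1": 0.3, "f2": 0.25, "f3": 0.25, "f4": 0.2}
# Class probabilities p(y = 1 | x, f_i) for the new test sample
p_y1 = {"f1": 0.6, "f2": 0.1, "f3": 0.2, "f4": 0.3}

# MAP approach: commit to the single most probable model
map_model = max(posterior, key=posterior.get)
map_prediction = 1 if p_y1[map_model] > 0.5 else 0

# Model averaging: p(y = 1 | x, L) = sum_i p(y = 1 | f_i, x) * p(f_i | L)
p_y1_averaged = sum(p_y1[f] * posterior[f] for f in posterior)
averaged_prediction = 1 if p_y1_averaged > 0.5 else 0
```

The MAP model predicts class 1, while the average over all four models assigns class 1 a probability of 0.315 and therefore predicts class 0.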
16 Learning and Prediction: Example
If the goal is prediction, we should use $p(y = 1 \mid x, L)$.
Do not commit to a single model as long as there is still uncertainty about the models.
This is the fundamental idea behind Bayesian learning and prediction!
17 Bayesian Learning and Prediction
Problem setting: prediction
Given: training data $L$ and a new test instance $x$.
Searching for: the distribution over labels $y$ for the given $x$: $p(y \mid x, L)$
Bayesian prediction: $y^* = \arg\max_y p(y \mid x, L)$
This minimizes the risk of an incorrect prediction. It is also called the Bayes optimal decision or the Bayes hypothesis.
18 Bayesian Learning and Prediction
Computation of the Bayesian prediction:
$y^* = \arg\max_y p(y \mid x, L)$
$= \arg\max_y \int p(y, \theta \mid x, L)\, d\theta$ (sum rule)
$= \arg\max_y \int p(y \mid \theta, x, L)\, p(\theta \mid x, L)\, d\theta$ (product rule)
$= \arg\max_y \int p(y \mid \theta, x)\, p(\theta \mid L)\, d\theta$ (Bayesian Model Averaging)
- $p(y \mid \theta, x)$: the prediction, given the model $\theta$.
- $p(\theta \mid L)$: the posterior of the models. It acts as a weighting: how well does a model fit the training data?
- The integral averages the predictions over all models.
19 Bayesian Learning and Prediction
Is Bayesian prediction practical?
$y^* = \arg\max_y p(y \mid x, L) = \arg\max_y \int p(y \mid \theta, x)\, p(\theta \mid L)\, d\theta$
Bayesian Model Averaging implicitly averages over infinitely many models. How can this be computed? A closed-form solution is available only in some cases.
In contrast, consider decision tree learning:
- Find a model that fits the data well.
- Give predictions for new instances based on this model.
There, learning a model and using it for prediction are separate steps.
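When no closed form is available and the model has very few parameters, the integral can be approximated numerically. The sketch below is one possible approach under stated assumptions, not the lecture's method: a one-dimensional logistic model $p(y = 1 \mid x, \theta) = \sigma(\theta x)$, a Gaussian prior $\mathcal{N}(0, \sigma^2)$ on $\theta$, and a uniform grid over $[-10, 10]$ in place of the integral. All names and default values are hypothetical.

```python
import math

def log_sigmoid(z):
    # log sigma(z), computed stably for large |z|
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def sigmoid(z):
    return math.exp(log_sigmoid(z))

def predictive_probability(x_new, inputs, labels, sigma2=1.0, lo=-10.0, hi=10.0, steps=2001):
    """Grid approximation of p(y = 1 | x_new, L) = integral of p(y = 1 | theta, x_new) p(theta | L) d theta."""
    thetas = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    weights = []
    for theta in thetas:
        # Unnormalized log-posterior: log-prior plus log-likelihood
        log_post = -theta * theta / (2.0 * sigma2)
        for x, y in zip(inputs, labels):
            z = theta * x
            log_post += log_sigmoid(z) if y == 1 else log_sigmoid(-z)
        weights.append(math.exp(log_post))
    normalizer = sum(weights)
    # Posterior-weighted average of the per-model predictions
    return sum(w * sigmoid(t * x_new) for w, t in zip(weights, thetas)) / normalizer
```

A grid like this scales exponentially with the number of parameters, which is one reason the integral is impractical for most models.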
20 Bayesian Learning and Prediction
How is the Bayes hypothesis calculated?
$y^* = \arg\max_y p(y \mid x, L) = \arg\max_y \int p(y \mid \theta, x)\, p(\theta \mid L)\, d\theta$
We need:
1) The probability of a class label given the model, $p(y \mid \theta, x)$. It follows from the model definition, e.g., for the linear probabilistic classifier (logistic regression): $p(y = 1 \mid x, \theta) = \sigma(\theta^T x)$
[Figure: class probability $p(y = 1 \mid x, \theta)$ plotted against the decision function value $\theta^T x$]
21 Bayesian Learning and Prediction
How is the Bayes hypothesis calculated?
$y^* = \arg\max_y p(y \mid x, L) = \arg\max_y \int p(y \mid \theta, x)\, p(\theta \mid L)\, d\theta$
We need:
2) The probability of the model given the data, i.e., the a posteriori probability $p(\theta \mid L)$. It is calculated via Bayes rule.
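22 Bayesian Learning and Prediction
Computation of the a posteriori distribution over models with Bayes rule:
$p(\theta \mid L) = \frac{p(L \mid \theta)\, p(\theta)}{p(L)} = \frac{1}{Z}\, p(L \mid \theta)\, p(\theta)$
Posterior $\propto$ Likelihood $\times$ Prior
- $p(\theta \mid L)$: posterior, the a posteriori distribution.
- $p(L \mid \theta)$: likelihood. How well does the model fit the data?
- $p(\theta)$: prior, the a priori distribution.
- $Z = p(L)$: normalization constant.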
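23 Bayes Rule
Need: the likelihood $p(L \mid \theta)$, with $L = \{(x_1, y_1), \dots, (x_N, y_N)\}$.
How probable would the training data be if $\theta$ were the correct model? How well does the model fit the data?
The labels $y_1, \dots, y_N$ are generated depending only on the model $\theta$ and the data point $x_i$.
$p(L \mid \theta) = p(y_1, \dots, y_N \mid x_1, \dots, x_N, \theta)\, p(x_1, \dots, x_N \mid \theta)$
$= p(y_1, \dots, y_N \mid x_1, \dots, x_N, \theta)\, p(x_1, \dots, x_N)$ (the inputs $x_1, \dots, x_N$ are independent of the model $\theta$)
$= \frac{1}{Z}\, p(y_1, \dots, y_N \mid x_1, \dots, x_N, \theta)$ (with the constant $\frac{1}{Z} = p(x_1, \dots, x_N)$)
$= \frac{1}{Z} \prod_{i=1}^{N} p(y_i \mid x_i, \theta)$
Each factor $p(y_i \mid x_i, \theta)$ follows from the model definition (for example, logistic regression).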
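For logistic regression, the product of label probabilities can be sketched as below. The function works in log space, so the product becomes a sum, and it omits the constant factor $1/Z$ because it does not depend on $\theta$. The function names and the example data in the usage note are hypothetical.

```python
import math

def log_label_probability(theta, x, y):
    # log p(y | x, theta) for logistic regression, with y in {0, 1}
    z = sum(t * xi for t, xi in zip(theta, x))
    if y == 0:
        z = -z  # p(y = 0 | x, theta) = sigma(-theta^T x)
    # log sigma(z), computed stably
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def log_likelihood(theta, inputs, labels):
    # log prod_i p(y_i | x_i, theta) = sum_i log p(y_i | x_i, theta)
    return sum(log_label_probability(theta, x, y) for x, y in zip(inputs, labels))
```

For instance, with `inputs = [[1.0, 0.0], [0.0, 1.0]]` and `labels = [1, 0]`, the parameter vector `[2.0, -2.0]` obtains a higher log-likelihood than `[0.0, 0.0]`, since it assigns higher probability to both observed labels.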
25 Bayes Rule
Need: the prior $p(\theta)$.
How probable is model $\theta$ before we have seen any training data?
Assumptions about $p(\theta)$ come from data-independent prior knowledge about the problem.
Linear model example: $\|\theta\|^2$ should be as small as possible.
27 Bayes Rule
Need: the prior $p(\theta)$.
How probable is model $\theta$ before we have seen any training data?
Assumptions about $p(\theta)$ come from data-independent prior knowledge about the problem.
Decision tree learning example: small trees are often better than complex trees, so the learning algorithm prefers small trees.
28 Summary of Bayesian/MAP/ML Hypotheses
To minimize the risk of an incorrect decision, choose the Bayesian prediction:
$y^* = \arg\max_y p(y \mid x, L) = \arg\max_y \int p(y \mid \theta, x)\, p(\theta \mid L)\, d\theta$
Problem: in many cases there is no closed-form solution, and integration over all models is impractical.
Maximum a posteriori (MAP) hypothesis: choose
$\theta_{MAP} = \arg\max_\theta p(\theta \mid L)$
$y^* = \arg\max_y p(y \mid x, \theta_{MAP})$
This corresponds to, e.g., decision tree learning: find the best model from the data, then classify only with this model.
29 Summary of Bayesian/MAP/ML Hypotheses
To specify the MAP hypothesis we must be able to compute the posterior (likelihood $\times$ prior). This is not possible if no prior knowledge (prior) exists.
Maximum likelihood (ML) hypothesis:
$\theta_{ML} = \arg\max_\theta p(L \mid \theta)$
$y^* = \arg\max_y p(y \mid x, \theta_{ML})$
It is based only on the observations in $L$ and uses no prior knowledge. It is prone to overfitting the data.
30 Overview
- Probabilities, expected values, variance
- Basic concepts of Bayesian learning
- (Bayesian) parameter estimation for probability distributions
- Bayesian linear regression, naive Bayes
31 Estimating the Distribution's Parameters
Often we can assume that the data come from a specified distribution:
- E.g., a binomial distribution for $N$ coin flips
- E.g., a Gaussian distribution for body size, IQ, ...
These distributions are parameterized:
- Binomial distribution: the parameter $\mu$ is the probability of heads
- Gaussian distribution: the parameters $\mu$, $\sigma$ are the mean and the standard deviation
The true probabilities or parameters are never known. What conclusions can we draw about the true probabilities given the data?
32 Estimating the Distribution's Parameters
Problem: estimating the distribution's parameters.
Given: a parameterized family of distributions (e.g., binomial, Gaussian) with parameter vector $\theta$, and data $L$, expressed as a random variable.
Goal: the a posteriori distribution $p(\theta \mid L)$, or the maximum a posteriori estimate
$\theta^* = \arg\max_\theta p(\theta \mid L)$
Applying Bayes rule:
$p(\theta \mid L) = \frac{p(L \mid \theta)\, p(\theta)}{p(L)}$
33 Binomially Distributed Data Estimation
Example: coin flips, estimated parameter $\theta = \mu$
A coin is flipped $N$ times. Data $L$: $N_h$ times heads, $N_t$ times tails.
What is the best estimate of $\theta$ given $L$?
Bayes equation:
$p(\theta \mid L) = \frac{p(L \mid \theta)\, p(\theta)}{p(L)}$
- $p(\theta \mid L)$: a posteriori distribution over parameters; it characterizes probable parameter values and the remaining uncertainty.
- $p(L \mid \theta)$: likelihood; how likely are $N_h$ heads and $N_t$ tails given parameter $\theta$?
- $p(\theta)$: a priori distribution over parameters, representing prior knowledge.
- $p(L)$: probability of the data; it only serves as a normalizer.
34 Binomially Distributed Data Estimation
Likelihood of the data: $p(L \mid \theta)$ (where $\theta = \mu$ is the probability of heads)
The likelihood is a binomial distribution:
$p(L \mid \theta) = p(N_h, N_t \mid \theta) = \mathrm{Bin}(N_h \mid N, \theta) = \binom{N}{N_h}\, \theta^{N_h} (1 - \theta)^{N_t}$, with $N = N_h + N_t$
This is the probability of seeing $N_h$ heads and $N_t$ tails in $N$ coin flips given the coin parameter $\theta$.
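The binomial likelihood is straightforward to evaluate directly. A minimal sketch, with a hypothetical function name:

```python
from math import comb

def binomial_likelihood(theta, n_heads, n_tails):
    # Bin(N_h | N, theta) = C(N, N_h) * theta^N_h * (1 - theta)^N_t, with N = N_h + N_t
    n = n_heads + n_tails
    return comb(n, n_heads) * theta ** n_heads * (1.0 - theta) ** n_tails
```

For a fair coin, `binomial_likelihood(0.5, 2, 1)` evaluates to 0.375, and for any fixed `theta` the values for all possible numbers of heads in `N` flips sum to 1.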
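35 Binomially Distributed Data Estimation
What is the prior $p(\theta)$ for the coin flipping example?
1) First try: no prior knowledge
$p(\theta) = 1$ if $0 \le \theta \le 1$, and $p(\theta) = 0$ otherwise
Example: data $L = \{\text{tails}, \text{tails}, \text{tails}\}$
MAP model:
$\theta^* = \arg\max_{\theta \in [0,1]} p(\theta \mid L)$
$= \arg\max_{\theta \in [0,1]} \frac{p(L \mid \theta)\, p(\theta)}{p(L)}$
$= \arg\max_{\theta \in [0,1]} p(L \mid \theta)$
$= \arg\max_{\theta \in [0,1]} \binom{3}{0}\, \theta^0 (1 - \theta)^3 = 0$
Inference: the coin will never land on heads. This is bad; it overfits the data.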
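The overfitting effect can be reproduced with a simple search over candidate parameters. This sketch assumes a grid of 1001 equally spaced values in $[0, 1]$ as a stand-in for the exact argmax; the function names and the grid resolution are hypothetical.

```python
from math import comb

def likelihood(theta, n_heads, n_tails):
    # Binomial likelihood p(L | theta)
    n = n_heads + n_tails
    return comb(n, n_heads) * theta ** n_heads * (1.0 - theta) ** n_tails

def uniform_prior(theta):
    # "No prior knowledge": p(theta) = 1 on [0, 1], 0 otherwise
    return 1.0 if 0.0 <= theta <= 1.0 else 0.0

# Candidate parameter values (assumed grid resolution)
GRID = [i / 1000 for i in range(1001)]

def map_estimate(n_heads, n_tails):
    # argmax over the grid of p(L | theta) * p(theta); p(L) is constant and can be ignored
    return max(GRID, key=lambda t: likelihood(t, n_heads, n_tails) * uniform_prior(t))
```

With three tails and no heads, `map_estimate(0, 3)` returns 0.0: under the uniform prior the MAP estimate coincides with the maximum likelihood estimate and rules out heads entirely.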
More informationBayesian Learning. Reading: Tom Mitchell, Generative and discriminative classifiers: Naive Bayes and logistic regression, Sections 1-2.
Bayesian Learning Reading: Tom Mitchell, Generative and discriminative classifiers: Naive Bayes and logistic regression, Sections 1-2. (Linked from class website) Conditional Probability Probability of
More informationLeast Squares Regression
CIS 50: Machine Learning Spring 08: Lecture 4 Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may not cover all the
More informationComputational Cognitive Science
Computational Cognitive Science Lecture 9: Bayesian Estimation Chris Lucas (Slides adapted from Frank Keller s) School of Informatics University of Edinburgh clucas2@inf.ed.ac.uk 17 October, 2017 1 / 28
More informationLast Time. Today. Bayesian Learning. The Distributions We Love. CSE 446 Gaussian Naïve Bayes & Logistic Regression
CSE 446 Gaussian Naïve Bayes & Logistic Regression Winter 22 Dan Weld Learning Gaussians Naïve Bayes Last Time Gaussians Naïve Bayes Logistic Regression Today Some slides from Carlos Guestrin, Luke Zettlemoyer
More informationMachine Learning Gaussian Naïve Bayes Big Picture
Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University January 27, 2011 Today: Naïve Bayes Big Picture Logistic regression Gradient ascent Generative discriminative
More informationCPSC 340: Machine Learning and Data Mining. MLE and MAP Fall 2017
CPSC 340: Machine Learning and Data Mining MLE and MAP Fall 2017 Assignment 3: Admin 1 late day to hand in tonight, 2 late days for Wednesday. Assignment 4: Due Friday of next week. Last Time: Multi-Class
More informationMachine Learning CSE546 Sham Kakade University of Washington. Oct 4, What about continuous variables?
Linear Regression Machine Learning CSE546 Sham Kakade University of Washington Oct 4, 2016 1 What about continuous variables? Billionaire says: If I am measuring a continuous variable, what can you do
More informationToday. Statistical Learning. Coin Flip. Coin Flip. Experiment 1: Heads. Experiment 1: Heads. Which coin will I use? Which coin will I use?
Today Statistical Learning Parameter Estimation: Maximum Likelihood (ML) Maximum A Posteriori (MAP) Bayesian Continuous case Learning Parameters for a Bayesian Network Naive Bayes Maximum Likelihood estimates
More informationOutline. Supervised Learning. Hong Chang. Institute of Computing Technology, Chinese Academy of Sciences. Machine Learning Methods (Fall 2012)
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Linear Models for Regression Linear Regression Probabilistic Interpretation
More informationNaive Bayes classification
Naive Bayes classification Christos Dimitrakakis December 4, 2015 1 Introduction One of the most important methods in machine learning and statistics is that of Bayesian inference. This is the most fundamental
More informationAlgorithms for Classification: The Basic Methods
Algorithms for Classification: The Basic Methods Outline Simplicity first: 1R Naïve Bayes 2 Classification Task: Given a set of pre-classified examples, build a model or classifier to classify new cases.
More informationData Mining. Practical Machine Learning Tools and Techniques. Slides for Chapter 4 of Data Mining by I. H. Witten, E. Frank and M. A.
Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter of Data Mining by I. H. Witten, E. Frank and M. A. Hall Statistical modeling Opposite of R: use all the attributes Two assumptions:
More informationDecision theory. 1 We may also consider randomized decision rules, where δ maps observed data D to a probability distribution over
Point estimation Suppose we are interested in the value of a parameter θ, for example the unknown bias of a coin. We have already seen how one may use the Bayesian method to reason about θ; namely, we
More information6.867 Machine Learning
6.867 Machine Learning Problem set 1 Solutions Thursday, September 19 What and how to turn in? Turn in short written answers to the questions explicitly stated, and when requested to explain or prove.
More informationThe exam is closed book, closed notes except your one-page (two sides) or two-page (one side) crib sheet.
CS 189 Spring 013 Introduction to Machine Learning Final You have 3 hours for the exam. The exam is closed book, closed notes except your one-page (two sides) or two-page (one side) crib sheet. Please
More informationCMU-Q Lecture 24:
CMU-Q 15-381 Lecture 24: Supervised Learning 2 Teacher: Gianni A. Di Caro SUPERVISED LEARNING Hypotheses space Hypothesis function Labeled Given Errors Performance criteria Given a collection of input
More informationIntroduction to Machine Learning. Lecture 2
Introduction to Machine Learning Lecturer: Eran Halperin Lecture 2 Fall Semester Scribe: Yishay Mansour Some of the material was not presented in class (and is marked with a side line) and is given for
More informationMachine Learning, Midterm Exam: Spring 2008 SOLUTIONS. Q Topic Max. Score Score. 1 Short answer questions 20.
10-601 Machine Learning, Midterm Exam: Spring 2008 Please put your name on this cover sheet If you need more room to work out your answer to a question, use the back of the page and clearly mark on the
More informationLinear Classifiers IV
Universität Potsdam Institut für Informatik Lehrstuhl Linear Classifiers IV Blaine Nelson, Tobias Scheffer Contents Classification Problem Bayesian Classifier Decision Linear Classifiers, MAP Models Logistic
More informationRelationship between Least Squares Approximation and Maximum Likelihood Hypotheses
Relationship between Least Squares Approximation and Maximum Likelihood Hypotheses Steven Bergner, Chris Demwell Lecture notes for Cmpt 882 Machine Learning February 19, 2004 Abstract In these notes, a
More informationSome slides from Carlos Guestrin, Luke Zettlemoyer & K Gajos 2
Logistics CSE 446: Point Estimation Winter 2012 PS2 out shortly Dan Weld Some slides from Carlos Guestrin, Luke Zettlemoyer & K Gajos 2 Last Time Random variables, distributions Marginal, joint & conditional
More informationPoint Estimation. Vibhav Gogate The University of Texas at Dallas
Point Estimation Vibhav Gogate The University of Texas at Dallas Some slides courtesy of Carlos Guestrin, Chris Bishop, Dan Weld and Luke Zettlemoyer. Basics: Expectation and Variance Binary Variables
More informationPATTERN RECOGNITION AND MACHINE LEARNING
PATTERN RECOGNITION AND MACHINE LEARNING Chapter 1. Introduction Shuai Huang April 21, 2014 Outline 1 What is Machine Learning? 2 Curve Fitting 3 Probability Theory 4 Model Selection 5 The curse of dimensionality
More informationIntroduction to Bayesian Inference
University of Pennsylvania EABCN Training School May 10, 2016 Bayesian Inference Ingredients of Bayesian Analysis: Likelihood function p(y φ) Prior density p(φ) Marginal data density p(y ) = p(y φ)p(φ)dφ
More information