Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework


HT2015: SC4 Statistical Data Mining and Machine Learning
Dino Sejdinovic, Department of Statistics, Oxford

Maximum Likelihood Principle

A generative model for training data $D = \{(x_i, y_i)\}_{i=1}^n$ given a parameter vector $\theta$:

\[ y_i \sim (\pi_1, \ldots, \pi_K), \qquad x_i \mid y_i \sim g_{y_i}(x) = p(x \mid \phi_{y_i}). \]

The $k$th class conditional density is assumed to have a parametric form $g_k(x) = p(x \mid \phi_k)$, and all parameters are given by $\theta = (\pi_1, \ldots, \pi_K; \phi_1, \ldots, \phi_K)$.

The generative process defines the likelihood function: the joint distribution $p(D \mid \theta)$ of all the observed data given a parameter vector $\theta$.

The process of generative learning consists of computing the MLE $\hat\theta$ of $\theta$ based on $D$:

\[ \hat\theta = \arg\max_{\theta \in \Theta} p(D \mid \theta). \]

We then use a plug-in approach to perform classification:

\[ f_{\hat\theta}(x) = \arg\max_{k \in \{1,\ldots,K\}} P_{\hat\theta}(Y = k \mid X = x) = \arg\max_{k \in \{1,\ldots,K\}} \frac{\hat\pi_k \, p(x \mid \hat\phi_k)}{\sum_{j=1}^K \hat\pi_j \, p(x \mid \hat\phi_j)}. \]

The Bayesian Learning Framework

Being Bayesian: treat the parameter vector $\theta$ as a random variable. The process of learning is then the computation of the posterior distribution $p(\theta \mid D)$.

In addition to the likelihood $p(D \mid \theta)$, we need to specify a prior distribution $p(\theta)$. The posterior distribution is then given by Bayes' theorem:

\[ p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)}. \]

Likelihood: $p(D \mid \theta)$
Prior: $p(\theta)$
Posterior: $p(\theta \mid D)$
Marginal likelihood: $p(D) = \int_\Theta p(D \mid \theta)\, p(\theta)\, d\theta$

Summarizing the posterior:
Posterior mode: $\hat\theta_{\mathrm{MAP}} = \arg\max_{\theta \in \Theta} p(\theta \mid D)$ (maximum a posteriori).
Posterior mean: $\hat\theta_{\mathrm{mean}} = E[\theta \mid D]$.
Posterior variance: $\mathrm{Var}[\theta \mid D]$.
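The plug-in approach above can be sketched in a few lines. This is only an illustration under an assumption that is not in the slides: the class conditional densities are taken to be one-dimensional Gaussians, so each $\phi_k$ is a (mean, variance) pair. The function names `fit_mle`, `class_posterior` and `predict`, and the toy data, are hypothetical.

```python
import math
from collections import defaultdict

def fit_mle(xs, ys):
    """MLE of class proportions pi_k and (assumed) 1-D Gaussian parameters."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[y].append(x)
    n = len(xs)
    params = {}
    for k, vals in groups.items():
        mu = sum(vals) / len(vals)
        # The MLE of the variance divides by n_k, not n_k - 1.
        var = sum((v - mu) ** 2 for v in vals) / len(vals)
        params[k] = (len(vals) / n, mu, var)
    return params

def log_gauss(x, mu, var):
    """Log density of N(mu, var) at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def class_posterior(x, params):
    """P(Y = k | X = x) with the MLE plugged in for the unknown parameters."""
    log_joint = {k: math.log(pi) + log_gauss(x, mu, var)
                 for k, (pi, mu, var) in params.items()}
    m = max(log_joint.values())  # subtract the max for numerical stability
    z = sum(math.exp(v - m) for v in log_joint.values())
    return {k: math.exp(v - m) / z for k, v in log_joint.items()}

def predict(x, params):
    """Plug-in classifier: the class with the largest posterior probability."""
    post = class_posterior(x, params)
    return max(post, key=post.get)

# Toy data (made up for illustration): two well-separated classes.
xs = [0.0, 0.5, 1.0, 3.0, 3.5, 4.0]
ys = [1, 1, 1, 2, 2, 2]
params = fit_mle(xs, ys)
print(predict(0.2, params), predict(3.8, params))
```

Note that the classifier treats $\hat\theta$ as if it were the true parameter; none of the uncertainty about $\theta$ is carried into the prediction, which is what the Bayesian framework addresses.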
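A simple example: coin tosses

We have a coin with probability $\phi$ of coming up heads. Model coin tosses as i.i.d. Bernoullis, 1 = head, 0 = tail. Estimate $\phi$ given a dataset $D = \{x_i\}_{i=1}^n$ of tosses.

(Figure: posterior densities of $\phi$ for several small datasets.)

The likelihood is

\[ p(D \mid \phi) = \phi^{n_1} (1 - \phi)^{n_0}, \qquad n_j = \sum_{i=1}^n 1(x_i = j). \]

Maximum likelihood estimate: $\hat\phi_{\mathrm{ML}} = n_1 / n$.

Bayesian approach: treat the unknown parameter as a random variable. Simple prior: $\phi \sim \mathrm{Uniform}[0, 1]$, i.e., $p(\phi) = 1$ for $\phi \in [0, 1]$.

Posterior distribution:

\[ p(\phi \mid D) = \frac{p(D \mid \phi)\, p(\phi)}{p(D)} = \frac{\phi^{n_1} (1 - \phi)^{n_0}}{p(D)}, \qquad p(D) = \int_0^1 \phi^{n_1} (1 - \phi)^{n_0}\, d\phi = \frac{n_1!\, n_0!}{(n + 1)!}. \]

The posterior is a $\mathrm{Beta}(n_1 + 1, n_0 + 1)$ distribution, with mean $\hat\phi_{\mathrm{mean}} = \frac{n_1 + 1}{n + 2}$.

(Figure: posterior densities of $\phi$ for datasets of increasing size.)

The posterior behaves like the ML estimate as the dataset grows and is peaked at the true value $\phi = 0.7$.

All Bayesian reasoning is based on the posterior distribution.
Posterior mode: $\hat\phi_{\mathrm{MAP}} = \frac{n_1}{n}$
Posterior mean: $\hat\phi_{\mathrm{mean}} = \frac{n_1 + 1}{n + 2}$
Posterior variance: $\mathrm{Var}[\phi \mid D] = \frac{\hat\phi_{\mathrm{mean}} (1 - \hat\phi_{\mathrm{mean}})}{n + 3}$
$(1 - \alpha)$-credible regions: $(l, r) \subseteq [0, 1]$ such that $\int_l^r p(\phi \mid D)\, d\phi = 1 - \alpha$.

Consistency: assuming that the true parameter value is given a nonzero density under the prior, the posterior distribution concentrates around the true value as $n \to \infty$. Rate of convergence?

The posterior predictive distribution is the conditional distribution of $x_{n+1}$ given $D = \{x_i\}_{i=1}^n$:

\[ p(x_{n+1} \mid D) = \int_0^1 p(x_{n+1}, \phi \mid D)\, d\phi = \int_0^1 p(x_{n+1} \mid \phi)\, p(\phi \mid D)\, d\phi = (\hat\phi_{\mathrm{mean}})^{x_{n+1}} (1 - \hat\phi_{\mathrm{mean}})^{1 - x_{n+1}}. \]

We predict on new data by averaging the predictive distribution over the posterior. This accounts for uncertainty about $\phi$.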
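The closed-form summaries for the coin example are easy to evaluate directly. The sketch below assumes the uniform prior used above; the function names and the counts in the usage lines are hypothetical, chosen only for illustration.

```python
def coin_posterior_summary(n1, n0):
    """Summaries of the Beta(n1 + 1, n0 + 1) posterior under a uniform prior."""
    n = n1 + n0
    # With a uniform prior the posterior mode coincides with the ML estimate;
    # both are undefined when there is no data.
    ml = n1 / n if n > 0 else None
    mean = (n1 + 1) / (n + 2)
    var = mean * (1 - mean) / (n + 3)
    return {"ml": ml, "map": ml, "mean": mean, "var": var}

def posterior_predictive(x_next, n1, n0):
    """p(x_{n+1} = x_next | D): a Bernoulli with the posterior mean as parameter."""
    mean = (n1 + 1) / (n1 + n0 + 2)
    return mean if x_next == 1 else 1 - mean

# Hypothetical counts: 4 heads and 6 tails.
summary = coin_posterior_summary(4, 6)
print(summary)
print(posterior_predictive(1, 4, 6))
```

With no data at all (`n1 = n0 = 0`) the ML estimate is undefined, while the posterior mean is 1/2 and the predictive distribution is still well defined.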
Beta Distributions

In this example, the posterior distribution has a known analytic form and is in the same Beta family as the prior: $\mathrm{Uniform}[0, 1] \equiv \mathrm{Beta}(1, 1)$. This is an example of a conjugate prior.

A Beta distribution $\mathrm{Beta}(a, b)$ with parameters $a, b > 0$ is an exponential family distribution with density

\[ p(\phi \mid a, b) = \frac{\Gamma(a + b)}{\Gamma(a)\, \Gamma(b)}\, \phi^{a - 1} (1 - \phi)^{b - 1}, \]

where $\Gamma(t) = \int_0^\infty u^{t - 1} e^{-u}\, du$ is the gamma function.

(Figure: Beta densities for several settings of $(a, b)$.)

If the prior is $\mathrm{Beta}(a, b)$, then the posterior distribution is $\mathrm{Beta}(a + n_1, b + n_0)$:

\[ p(\phi \mid D, a, b) \propto \phi^{a + n_1 - 1} (1 - \phi)^{b + n_0 - 1}. \]

Hyperparameters $a$ and $b$ are pseudo-counts, an imaginary initial sample that reflects our prior beliefs about $\phi$.

Bayesian Inference on the Categorical Distribution: Dirichlet Distributions

Suppose we observe $D = \{y_i\}_{i=1}^n$ with $y_i \in \{1, \ldots, K\}$, and model them as i.i.d. with pmf $\pi = (\pi_1, \ldots, \pi_K)$:

\[ p(D \mid \pi) = \prod_{i=1}^n \pi_{y_i} = \prod_{k=1}^K \pi_k^{n_k}, \]

with $n_k = \sum_{i=1}^n 1(y_i = k)$ and $\pi_k > 0$, $\sum_{k=1}^K \pi_k = 1$.

The conjugate prior on $\pi$ is the Dirichlet distribution $\mathrm{Dir}(\alpha_1, \ldots, \alpha_K)$ with parameters $\alpha_k > 0$ and density

\[ p(\pi) = \frac{\Gamma\big(\sum_{k=1}^K \alpha_k\big)}{\prod_{k=1}^K \Gamma(\alpha_k)} \prod_{k=1}^K \pi_k^{\alpha_k - 1} \]

on the probability simplex $\{\pi : \pi_k > 0, \sum_{k=1}^K \pi_k = 1\}$.

The posterior is also Dirichlet: $\mathrm{Dir}(\alpha_1 + n_1, \ldots, \alpha_K + n_K)$. The posterior mean is

\[ \hat\pi_k^{\mathrm{mean}} = \frac{\alpha_k + n_k}{\sum_{j=1}^K (\alpha_j + n_j)}. \]

(Figure: (A) support of the Dirichlet density; (B), (C) Dirichlet densities for two settings of $\alpha_k$.)
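Both conjugate updates amount to adding observed counts to the pseudo-counts. A minimal sketch, with hypothetical function names and a made-up label sequence in which class 3 is never observed:

```python
from collections import Counter

def beta_update(a, b, n1, n0):
    """Beta(a, b) prior plus n1 heads and n0 tails gives Beta(a + n1, b + n0)."""
    return a + n1, b + n0

def dirichlet_update(alpha, labels):
    """Dir(alpha) prior plus labels in {1, ..., K} gives Dir(alpha_k + n_k)."""
    counts = Counter(labels)
    return [alpha[k] + counts.get(k + 1, 0) for k in range(len(alpha))]

def dirichlet_mean(alpha):
    """Mean of a Dirichlet distribution: alpha_k divided by the sum of alphas."""
    total = sum(alpha)
    return [a / total for a in alpha]

# Hypothetical data: K = 3 classes, class 3 never observed.
prior = [1.0, 1.0, 1.0]
labels = [1, 1, 2, 1, 1, 2]
posterior = dirichlet_update(prior, labels)
print(posterior)                  # posterior Dirichlet parameters
print(dirichlet_mean(posterior))  # posterior mean of pi
```

The ML estimate of $\pi_3$ on this data would be exactly zero, whereas the posterior mean keeps a small positive probability for the unseen class.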
Naïve Bayes

Return to the spam classification example with two-class naïve Bayes:

\[ p(x_i \mid \phi_k) = \prod_{j=1}^p \phi_{kj}^{x_i^{(j)}} (1 - \phi_{kj})^{1 - x_i^{(j)}}. \]

Set $n_k = \sum_{i=1}^n 1\{y_i = k\}$ and $n_{kj} = \sum_{i=1}^n 1(y_i = k, x_i^{(j)} = 1)$. The MLE is:

\[ \hat\pi_k = \frac{n_k}{n}, \qquad \hat\phi_{kj} = \frac{\sum_{i : y_i = k} x_i^{(j)}}{n_k} = \frac{n_{kj}}{n_k}. \]

One problem: if the $\ell$th word did not appear in documents labelled as class $k$, then $\hat\phi_{k\ell} = 0$ and

\[ P(Y = k \mid X = x \text{ with } \ell\text{th entry equal to } 1) \propto \hat\pi_k \prod_{j=1}^p \hat\phi_{kj}^{x^{(j)}} (1 - \hat\phi_{kj})^{1 - x^{(j)}} = 0, \]

i.e. we will never attribute a new document containing word $\ell$ to class $k$ (regardless of other words in it).

Bayesian Inference on Naïve Bayes model

Under the naïve Bayes model, the joint distribution of labels $y_i \in \{1, \ldots, K\}$ and data vectors $x_i \in \{0, 1\}^p$ is

\[ \prod_{i=1}^n p(x_i, y_i) = \prod_{i=1}^n \prod_{k=1}^K \Big( \pi_k \prod_{j=1}^p \phi_{kj}^{x_i^{(j)}} (1 - \phi_{kj})^{1 - x_i^{(j)}} \Big)^{1(y_i = k)} = \prod_{k=1}^K \pi_k^{n_k} \prod_{j=1}^p \phi_{kj}^{n_{kj}} (1 - \phi_{kj})^{n_k - n_{kj}}, \]

where $n_k = \sum_{i=1}^n 1(y_i = k)$ and $n_{kj} = \sum_{i=1}^n 1(y_i = k, x_i^{(j)} = 1)$.

For a conjugate prior, we can use $\mathrm{Dir}((\alpha_k)_{k=1}^K)$ for $\pi$, and $\mathrm{Beta}(a, b)$ for each $\phi_{kj}$ independently.

Because the likelihood factorizes, the posterior distribution over $\pi$ and $(\phi_{kj})$ also factorizes: the posterior for $\pi$ is $\mathrm{Dir}((\alpha_k + n_k)_{k=1}^K)$, and for $\phi_{kj}$ it is $\mathrm{Beta}(a + n_{kj}, b + n_k - n_{kj})$.

Given $D = \{(x_i, y_i)\}_{i=1}^n$, we want to predict a label $\tilde y$ for a new document $\tilde x$. We can calculate

\[ p(\tilde x, \tilde y = k \mid D) = p(\tilde y = k \mid D)\, p(\tilde x \mid \tilde y = k, D) \]

with

\[ p(\tilde y = k \mid D) = \frac{\alpha_k + n_k}{\sum_{l=1}^K \alpha_l + n}, \qquad p(\tilde x^{(j)} = 1 \mid \tilde y = k, D) = \frac{a + n_{kj}}{a + b + n_k}. \]

The predicted class probabilities are

\[ p(\tilde y = k \mid \tilde x, D) = \frac{p(\tilde y = k \mid D)\, p(\tilde x \mid \tilde y = k, D)}{p(\tilde x \mid D)}. \]

Compared to the ML plug-in estimator, pseudo-counts help to regularize probabilities away from extreme values.

Bayesian Learning and Regularization

Consider a Bayesian approach to logistic regression: introduce a multivariate normal prior for the weight vector $w \in \mathbb{R}^p$, and a uniform (improper) prior for the offset $b \in \mathbb{R}$. The prior density is:

\[ p(b, w) = (2\pi\sigma^2)^{-p/2} \exp\Big( -\frac{1}{2\sigma^2} \|w\|_2^2 \Big). \]

The posterior is

\[ p(b, w \mid D) \propto \exp\Big( -\frac{1}{2\sigma^2} \|w\|_2^2 - \sum_{i=1}^n \log\big(1 + \exp(-y_i (b + w^\top x_i))\big) \Big). \]

The posterior mode is equivalent to minimizing the L2-regularized empirical risk.

Regularized empirical risk minimization is (often) equivalent to having a prior and finding a MAP estimate of the parameters.
L2 regularization: multivariate normal prior.
L1 regularization: multivariate Laplace prior.

From a Bayesian perspective, the MAP parameters are just one way to summarize the posterior distribution.
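Returning to the naïve Bayes model above, the effect of the pseudo-counts on the zero-count problem can be seen on a tiny example. This sketch assumes a symmetric Dirichlet prior with a single value `alpha` for every class; the function names and the four toy documents are hypothetical. Probabilities are multiplied directly, which is fine for a handful of words but would need log-space arithmetic for realistic vocabularies.

```python
def fit_counts(X, y, K):
    """Count n_k (documents per class) and n_kj (class-k documents containing word j)."""
    p = len(X[0])
    n_k = [0] * K
    n_kj = [[0] * p for _ in range(K)]
    for x, label in zip(X, y):
        k = label - 1  # labels are 1..K
        n_k[k] += 1
        for j, v in enumerate(x):
            n_kj[k][j] += v
    return n_k, n_kj

def predict_proba(x_new, n_k, n_kj, alpha=1.0, a=1.0, b=1.0):
    """p(y = k | x_new, D) with Dir(alpha, ..., alpha) and Beta(a, b) priors.

    Setting alpha = a = b = 0 gives the ML plug-in estimate, provided every
    class has at least one document and the normaliser is nonzero.
    """
    n, K = sum(n_k), len(n_k)
    joint = []
    for k in range(K):
        prob = (alpha + n_k[k]) / (K * alpha + n)
        for j, v in enumerate(x_new):
            p1 = (a + n_kj[k][j]) / (a + b + n_k[k])
            prob *= p1 if v == 1 else 1 - p1
        joint.append(prob)
    z = sum(joint)
    return [q / z for q in joint]

# Toy corpus (made up): 3 words, 2 classes; word 1 never occurs in class 2.
X = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]]
y = [1, 1, 2, 2]
n_k, n_kj = fit_counts(X, y, K=2)
x_new = [1, 1, 0]
print(predict_proba(x_new, n_k, n_kj, alpha=0, a=0, b=0))  # ML plug-in
print(predict_proba(x_new, n_k, n_kj))                     # with pseudo-counts
```

The ML plug-in assigns class 2 probability exactly zero because the new document contains word 1, while the Bayesian predictive still prefers class 1 but leaves class 2 a small nonzero probability.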
Bayesian Model Selection

A model $M$ with a given set of parameters $\theta_M$ consists of both the likelihood $p(D \mid \theta_M)$ and the prior distribution $p(\theta_M)$.

One example model would consist of all Gaussian mixtures with $K$ components and equal covariance (LDA): $\theta_{\mathrm{LDA}} = (\pi_1, \ldots, \pi_K; \mu_1, \ldots, \mu_K; \Sigma)$, along with a prior on $\theta$; another would allow different covariances (QDA): $\theta_{\mathrm{QDA}} = (\pi_1, \ldots, \pi_K; \mu_1, \ldots, \mu_K; \Sigma_1, \ldots, \Sigma_K)$.

The posterior distribution:

\[ p(\theta_M \mid D, M) = \frac{p(D \mid \theta_M, M)\, p(\theta_M \mid M)}{p(D \mid M)}. \]

Marginal probability of the data under $M$ (Bayesian model evidence):

\[ p(D \mid M) = \int_\Theta p(D \mid \theta_M, M)\, p(\theta_M \mid M)\, d\theta_M. \]

Compare models using their Bayes factors $\frac{p(D \mid M)}{p(D \mid M')}$.

Bayesian Occam's Razor

Occam's Razor: of two explanations adequate to explain the same set of observations, the simpler should be preferred.

The model evidence $p(D \mid M)$ is the probability that a set of randomly selected parameter values inside the model would generate dataset $D$.

Models that are too simple are unlikely to generate the observed dataset.

Models that are too complex can generate many possible datasets, so again, they are unlikely to generate that particular dataset at random.

(Figure: Bayesian model comparison, Occam's razor at work. Fits for a range of model orders $M$, and the model evidence $P(Y \mid M)$ plotted against $M$.)

Discussion

Use probability distributions to reason about uncertainties of parameters (latent variables and parameters are treated in the same way).

A model consists of the likelihood function and the prior distribution on parameters; this allows prior beliefs and domain knowledge to be integrated.

Bayesian computation: most posteriors are intractable, and the posterior needs to be approximated by:
Monte Carlo methods (MCMC and SMC).
Variational methods (variational Bayes, belief propagation etc).

The prior usually has hyperparameters, i.e., $p(\theta) = p(\theta \mid \psi)$. How to choose $\psi$?
Be Bayesian about $\psi$ as well: choose a hyperprior $p(\psi)$ and compute $p(\psi \mid D)$.
Maximum Likelihood II: find $\hat\psi = \arg\max_{\psi \in \Psi} p(D \mid \psi)$, where

\[ p(D \mid \psi) = \int p(D \mid \theta)\, p(\theta \mid \psi)\, d\theta, \qquad p(\psi \mid D) = \frac{p(D \mid \psi)\, p(\psi)}{p(D)}. \]

Figures by M. Sahani.
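For the coin model, the evidence is available in closed form, so a Bayes factor can be computed exactly. The two models compared here are hypothetical choices made for illustration: a "simple" model that fixes the heads probability at 0.5 (no free parameter), and a "flexible" model with a uniform prior on it, whose evidence is $n_1!\, n_0! / (n + 1)!$. Evidence is computed for the observed sequence of tosses, on the log scale.

```python
import math

def log_evidence_fair(n1, n0):
    """log p(D | M_simple): every sequence of n tosses has probability 0.5 ** n."""
    return (n1 + n0) * math.log(0.5)

def log_evidence_uniform(n1, n0):
    """log p(D | M_flexible) = log( n1! n0! / (n + 1)! ), via log-gamma."""
    return (math.lgamma(n1 + 1) + math.lgamma(n0 + 1)
            - math.lgamma(n1 + n0 + 2))

def log_bayes_factor(n1, n0):
    """log of p(D | M_flexible) / p(D | M_simple); positive favours M_flexible."""
    return log_evidence_uniform(n1, n0) - log_evidence_fair(n1, n0)

# Hypothetical datasets of 10 tosses each.
print(log_bayes_factor(5, 5))   # balanced data
print(log_bayes_factor(10, 0))  # all heads
```

On balanced data the log Bayes factor is negative: the flexible model spreads its probability over many possible datasets and is penalised for it. On ten heads in a row it is positive, since the simple model is very unlikely to have produced that dataset.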
Further Reading

Videolectures by Zoubin Ghahramani: and Graphical models.
Gelman et al. Bayesian Data Analysis.
Kevin Murphy. Machine Learning: a Probabilistic Perspective.
E. T. Jaynes. Probability Theory: The Logic of Science.
More informationExpectation Maximization (EM)
Expectation Maximization (EM) The EM algorithm is used to train models involving latent variables using training data in which the latent variables are not observed (unlabeled data). This is to be contrasted
More informationp L yi z n m x N n xi
y i z n x n N x i Overview Directed and undirected graphs Conditional independence Exact inference Latent variables and EM Variational inference Books statistical perspective Graphical Models, S. Lauritzen
More informationLEARNING WITH BAYESIAN NETWORKS
LEARNING WITH BAYESIAN NETWORKS Author: David Heckerman Presented by: Dilan Kiley Adapted from slides by: Yan Zhang  2006, Jeremy Gould 2013, Chip Galusha 2014 Jeremy Gould 2013Chip Galus May 6th, 2016
More informationChapter 8 PROBABILISTIC MODELS FOR TEXT MINING. Yizhou Sun Department of Computer Science University of Illinois at UrbanaChampaign
Chapter 8 PROBABILISTIC MODELS FOR TEXT MINING Yizhou Sun Department of Computer Science University of Illinois at UrbanaChampaign sun22@illinois.edu Hongbo Deng Department of Computer Science University
More informationUSEFUL PROPERTIES OF THE MULTIVARIATE NORMAL*
USEFUL PROPERTIES OF THE MULTIVARIATE NORMAL* 3 Conditionals and marginals For Bayesian analysis it is very useful to understand how to write joint, marginal, and conditional distributions for the multivariate
More informationBayesian Classification Methods
Bayesian Classification Methods Suchit Mehrotra North Carolina State University smehrot@ncsu.edu October 24, 2014 Suchit Mehrotra (NCSU) Bayesian Classification October 24, 2014 1 / 33 How do you define
More informationLatent Dirichlet Allocation
Latent Dirichlet Allocation 1 Directed Graphical Models William W. Cohen Machine Learning 10601 2 DGMs: The Burglar Alarm example Node ~ random variable Burglar Earthquake Arcs define form of probability
More informationMachine Learning 4771
Machine Learning 4771 Instructor: Tony Jebara Topic 7 Unsupervised Learning Statistical Perspective Probability Models Discrete & Continuous: Gaussian, Bernoulli, Multinomial Maimum Likelihood Logistic
More information