INTRODUCTION TO BAYESIAN INFERENCE PART 2 CHRIS BISHOP


Personal Healthcare Revolution: electronic health records (CFH); personal genomics (DeCode, Navigenics, 23andMe); X-prize: first $10k human genome technology; NIH: $1k genome by 2014. Microsoft Research Cambridge: PhD scholarships; internships (3 months); postdoctoral fellowships.

Why Probabilities? [Figure: an image vector x is mapped to a class: cancer or normal.]

Decisions. One-step solution: train a function that decides the class directly. Two-step solution: inference (infer the posterior probabilities), then decision (use the probabilities to decide the class).

Minimum Misclassification Rate
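The slide's equation did not survive transcription; in the standard PRML form, the quantity being minimized is

p(\text{mistake}) = \int_{R_1} p(x, C_2)\, dx + \int_{R_2} p(x, C_1)\, dx

which is minimized by assigning each x to the class with the larger posterior probability p(C_k|x).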

Why Separate Inference and Decision? Minimizing risk (the loss matrix may change over time); the reject option; unbalanced class priors; combining models.

Loss Matrix. [Table: rows = true class, columns = decision.]

Minimum Expected Loss. Decision regions are chosen, at each x, to minimize the expected loss:
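The missing objective, again in the standard PRML form, is the expected loss

E[L] = \sum_k \sum_j \int_{R_j} L_{kj}\, p(x, C_k)\, dx

minimized by assigning each x to the class j for which \sum_k L_{kj}\, p(C_k|x) is smallest.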

Reject Option

Unbalanced Class Priors. In a screening application, cancer is very rare. Use balanced data sets to train the models, then use Bayes' theorem to correct the posterior probabilities.
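A reconstruction of the correction step: if \tilde{p}(C_k|x) denotes the posterior from a model trained with balanced priors \tilde{p}(C_k), the posterior under the true priors p(C_k) is

p(C_k|x) \propto \frac{\tilde{p}(C_k|x)}{\tilde{p}(C_k)}\, p(C_k)

renormalized so the posteriors sum to one over k.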

Combining Models. Image data and blood tests: assume the two sources are independent given the class:
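With x_I the image data and x_B the blood-test data, the independence assumption and the resulting combination rule read

p(x_I, x_B | C_k) = p(x_I | C_k)\, p(x_B | C_k)

p(C_k | x_I, x_B) \propto \frac{p(C_k | x_I)\, p(C_k | x_B)}{p(C_k)}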

Binary Variables (1). Coin flipping: heads = 1, tails = 0. Bernoulli distribution:
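The distribution itself is missing from the transcription:

\text{Bern}(x|\mu) = \mu^x (1-\mu)^{1-x}, \qquad x \in \{0, 1\}, \qquad p(x=1|\mu) = \mu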

Expectation and Variance. In general, and for the Bernoulli distribution in particular:
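In general, E[x] = \sum_x x\, p(x) and \text{var}[x] = E[x^2] - E[x]^2; for the Bernoulli distribution these give

E[x] = \mu, \qquad \text{var}[x] = \mu(1-\mu)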

Likelihood Function. Given a data set of observed flips, the likelihood function is:
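For a data set D = \{x_1, \ldots, x_N\} of independent flips,

p(D|\mu) = \prod_{n=1}^{N} \mu^{x_n} (1-\mu)^{1-x_n}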

Prior Distribution. A simplification arises if the prior has the same functional form as the likelihood function; such a prior is called a conjugate prior.

Beta Distribution
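The density and its mean, reconstructed from PRML:

\text{Beta}(\mu|a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\, \mu^{a-1} (1-\mu)^{b-1}, \qquad E[\mu] = \frac{a}{a+b}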

Posterior Distribution
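With m heads and l tails observed, conjugacy gives

p(\mu | m, l, a, b) \propto \mu^{m+a-1} (1-\mu)^{l+b-1}

which is again a beta distribution, \text{Beta}(\mu \mid m+a,\, l+b).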

Properties of the Posterior. As the size N of the data set increases, the posterior mean approaches the maximum-likelihood estimate and the posterior variance shrinks toward zero.

Predictive Distribution. What is the probability that the next coin flip will be heads?
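The standard answer, averaging the Bernoulli parameter over the beta posterior:

p(x = 1 | D) = \int_0^1 \mu\, p(\mu|D)\, d\mu = \frac{m+a}{m+a+l+b}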

The Exponential Family. Here η is the natural parameter, and g(η) can be interpreted as the normalization coefficient:
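The family's general form, missing from the transcription:

p(x|\eta) = h(x)\, g(\eta)\, \exp\{\eta^\top u(x)\}

where normalization requires g(\eta) \int h(x) \exp\{\eta^\top u(x)\}\, dx = 1.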

Likelihood Function. Given a data set, the likelihood depends on the data only through the sufficient statistics:
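p(D|\eta) = \Big(\prod_{n=1}^{N} h(x_n)\Big)\, g(\eta)^N \exp\Big\{\eta^\top \sum_{n=1}^{N} u(x_n)\Big\}

so the data enter only through the sufficient statistic \sum_n u(x_n).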

Expected Sufficient Statistics
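The identity this slide states, obtained by differentiating the normalization condition:

-\nabla \ln g(\eta) = E[u(x)]

Setting the log-likelihood gradient to zero gives the maximum-likelihood condition -\nabla \ln g(\eta_{ML}) = \frac{1}{N} \sum_n u(x_n).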

Conjugate Priors. For the exponential family, combining the conjugate prior with the likelihood function gives the posterior; the prior corresponds to ν pseudo-observations with sufficient statistic χ:
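The conjugate prior and the posterior it produces:

p(\eta | \chi, \nu) = f(\chi, \nu)\, g(\eta)^{\nu} \exp\{\nu\, \eta^\top \chi\}

p(\eta | D, \chi, \nu) \propto g(\eta)^{\nu+N} \exp\Big\{\eta^\top \Big(\sum_{n=1}^{N} u(x_n) + \nu \chi\Big)\Big\}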

Bernoulli Revisited. Comparing the Bernoulli distribution with the general exponential-family form identifies the natural parameter as the log-odds, whose inverse is the logistic sigmoid (spelled out after the next slide).

Bernoulli Revisited. The Bernoulli distribution in canonical form:
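The algebra on these two slides, reconstructed:

\mu^x (1-\mu)^{1-x} = (1-\mu) \exp\Big\{x \ln \frac{\mu}{1-\mu}\Big\}

so the natural parameter is \eta = \ln \frac{\mu}{1-\mu}, inverted by the logistic sigmoid \mu = \sigma(\eta) = \frac{1}{1 + e^{-\eta}}. The canonical form is then

\text{Bern}(x|\mu) = \sigma(-\eta)\, \exp(\eta x), \qquad g(\eta) = \sigma(-\eta)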

The Gaussian Distribution
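For reference, the univariate density:

N(x | \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big\{-\frac{(x-\mu)^2}{2\sigma^2}\Big\}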

Likelihood Function

Bayesian Inference: Unknown Mean. Assume σ² is known. Given a data set, the likelihood function for μ is:
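p(D|\mu) = \prod_{n=1}^{N} N(x_n | \mu, \sigma^2)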

Bayesian Inference: Unknown Mean. The conjugate prior is a Gaussian, which gives a Gaussian posterior:
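With prior p(\mu) = N(\mu | \mu_0, \sigma_0^2), the posterior is N(\mu | \mu_N, \sigma_N^2) with

\mu_N = \frac{\sigma^2}{N\sigma_0^2 + \sigma^2}\, \mu_0 + \frac{N\sigma_0^2}{N\sigma_0^2 + \sigma^2}\, \mu_{ML}, \qquad \frac{1}{\sigma_N^2} = \frac{1}{\sigma_0^2} + \frac{N}{\sigma^2}

where \mu_{ML} = \frac{1}{N} \sum_n x_n is the sample mean.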

Bayesian Inference: Unknown Precision. Now assume μ is known. The likelihood function for the precision λ = 1/σ² is:
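p(D|\lambda) \propto \lambda^{N/2} \exp\Big\{-\frac{\lambda}{2} \sum_{n=1}^{N} (x_n - \mu)^2\Big\}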

Conjugate Prior. The gamma distribution:
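\text{Gam}(\lambda | a, b) = \frac{1}{\Gamma(a)}\, b^a\, \lambda^{a-1} e^{-b\lambda}

Combining with the likelihood gives the posterior \text{Gam}(\lambda | a_N, b_N), where a_N = a + N/2 and b_N = b + \frac{1}{2} \sum_n (x_n - \mu)^2.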

Unknown Mean and Precision. When both μ and the precision λ are unknown, the likelihood function depends on both, and the conjugate prior is the Gaussian-gamma distribution:
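p(\mu, \lambda) = N(\mu \mid \mu_0, (\beta\lambda)^{-1})\, \text{Gam}(\lambda \mid a, b)

Note that the precision of the prior over the mean is coupled to \lambda; the constant \beta here scales the prior and is distinct from the noise precision used in the regression slides below.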

Gaussian-gamma Distribution

Linear Regression (1) Noisy sinusoidal data

Linear Regression (2). A linear combination of basis functions, with a Gaussian noise model, gives the likelihood function:
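The three missing formulas, in the standard notation:

y(x, w) = \sum_{j=0}^{M-1} w_j\, \phi_j(x) = w^\top \phi(x)

t = y(x, w) + \epsilon, \qquad \epsilon \sim N(0, \beta^{-1})

p(\mathbf{t} | X, w, \beta) = \prod_{n=1}^{N} N(t_n \mid w^\top \phi(x_n), \beta^{-1})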

Linear Regression (3) Polynomial basis functions

Linear Regression (4). Define a conjugate prior over w; combining it with the likelihood function gives the posterior:
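In the simplest conjugate case p(w) = N(w | 0, \alpha^{-1} I), the posterior is

p(w | \mathbf{t}) = N(w | m_N, S_N), \qquad m_N = \beta\, S_N \Phi^\top \mathbf{t}, \qquad S_N^{-1} = \alpha I + \beta\, \Phi^\top \Phi

where \Phi is the design matrix with elements \Phi_{nj} = \phi_j(x_n).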

Simple Example (1). Data generated from a straight line with Gaussian noise; the model is a first-order polynomial.

Simple Example (2): 0 data points observed. [Panels: prior; data space.]

Simple Example (3): 1 data point observed. [Panels: likelihood; posterior; data space.]

Simple Example (4): 2 data points observed. [Panels: likelihood; posterior; data space.]

Simple Example (5): 20 data points observed. [Panels: likelihood; posterior; data space.]

Predictive Distribution (1). Predict t for new values of x by integrating over w:
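The reconstructed equations:

p(t | x, \mathbf{t}, \alpha, \beta) = \int p(t | x, w, \beta)\, p(w | \mathbf{t}, \alpha, \beta)\, dw = N(t \mid m_N^\top \phi(x),\, \sigma_N^2(x))

\sigma_N^2(x) = \frac{1}{\beta} + \phi(x)^\top S_N\, \phi(x)

where the first variance term is the observation noise and the second reflects the remaining uncertainty in w.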

Predictive Distribution (3) Example: Sinusoidal data, 9 Gaussian basis functions, 1 data point

Predictive Distribution (4) Example: Sinusoidal data, 9 Gaussian basis functions, 2 data points

Predictive Distribution (5) Example: Sinusoidal data, 9 Gaussian basis functions, 4 data points

Predictive Distribution (6) Example: Sinusoidal data, 9 Gaussian basis functions, 25 data points
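The progression on these four slides is easy to reproduce numerically. Below is a minimal Python sketch, not the original demo code: it assumes the setup used throughout (prior p(w) = N(0, α⁻¹I), noise precision β, 9 Gaussian basis functions), and all variable names and parameter values are illustrative.

import numpy as np

def design_matrix(x, centers, width=0.1):
    # One Gaussian bump per basis-function centre: phi_j(x) = exp(-(x - c_j)^2 / (2 width^2)).
    return np.exp(-0.5 * ((x[:, None] - centers[None, :]) / width) ** 2)

alpha, beta = 2.0, 25.0                       # assumed prior precision and noise precision
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, 25)
t_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, beta ** -0.5, x_train.size)

centers = np.linspace(0.0, 1.0, 9)            # 9 Gaussian basis functions, as on the slides
Phi = design_matrix(x_train, centers)

# Posterior over w:  S_N^{-1} = alpha I + beta Phi^T Phi,  m_N = beta S_N Phi^T t
S_N = np.linalg.inv(alpha * np.eye(9) + beta * Phi.T @ Phi)
m_N = beta * S_N @ Phi.T @ t_train

# Predictive distribution: mean m_N^T phi(x), variance 1/beta + phi(x)^T S_N phi(x)
x_new = np.linspace(0.0, 1.0, 5)
phi_new = design_matrix(x_new, centers)
mean = phi_new @ m_N
var = 1.0 / beta + np.einsum('ij,jk,ik->i', phi_new, S_N, phi_new)
print(np.column_stack([x_new, mean, np.sqrt(var)]))

Refitting with 1, 2, 4, and 25 training points shows the predictive variance collapsing near the observed data, as in the slides.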

Bayesian Model Comparison (1). Alternative models M_i, i = 1, …, L. The predictive distribution is a mixture over models; for model selection, keep only the most probable model:
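The mixture over models:

p(t | x, D) = \sum_{i=1}^{L} p(t | x, M_i, D)\, p(M_i | D)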

Bayesian Model Comparison (2). From Bayes' theorem, the posterior over models combines the model evidence (marginal likelihood) with the prior; for equal priors, models are ranked by the marginal likelihood:
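p(M_i | D) \propto p(D | M_i)\, p(M_i)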

Bayesian Model Comparison (4). For a model with parameters w, the evidence is obtained by marginalizing over w:
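p(D | M_i) = \int p(D | w, M_i)\, p(w | M_i)\, dw

Note that this same quantity appears as the normalizing constant in Bayes' theorem for the posterior over w within model M_i.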

Bayesian Model Comparison (5). Consider a model with a single parameter w.

Bayesian Model Comparison (6). Taking logarithms, we obtain a log evidence with a negative second term; with M parameters, all assumed to have the same ratio of posterior to prior width, we get:
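Reconstructed from the standard PRML treatment: for a flat prior of width \Delta w_{prior} and a posterior of width \Delta w_{posterior},

p(D) \simeq p(D | w_{MAP})\, \frac{\Delta w_{posterior}}{\Delta w_{prior}}

\ln p(D) \simeq \ln p(D | w_{MAP}) + M \ln\Big(\frac{\Delta w_{posterior}}{\Delta w_{prior}}\Big)

The log-ratio term is negative, since the posterior is narrower than the prior, and its magnitude grows with M, penalizing overly complex models.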

Linear Regression revisited. The marginal likelihood is:
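In the PRML form, with A = \alpha I + \beta \Phi^\top \Phi:

\ln p(\mathbf{t} | \alpha, \beta) = \frac{M}{2} \ln \alpha + \frac{N}{2} \ln \beta - E(m_N) - \frac{1}{2} \ln |A| - \frac{N}{2} \ln(2\pi)

E(m_N) = \frac{\beta}{2} \|\mathbf{t} - \Phi m_N\|^2 + \frac{\alpha}{2}\, m_N^\top m_N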

Linear Regression revisited Noisy sinusoidal data

Linear Regression revisited. Polynomial of order M.

Bayesian Model Comparison Matching data and model complexity