Computational Perception. Bayesian Inference

Computational Perception 15-485/785 January 24, 2008 Bayesian Inference

The process of probabilistic inference:
1. define a model of the problem
2. derive posterior distributions and estimators
3. estimate parameters from data
4. evaluate model accuracy

Simple Bayesian inference of known probabilities. Using the facts:

P(A, B) = P(A | B) P(B)
P(A, B) = P(B | A) P(A)

Bayes' rule follows trivially:

P(B | A) P(A) = P(A | B) P(B)
P(B | A) = P(A | B) P(B) / P(A)
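
As a quick sanity check, Bayes' rule for events is a one-line computation. The sketch below is my addition (Python, not part of the lecture); the probabilities are hypothetical, chosen only to exercise the formula.

```python
def bayes_rule(p_a_given_b, p_b, p_a):
    """Return P(B | A) computed from P(A | B), P(B), and P(A)."""
    return p_a_given_b * p_b / p_a

# Hypothetical numbers, only to exercise the formula.
print(bayes_rule(p_a_given_b=0.9, p_b=0.01, p_a=0.05))  # 0.18
```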

Example: binomial distribution. In Bernoulli trials, each sample is either 1 (e.g. heads) with probability θ, or 0 (tails) with probability 1 - θ. The binomial distribution specifies the probability of the total number of heads, y, out of n trials:

p(y | θ, n) = C(n, y) θ^y (1 - θ)^(n - y)

where C(n, y) = n! / (y! (n - y)!) is the binomial coefficient.

[Figure: p(y | θ=0.5, n=10) plotted against y]

Example: binomial distribution (continued). The same distribution with a smaller success probability:

[Figure: p(y | θ=0.25, n=10) plotted against y]

Example: binomial distribution (continued). And with more trials:

[Figure: p(y | θ=0.25, n=20) plotted against y]

How do we determine θ from a set of trials?
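
To make the distribution concrete, here is a minimal sketch (my addition, assuming Python with NumPy/SciPy available) that evaluates the binomial pmf from the last plot, θ = 0.25 and n = 20:

```python
import numpy as np
from scipy.stats import binom

n, theta = 20, 0.25
y = np.arange(n + 1)
pmf = binom.pmf(y, n, theta)  # p(y | theta, n) for every possible number of heads
print(pmf.round(3))           # peaks near y = n * theta = 5
```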

Applying Bayes' rule. Given n trials with y heads, what do we know about θ? We can apply Bayes' rule to see how our knowledge changes as we acquire new observations:

p(θ | y, n) = p(y | θ, n) p(θ | n) / p(y | n)

Here p(θ | y, n) is the posterior, p(y | θ, n) is the likelihood, p(θ | n) is the prior, and the normalizing constant is p(y | n) = ∫ p(y | θ, n) p(θ | n) dθ.

We know the likelihood; what about the prior? Uniform on [0, 1] is a reasonable assumption, i.e. we don't know anything. What is the form of the posterior? In this case, the posterior is just proportional to the likelihood:

p(θ | y, n) ∝ C(n, y) θ^y (1 - θ)^(n - y)

Updating our knowledge with new information. Now we can evaluate the posterior just by plugging in different values of y and n:

p(θ | y, n) ∝ C(n, y) θ^y (1 - θ)^(n - y)

Check: what goes on the axes?
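
A minimal sketch of this update (my addition, assuming Python with NumPy/SciPy): evaluate the unnormalized posterior on a grid of θ values and normalize numerically. θ goes on the horizontal axis and the posterior density on the vertical axis.

```python
import numpy as np
from scipy.special import comb

def unnormalized_posterior(theta, y, n):
    # Under a uniform prior on [0, 1] the posterior is proportional to the binomial likelihood.
    return comb(n, y) * theta**y * (1 - theta)**(n - y)

theta = np.linspace(0.0, 1.0, 1001)
post = unnormalized_posterior(theta, y=1, n=5)  # e.g. 1 head out of 5 tosses
post /= post.sum() * (theta[1] - theta[0])      # normalize on the grid
# Plotting post against theta reproduces the curves on the following slides.
```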

Evaluating the posterior. What do we know initially, before observing any trials?

[Figure: p(θ | y=0, n=0)]

Coin tossing. What is our belief about θ after observing one tail? How would you bet? Is p(θ > 0.5) less or greater than 0.5? What about p(θ > 0.3)?

[Figure: p(θ | y=0, n=1)]

Coin tossing. Now after two trials we observe 1 head and 1 tail.

[Figure: p(θ | y=1, n=2)]

Coin tossing. 3 trials: 1 head and 2 tails.

[Figure: p(θ | y=1, n=3)]

Coin tossing. 4 trials: 1 head and 3 tails.

[Figure: p(θ | y=1, n=4)]

Coin tossing. 5 trials: 1 head and 4 tails. Do we have good evidence that this coin is biased? How would you quantify this statement?

p(θ > 0.5) = ∫_0.5^1.0 p(θ | y, n) dθ

Can we substitute the proportional expression above? No; it's not normalized.

[Figure: p(θ | y=1, n=5)]

Evaluating the normalizing constant. To get proper probability density functions, we need to evaluate p(y | n):

p(θ | y, n) = p(y | θ, n) p(θ | n) / p(y | n)

Bayes, in his original paper of 1763, showed that

p(y | n) = ∫_0^1 p(y | θ, n) p(θ | n) dθ = 1 / (n + 1)

so the normalized posterior is

p(θ | y, n) = (n + 1) C(n, y) θ^y (1 - θ)^(n - y)
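
A quick numerical check of the 1/(n + 1) result (my sketch, assuming Python/SciPy; the values of n and y are arbitrary):

```python
from scipy.integrate import quad
from scipy.special import comb

n, y = 10, 3  # arbitrary example values
likelihood = lambda th: comb(n, y) * th**y * (1 - th)**(n - y)

p_y, _ = quad(likelihood, 0.0, 1.0)  # integrate the likelihood against a uniform prior
print(p_y, 1.0 / (n + 1))            # both come out to ~0.0909
```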

More coin tossing. After 50 trials: 17 heads and 33 tails. What's a good estimate of θ? There are many possibilities.

[Figure: p(θ | y=17, n=50)]

A ratio estimate. The intuitive estimate is to just take the ratio y/n = 17/50 = 0.34.

[Figure: p(θ | y=17, n=50), with y/n = 0.34 marked]

The maximum a posteriori (MAP) estimate. This just picks the location of the maximum value of the posterior. In this case the maximum is also at θ = 0.34.

[Figure: p(θ | y=17, n=50), with the MAP estimate 0.34 marked]

A different case. What about after just one trial: 0 heads and 1 tail? The MAP and ratio estimates would both say 0. Does this make sense? What would a better estimate be?

[Figure: p(θ | y=0, n=1), with y/n = 0 marked]

The expected value estimate. We defined the expected value of a pdf in the previous lecture:

E(θ | y, n) = ∫_0^1 θ p(θ | y, n) dθ = (y + 1) / (n + 2)

E(θ | y=0, n=1) = 1/3

What happens for zero trials?

[Figure: p(θ | y=0, n=1)]
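
The contrast between the two estimators is easy to tabulate; a minimal sketch (my addition, plain Python) using the slide's numbers:

```python
def map_estimate(y, n):
    # With a uniform prior the MAP estimate coincides with the ratio y/n.
    return y / n if n > 0 else None      # undefined for zero trials

def posterior_mean(y, n):
    # Expected value of the posterior: (y + 1) / (n + 2).
    return (y + 1) / (n + 2)             # equals 1/2 for zero trials, the prior mean

print(map_estimate(0, 1), posterior_mean(0, 1))      # 0.0 vs 0.333...
print(map_estimate(17, 50), posterior_mean(17, 50))  # 0.34 vs ~0.346
```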

Much more coin tossing. After 500 trials: 184 heads and 316 tails. What's your guess of θ?

[Figure: p(θ | y=184, n=500)]

Much more coin tossing. After 5000 trials: 1948 heads and 3052 tails. The posterior now contains the true value, which is 0.4. Is this always the case? No: only if the assumptions are correct. How could our assumptions be wrong?

[Figure: p(θ | y=1948, n=5000)]

Laplace's example: proportion of female births. A total of 241,945 girls and 251,527 boys were born in Paris from 1745 to 1770. Laplace was able to evaluate

p(θ > 0.5) = ∫_0.5^1.0 p(θ | y, n) dθ ≈ 1.15 × 10^-42

He was morally certain that θ < 0.5. But could he have been wrong?

[Figure: p(θ | y=241945, n=493472), plotted over θ from about 0.484 to 0.498]
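
The posterior p(θ | y, n) = (n + 1) C(n, y) θ^y (1 - θ)^(n - y) is a Beta(y + 1, n - y + 1) density, so Laplace's tail probability can be checked directly. This sketch (my addition, assuming Python/SciPy) should come out on the order of 10^-42:

```python
from scipy.stats import beta

girls, boys = 241945, 251527
# Under a uniform prior, the posterior over theta is Beta(girls + 1, boys + 1).
p_gt_half = beta.sf(0.5, girls + 1, boys + 1)  # P(theta > 0.5 | data)
print(p_gt_half)  # on the order of 1e-42, matching Laplace's calculation
```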

Laplace and the mass of Saturn. Laplace used Bayesian inference to estimate the mass of Saturn and other planets. For Saturn he said: "It is a bet of 11,000 to 1 that the error in this result is not within 1/100th of its value."

Mass of Saturn as a fraction of the mass of the Sun:
  Laplace (1815): 1/3512
  NASA (2004):    1/3499.1

Relative error: (3512 - 3499.1) / 3499.1 = 0.0037, so Laplace is still winning.

Applying Bayes' rule with an informative prior. What if we already know something about θ? We can still apply Bayes' rule to see how our knowledge changes as we acquire new observations:

p(θ | y, n) = p(y | θ, n) p(θ | n) / p(y | n)

But now the prior becomes important. Assume we know biased coins are never below 0.3 or above 0.7. To describe this we can use a beta distribution for the prior.

A beta prior. In this case, before observing any trials, our prior is not uniform: it is Beta(a=20, b=20).

[Figure: the Beta(20, 20) prior, p(θ | y=0, n=0)]

Coin tossing revisited. What is our belief about θ after observing one tail? With a uniform prior it was the posterior shown earlier. What will it look like with our prior?

[Figure: p(θ | y=0, n=1) under the uniform prior]

Coin tossing with prior knowledge. Our belief about θ after observing one tail hardly changes.

[Figure: p(θ | y=0, n=1) under the Beta(20, 20) prior]
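
This "hardly changes" can be quantified with the conjugate update: a beta prior combined with the binomial likelihood gives a beta posterior, Beta(a + y, b + n - y). A small sketch, assuming Python/SciPy (my addition):

```python
from scipy.stats import beta

a, b = 20, 20                        # Beta(20, 20) prior, mass concentrated around 0.5
y, n = 0, 1                          # one trial observed, zero heads
posterior = beta(a + y, b + n - y)   # conjugate update: Beta(a + y, b + n - y)
print(posterior.mean())              # 20/41 ~ 0.488, barely moved from the prior mean of 0.5
```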

Coin tossing. After 50 trials, it's much like before.

[Figure: p(θ | y=17, n=50)]

Coin tossing. After 5,000 trials, it's virtually identical to the uniform-prior result. What did we gain?

[Figure: p(θ | y=1948, n=5000)]
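
To see what the informative prior bought us at this sample size, compare the two posteriors directly; a sketch (my addition, assuming Python/SciPy):

```python
from scipy.stats import beta

y, n = 1948, 5000
uniform_post = beta(1 + y, 1 + n - y)      # uniform prior is Beta(1, 1)
informed_post = beta(20 + y, 20 + n - y)   # Beta(20, 20) prior
print(uniform_post.mean(), informed_post.mean())  # ~0.390 in both cases
```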

The remainder of the lecture was given on the board.