Bayesian Updating: Continuous Priors
18.05 Spring 2014

Compute $\int_a^b f(x \mid \theta)\, f(\theta)\, d\theta$

Beta distribution

Beta(a, b) has density

$f(\theta) = \dfrac{(a+b-1)!}{(a-1)!\,(b-1)!}\,\theta^{a-1}(1-\theta)^{b-1}$

http://mathlets.org/mathlets/beta-distribution/

Observation: The coefficient is a normalizing factor, so if we have a pdf $f(\theta) = c\,\theta^{a-1}(1-\theta)^{b-1}$, then $\theta \sim \text{beta}(a, b)$ and $c = \dfrac{(a+b-1)!}{(a-1)!\,(b-1)!}$.
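This observation is easy to check numerically. A minimal Python sketch (scipy's `quad` and `beta` are assumed available; `a` and `b` follow the slide's Beta(a, b) notation):

    from math import factorial
    from scipy.integrate import quad
    from scipy.stats import beta

    a, b = 5, 5
    # Normalizing factor from the slide: (a + b - 1)! / ((a - 1)! (b - 1)!)
    c = factorial(a + b - 1) / (factorial(a - 1) * factorial(b - 1))

    # The unnormalized density integrates to 1/c, so c times the integral is 1.
    integral, _ = quad(lambda t: t**(a - 1) * (1 - t)**(b - 1), 0, 1)
    print(c * integral)                               # ~1.0
    print(c, beta.pdf(0.5, a, b) / 0.5**(a + b - 2))  # both ~630: same constant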

Board question preamble: beta priors

Suppose you are testing a new medical treatment with unknown probability of success θ. You don't know θ, but your prior belief is that it's probably not too far from 0.5. You capture this intuition with a beta(5,5) prior on θ.

[Plot: the beta(5,5) pdf for θ on [0, 1], peaked at 0.5.]

To sharpen this distribution you take data and update the prior. Question on next slide.

Board question: beta priors

Beta(a, b): $f(\theta) = \dfrac{(a+b-1)!}{(a-1)!\,(b-1)!}\,\theta^{a-1}(1-\theta)^{b-1}$

Treatment has prior $f(\theta) \sim \text{beta}(5, 5)$.

1. Suppose you test it on 10 patients and have 6 successes. Find the posterior distribution on θ. Identify the type of the posterior distribution.
2. Suppose you recorded the order of the results and got S S S F F S S S F F. Find the posterior based on this data.
3. Using your answer to (2), give an integral for the posterior predictive probability of success with the next patient.
4. Use what you know about pdf's to evaluate the integral without computing it directly.

Solution

1. Prior pdf is $f(\theta) = \dfrac{9!}{4!\,4!}\,\theta^4(1-\theta)^4 = c_1\theta^4(1-\theta)^4$.

hypoth. | prior | likelihood | Bayes numer. | posterior
θ | $c_1\theta^4(1-\theta)^4\,d\theta$ | $\binom{10}{6}\theta^6(1-\theta)^4$ | $c_3\theta^{10}(1-\theta)^8\,d\theta$ | beta(11, 9)

We know the normalized posterior is a beta distribution because the Bayes numerator has the form of a beta density ($c\,\theta^{a-1}(1-\theta)^{b-1}$ on [0,1]), so by our earlier observation it must be a beta distribution.

2. The answer is the same. The only change is that the likelihood has a coefficient of 1 instead of a binomial coefficient.

3. The posterior on θ is beta(11, 9), which has density $f(\theta \mid \text{data}) = \dfrac{19!}{10!\,8!}\,\theta^{10}(1-\theta)^8$.

Solution to (3) continued on next slide.

Solution continued

The law of total probability says that the posterior predictive probability of success is

$P(\text{success} \mid \text{data}) = \int_0^1 f(\text{success} \mid \theta)\, f(\theta \mid \text{data})\, d\theta = \int_0^1 \theta \cdot \dfrac{19!}{10!\,8!}\,\theta^{10}(1-\theta)^8\, d\theta = \int_0^1 \dfrac{19!}{10!\,8!}\,\theta^{11}(1-\theta)^8\, d\theta$

4. We compute the integral in (3) by relating it to the pdf of beta(12, 9): $\dfrac{20!}{11!\,8!}\,\theta^{11}(1-\theta)^8$. Since the pdf of beta(12, 9) integrates to 1, we have

$\int_0^1 \dfrac{20!}{11!\,8!}\,\theta^{11}(1-\theta)^8\, d\theta = 1 \quad\Rightarrow\quad \int_0^1 \theta^{11}(1-\theta)^8\, d\theta = \dfrac{11!\,8!}{20!}.$

Thus

$\int_0^1 \dfrac{19!}{10!\,8!}\,\theta^{11}(1-\theta)^8\, d\theta = \dfrac{19!}{10!\,8!} \cdot \dfrac{11!\,8!}{20!} = \dfrac{11}{20}.$
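The factorial bookkeeping can be cross-checked by integrating the posterior predictive directly; a minimal Python sketch (scipy assumed available):

    from math import factorial
    from scipy.integrate import quad

    c = factorial(19) / (factorial(10) * factorial(8))   # beta(11, 9) coefficient
    # P(success | data) = integral of theta times the posterior pdf over [0, 1]
    p, _ = quad(lambda t: t * c * t**10 * (1 - t)**8, 0, 1)
    print(p)   # ~0.55 = 11/20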

Conjugate priors

We had:
Prior $f(\theta)\,d\theta$: beta distribution
Likelihood $p(x \mid \theta)$: binomial distribution
Posterior $f(\theta \mid x)\,d\theta$: beta distribution

The beta distribution is called a conjugate prior for the binomial likelihood. That is, the beta prior becomes a beta posterior and repeated updating is easy!
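Since the update only shifts the beta parameters, repeated updating really is just bookkeeping. A minimal sketch (the helper `update_beta` is ours, not course code):

    def update_beta(a, b, successes, failures):
        """beta(a, b) prior + binomial data -> beta(a + successes, b + failures)."""
        return a + successes, b + failures

    a, b = 5, 5                      # prior from the board question
    a, b = update_beta(a, b, 6, 4)   # 6 successes, 4 failures out of 10
    print(a, b)                      # (11, 9), i.e. the beta(11, 9) posterior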

Concept Question

Suppose your prior f(θ) in the bent coin example is Beta(6, 8). You flip the coin 7 times, getting 2 heads and 5 tails. What is the posterior pdf f(θ | x)?

1. Beta(2, 5)   2. Beta(3, 6)   3. Beta(6, 8)   4. Beta(8, 13)

We saw in the previous board question that 2 heads and 5 tails will update a beta(a, b) prior to a beta(a + 2, b + 5) posterior.
answer: (4) beta(8, 13).

Reminder: predictive probabilities

Continuous hypotheses θ, discrete data $x_1, x_2, \ldots$ (Assume trials are independent given the hypothesis θ.)

Prior predictive probability: $p(x_1) = \int p(x_1 \mid \theta)\, f(\theta)\, d\theta$

Posterior predictive probability: $p(x_2 \mid x_1) = \int p(x_2 \mid \theta)\, f(\theta \mid x_1)\, d\theta$

Analogous to discrete hypotheses $H_1, H_2, \ldots$:

$p(x_1) = \sum_{i=1}^{n} p(x_1 \mid H_i)\, P(H_i), \qquad p(x_2 \mid x_1) = \sum_{i=1}^{n} p(x_2 \mid H_i)\, p(H_i \mid x_1).$

Continuous priors, continuous data

Bayesian update table:

hypoth. | prior | likelihood | Bayes numerator | posterior $f(\theta \mid x)\,d\theta$
θ | $f(\theta)\,d\theta$ | $f(x \mid \theta)$ | $f(x \mid \theta)\,f(\theta)\,d\theta$ | $\dfrac{f(x \mid \theta)\,f(\theta)\,d\theta}{f(x)}$
total | 1 |  | $f(x)$ | 1

$f(x) = \int f(x \mid \theta)\, f(\theta)\, d\theta$

Normal prior, normal data

$N(\mu, \sigma^2)$ has density $f(y) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-(y-\mu)^2/2\sigma^2}$.

Observation: The coefficient is a normalizing factor, so if we have a pdf $f(y) = c\, e^{-(y-\mu)^2/2\sigma^2}$, then $y \sim N(\mu, \sigma^2)$ and $c = \dfrac{1}{\sigma\sqrt{2\pi}}$.

Board question: normal prior, normal data

$N(\mu, \sigma^2)$ has pdf $f(y) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-(y-\mu)^2/2\sigma^2}$.

Suppose our data follows a N(θ, 4) distribution with unknown mean θ and variance 4. That is, f(x | θ) = pdf of N(θ, 4). Suppose our prior on θ is N(3, 1), and we obtain data $x_1 = 5$.

1. Use the data to find the posterior pdf for θ.

Write out your tables clearly. Use (and understand) infinitesimals. You will have to remember how to complete the square to do the updating!

Solution

We have:
Prior θ ∼ N(3, 1): $f(\theta) = c_1 e^{-(\theta-3)^2/2}$
Likelihood x ∼ N(θ, 4): $f(x \mid \theta) = c_2 e^{-(x-\theta)^2/8}$
For x = 5 the likelihood is $c_2 e^{-(5-\theta)^2/8}$.

hypoth. | prior | likelihood | Bayes numer.
θ | $c_1 e^{-(\theta-3)^2/2}\,d\theta$ | $c_2 e^{-(5-\theta)^2/8}\,dx$ | $c_3 e^{-(\theta-3)^2/2}\, e^{-(5-\theta)^2/8}\,d\theta\,dx$

A bit of algebraic manipulation of the Bayes numerator gives

$c_3\, e^{-(\theta-3)^2/2}\, e^{-(5-\theta)^2/8} = c_3\, e^{-\frac{5}{8}\left[\theta^2 - \frac{34}{5}\theta + \frac{61}{5}\right]} = c_3\, e^{-\frac{5}{8}\left[(\theta - 17/5)^2 + \frac{61}{5} - (17/5)^2\right]}$

$= c_3\, e^{-\frac{5}{8}\left(\frac{61}{5} - (17/5)^2\right)}\, e^{-\frac{5}{8}(\theta - 17/5)^2} = c_4\, e^{-\frac{5}{8}(\theta - 17/5)^2} = c_4\, e^{-\frac{(\theta - 17/5)^2}{2 \cdot 4/5}}$

The last expression shows the posterior is $N(17/5,\ 4/5)$.
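Completing the square is easy to get wrong, so here is a numerical cross-check of the $N(17/5, 4/5)$ answer, as a minimal Python sketch (scipy assumed available; note `norm.pdf` takes the standard deviation, not the variance):

    import numpy as np
    from scipy.integrate import quad
    from scipy.stats import norm

    x = 5
    numer = lambda t: norm.pdf(t, 3, 1) * norm.pdf(x, t, 2)  # prior times likelihood
    f_x, _ = quad(numer, -np.inf, np.inf)                    # normalizing constant

    t0 = 3.4
    print(numer(t0) / f_x)                    # posterior density at theta = 3.4
    print(norm.pdf(t0, 17/5, np.sqrt(4/5)))   # N(17/5, 4/5) density: same value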

Solution graphs

[Plot: prior = blue; posterior = purple; data = red.]

Data: $x_1 = 5$
Prior is normal: $\mu_{\text{prior}} = 3$, $\sigma_{\text{prior}} = 1$
Likelihood is normal: $\mu = \theta$, $\sigma = 2$
Posterior is normal: $\mu_{\text{posterior}} = 3.4$, $\sigma_{\text{posterior}} \approx 0.894$

We will see simple formulas for doing this update next time.

Board question: Romeo and Juliet

Romeo is always late. How late follows a uniform(0, θ) distribution, with the unknown parameter θ measured in hours. Juliet knows that θ ≤ 1 hour and she assumes a flat prior for θ on [0, 1].

On their first date Romeo is 15 minutes late. Use this data to update the prior distribution for θ.

(a) Find and graph the prior and posterior pdfs for θ.
(b) Find the prior predictive pdf for how late Romeo will be on the first date and the posterior predictive pdf of how late he'll be on the second date (if he gets one!). Graph these pdfs.

See next slides for solution.

Solution

Parameter of interest: θ = upper bound on Romeo's lateness. Data: $x_1 = 0.25$.
Goals: (a) posterior pdf for θ; (b) predictive pdfs, which require the pdfs for θ.

In the update table we split the hypotheses into the two different cases θ < 0.25 and θ ≥ 0.25:

hyp. | prior f(θ) | likelihood $f(x_1 \mid \theta)$ | Bayes numerator | posterior $f(\theta \mid x_1)$
θ < 0.25 | dθ | 0 | 0 | 0
θ ≥ 0.25 | dθ | 1/θ | (1/θ) dθ | (c/θ) dθ
Total | 1 |  | T | 1

The normalizing constant c must make the total posterior probability 1, so

$c \int_{0.25}^{1} \frac{1}{\theta}\, d\theta = 1 \quad\Rightarrow\quad c = \frac{1}{\ln(4)}.$

Continued on next slide.
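A one-line numerical check of the normalizing constant (a sketch, scipy assumed available):

    import numpy as np
    from scipy.integrate import quad

    c = 1 / np.log(4)
    total, _ = quad(lambda t: c / t, 0.25, 1)
    print(total)   # ~1.0: the posterior (c/theta) dtheta integrates to 1 on [0.25, 1]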

Solution graphs

[Plot: prior and posterior pdfs for θ.]

Solution continued

(b) Prior prediction: the likelihood function falls into cases:

$f(x_1 \mid \theta) = \begin{cases} 1/\theta & \text{if } \theta \ge x_1 \\ 0 & \text{if } \theta < x_1 \end{cases}$

Therefore the prior predictive pdf of $x_1$ is

$f(x_1) = \int f(x_1 \mid \theta)\, f(\theta)\, d\theta = \int_{x_1}^{1} \frac{1}{\theta}\, d\theta = -\ln(x_1).$

Continued on next slide.

Solution continued

Posterior prediction: the likelihood function is the same as before:

$f(x_2 \mid \theta) = \begin{cases} 1/\theta & \text{if } \theta \ge x_2 \\ 0 & \text{if } \theta < x_2 \end{cases}$

The posterior predictive pdf is $f(x_2 \mid x_1) = \int f(x_2 \mid \theta)\, f(\theta \mid x_1)\, d\theta$. The integrand is 0 unless θ > x₂ and θ > 0.25. There are two cases:

If $x_2 < 0.25$: $f(x_2 \mid x_1) = \int_{0.25}^{1} \frac{c}{\theta^2}\, d\theta = 3c = 3/\ln(4).$

If $x_2 \ge 0.25$: $f(x_2 \mid x_1) = \int_{x_2}^{1} \frac{c}{\theta^2}\, d\theta = c\left(\frac{1}{x_2} - 1\right) = \left(\frac{1}{x_2} - 1\right)\Big/\ln(4).$

Plots of the predictive pdfs are on the next slide.
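Both cases can be reproduced by integrating numerically; a minimal Python sketch (the function name `posterior_predictive` is ours, c = 1/ln 4 from the earlier normalization):

    import numpy as np
    from scipy.integrate import quad

    c = 1 / np.log(4)

    def posterior_predictive(x2):
        lower = max(x2, 0.25)  # integrand vanishes unless theta > x2 and theta > 0.25
        val, _ = quad(lambda t: (1 / t) * (c / t), lower, 1)
        return val

    print(posterior_predictive(0.10), 3 * c)                       # x2 < 0.25 case
    print(posterior_predictive(0.50), (1 / 0.50 - 1) / np.log(4))  # x2 >= 0.25 case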

Solution continued

[Plot: prior (red) and posterior (blue) predictive pdfs for $x_2$.]

From discrete to continuous Bayesian updating

Bent coin with unknown probability of heads θ. Data $x_1$: heads on one toss. Start with a flat prior and update:

hyp. | prior | likelihood | Bayes numerator | posterior
θ | dθ | θ | θ dθ | 2θ dθ
Total | 1 |  | $\int_0^1 \theta\, d\theta = 1/2$ | 1

Posterior pdf: $f(\theta \mid x_1) = 2\theta$.

Approximate continuous by discrete

- Approximate the continuous range of hypotheses by a finite number of hypotheses.
- Create the discrete updating table for the finite number of hypotheses.
- Consider how the table changes as the number of hypotheses goes to infinity.

Chop [0, 1] into 4 intervals

hypothesis | prior | likelihood | Bayes num. | posterior
θ = 1/8 | 1/4 | 1/8 | (1/4)·(1/8) | 1/16
θ = 3/8 | 1/4 | 3/8 | (1/4)·(3/8) | 3/16
θ = 5/8 | 1/4 | 5/8 | (1/4)·(5/8) | 5/16
θ = 7/8 | 1/4 | 7/8 | (1/4)·(7/8) | 7/16
Total | 1 |  | $\sum_{i=1}^{n} \theta_i\,\Delta\theta$ | 1

Chop [0, 1] into 12 intervals

hypothesis | prior | likelihood | Bayes num. | posterior
θ = 1/24 | 1/12 | 1/24 | (1/12)·(1/24) | 1/144
θ = 3/24 | 1/12 | 3/24 | (1/12)·(3/24) | 3/144
θ = 5/24 | 1/12 | 5/24 | (1/12)·(5/24) | 5/144
θ = 7/24 | 1/12 | 7/24 | (1/12)·(7/24) | 7/144
θ = 9/24 | 1/12 | 9/24 | (1/12)·(9/24) | 9/144
θ = 11/24 | 1/12 | 11/24 | (1/12)·(11/24) | 11/144
θ = 13/24 | 1/12 | 13/24 | (1/12)·(13/24) | 13/144
θ = 15/24 | 1/12 | 15/24 | (1/12)·(15/24) | 15/144
θ = 17/24 | 1/12 | 17/24 | (1/12)·(17/24) | 17/144
θ = 19/24 | 1/12 | 19/24 | (1/12)·(19/24) | 19/144
θ = 21/24 | 1/12 | 21/24 | (1/12)·(21/24) | 21/144
θ = 23/24 | 1/12 | 23/24 | (1/12)·(23/24) | 23/144
Total | 1 |  | $\sum_{i=1}^{n} \theta_i\,\Delta\theta$ | 1
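The pattern in these tables is easy to generate for any number of slices. A minimal Python sketch (the function name `discrete_posterior` is ours) that also shows the discrete posterior matching the limiting pdf 2θ:

    import numpy as np

    def discrete_posterior(n):
        theta = (2 * np.arange(n) + 1) / (2 * n)   # interval midpoints 1/2n, 3/2n, ...
        prior = np.full(n, 1 / n)                  # flat prior
        numer = prior * theta                      # likelihood of one head is theta
        return theta, numer / numer.sum()          # normalized posterior pmf

    theta, post = discrete_posterior(12)
    print(post * 12)   # density-histogram heights (pmf divided by slice width)
    print(2 * theta)   # posterior pdf 2*theta at the midpoints: identical here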

Density histogram

Density histograms for the posterior pmf with 4 and 20 slices.

[Plots: two density histograms, x marked at 1/8, 3/8, 5/8, 7/8; density axis from 0 to 2. The original posterior pdf is shown in red.]

MIT OpenCourseWare https://ocw.mit.edu 18.05 Introduction to Probability and Statistics Spring 2014 For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms.