Conjugate Priors: Beta and Normal
18.05 Spring 2018

Review: Continuous priors, discrete data

Bent coin: unknown probability θ of heads.
Prior: f(θ) = 2θ on [0, 1].
Data: heads on one toss.
Question: Find the posterior pdf given this data.

hypothesis   prior     likelihood   Bayes numerator   posterior
θ            2θ dθ     θ            2θ² dθ            3θ² dθ
Total        1                      T = ∫₀¹ 2θ² dθ = 2/3     1

Posterior pdf: f(θ|x) = 3θ².
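A quick numerical check of this update, as a minimal sketch with numpy (the grid size is an arbitrary choice):

import numpy as np

# Grid approximation of the table above: prior f(theta) = 2*theta,
# likelihood of one head given theta is just theta.
theta = np.linspace(0, 1, 100001)
dtheta = theta[1] - theta[0]
prior = 2 * theta
likelihood = theta

numerator = prior * likelihood          # Bayes numerator, up to dtheta
total = np.sum(numerator) * dtheta      # T, the probability of the data
posterior = numerator / total           # should approximate 3*theta^2

print(total)                            # about 2/3
print(np.interp(0.5, theta, posterior)) # about 0.75 = 3*(0.5)^2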

Review: Continuous priors, continuous data

Bayesian update table:

hypothesis   prior      likelihood   Bayes numerator    posterior
θ            f(θ) dθ    φ(x|θ)       φ(x|θ) f(θ) dθ     f(θ|x) dθ = φ(x|θ) f(θ) dθ / φ(x)
total        1                       φ(x)               1

where φ(x) = ∫ φ(x|θ) f(θ) dθ = probability of the data x.

Updating with normal prior and normal likelihood

A normal prior is conjugate to a normal likelihood with known σ.

Data: x₁, x₂, ..., xₙ
Normal likelihood: x₁, x₂, ..., xₙ ~ N(θ, σ²)
Assume θ is our unknown parameter of interest; σ is known.
Normal prior: θ ~ N(µ_prior, σ²_prior).
Normal posterior: θ ~ N(µ_post, σ²_post).

We have simple updating formulas that allow us to avoid complicated algebra or integrals (see next slide).

hypothesis   prior                                    likelihood                 posterior
θ            f(θ) ~ N(µ_prior, σ²_prior)              φ(x|θ) ~ N(θ, σ²)          f(θ|x) ~ N(µ_post, σ²_post)
             = c₁ exp(−(θ−µ_prior)²/(2σ²_prior))      = c₂ exp(−(x−θ)²/(2σ²))    = c₃ exp(−(θ−µ_post)²/(2σ²_post))
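Why the algebra closes up: the product of the prior and likelihood exponentials is again a Gaussian in θ. A sketch of the completing-the-square step in LaTeX, writing a = 1/σ²_prior and b = 1/σ² for a single data point x:

\begin{aligned}
f(\theta)\,\phi(x\mid\theta)
&\propto \exp\left(-\frac{(\theta-\mu_{\mathrm{prior}})^2}{2\sigma_{\mathrm{prior}}^2}-\frac{(x-\theta)^2}{2\sigma^2}\right)\\
&\propto \exp\left(-\tfrac12\left[(a+b)\,\theta^2-2(a\mu_{\mathrm{prior}}+bx)\,\theta\right]\right)\\
&\propto \exp\left(-\frac{(\theta-\mu_{\mathrm{post}})^2}{2\sigma_{\mathrm{post}}^2}\right),
\qquad
\mu_{\mathrm{post}}=\frac{a\mu_{\mathrm{prior}}+bx}{a+b},\quad
\sigma_{\mathrm{post}}^2=\frac{1}{a+b}.
\end{aligned}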

Board question: Normal-normal updating formulas

a = 1/σ²_prior,   b = n/σ²,   µ_post = (a µ_prior + b x̄)/(a + b),   σ²_post = 1/(a + b).

Suppose we have one data point x = 2 drawn from N(θ, 3²). Suppose θ is our parameter of interest with prior θ ~ N(4, 2²).

0. Identify µ_prior, σ_prior, σ, n, and x̄.
1. Make a Bayesian update table, but leave the posterior as an unsimplified product.
2. Use the updating formulas to find the posterior.
3. By doing enough of the algebra, understand that the updating formulas come from the update table plus a lot of algebra.

Solution

0. µ_prior = 4, σ_prior = 2, σ = 3, n = 1, x̄ = 2.

1. hypothesis   prior                  likelihood               posterior
   θ            f(θ) ~ N(4, 2²)        f(x|θ) ~ N(θ, 3²)        f(θ|x) ~ N(µ_post, σ²_post)
                = c₁ exp(−(θ−4)²/8)    = c₂ exp(−(2−θ)²/18)     = c₃ exp(−(θ−4)²/8) exp(−(2−θ)²/18)

2. We have a = 1/4, b = 1/9, a + b = 13/36. Therefore
   µ_post = (1 + 2/9)/(13/36) = 44/13 ≈ 3.3846
   σ²_post = 36/13 ≈ 2.769
   The posterior pdf is f(θ|x = 2) ~ N(3.3846, 2.769).

3. See the example in the reading class15-prep-a.pdf.
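The updating formulas are easy to put in code. A minimal sketch (the function name normal_update is ours, not from the course); exact fractions reproduce 44/13 and 36/13:

from fractions import Fraction

def normal_update(mu_prior, var_prior, var_lik, n, xbar):
    """Normal-normal conjugate update with known likelihood variance.
    Uses a = 1/var_prior, b = n/var_lik from the slide above."""
    a = 1 / var_prior
    b = n / var_lik
    mu_post = (a * mu_prior + b * xbar) / (a + b)
    var_post = 1 / (a + b)
    return mu_post, var_post

# Board question: prior N(4, 2^2), one data point x = 2 from N(theta, 3^2).
mu, var = normal_update(Fraction(4), Fraction(4), Fraction(9), 1, Fraction(2))
print(mu, var)                  # 44/13 and 36/13
print(float(mu), float(var))    # about 3.3846 and 2.7692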

Concept question: normal priors, normal likelihood

[Figure: a blue prior pdf together with five candidate pdfs labeled Plot 1 through Plot 5, plotted on the range 0 to 14.]

Blue graph = prior. Red lines = data in order: 3, 9, 12.

(a) Which plot is the posterior to just the first data value? (Solution in slides.)

Concept question: normal priors, normal likelihood (continued)

[Same figure as above: blue prior and candidate Plots 1 through 5.]

Blue graph = prior. Red lines = data in order: 3, 9, 12.

(b) Which graph is the posterior to all 3 data values? (Solution on next slide.)

Solution to concept question

(a) Plot 2. The first data value is 3, so the posterior must have its mean between 3 and the mean of the blue prior. The only possibilities are Plots 1 and 2. We also know that the variance of the posterior is less than that of the prior. Of Plots 1 and 2, only Plot 2 has smaller variance than the prior.

(b) Plot 3. The average of the 3 data values is 8, so the posterior must have its mean between the mean of the blue prior and 8. The only possibilities are Plots 3 and 4. Because this posterior is posterior to the magenta graph (Plot 2), it must have smaller variance. This leaves only Plot 3.

Board question: normal/normal

For data x₁, ..., xₙ with data mean x̄ = (x₁ + ... + xₙ)/n:

a = 1/σ²_prior,   b = n/σ²,   µ_post = (a µ_prior + b x̄)/(a + b),   σ²_post = 1/(a + b).

Question. On a basketball team the players are drawn from a pool in which the career average free-throw percentage follows a N(75, 6²) distribution. In a given year an individual player's free-throw percentage is N(θ, 4²), where θ is their career average. This season Sophie Lie made 85 percent of her free throws. What is the posterior expected value of her career percentage θ?

answer: Solution on next frame.

Solution

This is a normal/normal conjugate prior pair, so we use the update formulas.

Parameter of interest: θ = career average.
Data: x = 85 = this year's percentage.
Prior: θ ~ N(75, 36). Likelihood: x ~ N(θ, 16), so f(x|θ) = c₁ exp(−(x−θ)²/(2·16)).

The updating weights are a = 1/36, b = 1/16, a + b = 52/576 = 13/144. Therefore

µ_post = (75/36 + 85/16)/(13/144) ≈ 81.9,   σ²_post = 144/13 ≈ 11.1.

The posterior pdf is f(θ|x = 85) ~ N(81.9, 11.1).
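Checking these numbers with the hypothetical normal_update sketch defined above:

mu, var = normal_update(75, 36, 16, 1, 85)
print(mu, var)   # about 81.92 and 11.08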

Conjugate priors

A prior is conjugate to a likelihood if the posterior is the same type of distribution as the prior. Updating becomes algebra instead of calculus.

Bernoulli/Beta:  hypothesis θ ∈ [0, 1], prior beta(a, b), likelihood Bernoulli(θ), posterior beta(a + 1, b) or beta(a, b + 1).
  data x = 1:  prior c₁ θ^(a−1) (1−θ)^(b−1),  likelihood θ,  posterior c₃ θ^a (1−θ)^(b−1)
  data x = 0:  prior c₁ θ^(a−1) (1−θ)^(b−1),  likelihood 1 − θ,  posterior c₃ θ^(a−1) (1−θ)^b

Binomial/Beta (fixed N):  hypothesis θ ∈ [0, 1], prior beta(a, b), likelihood binomial(N, θ), posterior beta(a + x, b + N − x).
  data x:  prior c₁ θ^(a−1) (1−θ)^(b−1),  likelihood c₂ θ^x (1−θ)^(N−x),  posterior c₃ θ^(a+x−1) (1−θ)^(b+N−x−1)

Geometric/Beta:  hypothesis θ ∈ [0, 1], prior beta(a, b), likelihood geometric(θ), posterior beta(a + x, b + 1).
  data x:  prior c₁ θ^(a−1) (1−θ)^(b−1),  likelihood θ^x (1−θ),  posterior c₃ θ^(a+x−1) (1−θ)^b

Normal/Normal (fixed σ²):  hypothesis θ ∈ (−∞, ∞), prior N(µ_prior, σ²_prior), likelihood N(θ, σ²), posterior N(µ_post, σ²_post).
  data x:  prior c₁ exp(−(θ−µ_prior)²/(2σ²_prior)),  likelihood c₂ exp(−(x−θ)²/(2σ²)),  posterior c₃ exp(−(θ−µ_post)²/(2σ²_post))

There are many other likelihood/conjugate prior pairs.
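For instance, the binomial/beta row reduces updating to parameter arithmetic. A minimal sketch (the prior and data values are arbitrary; scipy.stats is used only to evaluate the resulting distribution):

from scipy.stats import beta

# Binomial/beta row of the table: with x successes in N trials, the
# posterior parameters are just a + x and b + N - x. No integrals needed.
a, b = 2, 3        # prior beta(2, 3)
N, x = 10, 7       # observed 7 successes in 10 trials

posterior = beta(a + x, b + N - x)   # beta(9, 6)
print(posterior.mean())              # (a + x)/(a + b + N) = 9/15 = 0.6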

Concept question: conjugate priors

Which are conjugate priors?

a) Exponential/Normal:  hypothesis θ ∈ [0, ∞), prior N(µ_prior, σ²_prior): c₁ exp(−(θ−µ_prior)²/(2σ²_prior)), likelihood exp(θ): θ e^(−θx).
b) Exponential/Gamma:  hypothesis θ ∈ [0, ∞), prior Gamma(a, b): c₁ θ^(a−1) e^(−bθ), likelihood exp(θ): θ e^(−θx).
c) Binomial/Normal (fixed N):  hypothesis θ ∈ [0, 1], prior N(µ_prior, σ²_prior): c₁ exp(−(θ−µ_prior)²/(2σ²_prior)), likelihood binomial(N, θ): c₂ θ^x (1−θ)^(N−x).

1. none   2. a   3. b   4. c   5. a,b   6. a,c   7. b,c   8. a,b,c

Answer: 3. b

We have a conjugate prior if the posterior, as a function of θ, has the same form as the prior.

Exponential/Normal posterior: f(θ|x) = c₁ θ e^(−θx) exp(−(θ−µ_prior)²/(2σ²_prior)). The factor of θ in front of the exponential means this is not the pdf of a normal distribution, so the normal prior is not conjugate.

Exponential/Gamma posterior: we have never studied Gamma distributions, but it doesn't matter; we only have to check whether the posterior has the same form as the prior: f(θ|x) = c₁ θ^a e^(−(b+x)θ). This has the form of a Gamma(a + 1, b + x) distribution, so the Gamma prior is conjugate.

Binomial/Normal: it is clear that the posterior does not have the form of a normal distribution.
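A numerical sanity check of case (b), comparing a grid-approximated posterior against the claimed Gamma(a + 1, b + x); the parameter values and grid here are arbitrary test choices:

import numpy as np
from scipy.stats import gamma

# Gamma(a, b) prior (rate b) times the exponential likelihood
# theta*exp(-theta*x) should normalize to Gamma(a + 1, b + x).
a, b, x = 3.0, 2.0, 1.5
theta = np.linspace(1e-6, 30, 200001)
dtheta = theta[1] - theta[0]

numer = gamma.pdf(theta, a, scale=1/b) * theta * np.exp(-theta * x)
post = numer / (np.sum(numer) * dtheta)
exact = gamma.pdf(theta, a + 1, scale=1/(b + x))

print(np.max(np.abs(post - exact)))   # tiny (grid-approximation error only)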

Variance can increase

Normal-normal: variance always decreases with data.
Beta-binomial: variance usually decreases with data.

[Figure: pdfs of several beta distributions on [0, 1], including beta(2, 12) (blue) and beta(12, 12) (magenta).]

Variance of beta(2, 12) (blue) is smaller than that of beta(12, 12) (magenta), but beta(12, 12) can be a posterior to beta(2, 12).
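Checking the figure's claim numerically, assuming the beta(2, 12) and beta(12, 12) parameters reconstructed above (beta(12, 12) is the binomial/beta posterior to beta(2, 12) after 10 heads in 10 tosses):

from scipy.stats import beta

print(beta(2, 12).var())    # about 0.0082
print(beta(12, 12).var())   # 0.0100 -- larger, so updating increased the variance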

Table discussion: likelihood principle

Suppose the prior has been set. Let x₁ and x₂ be two sets of data. Which of the following are true?

(a) If the likelihoods φ(x₁|θ) and φ(x₂|θ) are the same, then they result in the same posterior.
(b) If x₁ and x₂ result in the same posterior, then their likelihood functions are the same.
(c) If the likelihoods φ(x₁|θ) and φ(x₂|θ) are proportional (as functions of θ), then they result in the same posterior.
(d) If two likelihood functions are proportional, then they are equal.

answer: (a) true. (b) false: the likelihoods need only be proportional. (c) true: scale factors don't matter. (d) false.
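Statement (c) in action as a minimal grid sketch (the likelihood function and scale factor here are arbitrary): normalizing the Bayes numerator removes any constant of proportionality.

import numpy as np

theta = np.linspace(0.001, 0.999, 999)
prior = np.ones_like(theta) / len(theta)   # flat prior on the grid
lik1 = theta**3 * (1 - theta)**7           # some likelihood function of theta
lik2 = 5.0 * lik1                          # proportional to lik1

def grid_posterior(lik, prior):
    numer = lik * prior
    return numer / numer.sum()             # normalizing kills the scale factor

print(np.allclose(grid_posterior(lik1, prior), grid_posterior(lik2, prior)))  # True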

Concept question: strong priors

Say we have a bent coin with unknown probability of heads θ. We are convinced that θ ≤ 0.7. Our prior is uniform on [0, 0.7] and 0 from 0.7 to 1. We flip the coin 65 times and get 60 heads.

Which of the graphs below is the posterior pdf for θ?

[Figure: six candidate posterior pdfs labeled A through F on [0, 1].]

Solution to concept question

answer: Graph C, the blue graph spiking near 0.7.

Sixty heads in 65 tosses indicates the true value of θ is close to 1. But our prior was 0 for θ > 0.7, so no amount of data will make the posterior non-zero in that range. That is, we have foreclosed on the possibility of deciding that θ is close to 1. The Bayesian updating puts θ near the top of the allowed range.
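A grid sketch of this update (the grid size is arbitrary; binomial constants are dropped since they normalize away):

import numpy as np

# Truncated-uniform prior: uniform on [0, 0.7], zero above 0.7.
theta = np.linspace(0, 1, 10001)
dtheta = theta[1] - theta[0]
prior = np.where(theta <= 0.7, 1 / 0.7, 0.0)
lik = theta**60 * (1 - theta)**5        # 60 heads in 65 tosses

numer = lik * prior
post = numer / (np.sum(numer) * dtheta)
print(theta[np.argmax(post)])           # approximately 0.7: the posterior spikes at the cutoff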