Introduction to Bayesian Methods


Jessi Cisewski
Department of Statistics, Yale University
Sagan Summer Workshop 2016

Our goal: an introduction to Bayesian methods
- Likelihoods
- Priors: conjugate priors, non-informative priors
- Posteriors

Related topics covered this week:
- Markov chain Monte Carlo (MCMC)
- Selecting priors
- Bayesian model comparison
- Hierarchical Bayesian modeling

Some material is from Tom Loredo, Sayan Mukherjee, and Beka Steorts.

Likelihood Principle
All of the information in a sample is contained in the likelihood function, a density or distribution function. The data are modeled by a likelihood function. Not all statistical paradigms agree with this principle.

Likelihood functions
Consider a random sample of size n = 1 from a Normal(\mu = 3, \sigma = 2): X_1 \sim N(3, 2).

Probability density function (pdf): the function f(x \mid \theta), where \theta is fixed and x is variable:
f(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x-\mu)^2/(2\sigma^2)} = \frac{1}{\sqrt{2\pi \cdot 2^2}} e^{-(x-3)^2/(2 \cdot 2^2)}
[Figure: the pdf of N(3, 2); the data are drawn from this density.]

Likelihood: the function f(x \mid \theta), where \theta is variable and x is fixed. For the observed value x_1 = 1.747,
L(\mu) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(1.747-\mu)^2/(2\sigma^2)}
[Figure: the likelihood as a function of \mu, peaked at \mu = x_1.]

Consider a random sample of size n = 50 (assuming independence, and a known \sigma): X_1, \ldots, X_{50} \sim N(3, 2).
f(x \mid \mu, \sigma) = f(x_1, \ldots, x_{50} \mid \mu, \sigma) = \prod_{i=1}^{50} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x_i-\mu)^2/(2\sigma^2)}
[Figure: the likelihood as a function of \mu, peaked near the sample mean.]
[Figure: normalized likelihoods for n = 50, 100, 250, and 500; the likelihood concentrates around \mu as n grows.]

Likelihood Principle
All of the information in a sample is contained in the likelihood function, a density or distribution function. The data are modeled by a likelihood function. How do we infer \theta?

Maximum likelihood estimation
The maximum likelihood estimate (MLE) is the parameter value that maximizes the likelihood:
\hat{\theta} = \arg\max_{\theta} f(x_1, \ldots, x_n \mid \theta)
Under the Gaussian assumption, this is equivalent to minimizing the \chi^2 statistic.
[Figure: the likelihood as a function of \mu, with the MLE marked at its peak.]
\max_{\mu} f(x_1, \ldots, x_n \mid \mu, \sigma) = \max_{\mu} \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x_i-\mu)^2/(2\sigma^2)}
Hence \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}.
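As a quick check of this slide (not part of the original), here is a minimal Python sketch that numerically maximizes the Normal log-likelihood in \mu and compares it with the sample mean; the simulated data and the optimizer bounds are illustrative assumptions.

```python
# Sketch: numerically maximize the Normal log-likelihood for mu (sigma known)
# and compare with the closed-form MLE, the sample mean.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
sigma = 2.0
x = rng.normal(loc=3.0, scale=sigma, size=50)   # simulated data, as on the slide

def neg_log_lik(mu):
    # -log f(x_1,...,x_n | mu, sigma), dropping terms constant in mu
    return np.sum((x - mu) ** 2) / (2 * sigma ** 2)

res = minimize_scalar(neg_log_lik, bounds=(-10, 10), method="bounded")
print(res.x, x.mean())   # the two estimates should agree closely
```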

Bayesian framework
Classical or frequentist methods for inference:
- consider \theta to be fixed and unknown
- evaluate the performance of methods by repeated sampling (consider all possible data sets)
Bayesian methods:
- consider \theta to be random
- only consider the observed data set and prior information

Bayes' Rule
Let A and B be two events in the sample space. Then
P(A \mid B) = \frac{P(AB)}{P(B)} = \frac{P(B \mid A)\,P(A)}{P(B)}
Note that P(B \mid A) = \frac{P(AB)}{P(A)}, so P(AB) = P(B \mid A)\,P(A).
It is really just about conditional probabilities.
[Figure: the sample space with events A and B.]

Posterior distribution
\pi(\theta \mid x) = \frac{f(x \mid \theta)\,\pi(\theta)}{f(x)} = \frac{f(x \mid \theta)\,\pi(\theta)}{\int_{\Theta} f(x \mid \theta)\,\pi(\theta)\,d\theta}
where f(x \mid \theta) is the likelihood (the data enter through x) and \pi(\theta) is the prior.
- The prior distribution allows you to easily incorporate your beliefs about the parameter(s) of interest.
- The posterior is a distribution on the parameter space given the observed data.

Gaussian example
Consider y_{1:n} = y_1, \ldots, y_n drawn from a Gaussian(\mu, \sigma), with \mu unknown.
Likelihood:
f(y_{1:n} \mid \mu) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(y_i-\mu)^2/(2\sigma^2)}
Prior: \pi(\mu) \sim N(\mu_0, \sigma_0)
Posterior:
\pi(\mu \mid y_{1:n}) \propto f(y_{1:n} \mid \mu)\,\pi(\mu) = \left[\prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(y_i-\mu)^2/(2\sigma^2)}\right] \frac{1}{\sqrt{2\pi\sigma_0^2}} e^{-(\mu-\mu_0)^2/(2\sigma_0^2)} \;\sim\; N(\mu_1, \sigma_1)
where
\mu_1 = \frac{\mu_0/\sigma_0^2 + \sum_i y_i/\sigma^2}{1/\sigma_0^2 + n/\sigma^2}, \qquad \sigma_1 = \left(\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}\right)^{-1/2}
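A minimal sketch (not from the slides) of the conjugate Normal-Normal update above; the function name normal_posterior and the example numbers are illustrative.

```python
# Sketch: conjugate Normal-Normal update for mu with sigma known,
# using the posterior formulas above.  All numbers here are illustrative.
import numpy as np

def normal_posterior(y, sigma, mu0, sigma0):
    """Return (mu1, sigma1) of the posterior N(mu1, sigma1) for mu."""
    n = len(y)
    precision = 1 / sigma0**2 + n / sigma**2
    mu1 = (mu0 / sigma0**2 + np.sum(y) / sigma**2) / precision
    sigma1 = precision ** -0.5
    return mu1, sigma1

y = np.array([1.2, 4.1, 2.7, 3.4])          # hypothetical observations
print(normal_posterior(y, sigma=2.0, mu0=0.0, sigma0=5.0))
```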

Data: y_1, \ldots, y_4 \sim N(\mu = 3, \sigma = 2), \bar{y} = 1.278
Prior: N(\mu_0 = -3, \sigma_0 = 5)
Posterior: N(\mu_1 = 1.114, \sigma_1 = 0.981)
[Figure: likelihood, prior, and posterior densities for \mu.]

Data: y_1, \ldots, y_4 \sim N(\mu = 3, \sigma = 2), \bar{y} = 1.278
Prior: N(\mu_0 = -3, \sigma_0 = 1)
Posterior: N(\mu_1 = -0.861, \sigma_1 = 0.707)
[Figure: likelihood, prior, and posterior densities for \mu; the tighter prior pulls the posterior toward the prior mean.]

Data: y_1, \ldots, y_{200} \sim N(\mu = 3, \sigma = 2), \bar{y} = 2.999
Prior: N(\mu_0 = -3, \sigma_0 = 1)
Posterior: N(\mu_1 = 2.881, \sigma_1 = 0.140)
[Figure: likelihood, prior, and posterior densities for \mu; with n = 200 the likelihood dominates the prior.]

Example 2 (on your own)
Consider the following model:
Y \mid \theta \sim U(0, \theta)
\theta \sim \text{Pareto}(\alpha, \beta): \quad \pi(\theta) = \frac{\alpha \beta^{\alpha}}{\theta^{\alpha+1}} I_{(\beta, \infty)}(\theta)
where I_{(a,b)}(x) = 1 if a < x < b and 0 otherwise.
Find the posterior distribution of \theta \mid y:
\pi(\theta \mid y) \propto \frac{1}{\theta} I_{(0,\theta)}(y) \cdot \frac{\alpha \beta^{\alpha}}{\theta^{\alpha+1}} I_{(\beta,\infty)}(\theta)
\propto \frac{1}{\theta} I_{(y,\infty)}(\theta) \cdot \frac{1}{\theta^{\alpha+1}} I_{(\beta,\infty)}(\theta)
\propto \frac{1}{\theta^{\alpha+2}} I_{(\max\{y,\beta\},\infty)}(\theta)
\;\Rightarrow\; \theta \mid y \sim \text{Pareto}(\alpha + 1, \max\{y, \beta\})

Prior distribution
- The prior distribution allows you to easily incorporate your beliefs about the parameter(s) of interest.
- If one has a specific prior in mind, then it fits nicely into the definition of the posterior.
- But how do you go from prior information to a prior distribution? And what if you don't actually have prior information?

Choosing a prior
- Informative/subjective prior: choose a prior that reflects our belief/uncertainty about the unknown parameter, based on the researcher's experience from previous studies, scientific or physical considerations, or other sources of information. Example: for a prior on the mass of a star in a Milky Way-type galaxy, you likely would not use an infinite interval.
- Objective, non-informative, vague, or default priors
- Hierarchical models: put a prior on the prior
- Conjugate priors: priors selected for convenience

Conjugate priors
The posterior distribution is from the same family of distributions as the prior. We saw this in the Gaussian example: a Gaussian prior on \mu resulted in a Gaussian posterior for \mu \mid Y_{1:n} (Gaussian priors are conjugate with Gaussian likelihoods, resulting in a Gaussian posterior).

Some conjugate priors
- Normal-normal: normal priors are conjugate with normal likelihoods
- Beta-binomial: beta priors are conjugate with binomial likelihoods
- Gamma-Poisson: gamma priors are conjugate with Poisson likelihoods
- Dirichlet-multinomial: Dirichlet priors are conjugate with multinomial likelihoods

Beta-Binomial
Suppose we have an iid sample, x_1, \ldots, x_n, from a Bernoulli(\theta):
X = 1 with probability \theta, and X = 0 with probability 1 - \theta.
Let y = \sum_{i=1}^{n} x_i; then y is a draw from a Binomial(n, \theta):
P(y = k) = \binom{n}{k} \theta^{k} (1-\theta)^{n-k}
We want the posterior distribution for \theta.

Beta-Binomial
We have a binomial likelihood and need to specify a prior on \theta. Note that \theta \in [0, 1].
If the prior is \pi(\theta) \sim \text{Beta}(\alpha, \beta), then the posterior is \pi(\theta \mid y) \sim \text{Beta}(y + \alpha,\; n - y + \beta).
The beta distribution is the conjugate prior for binomial likelihoods.

Beta-Binomial posterior derivation
f(y, \theta) = \left\{\binom{n}{y} \theta^{y} (1-\theta)^{n-y}\right\} \left\{\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \theta^{\alpha-1} (1-\theta)^{\beta-1}\right\}
= \binom{n}{y} \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \theta^{y+\alpha-1} (1-\theta)^{n-y+\beta-1}
f(y) = \int_0^1 f(y, \theta)\,d\theta = \binom{n}{y} \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \cdot \frac{\Gamma(y+\alpha)\,\Gamma(n-y+\beta)}{\Gamma(n+\alpha+\beta)}
\pi(\theta \mid y) = \frac{f(y, \theta)}{f(y)} = \frac{\Gamma(n+\alpha+\beta)}{\Gamma(y+\alpha)\,\Gamma(n-y+\beta)} \theta^{y+\alpha-1} (1-\theta)^{n-y+\beta-1} \;\sim\; \text{Beta}(y+\alpha,\; n-y+\beta)
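A short sketch (not from the slides) of the conjugate update derived above; the prior parameters and data (\alpha = \beta = 2, y = 14 successes in n = 20 trials) are hypothetical.

```python
# Sketch: Beta-Binomial conjugate update.  Prior Beta(alpha, beta) plus
# y successes in n trials gives posterior Beta(y + alpha, n - y + beta).
from scipy import stats

alpha, beta = 2.0, 2.0      # hypothetical prior
n, y = 20, 14               # hypothetical data: 14 successes in 20 trials

posterior = stats.beta(y + alpha, n - y + beta)
print(posterior.mean())                       # numeric posterior mean
print((y + alpha) / (n + alpha + beta))       # analytic mean of Beta(y+alpha, n-y+beta)
```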

Beta priors and posteriors
\pi(\theta \mid \alpha, \beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \theta^{\alpha-1} (1-\theta)^{\beta-1}
[Figure: prior and posterior densities for Beta(1, 1), Beta(3, 2), and Beta(1/2, 1/2) priors applied to the same binomial data.]

Poisson distribution
Y \sim \text{Poisson}(\lambda): \quad P(Y = y) = \frac{e^{-\lambda} \lambda^{y}}{y!}
Mean = Variance = \lambda.
[Figure: the Poisson pmf with \lambda = 4.]
Astronomical examples: photons from distant quasars, cosmic rays.
Bayesian inference on \lambda: Y \mid \lambda \sim \text{Poisson}(\lambda). What prior to use for \lambda? (\lambda > 0)
For more details see Feigelson and Babu (2012), Section 4.2.

Gamma density
f(y) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} y^{\alpha-1} e^{-\beta y}, \quad y > 0
Often written as Y \sim \Gamma(\alpha, \beta), with \alpha > 0 (shape parameter) and \beta > 0 (rate parameter). Note: sometimes the scale \theta = 1/\beta is used instead.
Mean = \alpha/\beta, Variance = \alpha/\beta^2.
\Gamma(1, \beta) \equiv \text{Exponential}(\beta), \qquad \Gamma(d/2, 1/2) \equiv \chi^2_d
[Figure: gamma densities for \alpha = 1, 2, 3, 4.]

Poisson-Gamma posterior
Likelihood:
f(y_{1:n} \mid \lambda) = \prod_{i=1}^{n} \frac{e^{-\lambda} \lambda^{y_i}}{y_i!} = \frac{e^{-n\lambda} \lambda^{\sum_i y_i}}{\prod_{i=1}^{n} y_i!}
Prior:
\pi(\lambda) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \lambda^{\alpha-1} e^{-\beta\lambda}
Posterior:
\pi(\lambda \mid y_{1:n}) \propto \frac{e^{-n\lambda} \lambda^{\sum_i y_i}}{\prod_{i=1}^{n} y_i!} \cdot \frac{\beta^{\alpha}}{\Gamma(\alpha)} \lambda^{\alpha-1} e^{-\beta\lambda} \propto e^{-n\lambda} \lambda^{\sum_i y_i} \lambda^{\alpha-1} e^{-\beta\lambda} = e^{-\lambda(n+\beta)} \lambda^{\sum_i y_i + \alpha - 1} \;\sim\; \Gamma\!\left(\sum_i y_i + \alpha,\; n + \beta\right)
The gamma distribution is the conjugate prior for Poisson likelihoods.
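A minimal sketch (not from the slides) of this update; the prior \Gamma(1, 1) and the count data are hypothetical, and scipy's gamma is parameterized by scale = 1/rate.

```python
# Sketch: Gamma-Poisson conjugate update.  Prior Gamma(alpha, rate=beta) plus
# counts y_1,...,y_n gives posterior Gamma(alpha + sum(y), rate = beta + n).
import numpy as np
from scipy import stats

alpha, beta = 1.0, 1.0                      # hypothetical prior
y = np.array([3, 5, 4, 6, 2])               # hypothetical Poisson counts

a_post, b_post = alpha + y.sum(), beta + len(y)
posterior = stats.gamma(a=a_post, scale=1 / b_post)   # scipy uses scale = 1/rate
print("posterior mean for lambda:", posterior.mean())
```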

Poisson-Gamma posterior illustrations
[Figure: two panels, both using the same dataset. Left: prior \Gamma(\alpha = 1, \beta = 1); right: prior \Gamma(\alpha = 3, \beta = 0.5). Each panel shows the likelihood, prior, posterior, and the input value of \lambda.]

Hierarchical priors
A prior is put on the parameters of the prior distribution, so the prior on the parameter of interest, \theta, has additional parameters:
Y \mid \theta, \gamma \sim f(y \mid \theta, \gamma) \quad (likelihood)
\theta \mid \gamma \sim \pi(\theta \mid \gamma) \quad (prior)
\gamma \sim \phi(\gamma) \quad (hyperprior)
It is assumed that \phi(\gamma) is fully known, and \gamma is called a hyperparameter. More layers can be added, but of course that makes the model more complex, and the posterior may require computational techniques (e.g. MCMC).

Simple illustration
Y \mid (\mu, \phi) \sim N(\mu, 1) \quad (likelihood)
\mu \mid \phi \sim N(\phi, 2) \quad (prior)
\phi \sim N(0, 1) \quad (hyperprior)
Maybe we want to put a hyper-hyperprior on \phi?
Posterior: \mu \mid Y \propto f(y \mid \mu, \phi)\,\pi_1(\mu \mid \phi)\,\pi_2(\phi)
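A rough sketch (not in the slides) that approximates the marginal posterior of \mu for this three-level model on a grid, integrating \phi out numerically; the single observation y = 1.5 and the grid limits are assumptions made for illustration.

```python
# Sketch: the three-level model above, with the posterior for mu approximated
# on a grid by numerically integrating the hyperparameter phi out.
import numpy as np
from scipy import stats

y = 1.5                                          # hypothetical observed value
mu_grid = np.linspace(-6, 6, 401)
phi_grid = np.linspace(-6, 6, 401)
M, P = np.meshgrid(mu_grid, phi_grid, indexing="ij")

# f(y | mu) * pi_1(mu | phi) * pi_2(phi) evaluated on the grid
joint = (stats.norm.pdf(y, loc=M, scale=1.0)
         * stats.norm.pdf(M, loc=P, scale=2.0)
         * stats.norm.pdf(P, loc=0.0, scale=1.0))

dmu = mu_grid[1] - mu_grid[0]
post_mu = joint.sum(axis=1)                      # integrate phi out (up to a constant)
post_mu /= post_mu.sum() * dmu                   # normalize to a density in mu
print("approximate posterior mean of mu:", np.sum(mu_grid * post_mu) * dmu)
```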

Non-informative priors
- What to do if we don't have relevant prior information? What if our model is too complex to know what reasonable priors are?
- The desire is for a prior that does not favor any particular value on the parameter space.
- Side note: some may have philosophical issues with this (e.g. R.A. Fisher, which led to fiducial inference).
- We will discuss some methods for finding non-informative priors. It turns out these priors can be improper (i.e. they integrate to \infty rather than 1), so you need to verify that the resulting posterior distribution is proper.

Example of an improper prior with a proper posterior:
- Data: x_1, \ldots, x_n \sim N(\theta, 1)
- (Improper) prior: \pi(\theta) \propto 1
- (Proper) posterior: \pi(\theta \mid x_{1:n}) \sim N(\bar{x}, n^{-1/2})

Example of an improper prior with an improper posterior:
- Data: x_1, \ldots, x_n \sim \text{Bernoulli}(\theta), so y = \sum_{i=1}^{n} x_i \sim \text{Binomial}(n, \theta)
- (Improper) prior: \pi(\theta) \propto \theta^{-1}(1-\theta)^{-1}
- (Improper) posterior: \pi(\theta \mid x_{1:n}) \propto \theta^{y-1} (1-\theta)^{n-y-1}, which is improper for y = 0 or y = n

If you use improper priors, you have to check that the posterior is proper.

Uniform prior
This is what many astronomers use for non-informative priors, and is often what comes to mind when we think of flat priors: \theta \sim U(0, 1).
[Figure: the flat density of a Uniform(0, 1) prior on \theta.]
What if we consider a transformation of \theta, such as \theta^2?

Uniform prior
Prior for \theta^2 induced by \theta \sim U(0, 1):
[Figure: the induced density of \theta^2, which piles up near 0.]
Notice that the above is not uniform: the prior on \theta^2 is informative. This is an undesirable property of uniform priors. We would like the un-informativeness to be invariant under transformations.
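A tiny Monte Carlo sketch (not in the slides) of this point: drawing \theta uniformly and histogramming \theta^2 shows the induced prior is far from flat; the sample size and bin count are arbitrary.

```python
# Sketch: a flat prior on theta is not flat on theta^2.
import numpy as np

rng = np.random.default_rng(1)
theta = rng.uniform(0.0, 1.0, size=100_000)
counts, edges = np.histogram(theta**2, bins=10, range=(0, 1), density=True)
print(np.round(counts, 2))   # far from the constant density 1.0 of a Uniform(0, 1)
```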

There are a number of reasons why you may not have prior information:
1. your work may be the first of its kind
2. you are skeptical about previous results that would have informed your priors
3. the parameter space is too high-dimensional to understand how your informative priors work together
If this is the case, then you may like the priors to have little effect on the resulting posterior.

Objective priors
- Jeffreys prior: uses the Fisher information
- Reference priors: select priors that maximize some measure of divergence between the posterior and prior (hence minimizing the impact the prior has on the posterior); see "The Formal Definition of Reference Priors" by Berger et al. (2009)
More about selecting priors can be found in Kass and Wasserman (1996).

Jeffreys prior
\pi_J(\theta) \propto \sqrt{I(\theta)}
where I(\theta) is the Fisher information:
I(\theta) = E\!\left[\left(\frac{d}{d\theta} \log L(\theta \mid Y)\right)^{2}\right] = -E\!\left[\frac{d^2}{d\theta^2} \log L(\theta \mid Y)\right] \quad \text{(for exponential family)}
Some intuition: I(\theta) is understood to be a proxy for the information content in the model about \theta; high values of I(\theta) correspond with likely values of \theta. This reduces the effect of the prior on the posterior. For more details see Robert (2007), The Bayesian Choice.
Most useful in the single-parameter setting; not recommended with multiple parameters.

Exponential example
f(y \mid \theta) = \theta e^{-\theta y}
Calculate the Fisher information:
\log f(y \mid \theta) = \log\theta - \theta y
\frac{d}{d\theta} \log f(y \mid \theta) = \frac{1}{\theta} - y
\frac{d^2}{d\theta^2} \log f(y \mid \theta) = -\frac{1}{\theta^2}
-E\!\left[\frac{d^2}{d\theta^2} \log f(y \mid \theta)\right] = \frac{1}{\theta^2}
Hence,
\pi_J(\theta) \propto \sqrt{\frac{1}{\theta^2}} = \frac{1}{\theta}

Exponential example, continued
\pi_J(\theta) \propto \frac{1}{\theta}
Suppose we consider the transformation \phi = f(\theta) = \theta^2, so \theta = \sqrt{\phi} and \frac{d\theta}{d\phi} = \frac{1}{2\sqrt{\phi}}.
Hence,
\pi_J(\phi) = \pi_J(\sqrt{\phi}) \left|\frac{d\theta}{d\phi}\right| = \frac{1}{\sqrt{\phi}} \cdot \frac{1}{2\sqrt{\phi}} \propto \frac{1}{\phi}
We see here that the Jeffreys prior is invariant to the transformation f(\theta) = \theta^2.

Binomial example
f(y \mid \theta) = \binom{n}{y} \theta^{y} (1-\theta)^{n-y}
Calculate the Fisher information:
\log f(y \mid \theta) = \log\binom{n}{y} + y \log\theta + (n - y)\log(1-\theta)
\frac{d}{d\theta} \log f(y \mid \theta) = \frac{y}{\theta} - \frac{n-y}{1-\theta}
\frac{d^2}{d\theta^2} \log f(y \mid \theta) = -\frac{y}{\theta^2} - \frac{n-y}{(1-\theta)^2}
Note that E(y) = n\theta, so
-E\!\left[\frac{d^2}{d\theta^2} \log f(y \mid \theta)\right] = \frac{n\theta}{\theta^2} + \frac{n - n\theta}{(1-\theta)^2} = \frac{n}{\theta} + \frac{n}{1-\theta} = \frac{n}{\theta(1-\theta)}
Hence,
\pi_J(\theta) \propto \sqrt{\frac{n}{\theta(1-\theta)}} \propto \theta^{-1/2} (1-\theta)^{-1/2} \;\sim\; \text{Beta}(1/2, 1/2)
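A small symbolic sketch (assuming sympy is available) that repeats this calculation; it drops the \log\binom{n}{y} term, which does not depend on \theta.

```python
# Sketch: verify the binomial Fisher information and Jeffreys prior symbolically.
import sympy as sp

theta, n, y = sp.symbols("theta n y", positive=True)
loglik = y * sp.log(theta) + (n - y) * sp.log(1 - theta)   # dropping log C(n, y)

second = sp.diff(loglik, theta, 2)
fisher = sp.simplify(-second.subs(y, n * theta))            # uses E[y] = n*theta
print(fisher)          # should equal n / (theta*(1 - theta)), possibly written differently
print(sp.sqrt(fisher)) # proportional to theta**(-1/2) * (1 - theta)**(-1/2)
```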

\pi_J(\theta) \propto \theta^{-1/2} (1-\theta)^{-1/2}
[Figure: the Beta(1/2, 1/2) Jeffreys prior compared with the flat Beta(1, 1) prior.]

If we use the Jeffreys prior \pi_J(\theta) \propto \theta^{-1/2} (1-\theta)^{-1/2}, what is the posterior for \theta \mid Y?
\pi(\theta \mid y) \propto \binom{n}{y} \theta^{y} (1-\theta)^{n-y} \cdot \theta^{-1/2} (1-\theta)^{-1/2}
\propto \theta^{y - 1/2} (1-\theta)^{n - y - 1/2}
\;\Rightarrow\; \theta \mid y \sim \text{Beta}(y + 1/2,\; n - y + 1/2)
The posterior distribution is proper (which we knew would be the case since the prior is proper). It is just a coincidence that the Jeffreys prior is conjugate.

Gaussian with unknown \mu (on your own)
f(y \mid \mu, \sigma^2) \propto e^{-(y-\mu)^2/(2\sigma^2)}
Calculate the Fisher information:
\log f(y \mid \mu) = -\frac{(y-\mu)^2}{2\sigma^2} + \text{const}
\frac{d}{d\mu} \log f(y \mid \mu) = \frac{y-\mu}{\sigma^2}
\frac{d^2}{d\mu^2} \log f(y \mid \mu) = -\frac{1}{\sigma^2}
-E\!\left[\frac{d^2}{d\mu^2} \log f(y \mid \mu)\right] = \frac{1}{\sigma^2}
Hence,
\pi_J(\mu) \propto 1

Inference with a posterior
Now that we have a posterior, what do we want to do with it?
- Point estimation:
  posterior mean: \langle\theta\rangle = \int_{\Theta} \theta\, p(\theta \mid Y, \text{model})\, d\theta
  posterior mode (MAP = maximum a posteriori)
- Credible regions: the posterior probability p that \theta falls in a region R:
  p = P(\theta \in R \mid Y, \text{model}) = \int_{R} p(\theta \mid Y, \text{model})\, d\theta
  highest posterior density (HPD) region
- Posterior predictive distributions: predict new \tilde{y} given data y
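As an illustration of these summaries (with hypothetical data, y = 7 successes in n = 20 trials, and the Jeffreys Beta(1/2, 1/2) prior from the previous slides), here is a sketch computing the posterior mean, the MAP, and a 95% equal-tailed credible interval (not an HPD region).

```python
# Sketch: point estimates and a 95% equal-tailed credible interval
# from a Beta posterior (hypothetical y = 7 successes in n = 20 trials,
# Jeffreys prior Beta(1/2, 1/2)).
from scipy import stats

n, y = 20, 7
posterior = stats.beta(y + 0.5, n - y + 0.5)

post_mean = posterior.mean()
post_map = (y + 0.5 - 1) / (n + 1 - 2)          # mode of a Beta(a, b): (a-1)/(a+b-2)
ci_low, ci_high = posterior.ppf([0.025, 0.975])

print(post_mean, post_map, (ci_low, ci_high))
```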

Posterior predictive distributions: predict new \tilde{y} given data y.
f(\tilde{y} \mid y) = \frac{f(\tilde{y}, y)}{f(y)} = \frac{\int f(\tilde{y}, y, \theta)\, d\theta}{f(y)} = \frac{\int f(\tilde{y} \mid y, \theta)\, f(y, \theta)\, d\theta}{f(y)} = \int f(\tilde{y} \mid y, \theta)\, \pi(\theta \mid y)\, d\theta = \int f(\tilde{y} \mid \theta)\, \pi(\theta \mid y)\, d\theta
where the last equality holds if y and \tilde{y} are conditionally independent given \theta.
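A simulation sketch (not in the slides) of this integral for the Beta-Binomial setting: draw \theta from the posterior, then draw \tilde{y} \mid \theta. The observed data, the uniform prior, and the 10 new trials are all hypothetical choices.

```python
# Sketch: simulate the posterior predictive for a new observation y~ by first
# drawing theta from the Beta posterior and then drawing y~ | theta.
import numpy as np

rng = np.random.default_rng(2)
n, y = 20, 7                                   # observed data (hypothetical)
alpha, beta = 1.0, 1.0                         # uniform prior

theta_draws = rng.beta(y + alpha, n - y + beta, size=50_000)
y_new = rng.binomial(n=10, p=theta_draws)      # predict successes in 10 new trials
print(np.bincount(y_new, minlength=11) / len(y_new))   # estimated predictive pmf
```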

Confidence intervals vs. credible intervals
A 95% confidence interval is based on repeated sampling of datasets: about 95% of the confidence intervals will capture the true parameter value. Parameters are not random.
[Figure: repeated confidence intervals capturing a fixed true value; http://www.nature.com]
A 95% credible interval is defined using the posterior distribution. Parameters are random.
[Figure: a posterior density with the central 95% probability region shaded.]

Summary
We discussed some basics of Bayesian methods:
- Bayesian inference relies on the posterior distribution
  \pi(\theta \mid x) = \frac{f(x \mid \theta)\,\pi(\theta)}{f(x)}
  where f(x \mid \theta) is the likelihood and \pi(\theta) is the prior.
- There are different ways to select priors: subjective, conjugate, non-informative.
- Credible intervals and confidence intervals have different interpretations.
We'll be hearing a lot more about Bayesian methods throughout the week. Thank you!

Bibliography
Berger, J. O., Bernardo, J. M., and Sun, D. (2009), "The formal definition of reference priors," The Annals of Statistics, 905-938.
Feigelson, E. D. and Babu, G. J. (2012), Modern Statistical Methods for Astronomy: With R Applications, Cambridge University Press.
Kass, R. E. and Wasserman, L. (1996), "The selection of prior distributions by formal rules," Journal of the American Statistical Association, 91, 1343-1370.
Robert, C. (2007), The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation, Springer Science & Business Media.