STAT J535: Chapter 5: Classes of Bayesian Priors


David B. Hitchcock
E-Mail: hitchcock@stat.sc.edu
Spring 2012

The Bayesian Prior

A prior distribution must be specified in a Bayesian analysis. The choice of prior can substantially affect posterior conclusions, especially when the sample size is not large. We now examine several broad methods of determining prior distributions.

Conjugate Priors

Conjugacy is a property of a prior, paired with a likelihood, under which the posterior distribution has the same distributional form as the prior (just with different parameter values). We have seen some examples of conjugate priors:

Data/Likelihood: Prior
1. Bernoulli: Beta prior for $p$
2. Poisson: Gamma prior for $\lambda$
3. Normal: Normal prior for $\mu$
4. Normal: Inverse gamma prior for $\sigma^2$

Conjugate Priors

Other examples:
1. Multinomial: Dirichlet prior for $p_1, p_2, \ldots, p_k$
2. Negative Binomial: Beta prior for $p$
3. Uniform$(0, \theta)$: Pareto prior for the upper limit $\theta$
4. Exponential: Gamma prior for $\beta$
5. Gamma ($\beta$ unknown): Gamma prior for $\beta$
6. Pareto ($\alpha$ unknown): Gamma prior for $\alpha$
7. Pareto ($\beta$ unknown): Pareto prior for $\beta$
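To make the Beta/Bernoulli entry from the first table concrete, here is a minimal Python sketch of a conjugate update. The data values and the Beta(2, 2) hyperparameters are illustrative choices, not from the slides:

```python
import numpy as np
from scipy import stats

# Illustrative Bernoulli data and Beta(a, b) prior hyperparameters
x = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])  # n = 10 trials, 7 successes
a, b = 2.0, 2.0                               # Beta(2, 2) prior on p

# Conjugate update: Beta(a, b) prior + Bernoulli likelihood gives a
# Beta(a + sum(x), b + n - sum(x)) posterior.
n, s = len(x), x.sum()
posterior = stats.beta(a + s, b + n - s)

print(f"Posterior: Beta({a + s:.0f}, {b + n - s:.0f})")
print(f"Posterior mean of p: {posterior.mean():.3f}")
print(f"95% credible interval: {posterior.interval(0.95)}")
```

The posterior is again a Beta distribution, so the update amounts to simple hyperparameter arithmetic rather than numerical integration.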

Conjugate Priors: Exponential Family

Consider the family of distributions known as the one-parameter exponential family. This family consists of any distribution whose p.d.f. (or p.m.f.) can be written as
$$f(x \mid \theta) = e^{t(x)u(\theta)}\, r(x)\, s(\theta)$$
where $t(x)$ and $r(x)$ do not depend on the parameter $\theta$, and $u(\theta)$ and $s(\theta)$ do not depend on $x$. Note that any such density can be written as
$$f(x \mid \theta) = \exp\{t(x)u(\theta) + \ln[r(x)] + \ln[s(\theta)]\}.$$

Conjugate Priors: Exponential Family

If we observe an iid sample $X_1, \ldots, X_n$, the joint density of the data is thus
$$f(\mathbf{x} \mid \theta) = \exp\Big\{u(\theta)\sum_{i=1}^n t(x_i) + \sum_{i=1}^n \ln[r(x_i)] + n\ln[s(\theta)]\Big\}.$$
Consider a prior for $\theta$ (with prior parameters $k$ and $\gamma$) having the form
$$p(\theta) = c(k, \gamma)\, e^{k\gamma\, u(\theta) + k\ln[s(\theta)]}.$$

Conjugate Priors: Exponential Family

Then the posterior is
$$\begin{aligned}
\pi(\theta \mid \mathbf{x}) &\propto f(\mathbf{x} \mid \theta)\, p(\theta) \\
&\propto \exp\Big\{u(\theta)\sum t(x_i) + n\ln[s(\theta)] + k\gamma\, u(\theta) + k\ln[s(\theta)]\Big\} \\
&= \exp\Big\{u(\theta)\Big[\sum t(x_i) + k\gamma\Big] + (n+k)\ln[s(\theta)]\Big\} \\
&= \exp\Big\{(n+k)\, u(\theta)\, \frac{\sum t(x_i) + k\gamma}{n+k} + (n+k)\ln[s(\theta)]\Big\},
\end{aligned}$$
which is of the same form as the prior, except with
$$k^* = n + k \quad \text{and} \quad \gamma^* = \frac{\sum t(x_i) + k\gamma}{n+k}.$$
So if our data are iid from a one-parameter exponential family, a conjugate prior will exist.
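As a sanity check on this machinery, here is the Poisson/Gamma pair from the earlier table worked through the template. This worked example is added for illustration; the factorization below is a standard choice but not the only one. For $X \sim \text{Poisson}(\lambda)$,
$$f(x \mid \lambda) = \frac{\lambda^x e^{-\lambda}}{x!} = e^{x\ln\lambda} \cdot \frac{1}{x!} \cdot e^{-\lambda},$$
so $t(x) = x$, $u(\lambda) = \ln\lambda$, $r(x) = 1/x!$, and $s(\lambda) = e^{-\lambda}$. The conjugate prior then takes the form
$$p(\lambda) \propto e^{k\gamma\ln\lambda - k\lambda} = \lambda^{k\gamma} e^{-k\lambda},$$
a Gamma distribution with shape $k\gamma + 1$ and rate $k$. Applying the update $k^* = n + k$ and $\gamma^* = (\sum x_i + k\gamma)/(n + k)$ gives a Gamma posterior with shape $k\gamma + \sum x_i + 1$ and rate $k + n$, which matches the familiar Gamma$(\alpha + \sum x_i,\ \beta + n)$ update for a Gamma$(\alpha, \beta)$ prior.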

Conjugate Priors

Conjugate priors are mathematically convenient. Sometimes they are quite flexible, depending on the specific hyperparameters we use. But they reflect very specific prior knowledge, so we should be wary of using them unless we truly possess that prior knowledge.

Uninformative Priors

These priors intentionally provide very little specific information about the parameter(s). A classic uninformative prior is the uniform prior. A uniform prior is proper when it integrates to a finite quantity (and hence can be normalized to integrate to 1).

Example 1: For Bernoulli($\theta$) data, a uniform prior on $\theta$ is
$$p(\theta) = 1, \quad 0 \le \theta \le 1.$$
This makes sense because $\theta$ has bounded support.
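One useful consequence of the flat prior in Example 1 is that the posterior is just the normalized likelihood. The sketch below checks this numerically; the data counts are illustrative, not from the slides:

```python
import numpy as np
from scipy import stats, special

# Under the flat prior p(theta) = 1 on [0, 1], the posterior is the
# likelihood theta^s (1 - theta)^(n - s) divided by its normalizing
# constant, the Beta function B(s + 1, n - s + 1).  That is exactly
# the Beta(s + 1, n - s + 1) density.
n, s = 10, 7                                  # illustrative: 7 successes in 10
theta = np.linspace(0.01, 0.99, 99)

likelihood = theta**s * (1 - theta)**(n - s)
posterior_manual = likelihood / special.beta(s + 1, n - s + 1)
posterior_conjugate = stats.beta.pdf(theta, s + 1, n - s + 1)

print(np.allclose(posterior_manual, posterior_conjugate))  # True
```

Equivalently, the flat prior is the Beta(1, 1) distribution, so the usual conjugate update applies with $a = b = 1$.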

Uninformative Priors

Example 2: Consider $N(0, \sigma^2)$ data. If it is reasonable to assume that, say, $\sigma^2 < 100$, we could use the uniform prior
$$p(\sigma^2) = \frac{1}{100}, \quad 0 \le \sigma^2 \le 100$$
(even though $\sigma^2$ is not intrinsically bounded).

An improper uniform prior integrates to $\infty$.

Example 3: $N(\mu, 1)$ data with
$$p(\mu) = 1, \quad -\infty < \mu < \infty.$$
This is fine as long as the resulting posterior is proper. But be careful: sometimes an improper prior will yield an improper posterior.
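To see why Example 3 yields a proper posterior, complete the square (a standard calculation, filled in here for illustration). For iid $X_1, \ldots, X_n \sim N(\mu, 1)$ with $p(\mu) = 1$,
$$\pi(\mu \mid \mathbf{x}) \propto \exp\Big\{-\frac{1}{2}\sum_{i=1}^n (x_i - \mu)^2\Big\} \propto \exp\Big\{-\frac{n}{2}(\mu - \bar{x})^2\Big\},$$
which is the kernel of a $N(\bar{x}, 1/n)$ density, so the posterior is proper even though the prior is not. A standard counterexample (not from these slides): for Bernoulli data with the improper prior $p(\theta) \propto \theta^{-1}(1-\theta)^{-1}$, the posterior is proportional to $\theta^{s-1}(1-\theta)^{n-s-1}$, which is itself improper whenever the data are all successes ($s = n$) or all failures ($s = 0$).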