INTRODUCTION TO BAYESIAN METHODS II

Abstract. We will revisit point estimation and hypothesis testing from the Bayesian perspective.

1. Bayes estimators

Let $X = (X_1, \dots, X_n)$ be a random sample from the conditional distribution of $X$ given $\Theta = \theta$, which has pdf $f(x \mid \theta)$, where $\Theta$ has prior pdf $r(\theta)$. Suppose that we observe $X = x$ and have calculated the posterior distribution $s(\theta \mid x)$. We can now compute $\delta(x) := E(\Theta \mid X = x)$. In the case that $s(\theta \mid x)$ is continuous, we have
$$\delta(x) = \int \theta \, s(\theta \mid x) \, d\theta.$$
One natural point estimate for $\theta$ is $\delta(x)$, and the associated point estimator is $\delta(X) = E(\Theta \mid X)$. This is just one important example of a Bayes estimator. From a variation of the first exercise on the first homework, you can verify that $\delta(x)$ is the value $a$ at which $E[(\Theta - a)^2 \mid X = x]$ is minimized. Thus, in the case where $L(\theta, \theta') = (\theta - \theta')^2$, we have that $\delta(x)$ minimizes $E[L(\Theta, \delta(x)) \mid X = x]$. In general, we can consider other choices of $L$. The function $L$ is called a loss function, and we aim to find a $\delta(x)$ which minimizes the conditional expected loss. The function $\delta$ is called a decision function, and it is a Bayes estimate of $\theta$ if it is such a minimizer. More generally, if we are interested in estimating a function $g(\theta)$ of $\theta$, a Bayes estimator of $g(\theta)$ is a decision function $\delta(x)$ which minimizes $E[L(g(\Theta), \delta(x)) \mid X = x]$. In this course we will mostly be concerned with the squared loss function $L(\theta, \theta') = (\theta - \theta')^2$. There are many other reasonable choices of loss function, for example the absolute loss given by $L(\theta, \theta') = |\theta - \theta'|$.
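Since the discussion is abstract, a small numerical sketch may help. The Python snippet below is an illustration under assumed ingredients that are not from these notes (a Beta(2, 2) prior and a Bernoulli sample): it discretizes the posterior on a grid and checks that the posterior mean minimizes the posterior expected squared loss over candidate estimates $a$.

import numpy as np

# Assumed setup for illustration only: Beta(2, 2) prior on Theta,
# Bernoulli(theta) data. We work on a grid over (0, 1).
theta = np.linspace(0.001, 0.999, 2000)
dtheta = theta[1] - theta[0]
prior = theta * (1 - theta)                       # Beta(2, 2), unnormalized
x = np.array([1, 0, 1, 1, 0, 1, 1, 1])            # hypothetical sample
likelihood = theta ** x.sum() * (1 - theta) ** (len(x) - x.sum())
posterior = prior * likelihood
posterior /= posterior.sum() * dtheta             # normalize s(theta | x)

post_mean = (theta * posterior).sum() * dtheta    # delta(x) = E(Theta | X = x)

# E[(Theta - a)^2 | X = x] over a grid of candidate estimates a.
a_grid = np.linspace(0.01, 0.99, 99)
losses = [((theta - a) ** 2 * posterior).sum() * dtheta for a in a_grid]
print(post_mean, a_grid[np.argmin(losses)])       # agree to grid resolution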

Given a decision function $\delta$ and loss function $L$, the risk function is defined to be $R_\delta(\theta) = E\,L(\theta, \delta(X))$; the expectation here is with respect to the conditional likelihood of $X$, so that in the continuous case
$$E\,L(\theta, \delta(X)) = \int L(\theta, \delta(x)) \, L(x \mid \theta) \, dx.$$
An application of Fubini's theorem also shows that a Bayes estimator minimizes the expected risk. Since $s(\theta \mid x) f_X(x) = L(x \mid \theta) r(\theta)$, we have
$$\begin{aligned}
E\,R_\delta(\Theta) &= \int \left[ \int L(\theta, \delta(x)) \, L(x \mid \theta) \, dx \right] r(\theta) \, d\theta \\
&= \int \left[ \int L(\theta, \delta(x)) \, s(\theta \mid x) f_X(x) \, dx \right] d\theta \\
&= \int \left[ \int L(\theta, \delta(x)) \, s(\theta \mid x) \, d\theta \right] f_X(x) \, dx \\
&= \int E[L(\Theta, \delta(x)) \mid X = x] \, f_X(x) \, dx.
\end{aligned}$$
We see that if $\delta(x)$ is a Bayes estimate, then it does more than minimize the expected risk, since it minimizes $E[L(\Theta, \delta(x)) \mid X = x]$ for every $x$!

Exercise 1. Let $Y$ be a random variable. Set $g(a) = E(Y - a)^2$. Minimize $g$.

Exercise 2. Let $Y$ be a continuous random variable. Set $g(a) = E|Y - a|$. Minimize $g$.
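Before turning to Exercise 3, here is a simulation sketch of the well-known answers to Exercises 1 and 2: the mean minimizes $a \mapsto E(Y - a)^2$ and the median minimizes $a \mapsto E|Y - a|$. The Gamma-distributed sample is an arbitrary choice for illustration.

import numpy as np

# Simulated check of Exercises 1 and 2; the distribution of Y is an
# arbitrary (skewed) choice, not from the notes.
rng = np.random.default_rng(0)
y = rng.gamma(shape=2.0, scale=1.5, size=50_000)

a_grid = np.linspace(y.min(), y.max(), 401)
sq_loss = [np.mean((y - a) ** 2) for a in a_grid]
abs_loss = [np.mean(np.abs(y - a)) for a in a_grid]

print(a_grid[np.argmin(sq_loss)], y.mean())       # both close to E(Y)
print(a_grid[np.argmin(abs_loss)], np.median(y))  # both close to the median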

Exercise 3. Let $X = (X_1, \dots, X_n)$ be a random sample from the conditional distribution of $X$ given $\Theta = \theta$, where $X \mid \theta \sim \mathrm{Unif}(0, \theta)$ and $\Theta$ has the Pareto distribution with scale parameter $b > 0$ and shape parameter $\alpha > 0$. Find the Bayes estimator (with respect to the squared loss function) for $\theta$.

Solution. We compute the posterior distribution, and if we recognize it, then we will know $E(\Theta \mid X)$. Let $r$ be the prior pdf. Recall that
$$r(\theta) = \frac{\alpha b^\alpha}{\theta^{\alpha+1}} \, [\theta > b].$$
Let $x \in (0, \theta)^n$, let $t(x) = \max\{x_1, \dots, x_n\}$, and let $s(\theta \mid x)$ denote the posterior distribution. Recall that $t(X)$ is a sufficient statistic for $\theta$, and that conditional on $\Theta = \theta$ the pdf of $t(X)$ is given by
$$g(t; \theta) = \frac{n t^{n-1}}{\theta^n} \, [t \in (0, \theta)].$$
We have that
$$s(\theta \mid t) \propto g(t; \theta) \, r(\theta) \propto \frac{n t^{n-1}}{\theta^n} \cdot \frac{\alpha b^\alpha}{\theta^{\alpha+1}} \, [\theta > b][\theta > t] \propto \frac{[\theta > \max\{t, b\}]}{\theta^{\alpha+n+1}},$$
so that $s(\theta \mid t)$ is the pdf of a Pareto distribution with posterior hyperparameters $\alpha_1 = \alpha + n$ and $b_1 = \max\{t, b\}$. Thus the Pareto family is a conjugate family for the uniform scale family. Recall that a Pareto random variable with parameters $\alpha$ and $b$ has mean $\frac{\alpha}{\alpha - 1} b$ (for $\alpha > 1$), from which it follows that
$$E(\Theta \mid X) = \frac{\alpha + n}{\alpha + n - 1} \max\{t(X), b\}.$$
So, as we see from Exercise 3, the key to computing a Bayes estimator boils down to computing the posterior distribution. In the next exercises we will do some computations with the inverse gamma distribution.

2. Examples with the inverse gamma distribution

Exercise 4. We say that a positive real-valued random variable $X$ has the inverse gamma distribution with parameters $\alpha > 0$ and $\beta > 0$ if it has pdf given by
$$f(x; \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)} \, x^{-\alpha-1} e^{-\beta/x} \, [x > 0].$$
Prove that if $W$ has the gamma distribution with shape parameter $\alpha' = \alpha > 0$ and scale parameter $\beta' = 1/\beta$, then $1/W \stackrel{d}{=} X$. Deduce that, for integers $0 < n < \alpha$,
$$E X^n = \frac{\beta^n}{(\alpha - 1) \cdots (\alpha - n)}.$$

Solution. Let $x > 0$. We have
$$P(1/W \le x) = P(1/x \le W) = \int_{1/x}^{\infty} g(w; \alpha, 1/\beta) \, dw,$$
where $g$ is the pdf of $W$. The chain rule and the fundamental theorem of calculus give that the pdf of $1/W$ is
$$\frac{1}{x^2} \cdot \frac{\beta^\alpha}{\Gamma(\alpha)} \, x^{-(\alpha-1)} e^{-\beta/x} = f(x; \alpha, \beta),$$
as required. The moment result now follows from a previous exercise.
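A quick Monte Carlo check of Exercise 4 is easy; the parameter values below ($\alpha = 5$, $\beta = 2$, $n = 2$) are arbitrary choices for illustration, not from the notes.

import numpy as np

# If W ~ Gamma(shape=alpha, scale=1/beta), then X = 1/W ~ InvGamma(alpha, beta),
# and E X^n = beta^n / ((alpha - 1) ... (alpha - n)) for n < alpha.
rng = np.random.default_rng(1)
alpha, beta, n = 5.0, 2.0, 2
w = rng.gamma(shape=alpha, scale=1.0 / beta, size=1_000_000)
x = 1.0 / w                                       # inverse gamma draws

moment_mc = np.mean(x ** n)                       # Monte Carlo estimate of E X^n
moment_exact = beta ** n / ((alpha - 1) * (alpha - 2))
print(moment_mc, moment_exact)                    # these should agree closely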

Exercise 5. Let $\alpha > 0$ be known. Let $X = (X_1, \dots, X_n)$ be a random sample from the inverse gamma distribution with parameters $\alpha$ and $\beta$, where $\beta > 0$ is unknown. Show that
$$T = \sum_{i=1}^n \frac{1}{X_i}$$
is a sufficient statistic for $\beta$.

Solution. Let $x \in (0, \infty)^n$ and let $t = \frac{1}{x_1} + \cdots + \frac{1}{x_n}$. We have that
$$L(x; \beta) = \prod_{i=1}^n \frac{\beta^\alpha}{\Gamma(\alpha)} \, e^{-\beta/x_i} x_i^{-\alpha-1} = \beta^{\alpha n} e^{-t\beta} \prod_{i=1}^n \frac{x_i^{-\alpha-1}}{\Gamma(\alpha)},$$
so that we can apply the Neyman factorization theorem with
$$g(t; \beta) = \beta^{\alpha n} e^{-t\beta} \quad \text{and} \quad H(x) = \prod_{i=1}^n \frac{x_i^{-\alpha-1}}{\Gamma(\alpha)}.$$

Exercise 6. Let $\alpha, \alpha_0, \beta_0$ be known. Let $X = (X_1, \dots, X_n)$ be a random sample from the conditional distribution of $X$ given $\Theta = \theta$, where $X \mid \theta \sim \mathrm{InvGamma}(\alpha, \theta)$ and $\Theta \sim \mathrm{Gamma}(\alpha_0, \beta_0)$. Find the posterior distribution.

Solution. From the previous exercise, we have that $L(x \mid \theta) = g(t; \theta) H(x)$; notice that $g$ may not be the pdf of the sufficient statistic $T$. Let $r(\theta)$ be the prior pdf and $s$ the posterior. We have
$$s(\theta \mid t) \propto g(t; \theta) \, r(\theta) \propto \theta^{\alpha n} e^{-t\theta} \cdot \frac{1}{\Gamma(\alpha_0) \beta_0^{\alpha_0}} \, \theta^{\alpha_0 - 1} e^{-\theta/\beta_0} \propto \theta^{\alpha n + \alpha_0 - 1} e^{-\theta (t + 1/\beta_0)}.$$
We recognize that $s(\theta \mid t)$ is the pdf of a gamma distribution with posterior hyperparameters $\alpha_1 = \alpha n + \alpha_0$ and $1/\beta_1 = t + 1/\beta_0$.
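The Exercise 6 update is simple to code. The sketch below uses made-up data and hyperparameters, and the function name gamma_posterior is ours, not the notes'; it follows the notes' convention that $\beta_0$ is the scale of the gamma prior.

import numpy as np

def gamma_posterior(x, alpha, alpha0, beta0):
    """Posterior hyperparameters for Exercise 6: alpha1 = alpha*n + alpha0
    and 1/beta1 = t + 1/beta0, where t = sum(1/x_i)."""
    x = np.asarray(x, dtype=float)
    t = np.sum(1.0 / x)                 # the sufficient statistic T
    alpha1 = alpha * len(x) + alpha0
    beta1 = 1.0 / (t + 1.0 / beta0)     # posterior scale
    return alpha1, beta1

# Hypothetical inputs, for illustration only.
alpha1, beta1 = gamma_posterior([0.8, 1.3, 2.1, 0.5], alpha=2.0,
                                alpha0=3.0, beta0=1.0)
print(alpha1, beta1)                    # posterior is Gamma(alpha1, scale=beta1)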

Exercise 7. Let $\alpha, \beta$ be known. Let $X = (X_1, \dots, X_n)$ be a random sample from the conditional distribution of $X$ given $\Theta = \theta$, where $X \mid \theta$ is exponential with mean $\theta$ and $\Theta \sim \mathrm{InvGamma}(\alpha, \beta)$. Find the posterior distribution.

Solution. Let $x \in (0, \infty)^n$ and let $t = x_1 + \cdots + x_n$. We have
$$s(\theta \mid x) \propto L(x; \theta) \, r(\theta) \propto \frac{1}{\theta^n} e^{-t/\theta} \cdot \frac{\beta^\alpha}{\Gamma(\alpha)} \, \theta^{-\alpha-1} e^{-\beta/\theta} \propto \theta^{-\alpha-n-1} e^{-(\beta+t)/\theta}.$$
Thus we recognize that $s(\theta \mid x) = s(\theta \mid t)$ is the pdf of an inverse gamma distribution with posterior hyperparameters $\alpha_1 = \alpha + n$ and $\beta_1 = \beta + t$.

Exercise 8. Let $\alpha, \beta$ be known. Let $X = (X_1, \dots, X_n)$ be a random sample from the conditional distribution of $X$ given $\Theta = \theta$, where $X \mid \theta$ is normal with mean $0$ and variance $\theta$ and $\Theta \sim \mathrm{InvGamma}(\alpha, \beta)$. Find the posterior distribution.

Solution. Let $x \in \mathbb{R}^n$ and let $t = \frac{1}{2} \sum_{i=1}^n x_i^2$. We have that
$$s(\theta \mid x) \propto L(x; \theta) \, r(\theta) \propto \frac{e^{-t/\theta}}{\theta^{n/2}} \cdot \frac{\beta^\alpha}{\Gamma(\alpha)} \, \theta^{-\alpha-1} e^{-\beta/\theta} \propto \theta^{-\alpha-n/2-1} e^{-(t+\beta)/\theta}.$$
So we recognize that $s(\theta \mid x) = s(\theta \mid t)$ has the inverse gamma distribution with posterior hyperparameters $\alpha_1 = \alpha + n/2$ and $\beta_1 = \beta + t$.
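A similar sketch for Exercise 8, again with made-up inputs (the function name invgamma_posterior is ours); it also reports the Bayes estimate of $\theta$ under squared loss, namely the posterior mean $\beta_1 / (\alpha_1 - 1)$, using the moment formula from Exercise 4.

import numpy as np

def invgamma_posterior(x, alpha, beta):
    """Posterior hyperparameters for Exercise 8: alpha1 = alpha + n/2 and
    beta1 = beta + t, where t = (1/2) * sum(x_i^2)."""
    x = np.asarray(x, dtype=float)
    t = 0.5 * np.sum(x ** 2)
    return alpha + len(x) / 2.0, beta + t

# Hypothetical inputs, for illustration only.
alpha1, beta1 = invgamma_posterior([0.3, -1.1, 0.8, 2.0], alpha=4.0, beta=3.0)
bayes_estimate = beta1 / (alpha1 - 1)   # posterior mean of InvGamma(alpha1, beta1)
print(alpha1, beta1, bayes_estimate)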

3. Credible intervals

First, let us recall confidence intervals in the classical setting. Let $X = (X_1, \dots, X_n)$ be a random sample from $f_\theta$, and let $u(x) < v(x)$. Suppose that
$$P_\theta[\theta \in (u(X), v(X))] = 1 - \alpha;$$
that is, the random interval $(u(X), v(X))$ contains $\theta$ with probability $1 - \alpha$. If we observe $X = x$, then we call $(u(x), v(x))$ a $100(1-\alpha)$ percent confidence interval for $\theta$. Note that $\theta$ is either in the confidence interval or not; there is no probability statement to make once we observe $X = x$.

Now let $X = (X_1, \dots, X_n)$ be a random sample from the conditional distribution of $X$ given $\Theta = \theta$, which has pdf $f(x \mid \theta)$, where $\Theta$ has prior pdf $r(\theta)$. Suppose $u$ and $v$ are functions of $x$ such that
$$P\big(\Theta \in (u(x), v(x)) \mid X = x\big) = 1 - \alpha.$$
Then we say that the interval $(u(x), v(x))$ is a $100(1-\alpha)$ percent credible interval for $\theta$. Notice that in the Bayesian setting, the deterministic interval $(u(x), v(x))$ really does contain $\theta$ with probability $1 - \alpha$.

4. Bayesian hypothesis testing

Let the parameter space $\Omega$ be given by the disjoint union $\Omega = \Theta_N \sqcup \Theta_A$. Suppose we want to test $H_0 : \theta \in \Theta_N$. Let $X = (X_1, \dots, X_n)$ be a random sample from the conditional distribution of $X$ given $\Theta = \theta$, which has pdf $f(x \mid \theta)$, where $\Theta$ has prior pdf $r(\theta)$. Consider the critical function
$$\varphi(x) = \Big[ P(\Theta \in \Theta_A \mid X = x) > P(\Theta \in \Theta_N \mid X = x) \Big];$$
that is, we reject $H_0$ exactly when the posterior probability of the alternative exceeds that of the null.
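For a concrete credible interval, we can use the posterior from Exercise 7. The sketch below uses hypothetical data and prior hyperparameters; note that scipy's invgamma with scale $= \beta_1$ matches the pdf $f(\theta; \alpha_1, \beta_1)$ used in these notes, and that the interval computed is the equal-tailed one (a highest-posterior-density interval is another common choice).

import numpy as np
from scipy import stats

# Equal-tailed 95% credible interval for theta in Exercise 7, where the
# posterior is InvGamma(alpha + n, beta + t). All inputs are made up.
alpha, beta = 3.0, 2.0                        # prior hyperparameters
x = np.array([1.2, 0.7, 2.3, 1.9, 0.4])       # hypothetical exponential sample
alpha1, beta1 = alpha + len(x), beta + x.sum()

posterior = stats.invgamma(a=alpha1, scale=beta1)
lo, hi = posterior.ppf(0.025), posterior.ppf(0.975)
print(lo, hi)   # P(lo < Theta < hi | X = x) = 0.95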